# MetaScraper Frequently Asked Questions (FAQ)

## 🚀 Getting Started

### Q: How do I install MetaScraper?

```bash
npm install metascraper
```

### Q: What are the system requirements?

- **Node.js**: 18+ (recommended 20+)
- **Memory**: Minimum 50MB for static mode, 200MB+ for headless mode
- **Network**: Internet connection to Netflix

```bash
# Check your Node.js version
node --version  # Should be 18.x or higher
```

### Q: Does MetaScraper work with TypeScript?

Yes! MetaScraper provides TypeScript support out of the box:

```typescript
import { scraperNetflix } from 'metascraper';

interface NetflixMetadata {
  url: string;
  id: string;
  name: string;
  year: string | number | undefined;
  seasons: string | null;
}

const result: Promise<NetflixMetadata> = scraperNetflix('https://www.netflix.com/title/80189685');
```

## 🔧 Technical Questions

### Q: What's the difference between static and headless mode?

**Static Mode** (default):
- ✅ Faster (200-500ms)
- ✅ Lower memory usage
- ✅ No browser required
- ⚠️ 85% success rate

**Headless Mode** (fallback):
- ✅ Higher success rate (99%)
- ✅ Handles JavaScript-rendered content
- ❌ Slower (2-5 seconds)
- ❌ Requires Playwright

```javascript
// Force static mode only
await scraperNetflix(url, { headless: false });

// Enable headless fallback
await scraperNetflix(url, { headless: true });
```

### Q: Do I need to install Playwright?

**No**, Playwright is optional. MetaScraper works without it using static HTML parsing.

Install Playwright only if:
- You need higher success rates
- Static mode fails for specific titles
- You want JavaScript-rendered content

```bash
# Optional: Install for better success rates
npm install playwright
npx playwright install chromium
```

### Q: Can MetaScraper work in the browser?

**Not currently**.
MetaScraper is designed for Node.js environments due to:
- CORS restrictions in browsers
- Netflix's bot protection
- Server-side dependencies (Node.js `fetch`, cheerio)

For browser usage, consider:
- Creating a proxy API server
- Using serverless functions
- Implementing browser-based scraping separately

### Q: How does MetaScraper handle Netflix's bot protection?

MetaScraper uses several techniques:
- **Realistic User-Agent strings** that mimic regular browsers
- **Proper HTTP headers** including Accept-Language
- **Rate limiting considerations** to avoid detection
- **JavaScript rendering** (when needed) to appear more human

```javascript
const result = await scraperNetflix(url, {
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
});
```

## 🌍 Localization & Turkish Support

### Q: What Turkish UI patterns does MetaScraper remove?

MetaScraper removes these Turkish Netflix UI patterns:

| Pattern | English Equivalent | Example |
|---------|-------------------|---------|
| `izlemenizi bekliyor` | "waiting for you to watch" | "The Witcher izlemenizi bekliyor" |
| `izleyin` | "watch" | "Dark izleyin" |
| `devam et` | "continue" | "Money Heist devam et" |
| `başla` | "start" | "Stranger Things başla" |
| `izlemeye devam` | "continue watching" | "The Crown izlemeye devam" |

### Q: Does MetaScraper support other languages?

MetaScraper is currently optimized for Turkish Netflix interfaces, but it also removes universal English patterns:
- ✅ **Turkish**: Full support with specific patterns
- ✅ **English**: Basic UI text removal
- 🔄 **Other languages**: Can be extended (file an issue)

### Q: What about regional Netflix content?
MetaScraper works globally but:
- **Content availability** varies by region
- **Some titles** may be region-locked
- **URL formats** work universally

```javascript
// Test different regional URLs
const regionalUrls = [
  'https://www.netflix.com/title/80189685',    // Global
  'https://www.netflix.com/tr/title/80189685', // Turkey
  'https://www.netflix.com/us/title/80189685'  // US
];
```

## ⚡ Performance & Usage

### Q: How fast is MetaScraper?

**Response Times**:
- **Static mode**: 200-500ms
- **Headless fallback**: 2-5 seconds
- **Batch processing**: 10-50 URLs per second (static mode)

**Resource Usage**:
- **Memory**: <50MB (static), 100-200MB (headless)
- **CPU**: Low impact for normal usage
- **Network**: 1 HTTP request per title

```javascript
// Performance monitoring
import { performance } from 'node:perf_hooks';

const start = performance.now();
await scraperNetflix(url);
const duration = performance.now() - start;
console.log(`Scraping took ${duration}ms`);
```

### Q: Can I use MetaScraper for bulk scraping?

**Yes**, but consider:

```javascript
// Good: Sequential processing with delays
async function bulkScrape(urls) {
  const results = [];
  for (const url of urls) {
    const result = await scraperNetflix(url);
    results.push(result);
    // Be respectful: add delay between requests
    await new Promise(resolve => setTimeout(resolve, 1000));
  }
  return results;
}

// Better: Concurrent processing with limits
async function concurrentScrape(urls, concurrency = 5) {
  const chunks = [];
  for (let i = 0; i < urls.length; i += concurrency) {
    chunks.push(urls.slice(i, i + concurrency));
  }

  const results = [];
  for (const chunk of chunks) {
    const chunkResults = await Promise.allSettled(
      chunk.map(url => scraperNetflix(url, { headless: false }))
    );
    results.push(...chunkResults);
    // Delay between chunks
    await new Promise(resolve => setTimeout(resolve, 2000));
  }
  return results;
}
```

### Q: Does MetaScraper cache results?
**No built-in caching**, but it is easy to implement:

```javascript
// Simple cache implementation
const cache = new Map();
const CACHE_TTL = 30 * 60 * 1000; // 30 minutes

async function scrapeWithCache(url, options = {}) {
  const cacheKey = `${url}:${JSON.stringify(options)}`;

  if (cache.has(cacheKey)) {
    const { data, timestamp } = cache.get(cacheKey);
    if (Date.now() - timestamp < CACHE_TTL) {
      return data;
    }
  }

  const result = await scraperNetflix(url, options);
  cache.set(cacheKey, { data: result, timestamp: Date.now() });
  return result;
}
```

## 🛠️ Troubleshooting

### Q: Why am I getting "File is not defined" errors?

This happens on Node.js 18 without proper polyfills:

```bash
# Solution 1: Update to Node.js 20+
nvm install 20
nvm use 20

# Solution 2: Use latest MetaScraper version
npm update metascraper
```

### Q: Why does scraping fail for some titles?

Common reasons:

1. **Region restrictions**: Title not available in your location
2. **Invalid URL**: Netflix URL format changed or incorrect
3. **Netflix changes**: HTML structure updated
4. **Network issues**: Connection problems or timeouts

**Debug steps**:

```javascript
async function debugScraping(url) {
  try {
    console.log('Testing URL:', url);

    // Test URL normalization
    // (normalizeNetflixUrl is assumed to be importable from the package; skip this step if it is not)
    const normalized = normalizeNetflixUrl(url);
    console.log('Normalized:', normalized);

    // Test with different configurations
    const configs = [
      { headless: false, timeoutMs: 30000 },
      { headless: true, timeoutMs: 30000 },
      { headless: false, userAgent: 'different-ua' }
    ];

    for (const config of configs) {
      try {
        const result = await scraperNetflix(url, config);
        console.log('✅ Success with config:', config, result.name);
        return result;
      } catch (error) {
        console.log('❌ Failed with config:', config, error.message);
      }
    }
  } catch (error) {
    console.error('Debug error:', error);
  }
}
```

### Q: How do I handle rate limiting?
MetaScraper doesn't include built-in rate limiting, but you can implement it:

```javascript
class RateLimiter {
  constructor(requestsPerSecond = 1) {
    this.delay = 1000 / requestsPerSecond;
    this.nextSlot = 0;
  }

  async wait() {
    const now = Date.now();
    // Reserve the next slot before sleeping so that concurrent callers
    // queue up one after another instead of all waking at the same time
    const scheduled = Math.max(now, this.nextSlot);
    this.nextSlot = scheduled + this.delay;
    if (scheduled > now) {
      await new Promise(resolve => setTimeout(resolve, scheduled - now));
    }
  }
}

const rateLimiter = new RateLimiter(0.5); // 0.5 requests per second

async function rateLimitedScrape(url) {
  await rateLimiter.wait();
  return await scraperNetflix(url);
}
```

## 🔒 Legal & Ethical Questions

### Q: Is scraping Netflix legal?

**Important**: Web scraping exists in a legal gray area. Consider:

**✅ Generally Acceptable**:
- Personal use and research
- Educational purposes
- Non-commercial applications
- Respectful scraping (low frequency)

**⚠️ Potentially Problematic**:
- Commercial use without permission
- High-frequency scraping
- Competing with Netflix's services
- Violating Netflix's Terms of Service

**📋 Best Practices**:
- Be respectful with request frequency
- Don't scrape at commercial scale
- Use results for personal/educational purposes
- Consider Netflix's ToS

### Q: Does MetaScraper respect robots.txt?

MetaScraper doesn't automatically check robots.txt, but you can:

```javascript
import robotsParser from 'robots-parser'; // robots-parser uses a default export

async function scrapeWithRobotsCheck(url) {
  const robotsUrl = new URL('/robots.txt', url).href;

  // Fetch the live robots.txt rather than hard-coding its contents
  const response = await fetch(robotsUrl);
  const robots = robotsParser(robotsUrl, response.ok ? await response.text() : '');

  if (robots.isAllowed(url, 'MetaScraper')) {
    return await scraperNetflix(url);
  } else {
    throw new Error('Scraping disallowed by robots.txt');
  }
}
```

## 📦 Development & Contributing

### Q: How can I contribute to MetaScraper?

1. **Report Issues**: Report bugs or new Turkish UI patterns
2. **Suggest Features**: Ideas for improvement
3. **Submit Pull Requests**: Code contributions
4. **Improve Documentation**: Better examples and guides

```bash
# Development setup
git clone https://github.com/username/flixscaper.git
cd flixscaper
npm install
npm test
npm run demo
```

### Q: How do I add new Turkish UI patterns?

If you discover new Turkish Netflix UI text patterns:

1. **Create an issue** with examples:

   ```markdown
   **New Pattern**: "yeni bölüm"
   **Example**: "Dizi Adı yeni bölüm | Netflix"
   **Expected**: "Dizi Adı"
   ```

2. **Or submit a PR** adding the pattern:

   ```javascript
   // src/parser.js
   const TURKISH_UI_PATTERNS = [
     // ... existing patterns
     /\s+yeni bölüm$/i, // Add new pattern
   ];
   ```

### Q: How can I test MetaScraper locally?

```bash
# Clone repository
git clone https://github.com/username/flixscaper.git
cd flixscaper

# Install dependencies
npm install

# Run tests
npm test

# Test with demo
npm run demo

# Test your own URLs
node -e "
import('./src/index.js').then(async (m) => {
  const result = await m.scraperNetflix('https://www.netflix.com/title/80189685');
  console.log(result);
})
"
```

## 🔮 Future Questions

### Q: Will MetaScraper support other streaming platforms?

MetaScraper is currently focused on Netflix, but the architecture could be adapted. If you're interested in other platforms, create an issue to discuss:
- YouTube metadata extraction
- Amazon Prime scraping
- Disney+ integration
- Multi-platform support

### Q: Is there a REST API version available?

Not currently, but you could easily create one:

```javascript
// Example Express.js server
import express from 'express';
import { scraperNetflix } from 'metascraper';

const app = express();
app.use(express.json());

app.post('/scrape', async (req, res) => {
  try {
    const { url, options } = req.body;
    const result = await scraperNetflix(url, options);
    res.json(result);
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

app.listen(3000, () => console.log('API server running on port 3000'));
```

---

## 📞 Still Have Questions?
- **Documentation**: Check the `/doc` directory for detailed guides
- **Issues**: [GitHub Issues](https://github.com/username/flixscaper/issues)
- **Examples**: See `local-demo.js` for usage patterns
- **Testing**: Run `npm test` to see functionality in action

---

*FAQ last updated: 2025-11-23*