# MetaScraper Frequently Asked Questions (FAQ)

## 🚀 Getting Started
### Q: How do I install MetaScraper?

```bash
npm install metascraper
```
### Q: What are the system requirements?

- Node.js: 18+ (20+ recommended)
- Memory: minimum 50MB for static mode, 200MB+ for headless mode
- Network: internet connection to reach Netflix

```bash
# Check your Node.js version
node --version  # Should be 18.x or higher
```
### Q: Does MetaScraper work with TypeScript?

Yes! MetaScraper provides TypeScript support out of the box:

```typescript
import { scraperNetflix } from 'metascraper';

interface NetflixMetadata {
  url: string;
  id: string;
  name: string;
  year: string | number | undefined;
  seasons: string | null;
}

const result: Promise<NetflixMetadata> = scraperNetflix('https://www.netflix.com/title/80189685');
```
## 🔧 Technical Questions
### Q: What's the difference between static and headless mode?

**Static Mode (default):**
- ✅ Faster (200-500ms)
- ✅ Lower memory usage
- ✅ No browser required
- ⚠️ 85% success rate

**Headless Mode (fallback):**
- ✅ Higher success rate (99%)
- ✅ Handles JavaScript-rendered content
- ❌ Slower (2-5 seconds)
- ❌ Requires Playwright

```javascript
// Force static mode only
await scraperNetflix(url, { headless: false });

// Enable headless fallback
await scraperNetflix(url, { headless: true });
```
### Q: Do I need to install Playwright?

No, Playwright is optional. MetaScraper works without it using static HTML parsing.

Install Playwright only if:
- You need higher success rates
- Static mode fails for specific titles
- You want JavaScript-rendered content

```bash
# Optional: Install for better success rates
npm install playwright
npx playwright install chromium
```
### Q: Can MetaScraper work in the browser?

Not currently. MetaScraper is designed for Node.js environments due to:
- CORS restrictions in browsers
- Netflix's bot protection
- Node.js-only dependencies such as cheerio

For browser usage, consider the options below (a browser-side sketch follows this list):
- Creating a proxy API server
- Using serverless functions
- Implementing browser-based scraping separately
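For the proxy route, the browser side only needs a plain `fetch` call to your own backend. This is a minimal sketch assuming you expose a `POST /scrape` endpoint (for example, the Express server shown under the REST API question below); the hostname and response shape are placeholders, not part of MetaScraper:

```javascript
// Browser-side sketch: call your own proxy endpoint instead of Netflix directly.
// Assumes your server exposes POST /scrape and returns MetaScraper's result as JSON.
async function scrapeViaProxy(url) {
  const response = await fetch('https://your-api.example.com/scrape', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url })
  });
  if (!response.ok) {
    throw new Error(`Proxy request failed: ${response.status}`);
  }
  return response.json(); // e.g. { url, id, name, year, seasons }
}

// Usage
scrapeViaProxy('https://www.netflix.com/title/80189685')
  .then(metadata => console.log(metadata.name))
  .catch(console.error);
```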
### Q: How does MetaScraper handle Netflix's bot protection?

MetaScraper uses several techniques:
- Realistic User-Agent strings that mimic regular browsers
- Proper HTTP headers, including Accept-Language
- Rate limiting considerations to avoid detection
- JavaScript rendering (when needed) to appear more human

```javascript
const result = await scraperNetflix(url, {
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
});
```
## 🌍 Localization & Turkish Support
### Q: What Turkish UI patterns does MetaScraper remove?

MetaScraper removes these Turkish Netflix UI patterns (an illustrative stripping sketch follows the table):

| Pattern | English Equivalent | Example |
|---|---|---|
| `izlemenizi bekliyor` | "waiting for you to watch" | "The Witcher izlemenizi bekliyor" |
| `izleyin` | "watch" | "Dark izleyin" |
| `devam et` | "continue" | "Money Heist devam et" |
| `başla` | "start" | "Stranger Things başla" |
| `izlemeye devam` | "continue watching" | "The Crown izlemeye devam" |
### Q: Does MetaScraper support other languages?

Currently optimized for Turkish Netflix interfaces, but it also removes universal English patterns:
- ✅ Turkish: Full support with specific patterns
- ✅ English: Basic UI text removal
- 🔄 Other languages: Can be extended (file an issue); a client-side workaround is sketched below
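Until other languages are supported natively, you can post-process the returned `name` yourself. A minimal sketch, assuming you supply your own patterns (the German examples below are purely illustrative and not shipped with the library):

```javascript
import { scraperNetflix } from 'metascraper';

// Illustrative only: extra UI suffixes for a language MetaScraper does not yet handle.
const EXTRA_UI_PATTERNS = [
  /\s+jetzt ansehen$/i,   // assumed German "watch now" pattern
  /\s+weiterschauen$/i    // assumed German "continue watching" pattern
];

async function scrapeAndClean(url) {
  const result = await scraperNetflix(url);
  let name = result.name;
  for (const pattern of EXTRA_UI_PATTERNS) {
    name = name.replace(pattern, '').trim();
  }
  return { ...result, name };
}
```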
### Q: What about regional Netflix content?

MetaScraper works globally, but:
- Content availability varies by region
- Some titles may be region-locked
- URL formats work universally

```javascript
// Test different regional URLs
const regionalUrls = [
  'https://www.netflix.com/title/80189685',    // Global
  'https://www.netflix.com/tr/title/80189685', // Turkey
  'https://www.netflix.com/us/title/80189685'  // US
];
```
## ⚡ Performance & Usage
### Q: How fast is MetaScraper?

**Response Times:**
- Static mode: 200-500ms
- Headless fallback: 2-5 seconds
- Batch processing: 10-50 URLs per second (static mode)

**Resource Usage:**
- Memory: <50MB (static), 100-200MB (headless)
- CPU: Low impact for normal usage
- Network: 1 HTTP request per title

```javascript
// Performance monitoring
import { performance } from 'node:perf_hooks';

const start = performance.now();
await scraperNetflix(url);
const duration = performance.now() - start;

console.log(`Scraping took ${duration}ms`);
```
### Q: Can I use MetaScraper for bulk scraping?

Yes, but pace your requests and keep concurrency low:

```javascript
// Good: Sequential processing with delays
async function bulkScrape(urls) {
  const results = [];
  for (const url of urls) {
    const result = await scraperNetflix(url);
    results.push(result);
    // Be respectful: add delay between requests
    await new Promise(resolve => setTimeout(resolve, 1000));
  }
  return results;
}

// Better: Concurrent processing with limits
async function concurrentScrape(urls, concurrency = 5) {
  const chunks = [];
  for (let i = 0; i < urls.length; i += concurrency) {
    chunks.push(urls.slice(i, i + concurrency));
  }

  const results = [];
  for (const chunk of chunks) {
    const chunkResults = await Promise.allSettled(
      chunk.map(url => scraperNetflix(url, { headless: false }))
    );
    results.push(...chunkResults);
    // Delay between chunks
    await new Promise(resolve => setTimeout(resolve, 2000));
  }
  return results;
}
```
### Q: Does MetaScraper cache results?

No built-in caching, but it is easy to implement your own:

```javascript
// Simple in-memory cache implementation
const cache = new Map();
const CACHE_TTL = 30 * 60 * 1000; // 30 minutes

async function scrapeWithCache(url, options = {}) {
  const cacheKey = `${url}:${JSON.stringify(options)}`;

  if (cache.has(cacheKey)) {
    const { data, timestamp } = cache.get(cacheKey);
    if (Date.now() - timestamp < CACHE_TTL) {
      return data;
    }
  }

  const result = await scraperNetflix(url, options);
  cache.set(cacheKey, { data: result, timestamp: Date.now() });
  return result;
}
```
## 🛠️ Troubleshooting
### Q: Why am I getting "File is not defined" errors?

This happens on Node.js 18 without proper polyfills:

```bash
# Solution 1: Update to Node.js 20+
nvm install 20
nvm use 20

# Solution 2: Use the latest MetaScraper version
npm update metascraper
```
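If you must stay on Node.js 18, a small polyfill at the top of your entry file usually works. This is a hedged sketch assuming Node 18.13+, where `File` is exported from `node:buffer` but not yet exposed as a global:

```javascript
// polyfill-file.js — import this before anything that expects a global File.
// Assumption: Node.js 18.13+ (File was added to node:buffer in that release).
import { File } from 'node:buffer';

if (typeof globalThis.File === 'undefined') {
  globalThis.File = File;
}
```

Then import it first in your application: `import './polyfill-file.js';`.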
### Q: Why does scraping fail for some titles?

Common reasons:
- Region restrictions: Title not available in your location
- Invalid URL: Netflix URL format changed or incorrect
- Netflix changes: HTML structure updated
- Network issues: Connection problems or timeouts

Debug steps:

```javascript
// Assumes both helpers are exported from the package root
import { scraperNetflix, normalizeNetflixUrl } from 'metascraper';

async function debugScraping(url) {
  try {
    console.log('Testing URL:', url);

    // Test URL normalization
    const normalized = normalizeNetflixUrl(url);
    console.log('Normalized:', normalized);

    // Test with different configurations
    const configs = [
      { headless: false, timeoutMs: 30000 },
      { headless: true, timeoutMs: 30000 },
      { headless: false, userAgent: 'different-ua' }
    ];

    for (const config of configs) {
      try {
        const result = await scraperNetflix(url, config);
        console.log('✅ Success with config:', config, result.name);
        return result;
      } catch (error) {
        console.log('❌ Failed with config:', config, error.message);
      }
    }
  } catch (error) {
    console.error('Debug error:', error);
  }
}
```
### Q: How do I handle rate limiting?

MetaScraper doesn't include built-in rate limiting, but you can implement it:

```javascript
class RateLimiter {
  constructor(requestsPerSecond = 1) {
    this.delay = 1000 / requestsPerSecond;
    this.lastRequest = 0;
  }

  async wait() {
    const now = Date.now();
    const timeSinceLastRequest = now - this.lastRequest;
    if (timeSinceLastRequest < this.delay) {
      const waitTime = this.delay - timeSinceLastRequest;
      await new Promise(resolve => setTimeout(resolve, waitTime));
    }
    this.lastRequest = Date.now();
  }
}

const rateLimiter = new RateLimiter(0.5); // 0.5 requests per second (one every 2s)

async function rateLimitedScrape(url) {
  await rateLimiter.wait();
  return await scraperNetflix(url);
}
```
## 🔒 Legal & Ethical Questions
### Q: Is scraping Netflix legal?

**Important**: Web scraping exists in a legal gray area. Consider:

✅ **Generally Acceptable:**
- Personal use and research
- Educational purposes
- Non-commercial applications
- Respectful scraping (low frequency)

⚠️ **Potentially Problematic:**
- Commercial use without permission
- High-frequency scraping
- Competing with Netflix's services
- Violating Netflix's Terms of Service

📋 **Best Practices:**
- Be respectful with request frequency
- Don't scrape at commercial scale
- Use results for personal/educational purposes
- Consider Netflix's ToS
### Q: Does MetaScraper respect robots.txt?

MetaScraper doesn't automatically check robots.txt, but you can do it yourself:

```javascript
// robots-parser exposes a factory function as its default export
import robotsParser from 'robots-parser';

async function scrapeWithRobotsCheck(url) {
  const robotsUrl = new URL('/robots.txt', url).href;
  // Fetch the site's actual robots.txt instead of hard-coding rules
  const robotsTxt = await fetch(robotsUrl).then(res => res.text());
  const robots = robotsParser(robotsUrl, robotsTxt);

  if (robots.isAllowed(url, 'MetaScraper')) {
    return await scraperNetflix(url);
  } else {
    throw new Error('Scraping disallowed by robots.txt');
  }
}
```
## 📦 Development & Contributing
### Q: How can I contribute to MetaScraper?

- **Report Issues**: Found bugs or new Turkish UI patterns
- **Suggest Features**: Ideas for improvement
- **Submit Pull Requests**: Code contributions
- **Improve Documentation**: Better examples and guides

```bash
# Development setup
git clone https://github.com/username/flixscaper.git
cd flixscaper
npm install
npm test
npm run demo
```
### Q: How do I add new Turkish UI patterns?

If you discover new Turkish Netflix UI text patterns:

1. Create an issue with examples:

   ```
   **New Pattern**: "yeni bölüm"
   **Example**: "Dizi Adı yeni bölüm | Netflix"
   **Expected**: "Dizi Adı"
   ```

2. Or submit a PR adding the pattern:

   ```javascript
   // src/parser.js
   const TURKISH_UI_PATTERNS = [
     // ... existing patterns
     /\s+yeni bölüm$/i, // Add new pattern
   ];
   ```
### Q: How can I test MetaScraper locally?

```bash
# Clone repository
git clone https://github.com/username/flixscaper.git
cd flixscaper

# Install dependencies
npm install

# Run tests
npm test

# Test with demo
npm run demo

# Test your own URLs
node -e "
import('./src/index.js').then(async (m) => {
  const result = await m.scraperNetflix('https://www.netflix.com/title/80189685');
  console.log(result);
})
"
```
## 🔮 Future Questions
### Q: Will MetaScraper support other streaming platforms?
Currently focused on Netflix, but the architecture could be adapted. If you're interested in other platforms, create an issue to discuss:
- YouTube metadata extraction
- Amazon Prime scraping
- Disney+ integration
- Multi-platform support
### Q: Is there a REST API version available?

Not currently, but you could easily create one:

```javascript
// Example Express.js server
import express from 'express';
import { scraperNetflix } from 'metascraper';

const app = express();
app.use(express.json());

app.post('/scrape', async (req, res) => {
  try {
    const { url, options } = req.body;
    const result = await scraperNetflix(url, options);
    res.json(result);
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

app.listen(3000, () => console.log('API server running on port 3000'));
```
## 📞 Still Have Questions?

- **Documentation**: Check the `/doc` directory for detailed guides
- **Issues**: GitHub Issues
- **Examples**: See `local-demo.js` for usage patterns
- **Testing**: Run `npm test` to see functionality in action
FAQ last updated: 2025-11-23