MetaScraper Troubleshooting Guide
🚨 Common Issues & Solutions
1. Module Import Errors
❌ Error: Cannot resolve import 'flixscaper'
Problem: Cannot import the library in your project
import { scraperNetflix } from 'flixscaper';
// Throws: Cannot resolve import 'flixscaper'
Causes & Solutions:
- Not installed properly
  npm install flixscaper
  # or
  yarn add flixscaper
- Using local development without proper path
  // Instead of this:
  import { scraperNetflix } from 'flixscaper';
  // Use this for local development:
  import { scraperNetflix } from './src/index.js';
- TypeScript configuration issue
  // tsconfig.json
  {
    "compilerOptions": {
      "moduleResolution": "node",
      "allowSyntheticDefaultImports": true
    }
  }
❌ Error: Failed to load url ../globals-polyfill.mjs
Problem: Polyfill file missing after Node.js upgrade
Solution: The library has been updated to use a minimal polyfill. Ensure you're using the latest version:
npm update flixscaper
If still occurring, check your Node.js version:
node --version # Should be 18+
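The same version requirement can be checked at startup, so an unsupported runtime fails with a clear message instead of a polyfill error. This is a minimal sketch, assuming the minimum major version of 18 stated above; `isSupportedNode` is a hypothetical helper and not part of the library.

```javascript
// Hypothetical helper: returns true when a Node.js version string
// (for example "18.19.0" or "v20.11.1") meets the minimum major version.
function isSupportedNode(version, minMajor = 18) {
  const major = Number.parseInt(version.replace(/^v/, '').split('.')[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

// process.versions.node holds the running version without the leading "v".
if (!isSupportedNode(process.versions.node)) {
  console.error(`Node.js ${process.versions.node} is too old; version 18 or newer is required.`);
}
```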
2. Network & Connection Issues
❌ Error: Request timed out while reaching Netflix
Problem: Network requests are timing out
Solutions:
- Increase timeout
  await scraperNetflix(url, {
    timeoutMs: 30000 // 30 seconds instead of 15
  });
- Check internet connection
  # Test connectivity to Netflix
  curl -I https://www.netflix.com
- Use a different User-Agent
  await scraperNetflix(url, {
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
  });
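When timeouts are intermittent, retrying with a progressively longer timeout can help. The sketch below is an illustration only: `scrapeWithRetry` is a hypothetical wrapper, and it assumes the scrape function accepts the `timeoutMs` option shown above. The scrape function is passed in as an argument so the wrapper does not depend on any particular import path.

```javascript
// Hypothetical wrapper: `scrape` is any function with the shape
// scrape(url, options) => Promise, for example the library's scraperNetflix.
async function scrapeWithRetry(scrape, url, { attempts = 3, baseTimeoutMs = 15000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    // Grow the timeout linearly: 15 s, 30 s, 45 s with the defaults.
    const timeoutMs = baseTimeoutMs * attempt;
    try {
      return await scrape(url, { timeoutMs });
    } catch (error) {
      lastError = error;
      console.warn(`Attempt ${attempt}/${attempts} failed (${timeoutMs} ms): ${error.message}`);
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

Usage would look like `await scrapeWithRetry(scraperNetflix, url)`. Note that this sketch retries on every error, including a 404; filtering on the error message before retrying may be preferable.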
❌ Error: Netflix title not found (404)
Problem: Title ID doesn't exist or is not available
Solutions:
- Verify URL is correct
  // Test with known working URL
  await scraperNetflix('https://www.netflix.com/title/80189685');
- Check title availability in your region
  // Some titles are region-locked
  console.log('Title may not be available in your region');
- Use browser to verify
  - Open the URL in your browser
  - If it shows 404 in browser, it's not a library issue
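A malformed URL can also produce a 404, so it may be worth checking the URL shape before scraping. The following is a sketch under stated assumptions: `extractTitleId` is a hypothetical helper (the library's own `normalizeNetflixUrl` may behave differently), and the accepted locale prefixes such as `/tr/` are an assumption about Netflix URL patterns.

```javascript
// Hypothetical helper: extracts the numeric title ID from a Netflix title URL,
// or returns null when the URL does not have the expected shape.
function extractTitleId(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // Not a parseable absolute URL
  }
  if (url.hostname !== 'www.netflix.com' && url.hostname !== 'netflix.com') {
    return null;
  }
  // Accepts /title/<id> with an optional locale prefix such as /tr/ or /tr-en/.
  // The query string is not part of url.pathname, so it is ignored here.
  const match = url.pathname.match(/^\/(?:[a-z]{2}(?:-[a-z]{2})?\/)?title\/(\d+)\/?$/i);
  return match ? match[1] : null;
}

console.log(extractTitleId('https://www.netflix.com/tr/title/80189685?vlang=tr')); // → 80189685
```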
3. Parsing & Data Issues
❌ Error: Netflix sayfa meta verisi parse edilemedi ("Netflix page metadata could not be parsed")
Problem: Cannot extract metadata from Netflix page
Causes & Solutions:
- Netflix changed their HTML structure
  // Enable headless mode to get JavaScript-rendered content
  await scraperNetflix(url, { headless: true });
- Title has unusual formatting
  // Debug by examining the HTML
  const html = await fetchStaticHtml(url);
  console.log(html.slice(0, 1000)); // First 1000 chars
- Missing JSON-LD data
  - Netflix may have removed structured data
  - Use headless mode as fallback
❌ Problem: Turkish UI text not being removed
Problem: Titles still contain Turkish UI text like "izlemenizi bekliyor" ("waiting for you to watch")
Solutions:
- Check if pattern is covered
  import { cleanTitle } from 'flixscaper/parser';
  const testTitle = "The Witcher izlemenizi bekliyor";
  const cleaned = cleanTitle(testTitle);
  console.log('Cleaned:', cleaned);
- Add new pattern if needed
  // If Netflix added new UI text, file an issue with:
  // 1. The problematic title
  // 2. The expected cleaned title
  // 3. The new UI pattern that needs to be added
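Until a new pattern lands in the library, titles can be post-processed locally. The sketch below illustrates the general idea only: `stripUiText` and `UI_PATTERNS` are hypothetical names, and the pattern list is assembled from examples in this guide rather than taken from the library's `cleanTitle` implementation.

```javascript
// Hypothetical pattern list; the library's real list is not shown in this guide.
const UI_PATTERNS = ['izlemenizi bekliyor', '| Netflix'];

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Remove each UI pattern (case-insensitive), then collapse leftover whitespace.
function stripUiText(title, patterns = UI_PATTERNS) {
  let cleaned = title;
  for (const pattern of patterns) {
    cleaned = cleaned.replace(new RegExp(escapeRegExp(pattern), 'gi'), ' ');
  }
  return cleaned.replace(/\s+/g, ' ').trim();
}

console.log(stripUiText('The Witcher izlemenizi bekliyor | Netflix')); // → The Witcher
```

Passing a custom list as the second argument, such as `stripUiText(title, [...UI_PATTERNS, 'yeni başlık'])`, makes it possible to try a candidate pattern before reporting it.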
4. Playwright/Browser Issues
❌ Error: Playwright is not installed
Problem: Headless mode not available
Solutions:
- Install Playwright
  npm install playwright
  npx playwright install chromium
- Use library without headless mode
  await scraperNetflix(url, { headless: false });
- Check if you really need headless mode
  - Most titles work with static mode
  - Only use headless if static parsing fails
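The "static first, headless only on failure" advice can be expressed as a small wrapper. This is a sketch rather than the library's own behavior (the library may already fall back internally); `scrapeStaticFirst` is a hypothetical name, and the only assumption about the scrape function is the `headless` option used throughout this guide.

```javascript
// Hypothetical wrapper: `scrape` has the shape scrape(url, options) => Promise,
// for example the library's scraperNetflix.
async function scrapeStaticFirst(scrape, url, options = {}) {
  try {
    // Static mode: no browser is launched.
    return await scrape(url, { ...options, headless: false });
  } catch (staticError) {
    console.warn(`Static parsing failed (${staticError.message}); retrying in headless mode`);
    // Headless mode requires Playwright and Chromium to be installed.
    return scrape(url, { ...options, headless: true });
  }
}
```

With this wrapper, `await scrapeStaticFirst(scraperNetflix, url)` only pays the cost of launching a browser for titles where static parsing fails.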
❌ Error: Playwright chromium browser is unavailable
Problem: Chromium browser not installed
Solution:
npx playwright install chromium
❌ Error: Memory issues with Playwright
Problem: Browser automation using too much memory
Solutions:
- Limit concurrent requests
  const urls = ['url1', 'url2', 'url3'];
  // Process sequentially instead of in parallel
  for (const url of urls) {
    const result = await scraperNetflix(url);
    // Process result
  }
- Close browser resources properly
  - The library handles this automatically
  - Ensure you're not calling Playwright directly
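Fully sequential processing is the safest option for memory, but a small fixed concurrency is often a reasonable middle ground. The sketch below shows one way to cap the number of scrapes in flight; `mapWithConcurrency` is a hypothetical helper, not a library function.

```javascript
// Hypothetical helper: runs `worker` over `items` with at most `limit`
// calls in flight, and resolves to the results in the original order.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function runLane() {
    while (next < items.length) {
      const index = next++; // Claimed synchronously, so no two lanes share an index
      results[index] = await worker(items[index], index);
    }
  }

  // Start up to `limit` lanes; each lane pulls the next item when it is free.
  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, runLane));
  return results;
}
```

For example, `await mapWithConcurrency(urls, 2, (url) => scraperNetflix(url))` keeps at most two scrapes running at a time. If any worker call rejects, the returned promise rejects with that error, so wrap the worker in a try/catch when partial results are acceptable.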
5. Environment & Compatibility Issues
❌ Error: File is not defined (Node.js 18)
Problem: Node.js 18 missing File API for undici
Solutions:
- Use latest library version
  npm update flixscaper
- Upgrade Node.js
  # Upgrade to Node.js 20+ to avoid polyfill issues
  nvm install 20
  nvm use 20
- Manual polyfill (if needed)
  import './src/polyfill.js'; // Include before library import
  import { scraperNetflix } from './src/index.js';
❌ Problem: Works on one machine but not another
Diagnosis Steps:
- Check Node.js versions
  node --version # Should be 18+
  npm --version  # Should be 8+
- Check Netflix accessibility
  curl -I "https://www.netflix.com/title/80189685"
- Compare User-Agent strings
  console.log(navigator.userAgent); // Browser, or Node.js 21+
  // Node.js has no process.userAgent property; on older Node.js versions,
  // compare the userAgent option you pass to scraperNetflix instead.
🔍 Debugging Techniques
1. Enable Verbose Logging
// Add debug logging to your code
async function debugScraping(url) {
  console.log('🚀 Starting scrape for:', url);
  try {
    const result = await scraperNetflix(url, {
      headless: false, // Try without browser first
      timeoutMs: 30000
    });
    console.log('✅ Success:', result);
    return result;
  } catch (error) {
    console.error('❌ Error details:', {
      message: error.message,
      stack: error.stack,
      url: url
    });
    throw error;
  }
}
2. Test with Known Working URLs
// Test with URLs that should definitely work
const testUrls = [
  'https://www.netflix.com/title/80189685', // The Witcher
  'https://www.netflix.com/title/82123114'  // ONE SHOT
];
for (const url of testUrls) {
  try {
    const result = await scraperNetflix(url);
    console.log(`✅ ${url}: ${result.name}`);
  } catch (error) {
    console.error(`❌ ${url}: ${error.message}`);
  }
}
3. Isolate the Problem
// Test each component separately
import { normalizeNetflixUrl } from 'flixscaper/index';
import { parseNetflixHtml } from 'flixscaper/parser';
// fetchStaticHtml must also be imported from the module that exports it
// in your version of the library.
async function isolateProblem(url) {
  try {
    // 1. Test URL normalization
    const normalized = normalizeNetflixUrl(url);
    console.log('✅ URL normalized:', normalized);
    // 2. Test HTML fetching
    const html = await fetchStaticHtml(normalized);
    console.log('✅ HTML fetched, length:', html.length);
    // 3. Test parsing
    const parsed = parseNetflixHtml(html);
    console.log('✅ Parsed:', parsed);
  } catch (error) {
    console.error('❌ Step failed:', error.message);
  }
}
4. Browser Mode Debugging
// Test with visible browser for debugging
const result = await scraperNetflix(url, {
  headless: false, // Show browser window
  timeoutMs: 60000 // Longer timeout for manual inspection
});
// Keep browser open by adding delay if needed
await new Promise(resolve => setTimeout(resolve, 5000));
🌍 Regional & Language Issues
Turkish Netflix Specific Issues
❌ Problem: Turkish URLs not working
Test different URL formats:
const turkishUrls = [
  'https://www.netflix.com/title/80189685',            // Standard
  'https://www.netflix.com/tr/title/80189685',         // Turkish locale path
  'https://www.netflix.com/tr/title/80189685?s=i',     // With extra query parameters
  'https://www.netflix.com/tr/title/80189685?vlang=tr' // Turkish language
];
for (const url of turkishUrls) {
  try {
    const result = await scraperNetflix(url);
    console.log(`✅ ${url}: ${result.name}`);
  } catch (error) {
    console.error(`❌ ${url}: ${error.message}`);
  }
}
❌ Problem: New Turkish UI patterns not recognized
Report the issue with:
- Original title: What Netflix returned
- Expected title: What it should be after cleaning
- URL: The Netflix URL where this occurs
- Region: Your geographic location
Example issue report:
**URL**: https://www.netflix.com/tr/title/12345678
**Original**: "Dizi Adı yeni başlık | Netflix"
**Expected**: "Dizi Adı"
**Pattern to add**: "yeni başlık"
**Region**: Turkey
📊 Performance Issues
Slow Response Times
Diagnose the bottleneck:
import { performance } from 'node:perf_hooks';
import { normalizeNetflixUrl } from 'flixscaper/index';
import { parseNetflixHtml } from 'flixscaper/parser';
// fetchStaticHtml must also be imported from the module that exports it
// in your version of the library.
async function profileScraping(url) {
  const steps = {};
  // URL Normalization
  steps.normStart = performance.now();
  const normalized = normalizeNetflixUrl(url);
  steps.normEnd = performance.now();
  // HTML Fetch
  steps.fetchStart = performance.now();
  const html = await fetchStaticHtml(normalized);
  steps.fetchEnd = performance.now();
  // Parsing
  steps.parseStart = performance.now();
  const parsed = parseNetflixHtml(html);
  steps.parseEnd = performance.now();
  // All durations are in milliseconds
  console.log('Performance breakdown:', {
    normalization: steps.normEnd - steps.normStart,
    fetch: steps.fetchEnd - steps.fetchStart,
    parsing: steps.parseEnd - steps.parseStart,
    htmlSize: html.length
  });
  return parsed;
}
Optimization Solutions:
- Disable headless mode (if not needed)
  await scraperNetflix(url, { headless: false });
- Reduce timeout (if network is fast)
  await scraperNetflix(url, { timeoutMs: 5000 });
- Cache results (for repeated requests)
  const cache = new Map();
  async function scrapeWithCache(url) {
    if (cache.has(url)) {
      return cache.get(url);
    }
    const result = await scraperNetflix(url);
    cache.set(url, result);
    return result;
  }
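The simple cache above never expires and caches only completed results. For long-running processes, a variant with a time-to-live that also shares in-flight requests may be more suitable. This is a sketch only: `createCachedScraper` is a hypothetical helper, and the one-hour default TTL is an arbitrary assumption.

```javascript
// Hypothetical helper: wraps any scrape(url) => Promise function with an
// in-memory cache whose entries expire after `ttlMs` milliseconds.
// `now` is injectable so the expiry logic can be tested with a fake clock.
function createCachedScraper(scrape, ttlMs = 60 * 60 * 1000, now = Date.now) {
  const cache = new Map(); // url -> { promise, expiresAt }

  return function cachedScrape(url) {
    const entry = cache.get(url);
    if (entry && entry.expiresAt > now()) return entry.promise;

    // Store the promise itself so concurrent calls for one URL share a request.
    const promise = Promise.resolve()
      .then(() => scrape(url))
      .catch((error) => {
        // Do not cache failures; only remove the entry if it is still ours.
        if (cache.get(url)?.promise === promise) cache.delete(url);
        throw error;
      });
    cache.set(url, { promise, expiresAt: now() + ttlMs });
    return promise;
  };
}
```

Usage would be `const cachedScrape = createCachedScraper(scraperNetflix, 10 * 60 * 1000);` followed by `await cachedScrape(url)`.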
🔧 Common Fixes
Quick Fix Checklist
- Update dependencies
  npm update flixscaper
  npm update
- Clear npm cache
  npm cache clean --force
  rm -rf node_modules package-lock.json
  npm install
- Check Node.js version
  node --version # Should be 18+
  # If older, upgrade:
  nvm install 20 && nvm use 20
- Test with minimal example
  import { scraperNetflix } from 'flixscaper';
  scraperNetflix('https://www.netflix.com/title/80189685')
    .then(result => console.log('Success:', result))
    .catch(error => console.error('Error:', error.message));
- Try different options
  // If failing, try with different configurations
  const url = 'https://www.netflix.com/title/80189685';
  const configs = [
    { headless: false },
    { headless: true, timeoutMs: 30000 },
    { headless: false, userAgent: 'different-ua' }
  ];
  for (const config of configs) {
    try {
      const result = await scraperNetflix(url, config);
      console.log('✅ Working config:', config);
      break;
    } catch (error) {
      console.log('❌ Failed config:', config, error.message);
    }
  }
📞 Getting Help
When to Report an Issue
Report an issue when:
- Previously working URL suddenly fails
- Error messages are unclear or unhelpful
- Turkish UI patterns not being removed
- Performance degrades significantly
- Documentation is unclear or incomplete
Issue Report Template
## Issue Description
Brief description of the problem
## Steps to Reproduce
1. URL used: ...
2. Code executed: ...
3. Expected result: ...
4. Actual result: ...
## Environment
- Node.js version: ...
- OS: ...
- flixscaper version: ...
- Browser (if relevant): ...
## Error Message
Paste full error message here
## Additional Context
Any additional information that might help
Debug Information to Include
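// Include this information in issue reports
import { createRequire } from 'node:module';
// require() is not defined in ES modules, so create one to read package.json
const require = createRequire(import.meta.url);
const debugInfo = {
  nodeVersion: process.version,
  platform: process.platform,
  arch: process.arch,
  flixscaperVersion: require('flixscaper/package.json').version,
  timestamp: new Date().toISOString()
};
console.log('Debug Info:', JSON.stringify(debugInfo, null, 2));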
Troubleshooting guide last updated: 2025-11-23