# MetaScraper Troubleshooting Guide

## 🚨 Common Issues & Solutions

### 1. Module Import Errors

#### ❌ Error: `Cannot resolve import 'flixscaper'`

**Problem**: The library cannot be imported into your project.

```javascript
import { scraperNetflix } from 'flixscaper';
// Throws: Cannot resolve import 'flixscaper'
```

**Causes & Solutions**:

1. **Not installed properly**
   ```bash
   npm install flixscaper
   # or
   yarn add flixscaper
   ```

2. **Using local development without the proper path**
   ```javascript
   // Instead of this:
   import { scraperNetflix } from 'flixscaper';

   // Use this for local development:
   import { scraperNetflix } from './src/index.js';
   ```

3. **TypeScript configuration issue**
   ```json
   // tsconfig.json
   {
     "compilerOptions": {
       "moduleResolution": "node",
       "allowSyntheticDefaultImports": true
     }
   }
   ```

#### ❌ Error: `Failed to load url ../globals-polyfill.mjs`

**Problem**: The polyfill file is missing after a Node.js upgrade.

**Solution**: The library has been updated to use a minimal polyfill. Make sure you are on the latest version:

```bash
npm update flixscaper
```

If the error persists, check your Node.js version:

```bash
node --version  # Should be 18+
```

### 2. Network & Connection Issues

#### ❌ Error: `Request timed out while reaching Netflix`

**Problem**: Network requests are timing out.

**Solutions**:

1. **Increase the timeout**
   ```javascript
   await scraperNetflix(url, {
     timeoutMs: 30000 // 30 seconds instead of 15
   });
   ```

2. **Check your internet connection**
   ```bash
   # Test connectivity to Netflix
   curl -I https://www.netflix.com
   ```

3. **Use a different User-Agent**
   ```javascript
   await scraperNetflix(url, {
     userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
   });
   ```

#### ❌ Error: `Netflix title not found (404)`

**Problem**: The title ID does not exist or is not available.

**Solutions**:

1. **Verify the URL is correct**
   ```javascript
   // Test with a known working URL
   await scraperNetflix('https://www.netflix.com/title/80189685');
   ```

2. **Check title availability in your region**
   ```javascript
   // Some titles are region-locked
   console.log('Title may not be available in your region');
   ```

3. **Use a browser to verify**
   - Open the URL in your browser
   - If the browser also shows a 404, it is not a library issue

### 3. Parsing & Data Issues

#### ❌ Error: `Netflix sayfa meta verisi parse edilemedi`

**Problem**: Metadata cannot be extracted from the Netflix page. (The message is Turkish for "Netflix page metadata could not be parsed".)

**Causes & Solutions**:

1. **Netflix changed their HTML structure**
   ```javascript
   // Enable headless mode to get JavaScript-rendered content
   await scraperNetflix(url, { headless: true });
   ```

2. **Title has unusual formatting**
   ```javascript
   // Debug by examining the HTML
   // (fetchStaticHtml must be imported; its module path is not given in this guide)
   const html = await fetchStaticHtml(url);
   console.log(html.slice(0, 1000)); // First 1000 chars
   ```

3. **Missing JSON-LD data**
   - Netflix may have removed structured data
   - Use headless mode as a fallback

#### ❌ Problem: Turkish UI text not being removed

**Problem**: Titles still contain Turkish UI text such as "izlemenizi bekliyor".

**Solutions**:

1. **Check whether the pattern is covered**
   ```javascript
   import { cleanTitle } from 'flixscaper/parser';

   const testTitle = "The Witcher izlemenizi bekliyor";
   const cleaned = cleanTitle(testTitle);
   console.log('Cleaned:', cleaned);
   ```

2. **Add a new pattern if needed**
   ```javascript
   // If Netflix added new UI text, file an issue with:
   // 1. The problematic title
   // 2. The expected cleaned title
   // 3. The new UI pattern that needs to be added
   ```

### 4. Playwright/Browser Issues

#### ❌ Error: `Playwright is not installed`

**Problem**: Headless mode is not available.

**Solutions**:

1. **Install Playwright**
   ```bash
   npm install playwright
   npx playwright install chromium
   ```

2. **Use the library without headless mode**
   ```javascript
   await scraperNetflix(url, { headless: false });
   ```

3. **Check whether you really need headless mode**
   - Most titles work with static mode
   - Only use headless mode if static parsing fails

#### ❌ Error: `Playwright chromium browser is unavailable`

**Problem**: The Chromium browser is not installed.

**Solution**:

```bash
npx playwright install chromium
```

#### ❌ Error: Memory issues with Playwright

**Problem**: Browser automation is using too much memory.

**Solutions**:

1. **Limit concurrent requests**
   ```javascript
   const urls = ['url1', 'url2', 'url3'];

   // Process sequentially instead of in parallel
   for (const url of urls) {
     const result = await scraperNetflix(url);
     // Process result
   }
   ```

2. **Close browser resources properly**
   - The library handles this automatically
   - Make sure you are not calling Playwright directly

### 5. Environment & Compatibility Issues

#### ❌ Error: `File is not defined` (Node.js 18)

**Problem**: Node.js 18 does not provide the global `File` API that undici expects.

**Solutions**:

1. **Use the latest library version**
   ```bash
   npm update flixscaper
   ```

2. **Upgrade Node.js**
   ```bash
   # Upgrade to Node.js 20+ to avoid polyfill issues
   nvm install 20
   nvm use 20
   ```

3. **Manual polyfill (if needed)**
   ```javascript
   import './src/polyfill.js'; // Include before the library import
   import { scraperNetflix } from './src/index.js';
   ```

#### ❌ Problem: Works on one machine but not another

**Diagnosis Steps**:

1. **Check Node.js and npm versions**
   ```bash
   node --version  # Should be 18+
   npm --version   # Should be 8+
   ```

2. **Check Netflix accessibility**
   ```bash
   curl -I "https://www.netflix.com/title/80189685"
   ```

3. **Compare User-Agent strings**
   ```javascript
   // In a browser console:
   console.log(navigator.userAgent);

   // In Node.js, a global navigator only exists on version 21+;
   // on older versions this prints undefined. process.userAgent is not a Node.js API.
   console.log(globalThis.navigator?.userAgent);
   ```

## 🔍 Debugging Techniques

### 1. Enable Verbose Logging

```javascript
// Add debug logging to your code
async function debugScraping(url) {
  console.log('🚀 Starting scrape for:', url);

  try {
    const result = await scraperNetflix(url, {
      headless: false, // Try without browser first
      timeoutMs: 30000
    });

    console.log('✅ Success:', result);
    return result;
  } catch (error) {
    console.error('❌ Error details:', {
      message: error.message,
      stack: error.stack,
      url: url
    });
    throw error;
  }
}
```

### 2. Test with Known Working URLs

```javascript
// Test with URLs that should definitely work
const testUrls = [
  'https://www.netflix.com/title/80189685', // The Witcher
  'https://www.netflix.com/title/82123114'  // ONE SHOT
];

for (const url of testUrls) {
  try {
    const result = await scraperNetflix(url);
    console.log(`✅ ${url}: ${result.name}`);
  } catch (error) {
    console.error(`❌ ${url}: ${error.message}`);
  }
}
```

### 3. Isolate the Problem

```javascript
// Test each component separately
import { normalizeNetflixUrl } from 'flixscaper/index';
import { parseNetflixHtml } from 'flixscaper/parser';
// fetchStaticHtml must also be imported; its module path is not given in this guide

async function isolateProblem(url) {
  try {
    // 1. Test URL normalization
    const normalized = normalizeNetflixUrl(url);
    console.log('✅ URL normalized:', normalized);

    // 2. Test HTML fetching
    const html = await fetchStaticHtml(normalized);
    console.log('✅ HTML fetched, length:', html.length);

    // 3. Test parsing
    const parsed = parseNetflixHtml(html);
    console.log('✅ Parsed:', parsed);
  } catch (error) {
    console.error('❌ Step failed:', error.message);
  }
}
```

### 4. Browser Mode Debugging

```javascript
// Test with visible browser for debugging
const result = await scraperNetflix(url, {
  headless: false,  // Show browser window
  timeoutMs: 60000  // Longer timeout for manual inspection
});

// Keep browser open by adding delay if needed
await new Promise(resolve => setTimeout(resolve, 5000));
```

## 🌍 Regional & Language Issues

### Turkish Netflix Specific Issues

#### ❌ Problem: Turkish URLs not working

**Test different URL formats**:

```javascript
const turkishUrls = [
  'https://www.netflix.com/title/80189685',            // Standard
  'https://www.netflix.com/tr/title/80189685',         // Turkish locale path
  'https://www.netflix.com/tr/title/80189685?s=i',     // With Turkish params
  'https://www.netflix.com/tr/title/80189685?vlang=tr' // Turkish language
];

for (const url of turkishUrls) {
  try {
    const result = await scraperNetflix(url);
    console.log(`✅ ${url}: ${result.name}`);
  } catch (error) {
    console.error(`❌ ${url}: ${error.message}`);
  }
}
```

#### ❌ Problem: New Turkish UI patterns not recognized

**Report the issue with**:

1. **Original title**: What Netflix returned
2. **Expected title**: What it should be after cleaning
3. **URL**: The Netflix URL where this occurs
4. **Region**: Your geographic location

Example issue report:

```markdown
**URL**: https://www.netflix.com/tr/title/12345678
**Original**: "Dizi Adı yeni başlık | Netflix"
**Expected**: "Dizi Adı"
**Pattern to add**: "yeni başlık"
**Region**: Turkey
```

## 📊 Performance Issues

### Slow Response Times

#### Diagnose the bottleneck:

```javascript
import { performance } from 'node:perf_hooks';
import { normalizeNetflixUrl } from 'flixscaper/index';
import { parseNetflixHtml } from 'flixscaper/parser';
// fetchStaticHtml must also be imported; its module path is not given in this guide

async function profileScraping(url) {
  const steps = {};

  // URL Normalization
  steps.normStart = performance.now();
  const normalized = normalizeNetflixUrl(url);
  steps.normEnd = performance.now();

  // HTML Fetch
  steps.fetchStart = performance.now();
  const html = await fetchStaticHtml(normalized);
  steps.fetchEnd = performance.now();

  // Parsing
  steps.parseStart = performance.now();
  const parsed = parseNetflixHtml(html);
  steps.parseEnd = performance.now();

  // All durations are in milliseconds
  console.log('Performance breakdown:', {
    normalization: steps.normEnd - steps.normStart,
    fetch: steps.fetchEnd - steps.fetchStart,
    parsing: steps.parseEnd - steps.parseStart,
    htmlSize: html.length
  });

  return parsed;
}
```

#### Optimization Solutions:

1. **Disable headless mode** (if not needed)
   ```javascript
   await scraperNetflix(url, { headless: false });
   ```

2. **Reduce the timeout** (if the network is fast)
   ```javascript
   await scraperNetflix(url, { timeoutMs: 5000 });
   ```

3. **Cache results** (for repeated requests)
   ```javascript
   const cache = new Map();

   async function scrapeWithCache(url) {
     if (cache.has(url)) {
       return cache.get(url);
     }

     const result = await scraperNetflix(url);
     cache.set(url, result);
     return result;
   }
   ```

## 🔧 Common Fixes

### Quick Fix Checklist

1. **Update dependencies**
   ```bash
   npm update flixscaper
   npm update
   ```

2. **Clear npm cache**
   ```bash
   npm cache clean --force
   rm -rf node_modules package-lock.json
   npm install
   ```

3. **Check Node.js version**
   ```bash
   node --version  # Should be 18+
   # If older, upgrade:
   nvm install 20 && nvm use 20
   ```

4. **Test with a minimal example**
   ```javascript
   import { scraperNetflix } from 'flixscaper';

   scraperNetflix('https://www.netflix.com/title/80189685')
     .then(result => console.log('Success:', result))
     .catch(error => console.error('Error:', error.message));
   ```

5. **Try different options**
   ```javascript
   // If failing, try with different configurations
   const url = 'https://www.netflix.com/title/80189685';
   const configs = [
     { headless: false },
     { headless: true, timeoutMs: 30000 },
     { headless: false, userAgent: 'different-ua' }
   ];

   for (const config of configs) {
     try {
       await scraperNetflix(url, config);
       console.log('✅ Working config:', config);
       break;
     } catch (error) {
       console.log('❌ Failed config:', config, error.message);
     }
   }
   ```

## 📞 Getting Help

### When to Report an Issue

Report an issue when:

1. **A previously working URL suddenly fails**
2. **Error messages are unclear or unhelpful**
3. **Turkish UI patterns are not being removed**
4. **Performance degrades significantly**
5. **Documentation is unclear or incomplete**

### Issue Report Template

````markdown
## Issue Description
Brief description of the problem

## Steps to Reproduce
1. URL used: ...
2. Code executed: ...
3. Expected result: ...
4. Actual result: ...

## Environment
- Node.js version: ...
- OS: ...
- flixscaper version: ...
- Browser (if relevant): ...

## Error Message
```
Paste full error message here
```

## Additional Context
Any additional information that might help
````

### Debug Information to Include

```javascript
// Include this information in issue reports
import { createRequire } from 'node:module';

// ES modules have no built-in require, so create one to read package.json
const require = createRequire(import.meta.url);

const debugInfo = {
  nodeVersion: process.version,
  platform: process.platform,
  arch: process.arch,
  flixscaperVersion: require('flixscaper/package.json').version,
  timestamp: new Date().toISOString()
};

console.log('Debug Info:', JSON.stringify(debugInfo, null, 2));
```

---

*Troubleshooting guide last updated: 2025-11-23*
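
The error messages listed in this guide can also be matched in code, so a failing scrape points straight at the relevant section. The following is a minimal sketch, assuming the errors thrown by `scraperNetflix` carry the quoted strings in `error.message` (the examples above suggest this, but it is not confirmed). `KNOWN_ERRORS` and `suggestFix` are hypothetical names for illustration and are not part of the library.

```javascript
// Hypothetical triage helper: maps error messages quoted in this guide
// to the section that covers them. Not part of the flixscaper API.
const KNOWN_ERRORS = [
  { pattern: /Cannot resolve import/i, section: '1. Module Import Errors',
    hint: 'Reinstall the package or fix the import path.' },
  { pattern: /globals-polyfill\.mjs/i, section: '1. Module Import Errors',
    hint: 'Update the library and check that Node.js is 18+.' },
  { pattern: /timed out/i, section: '2. Network & Connection Issues',
    hint: 'Increase timeoutMs and check connectivity.' },
  { pattern: /not found \(404\)/i, section: '2. Network & Connection Issues',
    hint: 'Verify the URL and regional availability.' },
  { pattern: /parse edilemedi/i, section: '3. Parsing & Data Issues',
    hint: 'Retry with headless: true.' },
  { pattern: /Playwright/i, section: '4. Playwright/Browser Issues',
    hint: 'Install Playwright and its Chromium build.' },
  { pattern: /File is not defined/i, section: '5. Environment & Compatibility Issues',
    hint: 'Update the library or upgrade to Node.js 20+.' }
];

// Returns the first matching entry, or a generic fallback when nothing matches.
function suggestFix(error) {
  const message = error instanceof Error ? error.message : String(error);
  const match = KNOWN_ERRORS.find(({ pattern }) => pattern.test(message));
  return match
    ? { section: match.section, hint: match.hint }
    : { section: null, hint: 'Unrecognized error; collect debug info and file an issue.' };
}

// Example: look up the timeout error from section 2
console.log(suggestFix(new Error('Request timed out while reaching Netflix')));
```

A helper like this would typically be called from the `catch` block around `scraperNetflix`, logging the suggested section next to the raw error. The patterns are checked in order, so more specific messages should be listed before broader ones.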