metascraper/doc/TROUBLESHOOTING.md
2025-11-23 14:25:09 +03:00


MetaScraper Troubleshooting Guide

🚨 Common Issues & Solutions

1. Module Import Errors

Error: Cannot resolve import 'flixscaper'

Problem: Cannot import the library in your project

import { scraperNetflix } from 'flixscaper';
// Throws: Cannot resolve import 'flixscaper'

Causes & Solutions:

  1. Not installed properly

    npm install flixscaper
    # or
    yarn add flixscaper
    
  2. Using local development without proper path

    // Instead of this:
    import { scraperNetflix } from 'flixscaper';
    
    // Use this for local development:
    import { scraperNetflix } from './src/index.js';
    
  3. TypeScript configuration issue

    // tsconfig.json
    {
      "compilerOptions": {
        "moduleResolution": "node",
        "allowSyntheticDefaultImports": true
      }
    }
    

Error: Failed to load url ../globals-polyfill.mjs

Problem: Polyfill file missing after Node.js upgrade

Solution: The library has been updated to use a minimal polyfill. Ensure you're using the latest version:

npm update flixscaper

If still occurring, check your Node.js version:

node --version  # Should be 18+
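
A startup check along the following lines can surface an unsupported runtime with a clear message instead of a polyfill loading error. This is a minimal sketch: `parseMajor` and `assertNodeVersion` are hypothetical helper names and are not part of the library.

```javascript
// Hypothetical helpers (not part of flixscaper): check the running Node.js major version.
function parseMajor(version) {
  // Accepts "v18.19.0" or "18.19.0" and returns 18
  return Number.parseInt(String(version).replace(/^v/, '').split('.')[0], 10);
}

function assertNodeVersion(minMajor, version = process.version) {
  const major = parseMajor(version);
  if (Number.isNaN(major) || major < minMajor) {
    throw new Error(`Node.js ${minMajor}+ is required, found ${version}`);
  }
  return major;
}
```

Calling `assertNodeVersion(18)` at the top of your entry file would stop the process early on older runtimes.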

2. Network & Connection Issues

Error: Request timed out while reaching Netflix

Problem: Network requests are timing out

Solutions:

  1. Increase timeout

    await scraperNetflix(url, {
      timeoutMs: 30000  // 30 seconds instead of 15
    });
    
  2. Check internet connection

    # Test connectivity to Netflix
    curl -I https://www.netflix.com
    
  3. Use different User-Agent

    await scraperNetflix(url, {
      userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    });
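
If timeouts are intermittent rather than constant, retrying with a growing delay may help more than a single long timeout. The wrapper below is a sketch under that assumption; `withRetry` and its option names are hypothetical and not provided by the library.

```javascript
// Hypothetical helper: retry an async operation with exponential backoff.
// `operation` is any function returning a promise,
// e.g. () => scraperNetflix(url, { timeoutMs: 30000 }).
async function withRetry(operation, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt === retries) break;
      // With the defaults: wait 500 ms, 1000 ms, 2000 ms before the next attempt
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

The last error is rethrown unchanged, so existing error handling around the scrape call keeps working.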
    

Error: Netflix title not found (404)

Problem: Title ID doesn't exist or is not available

Solutions:

  1. Verify URL is correct

    // Test with known working URL
    await scraperNetflix('https://www.netflix.com/title/80189685');
    
  2. Check title availability in your region

    // Some titles are region-locked
    console.log('Title may not be available in your region');
    
  3. Use browser to verify

    • Open the URL in your browser
    • If it shows 404 in browser, it's not a library issue
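
Before blaming the title itself, it can be worth confirming that the URL has the expected shape. The following sketch extracts the numeric title ID; `extractTitleId` is a hypothetical helper for illustration and is not the library's own URL normalization.

```javascript
// Hypothetical helper: pull the numeric title ID out of a Netflix URL.
// Returns null when the URL is malformed, is not on netflix.com,
// or has no /title/<digits> segment.
function extractTitleId(input) {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    return null; // not a valid absolute URL
  }
  if (!/(^|\.)netflix\.com$/.test(parsed.hostname)) return null;
  const match = parsed.pathname.match(/\/title\/(\d+)(?:\/|$)/);
  return match ? match[1] : null;
}
```

A `null` result points to a copy-paste or formatting problem in the URL rather than a missing title.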

3. Parsing & Data Issues

Error: Netflix sayfa meta verisi parse edilemedi (Turkish for "Netflix page metadata could not be parsed")

Problem: Cannot extract metadata from Netflix page

Causes & Solutions:

  1. Netflix changed their HTML structure

    // Enable headless mode to get JavaScript-rendered content
    await scraperNetflix(url, { headless: true });
    
  2. Title has unusual formatting

    // Debug by examining the HTML
    const html = await fetchStaticHtml(url);
    console.log(html.slice(0, 1000)); // First 1000 chars
    
  3. Missing JSON-LD data

    • Netflix may have removed structured data
    • Use headless mode as fallback
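
To check whether structured data is still present in a fetched page, you could scan the HTML string for JSON-LD script blocks. This is a diagnostic sketch only: `extractJsonLd` is a hypothetical helper, it assumes `html` is a plain string, and its regex-based scan is not the library's parser.

```javascript
// Hypothetical helper: collect every JSON-LD block embedded in an HTML string.
function extractJsonLd(html) {
  const pattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks = [];
  for (const match of html.matchAll(pattern)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // Skip blocks that are not valid JSON
    }
  }
  return blocks;
}
```

An empty array suggests the static HTML no longer carries structured data, which is the case where the headless fallback is worth trying.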

Problem: Turkish UI text not being removed

Problem: Titles still contain Turkish UI text like "izlemenizi bekliyor"

Solutions:

  1. Check if pattern is covered

    import { cleanTitle } from 'flixscaper/parser';
    
    const testTitle = "The Witcher izlemenizi bekliyor";
    const cleaned = cleanTitle(testTitle);
    console.log('Cleaned:', cleaned);
    
  2. Add new pattern if needed

    // If Netflix added new UI text, file an issue with:
    // 1. The problematic title
    // 2. The expected cleaned title
    // 3. The new UI pattern that needs to be added
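
When preparing such a report, it can help to confirm that removing the suspected phrase really yields the title you expect. The function below is a hypothetical illustration for that purpose; it is not the implementation of `cleanTitle`, and the phrase list is supplied by you.

```javascript
// Hypothetical illustration only: this is NOT flixscaper's cleanTitle implementation.
// It strips a trailing " | Netflix" and any of the given UI phrases from the end of a title.
function stripUiSuffixes(title, uiPhrases) {
  let cleaned = title.replace(/\s*\|\s*Netflix\s*$/i, '').trim();
  for (const phrase of uiPhrases) {
    if (cleaned.endsWith(phrase)) {
      cleaned = cleaned.slice(0, cleaned.length - phrase.length).trim();
    }
  }
  return cleaned;
}
```

If the output matches your expected title, the phrase you passed in is a good candidate for the "new UI pattern" field of the issue.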
    

4. Playwright/Browser Issues

Error: Playwright is not installed

Problem: Headless mode not available

Solutions:

  1. Install Playwright

    npm install playwright
    npx playwright install chromium
    
  2. Use library without headless mode

    await scraperNetflix(url, { headless: false });
    
  3. Check if you really need headless mode

    • Most titles work with static mode
    • Only use headless if static parsing fails
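
The "static first, headless only on failure" advice can be expressed as a small wrapper. This is a sketch: `scrapeStaticFirst` is a hypothetical name, and the scraper function is passed in as an argument so the example does not depend on a particular import path.

```javascript
// Hypothetical wrapper: try static mode first, fall back to headless mode only on failure.
// `scrape` is passed in, e.g. scrapeStaticFirst(scraperNetflix, url).
async function scrapeStaticFirst(scrape, url, options = {}) {
  try {
    return await scrape(url, { ...options, headless: false });
  } catch (staticError) {
    console.warn('Static mode failed, retrying in headless mode:', staticError.message);
    return scrape(url, { ...options, headless: true });
  }
}
```

With this shape, Playwright only needs to be installed on machines where the static path actually fails.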

Error: Playwright chromium browser is unavailable

Problem: Chromium browser not installed

Solution:

npx playwright install chromium

Error: Memory issues with Playwright

Problem: Browser automation using too much memory

Solutions:

  1. Limit concurrent requests

    const urls = ['url1', 'url2', 'url3'];
    
    // Process sequentially instead of parallel
    for (const url of urls) {
      const result = await scraperNetflix(url);
      // Process result
    }
    
  2. Close browser resources properly

    • The library handles this automatically
    • Ensure you're not calling Playwright directly
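
Fully sequential processing can be slow for long lists. A middle ground is a small worker pool that caps how many scrapes run at once. The helper below is a sketch; `mapWithLimit` is a hypothetical name and a suitable limit depends on your machine's memory.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
// Results keep the order of `items`; a failed item is returned as { error } instead of rejecting.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let nextIndex = 0;

  async function runner() {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      try {
        results[index] = { value: await worker(items[index], index) };
      } catch (error) {
        results[index] = { error };
      }
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, runner));
  return results;
}
```

For example, `mapWithLimit(urls, 2, url => scraperNetflix(url, { headless: true }))` would keep at most two scrapes active at a time.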

5. Environment & Compatibility Issues

Error: File is not defined (Node.js 18)

Problem: Node.js 18 does not provide the global File class that undici expects

Solutions:

  1. Use latest library version

    npm update flixscaper
    
  2. Upgrade Node.js

    # Upgrade to Node.js 20+ to avoid polyfill issues
    nvm install 20
    nvm use 20
    
  3. Manual polyfill (if needed)

    import './src/polyfill.js';  // Include before library import
    import { scraperNetflix } from './src/index.js';
    

Problem: Works on one machine but not another

Diagnosis Steps:

  1. Check Node.js versions

    node --version  # Should be 18+
    npm --version   # Should be 8+
    
  2. Check Netflix accessibility

    curl -I "https://www.netflix.com/title/80189685"
    
  3. Compare User-Agent strings

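    console.log(globalThis.navigator?.userAgent);  // Browser, and newer Node.js releases that define navigator
    // Node.js has no process.userAgent; compare the userAgent option you pass to scraperNetflix instead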
    

🔍 Debugging Techniques

1. Enable Verbose Logging

// Add debug logging to your code
async function debugScraping(url) {
  console.log('🚀 Starting scrape for:', url);

  try {
    const result = await scraperNetflix(url, {
      headless: false,  // Try without browser first
      timeoutMs: 30000
    });

    console.log('✅ Success:', result);
    return result;
  } catch (error) {
    console.error('❌ Error details:', {
      message: error.message,
      stack: error.stack,
      url: url
    });
    throw error;
  }
}

2. Test with Known Working URLs

// Test with URLs that should definitely work
const testUrls = [
  'https://www.netflix.com/title/80189685',  // The Witcher
  'https://www.netflix.com/title/82123114'   // ONE SHOT
];

for (const url of testUrls) {
  try {
    const result = await scraperNetflix(url);
    console.log(`✅ ${url}: ${result.name}`);
  } catch (error) {
    console.error(`❌ ${url}: ${error.message}`);
  }
}

3. Isolate the Problem

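// Test each component separately
import { normalizeNetflixUrl } from 'flixscaper/index';
import { parseNetflixHtml } from 'flixscaper/parser';
// fetchStaticHtml is used below but its export path is not documented here;
// import it from the module that provides it in your installed version.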

async function isolateProblem(url) {
  try {
    // 1. Test URL normalization
    const normalized = normalizeNetflixUrl(url);
    console.log('✅ URL normalized:', normalized);

    // 2. Test HTML fetching
    const html = await fetchStaticHtml(normalized);
    console.log('✅ HTML fetched, length:', html.length);

    // 3. Test parsing
    const parsed = parseNetflixHtml(html);
    console.log('✅ Parsed:', parsed);

  } catch (error) {
    console.error('❌ Step failed:', error.message);
  }
}

4. Browser Mode Debugging

// Test with the Playwright browser path for debugging
const result = await scraperNetflix(url, {
  headless: true,      // Render the page in the browser (elsewhere in this guide, false selects static mode)
  timeoutMs: 60000     // Longer timeout for slow page loads
});

// Add a delay if you need the process to stay alive while inspecting output
await new Promise(resolve => setTimeout(resolve, 5000));

🌍 Regional & Language Issues

Turkish Netflix Specific Issues

Problem: Turkish URLs not working

Test different URL formats:

const turkishUrls = [
  'https://www.netflix.com/title/80189685',           // Standard
  'https://www.netflix.com/tr/title/80189685',       // Turkish locale path prefix
  'https://www.netflix.com/tr/title/80189685?s=i',   // With an extra query parameter
  'https://www.netflix.com/tr/title/80189685?vlang=tr' // Turkish language parameter
];

for (const url of turkishUrls) {
  try {
    const result = await scraperNetflix(url);
    console.log(`✅ ${url}: ${result.name}`);
  } catch (error) {
    console.error(`❌ ${url}: ${error.message}`);
  }
}

Problem: New Turkish UI patterns not recognized

Report the issue with:

  1. Original title: What Netflix returned
  2. Expected title: What it should be after cleaning
  3. URL: The Netflix URL where this occurs
  4. Region: Your geographic location

Example issue report:

**URL**: https://www.netflix.com/tr/title/12345678
**Original**: "Dizi Adı yeni başlık | Netflix"
**Expected**: "Dizi Adı"
**Pattern to add**: "yeni başlık"
**Region**: Turkey

📊 Performance Issues

Slow Response Times

Diagnose the bottleneck:

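import { performance } from 'node:perf_hooks';
import { normalizeNetflixUrl } from 'flixscaper/index';
import { parseNetflixHtml } from 'flixscaper/parser';
// fetchStaticHtml must also be in scope; import it from the module that exports it in your version.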

async function profileScraping(url) {
  const steps = {};

  // URL Normalization
  steps.normStart = performance.now();
  const normalized = normalizeNetflixUrl(url);
  steps.normEnd = performance.now();

  // HTML Fetch
  steps.fetchStart = performance.now();
  const html = await fetchStaticHtml(normalized);
  steps.fetchEnd = performance.now();

  // Parsing
  steps.parseStart = performance.now();
  const parsed = parseNetflixHtml(html);
  steps.parseEnd = performance.now();

  console.log('Performance breakdown:', {
    normalization: steps.normEnd - steps.normStart,
    fetch: steps.fetchEnd - steps.fetchStart,
    parsing: steps.parseEnd - steps.parseStart,
    htmlSize: html.length
  });

  return parsed;
}
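
A step that throws can still be timed if each step is wrapped individually, which may help when the slow step is also the failing one. The wrapper below is a sketch; `timed` is a hypothetical helper and relies on the global `performance` object available in Node.js 16 and later.

```javascript
// Hypothetical helper: time any sync or async step.
// Resolves to { label, result, ms, ok: true } on success
// or { label, error, ms, ok: false } on failure, so timing survives exceptions.
async function timed(label, step) {
  const start = performance.now();
  try {
    return { label, result: await step(), ms: performance.now() - start, ok: true };
  } catch (error) {
    return { label, error, ms: performance.now() - start, ok: false };
  }
}
```

For instance, `await timed('scrape', () => scraperNetflix(url))` returns the elapsed time whether or not the scrape succeeds.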

Optimization Solutions:

  1. Disable headless mode (if not needed)

    await scraperNetflix(url, { headless: false });
    
  2. Reduce timeout to fail fast on stalled requests (if network is normally fast)

    await scraperNetflix(url, { timeoutMs: 5000 });
    
  3. Cache results (for repeated requests)

    const cache = new Map();
    
    async function scrapeWithCache(url) {
      if (cache.has(url)) {
        return cache.get(url);
      }
    
      const result = await scraperNetflix(url);
      cache.set(url, result);
      return result;
    }
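
The simple `Map` cache never expires entries and does not share requests that are already in flight. A variant that handles both might look like the sketch below. `createCachedLoader` and its options are hypothetical names, and the default TTL is an arbitrary choice.

```javascript
// Hypothetical helper: wrap an async loader with a TTL cache that also shares in-flight requests.
// `now` is injectable so expiry can be tested without waiting.
function createCachedLoader(load, { ttlMs = 60000, now = Date.now } = {}) {
  const entries = new Map(); // key -> { promise, expiresAt }

  return function cached(key) {
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) return hit.promise;

    const promise = Promise.resolve()
      .then(() => load(key))
      .catch(error => {
        // Do not keep failures in the cache
        if (entries.get(key)?.promise === promise) entries.delete(key);
        throw error;
      });
    entries.set(key, { promise, expiresAt: now() + ttlMs });
    return promise;
  };
}
```

A possible use is `const scrapeCached = createCachedLoader(url => scraperNetflix(url), { ttlMs: 10 * 60 * 1000 });`, after which repeated calls to `scrapeCached(url)` within ten minutes reuse the first result.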
    

🔧 Common Fixes

Quick Fix Checklist

  1. Update dependencies

    npm update flixscaper
    npm update
    
  2. Clear npm cache and reinstall dependencies

    npm cache clean --force
    rm -rf node_modules package-lock.json
    npm install
    
  3. Check Node.js version

    node --version  # Should be 18+
    # If older, upgrade: nvm install 20 && nvm use 20
    
  4. Test with minimal example

    import { scraperNetflix } from 'flixscaper';
    
    scraperNetflix('https://www.netflix.com/title/80189685')
      .then(result => console.log('Success:', result))
      .catch(error => console.error('Error:', error.message));
    
  5. Try different options

    // If failing, try with different configurations
    const url = 'https://www.netflix.com/title/80189685';
    const configs = [
      { headless: false },
      { headless: true, timeoutMs: 30000 },
      { headless: false, userAgent: 'different-ua' }
    ];
    
    for (const config of configs) {
      try {
        const result = await scraperNetflix(url, config);
        console.log('✅ Working config:', config);
        break;
      } catch (error) {
        console.log('❌ Failed config:', config, error.message);
      }
    }
    

📞 Getting Help

When to Report an Issue

Report an issue when:

  1. Previously working URL suddenly fails
  2. Error messages are unclear or unhelpful
  3. Turkish UI patterns not being removed
  4. Performance degrades significantly
  5. Documentation is unclear or incomplete

Issue Report Template

## Issue Description
Brief description of the problem

## Steps to Reproduce
1. URL used: ...
2. Code executed: ...
3. Expected result: ...
4. Actual result: ...

## Environment
- Node.js version: ...
- OS: ...
- flixscaper version: ...
- Browser (if relevant): ...

## Error Message

Paste full error message here


## Additional Context
Any additional information that might help

Debug Information to Include

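// Include this information in issue reports
import { createRequire } from 'node:module';

// The other examples use ES modules, where require() has to be created explicitly
const require = createRequire(import.meta.url);

const debugInfo = {
  nodeVersion: process.version,
  platform: process.platform,
  arch: process.arch,
  // Assumes the package makes its package.json resolvable to consumers
  flixscaperVersion: require('flixscaper/package.json').version,
  timestamp: new Date().toISOString()
};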

console.log('Debug Info:', JSON.stringify(debugInfo, null, 2));

Troubleshooting guide last updated: 2025-11-23