wisecolt/metascraper

sbilketay 46d75b64d5 first commit

2025-11-23 14:25:09 +03:00

MetaScraper Troubleshooting Guide

🚨 Common Issues & Solutions

1. Module Import Errors

❌ Error: `Cannot resolve import 'flixscaper'`

Problem: Cannot import the library in your project

import { scraperNetflix } from 'flixscaper';
// Throws: Cannot resolve import 'flixscaper'

Causes & Solutions:

Not installed properly

npm install flixscaper
# or
yarn add flixscaper

Using local development without proper path

// Instead of this:
import { scraperNetflix } from 'flixscaper';

// Use this for local development:
import { scraperNetflix } from './src/index.js';

TypeScript configuration issue

// tsconfig.json
{
  "compilerOptions": {
    "moduleResolution": "node",
    "allowSyntheticDefaultImports": true
  }
}

❌ Error: `Failed to load url ../globals-polyfill.mjs`

Problem: Polyfill file missing after Node.js upgrade

Solution: The library has been updated to use a minimal polyfill. Ensure you're using the latest version:

npm update flixscaper

If still occurring, check your Node.js version:

node --version  # Should be 18+

2. Network & Connection Issues

❌ Error: `Request timed out while reaching Netflix`

Problem: Network requests are timing out

Solutions:

Increase timeout

await scraperNetflix(url, {
  timeoutMs: 30000  // 30 seconds instead of 15
});

Check internet connection

# Test connectivity to Netflix
curl -I https://www.netflix.com

Use different User-Agent

await scraperNetflix(url, {
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
});
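
Transient timeouts often succeed on a second attempt. The following is a minimal sketch of a retry wrapper with exponential backoff. `withRetry` and its options are hypothetical names for illustration and are not part of the library.

```javascript
// Hypothetical helper (not part of the library): retry an async operation,
// doubling the delay after each failed attempt.
async function withRetry(operation, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt === retries) break; // Out of attempts, give up
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}

// Usage (assumes scraperNetflix is imported as shown earlier):
// const result = await withRetry(() => scraperNetflix(url, { timeoutMs: 30000 }));
```

The wrapper retries on any error. If you only want to retry timeouts, inspect `error.message` inside the `catch` and rethrow other errors immediately.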

❌ Error: `Netflix title not found (404)`

Problem: Title ID doesn't exist or is not available

Solutions:

Verify URL is correct

// Test with known working URL
await scraperNetflix('https://www.netflix.com/title/80189685');

Check title availability in your region

// Some titles are region-locked
console.log('Title may not be available in your region');

Use browser to verify
- Open the URL in your browser
- If it shows 404 in browser, it's not a library issue
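
Before blaming the network or the library, it can help to confirm that the URL has the expected shape. The sketch below extracts the numeric title ID from a URL. `extractTitleId` is a hypothetical helper, and the accepted URL shapes (an optional locale prefix such as `/tr/`) are assumptions based on the URLs shown in this guide.

```javascript
// Hypothetical helper: pull the numeric title ID out of a Netflix URL.
// Returns null when the URL is not a recognisable title URL.
function extractTitleId(input) {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    return null; // Not a valid absolute URL
  }
  if (!/(^|\.)netflix\.com$/.test(parsed.hostname)) return null;
  // Accept /title/<id> with an optional locale prefix such as /tr/ or /en-gb/
  const match = parsed.pathname.match(/^\/(?:[a-z]{2}(?:-[a-z]{2})?\/)?title\/(\d+)\/?$/i);
  return match ? match[1] : null;
}
```

A `null` result suggests the problem is the URL itself rather than title availability.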

3. Parsing & Data Issues

❌ Error: `Netflix sayfa meta verisi parse edilemedi`

Problem: The library cannot extract metadata from the Netflix page. The error message is Turkish for "Netflix page metadata could not be parsed".

Causes & Solutions:

Netflix changed their HTML structure

// Enable headless mode to get JavaScript-rendered content
await scraperNetflix(url, { headless: true });

Title has unusual formatting

// Debug by examining the HTML
// (fetchStaticHtml must be imported from the library; this guide does not show its module path)
const html = await fetchStaticHtml(url);
console.log(html.slice(0, 1000)); // First 1000 chars

Missing JSON-LD data
- Netflix may have removed structured data
- Use headless mode as fallback
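
To check whether structured data is present at all, you can scan the fetched HTML for JSON-LD blocks. This is a rough diagnostic sketch, not the library's parser. `extractJsonLd` is a hypothetical name, and the regex is a shortcut that does not handle every valid HTML form.

```javascript
// Hypothetical diagnostic: collect JSON-LD blocks from raw HTML.
// A regex is enough for a quick check; it is not a full HTML parser.
function extractJsonLd(html) {
  const pattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks = [];
  for (const match of html.matchAll(pattern)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // Skip blocks that are not valid JSON
    }
  }
  return blocks;
}
```

An empty array for a page that loads fine in the browser suggests the structured data is missing from the static HTML, which is the case where headless mode is worth trying.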

❌ Problem: Turkish UI text not being removed

Problem: Titles still contain Turkish UI text such as "izlemenizi bekliyor" (roughly "waiting for you to watch")

Solutions:

Check if pattern is covered

import { cleanTitle } from 'flixscaper/parser';

const testTitle = "The Witcher izlemenizi bekliyor";
const cleaned = cleanTitle(testTitle);
console.log('Cleaned:', cleaned);

Add new pattern if needed

// If Netflix added new UI text, file an issue with:
// 1. The problematic title
// 2. The expected cleaned title
// 3. The new UI pattern that needs to be added
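
While waiting for a fix, you can strip leftover UI text yourself. The sketch below illustrates the general idea of suffix-pattern cleaning. `stripUiText` and `UI_PATTERNS` are hypothetical, and the library's real pattern list and `cleanTitle` behaviour may differ.

```javascript
// Hypothetical illustration only; the library's real pattern list may differ.
const UI_PATTERNS = [
  /\s*\|\s*Netflix\s*$/i,          // Trailing "| Netflix" suffix
  /\s*izlemenizi bekliyor\s*$/i    // Turkish UI text quoted in this guide
];

// Remove each known suffix in order, then any caller-supplied extras.
function stripUiText(title, extraPatterns = []) {
  let cleaned = title;
  for (const pattern of [...UI_PATTERNS, ...extraPatterns]) {
    cleaned = cleaned.replace(pattern, '');
  }
  return cleaned.trim();
}

// Usage: pass a newly observed pattern until it is supported upstream
// stripUiText(result.name, [/\s*yeni başlık\s*$/i]);
```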

4. Playwright/Browser Issues

❌ Error: `Playwright is not installed`

Problem: Headless mode not available

Solutions:

Install Playwright

npm install playwright
npx playwright install chromium

Use library without headless mode

await scraperNetflix(url, { headless: false });

Check if you really need headless mode
- Most titles work with static mode
- Only use headless if static parsing fails

❌ Error: `Playwright chromium browser is unavailable`

Problem: Chromium browser not installed

Solution:

npx playwright install chromium

❌ Error: Memory issues with Playwright

Problem: Browser automation using too much memory

Solutions:

Limit concurrent requests

const urls = ['url1', 'url2', 'url3'];

// Process sequentially instead of parallel
for (const url of urls) {
  const result = await scraperNetflix(url);
  // Process result
}

Close browser resources properly
- The library handles this automatically
- Ensure you're not calling Playwright directly
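
If fully sequential processing is too slow, a small concurrency cap is a middle ground. The following is a minimal sketch. `mapWithLimit` is a hypothetical helper, not a library function.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let nextIndex = 0;

  // Each lane repeatedly claims the next unprocessed item until none remain.
  async function runLane() {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, runLane));
  return results;
}

// Usage (assumes scraperNetflix is imported as shown earlier):
// const results = await mapWithLimit(urls, 2, url => scraperNetflix(url, { headless: true }));
```

Results keep the input order. The returned promise rejects on the first failure, so wrap `worker` in a `try`/`catch` if you want to collect errors instead.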

5. Environment & Compatibility Issues

❌ Error: `File is not defined` (Node.js 18)

Problem: Node.js 18 missing File API for undici

Solutions:

Use latest library version
```
npm update flixscaper
```

Upgrade Node.js

# Upgrade to Node.js 20+ to avoid polyfill issues
nvm install 20
nvm use 20

Manual polyfill (if needed)

import './src/polyfill.js';  // Include before library import
import { scraperNetflix } from './src/index.js';

❌ Problem: Works on one machine but not another

Diagnosis Steps:

Check Node.js versions

node --version  # Should be 18+
npm --version   # Should be 8+

Check Netflix accessibility

curl -I "https://www.netflix.com/title/80189685"

Compare User-Agent strings

console.log(navigator.userAgent);              // Browser
console.log(globalThis.navigator?.userAgent);  // Node.js 21+ (undefined on older versions)
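
The version check can also run at startup so that a mismatched machine fails loudly. This is a small sketch. `meetsMinimumNode` is a hypothetical helper, and the default of 18 comes from the requirement stated in this guide.

```javascript
// Hypothetical preflight check based on the version requirement stated above.
function meetsMinimumNode(version, minimumMajor = 18) {
  // Accept "v20.11.0" (process.version) or "20.11.0" (process.versions.node)
  const major = Number.parseInt(String(version).replace(/^v/, ''), 10);
  return Number.isInteger(major) && major >= minimumMajor;
}

if (!meetsMinimumNode(process.version)) {
  console.warn(`Node.js ${process.version} is older than the 18+ this guide recommends.`);
}
```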

🔍 Debugging Techniques

1. Enable Verbose Logging

// Add debug logging to your code
async function debugScraping(url) {
  console.log('🚀 Starting scrape for:', url);

  try {
    const result = await scraperNetflix(url, {
      headless: false,  // Try without browser first
      timeoutMs: 30000
    });

    console.log('✅ Success:', result);
    return result;
  } catch (error) {
    console.error('❌ Error details:', {
      message: error.message,
      stack: error.stack,
      url: url
    });
    throw error;
  }
}
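
Once an error is logged, its message can be matched against the errors covered in this guide. The lookup below is a hypothetical sketch built from the error strings quoted here. The library may emit other messages that it does not cover.

```javascript
// Hypothetical lookup built from the error strings quoted in this guide.
const KNOWN_ERRORS = [
  { pattern: /Cannot resolve import/i, section: 'Module Import Errors' },
  { pattern: /globals-polyfill/i, section: 'Module Import Errors' },
  { pattern: /timed out/i, section: 'Network & Connection Issues' },
  { pattern: /not found \(404\)/i, section: 'Network & Connection Issues' },
  { pattern: /parse edilemedi/i, section: 'Parsing & Data Issues' },
  { pattern: /Playwright/i, section: 'Playwright/Browser Issues' },
  { pattern: /File is not defined/i, section: 'Environment & Compatibility Issues' }
];

// Return the guide section that covers the given error, if any.
function classifyError(error) {
  const message = error instanceof Error ? error.message : String(error);
  const hit = KNOWN_ERRORS.find(entry => entry.pattern.test(message));
  return hit ? hit.section : 'Unknown (consider reporting an issue)';
}
```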

2. Test with Known Working URLs

// Test with URLs that should definitely work
const testUrls = [
  'https://www.netflix.com/title/80189685',  // The Witcher
  'https://www.netflix.com/title/82123114'   // ONE SHOT
];

for (const url of testUrls) {
  try {
    const result = await scraperNetflix(url);
    console.log(`✅ ${url}: ${result.name}`);
  } catch (error) {
    console.error(`❌ ${url}: ${error.message}`);
  }
}

3. Isolate the Problem

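// Test each component separately
// Note: fetchStaticHtml (used below) must also be imported; this guide does not show its module path
import { normalizeNetflixUrl } from 'flixscaper/index';
import { parseNetflixHtml } from 'flixscaper/parser';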

async function isolateProblem(url) {
  try {
    // 1. Test URL normalization
    const normalized = normalizeNetflixUrl(url);
    console.log('✅ URL normalized:', normalized);

    // 2. Test HTML fetching
    const html = await fetchStaticHtml(normalized);
    console.log('✅ HTML fetched, length:', html.length);

    // 3. Test parsing
    const parsed = parseNetflixHtml(html);
    console.log('✅ Parsed:', parsed);

  } catch (error) {
    console.error('❌ Step failed:', error.message);
  }
}
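
A single `catch` around all three steps does not say which one failed. One way to make that explicit is a small step runner, sketched below. `runSteps` is a hypothetical helper, and the usage comment assumes the three library functions from the example above are in scope.

```javascript
// Hypothetical runner: execute named steps in order, feeding each result
// to the next step, and report which step failed.
async function runSteps(initialValue, steps) {
  let value = initialValue;
  for (const [name, step] of steps) {
    try {
      value = await step(value);
      console.log(`✅ ${name}`);
    } catch (error) {
      console.error(`❌ ${name}: ${error.message}`);
      return { ok: false, failedStep: name, error };
    }
  }
  return { ok: true, value };
}

// Usage:
// await runSteps(url, [
//   ['normalize', normalizeNetflixUrl],
//   ['fetch', fetchStaticHtml],
//   ['parse', parseNetflixHtml]
// ]);
```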

4. Browser Mode Debugging

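// Retry with a longer timeout while debugging
// Note: elsewhere in this guide `headless: false` selects static (no-browser) mode,
// so confirm in your installed version whether this setting opens a visible window
const result = await scraperNetflix(url, {
  headless: false,
  timeoutMs: 60000     // Longer timeout for manual inspection
});

// Add a delay if you need time to inspect the output before the process exits
await new Promise(resolve => setTimeout(resolve, 5000));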

🌍 Regional & Language Issues

Turkish Netflix Specific Issues

❌ Problem: Turkish URLs not working

Test different URL formats:

const turkishUrls = [
  'https://www.netflix.com/title/80189685',            // Standard
  'https://www.netflix.com/tr/title/80189685',         // Turkish locale path (/tr/)
  'https://www.netflix.com/tr/title/80189685?s=i',     // With an extra query parameter
  'https://www.netflix.com/tr/title/80189685?vlang=tr' // Turkish language parameter
];

for (const url of turkishUrls) {
  try {
    const result = await scraperNetflix(url);
    console.log(`✅ ${url}: ${result.name}`);
  } catch (error) {
    console.error(`❌ ${url}: ${error.message}`);
  }
}

❌ Problem: New Turkish UI patterns not recognized

Report the issue with:

Original title: What Netflix returned
Expected title: What it should be after cleaning
URL: The Netflix URL where this occurs
Region: Your geographic location

Example issue report:

**URL**: https://www.netflix.com/tr/title/12345678
**Original**: "Dizi Adı yeni başlık | Netflix"
**Expected**: "Dizi Adı"
**Pattern to add**: "yeni başlık"
**Region**: Turkey

📊 Performance Issues

Slow Response Times

Diagnose the bottleneck:

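import { performance } from 'node:perf_hooks';
import { normalizeNetflixUrl } from 'flixscaper/index';
import { parseNetflixHtml } from 'flixscaper/parser';
// fetchStaticHtml must also be imported; this guide does not show its module path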

async function profileScraping(url) {
  const steps = {};

  // URL Normalization
  steps.normStart = performance.now();
  const normalized = normalizeNetflixUrl(url);
  steps.normEnd = performance.now();

  // HTML Fetch
  steps.fetchStart = performance.now();
  const html = await fetchStaticHtml(normalized);
  steps.fetchEnd = performance.now();

  // Parsing
  steps.parseStart = performance.now();
  const parsed = parseNetflixHtml(html);
  steps.parseEnd = performance.now();

  console.log('Performance breakdown:', {
    normalization: steps.normEnd - steps.normStart,
    fetch: steps.fetchEnd - steps.fetchStart,
    parsing: steps.parseEnd - steps.parseStart,
    htmlSize: html.length
  });

  return parsed;
}
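
The start/end bookkeeping above can be factored into a small timing wrapper. This is a sketch with a hypothetical `timed` helper. It relies on the global `performance` object, which is available in current Node.js releases and in browsers.

```javascript
// Hypothetical helper: measure how long a sync or async function takes.
async function timed(label, fn) {
  const start = performance.now();
  const value = await fn();
  const durationMs = performance.now() - start;
  return { label, value, durationMs };
}

// Usage (assumes the library functions from the example above are in scope):
// const fetchStep = await timed('fetch', () => fetchStaticHtml(normalized));
// console.log(fetchStep.label, fetchStep.durationMs.toFixed(1), 'ms');
```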

Optimization Solutions:

Disable headless mode (if not needed)

await scraperNetflix(url, { headless: false });

Reduce timeout to fail fast (if network is fast; a lower timeout does not speed up successful requests)

await scraperNetflix(url, { timeoutMs: 5000 });

Cache results (for repeated requests)

const cache = new Map();

async function scrapeWithCache(url) {
  if (cache.has(url)) {
    return cache.get(url);
  }

  const result = await scraperNetflix(url);
  cache.set(url, result);
  return result;
}
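
The simple `Map` cache never expires entries and does not deduplicate two simultaneous requests for the same URL. The sketch below addresses both. `createCachedScraper`, `ttlMs`, and `now` are hypothetical names, and the scraper function is passed in so that the wrapper does not depend on the library.

```javascript
// Hypothetical wrapper: cache by URL, expire after ttlMs, and share in-flight requests.
function createCachedScraper(scrape, { ttlMs = 60 * 60 * 1000, now = Date.now } = {}) {
  const cache = new Map(); // url -> { promise, expiresAt }

  return function cachedScrape(url) {
    const entry = cache.get(url);
    if (entry && entry.expiresAt > now()) return entry.promise;

    const promise = Promise.resolve()
      .then(() => scrape(url))
      .catch(error => {
        // Do not keep failures in the cache, so the next call retries
        if (cache.get(url)?.promise === promise) cache.delete(url);
        throw error;
      });
    cache.set(url, { promise, expiresAt: now() + ttlMs });
    return promise;
  };
}

// Usage (assumes scraperNetflix is imported as shown earlier):
// const scrapeCached = createCachedScraper(scraperNetflix, { ttlMs: 10 * 60 * 1000 });
// const result = await scrapeCached(url);
```

Storing the promise rather than the resolved value is what lets concurrent callers share one request.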

🔧 Common Fixes

Quick Fix Checklist

Update dependencies
```
npm update flixscaper
npm update
```

Clear npm cache

npm cache clean --force
rm -rf node_modules package-lock.json
npm install

Check Node.js version

node --version  # Should be 18+
# If older, upgrade: nvm install 20 && nvm use 20

Test with minimal example

import { scraperNetflix } from 'flixscaper';

scraperNetflix('https://www.netflix.com/title/80189685')
  .then(result => console.log('Success:', result))
  .catch(error => console.error('Error:', error.message));

Try different options

// If failing, try with different configurations
const configs = [
  { headless: false },
  { headless: true, timeoutMs: 30000 },
  { headless: false, userAgent: 'different-ua' }
];

for (const config of configs) {
  try {
    const result = await scraperNetflix(url, config);
    console.log('✅ Working config:', config);
    break;
  } catch (error) {
    console.log('❌ Failed config:', config, error.message);
  }
}

📞 Getting Help

When to Report an Issue

Report an issue when:

Previously working URL suddenly fails
Error messages are unclear or unhelpful
Turkish UI patterns not being removed
Performance degrades significantly
Documentation is unclear or incomplete

Issue Report Template

## Issue Description
Brief description of the problem

## Steps to Reproduce
1. URL used: ...
2. Code executed: ...
3. Expected result: ...
4. Actual result: ...

## Environment
- Node.js version: ...
- OS: ...
- flixscaper version: ...
- Browser (if relevant): ...

## Error Message

Paste full error message here


## Additional Context
Any additional information that might help

Debug Information to Include

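// Include this information in issue reports
import { createRequire } from 'node:module';

// `require` is not defined in ES modules, so create one to read package.json
const require = createRequire(import.meta.url);

const debugInfo = {
  nodeVersion: process.version,
  platform: process.platform,
  arch: process.arch,
  flixscaperVersion: require('flixscaper/package.json').version,
  timestamp: new Date().toISOString()
};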

console.log('Debug Info:', JSON.stringify(debugInfo, null, 2));

Troubleshooting guide last updated: 2025-11-23
