
# MetaScraper Troubleshooting Guide

## 🚨 Common Issues & Solutions

### 1. Module Import Errors

#### ❌ Error: `Cannot resolve import 'flixscaper'`

**Problem**: The package cannot be resolved when you import it in your project

```javascript
import { scraperNetflix } from 'flixscaper';
// Throws: Cannot resolve import 'flixscaper'
```

**Causes & Solutions**:

1. **Not installed properly**
   ```bash
   npm install flixscaper
   # or
   yarn add flixscaper
   ```

2. **Using local development without proper path**
   ```javascript
   // Instead of this:
   import { scraperNetflix } from 'flixscaper';

   // Use this for local development:
   import { scraperNetflix } from './src/index.js';
   ```

3. **TypeScript configuration issue**
   ```json
   // tsconfig.json
   {
     "compilerOptions": {
       "moduleResolution": "node",
       "allowSyntheticDefaultImports": true
     }
   }
   ```

#### ❌ Error: `Failed to load url ../globals-polyfill.mjs`

**Problem**: Polyfill file missing after Node.js upgrade

**Solution**: The library has been updated to use a minimal polyfill. Ensure you're using the latest version:

```bash
npm update flixscaper
```

If the error persists, check your Node.js version:

```bash
node --version  # Should be 18+
```
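
The same version check can be done from inside a script. The sketch below is a minimal example; `meetsMinimumNode` is a hypothetical helper name, not part of the library, and the minimum of 18 is taken from the comment above.

```javascript
// Hypothetical helper: returns true when a Node.js version string meets a minimum major version.
function meetsMinimumNode(version, minMajor = 18) {
  // process.versions.node looks like "18.19.0"; process.version looks like "v18.19.0"
  const major = Number.parseInt(String(version).replace(/^v/, '').split('.')[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

if (!meetsMinimumNode(process.versions.node)) {
  console.warn(`Node.js ${process.versions.node} detected; this guide assumes 18 or newer.`);
}
```

Running this before the library import fails early with a readable warning instead of a polyfill error.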

### 2. Network & Connection Issues

#### ❌ Error: `Request timed out while reaching Netflix`

**Problem**: Network requests are timing out

**Solutions**:

1. **Increase timeout**
   ```javascript
   await scraperNetflix(url, {
     timeoutMs: 30000  // 30 seconds instead of 15
   });
   ```

2. **Check internet connection**
   ```bash
   # Test connectivity to Netflix
   curl -I https://www.netflix.com
   ```

3. **Use different User-Agent**
   ```javascript
   await scraperNetflix(url, {
     userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
   });
   ```
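
If timeouts are intermittent, a retry wrapper that raises the timeout on each attempt may help. This is a minimal sketch, not a library feature: `scrapeWithRetry` is a hypothetical name, the 15000 ms starting value mirrors the default implied above, and `scrape` stands for any function that accepts `(url, options)` the way `scraperNetflix` is used in this guide.

```javascript
// Hypothetical helper: retries an async scrape, doubling the timeout after each failure.
// `scrape` is any function with the (url, options) shape used in this guide.
async function scrapeWithRetry(scrape, url, { attempts = 3, timeoutMs = 15000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt += 1) {
    const currentTimeout = timeoutMs * 2 ** attempt; // 15s, 30s, 60s with the defaults
    try {
      return await scrape(url, { timeoutMs: currentTimeout });
    } catch (error) {
      lastError = error;
      console.warn(`Attempt ${attempt + 1} failed (${currentTimeout} ms): ${error.message}`);
    }
  }
  // Every attempt failed: surface the most recent error to the caller
  throw lastError;
}
```

A call would look like `await scrapeWithRetry(scraperNetflix, url)`. Passing the scraper in as an argument keeps the helper easy to test with a stub.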

#### ❌ Error: `Netflix title not found (404)`

**Problem**: Title ID doesn't exist or is not available

**Solutions**:

1. **Verify URL is correct**
   ```javascript
   // Test with known working URL
   await scraperNetflix('https://www.netflix.com/title/80189685');
   ```

2. **Check title availability in your region**
   ```javascript
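   // Some titles are region-locked, so a 404 may mean "not available in your region"
   await scraperNetflix(url).catch((error) => {
     console.log(error.message.includes('404') ? 'Title may not be available in your region' : error.message);
   });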
   ```

3. **Use browser to verify**
   - Open the URL in your browser
   - If it shows 404 in browser, it's not a library issue
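
Before blaming the library, it can help to confirm that the URL actually contains a title ID. The sketch below is an illustrative check only: `extractTitleId` is a hypothetical helper, not the library's `normalizeNetflixUrl`, and the assumption that IDs are purely numeric comes from the example URLs in this guide.

```javascript
// Hypothetical helper (not part of the library): pulls the numeric title ID out of a Netflix URL.
function extractTitleId(input) {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    return null; // Not a valid absolute URL
  }
  // Accept netflix.com and its subdomains such as www.netflix.com
  if (!/(^|\.)netflix\.com$/.test(parsed.hostname)) return null;
  // Matches /title/80189685 and locale-prefixed paths such as /tr/title/80189685
  const match = parsed.pathname.match(/\/title\/(\d+)/);
  return match ? match[1] : null;
}

console.log(extractTitleId('https://www.netflix.com/tr/title/80189685?vlang=tr')); // prints 80189685
```

A `null` result points to a malformed URL rather than a missing title.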

### 3. Parsing & Data Issues

#### ❌ Error: `Netflix sayfa meta verisi parse edilemedi`

**Problem**: The library cannot extract metadata from the Netflix page. The error message is Turkish for "Netflix page metadata could not be parsed".

**Causes & Solutions**:

1. **Netflix changed their HTML structure**
   ```javascript
   // Enable headless mode to get JavaScript-rendered content
   await scraperNetflix(url, { headless: true });
   ```

2. **Title has unusual formatting**
   ```javascript
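   // Debug by examining the HTML.
   // fetchStaticHtml is assumed to be the library's static fetch helper;
   // import it from the module that exports it in your installed version.
   const html = await fetchStaticHtml(url);
   console.log(html.slice(0, 1000)); // First 1000 chars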
   ```

3. **Missing JSON-LD data**
   - Netflix may have removed structured data
   - Use headless mode as fallback
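
To check whether structured data is present in a page you have already fetched, a rough diagnostic like the one below can be used. It is a sketch under assumptions: `findJsonLd` is a hypothetical helper, the regular expression is a quick diagnostic rather than a full HTML parser, and the sample markup is made up for illustration.

```javascript
// Hypothetical diagnostic: extracts and parses every JSON-LD block found in an HTML string.
function findJsonLd(html) {
  const pattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks = [];
  for (const match of html.matchAll(pattern)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // Skip blocks that are present but do not contain valid JSON
    }
  }
  return blocks;
}

// Made-up sample markup, used only to demonstrate the helper
const sample = '<html><head><script type="application/ld+json">{"@type":"TVSeries","name":"The Witcher"}</script></head></html>';
console.log(findJsonLd(sample).length > 0 ? 'JSON-LD present' : 'No JSON-LD found; consider headless mode');
```

An empty result on real page HTML suggests the static markup no longer carries structured data, which is the case where headless mode is worth trying.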

#### ❌ Problem: Turkish UI text not being removed

**Problem**: Titles still contain Turkish UI text such as "izlemenizi bekliyor" (roughly "is waiting for you to watch")

**Solutions**:

1. **Check if pattern is covered**
   ```javascript
   import { cleanTitle } from 'flixscaper/parser';

   const testTitle = "The Witcher izlemenizi bekliyor";
   const cleaned = cleanTitle(testTitle);
   console.log('Cleaned:', cleaned);
   ```

2. **Add new pattern if needed**
   ```javascript
   // If Netflix added new UI text, file an issue with:
   // 1. The problematic title
   // 2. The expected cleaned title
   // 3. The new UI pattern that needs to be added
   ```
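
Until a new pattern is added upstream, a local post-processing step can act as a stopgap. The sketch below is not the library's `cleanTitle` implementation: `stripUiText` and `UI_PATTERNS` are hypothetical names, and matching is exact and case-sensitive to sidestep Turkish casing rules.

```javascript
// Hypothetical workaround, separate from the library's cleanTitle: strips known UI phrases.
const UI_PATTERNS = ['izlemenizi bekliyor']; // add newly observed phrases here

function stripUiText(title, patterns = UI_PATTERNS) {
  let cleaned = title.replace(/\s*\|\s*Netflix\s*$/i, ''); // drop a trailing "| Netflix"
  for (const phrase of patterns) {
    cleaned = cleaned.split(phrase).join(' '); // exact, case-sensitive removal
  }
  return cleaned.replace(/\s+/g, ' ').trim(); // collapse leftover whitespace
}

console.log(stripUiText('The Witcher izlemenizi bekliyor | Netflix')); // prints The Witcher
```

Apply it to the `name` field of a result, and remove the workaround once the library covers the pattern.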

### 4. Playwright/Browser Issues

#### ❌ Error: `Playwright is not installed`

**Problem**: Headless mode not available

**Solutions**:

1. **Install Playwright**
   ```bash
   npm install playwright
   npx playwright install chromium
   ```

2. **Use library without headless mode**
   ```javascript
   await scraperNetflix(url, { headless: false });
   ```

3. **Check if you really need headless mode**
   - Most titles work with static mode
   - Only use headless if static parsing fails

#### ❌ Error: `Playwright chromium browser is unavailable`

**Problem**: Chromium browser not installed

**Solution**:
```bash
npx playwright install chromium
```
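
Before enabling headless mode, it can be useful to check whether the `playwright` package resolves at all. The sketch below uses a dynamic `import()`; `isModuleAvailable` is a hypothetical helper. It only confirms that the npm package loads, not that the Chromium binary has been downloaded.

```javascript
// Hypothetical check: resolves to true when a package can be imported from the current project.
async function isModuleAvailable(name) {
  try {
    await import(name);
    return true;
  } catch {
    return false;
  }
}

isModuleAvailable('playwright').then((available) => {
  console.log(available ? 'playwright resolves' : 'playwright is not installed; use { headless: false }');
});
```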

#### ❌ Error: Memory issues with Playwright

**Problem**: Browser automation using too much memory

**Solutions**:

1. **Limit concurrent requests**
   ```javascript
   const urls = ['url1', 'url2', 'url3'];

   // Process sequentially instead of in parallel
   for (const url of urls) {
     const result = await scraperNetflix(url);
     // Process result
   }
   ```

2. **Close browser resources properly**
   - The library handles this automatically
   - Ensure you're not calling Playwright directly
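
If fully sequential processing is too slow, a bounded level of concurrency is a middle ground. The sketch below is one possible approach and not a library feature: `mapWithLimit` is a hypothetical helper that keeps at most `limit` calls in flight.

```javascript
// Hypothetical helper: runs `worker` over `items` with at most `limit` calls in flight.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function runLane() {
    while (next < items.length) {
      const index = next; // claim an index before awaiting
      next += 1;
      results[index] = await worker(items[index], index);
    }
  }

  // Start up to `limit` lanes; each lane pulls the next unclaimed item until none remain
  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, runLane));
  return results; // same order as `items`
}
```

A call would look like `await mapWithLimit(urls, 2, (url) => scraperNetflix(url, { headless: true }))`. If any worker rejects, the returned promise rejects with that error, so wrap the worker in `try`/`catch` when partial results are acceptable.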

### 5. Environment & Compatibility Issues

#### ❌ Error: `File is not defined` (Node.js 18)

**Problem**: Node.js 18 does not provide the global `File` API that undici expects

**Solutions**:

1. **Use latest library version**
   ```bash
   npm update flixscaper
   ```

2. **Upgrade Node.js**
   ```bash
   # Upgrade to Node.js 20+ to avoid polyfill issues
   nvm install 20
   nvm use 20
   ```

3. **Manual polyfill (if needed)**
   ```javascript
   import './src/polyfill.js';  // Include before library import
   import { scraperNetflix } from './src/index.js';
   ```

#### ❌ Problem: Works on one machine but not another

**Diagnosis Steps**:

1. **Check Node.js versions**
   ```bash
   node --version  # Should be 18+
   npm --version   # Should be 8+
   ```

2. **Check Netflix accessibility**
   ```bash
   curl -I "https://www.netflix.com/title/80189685"
   ```

3. **Compare User-Agent strings**
   ```javascript
   // navigator.userAgent exists in browsers and in Node.js 21+.
   // On Node.js 18 and 20 there is no global navigator, so this logs undefined.
   console.log(globalThis.navigator?.userAgent);
   ```

## 🔍 Debugging Techniques

### 1. Enable Verbose Logging

```javascript
// Add debug logging to your code
async function debugScraping(url) {
  console.log('🚀 Starting scrape for:', url);

  try {
    const result = await scraperNetflix(url, {
      headless: false,  // Try without browser first
      timeoutMs: 30000
    });

    console.log('✅ Success:', result);
    return result;
  } catch (error) {
    console.error('❌ Error details:', {
      message: error.message,
      stack: error.stack,
      url: url
    });
    throw error;
  }
}
```

### 2. Test with Known Working URLs

```javascript
// Test with URLs that should definitely work
const testUrls = [
  'https://www.netflix.com/title/80189685',  // The Witcher
  'https://www.netflix.com/title/82123114'   // ONE SHOT
];

for (const url of testUrls) {
  try {
    const result = await scraperNetflix(url);
    console.log(`✅ ${url}: ${result.name}`);
  } catch (error) {
    console.error(`❌ ${url}: ${error.message}`);
  }
}
```

### 3. Isolate the Problem

```javascript
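// Test each component separately
import { normalizeNetflixUrl } from 'flixscaper/index';
import { parseNetflixHtml } from 'flixscaper/parser';
// fetchStaticHtml must also be imported; its module path is not documented here,
// so check which module exports it in your installed version.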

async function isolateProblem(url) {
  try {
    // 1. Test URL normalization
    const normalized = normalizeNetflixUrl(url);
    console.log('✅ URL normalized:', normalized);

    // 2. Test HTML fetching
    const html = await fetchStaticHtml(normalized);
    console.log('✅ HTML fetched, length:', html.length);

    // 3. Test parsing
    const parsed = parseNetflixHtml(html);
    console.log('✅ Parsed:', parsed);

  } catch (error) {
    console.error('❌ Step failed:', error.message);
  }
}
```

### 4. Browser Mode Debugging

```javascript
// Test with headless mode enabled for debugging.
// Elsewhere in this guide `headless: false` selects static mode (no browser),
// so enable headless mode to exercise the Playwright path.
const result = await scraperNetflix(url, {
  headless: true,      // Render the page with Playwright
  timeoutMs: 60000     // Longer timeout for slow page loads
});

console.log(result);
```

## 🌍 Regional & Language Issues

### Turkish Netflix Specific Issues

#### ❌ Problem: Turkish URLs not working

**Test different URL formats**:
```javascript
const turkishUrls = [
  'https://www.netflix.com/title/80189685',             // Standard
  'https://www.netflix.com/tr/title/80189685',          // Turkish locale path prefix
  'https://www.netflix.com/tr/title/80189685?s=i',      // With an extra query parameter
  'https://www.netflix.com/tr/title/80189685?vlang=tr'  // Turkish language parameter
];

for (const url of turkishUrls) {
  try {
    const result = await scraperNetflix(url);
    console.log(`✅ ${url}: ${result.name}`);
  } catch (error) {
    console.error(`❌ ${url}: ${error.message}`);
  }
}
```

#### ❌ Problem: New Turkish UI patterns not recognized

**Report the issue with**:
1. **Original title**: What Netflix returned
2. **Expected title**: What it should be after cleaning
3. **URL**: The Netflix URL where this occurs
4. **Region**: Your geographic location

Example issue report:
```markdown
**URL**: https://www.netflix.com/tr/title/12345678
**Original**: "Dizi Adı yeni başlık | Netflix"
**Expected**: "Dizi Adı"
**Pattern to add**: "yeni başlık"
**Region**: Turkey
```

## 📊 Performance Issues

### Slow Response Times

#### Diagnose the bottleneck:

```javascript
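import { performance } from 'node:perf_hooks';
import { normalizeNetflixUrl } from 'flixscaper/index';
import { parseNetflixHtml } from 'flixscaper/parser';
// fetchStaticHtml must also be imported; check which module exports it in your installed version.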

async function profileScraping(url) {
  const steps = {};

  // URL Normalization
  steps.normStart = performance.now();
  const normalized = normalizeNetflixUrl(url);
  steps.normEnd = performance.now();

  // HTML Fetch
  steps.fetchStart = performance.now();
  const html = await fetchStaticHtml(normalized);
  steps.fetchEnd = performance.now();

  // Parsing
  steps.parseStart = performance.now();
  const parsed = parseNetflixHtml(html);
  steps.parseEnd = performance.now();

  console.log('Performance breakdown:', {
    normalization: steps.normEnd - steps.normStart,
    fetch: steps.fetchEnd - steps.fetchStart,
    parsing: steps.parseEnd - steps.parseStart,
    htmlSize: html.length
  });

  return parsed;
}
```

#### Optimization Solutions:

1. **Disable headless mode** (if not needed)
   ```javascript
   await scraperNetflix(url, { headless: false });
   ```

2. **Reduce timeout** (if network is fast)
   ```javascript
   await scraperNetflix(url, { timeoutMs: 5000 });
   ```

3. **Cache results** (for repeated requests)
   ```javascript
   const cache = new Map();

   async function scrapeWithCache(url) {
     if (cache.has(url)) {
       return cache.get(url);
     }

     const result = await scraperNetflix(url);
     cache.set(url, result);
     return result;
   }
   ```
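
The simple `Map` cache above never expires entries and does not share in-flight requests. The sketch below shows one way to add both. It is an illustration under assumptions: `createCachedScraper`, `ttlMs`, and the injectable `now` clock are hypothetical names, and the cache key is the URL only, so different options for the same URL share an entry.

```javascript
// Hypothetical variant of the cache above: entries expire after `ttlMs`, and concurrent
// requests for the same URL share one in-flight promise.
function createCachedScraper(scrape, { ttlMs = 60_000, now = Date.now } = {}) {
  const cache = new Map(); // url -> { promise, expiresAt }

  return function cachedScrape(url, options) {
    const hit = cache.get(url);
    if (hit && hit.expiresAt > now()) return hit.promise;

    const entry = { expiresAt: now() + ttlMs };
    entry.promise = scrape(url, options).catch((error) => {
      // Never cache failures; only evict the entry this call created
      if (cache.get(url) === entry) cache.delete(url);
      throw error;
    });
    cache.set(url, entry);
    return entry.promise;
  };
}
```

A call would look like `const cachedScrape = createCachedScraper(scraperNetflix, { ttlMs: 5 * 60_000 })` followed by `await cachedScrape(url)`. Storing the promise rather than the resolved value is what lets concurrent callers share a single request.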

## 🔧 Common Fixes

### Quick Fix Checklist

1. **Update dependencies**
   ```bash
   npm update flixscaper
   npm update
   ```

2. **Clear npm cache**
   ```bash
   npm cache clean --force
   rm -rf node_modules package-lock.json
   npm install
   ```

3. **Check Node.js version**
   ```bash
   node --version  # Should be 18+
   # If older, upgrade: nvm install 20 && nvm use 20
   ```

4. **Test with minimal example**
   ```javascript
   import { scraperNetflix } from 'flixscaper';

   scraperNetflix('https://www.netflix.com/title/80189685')
     .then(result => console.log('Success:', result))
     .catch(error => console.error('Error:', error.message));
   ```

5. **Try different options**
   ```javascript
   // If failing, try with different configurations
   const configs = [
     { headless: false },
     { headless: true, timeoutMs: 30000 },
     { headless: false, userAgent: 'different-ua' }
   ];

   for (const config of configs) {
     try {
       const result = await scraperNetflix(url, config);
       console.log('✅ Working config:', config);
       break;
     } catch (error) {
       console.log('❌ Failed config:', config, error.message);
     }
   }
   ```

## 📞 Getting Help

### When to Report an Issue

Report an issue when:

1. **Previously working URL suddenly fails**
2. **Error messages are unclear or unhelpful**
3. **Turkish UI patterns not being removed**
4. **Performance degrades significantly**
5. **Documentation is unclear or incomplete**

### Issue Report Template

````markdown
## Issue Description
Brief description of the problem

## Steps to Reproduce
1. URL used: ...
2. Code executed: ...
3. Expected result: ...
4. Actual result: ...

## Environment
- Node.js version: ...
- OS: ...
- flixscaper version: ...
- Browser (if relevant): ...

## Error Message
```
Paste full error message here
```

## Additional Context
Any additional information that might help
````

### Debug Information to Include

```javascript
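// Include this information in issue reports
import { createRequire } from 'node:module';

// `require` is not defined in ES modules, so create one to load package.json.
// This assumes the package does not block access to its package.json via "exports".
const require = createRequire(import.meta.url);

const debugInfo = {
  nodeVersion: process.version,
  platform: process.platform,
  arch: process.arch,
  flixscaperVersion: require('flixscaper/package.json').version,
  timestamp: new Date().toISOString()
};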

console.log('Debug Info:', JSON.stringify(debugInfo, null, 2));
```

---

*Troubleshooting guide last updated: 2025-11-23*