# MetaScraper Troubleshooting Guide

## 🚨 Common Issues & Solutions

### 1. Module Import Errors

#### ❌ Error: `Cannot resolve import 'flixscaper'`

**Problem**: The library cannot be imported into your project.

```javascript
import { scraperNetflix } from 'flixscaper';
// Throws: Cannot resolve import 'flixscaper'
```

**Causes & Solutions**:

1. **Not installed properly**
   ```bash
   npm install flixscaper
   # or
   yarn add flixscaper
   ```

2. **Using local development without proper path**
   ```javascript
   // Instead of this:
   import { scraperNetflix } from 'flixscaper';

   // Use this for local development:
   import { scraperNetflix } from './src/index.js';
   ```

3. **TypeScript configuration issue**
   ```json
   // tsconfig.json
   {
     "compilerOptions": {
       "moduleResolution": "node",
       "allowSyntheticDefaultImports": true
     }
   }
   ```

#### ❌ Error: `Failed to load url ../globals-polyfill.mjs`

**Problem**: The polyfill file is missing after a Node.js upgrade.

**Solution**: The library has been updated to use a minimal polyfill. Ensure you're using the latest version:

```bash
npm update flixscaper
```

If the error persists, check your Node.js version:

```bash
node --version  # Should be 18+
```

### 2. Network & Connection Issues

#### ❌ Error: `Request timed out while reaching Netflix`

**Problem**: Network requests are timing out.

**Solutions**:

1. **Increase timeout**
   ```javascript
   await scraperNetflix(url, {
     timeoutMs: 30000  // 30 seconds instead of 15
   });
   ```

2. **Check internet connection**
   ```bash
   # Test connectivity to Netflix
   curl -I https://www.netflix.com
   ```

3. **Use different User-Agent**
   ```javascript
   await scraperNetflix(url, {
     userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
   });
   ```

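For intermittent timeouts, raising `timeoutMs` can be combined with a few retries. The following is a minimal sketch, not part of the library: `scrapeWithRetry` is a hypothetical helper, and it only assumes that the function passed in accepts `(url, { timeoutMs })` the way `scraperNetflix` is called in this guide. It retries on any error, not only on timeouts.

```javascript
// Hypothetical helper (not part of the library): retries a scrape and gives
// each attempt a longer timeout than the previous one.
// `scrape` is any async function called as scrape(url, { timeoutMs }).
async function scrapeWithRetry(scrape, url, { attempts = 3, baseTimeoutMs = 15000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    const timeoutMs = baseTimeoutMs * attempt; // 15 s, 30 s, 45 s with the defaults
    try {
      return await scrape(url, { timeoutMs });
    } catch (error) {
      lastError = error;
      console.warn(`Attempt ${attempt}/${attempts} failed after up to ${timeoutMs} ms: ${error.message}`);
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

A call would look like `await scrapeWithRetry(scraperNetflix, url)`. Growing the timeout linearly keeps the first attempt fast while still giving slow connections a chance on later attempts.
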
#### ❌ Error: `Netflix title not found (404)`

**Problem**: The title ID doesn't exist or is not available.

**Solutions**:

1. **Verify URL is correct**
   ```javascript
   // Test with known working URL
   await scraperNetflix('https://www.netflix.com/title/80189685');
   ```

2. **Check title availability in your region**
   ```javascript
   // Some titles are region-locked
   console.log('Title may not be available in your region');
   ```

3. **Use browser to verify**
   - Open the URL in your browser
   - If it shows 404 in browser, it's not a library issue

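Before blaming a 404 on the library, it can help to confirm that the URL has the shape used throughout this guide (`/title/<numeric id>` on a netflix.com host). The sketch below is illustrative only: `extractTitleId` is a hypothetical helper and makes no claim about how the library's own URL normalization works.

```javascript
// Hypothetical pre-flight check (not part of the library).
// Returns the numeric title ID as a string, or null if the URL does not
// look like a Netflix title URL.
function extractTitleId(input) {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    return null; // Not a valid absolute URL
  }
  // Accept netflix.com and its subdomains (for example www.netflix.com).
  if (!/(^|\.)netflix\.com$/.test(parsed.hostname)) {
    return null;
  }
  // Locale prefixes such as /tr/ are allowed before /title/.
  const match = parsed.pathname.match(/\/title\/(\d+)/);
  return match ? match[1] : null;
}

console.log(extractTitleId('https://www.netflix.com/tr/title/80189685?s=i'));
```

If the helper returns `null`, the problem is the URL itself rather than region availability.
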
### 3. Parsing & Data Issues

#### ❌ Error: `Netflix sayfa meta verisi parse edilemedi`

**Problem**: The library cannot extract metadata from the Netflix page. The message is Turkish for "Netflix page metadata could not be parsed".

**Causes & Solutions**:

1. **Netflix changed their HTML structure**
   ```javascript
   // Enable headless mode to get JavaScript-rendered content
   await scraperNetflix(url, { headless: true });
   ```

2. **Title has unusual formatting**
   ```javascript
   // Debug by examining the HTML
   // (fetchStaticHtml must be imported from the library's source first)
   const html = await fetchStaticHtml(url);
   console.log(html.slice(0, 1000)); // First 1000 chars
   ```

3. **Missing JSON-LD data**
   - Netflix may have removed structured data
   - Use headless mode as fallback

#### ❌ Problem: Turkish UI text not being removed

**Problem**: Titles still contain Turkish UI text like "izlemenizi bekliyor".

**Solutions**:

1. **Check if pattern is covered**
   ```javascript
   import { cleanTitle } from 'flixscaper/parser';

   const testTitle = "The Witcher izlemenizi bekliyor";
   const cleaned = cleanTitle(testTitle);
   console.log('Cleaned:', cleaned);
   ```

2. **Add new pattern if needed**
   ```javascript
   // If Netflix added new UI text, file an issue with:
   // 1. The problematic title
   // 2. The expected cleaned title
   // 3. The new UI pattern that needs to be added
   ```

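When preparing such an issue, it can be useful to check locally that the pattern you plan to report would actually produce the expected title. The sketch below is an illustration under stated assumptions: `stripCandidatePatterns` and `candidatePatterns` are hypothetical names, and the library's `cleanTitle` may be implemented quite differently. The two default patterns come from examples in this guide.

```javascript
// Illustration only (not the library's implementation): strip trailing UI
// text from a title using a list of candidate regular expressions.
const candidatePatterns = [
  /\s*\|\s*Netflix\s*$/i,          // " | Netflix" suffix
  /\s*izlemenizi bekliyor\s*$/i    // Turkish UI text mentioned in this guide
];

function stripCandidatePatterns(title, patterns = candidatePatterns) {
  let cleaned = title;
  for (const pattern of patterns) {
    cleaned = cleaned.replace(pattern, '');
  }
  return cleaned.trim();
}

// Try a new pattern before reporting it.
const proposed = [...candidatePatterns, /\s*yeni başlık\s*$/i];
console.log(stripCandidatePatterns('Dizi Adı yeni başlık | Netflix', proposed));
```

Patterns are applied in order, so the site suffix is removed first and the UI text is then matched at the end of the remaining string.
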
### 4. Playwright/Browser Issues

#### ❌ Error: `Playwright is not installed`

**Problem**: Headless mode is not available.

**Solutions**:

1. **Install Playwright**
   ```bash
   npm install playwright
   npx playwright install chromium
   ```

2. **Use library without headless mode**
   ```javascript
   await scraperNetflix(url, { headless: false });
   ```

3. **Check if you really need headless mode**
   - Most titles work with static mode
   - Only use headless if static parsing fails

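If the same code runs on machines with and without Playwright, one option is to probe for the package at runtime before requesting headless mode. This is a minimal sketch: `isModuleAvailable` is a hypothetical helper, not a library function, and it relies only on standard dynamic `import()`.

```javascript
// Hypothetical helper (not part of the library): resolves to true when the
// named module can be loaded with dynamic import(), false otherwise.
async function isModuleAvailable(name) {
  try {
    await import(name);
    return true;
  } catch {
    return false;
  }
}

// Decide whether headless mode is worth attempting on this machine.
isModuleAvailable('playwright').then((available) => {
  console.log(available
    ? 'Playwright found: headless mode can be used as a fallback'
    : 'Playwright missing: stay in static mode or install it');
});
```

A successful import only shows that the npm package is present. The Chromium binary is installed separately with `npx playwright install chromium`, so headless mode can still fail after this check passes.
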
#### ❌ Error: `Playwright chromium browser is unavailable`

**Problem**: The Chromium browser is not installed.

**Solution**:

```bash
npx playwright install chromium
```

#### ❌ Error: Memory issues with Playwright

**Problem**: Browser automation is using too much memory.

**Solutions**:

1. **Limit concurrent requests**
   ```javascript
   const urls = ['url1', 'url2', 'url3'];

   // Process sequentially instead of parallel
   for (const url of urls) {
     const result = await scraperNetflix(url);
     // Process result
   }
   ```

2. **Close browser resources properly**
   - The library handles this automatically
   - Ensure you're not calling Playwright directly

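Fully sequential processing is the safest choice for memory, but a small fixed concurrency limit is a common middle ground. The sketch below assumes nothing about the library beyond an async function that takes one item. `mapWithLimit` is a hypothetical helper written for this guide.

```javascript
// Hypothetical helper (not part of the library): runs `worker` over `items`
// with at most `limit` calls in flight, and returns results in input order.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // Index of the next item that has not been claimed yet

  // Each runner claims items one at a time until none are left.
  async function run() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}
```

With the library, this would be called as `await mapWithLimit(urls, 2, (url) => scraperNetflix(url))`. If any call rejects, the returned promise rejects with that error, so wrap the worker in `try`/`catch` when partial results are acceptable.
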
### 5. Environment & Compatibility Issues

#### ❌ Error: `File is not defined` (Node.js 18)

**Problem**: Node.js 18 does not provide the global `File` API that undici expects.

**Solutions**:

1. **Use latest library version**
   ```bash
   npm update flixscaper
   ```

2. **Upgrade Node.js**
   ```bash
   # Upgrade to Node.js 20+ to avoid polyfill issues
   nvm install 20
   nvm use 20
   ```

3. **Manual polyfill (if needed)**
   ```javascript
   import './src/polyfill.js';  // Include before library import
   import { scraperNetflix } from './src/index.js';
   ```

#### ❌ Problem: Works on one machine but not another

**Diagnosis Steps**:

1. **Check Node.js versions**
   ```bash
   node --version  # Should be 18+
   npm --version   # Should be 8+
   ```

2. **Check Netflix accessibility**
   ```bash
   curl -I "https://www.netflix.com/title/80189685"
   ```

3. **Compare User-Agent strings**
   ```javascript
   // Defined in browsers and in Node.js 21+; undefined on older Node.js
   console.log(globalThis.navigator?.userAgent);
   // Node.js has no process.userAgent; log the runtime version instead
   console.log(process.version);
   ```

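The version check can also be scripted so that both machines print the same comparable report. This is a small sketch using only Node.js built-ins. `checkNodeVersion` is a hypothetical helper, and the minimum of 18 is taken from this guide.

```javascript
// Hypothetical helper: compares a Node.js version string against a minimum
// major version (18, per this guide).
function checkNodeVersion(version = process.versions.node, minimumMajor = 18) {
  const major = Number.parseInt(version.split('.')[0], 10);
  return { version, major, ok: major >= minimumMajor };
}

// Print a report that can be diffed between machines.
console.log({
  node: checkNodeVersion(),
  platform: process.platform,
  arch: process.arch
});
```

Running the script on both machines and comparing the output narrows the problem to either the runtime or the network.
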
## 🔍 Debugging Techniques

### 1. Enable Verbose Logging

```javascript
// Add debug logging to your code
async function debugScraping(url) {
  console.log('🚀 Starting scrape for:', url);

  try {
    const result = await scraperNetflix(url, {
      headless: false,  // Try without browser first
      timeoutMs: 30000
    });

    console.log('✅ Success:', result);
    return result;
  } catch (error) {
    console.error('❌ Error details:', {
      message: error.message,
      stack: error.stack,
      url: url
    });
    throw error;
  }
}
```

### 2. Test with Known Working URLs

```javascript
// Test with URLs that should definitely work
const testUrls = [
  'https://www.netflix.com/title/80189685',  // The Witcher
  'https://www.netflix.com/title/82123114'   // ONE SHOT
];

for (const url of testUrls) {
  try {
    const result = await scraperNetflix(url);
    console.log(`✅ ${url}: ${result.name}`);
  } catch (error) {
    console.error(`❌ ${url}: ${error.message}`);
  }
}
```

### 3. Isolate the Problem

```javascript
// Test each component separately
import { normalizeNetflixUrl } from 'flixscaper/index';
import { parseNetflixHtml } from 'flixscaper/parser';
// fetchStaticHtml must also be imported from the library's source

async function isolateProblem(url) {
  try {
    // 1. Test URL normalization
    const normalized = normalizeNetflixUrl(url);
    console.log('✅ URL normalized:', normalized);

    // 2. Test HTML fetching
    const html = await fetchStaticHtml(normalized);
    console.log('✅ HTML fetched, length:', html.length);

    // 3. Test parsing
    const parsed = parseNetflixHtml(html);
    console.log('✅ Parsed:', parsed);

  } catch (error) {
    console.error('❌ Step failed:', error.message);
  }
}
```

### 4. Browser Mode Debugging

```javascript
// Test with visible browser for debugging
const result = await scraperNetflix(url, {
  headless: false,  // Show browser window
  timeoutMs: 60000  // Longer timeout for manual inspection
});

// Keep browser open by adding delay if needed
await new Promise(resolve => setTimeout(resolve, 5000));
```

## 🌍 Regional & Language Issues

### Turkish Netflix Specific Issues

#### ❌ Problem: Turkish URLs not working

**Test different URL formats**:

```javascript
const turkishUrls = [
  'https://www.netflix.com/title/80189685',             // Standard
  'https://www.netflix.com/tr/title/80189685',          // Turkish locale path (/tr/)
  'https://www.netflix.com/tr/title/80189685?s=i',      // With query parameters
  'https://www.netflix.com/tr/title/80189685?vlang=tr'  // Turkish language
];

for (const url of turkishUrls) {
  try {
    const result = await scraperNetflix(url);
    console.log(`✅ ${url}: ${result.name}`);
  } catch (error) {
    console.error(`❌ ${url}: ${error.message}`);
  }
}
```

#### ❌ Problem: New Turkish UI patterns not recognized

**Report the issue with**:

1. **Original title**: What Netflix returned
2. **Expected title**: What it should be after cleaning
3. **URL**: The Netflix URL where this occurs
4. **Region**: Your geographic location

Example issue report:

```markdown
**URL**: https://www.netflix.com/tr/title/12345678
**Original**: "Dizi Adı yeni başlık | Netflix"
**Expected**: "Dizi Adı"
**Pattern to add**: "yeni başlık"
**Region**: Turkey
```

## 📊 Performance Issues

### Slow Response Times

#### Diagnose the bottleneck:

```javascript
import { performance } from 'node:perf_hooks';
import { normalizeNetflixUrl } from 'flixscaper/index';
import { parseNetflixHtml } from 'flixscaper/parser';
// fetchStaticHtml must also be imported from the library's source

async function profileScraping(url) {
  const steps = {};

  // URL Normalization
  steps.normStart = performance.now();
  const normalized = normalizeNetflixUrl(url);
  steps.normEnd = performance.now();

  // HTML Fetch
  steps.fetchStart = performance.now();
  const html = await fetchStaticHtml(normalized);
  steps.fetchEnd = performance.now();

  // Parsing
  steps.parseStart = performance.now();
  const parsed = parseNetflixHtml(html);
  steps.parseEnd = performance.now();

  // All durations are in milliseconds
  console.log('Performance breakdown:', {
    normalization: steps.normEnd - steps.normStart,
    fetch: steps.fetchEnd - steps.fetchStart,
    parsing: steps.parseEnd - steps.parseStart,
    htmlSize: html.length
  });

  return parsed;
}
```

#### Optimization Solutions:

1. **Disable headless mode** (if not needed)
   ```javascript
   await scraperNetflix(url, { headless: false });
   ```

2. **Reduce timeout** (if network is fast)
   ```javascript
   await scraperNetflix(url, { timeoutMs: 5000 });
   ```

3. **Cache results** (for repeated requests)
   ```javascript
   const cache = new Map();

   async function scrapeWithCache(url) {
     if (cache.has(url)) {
       return cache.get(url);
     }

     const result = await scraperNetflix(url);
     cache.set(url, result);
     return result;
   }
   ```

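A plain `Map` cache never expires and does not share work between simultaneous requests for the same URL. The sketch below shows one possible refinement, a time-limited cache that stores promises. `createCachedScraper`, `ttlMs`, and `now` are hypothetical names chosen for this example, and the cache key deliberately ignores the options argument.

```javascript
// Hypothetical helper (not part of the library): wraps a scrape function
// with a per-URL cache whose entries expire after `ttlMs` milliseconds.
function createCachedScraper(scrape, { ttlMs = 60 * 60 * 1000, now = Date.now } = {}) {
  const cache = new Map(); // url -> { expiresAt, promise }

  return function cachedScrape(url, options) {
    const hit = cache.get(url);
    if (hit && hit.expiresAt > now()) {
      return hit.promise; // Fresh entry, or a request that is still in flight
    }

    const entry = { expiresAt: now() + ttlMs, promise: null };
    entry.promise = Promise.resolve()
      .then(() => scrape(url, options))
      .catch((error) => {
        // Do not keep failures around; the next call should retry.
        if (cache.get(url) === entry) cache.delete(url);
        throw error;
      });
    cache.set(url, entry);
    return entry.promise;
  };
}
```

Usage would be `const scrape = createCachedScraper(scraperNetflix, { ttlMs: 10 * 60 * 1000 })` followed by `await scrape(url)`. Caching the promise rather than the result means that two concurrent calls for the same URL trigger a single request.
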
## 🔧 Common Fixes

### Quick Fix Checklist

1. **Update dependencies**
   ```bash
   npm update flixscaper
   npm update
   ```

2. **Clear npm cache**
   ```bash
   npm cache clean --force
   rm -rf node_modules package-lock.json
   npm install
   ```

3. **Check Node.js version**
   ```bash
   node --version  # Should be 18+
   # If older, upgrade: nvm install 20 && nvm use 20
   ```

4. **Test with minimal example**
   ```javascript
   import { scraperNetflix } from 'flixscaper';

   scraperNetflix('https://www.netflix.com/title/80189685')
     .then(result => console.log('Success:', result))
     .catch(error => console.error('Error:', error.message));
   ```

5. **Try different options**
   ```javascript
   // If failing, try with different configurations
   const url = 'https://www.netflix.com/title/80189685';
   const configs = [
     { headless: false },
     { headless: true, timeoutMs: 30000 },
     { headless: false, userAgent: 'different-ua' }
   ];

   for (const config of configs) {
     try {
       await scraperNetflix(url, config);
       console.log('✅ Working config:', config);
       break; // Stop at the first configuration that works
     } catch (error) {
       console.log('❌ Failed config:', config, error.message);
     }
   }
   ```

## 📞 Getting Help

### When to Report an Issue

Report an issue when:

1. **Previously working URL suddenly fails**
2. **Error messages are unclear or unhelpful**
3. **Turkish UI patterns not being removed**
4. **Performance degrades significantly**
5. **Documentation is unclear or incomplete**

### Issue Report Template

````markdown
## Issue Description
Brief description of the problem

## Steps to Reproduce
1. URL used: ...
2. Code executed: ...
3. Expected result: ...
4. Actual result: ...

## Environment
- Node.js version: ...
- OS: ...
- flixscaper version: ...
- Browser (if relevant): ...

## Error Message
```
Paste full error message here
```

## Additional Context
Any additional information that might help
````

### Debug Information to Include

```javascript
// Include this information in issue reports
import { createRequire } from 'node:module';

// ES modules have no built-in require, so create one to read package.json
const require = createRequire(import.meta.url);

const debugInfo = {
  nodeVersion: process.version,
  platform: process.platform,
  arch: process.arch,
  flixscaperVersion: require('flixscaper/package.json').version,
  timestamp: new Date().toISOString()
};

console.log('Debug Info:', JSON.stringify(debugInfo, null, 2));
```

---

*Troubleshooting guide last updated: 2025-11-23*