# MetaScraper Troubleshooting Guide
## 🚨 Common Issues & Solutions
### 1. Module Import Errors
#### ❌ Error: `Cannot resolve import 'flixscaper'`
**Problem**: Cannot import the library in your project
```javascript
import { scraperNetflix } from 'flixscaper';
// Throws: Cannot resolve import 'flixscaper'
```
**Causes & Solutions**:
1. **Not installed properly**
```bash
npm install flixscaper
# or
yarn add flixscaper
```
2. **Using local development without proper path**
```javascript
// Instead of this:
import { scraperNetflix } from 'flixscaper';
// Use this for local development:
import { scraperNetflix } from './src/index.js';
```
3. **TypeScript configuration issue**
```json
// tsconfig.json
{
  "compilerOptions": {
    "moduleResolution": "node",
    "allowSyntheticDefaultImports": true
  }
}
```
#### ❌ Error: `Failed to load url ../globals-polyfill.mjs`
**Problem**: Polyfill file missing after Node.js upgrade
**Solution**: The library has been updated to use a minimal polyfill. Ensure you're using the latest version:
```bash
npm update flixscaper
```
If the error persists, check your Node.js version:
```bash
node --version # Should be 18+
```
### 2. Network & Connection Issues
#### ❌ Error: `Request timed out while reaching Netflix`
**Problem**: Network requests are timing out
**Solutions**:
1. **Increase timeout**
```javascript
await scraperNetflix(url, {
  timeoutMs: 30000 // 30 seconds instead of 15
});
```
2. **Check internet connection**
```bash
# Test connectivity to Netflix
curl -I https://www.netflix.com
```
3. **Use different User-Agent**
```javascript
await scraperNetflix(url, {
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
});
```
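When timeouts are intermittent, retrying with a longer timeout on each attempt can help. The following is a minimal sketch, not part of the library: `scrapeWithRetry` is a hypothetical helper, and it assumes the scrape function accepts the `timeoutMs` option shown above. It retries on any error, not only on timeouts.

```javascript
// Hypothetical helper: retries an async scrape call, doubling the timeout each attempt.
// `scrape` is any (url, options) => Promise function, for example scraperNetflix.
async function scrapeWithRetry(scrape, url, { attempts = 3, timeoutMs = 15000 } = {}) {
  let lastError = new Error('attempts must be at least 1');
  for (let attempt = 0; attempt < attempts; attempt++) {
    // With the defaults this yields 15000, 30000, 60000 ms
    const currentTimeout = timeoutMs * 2 ** attempt;
    try {
      return await scrape(url, { timeoutMs: currentTimeout });
    } catch (error) {
      lastError = error;
      console.warn(`Attempt ${attempt + 1} failed (${currentTimeout} ms): ${error.message}`);
    }
  }
  throw lastError; // Every attempt failed
}
```

Usage would look like `await scrapeWithRetry(scraperNetflix, url)`. Passing the scraper in as an argument keeps the helper independent of how the library is imported.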
#### ❌ Error: `Netflix title not found (404)`
**Problem**: Title ID doesn't exist or is not available
**Solutions**:
1. **Verify URL is correct**
```javascript
// Test with known working URL
await scraperNetflix('https://www.netflix.com/title/80189685');
```
2. **Check title availability in your region**
```javascript
// Some titles are region-locked
console.log('Title may not be available in your region');
```
3. **Use browser to verify**
- Open the URL in your browser
- If it shows 404 in browser, it's not a library issue
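Before blaming the network, it can help to confirm that the URL has the expected shape. The sketch below is a hypothetical `extractTitleId` helper (not a library function) that uses the standard `URL` class. It assumes title pages follow the `/title/<id>` pattern, optionally with a locale prefix such as `/tr/`, as in the URLs used throughout this guide.

```javascript
// Hypothetical helper: returns the numeric title ID from a Netflix title URL, or null.
function extractTitleId(input) {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    return null; // Not a valid absolute URL
  }
  if (!/(^|\.)netflix\.com$/.test(parsed.hostname)) {
    return null; // Not a Netflix host
  }
  // Matches /title/80189685 and locale-prefixed paths such as /tr/title/80189685
  const match = parsed.pathname.match(/^\/(?:[a-z]{2}(?:-[a-z]{2})?\/)?title\/(\d+)\/?$/i);
  return match ? match[1] : null;
}

console.log(extractTitleId('https://www.netflix.com/tr/title/80189685?s=i')); // 80189685
console.log(extractTitleId('https://www.netflix.com/browse')); // null
```

A `null` result means the URL itself is malformed, so the 404 is expected and unrelated to regional availability.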
### 3. Parsing & Data Issues
#### ❌ Error: `Netflix sayfa meta verisi parse edilemedi`
**Problem**: Cannot extract metadata from the Netflix page. The error text is Turkish for "Netflix page metadata could not be parsed".
**Causes & Solutions**:
1. **Netflix changed their HTML structure**
```javascript
// Enable headless mode to get JavaScript-rendered content
await scraperNetflix(url, { headless: true });
```
2. **Title has unusual formatting**
```javascript
// Debug by examining the raw HTML
// NOTE: import fetchStaticHtml first; this guide does not show which module exports it.
const html = await fetchStaticHtml(url);
console.log(html.slice(0, 1000)); // First 1000 chars
```
3. **Missing JSON-LD data**
- Netflix may have removed structured data
- Use headless mode as fallback
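To check whether a fetched page still contains JSON-LD, a quick regex scan is usually enough. This is a diagnostic sketch only: `extractJsonLd` is a hypothetical helper, it is not how the library parses pages, and a regex is not a full HTML parser.

```javascript
// Hypothetical diagnostic: collects every JSON-LD block found in an HTML string.
function extractJsonLd(html) {
  const pattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks = [];
  for (const match of html.matchAll(pattern)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // Skip blocks that are not valid JSON
    }
  }
  return blocks;
}

const sample = '<html><head><script type="application/ld+json">'
  + '{"@type":"TVSeries","name":"The Witcher"}</script></head></html>';
console.log(extractJsonLd(sample).length); // 1
console.log(extractJsonLd('<html></html>').length); // 0
```

If this returns an empty array for the HTML you fetched, the structured data is missing from the static page and headless mode is the next thing to try.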
#### ❌ Problem: Turkish UI text not being removed
**Problem**: Titles still contain Turkish UI text like "izlemenizi bekliyor"
**Solutions**:
1. **Check if pattern is covered**
```javascript
import { cleanTitle } from 'flixscaper/parser';
const testTitle = "The Witcher izlemenizi bekliyor";
const cleaned = cleanTitle(testTitle);
console.log('Cleaned:', cleaned);
```
2. **Add new pattern if needed**
```javascript
// If Netflix added new UI text, file an issue with:
// 1. The problematic title
// 2. The expected cleaned title
// 3. The new UI pattern that needs to be added
```
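Until a new pattern is added upstream, titles can be post-processed locally. The sketch below is a hypothetical workaround, not the library's `cleanTitle` implementation. Only the phrase "izlemenizi bekliyor" comes from this guide; any other entries in the list are yours to add. Matching is case-sensitive to avoid Turkish case-folding surprises.

```javascript
// Hypothetical workaround: strips known trailing UI phrases from a title.
const UI_SUFFIXES = ['izlemenizi bekliyor'];

function stripUiSuffixes(title, suffixes = UI_SUFFIXES) {
  // Drop a trailing "| Netflix" site suffix first
  let cleaned = title.replace(/\s*\|\s*Netflix\s*$/, '').trim();
  for (const suffix of suffixes) {
    // Skip empty entries; only strip when the title really ends with the phrase
    if (suffix && cleaned.endsWith(suffix)) {
      cleaned = cleaned.slice(0, -suffix.length).trim();
    }
  }
  return cleaned;
}

console.log(stripUiSuffixes('The Witcher izlemenizi bekliyor')); // The Witcher
console.log(stripUiSuffixes('Dizi Adı yeni başlık | Netflix', ['yeni başlık'])); // Dizi Adı
```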
### 4. Playwright/Browser Issues
#### ❌ Error: `Playwright is not installed`
**Problem**: Headless mode not available
**Solutions**:
1. **Install Playwright**
```bash
npm install playwright
npx playwright install chromium
```
2. **Use library without headless mode**
```javascript
await scraperNetflix(url, { headless: false });
```
3. **Check if you really need headless mode**
- Most titles work with static mode
- Only use headless if static parsing fails
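The "static first, headless only on failure" advice can be wrapped in a small helper. This is a sketch under the assumption that `headless: false` selects static mode and `headless: true` enables Playwright, as the examples in this guide suggest. `scrapeStaticFirst` is a hypothetical name, not a library export.

```javascript
// Hypothetical wrapper: tries static mode first and falls back to headless mode.
// `scrape` is any (url, options) => Promise function, for example scraperNetflix.
async function scrapeStaticFirst(scrape, url, options = {}) {
  try {
    return await scrape(url, { ...options, headless: false });
  } catch (staticError) {
    console.warn(`Static mode failed, retrying headless: ${staticError.message}`);
    // A failure here propagates to the caller
    return scrape(url, { ...options, headless: true });
  }
}
```

Note that this also retries on errors where a browser cannot help, such as a 404, so you may want to inspect `staticError.message` before falling back.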
#### ❌ Error: `Playwright chromium browser is unavailable`
**Problem**: Chromium browser not installed
**Solution**:
```bash
npx playwright install chromium
```
#### ❌ Error: Memory issues with Playwright
**Problem**: Browser automation using too much memory
**Solutions**:
1. **Limit concurrent requests**
```javascript
const urls = ['url1', 'url2', 'url3'];
// Process sequentially instead of parallel
for (const url of urls) {
  const result = await scraperNetflix(url);
  // Process result
}
```
2. **Close browser resources properly**
- The library handles this automatically
- Ensure you're not calling Playwright directly
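Fully sequential processing can be slow for long lists. A middle ground is a small concurrency limit. The following `mapWithLimit` helper is a hypothetical sketch, not part of the library; it keeps at most `limit` calls in flight (use a `limit` of 1 or more) and preserves result order.

```javascript
// Hypothetical helper: runs `worker` over `items` with at most `limit` calls in flight.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function runner() {
    while (next < items.length) {
      const index = next++; // Claim the next index before awaiting
      results[index] = await worker(items[index], index);
    }
  }

  // Start up to `limit` runners that pull from the shared index
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}
```

Usage would look like `await mapWithLimit(urls, 2, (url) => scraperNetflix(url))`. If any call rejects, the returned promise rejects with that error.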
### 5. Environment & Compatibility Issues
#### ❌ Error: `File is not defined` (Node.js 18)
**Problem**: Node.js 18 has no global `File` class, which undici expects
**Solutions**:
1. **Use latest library version**
```bash
npm update flixscaper
```
2. **Upgrade Node.js**
```bash
# Upgrade to Node.js 20+ to avoid polyfill issues
nvm install 20
nvm use 20
```
3. **Manual polyfill (if needed)**
```javascript
import './src/polyfill.js'; // Include before library import
import { scraperNetflix } from './src/index.js';
```
#### ❌ Problem: Works on one machine but not another
**Diagnosis Steps**:
1. **Check Node.js versions**
```bash
node --version # Should be 18+
npm --version # Should be 8+
```
2. **Check Netflix accessibility**
```bash
curl -I "https://www.netflix.com/title/80189685"
```
3. **Compare User-Agent strings**
```javascript
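// In a browser console:
console.log(navigator.userAgent);
// In Node.js there is no `process.userAgent`; `navigator` is a global only in newer versions (21+)
console.log(globalThis.navigator?.userAgent ?? 'navigator is not defined in this Node.js version');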
```
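The version requirement can also be checked in code at startup, which makes the difference between machines visible immediately. This is a minimal sketch; `meetsMinimumNode` is a hypothetical helper and the minimum of 18 is taken from this guide.

```javascript
// Hypothetical helper: checks a Node.js version string against a minimum major version.
function meetsMinimumNode(version, minMajor = 18) {
  const match = /^v?(\d+)\./.exec(version);
  return match !== null && Number(match[1]) >= minMajor;
}

if (!meetsMinimumNode(process.version)) {
  console.error(`Node.js ${process.version} is older than the required 18+`);
}
```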
## 🔍 Debugging Techniques
### 1. Enable Verbose Logging
```javascript
// Add debug logging to your code
async function debugScraping(url) {
  console.log('🚀 Starting scrape for:', url);
  try {
    const result = await scraperNetflix(url, {
      headless: false, // Try without browser first
      timeoutMs: 30000
    });
    console.log('✅ Success:', result);
    return result;
  } catch (error) {
    console.error('❌ Error details:', {
      message: error.message,
      stack: error.stack,
      url: url
    });
    throw error;
  }
}
```
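Once errors are logged, the message text can be matched against the errors covered in this guide. The lookup below is a hypothetical sketch: the message fragments are copied from the section headings above, and the hints summarize the solutions listed there. Exact message wording may differ between library versions.

```javascript
// Hypothetical lookup: maps error-message fragments quoted in this guide to a next step.
const HINTS = [
  ['timed out', 'Increase timeoutMs or check connectivity'],
  ['not found (404)', 'Verify the URL and regional availability'],
  ['parse edilemedi', 'Retry with headless: true'],
  ['Playwright is not installed', 'Run: npm install playwright'],
  ['chromium browser is unavailable', 'Run: npx playwright install chromium'],
  ['File is not defined', 'Update flixscaper or upgrade to Node.js 20+'],
];

function suggestFix(error) {
  // Accept either an Error object or a plain message string
  const message = String(error && error.message ? error.message : error);
  const hit = HINTS.find(([fragment]) => message.includes(fragment));
  return hit ? hit[1] : 'No known fix; include the full error in an issue report';
}

console.log(suggestFix(new Error('Playwright is not installed'))); // Run: npm install playwright
```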
### 2. Test with Known Working URLs
```javascript
// Test with URLs that should definitely work
const testUrls = [
  'https://www.netflix.com/title/80189685', // The Witcher
  'https://www.netflix.com/title/82123114' // ONE SHOT
];
for (const url of testUrls) {
  try {
    const result = await scraperNetflix(url);
    console.log(`✅ ${url}: ${result.name}`);
  } catch (error) {
    console.error(`❌ ${url}: ${error.message}`);
  }
}
```
### 3. Isolate the Problem
```javascript
// Test each component separately
import { normalizeNetflixUrl } from 'flixscaper/index';
import { parseNetflixHtml } from 'flixscaper/parser';
// NOTE: fetchStaticHtml must also be imported; this guide does not show which module exports it.

async function isolateProblem(url) {
  try {
    // 1. Test URL normalization
    const normalized = normalizeNetflixUrl(url);
    console.log('✅ URL normalized:', normalized);
    // 2. Test HTML fetching
    const html = await fetchStaticHtml(normalized);
    console.log('✅ HTML fetched, length:', html.length);
    // 3. Test parsing
    const parsed = parseNetflixHtml(html);
    console.log('✅ Parsed:', parsed);
  } catch (error) {
    console.error('❌ Step failed:', error.message);
  }
}
```
### 4. Browser Mode Debugging
```javascript
// Retry with browser rendering and a longer timeout.
// Elsewhere in this guide, `headless: false` selects static mode (no browser),
// so it is not a way to open a visible browser window.
const result = await scraperNetflix(url, {
  headless: true, // Render the page with Playwright
  timeoutMs: 60000 // Longer timeout for slow page loads
});
console.log(result);
```
## 🌍 Regional & Language Issues
### Turkish Netflix Specific Issues
#### ❌ Problem: Turkish URLs not working
**Test different URL formats**:
```javascript
const turkishUrls = [
  'https://www.netflix.com/title/80189685', // Standard
  'https://www.netflix.com/tr/title/80189685', // Turkish locale path (/tr/)
  'https://www.netflix.com/tr/title/80189685?s=i', // With extra query params
  'https://www.netflix.com/tr/title/80189685?vlang=tr' // Turkish language param
];
for (const url of turkishUrls) {
  try {
    const result = await scraperNetflix(url);
    console.log(`✅ ${url}: ${result.name}`);
  } catch (error) {
    console.error(`❌ ${url}: ${error.message}`);
  }
}
```
#### ❌ Problem: New Turkish UI patterns not recognized
**Report the issue with**:
1. **Original title**: What Netflix returned
2. **Expected title**: What it should be after cleaning
3. **URL**: The Netflix URL where this occurs
4. **Region**: Your geographic location
Example issue report:
```markdown
**URL**: https://www.netflix.com/tr/title/12345678
**Original**: "Dizi Adı yeni başlık | Netflix"
**Expected**: "Dizi Adı"
**Pattern to add**: "yeni başlık"
**Region**: Turkey
```
## 📊 Performance Issues
### Slow Response Times
#### Diagnose the bottleneck:
```javascript
import { performance } from 'node:perf_hooks';
import { normalizeNetflixUrl } from 'flixscaper/index';
import { parseNetflixHtml } from 'flixscaper/parser';
// NOTE: fetchStaticHtml must also be imported; this guide does not show which module exports it.

async function profileScraping(url) {
  const steps = {};
  // URL Normalization
  steps.normStart = performance.now();
  const normalized = normalizeNetflixUrl(url);
  steps.normEnd = performance.now();
  // HTML Fetch
  steps.fetchStart = performance.now();
  const html = await fetchStaticHtml(normalized);
  steps.fetchEnd = performance.now();
  // Parsing
  steps.parseStart = performance.now();
  const parsed = parseNetflixHtml(html);
  steps.parseEnd = performance.now();
  // All durations are in milliseconds
  console.log('Performance breakdown:', {
    normalization: steps.normEnd - steps.normStart,
    fetch: steps.fetchEnd - steps.fetchStart,
    parsing: steps.parseEnd - steps.parseStart,
    htmlSize: html.length
  });
  return parsed;
}
```
#### Optimization Solutions:
1. **Disable headless mode** (if not needed)
```javascript
await scraperNetflix(url, { headless: false });
```
2. **Reduce timeout** (fails faster on slow requests; it does not speed up successful ones)
```javascript
await scraperNetflix(url, { timeoutMs: 5000 });
```
3. **Cache results** (for repeated requests)
```javascript
const cache = new Map();
async function scrapeWithCache(url) {
  if (cache.has(url)) {
    return cache.get(url);
  }
  const result = await scraperNetflix(url);
  cache.set(url, result);
  return result;
}
```
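A plain `Map` cache never expires and does not stop two simultaneous calls for the same URL from both hitting the network. The sketch below addresses both by caching the promise with an expiry time. `createCachedScraper` is a hypothetical helper, the one-hour default is an arbitrary assumption, and the cache key ignores the options argument.

```javascript
// Hypothetical factory: wraps a scrape function with a time-limited cache.
// Caching the promise (not the result) also de-duplicates concurrent calls for one URL.
function createCachedScraper(scrape, ttlMs = 60 * 60 * 1000) {
  const cache = new Map(); // url -> { promise, expiresAt }

  return function cachedScrape(url, options) {
    const entry = cache.get(url);
    if (entry && entry.expiresAt > Date.now()) {
      return entry.promise; // Fresh entry: reuse it
    }
    const promise = scrape(url, options).catch((error) => {
      // Do not keep failures in the cache, unless a newer entry replaced this one
      if (cache.get(url)?.promise === promise) {
        cache.delete(url);
      }
      throw error;
    });
    cache.set(url, { promise, expiresAt: Date.now() + ttlMs });
    return promise;
  };
}
```

Usage would look like `const scrape = createCachedScraper(scraperNetflix); await scrape(url);`.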
## 🔧 Common Fixes
### Quick Fix Checklist
1. **Update dependencies**
```bash
npm update flixscaper
npm update
```
2. **Clear npm cache and reinstall dependencies**
```bash
npm cache clean --force
rm -rf node_modules package-lock.json
npm install
```
3. **Check Node.js version**
```bash
node --version # Should be 18+
# If older, upgrade: nvm install 20 && nvm use 20
```
4. **Test with minimal example**
```javascript
import { scraperNetflix } from 'flixscaper';
scraperNetflix('https://www.netflix.com/title/80189685')
  .then(result => console.log('Success:', result))
  .catch(error => console.error('Error:', error.message));
```
5. **Try different options**
```javascript
// If failing, try with different configurations
const configs = [
  { headless: false },
  { headless: true, timeoutMs: 30000 },
  { headless: false, userAgent: 'different-ua' }
];
for (const config of configs) {
  try {
    const result = await scraperNetflix(url, config);
    console.log('✅ Working config:', config);
    break;
  } catch (error) {
    console.log('❌ Failed config:', config, error.message);
  }
}
```
## 📞 Getting Help
### When to Report an Issue
Report an issue when:
1. **Previously working URL suddenly fails**
2. **Error messages are unclear or unhelpful**
3. **Turkish UI patterns not being removed**
4. **Performance degrades significantly**
5. **Documentation is unclear or incomplete**
### Issue Report Template
````markdown
## Issue Description
Brief description of the problem
## Steps to Reproduce
1. URL used: ...
2. Code executed: ...
3. Expected result: ...
4. Actual result: ...
## Environment
- Node.js version: ...
- OS: ...
- flixscaper version: ...
- Browser (if relevant): ...
## Error Message
```
Paste full error message here
```
## Additional Context
Any additional information that might help
````
### Debug Information to Include
```javascript
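// Include this information in issue reports
import { createRequire } from 'node:module';

// `require` is not defined in ES modules, so create one to load package.json.
// This can still fail if the package's "exports" map does not expose package.json.
const require = createRequire(import.meta.url);

const debugInfo = {
  nodeVersion: process.version,
  platform: process.platform,
  arch: process.arch,
  flixscaperVersion: require('flixscaper/package.json').version,
  timestamp: new Date().toISOString()
};
console.log('Debug Info:', JSON.stringify(debugInfo, null, 2));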
```
---
*Troubleshooting guide last updated: 2025-11-23*