# MetaScraper Troubleshooting Guide

## 🚨 Common Issues & Solutions

### 1. Module Import Errors

#### ❌ Error: `Cannot resolve import 'flixscaper'`

**Problem**: The library cannot be imported into your project.

```javascript
import { scraperNetflix } from 'flixscaper';
// Throws: Cannot resolve import 'flixscaper'
```

**Causes & Solutions**:

1. **Not installed properly**
   ```bash
   npm install flixscaper
   # or
   yarn add flixscaper
   ```

2. **Using local development without proper path**
   ```javascript
   // Instead of this:
   import { scraperNetflix } from 'flixscaper';

   // Use this for local development:
   import { scraperNetflix } from './src/index.js';
   ```

3. **TypeScript configuration issue**
   ```json
   // tsconfig.json
   {
     "compilerOptions": {
       "moduleResolution": "node",
       "allowSyntheticDefaultImports": true
     }
   }
   ```

#### ❌ Error: `Failed to load url ../globals-polyfill.mjs`

**Problem**: The polyfill file is missing after a Node.js upgrade.

**Solution**: The library has been updated to use a minimal polyfill. Ensure you're using the latest version:

```bash
npm update flixscaper
```

If the error persists, check your Node.js version:

```bash
node --version  # Should be 18+
```

### 2. Network & Connection Issues

#### ❌ Error: `Request timed out while reaching Netflix`

**Problem**: Network requests are timing out

**Solutions**:

1. **Increase timeout**
   ```javascript
   await scraperNetflix(url, {
     timeoutMs: 30000  // 30 seconds instead of 15
   });
   ```

2. **Check internet connection**
   ```bash
   # Test connectivity to Netflix
   curl -I https://www.netflix.com
   ```

3. **Use different User-Agent**
   ```javascript
   await scraperNetflix(url, {
     userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
   });
   ```
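
When timeouts are intermittent, retrying with a growing delay can help more than a single long timeout. The following is a minimal sketch and not part of the library: `withRetry` and its `attempts` and `baseDelayMs` options are hypothetical names, and the scrape call is passed in as a function so the helper makes no assumptions about the library's internals.

```javascript
// Hypothetical helper: retry an async operation with exponential backoff.
// `operation` is any function returning a promise,
// e.g. () => scraperNetflix(url, { timeoutMs: 30000 }).
async function withRetry(operation, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt === attempts) break;
      // Delay doubles each time: baseDelayMs, 2x, 4x, ...
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
  // Every attempt failed: surface the last error to the caller
  throw lastError;
}
```

A call might look like `await withRetry(() => scraperNetflix(url, { timeoutMs: 30000 }))`. Retrying does not help with errors that are not transient, such as a 404.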

#### ❌ Error: `Netflix title not found (404)`

**Problem**: The title ID doesn't exist or is not available.

**Solutions**:

1. **Verify URL is correct**
   ```javascript
   // Test with known working URL
   await scraperNetflix('https://www.netflix.com/title/80189685');
   ```

2. **Check title availability in your region**
   ```javascript
   // Some titles are region-locked
   console.log('Title may not be available in your region');
   ```

3. **Use browser to verify**
   - Open the URL in your browser
   - If the browser also shows a 404, it's not a library issue
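
Before blaming the title, it can help to confirm that the URL has the expected shape. The sketch below is illustrative only: `extractTitleId` is a hypothetical helper, not a library function, and it assumes title URLs look like the ones used in this guide (`https://www.netflix.com/title/<id>`, optionally with a locale prefix such as `/tr/`).

```javascript
// Hypothetical helper: pull the numeric title ID out of a Netflix title URL.
// Returns null when the input is not a netflix.com URL ending in /title/<digits>.
function extractTitleId(input) {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    return null; // Not a valid absolute URL
  }
  // Accept netflix.com and its subdomains only
  if (!/(^|\.)netflix\.com$/.test(parsed.hostname)) return null;
  // The query string is not part of pathname, so ?vlang=tr and similar are ignored
  const match = parsed.pathname.match(/\/title\/(\d+)\/?$/);
  return match ? match[1] : null;
}
```

If this returns `null` for your URL, the 404 is more likely a malformed link than a missing title.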

### 3. Parsing & Data Issues

#### ❌ Error: `Netflix sayfa meta verisi parse edilemedi`

**Problem**: The library cannot extract metadata from the Netflix page. (The Turkish error message means "Netflix page metadata could not be parsed".)

**Causes & Solutions**:

1. **Netflix changed their HTML structure**
   ```javascript
   // Enable headless mode to get JavaScript-rendered content
   await scraperNetflix(url, { headless: true });
   ```

2. **Title has unusual formatting**
   ```javascript
   // Debug by examining the HTML
   // (fetchStaticHtml must be imported from wherever your version of the library exports it)
   const html = await fetchStaticHtml(url);
   console.log(html.slice(0, 1000)); // First 1000 chars
   ```

3. **Missing JSON-LD data**
   - Netflix may have removed structured data
   - Use headless mode as fallback
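
To check whether a page still carries JSON-LD at all, you can scan the raw HTML for `application/ld+json` script blocks. This is a rough diagnostic sketch, not the library's parser: `findJsonLd` is a hypothetical name, and a regular expression is used only because the goal is a quick yes/no answer rather than robust HTML parsing.

```javascript
// Hypothetical diagnostic: collect every JSON-LD block that parses as JSON.
function findJsonLd(html) {
  const pattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks = [];
  for (const match of html.matchAll(pattern)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // Skip blocks whose body is not valid JSON
    }
  }
  return blocks;
}
```

An empty array for HTML fetched in static mode suggests the structured data is missing or injected by JavaScript, which is the case where headless mode is worth trying.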

#### ❌ Problem: Turkish UI text not being removed

**Problem**: Titles still contain Turkish UI text like "izlemenizi bekliyor"

**Solutions**:

1. **Check if pattern is covered**
   ```javascript
   import { cleanTitle } from 'flixscaper/parser';

   const testTitle = "The Witcher izlemenizi bekliyor";
   const cleaned = cleanTitle(testTitle);
   console.log('Cleaned:', cleaned);
   ```

2. **Add new pattern if needed**
   ```javascript
   // If Netflix added new UI text, file an issue with:
   // 1. The problematic title
   // 2. The expected cleaned title
   // 3. The new UI pattern that needs to be added
   ```
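
While waiting for a new pattern to land in the library, you can post-process titles yourself. The sketch below is a local workaround under stated assumptions: `stripUiText` and `extraUiPatterns` are hypothetical names, the only phrase taken from this guide is "izlemenizi bekliyor", and the library's own `cleanTitle` may work differently.

```javascript
// Hypothetical workaround: remove known UI phrases from a scraped title.
// Add newly observed phrases to this list until the library covers them.
const extraUiPatterns = ['izlemenizi bekliyor'];

function stripUiText(title, patterns = extraUiPatterns) {
  let cleaned = title;
  for (const phrase of patterns) {
    // Escape regex metacharacters so the phrase is matched literally
    const escaped = phrase.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    cleaned = cleaned.replace(new RegExp(escaped, 'gi'), ' ');
  }
  // Collapse the whitespace left behind by removed phrases
  return cleaned.replace(/\s+/g, ' ').trim();
}
```

For example, `stripUiText(result.name)` could be applied to each scraped result before it is stored.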

### 4. Playwright/Browser Issues

#### ❌ Error: `Playwright is not installed`

**Problem**: Headless mode not available

**Solutions**:

1. **Install Playwright**
   ```bash
   npm install playwright
   npx playwright install chromium
   ```

2. **Use library without headless mode**
   ```javascript
   await scraperNetflix(url, { headless: false });
   ```

3. **Check if you really need headless mode**
   - Most titles work with static mode
   - Only use headless if static parsing fails
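
The "static first, headless only on failure" advice can be wrapped in a small helper. This is a sketch rather than library behavior: `scrapeWithFallback` is a hypothetical name, and the scrape function (for example `scraperNetflix`) is passed in as an argument so the example stays self-contained.

```javascript
// Hypothetical wrapper: try static mode first, fall back to headless on failure.
// `scrape` is expected to have the shape (url, options) => Promise<result>.
async function scrapeWithFallback(scrape, url, options = {}) {
  try {
    return await scrape(url, { ...options, headless: false });
  } catch (staticError) {
    console.warn('Static mode failed, retrying with headless:', staticError.message);
    // If this second attempt also fails, its error propagates to the caller
    return scrape(url, { ...options, headless: true });
  }
}
```

Usage would be `await scrapeWithFallback(scraperNetflix, url, { timeoutMs: 30000 })`. Playwright is then only needed for the titles where static parsing actually fails.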

#### ❌ Error: `Playwright chromium browser is unavailable`

**Problem**: Chromium browser not installed

**Solution**:
```bash
npx playwright install chromium
```

#### ❌ Error: Memory issues with Playwright

**Problem**: Browser automation using too much memory

**Solutions**:

1. **Limit concurrent requests**
   ```javascript
   const urls = ['url1', 'url2', 'url3'];

   // Process sequentially instead of parallel
   for (const url of urls) {
     const result = await scraperNetflix(url);
     // Process result
   }
   ```

2. **Close browser resources properly**
   - The library handles this automatically
   - Ensure you're not calling Playwright directly
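
If fully sequential processing is too slow, a middle ground is to cap how many scrapes run at once. The helper below is a generic sketch: `mapWithLimit` is a hypothetical name and not part of the library, and a suitable limit depends on how much memory each browser instance uses on your machine.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
// Results are returned in the same order as `items`.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let nextIndex = 0;

  async function runner() {
    while (nextIndex < items.length) {
      const index = nextIndex++; // Claim the next item; safe because JS runs one callback at a time
      results[index] = await worker(items[index], index);
    }
  }

  // Start up to `limit` runners that pull items until none are left
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

For example, `await mapWithLimit(urls, 2, url => scraperNetflix(url))` keeps at most two scrapes active. If any worker rejects, the returned promise rejects with that error.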

### 5. Environment & Compatibility Issues

#### ❌ Error: `File is not defined` (Node.js 18)

**Problem**: Node.js 18 missing File API for undici

**Solutions**:

1. **Use latest library version**
   ```bash
   npm update flixscaper
   ```

2. **Upgrade Node.js**
   ```bash
   # Upgrade to Node.js 20+ to avoid polyfill issues
   nvm install 20
   nvm use 20
   ```

3. **Manual polyfill (if needed)**
   ```javascript
   import './src/polyfill.js'; // Include before library import
   import { scraperNetflix } from './src/index.js';
   ```

#### ❌ Problem: Works on one machine but not another

**Diagnosis Steps**:

1. **Check Node.js versions**
   ```bash
   node --version  # Should be 18+
   npm --version   # Should be 8+
   ```

2. **Check Netflix accessibility**
   ```bash
   curl -I "https://www.netflix.com/title/80189685"
   ```

3. **Compare User-Agent strings**
   ```javascript
   console.log(navigator.userAgent); // Browser console
   // Node.js has no process.userAgent; a global navigator exists only in Node.js 21+
   console.log(globalThis.navigator?.userAgent); // undefined on Node.js 18 and 20
   ```
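
The checks above can be gathered into one snippet to run on both machines and compare. This is a minimal sketch: `environmentReport` is a hypothetical name, and the fields are limited to runtime facts this guide mentions (the Node.js 18+ requirement and the `File` global that the previous section reports as missing on Node.js 18).

```javascript
// Hypothetical diagnostic: collect runtime facts that often differ between machines.
function environmentReport() {
  const nodeMajor = Number(process.versions.node.split('.')[0]);
  return {
    nodeVersion: process.version,
    nodeMajor,
    meetsMinimum: nodeMajor >= 18, // This guide asks for Node.js 18+
    hasGlobalFetch: typeof globalThis.fetch === 'function',
    hasGlobalFile: typeof globalThis.File === 'function',
    platform: process.platform,
    arch: process.arch
  };
}

console.log(environmentReport());
```

Diffing the output from the working and failing machines narrows the problem to either the runtime or the network.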

## 🔍 Debugging Techniques

### 1. Enable Verbose Logging

```javascript
// Add debug logging to your code
async function debugScraping(url) {
  console.log('🚀 Starting scrape for:', url);

  try {
    const result = await scraperNetflix(url, {
      headless: false,  // Try without browser first
      timeoutMs: 30000
    });

    console.log('✅ Success:', result);
    return result;
  } catch (error) {
    console.error('❌ Error details:', {
      message: error.message,
      stack: error.stack,
      url: url
    });
    throw error;
  }
}
```

### 2. Test with Known Working URLs

```javascript
// Test with URLs that should definitely work
const testUrls = [
  'https://www.netflix.com/title/80189685', // The Witcher
  'https://www.netflix.com/title/82123114'  // ONE SHOT
];

for (const url of testUrls) {
  try {
    const result = await scraperNetflix(url);
    console.log(`✅ ${url}: ${result.name}`);
  } catch (error) {
    console.error(`❌ ${url}: ${error.message}`);
  }
}
```

### 3. Isolate the Problem

```javascript
// Test each component separately
// Assumption: fetchStaticHtml is exported alongside normalizeNetflixUrl; adjust the path if not
import { normalizeNetflixUrl, fetchStaticHtml } from 'flixscaper/index';
import { parseNetflixHtml } from 'flixscaper/parser';

async function isolateProblem(url) {
  try {
    // 1. Test URL normalization
    const normalized = normalizeNetflixUrl(url);
    console.log('✅ URL normalized:', normalized);

    // 2. Test HTML fetching
    const html = await fetchStaticHtml(normalized);
    console.log('✅ HTML fetched, length:', html.length);

    // 3. Test parsing
    const parsed = parseNetflixHtml(html);
    console.log('✅ Parsed:', parsed);

  } catch (error) {
    console.error('❌ Step failed:', error.message);
  }
}
```

### 4. Browser Mode Debugging

```javascript
// Exercise the browser-rendered (Playwright) path on its own
const result = await scraperNetflix(url, {
  headless: true,   // Elsewhere in this guide, `headless: false` means static mode with no browser
  timeoutMs: 60000  // Longer timeout while investigating
});

// The library closes the browser itself, so inspect the returned data rather than a window
console.log(result);
```

## 🌍 Regional & Language Issues

### Turkish Netflix Specific Issues

#### ❌ Problem: Turkish URLs not working

**Test different URL formats**:
```javascript
const turkishUrls = [
  'https://www.netflix.com/title/80189685',            // Standard
  'https://www.netflix.com/tr/title/80189685',         // Turkish locale path (/tr/)
  'https://www.netflix.com/tr/title/80189685?s=i',     // With extra query parameters
  'https://www.netflix.com/tr/title/80189685?vlang=tr' // Turkish language parameter
];

for (const url of turkishUrls) {
  try {
    const result = await scraperNetflix(url);
    console.log(`✅ ${url}: ${result.name}`);
  } catch (error) {
    console.error(`❌ ${url}: ${error.message}`);
  }
}
```

#### ❌ Problem: New Turkish UI patterns not recognized

**Report the issue with**:
1. **Original title**: What Netflix returned
2. **Expected title**: What it should be after cleaning
3. **URL**: The Netflix URL where this occurs
4. **Region**: Your geographic location

Example issue report:
```markdown
**URL**: https://www.netflix.com/tr/title/12345678
**Original**: "Dizi Adı yeni başlık | Netflix"
**Expected**: "Dizi Adı"
**Pattern to add**: "yeni başlık"
**Region**: Turkey
```

## 📊 Performance Issues

### Slow Response Times

#### Diagnose the bottleneck:

```javascript
import { performance } from 'node:perf_hooks';
// Import paths follow the "Isolate the Problem" example; fetchStaticHtml's export location is an assumption
import { normalizeNetflixUrl, fetchStaticHtml } from 'flixscaper/index';
import { parseNetflixHtml } from 'flixscaper/parser';

async function profileScraping(url) {
  const steps = {};

  // URL Normalization
  steps.normStart = performance.now();
  const normalized = normalizeNetflixUrl(url);
  steps.normEnd = performance.now();

  // HTML Fetch
  steps.fetchStart = performance.now();
  const html = await fetchStaticHtml(normalized);
  steps.fetchEnd = performance.now();

  // Parsing
  steps.parseStart = performance.now();
  const parsed = parseNetflixHtml(html);
  steps.parseEnd = performance.now();

  // All durations are in milliseconds
  console.log('Performance breakdown:', {
    normalization: steps.normEnd - steps.normStart,
    fetch: steps.fetchEnd - steps.fetchStart,
    parsing: steps.parseEnd - steps.parseStart,
    htmlSize: html.length
  });

  return parsed;
}
```

#### Optimization Solutions:

1. **Disable headless mode** (if not needed)
   ```javascript
   await scraperNetflix(url, { headless: false });
   ```

2. **Reduce timeout** (if network is fast, so stalled requests fail sooner)
   ```javascript
   await scraperNetflix(url, { timeoutMs: 5000 });
   ```

3. **Cache results** (for repeated requests)
   ```javascript
   const cache = new Map();

   async function scrapeWithCache(url) {
     if (cache.has(url)) {
       return cache.get(url);
     }

     const result = await scraperNetflix(url);
     cache.set(url, result);
     return result;
   }
   ```
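
The simple cache above never expires entries and does not share work between concurrent calls for the same URL. One possible refinement is sketched below: `createCachedScraper` and its `ttlMs` option are hypothetical names, the ten-minute default is arbitrary, and the scrape function is passed in so the example does not depend on the library.

```javascript
// Hypothetical cache wrapper with expiry and in-flight de-duplication.
// `scrape` is expected to have the shape (url) => Promise<result>.
function createCachedScraper(scrape, { ttlMs = 10 * 60 * 1000 } = {}) {
  const cache = new Map(); // url -> { promise, expiresAt }

  return function cachedScrape(url) {
    const now = Date.now();
    const entry = cache.get(url);
    // Reuse a fresh entry, including one whose request is still in flight
    if (entry && entry.expiresAt > now) return entry.promise;

    const promise = Promise.resolve()
      .then(() => scrape(url))
      .catch(error => {
        // Do not keep failures cached, so the next call retries
        if (cache.get(url)?.promise === promise) cache.delete(url);
        throw error;
      });
    cache.set(url, { promise, expiresAt: now + ttlMs });
    return promise;
  };
}
```

Usage would be `const cachedScrape = createCachedScraper(scraperNetflix)` followed by `await cachedScrape(url)`. Caching the promise rather than the result is what lets two simultaneous calls share a single request.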

## 🔧 Common Fixes

### Quick Fix Checklist

1. **Update dependencies**
   ```bash
   npm update flixscaper
   npm update
   ```

2. **Clear npm cache**
   ```bash
   npm cache clean --force
   rm -rf node_modules package-lock.json
   npm install
   ```

3. **Check Node.js version**
   ```bash
   node --version  # Should be 18+
   # If older, upgrade: nvm install 20 && nvm use 20
   ```

4. **Test with minimal example**
   ```javascript
   import { scraperNetflix } from 'flixscaper';

   scraperNetflix('https://www.netflix.com/title/80189685')
     .then(result => console.log('Success:', result))
     .catch(error => console.error('Error:', error.message));
   ```

5. **Try different options**
   ```javascript
   // If failing, try with different configurations
   const url = 'https://www.netflix.com/title/80189685';
   const configs = [
     { headless: false },
     { headless: true, timeoutMs: 30000 },
     { headless: false, userAgent: 'different-ua' }
   ];

   for (const config of configs) {
     try {
       const result = await scraperNetflix(url, config);
       console.log('✅ Working config:', config, result.name);
       break;
     } catch (error) {
       console.log('❌ Failed config:', config, error.message);
     }
   }
   ```

## 📞 Getting Help

### When to Report an Issue

Report an issue when:

1. **Previously working URL suddenly fails**
2. **Error messages are unclear or unhelpful**
3. **Turkish UI patterns not being removed**
4. **Performance degrades significantly**
5. **Documentation is unclear or incomplete**

### Issue Report Template

````markdown
## Issue Description
Brief description of the problem

## Steps to Reproduce
1. URL used: ...
2. Code executed: ...
3. Expected result: ...
4. Actual result: ...

## Environment
- Node.js version: ...
- OS: ...
- flixscaper version: ...
- Browser (if relevant): ...

## Error Message
```
Paste full error message here
```

## Additional Context
Any additional information that might help
````

### Debug Information to Include

```javascript
// Include this information in issue reports
import { createRequire } from 'node:module';

// `require` is not defined in ES modules, so create one to read package.json
const require = createRequire(import.meta.url);

const debugInfo = {
  nodeVersion: process.version,
  platform: process.platform,
  arch: process.arch,
  flixscaperVersion: require('flixscaper/package.json').version,
  timestamp: new Date().toISOString()
};

console.log('Debug Info:', JSON.stringify(debugInfo, null, 2));
```

---

*Troubleshooting guide last updated: 2025-11-23*