first commit

2025-11-23 14:25:09 +03:00
commit 46d75b64d5
18 changed files with 4749 additions and 0 deletions
--- a/doc/TROUBLESHOOTING.md
+++ b/doc/TROUBLESHOOTING.md
@@ -0,0 +1,561 @@
+# MetaScraper Troubleshooting Guide
+
+## 🚨 Common Issues & Solutions
+
+### 1. Module Import Errors
+
+#### ❌ Error: `Cannot resolve import 'flixscaper'`
+
+**Problem**: Cannot import the library in your project
+
+```javascript
+import { scraperNetflix } from 'flixscaper';
+// Throws: Cannot resolve import 'flixscaper'
+```
+
+**Causes & Solutions**:
+
+1. **Not installed properly**
+   ```bash
+   npm install flixscaper
+   # or
+   yarn add flixscaper
+   ```
+
+2. **Using local development without proper path**
+   ```javascript
+   // Instead of this:
+   import { scraperNetflix } from 'flixscaper';
+
+   // Use this for local development:
+   import { scraperNetflix } from './src/index.js';
+   ```
+
+3. **TypeScript configuration issue**
+   ```json
+   // tsconfig.json
+   {
+     "compilerOptions": {
+       "moduleResolution": "node",
+       "allowSyntheticDefaultImports": true
+     }
+   }
+   ```
+
+#### ❌ Error: `Failed to load url ../globals-polyfill.mjs`
+
+**Problem**: Polyfill file missing after Node.js upgrade
+
+**Solution**: The library has been updated to use a minimal polyfill. Ensure you're using the latest version:
+
+```bash
+npm update flixscaper
+```
+
+If still occurring, check your Node.js version:
+
+```bash
+node --version  # Should be 18+
+```
+
+### 2. Network & Connection Issues
+
+#### ❌ Error: `Request timed out while reaching Netflix`
+
+**Problem**: Network requests are timing out
+
+**Solutions**:
+
+1. **Increase timeout**
+   ```javascript
+   await scraperNetflix(url, {
+     timeoutMs: 30000  // 30 seconds instead of 15
+   });
+   ```
+
+2. **Check internet connection**
+   ```bash
+   # Test connectivity to Netflix
+   curl -I https://www.netflix.com
+   ```
+
+3. **Use different User-Agent**
+   ```javascript
+   await scraperNetflix(url, {
+     userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
+   });
+   ```
+
+#### ❌ Error: `Netflix title not found (404)`
+
+**Problem**: Title ID doesn't exist or is not available
+
+**Solutions**:
+
+1. **Verify URL is correct**
+   ```javascript
+   // Test with known working URL
+   await scraperNetflix('https://www.netflix.com/title/80189685');
+   ```
+
+2. **Check title availability in your region**
+   ```javascript
+   // Some titles are region-locked
+   console.log('Title may not be available in your region');
+   ```
+
+3. **Use browser to verify**
+   - Open the URL in your browser
+   - If it shows 404 in browser, it's not a library issue
+
+### 3. Parsing & Data Issues
+
+#### ❌ Error: `Netflix sayfa meta verisi parse edilemedi`
+
+**Problem**: The library cannot extract metadata from the Netflix page. The error message is Turkish for "Netflix page metadata could not be parsed".
+
+**Causes & Solutions**:
+
+1. **Netflix changed their HTML structure**
+   ```javascript
+   // Enable headless mode to get JavaScript-rendered content
+   await scraperNetflix(url, { headless: true });
+   ```
+
+2. **Title has unusual formatting**
+   ```javascript
+   // Debug by examining the raw HTML.
+   // fetchStaticHtml must be imported first; its module path is not shown in this guide.
+   const html = await fetchStaticHtml(url);
+   console.log(html.slice(0, 1000)); // First 1000 chars
+   ```
+
+3. **Missing JSON-LD data**
+   - Netflix may have removed structured data
+   - Use headless mode as fallback
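
One way to confirm whether structured data is still present is to scan the fetched HTML for JSON-LD script tags. The sketch below is a standalone helper and not part of flixscaper; the name `extractJsonLd` and the regex-based matching are assumptions made for illustration, and a real HTML parser would be more robust.

```javascript
// Hypothetical helper (not part of the library): collects every JSON-LD
// block found in an HTML string so you can see what structured data exists.
function extractJsonLd(html) {
  const pattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks = [];
  for (const match of html.matchAll(pattern)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // Skip blocks that are not valid JSON instead of aborting the scan
    }
  }
  return blocks;
}

// Example with an inline HTML sample
const sample = '<html><head><script type="application/ld+json">{"@type":"Movie","name":"Example"}</script></head></html>';
console.log('JSON-LD blocks found:', extractJsonLd(sample).length);
```

An empty result on a page that previously parsed correctly suggests the structured data was removed or is now rendered client-side, which is the case where headless mode may help.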
+
+#### ❌ Problem: Turkish UI text not being removed
+
+**Problem**: Titles still contain Turkish UI text such as "izlemenizi bekliyor" ("is waiting for you to watch")
+
+**Solutions**:
+
+1. **Check if pattern is covered**
+   ```javascript
+   import { cleanTitle } from 'flixscaper/parser';
+
+   const testTitle = "The Witcher izlemenizi bekliyor";
+   const cleaned = cleanTitle(testTitle);
+   console.log('Cleaned:', cleaned);
+   ```
+
+2. **Add new pattern if needed**
+   ```javascript
+   // If Netflix added new UI text, file an issue with:
+   // 1. The problematic title
+   // 2. The expected cleaned title
+   // 3. The new UI pattern that needs to be added
+   ```
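
Until a new pattern ships in the library, the leftover text can be stripped locally after scraping. This is a minimal sketch, assuming the unwanted text appears as an exact trailing suffix; `stripUiSuffixes` and `extraUiPatterns` are hypothetical names and are not part of flixscaper.

```javascript
// Hypothetical local workaround: remove known UI phrases from the end of a title.
const extraUiPatterns = ['izlemenizi bekliyor'];

function stripUiSuffixes(title, patterns = extraUiPatterns) {
  let cleaned = title.trim();
  for (const phrase of patterns) {
    // Ignore empty phrases so slice() never wipes the whole title
    if (phrase && cleaned.endsWith(phrase)) {
      cleaned = cleaned.slice(0, -phrase.length).trim();
    }
  }
  return cleaned;
}

console.log(stripUiSuffixes('The Witcher izlemenizi bekliyor'));
```

The match is case-sensitive and only looks at the end of the string, so titles that legitimately contain the phrase elsewhere are left alone.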
+
+### 4. Playwright/Browser Issues
+
+#### ❌ Error: `Playwright is not installed`
+
+**Problem**: Headless mode not available
+
+**Solutions**:
+
+1. **Install Playwright**
+   ```bash
+   npm install playwright
+   npx playwright install chromium
+   ```
+
+2. **Use library without headless mode**
+   ```javascript
+   await scraperNetflix(url, { headless: false });
+   ```
+
+3. **Check if you really need headless mode**
+   - Most titles work with static mode
+   - Only use headless if static parsing fails
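
The static-first approach can be wrapped in a small helper. This is a sketch rather than library code: `scrapeWithFallback` is a hypothetical name, and the scraper function is passed in as a parameter so the helper assumes nothing beyond the `headless` option shown in this guide.

```javascript
// Hypothetical wrapper: try static mode first, fall back to headless on failure.
// `scrape` is any function with the (url, options) signature used in this guide.
async function scrapeWithFallback(scrape, url, options = {}) {
  try {
    return await scrape(url, { ...options, headless: false });
  } catch (staticError) {
    console.warn('Static mode failed, retrying with headless:', staticError.message);
    // If this second attempt also fails, its error propagates to the caller
    return scrape(url, { ...options, headless: true });
  }
}
```

A call would then look like `await scrapeWithFallback(scraperNetflix, url)`, which only pays the cost of launching a browser when static parsing has already failed.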
+
+#### ❌ Error: `Playwright chromium browser is unavailable`
+
+**Problem**: Chromium browser not installed
+
+**Solution**:
+```bash
+npx playwright install chromium
+```
+
+#### ❌ Error: Memory issues with Playwright
+
+**Problem**: Browser automation using too much memory
+
+**Solutions**:
+
+1. **Limit concurrent requests**
+   ```javascript
+   const urls = ['url1', 'url2', 'url3'];
+
+   // Process sequentially instead of parallel
+   for (const url of urls) {
+     const result = await scraperNetflix(url);
+     // Process result
+   }
+   ```
+
+2. **Close browser resources properly**
+   - The library handles this automatically
+   - Ensure you're not calling Playwright directly
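
If fully sequential processing is too slow, a small concurrency cap is a middle ground between one-at-a-time and unbounded parallelism. The helper below is a sketch under that assumption; `mapWithLimit` is a hypothetical name, and the worker is passed in so nothing here depends on the library's internals.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit` in flight.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until the list is exhausted
  async function runLane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, runLane));
  return results;
}
```

For example, `await mapWithLimit(urls, 2, (url) => scraperNetflix(url))` keeps at most two scrapes running at once. Results come back in input order, and the first rejected scrape rejects the whole call.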
+
+### 5. Environment & Compatibility Issues
+
+#### ❌ Error: `File is not defined` (Node.js 18)
+
+**Problem**: Node.js 18 does not provide the global `File` class that undici expects
+
+**Solutions**:
+
+1. **Use latest library version**
+   ```bash
+   npm update flixscaper
+   ```
+
+2. **Upgrade Node.js**
+   ```bash
+   # Upgrade to Node.js 20+ to avoid polyfill issues
+   nvm install 20
+   nvm use 20
+   ```
+
+3. **Manual polyfill (if needed)**
+   ```javascript
+   import './src/polyfill.js';  // Include before library import
+   import { scraperNetflix } from './src/index.js';
+   ```
+
+#### ❌ Problem: Works on one machine but not another
+
+**Diagnosis Steps**:
+
+1. **Check Node.js versions**
+   ```bash
+   node --version  # Should be 18+
+   npm --version   # Should be 8+
+   ```
+
+2. **Check Netflix accessibility**
+   ```bash
+   curl -I "https://www.netflix.com/title/80189685"
+   ```
+
+3. **Compare User-Agent strings**
+   ```javascript
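+   // `navigator` is a browser global; Node.js only exposes it from v21 onward,
+   // and `process.userAgent` does not exist, so guard the lookup.
+   console.log(globalThis.navigator?.userAgent ?? 'navigator is not available in this runtime');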
+   ```
+
+## 🔍 Debugging Techniques
+
+### 1. Enable Verbose Logging
+
+```javascript
+// Add debug logging to your code
+async function debugScraping(url) {
+  console.log('🚀 Starting scrape for:', url);
+
+  try {
+    const result = await scraperNetflix(url, {
+      headless: false,  // Try without browser first
+      timeoutMs: 30000
+    });
+
+    console.log('✅ Success:', result);
+    return result;
+  } catch (error) {
+    console.error('❌ Error details:', {
+      message: error.message,
+      stack: error.stack,
+      url: url
+    });
+    throw error;
+  }
+}
+```
+
+### 2. Test with Known Working URLs
+
+```javascript
+// Test with URLs that should definitely work
+const testUrls = [
+  'https://www.netflix.com/title/80189685',  // The Witcher
+  'https://www.netflix.com/title/82123114'   // ONE SHOT
+];
+
+for (const url of testUrls) {
+  try {
+    const result = await scraperNetflix(url);
+    console.log(`✅ ${url}: ${result.name}`);
+  } catch (error) {
+    console.error(`❌ ${url}: ${error.message}`);
+  }
+}
+```
+
+### 3. Isolate the Problem
+
+```javascript
+// Test each component separately
+import { normalizeNetflixUrl } from 'flixscaper/index';
+import { parseNetflixHtml } from 'flixscaper/parser';
+// fetchStaticHtml must be imported as well; its module path is not shown in this guide
+
+async function isolateProblem(url) {
+  try {
+    // 1. Test URL normalization
+    const normalized = normalizeNetflixUrl(url);
+    console.log('✅ URL normalized:', normalized);
+
+    // 2. Test HTML fetching
+    const html = await fetchStaticHtml(normalized);
+    console.log('✅ HTML fetched, length:', html.length);
+
+    // 3. Test parsing
+    const parsed = parseNetflixHtml(html);
+    console.log('✅ Parsed:', parsed);
+
+  } catch (error) {
+    console.error('❌ Step failed:', error.message);
+  }
+}
+```
+
+### 4. Browser Mode Debugging
+
+```javascript
+// Test with Playwright rendering for debugging
+const result = await scraperNetflix(url, {
+  headless: true,      // Render in Playwright; `false` selects static mode, as in the sections above
+  timeoutMs: 60000     // Longer timeout for slow page loads
+});
+
+// Inspect what the rendered page produced
+console.log(result);
+```
+
+## 🌍 Regional & Language Issues
+
+### Turkish Netflix Specific Issues
+
+#### ❌ Problem: Turkish URLs not working
+
+**Test different URL formats**:
+```javascript
+const turkishUrls = [
+  'https://www.netflix.com/title/80189685',           // Standard
+  'https://www.netflix.com/tr/title/80189685',       // Turkish locale path
+  'https://www.netflix.com/tr/title/80189685?s=i',   // With Turkish params
+  'https://www.netflix.com/tr/title/80189685?vlang=tr' // Turkish language
+];
+
+for (const url of turkishUrls) {
+  try {
+    const result = await scraperNetflix(url);
+    console.log(`✅ ${url}: ${result.name}`);
+  } catch (error) {
+    console.error(`❌ ${url}: ${error.message}`);
+  }
+}
+```
+
+#### ❌ Problem: New Turkish UI patterns not recognized
+
+**Report the issue with**:
+1. **Original title**: What Netflix returned
+2. **Expected title**: What it should be after cleaning
+3. **URL**: The Netflix URL where this occurs
+4. **Region**: Your geographic location
+
+Example issue report:
+```markdown
+**URL**: https://www.netflix.com/tr/title/12345678
+**Original**: "Dizi Adı yeni başlık | Netflix"
+**Expected**: "Dizi Adı"
+**Pattern to add**: "yeni başlık"
+**Region**: Turkey
+```
+
+## 📊 Performance Issues
+
+### Slow Response Times
+
+#### Diagnose the bottleneck:
+
+```javascript
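+import { performance } from 'node:perf_hooks';
+import { normalizeNetflixUrl } from 'flixscaper/index';
+import { parseNetflixHtml } from 'flixscaper/parser';
+// fetchStaticHtml must be imported as well; its module path is not shown in this guide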
+
+async function profileScraping(url) {
+  const steps = {};
+
+  // URL Normalization
+  steps.normStart = performance.now();
+  const normalized = normalizeNetflixUrl(url);
+  steps.normEnd = performance.now();
+
+  // HTML Fetch
+  steps.fetchStart = performance.now();
+  const html = await fetchStaticHtml(normalized);
+  steps.fetchEnd = performance.now();
+
+  // Parsing
+  steps.parseStart = performance.now();
+  const parsed = parseNetflixHtml(html);
+  steps.parseEnd = performance.now();
+
+  console.log('Performance breakdown:', {
+    normalization: steps.normEnd - steps.normStart,
+    fetch: steps.fetchEnd - steps.fetchStart,
+    parsing: steps.parseEnd - steps.parseStart,
+    htmlSize: html.length
+  });
+
+  return parsed;
+}
+```
+
+#### Optimization Solutions:
+
+1. **Disable headless mode** (if not needed)
+   ```javascript
+   await scraperNetflix(url, { headless: false });
+   ```
+
+2. **Reduce timeout** (if network is fast)
+   ```javascript
+   await scraperNetflix(url, { timeoutMs: 5000 });
+   ```
+
+3. **Cache results** (for repeated requests)
+   ```javascript
+   const cache = new Map();
+
+   async function scrapeWithCache(url) {
+     if (cache.has(url)) {
+       return cache.get(url);
+     }
+
+     const result = await scraperNetflix(url);
+     cache.set(url, result);
+     return result;
+   }
+   ```
+
+## 🔧 Common Fixes
+
+### Quick Fix Checklist
+
+1. **Update dependencies**
+   ```bash
+   npm update flixscaper
+   npm update
+   ```
+
+2. **Clear npm cache and reinstall dependencies**
+   ```bash
+   npm cache clean --force
+   rm -rf node_modules package-lock.json
+   npm install
+   ```
+
+3. **Check Node.js version**
+   ```bash
+   node --version  # Should be 18+
+   # If older, upgrade: nvm install 20 && nvm use 20
+   ```
+
+4. **Test with minimal example**
+   ```javascript
+   import { scraperNetflix } from 'flixscaper';
+
+   scraperNetflix('https://www.netflix.com/title/80189685')
+     .then(result => console.log('Success:', result))
+     .catch(error => console.error('Error:', error.message));
+   ```
+
+5. **Try different options**
+   ```javascript
+   // If failing, try with different configurations
+   const configs = [
+     { headless: false },
+     { headless: true, timeoutMs: 30000 },
+     { headless: false, userAgent: 'different-ua' }
+   ];
+
+   for (const config of configs) {
+     try {
+       const result = await scraperNetflix(url, config);
+       console.log('✅ Working config:', config);
+       break;
+     } catch (error) {
+       console.log('❌ Failed config:', config, error.message);
+     }
+   }
+   ```
+
+## 📞 Getting Help
+
+### When to Report an Issue
+
+Report an issue when:
+
+1. **Previously working URL suddenly fails**
+2. **Error messages are unclear or unhelpful**
+3. **Turkish UI patterns not being removed**
+4. **Performance degrades significantly**
+5. **Documentation is unclear or incomplete**
+
+### Issue Report Template
+
+````markdown
+## Issue Description
+Brief description of the problem
+
+## Steps to Reproduce
+1. URL used: ...
+2. Code executed: ...
+3. Expected result: ...
+4. Actual result: ...
+
+## Environment
+- Node.js version: ...
+- OS: ...
+- flixscaper version: ...
+- Browser (if relevant): ...
+
+## Error Message
+```
+Paste full error message here
+```
+
+## Additional Context
+Any additional information that might help
+````
+
+### Debug Information to Include
+
+```javascript
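+// Include this information in issue reports.
+// The examples in this guide use ES modules, where `require` is not defined,
+// so create one with createRequire (in CommonJS, call require directly).
+import { createRequire } from 'node:module';
+const require = createRequire(import.meta.url);
+
+const debugInfo = {
+  nodeVersion: process.version,
+  platform: process.platform,
+  arch: process.arch,
+  flixscaperVersion: require('flixscaper/package.json').version,
+  timestamp: new Date().toISOString()
+};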
+
+console.log('Debug Info:', JSON.stringify(debugInfo, null, 2));
+```
+
+---
+
+*Troubleshooting guide last updated: 2025-11-23*