# MetaScraper API Reference ## 🎯 Main API ### `scraperNetflix(inputUrl, options?)` Netflix metadata extraction function with automatic fallback and Turkish localization. #### Parameters | Parameter | Type | Required | Default | Description | |-----------|------|----------|---------|-------------| | `inputUrl` | `string` | ✅ | - | Netflix title URL (any format) | | `options` | `object` | ❌ | `{}` | Configuration options | #### Options | Option | Type | Default | Description | |--------|------|---------|-------------| | `headless` | `boolean` | `true` | Enable Playwright fallback for missing data | | `timeoutMs` | `number` | `15000` | Request timeout in milliseconds | | `userAgent` | `string` | Chrome 118 User-Agent | Custom User-Agent string | #### Returns ```typescript Promise<{ url: string; // Normalized Netflix URL id: string; // Netflix title ID name: string; // Clean title (Turkish UI removed) year: string \| number \| undefined; // Release year seasons: string \| null; // Season info for TV series }> ``` #### Examples **Basic Usage** ```javascript import { scraperNetflix } from 'metascraper'; const result = await scraperNetflix('https://www.netflix.com/tr/title/82123114'); console.log(result); // { // "url": "https://www.netflix.com/title/82123114", // "id": "82123114", // "name": "ONE SHOT with Ed Sheeran", // "year": "2025", // "seasons": null // } ``` **Advanced Configuration** ```javascript import { scraperNetflix } from 'metascraper'; const result = await scraperNetflix( 'https://www.netflix.com/title/80189685', { headless: false, // Disable browser fallback timeoutMs: 30000, // 30 second timeout userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36' } ); ``` **Error Handling** ```javascript import { scraperNetflix } from 'metascraper'; try { const result = await scraperNetflix('https://www.netflix.com/title/80189685'); console.log('Success:', result); } catch (error) { console.error('Scraping failed:', error.message); // Turkish error messages for Turkish users // "Netflix scraping başarısız: Netflix URL'i gereklidir." } ``` ## 🧩 Internal APIs ### `parseNetflixHtml(html)` - Parser API Parse Netflix HTML content to extract metadata without network requests. #### Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `html` | `string` | ✅ | Raw HTML content from Netflix page | #### Returns ```typescript { name?: string; // Clean title year?: string \| number; // Release year seasons?: string \| null; // Season information } ``` #### Examples ```javascript import { parseNetflixHtml } from 'metascraper/parser'; // With cached HTML const fs = await import('node:fs'); const html = fs.readFileSync('netflix-page.html', 'utf8'); const metadata = parseNetflixHtml(html); console.log(metadata); // { // "name": "The Witcher", // "year": "2025", // "seasons": "4 Sezon" // } ``` ### `fetchPageContentWithPlaywright(url, options)` - Headless API Fetch Netflix page content using Playwright browser automation. #### Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `url` | `string` | ✅ | Complete URL to fetch | | `options` | `object` | ✅ | Browser configuration | #### Options | Option | Type | Default | Description | |--------|------|---------|-------------| | `timeoutMs` | `number` | `15000` | Page load timeout | | `userAgent` | `string` | Chrome 118 | Browser User-Agent | | `headless` | `boolean` | `true` | Run browser in headless mode | #### Returns ```typescript Promise // HTML content of the page ``` #### Examples ```javascript import { fetchPageContentWithPlaywright } from 'metascraper/headless'; try { const html = await fetchPageContentWithPlaywright( 'https://www.netflix.com/title/80189685', { timeoutMs: 30000, headless: false // Show browser (useful for debugging) } ); // Process the HTML with parser const metadata = parseNetflixHtml(html); console.log(metadata); } catch (error) { console.error('Browser automation failed:', error.message); } ``` ## 🔧 URL Processing ### Supported URL Formats The `scraperNetflix` function automatically normalizes various Netflix URL formats: | Input Format | Normalized Output | Notes | |--------------|-------------------|-------| | `https://www.netflix.com/title/80189685` | `https://www.netflix.com/title/80189685` | Standard format | | `https://www.netflix.com/tr/title/80189685` | `https://www.netflix.com/title/80189685` | Turkish locale | | `https://www.netflix.com/tr/title/80189685?s=i&trkid=264356104&vlang=tr` | `https://www.netflix.com/title/80189685` | With parameters | | `https://www.netflix.com/title/80189685?trackId=12345` | `https://www.netflix.com/title/80189685` | With tracking | ### URL Validation The function validates URLs with these rules: 1. **Format**: Must be a valid URL 2. **Domain**: Must contain `netflix.com` 3. **Path**: Must contain `title/` followed by numeric ID 4. **ID Extraction**: Uses regex to extract title ID ```javascript // These will work: 'https://www.netflix.com/title/80189685' 'https://www.netflix.com/tr/title/80189685?s=i&vlang=tr' // These will fail: 'https://google.com' // Wrong domain 'https://www.netflix.com/browse' // No title ID 'not-a-url' // Invalid format 'https://www.netflix.com/title/abc' // Non-numeric ID ``` ## 🌍 Localization Features ### Turkish UI Text Removal The parser automatically removes Turkish Netflix UI text from titles: | Original Title | Cleaned Title | Removed Pattern | |----------------|---------------|-----------------| | "The Witcher izlemenizi bekliyor" | "The Witcher | `izlemenizi bekliyor` | | "Stranger Things izleyin" | "Stranger Things" | `izleyin` | | "Sezon 4 devam et" | "Sezon 4" | `devam et` | | "Dark başla" | "Dark" | `başla` | | "The Crown izlemeye devam" | "The Crown" | `izlemeye devam` | ### Supported Turkish Patterns ```javascript const TURKISH_UI_PATTERNS = [ /\s+izlemenizi bekliyor$/i, // "waiting for you to watch" /\s+izleyin$/i, // "watch" /\s+devam et$/i, // "continue" /\s+başla$/i, // "start" /\s+izlemeye devam$/i, // "continue watching" /\s+Sezon\s+\d+.*izlemeye devam$/i, // "Sezon X izlemeye devam" /\s+Sezon\s+\d+.*başla$/i, // "Sezon X başla" ]; ``` ### English UI Pattern Removal Also removes universal English UI text: | Original Title | Cleaned Title | Removed Pattern | |----------------|---------------|-----------------| | "Watch Now The Witcher" | "The Witcher" | `Watch Now` | | "The Witcher Continue Watching" | "The Witcher" | `Continue Watching` | | "Season 4 Play" | "Season 4" | `Season X Play` | ## 📊 Data Extraction Patterns ### JSON-LD Processing The parser extracts metadata from JSON-LD structured data: ```javascript // Looks for these JSON-LD fields: const YEAR_FIELDS = [ 'datePublished', 'startDate', 'uploadDate', 'copyrightYear', 'releasedEvent', 'releaseYear', 'dateCreated' ]; const SEASON_TYPES = ['TVSeries', 'TVShow', 'Series']; ``` ### Meta Tag Fallbacks If JSON-LD is unavailable, falls back to HTML meta tags: ```html The Witcher izlemenizi bekliyor | Netflix ``` ### Season Detection For TV series, extracts season information: ```javascript // Example JSON-LD for TV series: { "@type": "TVSeries", "name": "The Witcher", "numberOfSeasons": 4, "datePublished": "2025" } // Result: "4 Sezon" ``` ## ⚡ Performance Characteristics ### Response Times by Mode | Mode | Typical Response | Success Rate | Resource Usage | |------|------------------|--------------|----------------| | Static Only | 200-500ms | ~85% | Very Low | | Static + Headless Fallback | 2-5s | ~95% | Medium | | Headless Only | 2-3s | ~90% | High | ### Resource Requirements **Static Mode:** - CPU: Low (< 5%) - Memory: < 20MB - Network: 1 HTTP request **Headless Mode:** - CPU: Medium (10-20%) - Memory: 100-200MB - Network: Multiple requests - Browser: Chromium instance ## 🚨 Error Types & Handling ### Common Error Scenarios #### 1. Invalid URL ```javascript await scraperNetflix('invalid-url'); // Throws: "Geçersiz URL sağlandı." ``` #### 2. Non-Netflix URL ```javascript await scraperNetflix('https://google.com'); // Throws: "URL netflix.com adresini göstermelidir." ``` #### 3. Missing Title ID ```javascript await scraperNetflix('https://www.netflix.com/browse'); // Throws: "URL'de Netflix başlık ID'si bulunamadı." ``` #### 4. Network Timeout ```javascript await scraperNetflix('https://www.netflix.com/title/80189685', { timeoutMs: 1 }); // Throws: "Request timed out while reaching Netflix." ``` #### 5. 404 Not Found ```javascript await scraperNetflix('https://www.netflix.com/title/99999999'); // Throws: "Netflix title not found (404)." ``` #### 6. Playwright Not Available ```javascript // When headless mode needed but Playwright not installed // Throws: "Playwright is not installed. Install the optional dependency..." ``` #### 7. Parsing Failed ```javascript // When HTML cannot be parsed for metadata // Throws: "Netflix sayfa meta verisi parse edilemedi." ``` ### Error Object Structure ```javascript { name: "Error", message: "Netflix scraping başarısız: Geçersiz URL sağlandı.", stack: "Error: Netflix scraping başarısız: Geçersiz URL sağlandı.\n at scraperNetflix...", // Additional context for debugging } ``` ## 🔧 Advanced Usage Patterns ### Batch Processing ```javascript import { scraperNetflix } from 'metascraper'; const urls = [ 'https://www.netflix.com/title/80189685', 'https://www.netflix.com/title/82123114', 'https://www.netflix.com/title/70177057' ]; const results = await Promise.allSettled( urls.map(url => scraperNetflix(url)) ); results.forEach((result, index) => { if (result.status === 'fulfilled') { console.log(`✅ ${urls[index]}:`, result.value.name); } else { console.log(`❌ ${urls[index]}:`, result.reason.message); } }); ``` ### Custom User-Agent Rotation ```javascript const userAgents = [ 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36' ]; const getRandomUA = () => userAgents[Math.floor(Math.random() * userAgents.length)]; const result = await scraperNetflix(url, { userAgent: getRandomUA() }); ``` ### Retry Logic Implementation ```javascript async function scrapeWithRetry(url, maxRetries = 3) { for (let attempt = 1; attempt <= maxRetries; attempt++) { try { return await scraperNetflix(url); } catch (error) { if (attempt === maxRetries) throw error; console.log(`Attempt ${attempt} failed, retrying in ${attempt * 1000}ms...`); await new Promise(resolve => setTimeout(resolve, attempt * 1000)); } } } ``` ### Caching Integration ```javascript const cache = new Map(); async function scrapeWithCache(url) { const cacheKey = `netflix:${url}`; if (cache.has(cacheKey)) { console.log('Cache hit for:', url); return cache.get(cacheKey); } const result = await scraperNetflix(url); cache.set(cacheKey, result); // Optional: Cache expiration setTimeout(() => cache.delete(cacheKey), 30 * 60 * 1000); // 30 minutes return result; } ``` --- *API documentation last updated: 2025-11-23*