MetaScraper API Reference
🎯 Main API
scraperNetflix(inputUrl, options?)
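Extracts metadata for a Netflix title. If the static HTML is missing fields, it can fall back to rendering the page with Playwright (controlled by the headless option). Returned text such as the genre is normalized to Turkish.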
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| inputUrl | string | ✅ | - | Netflix title URL (any format) |
| options | object | ❌ | {} | Configuration options |
Options
| Option | Type | Default | Description |
|---|---|---|---|
| headless | boolean | true | Enable Playwright fallback for missing data |
| timeoutMs | number | 15000 | Request timeout in milliseconds |
| userAgent | string | Chrome 118 User-Agent | Custom User-Agent string |
Returns
Promise<{
url: string; // Normalized Netflix URL
id: string; // Netflix title ID
name: string; // Clean title (Turkish UI removed)
year: string | number | undefined; // Release year
seasons: string | null; // Season info for TV series
thumbnail: string | null; // Poster/thumbnail image URL
info: string | null; // Content description/summary
genre: string | null; // Genre (Turkish normalized)
}>
Examples
Basic Usage
import { scraperNetflix } from 'metascraper';
const result = await scraperNetflix('https://www.netflix.com/tr/title/82123114');
console.log(result);
// {
// "url": "https://www.netflix.com/title/82123114",
// "id": "82123114",
// "name": "ONE SHOT with Ed Sheeran",
// "year": "2025",
// "seasons": null,
// "thumbnail": "https://occ-0-7335-778.1.nflxso.net/dnm/api/v6/6AYY37jfdO6hpXcMjf9Yu5cnmO0/AAAABSkrIGPSyEfSWYQzc8rEFo6EtVV6Ls8WtPpNwR42MSKSNPNomZWV5P_l2MxGuJEkoPm71UT_eBK_SsTEH8pRslQr0sjpdhVHjxh4.jpg",
// "info": "Ed Sheeran, matematiğin mucizevi gücünü ve müziğin birleştirici gücünü sergileyen benzersiz bir performansla sahneye çıkıyor.",
// "genre": "Belgesel"
// }
Advanced Configuration
import { scraperNetflix } from 'metascraper';
const result = await scraperNetflix(
'https://www.netflix.com/title/80189685',
{
headless: false, // Disable browser fallback
timeoutMs: 30000, // 30 second timeout
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
);
Error Handling
import { scraperNetflix } from 'metascraper';
try {
const result = await scraperNetflix('https://www.netflix.com/title/80189685');
console.log('Success:', result);
} catch (error) {
console.error('Scraping failed:', error.message);
// Validation errors are reported in Turkish, e.g. when no URL is passed:
// "Netflix scraping başarısız: Netflix URL'i gereklidir."
// (English: "Netflix scraping failed: A Netflix URL is required.")
}
scraperPrime(inputUrl, options?)
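Extracts metadata for an Amazon Prime Video title. If the static HTML is missing fields, it can fall back to rendering the page with Playwright (controlled by the headless option). Returned text such as the genre is normalized to Turkish.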
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| inputUrl | string | ✅ | - | Amazon Prime Video URL (any format) |
| options | object | ❌ | {} | Configuration options |
Options
| Option | Type | Default | Description |
|---|---|---|---|
| headless | boolean | true | Enable Playwright fallback for missing data |
| timeoutMs | number | 15000 | Request timeout in milliseconds |
| userAgent | string | Chrome 118 User-Agent | Custom User-Agent string |
Returns
Promise<{
url: string; // Normalized Prime Video URL
id: string; // Prime Video content ID
name: string; // Clean title (Amazon UI removed)
year: string | number | undefined; // Release year
seasons: string | null; // Season info for TV series (null for movies)
thumbnail: string | null; // Poster/thumbnail image URL
info: string | null; // Content description/summary
genre: string | null; // Genre (Turkish normalized)
}>
Examples
Basic Usage
import { scraperPrime } from 'metascraper';
const result = await scraperPrime('https://www.primevideo.com/-/tr/detail/0NHIN3TGAI9L7VZ45RS52RHUPL/ref=share_ios_movie');
console.log(result);
// {
// "url": "https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL",
// "id": "0NHIN3TGAI9L7VZ45RS52RHUPL",
// "name": "Little Women",
// "year": "2020",
// "seasons": null,
// "thumbnail": "https://m.media-amazon.com/images/S/pv-target-images/c1b08ebea5ba29c47145c623e7d1c586290221ec12fa93850029e581f54049c4.jpg",
// "info": "In the years after the Civil War, Jo March lives in New York and makes her living as a writer...",
// "genre": "Dram"
// }
Advanced Configuration
import { scraperPrime } from 'metascraper';
const result = await scraperPrime(
'https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL',
{
headless: false, // Disable browser fallback
timeoutMs: 30000, // 30 second timeout
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
);
Error Handling
import { scraperPrime } from 'metascraper';
try {
const result = await scraperPrime('https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL');
console.log('Success:', result);
} catch (error) {
console.error('Scraping failed:', error.message);
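// Validation errors are reported in Turkish, e.g. when no URL is passed:
// "Amazon Prime scraping başarısız: Amazon Prime URL'i gereklidir."
// (English: "Amazon Prime scraping failed: An Amazon Prime URL is required.")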
}
🧩 Internal APIs
parseNetflixHtml(html) - Parser API
Parse Netflix HTML content to extract metadata without network requests.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| html | string | ✅ | Raw HTML content from Netflix page |
Returns
{
name?: string; // Clean title
year?: string | number; // Release year
seasons?: string | null; // Season information
thumbnail?: string | null; // Thumbnail image URL
info?: string | null; // Content description
genre?: string | null; // Genre information
}
Examples
import { readFileSync } from 'node:fs';
import { parseNetflixHtml } from 'metascraper/parser';
// Parse previously saved (cached) HTML; no network request is made
const html = readFileSync('netflix-page.html', 'utf8');
const metadata = parseNetflixHtml(html);
console.log(metadata);
// {
// "name": "The Witcher",
// "year": "2025",
// "seasons": "4 Sezon",
// "thumbnail": "https://occ-0-7335-778.1.nflxso.net/dnm/api/v6/6AYY37jfdO6hpXcMjf9Yu5cnmO0/AAAABSkrIGPSyEfSWYQzc8rEFo6EtVV6Ls8WtPpNwR42MSKSNPNomZWV5P_l2MxGuJEkoPm71UT_eBK_SsTEH8pRslQr0sjpdhVHjxh4.jpg",
// "info": "Mutasyona uğramış bir canavar avcısı olan Rivyalı Geralt, insanların çoğunlukla yaratıklardan daha uğursuz olduğu, karmaşa içindeki bir dünyada kaderine doğru yol alıyor.",
// "genre": "Aksiyon"
// }
fetchPageContentWithPlaywright(url, options) - Headless API
Fetch Netflix page content using Playwright browser automation.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | ✅ | Complete URL to fetch |
| options | object | ✅ | Browser configuration |
Options
| Option | Type | Default | Description |
|---|---|---|---|
| timeoutMs | number | 15000 | Page load timeout in milliseconds |
| userAgent | string | Chrome 118 | Browser User-Agent |
| headless | boolean | true | Run browser in headless mode |
Returns
Promise<string> // HTML content of the page
Examples
import { fetchPageContentWithPlaywright } from 'metascraper/headless';
import { parseNetflixHtml } from 'metascraper/parser';
try {
const html = await fetchPageContentWithPlaywright(
'https://www.netflix.com/title/80189685',
{
timeoutMs: 30000,
headless: false // Show browser (useful for debugging)
}
);
// Process the HTML with parser
const metadata = parseNetflixHtml(html);
console.log(metadata);
} catch (error) {
console.error('Browser automation failed:', error.message);
}
parsePrimeHtml(html) - Prime Video Parser API
Parse Amazon Prime Video HTML content to extract metadata without network requests.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| html | string | ✅ | Raw HTML content from Prime Video page |
Returns
{
name?: string; // Clean title
year?: string | number; // Release year
seasons?: string | null; // Season information
thumbnail?: string | null; // Thumbnail image URL
info?: string | null; // Content description
genre?: string | null; // Genre information
}
Examples
import { readFileSync } from 'node:fs';
import { parsePrimeHtml } from 'metascraper/parser';
// Parse previously saved (cached) HTML; no network request is made
const html = readFileSync('prime-page.html', 'utf8');
const metadata = parsePrimeHtml(html);
console.log(metadata);
// {
// "name": "Little Women",
// "year": "2020",
// "seasons": null,
// "thumbnail": "https://m.media-amazon.com/images/S/pv-target-images/...",
// "info": "In the years after the Civil War, Jo March lives in New York...",
// "genre": "Dram"
// }
🔧 URL Processing
Supported URL Formats
The scraperNetflix function automatically normalizes various Netflix URL formats:
| Input Format | Normalized Output | Notes |
|---|---|---|
| https://www.netflix.com/title/80189685 | https://www.netflix.com/title/80189685 | Standard format |
| https://www.netflix.com/tr/title/80189685 | https://www.netflix.com/title/80189685 | Turkish locale |
| https://www.netflix.com/tr/title/80189685?s=i&trkid=264356104&vlang=tr | https://www.netflix.com/title/80189685 | With parameters |
| https://www.netflix.com/title/80189685?trackId=12345 | https://www.netflix.com/title/80189685 | With tracking |
URL Validation
The function validates URLs with these rules:
- Format: Must be a valid URL
- Domain: Must contain netflix.com
- Path: Must contain title/ followed by a numeric ID
- ID Extraction: Uses a regex to extract the title ID
// These will work:
'https://www.netflix.com/title/80189685'
'https://www.netflix.com/tr/title/80189685?s=i&vlang=tr'
// These will fail:
'https://google.com' // Wrong domain
'https://www.netflix.com/browse' // No title ID
'not-a-url' // Invalid format
'https://www.netflix.com/title/abc' // Non-numeric ID
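The rules above can be approximated with the WHATWG `URL` class plus a regular expression. This is a minimal sketch, not metascraper's implementation: the name `normalizeNetflixUrl`, the English error texts, and the host check (slightly stricter than "contains netflix.com") are assumptions.

```javascript
// Hypothetical helper mirroring the documented rules; not metascraper's code.
function normalizeNetflixUrl(input) {
  let parsed;
  try {
    parsed = new URL(input); // Format: must parse as an absolute URL
  } catch {
    throw new Error('Invalid URL');
  }

  // Domain: netflix.com or a subdomain such as www.netflix.com
  const host = parsed.hostname.toLowerCase();
  if (host !== 'netflix.com' && !host.endsWith('.netflix.com')) {
    throw new Error('URL must point to netflix.com');
  }

  // Path + ID extraction: "title/" followed by digits, optionally after a
  // locale segment such as /tr/; query strings and fragments are ignored
  const match = parsed.pathname.match(/\/title\/(\d+)(?:\/|$)/);
  if (!match) {
    throw new Error('No numeric Netflix title ID in URL');
  }

  const id = match[1];
  return { id, url: `https://www.netflix.com/title/${id}` };
}
```

For example, `normalizeNetflixUrl('https://www.netflix.com/tr/title/80189685?s=i&vlang=tr')` returns `{ id: '80189685', url: 'https://www.netflix.com/title/80189685' }`, while each of the failing inputs above throws.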
Amazon Prime Video URL Formats
The scraperPrime function automatically normalizes various Prime Video URL formats:
| Input Format | Normalized Output | Notes |
|---|---|---|
| https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL | https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL | Standard format |
| https://www.primevideo.com/-/tr/detail/0NHIN3TGAI9L7VZ45RS52RHUPL/ref=share_ios_movie | https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL | Turkish locale with tracking |
| https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL?ref_=atv_dp | https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL | With parameters |
Prime Video URL Validation
The function validates URLs with these rules:
- Format: Must be a valid URL
- Domain: Must contain primevideo.com
- Path: Must contain detail/ followed by a content ID
- ID Extraction: Uses path parsing to extract the content ID
// These will work:
'https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL'
'https://www.primevideo.com/-/tr/detail/0NHIN3TGAI9L7VZ45RS52RHUPL/ref=share_ios_movie'
// These will fail:
'https://google.com' // Wrong domain
'https://www.primevideo.com/browse' // No content ID
'not-a-url' // Invalid format
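Path parsing of this kind can be sketched by splitting the pathname and taking the segment right after `detail`, which drops locale prefixes such as `/-/tr/` and trailing segments such as `/ref=share_ios_movie`. This is an illustration under assumptions, not the library's code: `normalizePrimeUrl` is a hypothetical name, and treating content IDs as purely alphanumeric is a guess based on the example ID.

```javascript
// Hypothetical helper mirroring the documented Prime Video rules; not metascraper's code.
function normalizePrimeUrl(input) {
  let parsed;
  try {
    parsed = new URL(input); // Format: must parse as an absolute URL
  } catch {
    throw new Error('Invalid URL');
  }

  // Domain: primevideo.com or a subdomain such as www.primevideo.com
  const host = parsed.hostname.toLowerCase();
  if (host !== 'primevideo.com' && !host.endsWith('.primevideo.com')) {
    throw new Error('URL must point to primevideo.com');
  }

  // Path parsing: the content ID is the segment right after "detail"
  const segments = parsed.pathname.split('/').filter(Boolean);
  const detailIndex = segments.indexOf('detail');
  const id = detailIndex >= 0 ? segments[detailIndex + 1] : undefined;

  // Assumption: content IDs are alphanumeric, like 0NHIN3TGAI9L7VZ45RS52RHUPL
  if (!id || !/^[A-Za-z0-9]+$/.test(id)) {
    throw new Error('No Prime Video content ID in URL');
  }

  return { id, url: `https://www.primevideo.com/detail/${id}` };
}
```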
🌍 Localization Features
Turkish UI Text Removal
The parser automatically removes Turkish Netflix UI text from titles:
| Original Title | Cleaned Title | Removed Pattern |
|---|---|---|
| "The Witcher izlemenizi bekliyor" | "The Witcher" | izlemenizi bekliyor |
| "Stranger Things izleyin" | "Stranger Things" | izleyin |
| "Sezon 4 devam et" | "Sezon 4" | devam et |
| "Dark başla" | "Dark" | başla |
| "The Crown izlemeye devam" | "The Crown" | izlemeye devam |
Supported Turkish Patterns
const TURKISH_UI_PATTERNS = [
/\s+izlemenizi bekliyor$/i, // "waiting for you to watch"
/\s+izleyin$/i, // "watch"
/\s+devam et$/i, // "continue"
/\s+başla$/i, // "start"
/\s+izlemeye devam$/i, // "continue watching"
/\s+Sezon\s+\d+.*izlemeye devam$/i, // "Sezon X izlemeye devam"
/\s+Sezon\s+\d+.*başla$/i, // "Sezon X başla"
];
English UI Pattern Removal
The parser also removes common English UI text from the start or end of titles:
| Original Title | Cleaned Title | Removed Pattern |
|---|---|---|
| "Watch Now The Witcher" | "The Witcher" | Watch Now |
| "The Witcher Continue Watching" | "The Witcher" | Continue Watching |
| "Season 4 Play" | "Season 4" | Play |
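Both tables can be reproduced with ordered regex replacements. In this hypothetical `cleanTitle` sketch the documented Turkish patterns are reordered: if the patterns run one after another, the specific `Sezon X ...` forms must come first, because the generic suffix patterns would otherwise strip their endings and leave them unmatched. The English rules and the removal of a trailing `| Netflix` suffix are inferred from the examples, not taken from the library.

```javascript
// Hypothetical title cleaner; not metascraper's actual code.
// Documented Turkish patterns, with the specific "Sezon X ..." forms first.
const TURKISH_UI_PATTERNS = [
  /\s+Sezon\s+\d+.*izlemeye devam$/i, // "Sezon X ... continue watching"
  /\s+Sezon\s+\d+.*başla$/i,          // "Sezon X ... start"
  /\s+izlemenizi bekliyor$/i,         // "waiting for you to watch"
  /\s+izlemeye devam$/i,              // "continue watching"
  /\s+izleyin$/i,                     // "watch"
  /\s+devam et$/i,                    // "continue"
  /\s+başla$/i,                       // "start"
];

// English rules inferred from the example table (assumption). "Play" is only
// removed after "Season N" so that titles such as "Fair Play" survive.
const ENGLISH_UI_RULES = [
  [/^Watch Now\s+/i, ''],
  [/\s+Continue Watching$/i, ''],
  [/(Season\s+\d+)\s+Play$/i, '$1'],
];

function cleanTitle(raw) {
  // Assumption: a trailing " | Netflix" site suffix is dropped first
  let title = raw.replace(/\s*\|\s*Netflix\s*$/i, '').trim();
  for (const pattern of TURKISH_UI_PATTERNS) {
    title = title.replace(pattern, '');
  }
  for (const [pattern, replacement] of ENGLISH_UI_RULES) {
    title = title.replace(pattern, replacement);
  }
  return title.trim();
}
```

With this ordering, `cleanTitle('Dark Sezon 2 başla')` yields `'Dark'`; with the generic `başla` pattern first it would stop at `'Dark Sezon 2'`.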
📊 Data Extraction Patterns
JSON-LD Processing
The parser extracts metadata from JSON-LD structured data:
// Looks for these JSON-LD fields:
const YEAR_FIELDS = [
'datePublished', 'startDate', 'uploadDate',
'copyrightYear', 'releasedEvent', 'releaseYear', 'dateCreated'
];
const SEASON_TYPES = ['TVSeries', 'TVShow', 'Series'];
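To illustrate how such a field list might be used, this sketch collects every JSON-LD object on a page (including top-level arrays and `@graph` entries) and returns the first four-digit year found, checking fields in `YEAR_FIELDS` order. It is not metascraper's parser: the regex-based script extraction, the `readJsonLd` and `extractYear` names, and reading `startDate` from event objects such as `releasedEvent` are assumptions.

```javascript
// Hypothetical JSON-LD year extractor; only the field list comes from the docs.
const YEAR_FIELDS = [
  'datePublished', 'startDate', 'uploadDate',
  'copyrightYear', 'releasedEvent', 'releaseYear', 'dateCreated',
];

// Collect every JSON-LD object on the page, including arrays and @graph entries
function readJsonLd(html) {
  const objects = [];
  const scriptRe = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  for (const [, body] of html.matchAll(scriptRe)) {
    let data;
    try {
      data = JSON.parse(body);
    } catch {
      continue; // skip malformed blocks instead of failing the whole parse
    }
    const queue = Array.isArray(data) ? [...data] : [data];
    while (queue.length > 0) {
      const item = queue.shift();
      if (!item || typeof item !== 'object') continue;
      objects.push(item);
      if (Array.isArray(item['@graph'])) queue.push(...item['@graph']);
    }
  }
  return objects;
}

function extractYear(html) {
  for (const obj of readJsonLd(html)) {
    for (const field of YEAR_FIELDS) {
      let value = obj[field];
      // Assumption: event objects (e.g. releasedEvent) carry their date in startDate
      if (value && typeof value === 'object') value = value.startDate;
      const match = String(value ?? '').match(/\b(1[89]\d{2}|20\d{2})\b/);
      if (match) return match[1];
    }
  }
  return undefined;
}
```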
Meta Tag Fallbacks
If JSON-LD is unavailable, the parser falls back to HTML meta tags such as:
<meta property="og:title" content="The Witcher izlemenizi bekliyor | Netflix">
<meta name="title" content="The Witcher | Netflix">
<title>The Witcher izlemenizi bekliyor | Netflix</title>
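A sketch of this fallback chain, assuming the three sources are tried in the order shown. The helper `extractRawTitle` is hypothetical and uses regexes for brevity, so it expects `content` to follow the `property` or `name` attribute as in the example. The returned value still contains the UI text and the `| Netflix` suffix, which the cleaning described under Localization Features removes.

```javascript
// Hypothetical meta-tag fallback; not metascraper's actual code. The regexes
// assume attribute order as in the example; an HTML parser is more robust.
const TITLE_SOURCES = [
  /<meta\s+property=["']og:title["']\s+content=(["'])(?<value>.*?)\1/i,
  /<meta\s+name=["']title["']\s+content=(["'])(?<value>.*?)\1/i,
  /<title[^>]*>(?<value>[\s\S]*?)<\/title>/i,
];

// Decode the few entities that commonly appear in titles (&amp; last)
function decodeEntities(text) {
  return text
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&');
}

// Return the first non-empty title in source priority order, or null
function extractRawTitle(html) {
  for (const pattern of TITLE_SOURCES) {
    const value = html.match(pattern)?.groups.value.trim();
    if (value) return decodeEntities(value);
  }
  return null;
}
```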
Thumbnail Image Extraction
The parser automatically extracts poster/thumbnail images from Netflix meta tags:
// Thumbnail selectors in priority order:
const THUMBNAIL_SELECTORS = [
'meta[property="og:image"]', // Open Graph image (primary)
'meta[name="twitter:image"]', // Twitter card image
'meta[property="og:image:secure_url"]', // Secure image URL
'link[rel="image_src"]', // Image source link
'meta[itemprop="image"]' // Schema.org image
];
Example Netflix HTML:
<meta property="og:image" content="https://occ-0-7335-778.1.nflxso.net/dnm/api/v6/6AYY37jfdO6hpXcMjf9Yu5cnmO0/AAAABSkrIGPSyEfSWYQzc8rEFo6EtVV6Ls8WtPpNwR42MSKSNPNomZWV5P_l2MxGuJEkoPm71UT_eBK_SsTEH8pRslQr0sjpdhVHjxh4.jpg">
URL Validation:
- Only Netflix CDN domains are accepted (nflxso.net, nflximg.net, etc.)
- Image file extensions are verified (.jpg, .jpeg, .png, .webp)
- Query parameters are cleaned for stability
Fallback Strategy:
- Try Open Graph image first (most reliable)
- Fall back to Twitter card image
- Try other meta tags if needed
- Return null if no valid thumbnail found
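The validation and fallback rules can be sketched as two small functions: `validateThumbnail` applies the host, extension, and query-string rules, and `pickThumbnail` walks candidate URLs (already read from the selectors, in priority order) and returns the first valid one or null. Both names are hypothetical, and the host list holds only the two domains named above, since the rest of the "etc." is not documented.

```javascript
// Hypothetical thumbnail validation; not metascraper's actual code.
const NETFLIX_IMAGE_HOSTS = ['nflxso.net', 'nflximg.net']; // partial list (assumption)
const IMAGE_EXTENSIONS = ['.jpg', '.jpeg', '.png', '.webp'];

function validateThumbnail(candidate) {
  let parsed;
  try {
    parsed = new URL(candidate);
  } catch {
    return null; // not an absolute URL
  }

  // Only accept Netflix CDN hosts (exact domain or any subdomain)
  const host = parsed.hostname.toLowerCase();
  const trustedHost = NETFLIX_IMAGE_HOSTS.some(
    (domain) => host === domain || host.endsWith(`.${domain}`),
  );
  // Require a known image file extension on the path
  const path = parsed.pathname.toLowerCase();
  const isImage = IMAGE_EXTENSIONS.some((ext) => path.endsWith(ext));
  if (!trustedHost || !isImage) return null;

  // Drop query string and fragment so the same image yields the same URL
  parsed.search = '';
  parsed.hash = '';
  return parsed.toString();
}

// Candidates must be given in selector priority order (og:image first)
function pickThumbnail(candidates) {
  for (const candidate of candidates) {
    const valid = candidate ? validateThumbnail(candidate) : null;
    if (valid) return valid;
  }
  return null;
}
```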
Season Detection
For TV series, extracts season information:
// Example JSON-LD for TV series:
{
"@type": "TVSeries",
"name": "The Witcher",
"numberOfSeasons": 4,
"datePublished": "2025"
}
// Result: "4 Sezon" (Turkish for "4 Seasons")
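A sketch of the season mapping for an already parsed JSON-LD object. Only the `SEASON_TYPES` list and the "N Sezon" output come from this document; the `formatSeasons` name and the fallback to counting a schema.org `containsSeason` array are assumptions.

```javascript
// Hypothetical season formatter; not metascraper's actual code.
const SEASON_TYPES = ['TVSeries', 'TVShow', 'Series'];

function formatSeasons(jsonLd) {
  // @type may be a string or an array of strings in JSON-LD
  const types = [].concat(jsonLd?.['@type'] ?? []);
  if (!types.some((type) => SEASON_TYPES.includes(type))) return null; // e.g. movies

  // Prefer an explicit numberOfSeasons (number or numeric string)
  const count = Number.parseInt(jsonLd.numberOfSeasons, 10);
  if (Number.isInteger(count) && count > 0) return `${count} Sezon`;

  // Assumption: otherwise count the entries of a containsSeason array
  if (Array.isArray(jsonLd.containsSeason) && jsonLd.containsSeason.length > 0) {
    return `${jsonLd.containsSeason.length} Sezon`;
  }
  return null;
}
```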
⚡ Performance Characteristics
Response Times by Mode
| Mode | Typical Response | Success Rate | Resource Usage |
|---|---|---|---|
| Static Only | 200-500ms | ~85% | Very Low |
| Static + Headless Fallback | 2-5s | ~95% | Medium |
| Headless Only | 2-3s | ~90% | High |
Resource Requirements
Static Mode:
- CPU: Low (< 5%)
- Memory: < 20MB
- Network: 1 HTTP request
Headless Mode:
- CPU: Medium (10-20%)
- Memory: 100-200MB
- Network: Multiple requests
- Browser: Chromium instance
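The Static + Headless Fallback row reflects a static-first strategy: a browser is started only when the static result is incomplete. The sketch below shows that decision with injected functions `fetchStatic`, `fetchHeadless`, and `parse`, which are hypothetical placeholders rather than metascraper exports; which fields count as required (`REQUIRED_FIELDS`) is also an assumption.

```javascript
// Hypothetical sketch of the "Static + Headless Fallback" mode.
// Assumption: these fields must be present; seasons is left out because it
// is legitimately null for movies.
const REQUIRED_FIELDS = ['name', 'year', 'thumbnail', 'info', 'genre'];

function missingFields(meta) {
  return REQUIRED_FIELDS.filter((field) => meta[field] == null || meta[field] === '');
}

async function scrapeWithFallback(url, { fetchStatic, fetchHeadless, parse, headless = true }) {
  // 1. Cheap path: one HTTP request, then parse the HTML
  const staticMeta = parse(await fetchStatic(url));
  const gaps = missingFields(staticMeta);
  if (!headless || gaps.length === 0) return staticMeta;

  // 2. Expensive path: render the page in a browser only because data is missing
  const renderedMeta = parse(await fetchHeadless(url));

  // 3. Keep static values and fill only the gaps from the rendered page
  const merged = { ...staticMeta };
  for (const field of gaps) {
    if (renderedMeta[field] != null) merged[field] = renderedMeta[field];
  }
  return merged;
}
```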
🚨 Error Types & Handling
Common Error Scenarios
1. Invalid URL
await scraperNetflix('invalid-url');
// Throws: "Geçersiz URL sağlandı." (English: "Invalid URL provided.")
2. Non-Netflix URL
await scraperNetflix('https://google.com');
// Throws: "URL netflix.com adresini göstermelidir." (English: "The URL must point to netflix.com.")
3. Missing Title ID
await scraperNetflix('https://www.netflix.com/browse');
// Throws: "URL'de Netflix başlık ID'si bulunamadı." (English: "No Netflix title ID found in the URL.")
4. Network Timeout
await scraperNetflix('https://www.netflix.com/title/80189685', { timeoutMs: 1 });
// Throws: "Request timed out while reaching Netflix."
5. 404 Not Found
await scraperNetflix('https://www.netflix.com/title/99999999');
// Throws: "Netflix title not found (404)."
6. Playwright Not Available
// When headless mode needed but Playwright not installed
// Throws: "Playwright is not installed. Install the optional dependency..."
7. Parsing Failed
// When HTML cannot be parsed for metadata
// Throws: "Netflix sayfa meta verisi parse edilemedi." (English: "Netflix page metadata could not be parsed.")
Error Object Structure
{
name: "Error",
message: "Netflix scraping başarısız: Geçersiz URL sağlandı.", // "Netflix scraping failed: Invalid URL provided."
stack: "Error: Netflix scraping başarısız: Geçersiz URL sağlandı.\n at scraperNetflix...",
// Additional context for debugging
}
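Because these errors are plain `Error` objects distinguished only by message text, callers that need to branch on the failure type can match message fragments. This hypothetical `classifyScrapeError` sketch maps the messages listed above to error kinds and marks which ones are worth retrying; the kind names and the retry policy are assumptions.

```javascript
// Hypothetical classifier; not part of metascraper. Matching on message text
// is brittle: rewording or re-localizing the messages breaks it.
const ERROR_KINDS = [
  ['invalid_url', 'Geçersiz URL sağlandı'],
  ['wrong_domain', 'adresini göstermelidir'],
  ['missing_id', "başlık ID'si bulunamadı"],
  ['timeout', 'Request timed out'],
  ['not_found', 'not found (404)'],
  ['playwright_missing', 'Playwright is not installed'],
  ['parse_failed', 'parse edilemedi'],
];

// Assumption: timeouts and unrecognized (possibly transient network) errors
// are worth retrying; validation, 404 and setup errors are not
const RETRYABLE_KINDS = new Set(['timeout', 'unknown']);

function classifyScrapeError(error) {
  const message = String(error?.message ?? error);
  const match = ERROR_KINDS.find(([, fragment]) => message.includes(fragment));
  const kind = match ? match[0] : 'unknown';
  return { kind, retryable: RETRYABLE_KINDS.has(kind) };
}
```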
🔧 Advanced Usage Patterns
Batch Processing
import { scraperNetflix } from 'metascraper';
const urls = [
'https://www.netflix.com/title/80189685',
'https://www.netflix.com/title/82123114',
'https://www.netflix.com/title/70177057'
];
const results = await Promise.allSettled(
urls.map(url => scraperNetflix(url))
);
results.forEach((result, index) => {
if (result.status === 'fulfilled') {
console.log(`✅ ${urls[index]}:`, result.value.name);
} else {
console.log(`❌ ${urls[index]}:`, result.reason.message);
}
});
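The batch above starts every request at once, which for long URL lists means many simultaneous HTTP requests and, when the headless fallback kicks in, several Chromium instances. A minimal sketch of a concurrency-limited alternative follows; `mapSettledWithLimit` is a hypothetical helper that returns results in the same shape as `Promise.allSettled`.

```javascript
// Hypothetical helper, not part of metascraper: behaves like
// Promise.allSettled(items.map(task)) but runs at most `limit` tasks at once.
async function mapSettledWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let nextIndex = 0;

  // Each worker claims the next unprocessed index until the list is exhausted.
  // Claiming is synchronous, so two workers never take the same item.
  async function worker() {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      try {
        results[index] = { status: 'fulfilled', value: await task(items[index], index) };
      } catch (reason) {
        results[index] = { status: 'rejected', reason };
      }
    }
  }

  const workerCount = Math.min(Math.max(1, limit), items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Used with the `urls` list above, `const results = await mapSettledWithLimit(urls, 2, (url) => scraperNetflix(url));` produces `{ status, value | reason }` entries, so the existing `forEach` loop works unchanged.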
Custom User-Agent Rotation
import { scraperNetflix } from 'metascraper';
// Example User-Agent strings, shortened for brevity
const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
];
const getRandomUA = () => userAgents[Math.floor(Math.random() * userAgents.length)];
const url = 'https://www.netflix.com/title/80189685';
const result = await scraperNetflix(url, {
  userAgent: getRandomUA()
});
Retry Logic Implementation
import { scraperNetflix } from 'metascraper';
async function scrapeWithRetry(url, maxRetries = 3) {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
return await scraperNetflix(url);
} catch (error) {
if (attempt === maxRetries) throw error;
console.log(`Attempt ${attempt} failed, retrying in ${attempt * 1000}ms...`);
await new Promise(resolve => setTimeout(resolve, attempt * 1000));
}
}
}
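The loop above retries every error, including ones a second attempt cannot fix (an invalid URL or a 404), and its delay grows linearly. A variant with exponential backoff, random jitter, and a `shouldRetry` predicate is sketched below; `retryWithBackoff` and its option names are hypothetical.

```javascript
// Hypothetical helper, not part of metascraper: retries `fn` with exponential
// backoff plus jitter, and stops early when `shouldRetry` rejects the error.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff(fn, { maxAttempts = 3, baseDelayMs = 1000, shouldRetry = () => true } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (error) {
      // Give up on the last attempt or on errors that cannot succeed later
      if (attempt >= maxAttempts || !shouldRetry(error)) throw error;
      // Delays grow as 1x, 2x, 4x ... of the base, plus up to 20% random jitter
      const delayMs = baseDelayMs * 2 ** (attempt - 1) * (1 + Math.random() * 0.2);
      await sleep(delayMs);
    }
  }
}
```

For example, `await retryWithBackoff(() => scraperNetflix(url), { shouldRetry: (error) => error.message.includes('timed out') })` retries only the timeout error listed under Error Types & Handling.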
Caching Integration
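import { scraperNetflix } from 'metascraper';
const CACHE_TTL_MS = 30 * 60 * 1000; // 30 minutes
const cache = new Map();
async function scrapeWithCache(url) {
  const cacheKey = `netflix:${url}`;
  const entry = cache.get(cacheKey);
  if (entry && entry.expiresAt > Date.now()) {
    console.log('Cache hit for:', url);
    return entry.value;
  }
  const result = await scraperNetflix(url);
  // Store an expiry timestamp rather than scheduling a timer: a pending
  // 30-minute setTimeout keeps a Node.js process alive after its work is done
  cache.set(cacheKey, { value: result, expiresAt: Date.now() + CACHE_TTL_MS });
  return result;
}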
API documentation last updated: 2025-11-23