Added Amazon Prime scraping feature

2025-11-23 16:09:39 +03:00
parent 46d75b64d5
commit fefa6627e9
6 changed files with 988 additions and 28 deletions

@@ -6,6 +6,10 @@
Netflix metadata extraction function with automatic fallback and Turkish localization.
### `scraperPrime(inputUrl, options?)`
Amazon Prime Video metadata extraction function with automatic fallback and Turkish localization.
#### Parameters
| Parameter | Type | Required | Default | Description |
@@ -30,6 +34,9 @@ Promise<{
name: string; // Clean title (Turkish UI removed)
year: string | number | undefined; // Release year
seasons: string | null; // Season info for TV series
thumbnail: string | null; // Poster/thumbnail image URL
info: string | null; // Content description/summary
genre: string | null; // Genre (Turkish normalized)
}>
```
@@ -46,7 +53,10 @@ console.log(result);
// "id": "82123114",
// "name": "ONE SHOT with Ed Sheeran",
// "year": "2025",
// "seasons": null,
// "thumbnail": "https://occ-0-7335-778.1.nflxso.net/dnm/api/v6/6AYY37jfdO6hpXcMjf9Yu5cnmO0/AAAABSkrIGPSyEfSWYQzc8rEFo6EtVV6Ls8WtPpNwR42MSKSNPNomZWV5P_l2MxGuJEkoPm71UT_eBK_SsTEH8pRslQr0sjpdhVHjxh4.jpg",
// "info": "Ed Sheeran, matematiğin mucizevi gücünü ve müziğin birleştirici gücünü sergileyen benzersiz bir performansla sahneye çıkıyor.",
// "genre": "Belgesel"
// }
```
@@ -78,6 +88,88 @@ try {
}
```
### `scraperPrime(inputUrl, options?)`
Amazon Prime Video metadata extraction function with automatic fallback and Turkish localization.
#### Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `inputUrl` | `string` | ✅ | - | Amazon Prime Video URL (any format) |
| `options` | `object` | ❌ | `{}` | Configuration options |
#### Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `headless` | `boolean` | `true` | Enable Playwright fallback for missing data |
| `timeoutMs` | `number` | `15000` | Request timeout in milliseconds |
| `userAgent` | `string` | Chrome 118 User-Agent | Custom User-Agent string |
#### Returns
```typescript
Promise<{
url: string; // Normalized Prime Video URL
id: string; // Prime Video content ID
name: string; // Clean title (Amazon UI removed)
year: string | number | undefined; // Release year
seasons: string | null; // Season info for TV series (null for movies)
thumbnail: string | null; // Poster/thumbnail image URL
info: string | null; // Content description/summary
genre: string | null; // Genre (Turkish normalized)
}>
```
#### Examples
**Basic Usage**
```javascript
import { scraperPrime } from 'metascraper';
const result = await scraperPrime('https://www.primevideo.com/-/tr/detail/0NHIN3TGAI9L7VZ45RS52RHUPL/ref=share_ios_movie');
console.log(result);
// {
// "url": "https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL",
// "id": "0NHIN3TGAI9L7VZ45RS52RHUPL",
// "name": "Little Women",
// "year": "2020",
// "seasons": null,
// "thumbnail": "https://m.media-amazon.com/images/S/pv-target-images/c1b08ebea5ba29c47145c623e7d1c586290221ec12fa93850029e581f54049c4.jpg",
// "info": "In the years after the Civil War, Jo March lives in New York and makes her living as a writer...",
// "genre": "Dram"
// }
```
**Advanced Configuration**
```javascript
import { scraperPrime } from 'metascraper';
const result = await scraperPrime(
'https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL',
{
headless: false, // Disable browser fallback
timeoutMs: 30000, // 30 second timeout
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
);
```
**Error Handling**
```javascript
import { scraperPrime } from 'metascraper';
try {
const result = await scraperPrime('https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL');
console.log('Success:', result);
} catch (error) {
console.error('Scraping failed:', error.message);
// Error messages are returned in Turkish, for example:
// "Amazon Prime scraping başarısız: Amazon Prime URL'i gereklidir."
// (English: "Amazon Prime scraping failed: an Amazon Prime URL is required.")
}
```
## 🧩 Internal APIs
### `parseNetflixHtml(html)` - Parser API
@@ -97,6 +189,9 @@ Parse Netflix HTML content to extract metadata without network requests.
name?: string; // Clean title
year?: string | number; // Release year
seasons?: string | null; // Season information
thumbnail?: string | null; // Thumbnail image URL
info?: string | null; // Content description
genre?: string | null; // Genre information
}
```
@@ -114,7 +209,10 @@ console.log(metadata);
// {
// "name": "The Witcher",
// "year": "2025",
// "seasons": "4 Sezon",
// "thumbnail": "https://occ-0-7335-778.1.nflxso.net/dnm/api/v6/6AYY37jfdO6hpXcMjf9Yu5cnmO0/AAAABSkrIGPSyEfSWYQzc8rEFo6EtVV6Ls8WtPpNwR42MSKSNPNomZWV5P_l2MxGuJEkoPm71UT_eBK_SsTEH8pRslQr0sjpdhVHjxh4.jpg",
// "info": "Mutasyona uğramış bir canavar avcısı olan Rivyalı Geralt, insanların çoğunlukla yaratıklardan daha uğursuz olduğu, karmaşa içindeki bir dünyada kaderine doğru yol alıyor.",
// "genre": "Aksiyon"
// }
```
@@ -165,6 +263,50 @@ try {
}
```
### `parsePrimeHtml(html)` - Prime Video Parser API
Parse Amazon Prime Video HTML content to extract metadata without network requests.
#### Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `html` | `string` | ✅ | Raw HTML content from Prime Video page |
#### Returns
```typescript
{
name?: string; // Clean title
year?: string | number; // Release year
seasons?: string | null; // Season information
thumbnail?: string | null; // Thumbnail image URL
info?: string | null; // Content description
genre?: string | null; // Genre information
}
```
#### Examples
```javascript
import { parsePrimeHtml } from 'metascraper/parser';
import { readFileSync } from 'node:fs';
// With cached HTML
const html = readFileSync('prime-page.html', 'utf8');
const metadata = parsePrimeHtml(html);
console.log(metadata);
// {
// "name": "Little Women",
// "year": "2020",
// "seasons": null,
// "thumbnail": "https://m.media-amazon.com/images/S/pv-target-images/...",
// "info": "In the years after the Civil War, Jo March lives in New York...",
// "genre": "Dram"
// }
```
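Because every field in the parser result is optional, a caller may want to know which ones were not extracted before deciding what to do next. The following is a caller-side sketch only: `missingFields` and `REQUIRED_FIELDS` are hypothetical names, not part of the library, and the choice to treat `seasons` as optional is an assumption based on the note that it is `null` for movies.

```javascript
// Hypothetical caller-side helper; not part of the metascraper API.
// `seasons` is left out on purpose, since null is a valid value for movies.
const REQUIRED_FIELDS = ['name', 'year', 'thumbnail', 'info', 'genre'];

function missingFields(metadata) {
  // A field counts as missing when it is undefined, null, or an empty string.
  return REQUIRED_FIELDS.filter((field) => {
    const value = metadata[field];
    return value === undefined || value === null || value === '';
  });
}

const sample = {
  name: 'Little Women',
  year: '2020',
  seasons: null,
  thumbnail: null,
  info: 'In the years after the Civil War, Jo March lives in New York...',
  genre: 'Dram'
};

console.log(missingFields(sample)); // [ 'thumbnail' ]
```

A non-empty result could be used, for instance, to decide whether a second extraction attempt is worthwhile.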
## 🔧 URL Processing
### Supported URL Formats
@@ -199,6 +341,36 @@ The function validates URLs with these rules:
'https://www.netflix.com/title/abc' // Non-numeric ID
```
### Amazon Prime Video URL Formats
The `scraperPrime` function automatically normalizes various Prime Video URL formats:
| Input Format | Normalized Output | Notes |
|--------------|-------------------|-------|
| `https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL` | `https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL` | Standard format |
| `https://www.primevideo.com/-/tr/detail/0NHIN3TGAI9L7VZ45RS52RHUPL/ref=share_ios_movie` | `https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL` | Turkish locale with tracking |
| `https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL?ref_=atv_dp` | `https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL` | With parameters |
### Prime Video URL Validation
The function validates URLs with these rules:
1. **Format**: Must be a valid URL
2. **Domain**: Must contain `primevideo.com`
3. **Path**: Must contain `detail/` followed by content ID
4. **ID Extraction**: Uses path parsing to extract content ID
```javascript
// These will work:
'https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL'
'https://www.primevideo.com/-/tr/detail/0NHIN3TGAI9L7VZ45RS52RHUPL/ref=share_ios_movie'
// These will fail:
'https://google.com' // Wrong domain
'https://www.primevideo.com/browse' // No content ID
'not-a-url' // Invalid format
```
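The four validation rules and the normalization table above can be sketched with the standard `URL` class. This is a minimal illustration, not the library's actual implementation: `normalizePrimeUrl` and its error messages are hypothetical, and the hostname check is an assumption that is slightly stricter than "must contain `primevideo.com`".

```javascript
// Hypothetical sketch of the rules above; not the library's real code.
function normalizePrimeUrl(inputUrl) {
  // Rule 1: must be a valid URL.
  let parsed;
  try {
    parsed = new URL(inputUrl);
  } catch {
    throw new Error('Invalid URL format');
  }

  // Rule 2: host must be primevideo.com or one of its subdomains (assumed).
  const host = parsed.hostname.toLowerCase();
  if (host !== 'primevideo.com' && !host.endsWith('.primevideo.com')) {
    throw new Error('Not a Prime Video URL');
  }

  // Rules 3 and 4: the content ID is the path segment right after "detail".
  // Locale prefixes (/-/tr/) come before it, tracking suffixes (/ref=...) after it.
  const segments = parsed.pathname.split('/').filter(Boolean);
  const detailIndex = segments.indexOf('detail');
  const id = detailIndex >= 0 ? segments[detailIndex + 1] : undefined;
  if (!id) {
    throw new Error('No content ID found in URL');
  }

  // Query parameters are dropped because only the path is reused.
  return { id, url: `https://www.primevideo.com/detail/${id}` };
}

const { url } = normalizePrimeUrl(
  'https://www.primevideo.com/-/tr/detail/0NHIN3TGAI9L7VZ45RS52RHUPL/ref=share_ios_movie'
);
console.log(url); // https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL
```

Parsing the path by segments rather than with a single regular expression keeps the locale prefix and tracking suffix cases from needing separate patterns.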
## 🌍 Localization Features
### Turkish UI Text Removal
@@ -263,6 +435,37 @@ If JSON-LD is unavailable, falls back to HTML meta tags:
<title>The Witcher izlemenizi bekliyor | Netflix</title>
```
### Thumbnail Image Extraction
The parser automatically extracts poster/thumbnail images from Netflix meta tags:
```javascript
// Thumbnail selectors in priority order:
const THUMBNAIL_SELECTORS = [
'meta[property="og:image"]', // Open Graph image (primary)
'meta[name="twitter:image"]', // Twitter card image
'meta[property="og:image:secure_url"]', // Secure image URL
'link[rel="image_src"]', // Image source link
'meta[itemprop="image"]' // Schema.org image
];
```
**Example Netflix HTML:**
```html
<meta property="og:image" content="https://occ-0-7335-778.1.nflxso.net/dnm/api/v6/6AYY37jfdO6hpXcMjf9Yu5cnmO0/AAAABSkrIGPSyEfSWYQzc8rEFo6EtVV6Ls8WtPpNwR42MSKSNPNomZWV5P_l2MxGuJEkoPm71UT_eBK_SsTEH8pRslQr0sjpdhVHjxh4.jpg">
```
**URL Validation:**
- Only Netflix CDN domains are accepted (nflxso.net, nflximg.net, etc.)
- Image file extensions are verified (.jpg, .jpeg, .png, .webp)
- Query parameters are cleaned for stability
**Fallback Strategy:**
1. Try Open Graph image first (most reliable)
2. Fall back to Twitter card image
3. Try other meta tags if needed
4. Return null if no valid thumbnail found
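The URL validation and fallback steps above can be sketched as follows. This is an illustration under stated assumptions, not the parser's actual code: `cleanThumbnailUrl` and `pickThumbnail` are hypothetical names, the host list contains only the two domains named above (the source's "etc." is not filled in), and "cleaning" is assumed to mean dropping the query string and fragment.

```javascript
// Hypothetical sketch; names and exact behavior are assumptions.
const ALLOWED_HOST_SUFFIXES = ['nflxso.net', 'nflximg.net'];
const ALLOWED_EXTENSIONS = ['.jpg', '.jpeg', '.png', '.webp'];

function cleanThumbnailUrl(candidate) {
  let parsed;
  try {
    parsed = new URL(candidate);
  } catch {
    return null; // Not a valid absolute URL.
  }

  // Accept only the allowed CDN domains and their subdomains.
  const host = parsed.hostname.toLowerCase();
  const hostOk = ALLOWED_HOST_SUFFIXES.some(
    (suffix) => host === suffix || host.endsWith('.' + suffix)
  );

  // Verify the image file extension on the path, ignoring the query string.
  const path = parsed.pathname.toLowerCase();
  const extensionOk = ALLOWED_EXTENSIONS.some((ext) => path.endsWith(ext));

  if (!hostOk || !extensionOk) {
    return null;
  }

  // Drop query parameters and fragment for a stable URL.
  parsed.search = '';
  parsed.hash = '';
  return parsed.toString();
}

// Candidates are expected in priority order (og:image first, and so on).
function pickThumbnail(candidates) {
  for (const candidate of candidates) {
    const cleaned = cleanThumbnailUrl(candidate);
    if (cleaned) {
      return cleaned;
    }
  }
  return null; // No valid thumbnail found.
}
```

Passing the extracted `content` values in selector priority order means the first valid candidate wins, which mirrors the fallback list.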
### Season Detection
For TV series, extracts season information: