Added Amazon Prime scraping feature
207
doc/API.md
@@ -6,6 +6,10 @@

Netflix metadata extraction function with automatic fallback and Turkish localization.

### `scraperPrime(inputUrl, options?)`

Amazon Prime Video metadata extraction function with automatic fallback and Turkish localization.

#### Parameters

| Parameter | Type | Required | Default | Description |
@@ -30,6 +34,9 @@ Promise<{
  name: string; // Clean title (Turkish UI removed)
  year: string \| number \| undefined; // Release year
  seasons: string \| null; // Season info for TV series
  thumbnail: string \| null; // Poster/thumbnail image URL
  info: string \| null; // Content description/summary
  genre: string \| null; // Genre (Turkish normalized)
}>
```

@@ -46,7 +53,10 @@ console.log(result);
//   "id": "82123114",
//   "name": "ONE SHOT with Ed Sheeran",
//   "year": "2025",
//   "seasons": null
//   "seasons": null,
//   "thumbnail": "https://occ-0-7335-778.1.nflxso.net/dnm/api/v6/6AYY37jfdO6hpXcMjf9Yu5cnmO0/AAAABSkrIGPSyEfSWYQzc8rEFo6EtVV6Ls8WtPpNwR42MSKSNPNomZWV5P_l2MxGuJEkoPm71UT_eBK_SsTEH8pRslQr0sjpdhVHjxh4.jpg",
//   "info": "Ed Sheeran, matematiğin mucizevi gücünü ve müziğin birleştirici gücünü sergileyen benzersiz bir performansla sahneye çıkıyor.",
//   "genre": "Belgesel"
// }
```

@@ -78,6 +88,88 @@ try {
}
```

### `scraperPrime(inputUrl, options?)`

Amazon Prime Video metadata extraction function with automatic fallback and Turkish localization.

#### Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `inputUrl` | `string` | ✅ | - | Amazon Prime Video URL (any format) |
| `options` | `object` | ❌ | `{}` | Configuration options |

#### Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `headless` | `boolean` | `true` | Enable Playwright fallback for missing data |
| `timeoutMs` | `number` | `15000` | Request timeout in milliseconds |
| `userAgent` | `string` | Chrome 118 User-Agent | Custom User-Agent string |

#### Returns

```typescript
Promise<{
  url: string; // Normalized Prime Video URL
  id: string; // Prime Video content ID
  name: string; // Clean title (Amazon UI removed)
  year: string | number | undefined; // Release year
  seasons: string | null; // Season info for TV series (null for movies)
  thumbnail: string | null; // Poster/thumbnail image URL
  info: string | null; // Content description/summary
  genre: string | null; // Genre (Turkish normalized)
}>
```

#### Examples

**Basic Usage**
```javascript
import { scraperPrime } from 'metascraper';

const result = await scraperPrime('https://www.primevideo.com/-/tr/detail/0NHIN3TGAI9L7VZ45RS52RHUPL/ref=share_ios_movie');
console.log(result);
// {
//   "url": "https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL",
//   "id": "0NHIN3TGAI9L7VZ45RS52RHUPL",
//   "name": "Little Women",
//   "year": "2020",
//   "seasons": null,
//   "thumbnail": "https://m.media-amazon.com/images/S/pv-target-images/c1b08ebea5ba29c47145c623e7d1c586290221ec12fa93850029e581f54049c4.jpg",
//   "info": "In the years after the Civil War, Jo March lives in New York and makes her living as a writer...",
//   "genre": "Dram"
// }
```

**Advanced Configuration**
```javascript
import { scraperPrime } from 'metascraper';

const result = await scraperPrime(
  'https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL',
  {
    headless: false, // Disable browser fallback
    timeoutMs: 30000, // 30 second timeout
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
  }
);
```
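
To make the defaults in the Options table concrete, here is a minimal sketch of how caller-supplied options could be merged with them. `resolveOptions` and `DEFAULT_OPTIONS` are hypothetical names, not part of the documented API, and the default User-Agent is a placeholder because the exact Chrome 118 string is not shown in this document.

```javascript
// Hypothetical helper: not part of the documented API.
// Defaults mirror the Options table above.
const DEFAULT_OPTIONS = Object.freeze({
  headless: true,
  timeoutMs: 15000,
  // Placeholder: the real default User-Agent string is not given in this document.
  userAgent: 'PLACEHOLDER_CHROME_118_USER_AGENT',
});

function resolveOptions(options = {}) {
  // Drop undefined values so they do not overwrite the defaults.
  const provided = Object.fromEntries(
    Object.entries(options).filter(([, value]) => value !== undefined)
  );
  return { ...DEFAULT_OPTIONS, ...provided };
}

console.log(resolveOptions({ headless: false, timeoutMs: 30000 }));
```

Spreading the defaults first means any option the caller leaves out keeps its documented default value.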

**Error Handling**
```javascript
import { scraperPrime } from 'metascraper';

try {
  const result = await scraperPrime('https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL');
  console.log('Success:', result);
} catch (error) {
  console.error('Scraping failed:', error.message);
  // Error messages are returned in Turkish, for example:
  // "Amazon Prime scraping başarısız: Amazon Prime URL'i gereklidir."
  // (English: "Amazon Prime scraping failed: an Amazon Prime URL is required.")
}
```

## 🧩 Internal APIs

### `parseNetflixHtml(html)` - Parser API
@@ -97,6 +189,9 @@ Parse Netflix HTML content to extract metadata without network requests.
  name?: string; // Clean title
  year?: string \| number; // Release year
  seasons?: string \| null; // Season information
  thumbnail?: string \| null; // Thumbnail image URL
  info?: string \| null; // Content description
  genre?: string \| null; // Genre information
}
```

@@ -114,7 +209,10 @@ console.log(metadata);
// {
//   "name": "The Witcher",
//   "year": "2025",
//   "seasons": "4 Sezon"
//   "seasons": "4 Sezon",
//   "thumbnail": "https://occ-0-7335-778.1.nflxso.net/dnm/api/v6/6AYY37jfdO6hpXcMjf9Yu5cnmO0/AAAABSkrIGPSyEfSWYQzc8rEFo6EtVV6Ls8WtPpNwR42MSKSNPNomZWV5P_l2MxGuJEkoPm71UT_eBK_SsTEH8pRslQr0sjpdhVHjxh4.jpg",
//   "info": "Mutasyona uğramış bir canavar avcısı olan Rivyalı Geralt, insanların çoğunlukla yaratıklardan daha uğursuz olduğu, karmaşa içindeki bir dünyada kaderine doğru yol alıyor.",
//   "genre": "Aksiyon"
// }
```

@@ -165,6 +263,50 @@ try {
}
```

### `parsePrimeHtml(html)` - Prime Video Parser API

Parse Amazon Prime Video HTML content to extract metadata without network requests.

#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `html` | `string` | ✅ | Raw HTML content from Prime Video page |

#### Returns

```typescript
{
  name?: string; // Clean title
  year?: string | number; // Release year
  seasons?: string | null; // Season information
  thumbnail?: string | null; // Thumbnail image URL
  info?: string | null; // Content description
  genre?: string | null; // Genre information
}
```

#### Examples

```javascript
import { parsePrimeHtml } from 'metascraper/parser';

// With cached HTML
const fs = await import('node:fs');
const html = fs.readFileSync('prime-page.html', 'utf8');
const metadata = parsePrimeHtml(html);

console.log(metadata);
// {
//   "name": "Little Women",
//   "year": "2020",
//   "seasons": null,
//   "thumbnail": "https://m.media-amazon.com/images/S/pv-target-images/...",
//   "info": "In the years after the Civil War, Jo March lives in New York...",
//   "genre": "Dram"
// }
```
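
Because every field in the parser result is optional, a caller may want to check which fields came back empty before deciding whether a browser fallback is worth running. The sketch below is one possible way to do that. `missingFields` is a hypothetical helper, not part of the library, and leaving `seasons` out of the default list is an assumption based on the note that it is `null` for movies.

```javascript
// Hypothetical helper: not part of the documented API.
// Returns the names of required fields that are undefined or null.
function missingFields(
  metadata,
  required = ['name', 'year', 'thumbnail', 'info', 'genre']
) {
  // `== null` is true for both undefined and null.
  return required.filter((field) => metadata[field] == null);
}

const sample = { name: 'Little Women', year: '2020', seasons: null, genre: 'Dram' };
console.log(missingFields(sample)); // [ 'thumbnail', 'info' ]
```

An empty array means the static HTML parse was sufficient for the fields the caller cares about.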

## 🔧 URL Processing

### Supported URL Formats
@@ -199,6 +341,36 @@ The function validates URLs with these rules:
'https://www.netflix.com/title/abc' // Non-numeric ID
```

### Amazon Prime Video URL Formats

The `scraperPrime` function automatically normalizes various Prime Video URL formats:

| Input Format | Normalized Output | Notes |
|--------------|-------------------|-------|
| `https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL` | `https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL` | Standard format |
| `https://www.primevideo.com/-/tr/detail/0NHIN3TGAI9L7VZ45RS52RHUPL/ref=share_ios_movie` | `https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL` | Turkish locale with tracking |
| `https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL?ref_=atv_dp` | `https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL` | With parameters |

### Prime Video URL Validation

The function validates URLs with these rules:

1. **Format**: Must be a valid URL
2. **Domain**: Must contain `primevideo.com`
3. **Path**: Must contain `detail/` followed by content ID
4. **ID Extraction**: Uses path parsing to extract content ID

```javascript
// These will work:
'https://www.primevideo.com/detail/0NHIN3TGAI9L7VZ45RS52RHUPL'
'https://www.primevideo.com/-/tr/detail/0NHIN3TGAI9L7VZ45RS52RHUPL/ref=share_ios_movie'

// These will fail:
'https://google.com' // Wrong domain
'https://www.primevideo.com/browse' // No content ID
'not-a-url' // Invalid format
```
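
The validation rules and the normalization table above can be sketched with the standard `URL` class. This is an illustration only: `normalizePrimeUrl` is a hypothetical name, the error messages are invented, and the library's real implementation may differ. In particular, the hostname check here (exact match or subdomain) is stricter than the documented "must contain" wording, which is an assumption.

```javascript
// Hypothetical sketch of the documented rules; not the library's actual code.
function normalizePrimeUrl(inputUrl) {
  let parsed;
  try {
    parsed = new URL(inputUrl); // Rule 1: must be a valid URL
  } catch {
    throw new Error('Invalid URL format');
  }

  // Rule 2: domain check (assumption: exact host or any subdomain).
  const host = parsed.hostname.toLowerCase();
  if (host !== 'primevideo.com' && !host.endsWith('.primevideo.com')) {
    throw new Error('Not a Prime Video URL');
  }

  // Rules 3 and 4: locate the "detail" path segment and take the next one as the ID.
  const segments = parsed.pathname.split('/').filter(Boolean);
  const detailIndex = segments.indexOf('detail');
  const id = detailIndex >= 0 ? segments[detailIndex + 1] : undefined;
  if (!id) {
    throw new Error('No content ID found');
  }

  // Locale prefixes, tracking segments, and query parameters are dropped.
  return { id, url: `https://www.primevideo.com/detail/${id}` };
}

console.log(
  normalizePrimeUrl(
    'https://www.primevideo.com/-/tr/detail/0NHIN3TGAI9L7VZ45RS52RHUPL/ref=share_ios_movie'
  )
);
```

Working on path segments rather than a single regular expression keeps the locale prefix (`/-/tr/`) and the trailing `ref=` segment from affecting the extracted ID.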

## 🌍 Localization Features

### Turkish UI Text Removal
@@ -263,6 +435,37 @@ If JSON-LD is unavailable, falls back to HTML meta tags:
<title>The Witcher izlemenizi bekliyor | Netflix</title>
```

### Thumbnail Image Extraction

The parser automatically extracts poster/thumbnail images from Netflix meta tags:

```javascript
// Thumbnail selectors in priority order:
const THUMBNAIL_SELECTORS = [
  'meta[property="og:image"]', // Open Graph image (primary)
  'meta[name="twitter:image"]', // Twitter card image
  'meta[property="og:image:secure_url"]', // Secure image URL
  'link[rel="image_src"]', // Image source link
  'meta[itemprop="image"]' // Schema.org image
];
```

**Example Netflix HTML:**
```html
<meta property="og:image" content="https://occ-0-7335-778.1.nflxso.net/dnm/api/v6/6AYY37jfdO6hpXcMjf9Yu5cnmO0/AAAABSkrIGPSyEfSWYQzc8rEFo6EtVV6Ls8WtPpNwR42MSKSNPNomZWV5P_l2MxGuJEkoPm71UT_eBK_SsTEH8pRslQr0sjpdhVHjxh4.jpg">
```

**URL Validation:**
- Only Netflix CDN domains are accepted (nflxso.net, nflximg.net, etc.)
- Image file extensions are verified (.jpg, .jpeg, .png, .webp)
- Query parameters are cleaned for stability

**Fallback Strategy:**
1. Try Open Graph image first (most reliable)
2. Fall back to Twitter card image
3. Try other meta tags if needed
4. Return null if no valid thumbnail found
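
The URL validation and fallback steps above can be sketched as follows. This is a hedged illustration, not the parser's actual code: `cleanThumbnailUrl` and `pickThumbnail` are hypothetical names, and the allowlist contains only the two CDN domains named in this document (the "etc." entries are unknown). The candidates are assumed to be already extracted from the meta tags in priority order.

```javascript
// Hypothetical sketch; only the two CDN domains named in the document are listed.
const ALLOWED_CDN_SUFFIXES = ['nflxso.net', 'nflximg.net'];
const IMAGE_EXTENSION = /\.(jpe?g|png|webp)$/i;

function cleanThumbnailUrl(candidate) {
  let parsed;
  try {
    parsed = new URL(candidate);
  } catch {
    return null; // Not a parseable absolute URL
  }
  const host = parsed.hostname.toLowerCase();
  const onCdn = ALLOWED_CDN_SUFFIXES.some(
    (suffix) => host === suffix || host.endsWith(`.${suffix}`)
  );
  if (!onCdn || !IMAGE_EXTENSION.test(parsed.pathname)) return null;
  // Drop the query string and fragment so the URL stays stable.
  return `${parsed.origin}${parsed.pathname}`;
}

// `candidates` holds the content values found by the selectors, in priority order.
function pickThumbnail(candidates) {
  for (const candidate of candidates) {
    const cleaned = cleanThumbnailUrl(candidate);
    if (cleaned) return cleaned;
  }
  return null; // No valid thumbnail found
}

console.log(
  pickThumbnail([
    'https://example.com/poster.jpg',
    'https://occ-0-7335-778.1.nflxso.net/dnm/api/v6/poster.jpg?r=abc',
  ])
);
```

Matching on a hostname suffix rather than a substring avoids accepting look-alike hosts such as `nflxso.net.example.com`.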

### Season Detection

For TV series, extracts season information:
