# MetaScraper Testing Guide
## 🧪 Testing Philosophy
MetaScraper follows a comprehensive testing strategy that ensures reliability, performance, and maintainability:
- **Integration First**: Focus on end-to-end functionality
- **Live Data Testing**: Test against real Netflix pages
- **Performance Awareness**: Monitor response times and resource usage
- **Error Coverage**: Test failure scenarios and edge cases
- **Localization Testing**: Verify Turkish UI text removal
## 📋 Test Structure
### Test Categories
```
tests/
├── scrape.test.js                # Main integration tests
├── unit/                         # Unit tests (future)
│   ├── parser.test.js            # Parser function tests
│   ├── url-normalizer.test.js    # URL normalization tests
│   └── title-cleaner.test.js     # Title cleaning tests
├── integration/                  # Integration tests (current)
│   ├── live-scraping.test.js     # Real Netflix URL tests
│   └── headless-fallback.test.js # Browser fallback tests
├── performance/                  # Performance benchmarks (future)
│   ├── response-times.test.js    # Timing tests
│   └── concurrent.test.js        # Multiple request tests
├── fixtures/                     # Test data
│   ├── sample-title.html         # Sample Netflix HTML
│   ├── turkish-ui.json           # Turkish UI patterns
│   └── test-urls.json            # Test URL collection
└── helpers/                      # Test utilities (future)
    ├── mock-data.js              # Mock HTML generators
    └── test-utils.js             # Common test helpers
```
## 🏗️ Current Test Implementation
### Main Test Suite: `tests/scrape.test.js`
```javascript
import { beforeAll, describe, expect, it } from 'vitest';
import { scraperNetflix } from '../src/index.js';
import { parseNetflixHtml } from '../src/parser.js';

const TEST_URL = 'https://www.netflix.com/title/80189685'; // The Witcher
const UA = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36';

let liveHtml = '';

beforeAll(async () => {
  // Fetch real Netflix page for testing
  const res = await fetch(TEST_URL, {
    headers: {
      'User-Agent': UA,
      Accept: 'text/html,application/xhtml+xml'
    }
  });
  if (!res.ok) {
    throw new Error(`Live fetch failed: ${res.status}`);
  }
  liveHtml = await res.text();
}, 20000); // 20 second timeout for network requests
```
### Test Coverage Areas
#### 1. HTML Parsing Tests
```javascript
describe('parseNetflixHtml (live page)', () => {
  it(
    'reads at least the name and year from static HTML',
    () => {
      const meta = parseNetflixHtml(liveHtml);
      expect(meta.name).toBeTruthy();
      expect(String(meta.name).toLowerCase()).toContain('witcher');
      expect(meta.year).toMatch(/\d{4}/);
    },
    20000
  );
});
```
#### 2. End-to-End Scraping Tests
```javascript
describe('scraperNetflix (live request)', () => {
  it(
    'returns the normalized url, id, and metadata',
    async () => {
      const meta = await scraperNetflix(TEST_URL, { headless: false, userAgent: UA });
      expect(meta.url).toBe('https://www.netflix.com/title/80189685');
      expect(meta.id).toBe('80189685');
      expect(meta.name).toBeTruthy();
      expect(String(meta.name).toLowerCase()).toContain('witcher');
      expect(meta.year).toMatch(/\d{4}/);
    },
    20000
  );
});
```
## 🧪 Running Tests
### Basic Test Commands
```bash
# Run all tests
npm test
# Run tests in watch mode
npm test -- --watch
# Run tests once
npm test -- --run
# Run tests with coverage
npm test -- --coverage
# Run specific test file
npm test -- scrape.test.js
# Run tests whose names match a pattern
npm test -- -t "Turkish"
```
### Test Configuration
```javascript
// vitest.config.js (if needed)
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    testTimeout: 30000, // 30 second timeout for network tests
    hookTimeout: 30000, // Timeout for beforeAll hooks
    environment: 'node', // Node.js environment
    globals: true, // Use global test functions
    coverage: {
      reporter: ['text', 'json', 'lcov'], // lcov writes coverage/lcov.info for the CI upload
      exclude: [
        'node_modules/',
        'tests/',
        'doc/'
      ]
    }
  }
});
```
## 📊 Test Data Management
### Live Test URLs
```javascript
// tests/fixtures/test-urls.json
[
  {
    "name": "The Witcher (TV Series)",
    "url": "https://www.netflix.com/title/80189685",
    "expected": {
      "type": "series",
      "hasSeasons": true,
      "titleContains": "witcher"
    }
  },
  {
    "name": "ONE SHOT (Movie)",
    "url": "https://www.netflix.com/title/82123114",
    "expected": {
      "type": "movie",
      "hasSeasons": false,
      "titleContains": "one shot"
    }
  }
]
```
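A data-driven test can loop over these entries and compare each scrape result with its `expected` block. The helper below is a minimal sketch of that comparison; `checkExpectations` is a hypothetical name, and the assumption that scraped metadata exposes `name` and `seasons` comes from the JSON-LD examples in this guide rather than from the real API. The `type` field is left unchecked because this guide does not show a corresponding metadata property.

```javascript
// Hypothetical helper: compares scraped metadata with one fixture entry's
// `expected` block and returns a list of human-readable mismatches.
// Assumption: `meta.name` is the title and `meta.seasons` is set only for series.
function checkExpectations(meta, expected) {
  const problems = [];

  // Case-insensitive title check, mirroring the `toContain('witcher')` tests.
  const name = String(meta.name ?? '').toLowerCase();
  if (!name.includes(expected.titleContains)) {
    problems.push(`name "${meta.name}" does not contain "${expected.titleContains}"`);
  }

  // A truthy `seasons` value is treated as "has seasons".
  const hasSeasons = Boolean(meta.seasons);
  if (hasSeasons !== expected.hasSeasons) {
    problems.push(`hasSeasons is ${hasSeasons}, expected ${expected.hasSeasons}`);
  }

  return problems;
}
```

Inside a test, `expect(checkExpectations(meta, entry.expected)).toEqual([])` would then print every mismatch at once rather than stopping at the first failed assertion.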
### Sample HTML Fixtures
```html
<!-- tests/fixtures/sample-title.html -->
<!DOCTYPE html>
<html>
  <head>
    <meta property="og:title" content="The Witcher izlemenizi bekliyor | Netflix">
    <meta name="title" content="The Witcher | Netflix">
    <title>The Witcher izlemenizi bekliyor | Netflix</title>
    <script type="application/ld+json">
      {
        "@type": "TVSeries",
        "name": "The Witcher izlemenizi bekliyor",
        "numberOfSeasons": 4,
        "datePublished": "2025"
      }
    </script>
  </head>
  <body>
    <!-- Netflix page content -->
  </body>
</html>
```
### Turkish UI Pattern Tests
```javascript
// tests/fixtures/turkish-ui-patterns.json
{
  "title_cleaning_tests": [
    {
      "input": "The Witcher izlemenizi bekliyor | Netflix",
      "expected": "The Witcher",
      "removed": "izlemenizi bekliyor | Netflix"
    },
    {
      "input": "Stranger Things izleyin",
      "expected": "Stranger Things",
      "removed": "izleyin"
    },
    {
      "input": "Sezon 4 devam et",
      "expected": "Sezon 4",
      "removed": "devam et"
    }
  ]
}
```
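To make the fixture concrete, here is a minimal sketch of a cleaner that would satisfy the three cases above. It is not the project's `cleanTitle`: the name `cleanTitleSketch` and the pattern list are assumptions derived only from the fixture, and the real implementation may cover more phrases or use a different strategy.

```javascript
// Hypothetical pattern list built from the fixture entries only.
const TURKISH_UI_PATTERNS = [
  /\s*\|\s*Netflix\s*$/i,        // trailing "| Netflix" site suffix
  /\s+izlemenizi bekliyor\s*$/i, // "is waiting for you to watch"
  /\s+izleyin\s*$/i,             // "watch"
  /\s+devam et\s*$/i             // "continue"
];

// Strips known UI suffixes from the end of a title.
function cleanTitleSketch(raw) {
  let title = String(raw).trim();
  let changed = true;
  // Repeat until nothing matches, so stacked suffixes
  // ("... izlemenizi bekliyor | Netflix") are all removed.
  while (changed) {
    changed = false;
    for (const pattern of TURKISH_UI_PATTERNS) {
      const next = title.replace(pattern, '').trim();
      if (next !== title) {
        title = next;
        changed = true;
      }
    }
  }
  return title;
}
```

Anchoring every pattern to the end of the string keeps the sketch from touching a title that merely contains one of these words in the middle.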
## 🔧 Test Utilities
### Custom Test Helpers
```javascript
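// tests/helpers/test-utils.js
import fs from 'node:fs';
import path from 'node:path';
import { fileURLToPath } from 'node:url';
import { expect } from 'vitest';
// Assumption: cleanTitle is exported from the parser module; adjust the path if it lives elsewhere.
import { cleanTitle } from '../../src/parser.js';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

// Read a fixture file from tests/fixtures as UTF-8 text.
export function loadFixture(filename) {
  const fixturePath = path.join(__dirname, '../fixtures', filename);
  return fs.readFileSync(fixturePath, 'utf8');
}

// Read and parse a JSON fixture.
export function loadJSONFixture(filename) {
  const content = loadFixture(filename);
  return JSON.parse(content);
}

// Reject if the promise does not settle within timeoutMs.
export async function withTimeout(promise, timeoutMs = 5000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Test timeout after ${timeoutMs}ms`)), timeoutMs);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer); // avoid leaving a pending timer behind
  }
}

// Assert that cleanTitle strips Turkish UI text from the input.
export function expectTurkishTitleClean(input, expected) {
  const result = cleanTitle(input);
  expect(result).toBe(expected);
}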
```
### Mock Browser Automation
```javascript
// tests/helpers/mock-playwright.js
import { vi } from 'vitest';

export function mockPlaywrightSuccess(html) {
  vi.doMock('playwright', () => ({
    chromium: {
      launch: vi.fn(() => ({
        newContext: vi.fn(() => ({
          newPage: vi.fn(() => ({
            goto: vi.fn().mockResolvedValue(undefined),
            content: vi.fn().mockResolvedValue(html),
            waitForLoadState: vi.fn().mockResolvedValue(undefined)
          }))
        })),
        close: vi.fn().mockResolvedValue(undefined)
      }))
    }
  }));
}

export function mockPlaywrightFailure() {
  vi.doMock('playwright', () => {
    throw new Error('Playwright not available');
  });
}
```
## 🎯 Test Scenarios
### 1. URL Normalization Tests
```javascript
describe('URL Normalization', () => {
  const testCases = [
    {
      input: 'https://www.netflix.com/tr/title/80189685?s=i&vlang=tr',
      expected: 'https://www.netflix.com/title/80189685',
      description: 'Turkish URL with parameters'
    },
    {
      input: 'https://www.netflix.com/title/80189685?trackId=12345',
      expected: 'https://www.netflix.com/title/80189685',
      description: 'URL with tracking parameters'
    }
  ];

  testCases.forEach(({ input, expected, description }) => {
    it(description, () => {
      const result = normalizeNetflixUrl(input);
      expect(result).toBe(expected);
    });
  });
});
```
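For readers who want to see what these cases imply, the following is a rough sketch of a normalizer that would pass them. It is an illustration under assumptions, not the project's `normalizeNetflixUrl`: the function name `normalizeNetflixUrlSketch`, the host check, and the English error messages are all hypothetical.

```javascript
// Hypothetical sketch: reduces any Netflix title URL to its canonical form.
function normalizeNetflixUrlSketch(input) {
  // The WHATWG URL constructor throws a TypeError on malformed input.
  const url = new URL(input);

  // Accept netflix.com and its subdomains only.
  const host = url.hostname.toLowerCase();
  if (host !== 'netflix.com' && !host.endsWith('.netflix.com')) {
    throw new Error('URL must point to netflix.com');
  }

  // Find "/title/<digits>" anywhere in the path, which also covers
  // locale-prefixed paths such as "/tr/title/80189685".
  const match = url.pathname.match(/\/title\/(\d+)/);
  if (!match) {
    throw new Error('No Netflix title ID found in URL');
  }

  // Rebuilding the URL from the ID drops the locale prefix and all query parameters.
  return `https://www.netflix.com/title/${match[1]}`;
}
```

Rebuilding the URL from the extracted ID, instead of deleting known query parameters one by one, means new tracking parameters cannot leak into the normalized result.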
### 2. Turkish UI Text Removal Tests
```javascript
describe('Turkish UI Text Cleaning', () => {
  const turkishCases = [
    {
      input: 'The Witcher izlemenizi bekliyor',
      expected: 'The Witcher',
      pattern: 'waiting for you to watch'
    },
    {
      input: 'Dark izleyin',
      expected: 'Dark',
      pattern: 'watch'
    },
    {
      input: 'Money Heist devam et',
      expected: 'Money Heist',
      pattern: 'continue'
    }
  ];

  turkishCases.forEach(({ input, expected, pattern }) => {
    it(`removes Turkish UI text: ${pattern}`, () => {
      expect(cleanTitle(input)).toBe(expected);
    });
  });
});
```
### 3. JSON-LD Parsing Tests
```javascript
describe('JSON-LD Metadata Extraction', () => {
  it('extracts movie metadata correctly', () => {
    const jsonLd = {
      '@type': 'Movie',
      'name': 'Inception',
      'datePublished': '2010',
      'copyrightYear': 2010
    };
    const result = parseJsonLdObject(jsonLd);
    expect(result.name).toBe('Inception');
    expect(result.year).toBe(2010);
    expect(result.seasons).toBeUndefined();
  });

  it('extracts TV series metadata with seasons', () => {
    const jsonLd = {
      '@type': 'TVSeries',
      'name': 'Stranger Things',
      'numberOfSeasons': 4,
      'datePublished': '2016'
    };
    const result = parseJsonLdObject(jsonLd);
    expect(result.name).toBe('Stranger Things');
    expect(result.seasons).toBe('4 Sezon');
  });
});
```
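The tests above feed an already-parsed object to `parseJsonLdObject`. As a rough illustration of the whole path from HTML to metadata, the sketch below pulls `application/ld+json` blocks out of a page and maps one object to the shape the tests expect. Both function names are hypothetical, and the field mapping (year taken from `copyrightYear`, falling back to the first four digits of `datePublished`) is an assumption inferred from the assertions, not a description of the real parser.

```javascript
// Hypothetical sketch: collects every parseable JSON-LD block from an HTML string.
function extractJsonLdBlocks(html) {
  const blocks = [];
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  for (const match of html.matchAll(re)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // Skip blocks that are not valid JSON instead of failing the whole parse.
    }
  }
  return blocks;
}

// Hypothetical sketch: maps one JSON-LD object to { name, year, seasons }.
function parseJsonLdObjectSketch(jsonLd) {
  const yearMatch = String(jsonLd.datePublished ?? '').match(/\d{4}/);
  const year = jsonLd.copyrightYear ?? (yearMatch ? Number(yearMatch[0]) : undefined);
  // Assumption: seasons are rendered as "<n> Sezon", as in the test above.
  const seasons = jsonLd.numberOfSeasons ? `${jsonLd.numberOfSeasons} Sezon` : undefined;
  return { name: jsonLd.name, year, seasons };
}
```

A regular expression is enough for a test helper like this, but it is not a general HTML parser; a page with unusual script attributes could slip past it.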
### 4. Error Handling Tests
```javascript
describe('Error Handling', () => {
  it('throws error for invalid URL', async () => {
    // "Invalid URL provided"
    await expect(scraperNetflix('invalid-url')).rejects.toThrow('Geçersiz URL sağlandı');
  });

  it('throws error for non-Netflix URL', async () => {
    // "URL must point to netflix.com"
    await expect(scraperNetflix('https://google.com')).rejects.toThrow('URL netflix.com adresini göstermelidir');
  });

  it('throws error for URL without title ID', async () => {
    // "No Netflix title ID found in URL"
    await expect(scraperNetflix('https://www.netflix.com/browse')).rejects.toThrow('URL\'de Netflix başlık ID\'si bulunamadı');
  });

  it('handles network timeouts gracefully', async () => {
    await expect(scraperNetflix(TEST_URL, { timeoutMs: 1 })).rejects.toThrow('Request timed out');
  });
});
```
### 5. Performance Tests
```javascript
describe('Performance', () => {
  it('completes static scraping within 1 second', async () => {
    const start = performance.now();
    await scraperNetflix(TEST_URL, { headless: false });
    const duration = performance.now() - start;
    expect(duration).toBeLessThan(1000);
  }, 10000);

  it('handles concurrent requests efficiently', async () => {
    const urls = Array(5).fill(TEST_URL);
    const start = performance.now();
    const results = await Promise.allSettled(
      urls.map(url => scraperNetflix(url, { headless: false }))
    );
    const duration = performance.now() - start;
    const successful = results.filter(r => r.status === 'fulfilled').length;
    expect(duration).toBeLessThan(3000); // Should be faster than sequential
    expect(successful).toBeGreaterThan(0); // At least some should succeed
  }, 30000);
});
```
## 🔍 Test Debugging
### 1. Visual HTML Inspection
```javascript
// Save HTML for manual debugging
// Assumes `fs` (from 'node:fs') and `fetchStaticHtml` are imported in the test file
it('captures HTML for debugging', async () => {
  const html = await fetchStaticHtml(TEST_URL);
  fs.writeFileSync('debug-netflix-page.html', html);
  console.log('HTML saved to debug-netflix-page.html');
  expect(html).toContain('<html');
  expect(html).toContain('netflix');
});
```
### 2. Network Request Debugging
```javascript
// Debug network requests
it('logs network request details', async () => {
  const originalFetch = globalThis.fetch;
  globalThis.fetch = async (url, options) => {
    console.log('🌐 Request URL:', url);
    console.log('📋 Headers:', options?.headers); // options may be undefined
    console.log('⏰ Time:', new Date().toISOString());
    const response = await originalFetch(url, options);
    console.log('📊 Response status:', response.status);
    console.log('📏 Response size:', response.headers.get('content-length'));
    return response;
  };
  try {
    const result = await scraperNetflix(TEST_URL, { headless: false });
    expect(result.name).toBeTruthy();
  } finally {
    // Restore original fetch even if the scrape or the assertion fails
    globalThis.fetch = originalFetch;
  }
});
```
### 3. Step-by-Step Processing
```javascript
// Debug each step of the process
it('logs processing steps', async () => {
  console.log('🚀 Starting Netflix scraping test');

  // Step 1: URL normalization
  const normalized = normalizeNetflixUrl(TEST_URL);
  console.log('🔗 Normalized URL:', normalized);

  // Step 2: HTML fetch
  const html = await fetchStaticHtml(normalized);
  console.log('📄 HTML length:', html.length);

  // Step 3: Parsing
  const parsed = parseNetflixHtml(html);
  console.log('📊 Parsed metadata:', parsed);

  // Step 4: Full process
  const fullResult = await scraperNetflix(TEST_URL);
  console.log('✅ Full result:', fullResult);
  expect(fullResult.name).toBeTruthy();
});
```
## 📈 Continuous Testing
### GitHub Actions Workflow
```yaml
# .github/workflows/test.yml
name: Test Suite
on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [18.x, 20.x, 22.x]
    steps:
      - uses: actions/checkout@v3
      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Install Playwright
        run: npx playwright install chromium
      - name: Run tests
        run: npm test -- --coverage
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage/lcov.info
```
### Pre-commit Hooks
```json
// package.json
{
  "husky": {
    "hooks": {
      "pre-commit": "npm test && npm run lint"
    }
  }
}
```
## 🚨 Test Environment Considerations
### Network Dependencies
- **Live Tests**: Require internet connection to Netflix
- **Timeouts**: Extended timeouts for network requests (30s+)
- **Rate Limiting**: Be respectful to Netflix's servers
- **Geographic**: Tests may behave differently by region
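One way to act on the rate-limiting point is to run live requests one at a time with a pause between them, rather than firing them all at once. The helper below is a sketch of that idea; `runSequentially` and its one-second default delay are hypothetical choices, not part of MetaScraper.

```javascript
// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: runs `task(item)` for each item strictly in order,
// waiting `delayMs` between calls. Results use the same shape as
// Promise.allSettled so existing assertions can be reused.
async function runSequentially(items, task, delayMs = 1000) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    if (i > 0) await sleep(delayMs); // no delay before the first call
    try {
      results.push({ status: 'fulfilled', value: await task(items[i]) });
    } catch (reason) {
      results.push({ status: 'rejected', reason });
    }
  }
  return results;
}
```

In a live test this could be called as `runSequentially(urls, (url) => scraperNetflix(url, { headless: false }))`, with the test timeout raised to cover the added delays.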
### Browser Dependencies
- **Playwright**: Optional dependency for headless tests
- **Browser Installation**: Requires `npx playwright install`
- **Memory**: Browser tests use more memory
- **CI/CD**: Need to install browsers in CI environment
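Because Playwright is optional, headless tests are easier to maintain if they skip themselves when the package is missing. The sketch below shows one way to detect that at runtime; `isModuleAvailable` is a hypothetical helper and is not part of the project.

```javascript
// Hypothetical helper: reports whether a module can be loaded at runtime.
// Uses a dynamic import so a missing optional dependency does not crash the file.
async function isModuleAvailable(name) {
  try {
    await import(name);
    return true;
  } catch {
    return false;
  }
}
```

In an ESM test file the result could be computed once with `const hasPlaywright = await isModuleAvailable('playwright');` and passed to Vitest's `describe.skipIf(!hasPlaywright)`, so browser suites are reported as skipped rather than failed. Note that this only checks the npm package; the browser binaries from `npx playwright install` are a separate requirement.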
### Test Data Updates
- **Netflix Changes**: UI changes may break tests
- **Pattern Updates**: Turkish UI patterns may change
- **JSON-LD Structure**: Netflix may modify structured data
- **URL Formats**: New URL patterns may emerge
## 📊 Test Metrics
### Success Criteria
- **Unit Tests**: 90%+ code coverage
- **Integration Tests**: 100% API coverage
- **Performance**: <1s response time for static mode
- **Reliability**: 95%+ success rate for known URLs
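The performance and reliability thresholds can be checked mechanically once each run records whether it succeeded and how long it took. The following is a minimal sketch under that assumption; the `evaluateRuns` name and the `{ ok, durationMs }` record shape are hypothetical, while the two thresholds come from the criteria above.

```javascript
// Thresholds taken from the success criteria above.
const CRITERIA = { maxStaticMs: 1000, minSuccessRate: 0.95 };

// Hypothetical helper: `runs` is an array of { ok: boolean, durationMs: number }.
function evaluateRuns(runs) {
  const successes = runs.filter((run) => run.ok);
  const successRate = runs.length ? successes.length / runs.length : 0;
  // Only successful runs count toward the response-time criterion.
  const slowestMs = successes.reduce((max, run) => Math.max(max, run.durationMs), 0);
  return {
    successRate,
    slowestMs,
    passed: successRate >= CRITERIA.minSuccessRate && slowestMs < CRITERIA.maxStaticMs
  };
}
```

Checking the slowest successful run is a strict reading of "<1s response time"; a percentile-based limit would be more forgiving of a single network hiccup.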
### Test Monitoring
```javascript
// Performance tracking
const testMetrics = {
  staticScrapingTimes: [],
  headlessScrapingTimes: [],
  successRates: {},
  errorCounts: {}
};

function recordMetric(type, value) {
  if (Array.isArray(testMetrics[type])) {
    testMetrics[type].push(value);
  } else {
    testMetrics[type][value] = (testMetrics[type][value] || 0) + 1;
  }
}
```
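Recorded timings become more useful once they are summarized. The sketch below computes a mean and a nearest-rank 95th percentile for an array such as `staticScrapingTimes`; `summarizeTimings` is a hypothetical helper, and the choice of the nearest-rank method is an assumption made for simplicity.

```javascript
// Hypothetical helper: summarizes an array of durations in milliseconds.
function summarizeTimings(times) {
  if (times.length === 0) return null;

  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...times].sort((a, b) => a - b);
  const sum = sorted.reduce((total, t) => total + t, 0);

  // Nearest-rank percentile: the smallest value with at least 95% of samples at or below it.
  const rank = Math.ceil(0.95 * sorted.length);

  return {
    count: sorted.length,
    mean: sum / sorted.length,
    p95: sorted[rank - 1],
    max: sorted[sorted.length - 1]
  };
}
```

For a handful of samples the 95th percentile collapses to the maximum, so the summary only becomes informative once a few dozen runs have been recorded.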
---
*Testing guide last updated: 2025-11-23*