# MetaScraper Testing Guide
## Testing Philosophy
MetaScraper's testing strategy aims for reliability, performance, and maintainability through five principles:
- **Integration First**: Focus on end-to-end functionality
- **Live Data Testing**: Test against real Netflix pages
- **Performance Awareness**: Monitor response times and resource usage
- **Error Coverage**: Test failure scenarios and edge cases
- **Localization Testing**: Verify Turkish UI text removal
## Test Structure
### Test Categories
```
tests/
├── scrape.test.js                  # Main integration tests
├── unit/                           # Unit tests (future)
│   ├── parser.test.js              # Parser function tests
│   ├── url-normalizer.test.js      # URL normalization tests
│   └── title-cleaner.test.js       # Title cleaning tests
├── integration/                    # Integration tests (current)
│   ├── live-scraping.test.js       # Real Netflix URL tests
│   └── headless-fallback.test.js   # Browser fallback tests
├── performance/                    # Performance benchmarks (future)
│   ├── response-times.test.js      # Timing tests
│   └── concurrent.test.js          # Multiple request tests
├── fixtures/                       # Test data
│   ├── sample-title.html           # Sample Netflix HTML
│   ├── turkish-ui.json             # Turkish UI patterns
│   └── test-urls.json              # Test URL collection
└── helpers/                        # Test utilities (future)
    ├── mock-data.js                # Mock HTML generators
    └── test-utils.js               # Common test helpers
```
## Current Test Implementation
### Main Test Suite: `tests/scrape.test.js`
```javascript
import { beforeAll, describe, expect, it } from 'vitest';
import { scraperNetflix } from '../src/index.js';
import { parseNetflixHtml } from '../src/parser.js';

const TEST_URL = 'https://www.netflix.com/title/80189685'; // The Witcher
const UA = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36';

let liveHtml = '';

beforeAll(async () => {
  // Fetch the real Netflix page once and share it across tests
  const res = await fetch(TEST_URL, {
    headers: {
      'User-Agent': UA,
      Accept: 'text/html,application/xhtml+xml'
    }
  });
  if (!res.ok) {
    throw new Error(`Live fetch başarısız: ${res.status}`); // "Live fetch failed"
  }
  liveHtml = await res.text();
}, 20000); // 20 second timeout for network requests
```
### Test Coverage Areas
#### 1. HTML Parsing Tests
```javascript
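// Suite and test names are Turkish: "parseNetflixHtml (live page)" and
// "reads at least the name and year from static HTML"
describe('parseNetflixHtml (canlı sayfa)', () => {
  it(
    'static HTML\'den en az isim ve yıl bilgisini okur',
    () => {
      const meta = parseNetflixHtml(liveHtml);
      expect(meta.name).toBeTruthy();
      expect(String(meta.name).toLowerCase()).toContain('witcher');
      expect(String(meta.year)).toMatch(/\d{4}/); // String() so a numeric year also passes
    },
    20000
  );
});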
```
#### 2. End-to-End Scraping Tests
```javascript
// Suite and test names are Turkish: "scraperNetflix (live request)" and
// "returns the normalized url, id and meta information"
describe('scraperNetflix (canlı istek)', () => {
  it(
    'normalize edilmiş url, id ve meta bilgilerini döner',
    async () => {
      const meta = await scraperNetflix(TEST_URL, { headless: false, userAgent: UA });
      expect(meta.url).toBe('https://www.netflix.com/title/80189685');
      expect(meta.id).toBe('80189685');
      expect(meta.name).toBeTruthy();
      expect(String(meta.name).toLowerCase()).toContain('witcher');
      expect(String(meta.year)).toMatch(/\d{4}/); // String() so a numeric year also passes
    },
    20000
  );
});
```
## Running Tests
### Basic Test Commands
```bash
# Run all tests
npm test
# Run tests in watch mode
npm test -- --watch
# Run tests once
npm test -- --run
# Run tests with coverage
npm test -- --coverage
# Run a specific test file
npm test -- scrape.test.js
# Run tests whose names match a pattern (Vitest uses -t / --testNamePattern)
npm test -- -t "Turkish"
```
### Test Configuration
```javascript
// vitest.config.js (if needed)
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    testTimeout: 30000, // 30 second timeout for network tests
    hookTimeout: 30000, // Timeout for beforeAll hooks
    environment: 'node', // Node.js environment
    globals: true, // Use global test functions
    coverage: {
      // 'lcov' writes coverage/lcov.info, which the CI upload step expects
      reporter: ['text', 'json', 'lcov'],
      exclude: [
        'node_modules/',
        'tests/',
        'doc/'
      ]
    }
  }
});
```
## Test Data Management
### Live Test URLs
```javascript
// tests/fixtures/test-urls.json
[
  {
    "name": "The Witcher (TV Series)",
    "url": "https://www.netflix.com/title/80189685",
    "expected": {
      "type": "series",
      "hasSeasons": true,
      "titleContains": "witcher"
    }
  },
  {
    "name": "ONE SHOT (Movie)",
    "url": "https://www.netflix.com/title/82123114",
    "expected": {
      "type": "movie",
      "hasSeasons": false,
      "titleContains": "one shot"
    }
  }
]
```
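Before these entries drive live tests, it can help to confirm that each one is well formed. The sketch below shows one possible check; `checkFixtureEntry` is a hypothetical helper, not part of MetaScraper, and it assumes the entry shape shown in the fixture above.

```javascript
// Hypothetical helper: returns a list of problems found in one fixture entry.
// Assumes the entry shape used in tests/fixtures/test-urls.json.
const TITLE_URL = /^https:\/\/www\.netflix\.com\/title\/\d+$/;
const KNOWN_TYPES = ['series', 'movie'];

function checkFixtureEntry(entry) {
  const problems = [];
  const expected = entry.expected ?? {};

  if (!TITLE_URL.test(entry.url ?? '')) {
    problems.push('url is not a canonical /title/<id> URL');
  }
  if (!KNOWN_TYPES.includes(expected.type)) {
    problems.push('expected.type must be "series" or "movie"');
  }
  if (typeof expected.hasSeasons !== 'boolean') {
    problems.push('expected.hasSeasons must be a boolean');
  }
  // titleContains is compared against a lowercased title, so it must be lowercase.
  if (typeof expected.titleContains !== 'string' ||
      expected.titleContains !== expected.titleContains.toLowerCase()) {
    problems.push('expected.titleContains must be a lowercase string');
  }
  return problems;
}

const sample = {
  name: 'The Witcher (TV Series)',
  url: 'https://www.netflix.com/title/80189685',
  expected: { type: 'series', hasSeasons: true, titleContains: 'witcher' }
};
console.log(checkFixtureEntry(sample)); // []
```

Running a check like this once at the top of a data-driven suite turns a malformed fixture into a clear failure instead of a confusing assertion error later.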
### Sample HTML Fixtures
```html
<title>The Witcher izlemenizi bekliyor | Netflix</title>
```
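Static fixtures can also be generated on the fly, which is the role the planned `tests/helpers/mock-data.js` would play. The following is a minimal sketch of such a generator; `buildTitlePageHtml` and `escapeHtml` are hypothetical names, and the page layout (a `<title>` plus one JSON-LD `<script>`) is an assumption about what the parser reads.

```javascript
// Hypothetical mock-data helper: builds a minimal title page for parser tests.
// The <title> + JSON-LD <script> layout is an assumption, not Netflix's real markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function buildTitlePageHtml({ title, jsonLd }) {
  // Escape "<" so a "</script>" inside the data cannot end the script element early.
  const json = JSON.stringify(jsonLd).replace(/</g, '\\u003c');
  return [
    '<!DOCTYPE html>',
    '<html lang="tr">',
    '<head>',
    `<title>${escapeHtml(title)}</title>`,
    `<script type="application/ld+json">${json}</script>`,
    '</head>',
    '<body></body>',
    '</html>'
  ].join('\n');
}

const mockHtml = buildTitlePageHtml({
  title: 'Stranger Things izleyin',
  jsonLd: { '@type': 'TVSeries', name: 'Stranger Things', numberOfSeasons: 4 }
});
```

The resulting string could be passed to `parseNetflixHtml` in a unit test, so parser behavior can be checked without a network request.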
### Turkish UI Pattern Tests
```javascript
// tests/fixtures/turkish-ui-patterns.json
{
  "title_cleaning_tests": [
    {
      "input": "The Witcher izlemenizi bekliyor | Netflix",
      "expected": "The Witcher",
      "removed": "izlemenizi bekliyor | Netflix"
    },
    {
      "input": "Stranger Things izleyin",
      "expected": "Stranger Things",
      "removed": "izleyin"
    },
    {
      "input": "Sezon 4 devam et",
      "expected": "Sezon 4",
      "removed": "devam et"
    }
  ]
}
```
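To make the fixture format concrete, here is a rough sketch of how its cases could be checked against a cleaner. `stripUiText` is a simplified stand-in written for this example, not MetaScraper's actual `cleanTitle`, and its suffix list is taken only from the three cases above.

```javascript
// Hypothetical suffix list taken from the fixture; the real cleaner may use more patterns.
const UI_SUFFIXES = ['izlemenizi bekliyor', 'izleyin', 'devam et'];

// Sketch of a title cleaner: drop the "| Netflix" site suffix, then any trailing UI phrase.
function stripUiText(title) {
  let cleaned = title.replace(/\s*\|\s*Netflix\s*$/i, '').trim();
  for (const suffix of UI_SUFFIXES) {
    if (cleaned.endsWith(suffix)) {
      cleaned = cleaned.slice(0, -suffix.length).trim();
    }
  }
  return cleaned;
}

// Returns the fixture cases whose cleaned output differs from "expected".
function findFailingCases(cases, cleaner) {
  return cases.filter(({ input, expected }) => cleaner(input) !== expected);
}

const cleaningCases = [
  { input: 'The Witcher izlemenizi bekliyor | Netflix', expected: 'The Witcher' },
  { input: 'Stranger Things izleyin', expected: 'Stranger Things' },
  { input: 'Sezon 4 devam et', expected: 'Sezon 4' }
];
console.log(findFailingCases(cleaningCases, stripUiText).length); // 0
```

Passing the real `cleanTitle` as the `cleaner` argument would reuse the same fixture without changing the loop.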
## Test Utilities
### Custom Test Helpers
```javascript
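// tests/helpers/test-utils.js
import fs from 'node:fs';
import path from 'node:path';
import { fileURLToPath } from 'node:url';
import { expect } from 'vitest';
// Assumed location of cleanTitle; adjust to wherever the project exports it
import { cleanTitle } from '../../src/parser.js';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

// Read a fixture file from tests/fixtures as UTF-8 text
export function loadFixture(filename) {
  const fixturePath = path.join(__dirname, '../fixtures', filename);
  return fs.readFileSync(fixturePath, 'utf8');
}

// Read and parse a JSON fixture
export function loadJSONFixture(filename) {
  return JSON.parse(loadFixture(filename));
}

// Reject if the promise does not settle within timeoutMs
export async function withTimeout(promise, timeoutMs = 5000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Test timeout after ${timeoutMs}ms`)), timeoutMs);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer); // avoid leaving a pending timer behind
  }
}

// Assert that cleanTitle strips Turkish UI text from the input
export function expectTurkishTitleClean(input, expected) {
  expect(cleanTitle(input)).toBe(expected);
}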
```
### Mock Browser Automation
```javascript
// tests/helpers/mock-playwright.js
import { vi } from 'vitest';

// vi.doMock is not hoisted, so call these helpers before the code under test
// imports 'playwright' (for example via a dynamic import()).
export function mockPlaywrightSuccess(html) {
  vi.doMock('playwright', () => ({
    chromium: {
      launch: vi.fn(() => ({
        newContext: vi.fn(() => ({
          newPage: vi.fn(() => ({
            goto: vi.fn().mockResolvedValue(undefined),
            content: vi.fn().mockResolvedValue(html),
            waitForLoadState: vi.fn().mockResolvedValue(undefined)
          }))
        })),
        close: vi.fn().mockResolvedValue(undefined)
      }))
    }
  }));
}

export function mockPlaywrightFailure() {
  vi.doMock('playwright', () => {
    throw new Error('Playwright not available');
  });
}
```
## Test Scenarios
### 1. URL Normalization Tests
```javascript
import { describe, expect, it } from 'vitest';
// Assumed path; adjust to wherever the project exports normalizeNetflixUrl
import { normalizeNetflixUrl } from '../../src/index.js';

describe('URL Normalization', () => {
  const testCases = [
    {
      input: 'https://www.netflix.com/tr/title/80189685?s=i&vlang=tr',
      expected: 'https://www.netflix.com/title/80189685',
      description: 'Turkish URL with parameters'
    },
    {
      input: 'https://www.netflix.com/title/80189685?trackId=12345',
      expected: 'https://www.netflix.com/title/80189685',
      description: 'URL with tracking parameters'
    }
  ];

  testCases.forEach(({ input, expected, description }) => {
    it(description, () => {
      const result = normalizeNetflixUrl(input);
      expect(result).toBe(expected);
    });
  });
});
```
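For readers who want to see what behavior these cases pin down, the following sketch shows one way a normalizer could satisfy them. `normalizeTitleUrl` is a hypothetical function written for illustration; the real `normalizeNetflixUrl` may handle more URL shapes and, judging by the error handling tests, throws on invalid input rather than returning `null`.

```javascript
// Sketch only: not the project's normalizeNetflixUrl.
// Returns the canonical title URL, or null when no numeric title ID is present.
function normalizeTitleUrl(input) {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    return null; // not a parseable absolute URL
  }
  if (!/(^|\.)netflix\.com$/.test(parsed.hostname)) {
    return null; // not a netflix.com host
  }
  // Matches /title/<id> anywhere in the path, so a locale prefix such as /tr is ignored.
  // Query parameters live in parsed.search and are dropped automatically.
  const match = parsed.pathname.match(/\/title\/(\d+)/);
  return match ? `https://www.netflix.com/title/${match[1]}` : null;
}

console.log(normalizeTitleUrl('https://www.netflix.com/tr/title/80189685?s=i&vlang=tr'));
// https://www.netflix.com/title/80189685
```

Parsing with the built-in `URL` class rather than a single regular expression keeps host validation and path matching separate, which makes each failure mode easy to test on its own.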
### 2. Turkish UI Text Removal Tests
```javascript
import { describe, expect, it } from 'vitest';
// Assumed path; adjust to wherever the project exports cleanTitle
import { cleanTitle } from '../../src/parser.js';

describe('Turkish UI Text Cleaning', () => {
  // "pattern" is the English meaning of the Turkish phrase being removed
  const turkishCases = [
    {
      input: 'The Witcher izlemenizi bekliyor',
      expected: 'The Witcher',
      pattern: 'waiting for you to watch'
    },
    {
      input: 'Dark izleyin',
      expected: 'Dark',
      pattern: 'watch'
    },
    {
      input: 'Money Heist devam et',
      expected: 'Money Heist',
      pattern: 'continue'
    }
  ];

  turkishCases.forEach(({ input, expected, pattern }) => {
    it(`removes Turkish UI text: ${pattern}`, () => {
      expect(cleanTitle(input)).toBe(expected);
    });
  });
});
```
### 3. JSON-LD Parsing Tests
```javascript
import { describe, expect, it } from 'vitest';
// Assumed path; adjust to wherever the project exports parseJsonLdObject
import { parseJsonLdObject } from '../../src/parser.js';

describe('JSON-LD Metadata Extraction', () => {
  it('extracts movie metadata correctly', () => {
    const jsonLd = {
      '@type': 'Movie',
      'name': 'Inception',
      'datePublished': '2010',
      'copyrightYear': 2010
    };
    const result = parseJsonLdObject(jsonLd);
    expect(result.name).toBe('Inception');
    expect(result.year).toBe(2010);
    expect(result.seasons).toBeUndefined();
  });

  it('extracts TV series metadata with seasons', () => {
    const jsonLd = {
      '@type': 'TVSeries',
      'name': 'Stranger Things',
      'numberOfSeasons': 4,
      'datePublished': '2016'
    };
    const result = parseJsonLdObject(jsonLd);
    expect(result.name).toBe('Stranger Things');
    expect(result.seasons).toBe('4 Sezon'); // "4 Seasons"
  });
});
```
### 4. Error Handling Tests
```javascript
describe('Error Handling', () => {
  it('throws error for invalid URL', async () => {
    // "Invalid URL provided"
    await expect(scraperNetflix('invalid-url')).rejects.toThrow('Geçersiz URL sağlandı');
  });

  it('throws error for non-Netflix URL', async () => {
    // "URL must point to netflix.com"
    await expect(scraperNetflix('https://google.com')).rejects.toThrow('URL netflix.com adresini göstermelidir');
  });

  it('throws error for URL without title ID', async () => {
    // "Netflix title ID not found in URL"
    await expect(scraperNetflix('https://www.netflix.com/browse')).rejects.toThrow('URL\'de Netflix başlık ID\'si bulunamadı');
  });

  it('handles network timeouts gracefully', async () => {
    await expect(scraperNetflix(TEST_URL, { timeoutMs: 1 })).rejects.toThrow('Request timed out');
  });
});
```
### 5. Performance Tests
```javascript
describe('Performance', () => {
  it('completes static scraping within 1 second', async () => {
    const start = performance.now();
    await scraperNetflix(TEST_URL, { headless: false });
    const duration = performance.now() - start;
    expect(duration).toBeLessThan(1000);
  }, 10000);

  it('handles concurrent requests efficiently', async () => {
    const urls = Array(5).fill(TEST_URL);
    const start = performance.now();
    const results = await Promise.allSettled(
      urls.map(url => scraperNetflix(url, { headless: false }))
    );
    const duration = performance.now() - start;
    const successful = results.filter(r => r.status === 'fulfilled').length;
    expect(duration).toBeLessThan(3000); // Should be faster than sequential
    expect(successful).toBeGreaterThan(0); // At least some should succeed
  }, 30000);
});
```
## Test Debugging
### 1. Visual HTML Inspection
```javascript
// Save HTML for manual debugging
// Assumes `fs` (node:fs) and the project's fetchStaticHtml helper are imported
it('captures HTML for debugging', async () => {
  const html = await fetchStaticHtml(TEST_URL);
  fs.writeFileSync('debug-netflix-page.html', html);
  console.log('HTML saved to debug-netflix-page.html');
  expect(html).toContain('<html');
});
```
### 2. Network Request Logging
```javascript
// Log every outgoing request made while scraping
it('logs network requests', async () => {
  const originalFetch = global.fetch;
  global.fetch = async (url, options) => {
    console.log('Request URL:', url);
    console.log('Headers:', options?.headers);
    console.log('Time:', new Date().toISOString());
    const response = await originalFetch(url, options);
    console.log('Response status:', response.status);
    console.log('Response size:', response.headers.get('content-length'));
    return response;
  };
  try {
    const result = await scraperNetflix(TEST_URL, { headless: false });
    expect(result.name).toBeTruthy();
  } finally {
    // Restore the original fetch even if the scrape or assertion fails
    global.fetch = originalFetch;
  }
});
```
### 3. Step-by-Step Processing
```javascript
// Debug each step of the process
// Assumes normalizeNetflixUrl and fetchStaticHtml are imported from the project source
it('logs processing steps', async () => {
  console.log('Starting Netflix scraping test');

  // Step 1: URL normalization
  const normalized = normalizeNetflixUrl(TEST_URL);
  console.log('Normalized URL:', normalized);

  // Step 2: HTML fetch
  const html = await fetchStaticHtml(normalized);
  console.log('HTML length:', html.length);

  // Step 3: Parsing
  const parsed = parseNetflixHtml(html);
  console.log('Parsed metadata:', parsed);

  // Step 4: Full process
  const fullResult = await scraperNetflix(TEST_URL);
  console.log('Full result:', fullResult);
  expect(fullResult.name).toBeTruthy();
});
```
## Continuous Testing
### GitHub Actions Workflow
```yaml
# .github/workflows/test.yml
name: Test Suite

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [18.x, 20.x, 22.x]
    steps:
      - uses: actions/checkout@v3
      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Install Playwright
        run: npx playwright install chromium
      - name: Run tests
        run: npm test -- --coverage
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage/lcov.info
```
### Pre-commit Hooks
```json
// package.json
{
  "husky": {
    "hooks": {
      "pre-commit": "npm test && npm run lint"
    }
  }
}
```
## Test Environment Considerations
### Network Dependencies
- **Live Tests**: Require internet connection to Netflix
- **Timeouts**: Extended timeouts for network requests (30s+)
- **Rate Limiting**: Be respectful to Netflix's servers
- **Geographic**: Tests may behave differently by region
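One way to keep live traffic deliberate is to make network tests opt-in. The sketch below assumes a hypothetical environment variable named `RUN_LIVE_TESTS`; neither the variable nor the `liveTestsEnabled` helper exists in MetaScraper today.

```javascript
// Hypothetical opt-in switch: RUN_LIVE_TESTS is an assumed variable name,
// not an existing MetaScraper setting.
function liveTestsEnabled(env = process.env) {
  const value = (env.RUN_LIVE_TESTS ?? '').trim().toLowerCase();
  return value === '1' || value === 'true';
}

console.log(liveTestsEnabled({ RUN_LIVE_TESTS: 'true' })); // true
console.log(liveTestsEnabled({})); // false
```

In a Vitest suite the result could be passed to `describe.skipIf(!liveTestsEnabled())`, so that suites hitting Netflix are skipped unless explicitly requested, for example in offline development or on forks without network access.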
### Browser Dependencies
- **Playwright**: Optional dependency for headless tests
- **Browser Installation**: Requires `npx playwright install`
- **Memory**: Browser tests use more memory
- **CI/CD**: Need to install browsers in CI environment
### Test Data Updates
- **Netflix Changes**: UI changes may break tests
- **Pattern Updates**: Turkish UI patterns may change
- **JSON-LD Structure**: Netflix may modify structured data
- **URL Formats**: New URL patterns may emerge
## Test Metrics
### Success Criteria
- **Unit Tests**: 90%+ code coverage
- **Integration Tests**: 100% API coverage
- **Performance**: <1s response time for static mode
- **Reliability**: 95%+ success rate for known URLs
### Test Monitoring
```javascript
// Performance tracking
const testMetrics = {
  staticScrapingTimes: [],
  headlessScrapingTimes: [],
  successRates: {},
  errorCounts: {}
};

// Append to array metrics; increment a counter for keyed metrics
function recordMetric(type, value) {
  if (Array.isArray(testMetrics[type])) {
    testMetrics[type].push(value);
  } else {
    testMetrics[type][value] = (testMetrics[type][value] || 0) + 1;
  }
}
```
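Recorded timings only become useful once they are compared against the success criteria. The following sketch shows one possible summary step; `percentile`, `summarize`, and `meetsCriteria` are hypothetical helpers, and applying the 1 second limit to the 95th percentile (rather than the mean or the maximum) is a choice made for this example.

```javascript
// Sketch: summarizes recorded timings and outcomes against the success criteria.
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  // Nearest-rank method: smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

function summarize(timesMs, outcomes) {
  const total = outcomes.success + outcomes.failure;
  return {
    p95Ms: percentile(timesMs, 95),
    successRate: total === 0 ? NaN : outcomes.success / total
  };
}

function meetsCriteria(summary) {
  // Thresholds from "Success Criteria": <1s static response time, 95%+ success rate.
  return summary.p95Ms < 1000 && summary.successRate >= 0.95;
}

const summary = summarize([400, 500, 600], { success: 19, failure: 1 });
console.log(meetsCriteria(summary)); // true
```

A percentile is less sensitive to a single slow request than the maximum, which matters for live tests where occasional network spikes are expected.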
---
*Testing guide last updated: 2025-11-23*