# MetaScraper Testing Guide ## πŸ§ͺ Testing Philosophy MetaScraper follows a comprehensive testing strategy that ensures reliability, performance, and maintainability: - **Integration First**: Focus on end-to-end functionality - **Live Data Testing**: Test against real Netflix pages - **Performance Awareness**: Monitor response times and resource usage - **Error Coverage**: Test failure scenarios and edge cases - **Localization Testing**: Verify Turkish UI text removal ## πŸ“‹ Test Structure ### Test Categories ``` tests/ β”œβ”€β”€ scrape.test.js # Main integration tests β”œβ”€β”€ unit/ # Unit tests (future) β”‚ β”œβ”€β”€ parser.test.js # Parser function tests β”‚ β”œβ”€β”€ url-normalizer.test.js # URL normalization tests β”‚ └── title-cleaner.test.js # Title cleaning tests β”œβ”€β”€ integration/ # Integration tests (current) β”‚ β”œβ”€β”€ live-scraping.test.js # Real Netflix URL tests β”‚ └── headless-fallback.test.js # Browser fallback tests β”œβ”€β”€ performance/ # Performance benchmarks (future) β”‚ β”œβ”€β”€ response-times.test.js # Timing tests β”‚ └── concurrent.test.js # Multiple request tests β”œβ”€β”€ fixtures/ # Test data β”‚ β”œβ”€β”€ sample-title.html # Sample Netflix HTML β”‚ β”œβ”€β”€ turkish-ui.json # Turkish UI patterns β”‚ └── test-urls.json # Test URL collection └── helpers/ # Test utilities (future) β”œβ”€β”€ mock-data.js # Mock HTML generators └── test-utils.js # Common test helpers ``` ## πŸ—οΈ Current Test Implementation ### Main Test Suite: `tests/scrape.test.js` ```javascript import { beforeAll, describe, expect, it } from 'vitest'; import { scraperNetflix } from '../src/index.js'; import { parseNetflixHtml } from '../src/parser.js'; const TEST_URL = 'https://www.netflix.com/title/80189685'; // The Witcher const UA = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36'; let liveHtml = ''; beforeAll(async () => { // Fetch real Netflix page for testing const res = await fetch(TEST_URL, { headers: { 'User-Agent': UA, Accept: 'text/html,application/xhtml+xml' } }); if (!res.ok) { throw new Error(`Live fetch başarΔ±sΔ±z: ${res.status}`); } liveHtml = await res.text(); }, 20000); // 20 second timeout for network requests ``` ### Test Coverage Areas #### 1. HTML Parsing Tests ```javascript describe('parseNetflixHtml (canlΔ± sayfa)', () => { it( 'static HTML\'den en az isim ve yΔ±l bilgisini okur', () => { const meta = parseNetflixHtml(liveHtml); expect(meta.name).toBeTruthy(); expect(String(meta.name).toLowerCase()).toContain('witcher'); expect(meta.year).toMatch(/\d{4}/); }, 20000 ); }); ``` #### 2. End-to-End Scraping Tests ```javascript describe('scraperNetflix (canlΔ± istek)', () => { it( 'normalize edilmiş url, id ve meta bilgilerini dΓΆner', async () => { const meta = await scraperNetflix(TEST_URL, { headless: false, userAgent: UA }); expect(meta.url).toBe('https://www.netflix.com/title/80189685'); expect(meta.id).toBe('80189685'); expect(meta.name).toBeTruthy(); expect(String(meta.name).toLowerCase()).toContain('witcher'); expect(meta.year).toMatch(/\d{4}/); }, 20000 ); }); ``` ## πŸ§ͺ Running Tests ### Basic Test Commands ```bash # Run all tests npm test # Run tests in watch mode npm test -- --watch # Run tests once npm test -- --run # Run tests with coverage npm test -- --coverage # Run specific test file npm test scrape.test.js # Run tests matching pattern npm test -- --grep "Turkish" ``` ### Test Configuration ```javascript // vitest.config.js (if needed) import { defineConfig } from 'vitest/config'; export default defineConfig({ test: { timeout: 30000, // 30 second timeout for network tests hookTimeout: 30000, // Timeout for beforeAll hooks environment: 'node', // Node.js environment globals: true, // Use global test functions coverage: { reporter: ['text', 'json'], exclude: [ 'node_modules/', 'tests/', 'doc/' ] } } }); ``` ## πŸ“Š Test Data Management ### Live Test URLs ```javascript // tests/fixtures/test-urls.json [ { "name": "The Witcher (TV Series)", "url": "https://www.netflix.com/title/80189685", "expected": { "type": "series", "hasSeasons": true, "titleContains": "witcher" } }, { "name": "ONE SHOT (Movie)", "url": "https://www.netflix.com/title/82123114", "expected": { "type": "movie", "hasSeasons": false, "titleContains": "one shot" } } ] ``` ### Sample HTML Fixtures ```html The Witcher izlemenizi bekliyor | Netflix ``` ### Turkish UI Pattern Tests ```javascript // tests/fixtures/turkish-ui-patterns.json { "title_cleaning_tests": [ { "input": "The Witcher izlemenizi bekliyor | Netflix", "expected": "The Witcher", "removed": "izlemenizi bekliyor | Netflix" }, { "input": "Stranger Things izleyin", "expected": "Stranger Things", "removed": "izleyin" }, { "input": "Sezon 4 devam et", "expected": "Sezon 4", "removed": "devam et" } ] } ``` ## πŸ”§ Test Utilities ### Custom Test Helpers ```javascript // tests/helpers/test-utils.js import fs from 'node:fs'; import path from 'node:path'; import { fileURLToPath } from 'node:url'; const __filename = fileURLToPath(import.meta.url); const __dirname = path.dirname(__filename); export function loadFixture(filename) { const fixturePath = path.join(__dirname, '../fixtures', filename); return fs.readFileSync(fixturePath, 'utf8'); } export function loadJSONFixture(filename) { const content = loadFixture(filename); return JSON.parse(content); } export async function withTimeout(promise, timeoutMs = 5000) { const timeout = new Promise((_, reject) => { setTimeout(() => reject(new Error(`Test timeout after ${timeoutMs}ms`)), timeoutMs); }); return Promise.race([promise, timeout]); } export function expectTurkishTitleClean(input, expected) { const result = cleanTitle(input); expect(result).toBe(expected); } ``` ### Mock Browser Automation ```javascript // tests/helpers/mock-playwright.js import { vi } from 'vitest'; export function mockPlaywrightSuccess(html) { vi.doMock('playwright', () => ({ chromium: { launch: vi.fn(() => ({ newContext: vi.fn(() => ({ newPage: vi.fn(() => ({ goto: vi.fn().mockResolvedValue(undefined), content: vi.fn().mockResolvedValue(html), waitForLoadState: vi.fn().mockResolvedValue(undefined) })) })), close: vi.fn().mockResolvedValue(undefined) })) } })); } export function mockPlaywrightFailure() { vi.doMock('playwright', () => { throw new Error('Playwright not available'); }); } ``` ## 🎯 Test Scenarios ### 1. URL Normalization Tests ```javascript describe('URL Normalization', () => { const testCases = [ { input: 'https://www.netflix.com/tr/title/80189685?s=i&vlang=tr', expected: 'https://www.netflix.com/title/80189685', description: 'Turkish URL with parameters' }, { input: 'https://www.netflix.com/title/80189685?trackId=12345', expected: 'https://www.netflix.com/title/80189685', description: 'URL with tracking parameters' } ]; testCases.forEach(({ input, expected, description }) => { it(description, () => { const result = normalizeNetflixUrl(input); expect(result).toBe(expected); }); }); }); ``` ### 2. Turkish UI Text Removal Tests ```javascript describe('Turkish UI Text Cleaning', () => { const turkishCases = [ { input: 'The Witcher izlemenizi bekliyor', expected: 'The Witcher', pattern: 'waiting for you to watch' }, { input: 'Dark izleyin', expected: 'Dark', pattern: 'watch' }, { input: 'Money Heist devam et', expected: 'Money Heist', pattern: 'continue' } ]; turkishCases.forEach(({ input, expected, pattern }) => { it(`removes Turkish UI text: ${pattern}`, () => { expect(cleanTitle(input)).toBe(expected); }); }); }); ``` ### 3. JSON-LD Parsing Tests ```javascript describe('JSON-LD Metadata Extraction', () => { it('extracts movie metadata correctly', () => { const jsonLd = { '@type': 'Movie', 'name': 'Inception', 'datePublished': '2010', 'copyrightYear': 2010 }; const result = parseJsonLdObject(jsonLd); expect(result.name).toBe('Inception'); expect(result.year).toBe(2010); expect(result.seasons).toBeUndefined(); }); it('extracts TV series metadata with seasons', () => { const jsonLd = { '@type': 'TVSeries', 'name': 'Stranger Things', 'numberOfSeasons': 4, 'datePublished': '2016' }; const result = parseJsonLdObject(jsonLd); expect(result.name).toBe('Stranger Things'); expect(result.seasons).toBe('4 Sezon'); }); }); ``` ### 4. Error Handling Tests ```javascript describe('Error Handling', () => { it('throws error for invalid URL', async () => { await expect(scraperNetflix('invalid-url')).rejects.toThrow('GeΓ§ersiz URL sağlandΔ±'); }); it('throws error for non-Netflix URL', async () => { await expect(scraperNetflix('https://google.com')).rejects.toThrow('URL netflix.com adresini gΓΆstermelidir'); }); it('throws error for URL without title ID', async () => { await expect(scraperNetflix('https://www.netflix.com/browse')).rejects.toThrow('URL\'de Netflix başlΔ±k ID\'si bulunamadΔ±'); }); it('handles network timeouts gracefully', async () => { await expect(scraperNetflix(TEST_URL, { timeoutMs: 1 })).rejects.toThrow('Request timed out'); }); }); ``` ### 5. Performance Tests ```javascript describe('Performance', () => { it('completes static scraping within 1 second', async () => { const start = performance.now(); await scraperNetflix(TEST_URL, { headless: false }); const duration = performance.now() - start; expect(duration).toBeLessThan(1000); }, 10000); it('handles concurrent requests efficiently', async () => { const urls = Array(5).fill(TEST_URL); const start = performance.now(); const results = await Promise.allSettled( urls.map(url => scraperNetflix(url, { headless: false })) ); const duration = performance.now() - start; const successful = results.filter(r => r.status === 'fulfilled').length; expect(duration).toBeLessThan(3000); // Should be faster than sequential expect(successful).toBeGreaterThan(0); // At least some should succeed }, 30000); }); ``` ## πŸ” Test Debugging ### 1. Visual HTML Inspection ```javascript // Save HTML for manual debugging it('captures HTML for debugging', async () => { const html = await fetchStaticHtml(TEST_URL); fs.writeFileSync('debug-netflix-page.html', html); console.log('HTML saved to debug-netflix-page.html'); expect(html).toContain(' { const originalFetch = global.fetch; global.fetch = async (url, options) => { console.log('🌐 Request URL:', url); console.log('πŸ“‹ Headers:', options.headers); console.log('⏰ Time:', new Date().toISOString()); const response = await originalFetch(url, options); console.log('πŸ“Š Response status:', response.status); console.log('πŸ“ Response size:', response.headers.get('content-length')); return response; }; const result = await scraperNetflix(TEST_URL, { headless: false }); // Restore original fetch global.fetch = originalFetch; expect(result.name).toBeTruthy(); }); ``` ### 3. Step-by-Step Processing ```javascript // Debug each step of the process it('logs processing steps', async () => { console.log('πŸš€ Starting Netflix scraping test'); // Step 1: URL normalization const normalized = normalizeNetflixUrl(TEST_URL); console.log('πŸ”— Normalized URL:', normalized); // Step 2: HTML fetch const html = await fetchStaticHtml(normalized); console.log('πŸ“„ HTML length:', html.length); // Step 3: Parsing const parsed = parseNetflixHtml(html); console.log('πŸ“Š Parsed metadata:', parsed); // Step 4: Full process const fullResult = await scraperNetflix(TEST_URL); console.log('βœ… Full result:', fullResult); expect(fullResult.name).toBeTruthy(); }); ``` ## πŸ“ˆ Continuous Testing ### GitHub Actions Workflow ```yaml # .github/workflows/test.yml name: Test Suite on: push: branches: [ main, develop ] pull_request: branches: [ main ] jobs: test: runs-on: ubuntu-latest strategy: matrix: node-version: [18.x, 20.x, 22.x] steps: - uses: actions/checkout@v3 - name: Use Node.js ${{ matrix.node-version }} uses: actions/setup-node@v3 with: node-version: ${{ matrix.node-version }} cache: 'npm' - name: Install dependencies run: npm ci - name: Install Playwright run: npx playwright install chromium - name: Run tests run: npm test -- --coverage - name: Upload coverage to Codecov uses: codecov/codecov-action@v3 with: file: ./coverage/lcov.info ``` ### Pre-commit Hooks ```json // package.json { "husky": { "hooks": { "pre-commit": "npm test && npm run lint" } } } ``` ## 🚨 Test Environment Considerations ### Network Dependencies - **Live Tests**: Require internet connection to Netflix - **Timeouts**: Extended timeouts for network requests (30s+) - **Rate Limiting**: Be respectful to Netflix's servers - **Geographic**: Tests may behave differently by region ### Browser Dependencies - **Playwright**: Optional dependency for headless tests - **Browser Installation**: Requires `npx playwright install` - **Memory**: Browser tests use more memory - **CI/CD**: Need to install browsers in CI environment ### Test Data Updates - **Netflix Changes**: UI changes may break tests - **Pattern Updates**: Turkish UI patterns may change - **JSON-LD Structure**: Netflix may modify structured data - **URL Formats**: New URL patterns may emerge ## πŸ“Š Test Metrics ### Success Criteria - **Unit Tests**: 90%+ code coverage - **Integration Tests**: 100% API coverage - **Performance**: <1s response time for static mode - **Reliability**: 95%+ success rate for known URLs ### Test Monitoring ```javascript // Performance tracking const testMetrics = { staticScrapingTimes: [], headlessScrapingTimes: [], successRates: {}, errorCounts: {} }; function recordMetric(type, value) { if (Array.isArray(testMetrics[type])) { testMetrics[type].push(value); } else { testMetrics[type][value] = (testMetrics[type][value] || 0) + 1; } } ``` --- *Testing guide last updated: 2025-11-23*