first commit

2025-11-23 14:25:09 +03:00
commit 46d75b64d5
18 changed files with 4749 additions and 0 deletions
--- a/doc/TESTING.md
+++ b/doc/TESTING.md
@@ -0,0 +1,627 @@
+# MetaScraper Testing Guide
+
+## 🧪 Testing Philosophy
+
+MetaScraper's testing strategy targets reliability, performance, and maintainability through the following principles:
+
+- **Integration First**: Focus on end-to-end functionality
+- **Live Data Testing**: Test against real Netflix pages
+- **Performance Awareness**: Monitor response times and resource usage
+- **Error Coverage**: Test failure scenarios and edge cases
+- **Localization Testing**: Verify Turkish UI text removal
+
+## 📋 Test Structure
+
+### Test Categories
+
+```
+tests/
+├── scrape.test.js           # Main integration tests
+├── unit/                    # Unit tests (future)
+│   ├── parser.test.js      # Parser function tests
+│   ├── url-normalizer.test.js # URL normalization tests
+│   └── title-cleaner.test.js   # Title cleaning tests
+├── integration/             # Integration tests (current)
+│   ├── live-scraping.test.js # Real Netflix URL tests
+│   └── headless-fallback.test.js # Browser fallback tests
+├── performance/             # Performance benchmarks (future)
+│   ├── response-times.test.js # Timing tests
+│   └── concurrent.test.js   # Multiple request tests
+├── fixtures/                # Test data
+│   ├── sample-title.html   # Sample Netflix HTML
+│   ├── turkish-ui.json     # Turkish UI patterns
+│   └── test-urls.json      # Test URL collection
+└── helpers/                 # Test utilities (future)
+    ├── mock-data.js        # Mock HTML generators
+    └── test-utils.js       # Common test helpers
+```
+
+## 🏗️ Current Test Implementation
+
+### Main Test Suite: `tests/scrape.test.js`
+
+```javascript
+import { beforeAll, describe, expect, it } from 'vitest';
+import { scraperNetflix } from '../src/index.js';
+import { parseNetflixHtml } from '../src/parser.js';
+
+const TEST_URL = 'https://www.netflix.com/title/80189685'; // The Witcher
+const UA = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36';
+
+let liveHtml = '';
+
+beforeAll(async () => {
+  // Fetch real Netflix page for testing
+  const res = await fetch(TEST_URL, {
+    headers: {
+      'User-Agent': UA,
+      Accept: 'text/html,application/xhtml+xml'
+    }
+  });
+
+  if (!res.ok) {
+    // Turkish message: "Live fetch failed"
+    throw new Error(`Live fetch başarısız: ${res.status}`);
+  }
+
+  liveHtml = await res.text();
+}, 20000); // 20 second timeout for network requests
+```
+
+### Test Coverage Areas
+
+#### 1. HTML Parsing Tests
+
+```javascript
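+// Suite and test names are in Turkish: "parseNetflixHtml (live page)" and
+// "reads at least the name and year from static HTML"
+describe('parseNetflixHtml (canlı sayfa)', () => {
+  it(
+    'static HTML\'den en az isim ve yıl bilgisini okur',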
+    () => {
+      const meta = parseNetflixHtml(liveHtml);
+      expect(meta.name).toBeTruthy();
+      expect(String(meta.name).toLowerCase()).toContain('witcher');
+      expect(meta.year).toMatch(/\d{4}/);
+    },
+    20000
+  );
+});
+```
+
+#### 2. End-to-End Scraping Tests
+
+```javascript
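+// Suite and test names are in Turkish: "scraperNetflix (live request)" and
+// "returns the normalized url, id, and meta information"
+describe('scraperNetflix (canlı istek)', () => {
+  it(
+    'normalize edilmiş url, id ve meta bilgilerini döner',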
+    async () => {
+      const meta = await scraperNetflix(TEST_URL, { headless: false, userAgent: UA });
+      expect(meta.url).toBe('https://www.netflix.com/title/80189685');
+      expect(meta.id).toBe('80189685');
+      expect(meta.name).toBeTruthy();
+      expect(String(meta.name).toLowerCase()).toContain('witcher');
+      expect(meta.year).toMatch(/\d{4}/);
+    },
+    20000
+  );
+});
+```
+
+## 🧪 Running Tests
+
+### Basic Test Commands
+
+```bash
+# Run all tests
+npm test
+
+# Run tests in watch mode
+npm test -- --watch
+
+# Run tests once
+npm test -- --run
+
+# Run tests with coverage
+npm test -- --coverage
+
+# Run specific test file
+npm test -- scrape.test.js
+
+# Run tests whose names match a pattern (Vitest uses -t / --testNamePattern, not --grep)
+npm test -- -t "Turkish"
+```
+
+### Test Configuration
+
+```javascript
+// vitest.config.js (if needed)
+import { defineConfig } from 'vitest/config';
+
+export default defineConfig({
+  test: {
+    testTimeout: 30000,    // 30 second timeout for network tests
+    hookTimeout: 30000,    // Timeout for beforeAll hooks
+    environment: 'node',   // Node.js environment
+    globals: true,         // Use global test functions
+    coverage: {
+      reporter: ['text', 'json', 'lcov'],  // lcov writes coverage/lcov.info for the CI upload
+      exclude: [
+        'node_modules/',
+        'tests/',
+        'doc/'
+      ]
+    }
+  }
+});
+```
+
+## 📊 Test Data Management
+
+### Live Test URLs
+
+```javascript
+// tests/fixtures/test-urls.json
+[
+  {
+    "name": "The Witcher (TV Series)",
+    "url": "https://www.netflix.com/title/80189685",
+    "expected": {
+      "type": "series",
+      "hasSeasons": true,
+      "titleContains": "witcher"
+    }
+  },
+  {
+    "name": "ONE SHOT (Movie)",
+    "url": "https://www.netflix.com/title/82123114",
+    "expected": {
+      "type": "movie",
+      "hasSeasons": false,
+      "titleContains": "one shot"
+    }
+  }
+]
+```
+
+### Sample HTML Fixtures
+
+```html
+<!-- tests/fixtures/sample-title.html -->
+<!DOCTYPE html>
+<html>
+<head>
+  <meta property="og:title" content="The Witcher izlemenizi bekliyor | Netflix">
+  <meta name="title" content="The Witcher | Netflix">
+  <title>The Witcher izlemenizi bekliyor | Netflix</title>
+  <script type="application/ld+json">
+  {
+    "@type": "TVSeries",
+    "name": "The Witcher izlemenizi bekliyor",
+    "numberOfSeasons": 4,
+    "datePublished": "2025"
+  }
+  </script>
+</head>
+<body>
+  <!-- Netflix page content -->
+</body>
+</html>
+```
+
+### Turkish UI Pattern Tests
+
+```javascript
+// tests/fixtures/turkish-ui.json
+{
+  "title_cleaning_tests": [
+    {
+      "input": "The Witcher izlemenizi bekliyor | Netflix",
+      "expected": "The Witcher",
+      "removed": "izlemenizi bekliyor | Netflix"
+    },
+    {
+      "input": "Stranger Things izleyin",
+      "expected": "Stranger Things",
+      "removed": "izleyin"
+    },
+    {
+      "input": "Sezon 4 devam et",
+      "expected": "Sezon 4",
+      "removed": "devam et"
+    }
+  ]
+}
+```
+
+## 🔧 Test Utilities
+
+### Custom Test Helpers
+
+```javascript
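+// tests/helpers/test-utils.js
+import fs from 'node:fs';
+import path from 'node:path';
+import { fileURLToPath } from 'node:url';
+import { expect } from 'vitest';
+// Assumption: cleanTitle is exported from the parser module; adjust the path if it lives elsewhere
+import { cleanTitle } from '../../src/parser.js';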
+
+const __filename = fileURLToPath(import.meta.url);
+const __dirname = path.dirname(__filename);
+
+export function loadFixture(filename) {
+  const fixturePath = path.join(__dirname, '../fixtures', filename);
+  return fs.readFileSync(fixturePath, 'utf8');
+}
+
+export function loadJSONFixture(filename) {
+  const content = loadFixture(filename);
+  return JSON.parse(content);
+}
+
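+export function withTimeout(promise, timeoutMs = 5000) {
+  let timer;
+  const timeout = new Promise((_, reject) => {
+    timer = setTimeout(() => reject(new Error(`Test timeout after ${timeoutMs}ms`)), timeoutMs);
+  });
+
+  // Clear the timer once either promise settles so it cannot keep the process alive
+  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
+}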
+
+export function expectTurkishTitleClean(input, expected) {
+  const result = cleanTitle(input);
+  expect(result).toBe(expected);
+}
+```
+
+### Mock Browser Automation
+
+```javascript
+// tests/helpers/mock-playwright.js
+import { vi } from 'vitest';
+
+export function mockPlaywrightSuccess(html) {
+  vi.doMock('playwright', () => ({
+    chromium: {
+      launch: vi.fn(() => ({
+        newContext: vi.fn(() => ({
+          newPage: vi.fn(() => ({
+            goto: vi.fn().mockResolvedValue(undefined),
+            content: vi.fn().mockResolvedValue(html),
+            waitForLoadState: vi.fn().mockResolvedValue(undefined)
+          }))
+        })),
+        close: vi.fn().mockResolvedValue(undefined)
+      }))
+    }
+  }));
+}
+
+export function mockPlaywrightFailure() {
+  vi.doMock('playwright', () => {
+    throw new Error('Playwright not available');
+  });
+}
+```
+
+## 🎯 Test Scenarios
+
+### 1. URL Normalization Tests
+
+```javascript
+describe('URL Normalization', () => {
+  const testCases = [
+    {
+      input: 'https://www.netflix.com/tr/title/80189685?s=i&vlang=tr',
+      expected: 'https://www.netflix.com/title/80189685',
+      description: 'Turkish URL with parameters'
+    },
+    {
+      input: 'https://www.netflix.com/title/80189685?trackId=12345',
+      expected: 'https://www.netflix.com/title/80189685',
+      description: 'URL with tracking parameters'
+    }
+  ];
+
+  testCases.forEach(({ input, expected, description }) => {
+    it(description, () => {
+      const result = normalizeNetflixUrl(input);
+      expect(result).toBe(expected);
+    });
+  });
+});
+```
+
+### 2. Turkish UI Text Removal Tests
+
+```javascript
+describe('Turkish UI Text Cleaning', () => {
+  const turkishCases = [
+    {
+      input: 'The Witcher izlemenizi bekliyor',
+      expected: 'The Witcher',
+      pattern: 'waiting for you to watch'
+    },
+    {
+      input: 'Dark izleyin',
+      expected: 'Dark',
+      pattern: 'watch'
+    },
+    {
+      input: 'Money Heist devam et',
+      expected: 'Money Heist',
+      pattern: 'continue'
+    }
+  ];
+
+  turkishCases.forEach(({ input, expected, pattern }) => {
+    it(`removes Turkish UI text: ${pattern}`, () => {
+      expect(cleanTitle(input)).toBe(expected);
+    });
+  });
+});
+```
+
+### 3. JSON-LD Parsing Tests
+
+```javascript
+describe('JSON-LD Metadata Extraction', () => {
+  it('extracts movie metadata correctly', () => {
+    const jsonLd = {
+      '@type': 'Movie',
+      'name': 'Inception',
+      'datePublished': '2010',
+      'copyrightYear': 2010
+    };
+
+    const result = parseJsonLdObject(jsonLd);
+    expect(result.name).toBe('Inception');
+    expect(result.year).toBe(2010);
+    expect(result.seasons).toBeUndefined();
+  });
+
+  it('extracts TV series metadata with seasons', () => {
+    const jsonLd = {
+      '@type': 'TVSeries',
+      'name': 'Stranger Things',
+      'numberOfSeasons': 4,
+      'datePublished': '2016'
+    };
+
+    const result = parseJsonLdObject(jsonLd);
+    expect(result.name).toBe('Stranger Things');
+    expect(result.seasons).toBe('4 Sezon');
+  });
+});
+```
+
+### 4. Error Handling Tests
+
+```javascript
+describe('Error Handling', () => {
+  // Expected messages are the scraper's Turkish error strings; English glosses are given in comments
+  it('throws error for invalid URL', async () => {
+    // "Invalid URL provided"
+    await expect(scraperNetflix('invalid-url')).rejects.toThrow('Geçersiz URL sağlandı');
+  });
+
+  it('throws error for non-Netflix URL', async () => {
+    // "URL must point to netflix.com"
+    await expect(scraperNetflix('https://google.com')).rejects.toThrow('URL netflix.com adresini göstermelidir');
+  });
+
+  it('throws error for URL without title ID', async () => {
+    // "Netflix title ID not found in URL"
+    await expect(scraperNetflix('https://www.netflix.com/browse')).rejects.toThrow('URL\'de Netflix başlık ID\'si bulunamadı');
+  });
+
+  it('handles network timeouts gracefully', async () => {
+    await expect(scraperNetflix(TEST_URL, { timeoutMs: 1 })).rejects.toThrow('Request timed out');
+  });
+});
+```
+
+### 5. Performance Tests
+
+```javascript
+describe('Performance', () => {
+  it('completes static scraping within 1 second', async () => {
+    const start = performance.now();
+    await scraperNetflix(TEST_URL, { headless: false });
+    const duration = performance.now() - start;
+
+    expect(duration).toBeLessThan(1000);
+  }, 10000);
+
+  it('handles concurrent requests efficiently', async () => {
+    const urls = Array(5).fill(TEST_URL);
+    const start = performance.now();
+
+    const results = await Promise.allSettled(
+      urls.map(url => scraperNetflix(url, { headless: false }))
+    );
+
+    const duration = performance.now() - start;
+    const successful = results.filter(r => r.status === 'fulfilled').length;
+
+    expect(duration).toBeLessThan(3000); // Should be faster than sequential
+    expect(successful).toBeGreaterThan(0); // At least some should succeed
+  }, 30000);
+});
+```
+
+## 🔍 Test Debugging
+
+### 1. Visual HTML Inspection
+
+```javascript
+// Save HTML for manual debugging (assumes fs from 'node:fs' and fetchStaticHtml are imported)
+it('captures HTML for debugging', async () => {
+  const html = await fetchStaticHtml(TEST_URL);
+  fs.writeFileSync('debug-netflix-page.html', html);
+  console.log('HTML saved to debug-netflix-page.html');
+
+  expect(html).toContain('<html');
+  expect(html).toContain('netflix');
+});
+```
+
+### 2. Network Request Debugging
+
+```javascript
+// Debug network requests
+it('logs network request details', async () => {
+  const originalFetch = global.fetch;
+
+  // Default options to {} so logging does not throw when fetch is called with a URL only
+  global.fetch = async (url, options = {}) => {
+    console.log('🌐 Request URL:', url);
+    console.log('📋 Headers:', options.headers);
+    console.log('⏰ Time:', new Date().toISOString());
+
+    const response = await originalFetch(url, options);
+    console.log('📊 Response status:', response.status);
+    console.log('📏 Response size:', response.headers.get('content-length'));
+
+    return response;
+  };
+
+  try {
+    const result = await scraperNetflix(TEST_URL, { headless: false });
+    expect(result.name).toBeTruthy();
+  } finally {
+    // Always restore the original fetch, even if the scrape or assertion fails
+    global.fetch = originalFetch;
+  }
+});
+```
+
+### 3. Step-by-Step Processing
+
+```javascript
+// Debug each step of the process
+it('logs processing steps', async () => {
+  console.log('🚀 Starting Netflix scraping test');
+
+  // Step 1: URL normalization
+  const normalized = normalizeNetflixUrl(TEST_URL);
+  console.log('🔗 Normalized URL:', normalized);
+
+  // Step 2: HTML fetch
+  const html = await fetchStaticHtml(normalized);
+  console.log('📄 HTML length:', html.length);
+
+  // Step 3: Parsing
+  const parsed = parseNetflixHtml(html);
+  console.log('📊 Parsed metadata:', parsed);
+
+  // Step 4: Full process
+  const fullResult = await scraperNetflix(TEST_URL);
+  console.log('✅ Full result:', fullResult);
+
+  expect(fullResult.name).toBeTruthy();
+});
+```
+
+## 📈 Continuous Testing
+
+### GitHub Actions Workflow
+
+```yaml
+# .github/workflows/test.yml
+name: Test Suite
+
+on:
+  push:
+    branches: [ main, develop ]
+  pull_request:
+    branches: [ main ]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+
+    strategy:
+      matrix:
+        node-version: [18.x, 20.x, 22.x]
+
+    steps:
+    - uses: actions/checkout@v3
+
+    - name: Use Node.js ${{ matrix.node-version }}
+      uses: actions/setup-node@v3
+      with:
+        node-version: ${{ matrix.node-version }}
+        cache: 'npm'
+
+    - name: Install dependencies
+      run: npm ci
+
+    - name: Install Playwright
+      run: npx playwright install chromium
+
+    - name: Run tests
+      run: npm test -- --coverage
+
+    - name: Upload coverage to Codecov
+      uses: codecov/codecov-action@v3
+      with:
+        file: ./coverage/lcov.info
+```
+
+### Pre-commit Hooks
+
+```json
+// package.json (Husky v4-style config; Husky v5+ uses hook files under .husky/ instead)
+{
+  "husky": {
+    "hooks": {
+      "pre-commit": "npm test -- --run && npm run lint"
+    }
+  }
+}
+```
+
+## 🚨 Test Environment Considerations
+
+### Network Dependencies
+
+- **Live Tests**: Require internet connection to Netflix
+- **Timeouts**: Extended timeouts for network requests (30s+)
+- **Rate Limiting**: Be respectful to Netflix's servers
+- **Geographic**: Tests may behave differently by region
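+
+The network considerations above can be turned into a small gating helper. The following is a minimal sketch and not part of MetaScraper: `SKIP_LIVE_TESTS`, `shouldRunLiveTests`, and `runSequentially` are hypothetical names chosen for illustration, and the one-second default gap is an arbitrary assumption.
+
+```javascript
+// Hypothetical helper: decide whether live Netflix tests should run.
+// SKIP_LIVE_TESTS is an assumed environment variable, not an existing project setting.
+function shouldRunLiveTests(env = process.env) {
+  return env.SKIP_LIVE_TESTS !== '1';
+}
+
+// Hypothetical helper: run async tasks one at a time with a pause between them,
+// so a live test run does not send a burst of requests.
+async function runSequentially(tasks, gapMs = 1000) {
+  const results = [];
+  for (let i = 0; i < tasks.length; i += 1) {
+    if (i > 0) {
+      // Wait between consecutive tasks, but not before the first one
+      await new Promise((resolve) => setTimeout(resolve, gapMs));
+    }
+    results.push(await tasks[i]());
+  }
+  return results;
+}
+```
+
+With Vitest, a live suite could then be declared as `describe.skipIf(!shouldRunLiveTests())(...)`, so offline or rate-limited environments skip it instead of failing.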
+
+### Browser Dependencies
+
+- **Playwright**: Optional dependency for headless tests
+- **Browser Installation**: Requires `npx playwright install`
+- **Memory**: Browser tests use more memory
+- **CI/CD**: Need to install browsers in CI environment
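+
+Because Playwright is optional, headless tests may need to detect whether it is installed. The sketch below shows one possible approach using a dynamic import; `isModuleAvailable` is a hypothetical helper name, not an existing MetaScraper function.
+
+```javascript
+// Hypothetical helper: resolves to true when the named package can be imported,
+// and to false when the import fails (for example, because it is not installed).
+async function isModuleAvailable(name) {
+  try {
+    await import(name);
+    return true;
+  } catch {
+    return false;
+  }
+}
+```
+
+A headless suite could then be guarded with something like `describe.skipIf(!(await isModuleAvailable('playwright')))(...)`, assuming the test file runs as an ES module so that top-level `await` is available.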
+
+### Test Data Updates
+
+- **Netflix Changes**: UI changes may break tests
+- **Pattern Updates**: Turkish UI patterns may change
+- **JSON-LD Structure**: Netflix may modify structured data
+- **URL Formats**: New URL patterns may emerge
+
+## 📊 Test Metrics
+
+### Success Criteria
+
+- **Unit Tests**: 90%+ code coverage
+- **Integration Tests**: 100% API coverage (every public API function exercised by at least one integration test)
+- **Performance**: <1s response time for static mode
+- **Reliability**: 95%+ success rate for known URLs
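+
+The performance and reliability criteria above could be checked programmatically. This is a minimal sketch under stated assumptions: `summarizeRun` is a hypothetical helper, `results` is assumed to be the array returned by `Promise.allSettled`, and `durationsMs` is assumed to hold per-request timings in milliseconds.
+
+```javascript
+// Hypothetical helper: compare a batch of scrape attempts against the
+// success criteria (95%+ success rate, <1s response time for static mode).
+function summarizeRun(results, durationsMs) {
+  const total = results.length;
+  const fulfilled = results.filter((r) => r.status === 'fulfilled').length;
+  const successRate = total === 0 ? 0 : fulfilled / total;
+  const slowestMs = durationsMs.length === 0 ? 0 : Math.max(...durationsMs);
+
+  return {
+    successRate,
+    slowestMs,
+    meetsReliability: successRate >= 0.95,
+    meetsPerformance: slowestMs < 1000
+  };
+}
+```
+
+Using the slowest request as the performance measure is a deliberately strict choice; an average or percentile would be a reasonable alternative.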
+
+### Test Monitoring
+
+```javascript
+// Performance tracking
+const testMetrics = {
+  staticScrapingTimes: [],
+  headlessScrapingTimes: [],
+  successRates: {},
+  errorCounts: {}
+};
+
+function recordMetric(type, value) {
+  if (Array.isArray(testMetrics[type])) {
+    testMetrics[type].push(value);
+  } else {
+    testMetrics[type][value] = (testMetrics[type][value] || 0) + 1;
+  }
+}
+```
+
+---
+
+*Testing guide last updated: 2025-11-23*