# imgPub OCR to EPUB – Technical Overview

This document describes how the imgPub application converts page photos into a final EPUB book. It covers the frontend wizard, the Node.js backend that performs EPUB generation, and the most important implementation details so new contributors can extend the project confidently.

## 1. Stack Summary

| Layer | Technology | Notes |
| --- | --- | --- |
| Frontend build | **Vite + React 18** | SPA wizard experience |
| UI | **Material UI v6** | Theme, Stepper, responsive grid |
| Routing | **react-router-dom v7** | Each wizard step is a route |
| State | **Zustand** | `useAppStore` centralises workflow data |
| Upload | **react-dropzone** | Drag & drop multiple images |
| Crop | **Custom overlay** | iOS-style handles, percentage-based selection |
| OCR | **tesseract.js 5** | Uses local worker/core/lang assets (eng + tur) |
| EPUB generation | **Node.js + express + epub-gen** | Backend service builds the EPUB and returns it base64-encoded |

## 2. Folder Layout Highlights

```
src/
  components/          (UploadStep, CropStep, BulkCropStep, OcrStep, EpubStep, DownloadStep)
  store/useAppStore.js (global state)
  utils/
    cropUtils.js       (canvas cropping)
    ocrUtils.js        (date extraction, sorting)
    epubUtils.js       (API client for EPUB)
    fileUtils.js       (download helper)
server/
  index.js             (Express app with /generate-epub)
  package.json         (epub-gen, express, cors, uuid)
public/
  tesseract/           (worker.min.js, tesseract-core-simd-lstm.wasm(.js), eng.traineddata, tur.traineddata)
```

## 3. Frontend Wizard Flow

1. **Upload** – Drag and drop `.png`, `.jpg`, or `.jpeg` files. Previews are created with `URL.createObjectURL` and revoked during resets.
2. **Crop** – Choose a reference image, adjust the selection handles, and optionally enter numeric offsets. The crop config is saved in the store as relative ratios.
3. **Bulk Crop** – Applies the saved ratios to every upload via `<canvas>`, storing the cropped blobs and their URLs.
4. **OCR** – A single Tesseract worker runs sequentially (`tur` language, `eng` fallback). Each cropped image is processed in upload order, and the cleaned text is appended to one in-memory string with a single-space separator. Only that cumulative string is persisted in the store, to keep CPU and RAM usage low.
5. **EPUB** – After OCR, the frontend sends the full concatenated string to the backend, waits for the resulting EPUB blob, and stores it for download.
6. **Download** – Displays the EPUB metadata and lets the user download the file or restart the process.

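The ratio-based crop config is what lets one selection apply to photos of different sizes. The following is a minimal sketch of that conversion, not the code in `cropUtils.js`: the function name `ratiosToPixels` and the field names (`left`, `top`, `width`, `height` as fractions between 0 and 1) are assumptions made for illustration.

```javascript
// Hypothetical helper: convert a ratio-based crop config into a pixel rectangle.
// Field names and the 0..1 convention are assumptions, not the store's real shape.
function ratiosToPixels(cropConfig, imageWidth, imageHeight) {
  const clamp01 = (v) => Math.min(1, Math.max(0, v));
  const left = clamp01(cropConfig.left);
  const top = clamp01(cropConfig.top);
  // Keep the selection inside the image even if the ratios overshoot.
  const width = Math.min(clamp01(cropConfig.width), 1 - left);
  const height = Math.min(clamp01(cropConfig.height), 1 - top);
  return {
    x: Math.round(left * imageWidth),
    y: Math.round(top * imageHeight),
    width: Math.round(width * imageWidth),
    height: Math.round(height * imageHeight),
  };
}

// The same config yields proportionally equivalent rectangles on both images.
const cropConfig = { left: 0.1, top: 0.2, width: 0.5, height: 0.25 };
console.log(ratiosToPixels(cropConfig, 2000, 1000)); // { x: 200, y: 200, width: 1000, height: 250 }
console.log(ratiosToPixels(cropConfig, 800, 600));   // { x: 80, y: 120, width: 400, height: 150 }
```

In the browser, a rectangle like this could be passed to the canvas call `ctx.drawImage(img, x, y, width, height, 0, 0, width, height)` to draw only the selected region.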
## 4. EPUB Backend Service

- Located in `/server`. Run it with `cd server && npm install && npm run dev` (defaults to port **4000**).
- Exposes `POST /generate-epub`, which accepts `{ text, meta }`, where `text` is the single concatenated OCR output.
- Uses [`epub-gen`](https://www.npmjs.com/package/epub-gen) to build one chapter containing the entire text and writes it to a temporary file.
- Returns `{ filename, data }`, where `data` is the base64-encoded EPUB bytes. The frontend decodes it to a `Blob` and stores it in Zustand (`generatedEpub`).
- The CORS origin defaults to `http://localhost:5173` and can be overridden via the `CLIENT_ORIGIN` env var.

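The base64 round trip described above can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: `buildEpubPayload` and `payloadToBlob` are hypothetical names, and the four placeholder bytes stand in for a real EPUB file.

```javascript
// Server side (Node.js): wrap the EPUB bytes in the JSON payload.
// `buildEpubPayload` is a hypothetical name, not taken from server/index.js.
function buildEpubPayload(filename, epubBytes) {
  return { filename, data: Buffer.from(epubBytes).toString("base64") };
}

// Client side: decode the base64 string back into a Blob.
// atob, Uint8Array and Blob are available in browsers and in Node.js 18+.
function payloadToBlob(payload) {
  const binary = atob(payload.data);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new Blob([bytes], { type: "application/epub+zip" });
}

// Round trip with placeholder bytes (the ZIP signature "PK\x03\x04").
const payload = buildEpubPayload("book.epub", Uint8Array.of(0x50, 0x4b, 0x03, 0x04));
const epubBlob = payloadToBlob(payload);
console.log(payload.data);                 // "UEsDBA=="
console.log(epubBlob.size, epubBlob.type); // 4 "application/epub+zip"
```

Base64 inflates the payload by roughly a third, which is a reasonable trade for a simple JSON response while the books stay small.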
## 5. Tesseract Assets

All heavy OCR assets are served locally to avoid CDN issues:

- `public/tesseract/worker.min.js`
- `public/tesseract/tesseract-core-simd-lstm.wasm(.js)`
- `public/tesseract/eng.traineddata`
- `public/tesseract/tur.traineddata`

The OCR step creates a single worker and reuses it for every cropped image to keep CPU usage predictable.

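The single-worker pattern can be sketched like this. The worker factory is injected so the example runs with a stub; the function name `runOcrSequentially`, the whitespace cleanup rule, and the stub itself are assumptions, while the `recognize` / `terminate` methods and the `{ data: { text } }` result shape mirror the tesseract.js worker interface.

```javascript
// Sketch: one worker, images processed strictly one after another.
// `createWorker` is injected so the example can run with a stub; in the app it
// would be whatever factory the OCR step uses to build the Tesseract worker.
async function runOcrSequentially(images, createWorker) {
  const worker = await createWorker();
  let combined = "";
  try {
    for (const image of images) {
      const { data } = await worker.recognize(image);
      // Assumed cleanup: collapse whitespace runs and trim the edges.
      const cleaned = data.text.replace(/\s+/g, " ").trim();
      if (cleaned) {
        combined = combined ? `${combined} ${cleaned}` : cleaned;
      }
    }
  } finally {
    await worker.terminate(); // release the worker even if a page fails
  }
  return combined;
}

// Stub factory that mimics the { data: { text } } result shape.
const createStubWorker = async () => ({
  recognize: async (image) => ({ data: { text: `  text of ${image}\n` } }),
  terminate: async () => {},
});

runOcrSequentially(["page-1.png", "page-2.png"], createStubWorker).then((text) => {
  console.log(text); // "text of page-1.png text of page-2.png"
});
```

With the real library, the factory would wrap tesseract.js `createWorker`, whose options include `workerPath`, `corePath`, and `langPath` for pointing at self-hosted assets such as the files listed above.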
## 6. State Management & Cleanup

`useAppStore` tracks:

- `uploadedImages`, `cropConfig`, `croppedImages`, `ocrResults`
- `generatedEpub` (blob, URL, filename)
- `error`

`resetFromStep(step)` clears downstream data and revokes blob URLs (uploads, crops, EPUB) so memory usage stays bounded even after long sessions.

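As a rough illustration of the reset behaviour, the sketch below is a pure function over a plain state object instead of the actual Zustand store. The step names, the step-to-key mapping, the `url` field on each item, and the choice to clear the named step together with everything after it are all assumptions.

```javascript
// Assumed mapping of wizard steps (in order) to the store keys they produce.
const STEP_OUTPUTS = [
  ["upload", ["uploadedImages"]],
  ["crop", ["cropConfig"]],
  ["bulkCrop", ["croppedImages"]],
  ["ocr", ["ocrResults"]],
  ["epub", ["generatedEpub"]],
];

// Clears the given step's outputs and everything after it, revoking blob URLs first.
// `revoke` is injected; in the browser it would call URL.revokeObjectURL.
function resetFromStep(state, step, revoke) {
  const start = STEP_OUTPUTS.findIndex(([name]) => name === step);
  if (start === -1) return state; // unknown step: leave the state untouched
  const next = { ...state, error: null };
  for (const [, keys] of STEP_OUTPUTS.slice(start)) {
    for (const key of keys) {
      const value = state[key];
      const items = Array.isArray(value) ? value : value ? [value] : [];
      for (const item of items) {
        if (item && item.url) revoke(item.url);
      }
      next[key] = Array.isArray(value) ? [] : null;
    }
  }
  return next;
}

const revoked = [];
const state = {
  uploadedImages: [{ url: "blob:upload-1" }],
  cropConfig: { left: 0.1, top: 0.1, width: 0.8, height: 0.8 },
  croppedImages: [{ url: "blob:crop-1" }],
  ocrResults: "recognised text",
  generatedEpub: { url: "blob:epub", filename: "book.epub" },
  error: null,
};
const after = resetFromStep(state, "bulkCrop", (url) => revoked.push(url));
console.log(revoked);                     // [ 'blob:crop-1', 'blob:epub' ]
console.log(after.uploadedImages.length); // 1 (upstream data is kept)
console.log(after.generatedEpub);         // null
```

Revoking before dropping the reference matters because an object URL keeps its blob alive until it is revoked or the document unloads.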
## 7. Running the Project

```bash
# Frontend
npm install
npm run dev

# Backend, in another terminal
cd server
npm install   # only needed the first time
npm run dev   # starts on http://localhost:4000
```

Set `VITE_API_BASE_URL` in `.env` if the server runs on a different host or port.

`npm run build` still targets Vite’s static output (`dist/`). Chunk-size warnings are suppressed by raising `chunkSizeWarningLimit` (see `vite.config.js`).

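One way the client could resolve the API base URL is sketched below. The helper name `resolveApiBase` and the fallback to `http://localhost:4000` are assumptions based on the defaults mentioned in this document; the env object is passed in so the snippet also runs outside Vite.

```javascript
// Hypothetical helper: pick the API base URL from an env object.
// In a Vite app the argument would be import.meta.env.
function resolveApiBase(env) {
  const raw = (env.VITE_API_BASE_URL || "").trim();
  const base = raw || "http://localhost:4000"; // assumed default, matching the dev server port
  return base.replace(/\/+$/, ""); // strip trailing slashes so paths join cleanly
}

console.log(resolveApiBase({}) + "/generate-epub");
// "http://localhost:4000/generate-epub"
console.log(resolveApiBase({ VITE_API_BASE_URL: "https://api.example.com/" }) + "/generate-epub");
// "https://api.example.com/generate-epub"
```

Vite only exposes variables prefixed with `VITE_` to client code, which is why the variable carries that prefix.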
## 8. Potential Enhancements

- Allow users to edit OCR text before sending it to the EPUB service.
- Add cover image generation per session.
- Persist workflow state (e.g. IndexedDB) so refreshes are less disruptive.
- Stream EPUB as soon as chapters are processed for better perceived speed.

The current setup keeps PDF logic out of the client entirely. Turkish characters render consistently because EPUB readers apply their native font stacks, or bundled fonts if those are added later.