imgPub OCR to EPUB – Technical Overview
This document describes how the imgPub application converts page photos into a final EPUB book. It covers the frontend wizard, the Node.js backend that performs EPUB generation, and the most important implementation details so new contributors can extend the project confidently.
1. Stack Summary
| Layer | Technology | Notes |
|---|---|---|
| Frontend build | Vite + React 18 | SPA wizard experience |
| UI | Material UI v6 | Theme, Stepper, responsive grid |
| Routing | react-router-dom v7 | Each wizard step is a route |
| State | Zustand | useAppStore centralises workflow data |
| Upload | react-dropzone | Drag & drop multiple images |
| Crop | Custom overlay | iOS-style handles, percentage-based selection |
| OCR | tesseract.js 5 | Uses local worker/core/lang assets (eng + tur) |
| EPUB generation | Node.js + express + epub-gen | Backend service builds the EPUB and returns it base64-encoded |
2. Folder Layout Highlights
src/
  components/ (UploadStep, CropStep, BulkCropStep, OcrStep, EpubStep, DownloadStep)
  store/useAppStore.js (global state)
  utils/
    cropUtils.js (canvas cropping)
    ocrUtils.js (date extraction, sorting)
    epubUtils.js (API client for EPUB)
    fileUtils.js (download helper)
server/
  index.js (Express app with /generate-epub)
  package.json (epub-gen, express, cors, uuid)
public/
  tesseract/ (worker.min.js, tesseract-core-simd-lstm.wasm(.js), eng.traineddata, tur.traineddata)
3. Frontend Wizard Flow
- Upload – Drag/drop `.png`/`.jpg`/`.jpeg` files, previews stored with `URL.createObjectURL` (revoked during resets).
- Crop – Choose a reference image, adjust selection handles, optional numeric offsets. Crop config saved in store with relative ratios.
- Bulk Crop – Applies saved ratios to every upload via `<canvas>`, storing cropped blobs and URLs.
- OCR – Sequential Tesseract worker (`tur` language, fallback `eng`). Each cropped image is processed in upload order and the cleaned text is appended to a single in-memory string (with a single-space separator). Only that cumulative string is persisted in the store to keep CPU/RAM usage minimal.
- EPUB – After OCR, the frontend sends the full concatenated string to the backend, waits for the resulting EPUB blob, and stores it for download.
- Download – Displays the EPUB metadata and lets the user download or restart the process.
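Because the crop config is stored as relative ratios, the Bulk Crop step has to turn it into a pixel rectangle for each image. The sketch below shows one way to do that conversion. The `cropConfig` field names (`x`, `y`, `width`, `height`) and the function name are assumptions for illustration; the actual shape used by `cropUtils.js` may differ.

```javascript
// Hypothetical cropConfig shape: every field is a fraction of the image size (0..1).
function ratiosToPixelRect(cropConfig, imageWidth, imageHeight) {
  const clamp01 = (value) => Math.min(1, Math.max(0, value));
  const x = clamp01(cropConfig.x);
  const y = clamp01(cropConfig.y);
  // Keep the rectangle inside the image even if x + width exceeds 1.
  const w = Math.min(clamp01(cropConfig.width), 1 - x);
  const h = Math.min(clamp01(cropConfig.height), 1 - y);
  return {
    sx: Math.round(x * imageWidth),
    sy: Math.round(y * imageHeight),
    sw: Math.round(w * imageWidth),
    sh: Math.round(h * imageHeight),
  };
}

// The same ratios give a different pixel rectangle for each image size.
const rect = ratiosToPixelRect({ x: 0.1, y: 0.2, width: 0.5, height: 0.25 }, 2000, 1000);
console.log(rect);
```

In the browser, the resulting rectangle would feed the nine-argument form of `ctx.drawImage(img, sx, sy, sw, sh, 0, 0, sw, sh)` on a canvas sized `sw` by `sh`.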
4. EPUB Backend Service
- Located in `/server`. Run with `cd server && npm install && npm run dev` (defaults to port 4000).
- Exposes `POST /generate-epub` accepting `{ text, meta }` where `text` is the single, concatenated OCR output.
- Uses `epub-gen` to build one chapter containing the entire text and writes it to a temporary file.
- Returns `{ filename, data }` where `data` is base64-encoded EPUB bytes. Frontend decodes to `Blob` and stores in Zustand (`generatedEpub`).
- CORS origin defaults to `http://localhost:5173` and can be overridden via `CLIENT_ORIGIN` env var.
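To illustrate the decoding step on the frontend, here is a minimal sketch of turning the base64 `data` field into a `Blob`. The helper name `base64ToBlob` and the sample response are assumptions for illustration, not code taken from `epubUtils.js`.

```javascript
// Decode a base64 string into a Blob.
// `atob` and `Blob` are browser globals (also available in Node.js 18+).
function base64ToBlob(base64, mimeType = "application/epub+zip") {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i += 1) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new Blob([bytes], { type: mimeType });
}

// Made-up response body. An EPUB is a ZIP archive, so real data starts with
// the bytes "PK\x03\x04", which is what this short sample encodes.
const response = { filename: "book.epub", data: "UEsDBA==" };
const epubBlob = base64ToBlob(response.data);
console.log(response.filename, epubBlob.size, epubBlob.type);
```

The blob can then be passed to `URL.createObjectURL` to produce a download link, which should be revoked again when the workflow is reset.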
5. Tesseract Assets
All heavy OCR assets are served locally to avoid CDN issues:
- `public/tesseract/worker.min.js`
- `public/tesseract/tesseract-core-simd-lstm.wasm(.js)`
- `public/tesseract/eng.traineddata`
- `public/tesseract/tur.traineddata`
The OCR step creates a single worker and reuses it for every cropped image to keep CPU usage predictable.
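The single-worker loop, together with the single-space concatenation described in the wizard flow, can be sketched roughly as follows. The function names and the whitespace cleanup rule are assumptions; the project's actual cleaning logic is not described here. The worker is passed in as a parameter so the loop itself does not depend on how the worker was created.

```javascript
// Stand-in cleanup: collapse whitespace runs and trim the ends.
function cleanText(raw) {
  return raw.replace(/\s+/g, " ").trim();
}

// Append one page's text to the running string with a single-space separator.
function appendPage(accumulated, raw) {
  const cleaned = cleanText(raw);
  if (!cleaned) return accumulated; // skip pages that yielded no text
  return accumulated ? `${accumulated} ${cleaned}` : cleaned;
}

// `worker` is any object with a tesseract.js-style recognize(image) method
// that resolves to { data: { text } }. The same worker handles every image,
// one after another, in the order given.
async function recognizeAll(worker, images) {
  let text = "";
  for (const image of images) {
    const { data } = await worker.recognize(image);
    text = appendPage(text, data.text);
  }
  return text;
}
```

With tesseract.js 5, such a worker would typically come from `createWorker("tur", 1, { workerPath, corePath, langPath })` with the paths pointing at `public/tesseract/`, and be released with `worker.terminate()` once all images are processed. Only the final string needs to go into the store.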
6. State Management & Cleanup
useAppStore tracks:
- `uploadedImages`, `cropConfig`, `croppedImages`, `ocrResults`
- `generatedEpub` (blob, URL, filename)
- `error`
resetFromStep(step) clears downstream data and revokes blob URLs (uploads, crops, EPUB) so memory usage stays bounded even after long sessions.
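A reset of this kind can be sketched as a pure function over the state. Everything beyond the state field names listed above is an assumption: the step identifiers, the `url` property on image entries, the empty-string value for `ocrResults`, and the choice to clear the named step's own output along with everything after it.

```javascript
// Hypothetical step order; the real store may use different identifiers.
const STEPS = ["upload", "crop", "bulkCrop", "ocr", "epub"];

// Each cleaner revokes the blob URLs its step created and returns cleared fields.
const CLEANERS = {
  upload: (state, revoke) => {
    state.uploadedImages.forEach((img) => revoke(img.url));
    return { uploadedImages: [] };
  },
  crop: () => ({ cropConfig: null }),
  bulkCrop: (state, revoke) => {
    state.croppedImages.forEach((img) => revoke(img.url));
    return { croppedImages: [] };
  },
  ocr: () => ({ ocrResults: "" }),
  epub: (state, revoke) => {
    if (state.generatedEpub) revoke(state.generatedEpub.url);
    return { generatedEpub: null };
  },
};

// Returns a new state with the given step and all later steps cleared.
function resetFromStep(state, step, revoke = (url) => URL.revokeObjectURL(url)) {
  const start = STEPS.indexOf(step);
  if (start === -1) throw new Error(`Unknown step: ${step}`);
  const next = { ...state, error: null };
  for (const name of STEPS.slice(start)) {
    Object.assign(next, CLEANERS[name](state, revoke));
  }
  return next;
}
```

Passing `revoke` as a parameter keeps the function testable outside the browser. Inside a Zustand store the same logic would run in an action that calls `set` with the returned fields.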
7. Running the Project
# Frontend
npm install
npm run dev
# Backend in another terminal
cd server
npm install # only needed once
npm run dev # starts on http://localhost:4000
Set VITE_API_BASE_URL in .env if the server runs on a different host/port.
npm run build still targets Vite’s static output (dist/). Chunk-size warnings are suppressed by raising chunkSizeWarningLimit (see vite.config.js).
8. Potential Enhancements
- Allow users to edit OCR text before sending it to the EPUB service.
- Add cover image generation per session.
- Persist workflow state (e.g. IndexedDB) so refreshes are less disruptive.
- Stream EPUB as soon as chapters are processed for better perceived speed.
The current setup keeps PDF logic out of the client entirely, ensuring consistent Turkish characters thanks to EPUB readers’ native font stacks or bundled fonts (if added later).