Building a Local OCR API That Returns JSON Fields

This is the setup I used to test a local OCR API that reads document images and, for Kenyan ID-style documents, returns structured JSON fields.

The pipeline is:

image
  -> PP-OCRv6 detection + recognition
  -> document-specific cleanup mode
  -> optional Supra-50M understanding model
  -> JSON fields with confidence

The service is a Rust binary. PP-OCRv6 runs through tract. The optional understanding stage is a LoRA-fused Supra-50M model running locally through candle.

1. Build the understanding-enabled binary

From the OCR project:

cd /home/ck/Desktop/projects/silly/paddle-paddle-v6-ocr-rust
cargo build --release --features understanding

Start it on a temporary local port:

OCR_PORT=3099 ./target/release/ppocr-server --no-prewarm

On first run, the server downloads the OCR and Supra weights, verifies SHA-256 checksums, and caches them under PPOCR_CACHE_DIR or the default user cache path.

In this run, the server fetched:

PP-OCRv6_medium_det
PP-OCRv6_medium_rec
supra-kenya-id

The Supra model adds about 136 MB to the cache, uses roughly 1.2 GB RAM when loaded, and adds about one second of CPU work per ID request.

2. Use synthetic test documents

Do not publish real IDs or logbooks in a blog post. I generated synthetic fixtures that preserve the layout pressure of the real documents without becoming usable documents: similar field placement, table density, label/value spacing, background texture, and camera noise, with clear SAMPLE / NOT VALID markings.

Synthetic ID-card-style OCR test fixture marked sample not valid. — Generated ID-card-style fixture. The layout references a real card; the content is fictional and unusable as identification.

3. Run OCR plus JSON extraction on the ID fixture

Request:

curl -X POST \
  "http://127.0.0.1:3099/v1/ocr?mode=kenya_id&understand=true&det_model=PP-OCRv6_medium_det&rec_model=PP-OCRv6_medium_rec" \
  --data-binary @/home/ck/Desktop/projects/silly/ck/static/images/ocr-clerk-sample-id.png \
  -H "Content-Type: application/octet-stream"

Result summary:

mode: kenya_id
elapsed_ms: 10135
num_lines: 18

The annotated image below was generated from the actual lines[].points returned by the API.

Synthetic ID-card-style fixture annotated with actual OCR detection boxes and line numbers. — Actual OCR boxes from the local understanding-enabled server: 18 detected lines, mode=kenya_id.

The OCR text came back as:

TEST ID CARD
SERIAL NUMBER
ID NUMBER
900112233
12345678
SAMPLE/NOT VALID
FULL NAMES
AMINA KIPRONO TESTER
DATE OF BIRTH
25.05.1993
SEX
FEMALE
DISTRICT OF BIRTH
PLACE OF ISSUE
DATE OF ISSUE
NANDI SOUTH
TINDIRET
14.10.2011

The understanding stage returned:

{
  "serial_number": { "value": "900112233", "confidence": 1.0 },
  "id_number": { "value": "12345678", "confidence": 0.99 },
  "full_names": { "value": "AMINA KIPRONO TESTER", "confidence": 0.96 },
  "date_of_birth": { "value": "25.05.1993", "confidence": 1.0 },
  "sex": { "value": "FEMALE", "confidence": 1.0 },
  "district_of_birth": { "value": "PLACE OF ISSUE", "confidence": 1.0 },
  "place_of_issue": { "value": "TINDIRET", "confidence": 0.89 },
  "date_of_issue": { "value": "14.10.2011", "confidence": 0.98 }
}

The bad field is district_of_birth. It should be NANDI SOUTH, but the model returned PLACE OF ISSUE.

The annotated image explains the failure. OCR detected the three labels DISTRICT OF BIRTH, PLACE OF ISSUE, and DATE OF ISSUE as one row. The values came later as a separate row. The understanding model mapped one label as if it were the value.

That is exactly why the intermediate annotated image is useful. It shows whether the failure came from OCR detection, OCR recognition, layout ordering, or the JSON extraction model.

4. What the Supra stage does

The understanding feature is currently wired for mode=kenya_id.

Its schema is:

serial_number
id_number
full_names
date_of_birth
sex
district_of_birth
place_of_issue
date_of_issue

The model generates JSON from noisy OCR lines. The server then wraps each field as:

{ "value": "...", "confidence": 0.96 }

The confidence is calculated from generation token probabilities. For each field value, the server reports the lowest token probability among the tokens used to produce that value. It is not a complete confidence system, but it is better than returning structured JSON with no uncertainty at all.

5. Run the logbook fixture

The logbook mode does not yet have a structured JSON model. It only performs document cleanup.

Synthetic vehicle-registration-style OCR test fixture marked sample not valid. — A generated logbook-style fixture. It keeps the table pressure of the real document while avoiding seals, official identifiers, and usable certificate details.

Request:

curl -X POST \
  "http://127.0.0.1:3099/v1/ocr?mode=kenya_logbook&det_model=PP-OCRv6_medium_det&rec_model=PP-OCRv6_medium_rec" \
  --data-binary @/home/ck/Desktop/projects/silly/ck/static/images/ocr-clerk-sample-logbook.png \
  -H "Content-Type: application/octet-stream"

Result summary:

mode: kenya_logbook
elapsed_ms: 9540
num_lines: 61

Synthetic logbook-style fixture annotated with actual OCR detection boxes and line numbers. — Actual OCR boxes from the local server: 61 detected lines, mode=kenya_logbook.

Excerpt from the OCR output:

Registration
KXX 000X
Number of Passengers
5
Chassis/Frame
TST-AW-110821
Tare Weight (Kgs)
1330.0
Make of Vehicle
EXAMPLE MOTORS
Tax Class
PRIVATE
...
Previous Reg. Country
Fuel Type
DIESEL
Previous Registra
...
Number of Previous Owners 0

This is useful OCR cleanup, not full document understanding. If logbooks need reliable JSON output, they need their own schema and adapter instead of reusing the ID model.

6. Deployment shape

The deployed service sits behind local Nginx:

ocr.servos.dev
  -> local Nginx
  -> 127.0.0.1:3088
  -> ppocr-server

The important boundary is the API mode:

general        read text
document       document-friendly OCR
kenya_id       ID cleanup
kenya_id + understand=true
              ID JSON extraction
kenya_logbook  logbook cleanup

Do not keep adding document-specific behavior to the global OCR path. Add a mode. If fields matter, add a schema-specific parser or small model for that document type.

Building a Local OCR API That Returns JSON Fields

1. Build the understanding-enabled binary

2. Use synthetic test documents

3. Run OCR plus JSON extraction on the ID fixture

4. What the Supra stage does

5. Run the logbook fixture

6. Deployment shape

Related Articles

Daily Agents Need a Scheduler

Put Codex Behind Mattermost, Then Stop Overbuilding It

Dream TTS Proves Local Speech Is a Product Architecture Problem

1. Build the understanding-enabled binary

2. Use synthetic test documents

3. Run OCR plus JSON extraction on the ID fixture

4. What the Supra stage does

5. Run the logbook fixture

6. Deployment shape

Related Articles

Daily Agents Need a Scheduler

Put Codex Behind Mattermost, Then Stop Overbuilding It

Dream TTS Proves Local Speech Is a Product Architecture Problem

Share Article