Taking a photo of a prescription with a phone and getting, ten seconds later, a structured list of medications, doses, duration, and frequency: this is the experience DossiMed offers users. Behind that apparent simplicity, the technical pipeline combines two generations of AI, strong privacy discipline, and a strict boundary around medical responsibility. This article explains the logic of our approach without exposing prompts, model details, or internal parameters.
The handwritten prescription challenge
If you have ever tested a generic OCR on a medical prescription, you know the result: noisy text with random fragments that look readable but are not reliable. The causes are cumulative:
- Fast, non-standard handwriting - medical writing is often cursive, slanted, and compressed
- Domain-specific abbreviations - Dsp, mane, nocte, cp, amp are not in public dictionaries
- Heterogeneous layouts - header, prescription body, signature, stamp, and legal notes mix without a fixed grid
- Mixed languages - Arabic header, French body, Latin molecule names in the same document
- Degraded image quality - poor lighting, rushed capture, wrinkled paper, or partially hidden content
For a generic OCR, each factor lowers precision. Combined, they make extraction unusable.
The worst outcome of weak medical OCR is not unreadable text. It is silent error. A wrongly read dosage can still look plausible. In healthcare, that risk requires a radically different approach from plain OCR.
A two-stage pipeline
Our approach is built as two successive stages, each specialized in a task that current AI models handle well separately.
Stage 1 - Visual extraction
The first stage relies on a modern Document Intelligence service. This type of system is trained on structured documents (invoices, contracts, medical forms) and recognizes both characters and layout. It outputs two artifacts:
- Raw text reproducing the document word sequence
- Tabular representation recognizing rows and columns when present (for lab reports, for example)
At this point, an image has become text. But that text is still noisy, ambiguous, and semantically unstructured.
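To make the stage-1 contract concrete, here is a minimal sketch of what that output could look like. All type and field names are illustrative assumptions, not the actual vendor schema; the point is that stage 1 yields words with per-word confidence plus optional tables, and that low-confidence words can be flagged rather than silently kept.

```typescript
// Hypothetical shape of the stage-1 (Document Intelligence) output.
// Field names are illustrative, not the actual vendor schema.
interface OcrWord {
  text: string;
  confidence: number; // 0..1, as reported by the OCR layer
}

interface OcrTable {
  rows: string[][]; // recognized cells, row-major (e.g. for lab reports)
}

interface OcrResult {
  words: OcrWord[];
  tables: OcrTable[];
}

// Reassemble the raw text, keeping low-confidence words but marking them
// so the semantic stage (and later the UI) can treat them with suspicion.
function toRawText(result: OcrResult, minConfidence = 0.6): string {
  return result.words
    .map((w) => (w.confidence >= minConfidence ? w.text : `[?${w.text}]`))
    .join(" ");
}
```

Keeping doubtful words marked instead of dropping them is what lets the next stage decide whether context resolves them.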
Stage 2 - Semantic extraction
The second stage uses a frontier generative language model configured to output structured JSON from raw text. Its role is threefold:
- Identify medical entities - medication names, doses, frequencies, durations, intake instructions, prescriber name, specialty, issue date, practice location
- Categorize the document - medication prescription, lab report, and medical imaging follow different expected structures; categorization guides downstream handling
- Correct misspelled medication names - OCR may return Glecnvanc 50 mg; the model proposes Glivec 50 mg with a confidence score and can trigger online pharmacological verification for ambiguous cases
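In production this correction is done by the language model; as a purely deterministic illustration of the same idea, here is a sketch that matches an OCR'd name against a tiny, hypothetical formulary with edit distance. The formulary contents and the confidence formula are assumptions for the example.

```typescript
// Illustrative only: production uses the language model for correction.
// This sketch shows the same idea with plain Levenshtein distance against
// a (hypothetical, tiny) formulary.
const FORMULARY = ["Glivec", "Doliprane", "Augmentin", "Lasilix"];

function editDistance(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
  return dp[a.length][b.length];
}

// Returns the closest known name plus a rough confidence; callers should
// mark low-confidence lines for review instead of auto-correcting them.
function correctDrugName(ocrName: string): { name: string; confidence: number } {
  let best = FORMULARY[0];
  let bestDist = Infinity;
  for (const candidate of FORMULARY) {
    const d = editDistance(ocrName.toLowerCase(), candidate.toLowerCase());
    if (d < bestDist) {
      bestDist = d;
      best = candidate;
    }
  }
  return { name: best, confidence: Math.max(0, 1 - bestDist / best.length) };
}
```

A real system would also weigh dose plausibility and context, which is exactly what the generative stage adds over pure string matching.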
The output is clean JSON, directly usable by the app: medication list with parameters, document metadata, confidence indicators, and potential alerts.
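As an illustration of that contract, here is a hedged sketch of what such a result could look like, together with the confidence-driven alerting it enables. The schema is an assumption made for this example; the real internal structure is not published.

```typescript
// Hypothetical shape of the stage-2 output; the real schema is internal.
interface ExtractedMedication {
  name: string;
  dose: string;        // e.g. "50 mg"
  frequency: string;   // e.g. "2x/day"
  durationDays: number | null;
  confidence: number;  // 0..1, reported by the semantic stage
}

interface ExtractionResult {
  category: "prescription" | "lab_report" | "imaging";
  medications: ExtractedMedication[];
}

// Derive the per-line alerts: anything under the threshold must be
// confirmed by the user before any reminder is generated.
function linesNeedingReview(result: ExtractionResult, threshold = 0.8): string[] {
  return result.medications
    .filter((m) => m.confidence < threshold)
    .map((m) => m.name);
}
```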
Privacy discipline
A document-AI pipeline for healthcare raises an obvious question: what is sent to the AI model? European regulation, and basic ethics, demand that the answer be as little as possible.
No nominative patient data is sent to the model. Name, date of birth, social security number, allergies, pre-existing chronic conditions - none of that crosses into the AI service boundary. The model receives only document text to convert into structure.
This principle has two practical effects:
- Leak-risk limitation - even an incident at a third-party AI vendor would not expose patient identity
- Simpler GDPR posture - no special-category data under Article 9 is exported outside the extended European jurisdictional perimeter
The trade-off should be explicit: without patient context, the model cannot resolve every ambiguity. When ambiguity remains, the affected extraction lines are marked for user review and manual confirmation is requested.
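The minimization boundary described above can be sketched as a single function that builds the AI payload. All names are illustrative, not the actual DossiMed code; the point is structural: identity fields are never copied into the payload, and as defense in depth, a name that slipped into the scanned text is scrubbed before the text crosses the service boundary.

```typescript
// Illustrative sketch of the data-minimization boundary; field and
// function names are assumptions, not the actual DossiMed code.
interface PatientRecord {
  fullName: string;
  dateOfBirth: string;
  socialSecurityNumber: string;
  allergies: string[];
}

interface AiPayload {
  documentText: string; // the only thing the model ever sees
}

function buildAiPayload(documentText: string, patient: PatientRecord): AiPayload {
  // No identity field is ever copied into the payload. As defense in depth,
  // scrub the patient's name if it appears inside the scanned text itself.
  const scrubbed = documentText.split(patient.fullName).join("[REDACTED]");
  return { documentText: scrubbed };
}
```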
Multilingual document AI
The pipeline must handle prescriptions written in French, Arabic, English, and often multiple languages simultaneously (Arabic header, French body, Latin signature). This is common in Maghreb and Middle East practice but remains technically underestimated.
Arabic is particularly challenging: cursive script complicates OCR, and right-to-left direction can introduce sequence artifacts if pipeline settings are not robust.
Our choice: do not predeclare document language. The pipeline automatically detects dominant language from recognized characters and vocabulary, then adapts post-processing accordingly. For users, this means they simply photograph the prescription as-is, without language selection friction.
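One building block of such detection can be sketched with Unicode script ranges. This is a simplification made for illustration: real detection also weighs vocabulary, as the text notes, and this sketch only covers the script dimension.

```typescript
// Sketch of dominant-script detection by Unicode ranges. Real language
// detection also uses vocabulary; this covers only the script dimension.
function detectDominantScript(text: string): "arabic" | "latin" | "unknown" {
  let arabic = 0;
  let latin = 0;
  for (const ch of text) {
    const code = ch.codePointAt(0)!;
    if (code >= 0x0600 && code <= 0x06ff) arabic++; // Unicode Arabic block
    else if (/[A-Za-zÀ-ÿ]/.test(ch)) latin++;       // Latin incl. French accents
  }
  if (arabic === 0 && latin === 0) return "unknown";
  return arabic >= latin ? "arabic" : "latin";
}
```

For mixed documents, the same counting can be done per region (header vs body) rather than once per document, which matches the Arabic-header-French-body pattern described above.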
Human validation as final link
A core design decision differentiates DossiMed from clinical decision-support tools: the user always validates extraction output. Once structured JSON is produced, the app displays editable fields in the detail screen. Users can correct a misread drug name, adjust dose, change frequency, or remove a line. Reminder generation is based on what users validate, not what AI infers.
This is not only a UX choice. It is a regulatory choice. By making no autonomous medical decision - no dosage recommendation, no drug-interaction alert, no biological result interpretation - DossiMed remains below the software medical-device threshold under EU MDR 2017/745. AI proposes, users decide, prescribers remain responsible.
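The validation gate can be sketched as follows. The data model and the scheduling rule (doses spread evenly between 08:00 and 20:00) are assumptions made for the example, not the app's actual logic; what matters is that unvalidated lines never produce reminders.

```typescript
// Sketch of the validation gate: reminders are generated only from lines
// the user has explicitly confirmed. Names and the 08:00-20:00 spreading
// rule are illustrative assumptions.
interface MedicationLine {
  name: string;
  timesPerDay: number;
  validatedByUser: boolean;
}

interface Reminder {
  medication: string;
  hour: number; // 24h clock
}

function generateReminders(lines: MedicationLine[]): Reminder[] {
  return lines
    .filter((l) => l.validatedByUser) // AI proposes, users decide
    .flatMap((l) =>
      Array.from({ length: l.timesPerDay }, (_, i) => ({
        medication: l.name,
        hour: 8 + Math.floor((12 / l.timesPerDay) * i), // spread across the day
      })),
    );
}
```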
Error-case handling
No AI pipeline is perfect. Ours explicitly models failure paths and handles them transparently.
OCR confidence too low - blurry image, partially hidden content, insufficient light. Prescription status is set to needs_review instead of producing uncertain extraction. The app prompts users to retake the photo or correct fields manually.
Language-model uncertainty - when the LLM signals uncertainty about a medication or dosage, the field is shown with a visible indicator. Users know it must be checked against the original prescription before schedule generation.
Non-medical document - a prefilter detects missing medical markers (ordonnance, posologie, analyse, laboratoire, etc.) and politely rejects the document before expensive processing. This protects both UX quality and operating cost.
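A prefilter of this kind can be sketched as a cheap keyword scan that runs before any model call. The marker list below is a small illustrative sample; a production list would be broader and tuned per language.

```typescript
// Sketch of the cheap prefilter that runs before expensive processing.
// The marker list is illustrative; production would use a broader
// multilingual set.
const MEDICAL_MARKERS = [
  "ordonnance", "posologie", "analyse", "laboratoire", // French
  "prescription", "dosage",                            // English
  "وصفة", "تحليل",                                      // Arabic
];

function looksMedical(rawText: string, minHits = 1): boolean {
  const lower = rawText.toLowerCase();
  const hits = MEDICAL_MARKERS.filter((m) => lower.includes(m)).length;
  return hits >= minHits;
}
```

Because the scan works on the stage-1 text, a rejected document never reaches the language model, which is where the cost saving comes from.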
What the pipeline guarantees
- No secret or nominative data sent to third-party AI models
- Mandatory user validation before any reminder generation
- Explicit needs_review status when confidence is insufficient
- Early rejection of non-medical documents to control cost
- Portability: the extraction layers can be swapped for competing providers
A handover-ready platform
The complete pipeline - visual extraction, semantic extraction, validation, structured database integration - is implemented in a small set of serverless edge functions. It is portable across AI providers: the visual layer can be replaced by any competing document-intelligence service; the semantic layer can be replaced by any frontier model that outputs structured JSON, whether a managed cloud model or a self-hosted open-weight alternative.
This portability is valuable for organizations wanting to deploy DossiMed in a sovereign cloud or with their own internal LLM. Application code makes no irreversible assumption about underlying providers.
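The seam that makes this possible can be sketched as two small interfaces the pipeline depends on. Interface and class names are illustrative, not the actual DossiMed code, and real provider calls would be asynchronous; the sketch is synchronous to keep it minimal.

```typescript
// Sketch of the provider seam that keeps the pipeline portable. Names are
// illustrative, not the actual DossiMed code; real calls would be async.
interface VisualExtractor {
  extractText(imageBytes: Uint8Array): string;
}

interface SemanticExtractor {
  toStructuredJson(rawText: string): unknown;
}

// The pipeline depends only on the two interfaces, so either layer can be
// replaced: managed cloud service, sovereign cloud, or self-hosted model.
class DocumentPipeline {
  constructor(
    private readonly visual: VisualExtractor,
    private readonly semantic: SemanticExtractor,
  ) {}

  process(imageBytes: Uint8Array): unknown {
    const rawText = this.visual.extractText(imageBytes);
    return this.semantic.toStructuredJson(rawText);
  }
}
```

Swapping a provider then means writing one adapter that satisfies the interface, with no change to application code.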
For patients, the result is an app that turns a crumpled prescription photo into a structured medical record, readable and shareable with physicians. For organizations taking over the project, it is a platform whose document-AI engine can be connected to infrastructure of their choice.
DossiMed is published by REC, a fully export-oriented Tunisian single-member company. For commercial or strategic partnership inquiries, contact contact@dossimed.ai.