
Verity — Design Document

Goal

A web application for employees who validate customer-submitted documents. Upload a document (PDF, image, or scan), describe what you expect it to be in free text, and get back within seconds: the document category, a confidence score, extracted fields, a summary, and a strict match verdict.

Live at: https://verity.joaog.space

Original brief: "Design a document validator that reads a 3-page document within 5 seconds and decides whether the content is matching the user's expectations. Example: Upload a utility bill and within 2 seconds a response comes back with the category of the document."

Architecture

Single Next.js 16 (App Router) project handling both UI and backend logic, deployed on Vercel.

Next.js App (Vercel)
  |
  |-- Frontend (React + shadcn/ui)
  |     |
  |     |-- fetch('/api/validate') [SSE stream]
  |     |-- fetch('/api/suggest')  [cached]
  |
  |-- API Routes
  |     |
  |     |-- Two-stage LLM calls
  |
  |-- LLM Adapter Layer (Gemini 2.0 Flash)

Tech Stack

| Layer | Technology |
| --- | --- |
| Framework | Next.js 16 (App Router) |
| Language | TypeScript |
| UI Components | shadcn/ui (Button, Input, Card, Badge, Label, Alert, Tooltip) |
| Styling | Tailwind CSS v4 + custom palette (Dust Grey, Gunmetal, Pacific Blue, Rosy Copper, Glaucous) |
| Fonts | Inter (body), Outfit (display title) |
| Data fetching | Custom SSE stream consumer for validation, TanStack Query useQuery for autocomplete |
| AI engine | Google Gemini 2.0 Flash via @google/genai |
| Image processing | sharp (resize to 768px JPEG) |
| PDF processing | pdf-lib (page count + truncation to 3 pages) |
| File thumbnail | Lightweight icon with extension badge (images use object URL preview) |
| Schema validation | Zod v4 (LLM response parsing with retry) |
| Input sanitization | Custom sanitizeUserInput() — Unicode property class \p{Cc}, max length |
| Rate limiting | In-memory sliding window per IP (10/min validate, 30/min suggest) |
| Theming | next-themes (light default, dark mode toggle) |
| Markdown rendering | react-markdown + remark-gfm + @tailwindcss/typography |
| Deployment | Vercel |
| Domain | verity.joaog.space |
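The sanitizer in the stack above is small enough to sketch. A hypothetical version of `sanitizeUserInput()`, assuming only the behavior described here (strip Unicode control characters via `\p{Cc}`, cap the length) — the real implementation in sanitize.ts may differ:

```typescript
// Hypothetical sketch of sanitizeUserInput(); not the actual implementation.
const MAX_EXPECTATION_LENGTH = 500;

function sanitizeUserInput(raw: string, maxLength: number = MAX_EXPECTATION_LENGTH): string {
  return raw
    .replace(/\p{Cc}/gu, " ") // Unicode property class: all control characters
    .replace(/\s+/g, " ")     // collapse the whitespace left behind
    .trim()
    .slice(0, maxLength);
}
```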

Decision Log

Two-stage streaming vs single-pass vs OCR pipeline

Chose: Two-stage streaming. The validation is split into a fast classification call (~3s for the verdict) followed by a background field extraction call (~4s more), streamed via Server-Sent Events.

Three approaches were considered during design:

  1. Single-pass vision LLM — One call for everything. Simple, but 4-6s before the user sees anything.
  2. OCR-first pipeline — Extract text first, then reason with an LLM. Adds complexity, loses visual context (logos, layout, formatting).
  3. Two-stage streaming — Classify fast, extract in background. Two calls = more total tokens, but the user gets the answer in ~3s.

The two-stage approach was chosen because the user's primary question — "does this document match?" — should be answered as fast as possible. Field extraction is secondary and can populate progressively.

Gemini Flash vs OpenAI GPT-4o vs Anthropic Claude

Chose: Gemini 2.0 Flash. Free tier, native PDF support, structured JSON output.

Strict matching vs fuzzy matching

Chose: Strict. Every specific detail in the expectation must be satisfied.

For document validation, false positives are worse than false negatives. An employee needs to know "this is NOT what was expected." A wastewater bill does not match "electricity bill" even though both are utilities. Blank forms do not match "completed form." Instructions about a form do not match the form itself.

No database / stateless design

Chose: No persistence. History lives in sessionStorage, clears on tab close.

Document validation is inherently stateless — each upload is independent. Adding a database adds deployment complexity without clear user value. Session history covers the "compare recent results" use case. Would reconsider if the tool needs audit trails, team sharing, or analytics.

Next.js monolith vs separate frontend/backend

Chose: Monolith. One project, one npm run dev, one deploy.

API routes run server-side (Gemini key stays safe), no CORS issues, shared TypeScript types between client and server. Vercel deploys it as a single unit. Acceptable tradeoff: can't scale frontend and backend independently, but fine for a tool app.

Branding decisions

Name: Verity (Latin for "truth") — chosen for its distinctive, proper-name quality (like "Claude" or "Gemini"). Selected from candidates: Archon, Verity, Sentinel, Argus, Nexus, Orion.

Font: Outfit (geometric sans-serif) — selected from 5 candidates compared side-by-side (Lora, Merriweather, Playfair Display, Space Grotesk, Outfit). Outfit gives a modern AI/tech feel that contrasts well with Inter for body text.

Color palette: User-provided custom 5-color palette replacing the default shadcn grayscale:

Approach: Two-Stage Streaming Validation

The validation is split into two sequential LLM calls, streamed to the frontend via Server-Sent Events (SSE):

Stage 1 — Classify (~3s): A minimal prompt asks only for category, confidence, match verdict, and explanation. No field extraction, no summary. Short output = faster inference. The result streams to the frontend immediately.

Stage 2 — Extract (~4s more): A second prompt asks for all extracted fields and a summary. Runs on the same document. Fields fade into the UI progressively while the user already has the verdict.

User clicks "Validate"
  |
  |  ~3s
  v
  VERDICT: Match/No Match        <-- User sees the answer here
  Category, Confidence, Why
  [Extracting fields... skeleton]
  |
  |  ~4s more
  v
  VERDICT (already shown)
  Summary (fades in)
  Extracted Fields (fades in)    <-- Fields populate progressively
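The flow above can be wired into a single SSE response roughly like this — a sketch assuming the Web Streams API available in Next.js route handlers, with `classify` and `extract` standing in for the provider adapter's calls (not the actual route code):

```typescript
// Sketch (not the actual route): the two provider calls wired into one SSE stream.
function sseEvent(name: string, data: unknown): string {
  return `event: ${name}\ndata: ${JSON.stringify(data)}\n\n`;
}

function twoStageStream(
  classify: () => Promise<unknown>,
  extract: () => Promise<unknown>,
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        // Stage 1: short classification call -- the verdict event ships first.
        controller.enqueue(encoder.encode(sseEvent("verdict", await classify())));
        // Stage 2: field extraction on the same document, streamed when ready.
        controller.enqueue(encoder.encode(sseEvent("complete", await extract())));
      } catch (err) {
        controller.enqueue(
          encoder.encode(sseEvent("error", { error: String(err), code: "unknown" })),
        );
      } finally {
        controller.close();
      }
    },
  });
}
```

A route handler would then return the stream with a `Content-Type: text/event-stream` header.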

PDFs are sent as application/pdf (Gemini handles them natively). Images are resized to 768px-wide JPEGs. PDFs over 3 pages are truncated to the first 3 pages using pdf-lib.

The prompts enforce strict expectation matching:

  1. Match the document type literally (electricity bill != water bill)
  2. Distinguish blank/template forms from completed/filled forms
  3. Verify specific issuers, date ranges, and named individuals
  4. Treat user input as untrusted (prompt injection defense)

Generation config: temperature: 0, responseMimeType: "application/json". Classification uses maxOutputTokens: 256 (fast). Extraction uses maxOutputTokens: 1024.
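With the @google/genai SDK, the classification call likely resembles this sketch. The config values come from the settings above; the client wiring, function name, and part shapes are assumptions, since the actual call site is not shown in this document:

```typescript
import { GoogleGenAI } from "@google/genai";

// Sketch only: client setup and names are assumptions, not the actual code.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function classifyCall(
  parts: { inlineData: { data: string; mimeType: string } }[],
  prompt: string,
) {
  const response = await ai.models.generateContent({
    model: "gemini-2.0-flash",
    contents: [{ role: "user", parts: [...parts, { text: prompt }] }],
    config: {
      temperature: 0,                       // deterministic verdicts
      responseMimeType: "application/json", // force JSON output
      maxOutputTokens: 256,                 // short output = faster Stage 1
    },
  });
  return response.text;
}
```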

Retry: Each stage retries once on parse failure. Control characters are stripped before parsing. Empty responses are guarded against.
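That retry-and-guard behavior can be sketched generically — a hypothetical helper, using a minimal parse interface compatible with a Zod schema's `.parse()`:

```typescript
// Hypothetical retry helper; the real gemini-provider.ts may differ.
interface Parser<T> {
  parse(input: unknown): T;
}

async function callWithRetry<T>(
  call: () => Promise<string>,
  schema: Parser<T>,
  retries: number = 1,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const raw = (await call()).replace(/\p{Cc}/gu, ""); // strip control characters
      if (!raw.trim()) throw new Error("empty LLM response"); // guard empty output
      return schema.parse(JSON.parse(raw));
    } catch (err) {
      lastError = err; // call, JSON.parse, or schema failed: loop retries once
    }
  }
  throw lastError;
}
```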

Test suite compatibility: A single-pass validateDocument method is preserved for the /tests page (server actions don't use SSE).

API Endpoints

POST /api/validate

Main validation endpoint. Returns a Server-Sent Events stream with two events. Rate limited to 10 requests/minute/IP.

Request: file (binary, max 5MB) + expectation (string, sanitized to 500 chars) as multipart/form-data

Response: Content-Type: text/event-stream

Event 1 — verdict (~3s):

{
  category: string;
  categoryLabel: string;
  confidence: number;
  matchesExpectation: boolean;
  matchExplanation: string;
  processingTimeMs: number;
  truncated: boolean;
}

Event 2 — complete (~7-8s total):

{
  extractedFields: Record<string, string>;
  summary: string;
  processingTimeMs: number;
}

Error event: { error: string, code: "validation_error" | "parse_error" | "provider_error" | "rate_limit" | "unknown" }
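A client consuming this endpoint needs only a small SSE parser. A sketch, assuming each event arrives as `event: <name>\ndata: <json>\n\n` (the app's real consumer in api-client.ts may handle partial chunks differently):

```typescript
// Minimal SSE parser sketch for complete "event:/data:" blocks.
interface SseEvent {
  event: string;
  data: unknown;
}

function parseSseEvents(text: string): SseEvent[] {
  return text
    .split("\n\n")
    .filter((block) => block.trim().length > 0)
    .map((block) => {
      const lines = block.split("\n");
      const event = lines.find((l) => l.startsWith("event: "))?.slice(7) ?? "message";
      const data = lines.find((l) => l.startsWith("data: "))?.slice(6) ?? "{}";
      return { event, data: JSON.parse(data) };
    });
}
```

A real consumer would accumulate fetched body chunks into a buffer (events can be split across chunks) and dispatch on `event` being `verdict`, `complete`, or `error`.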

GET /api/suggest?q=<partial_text>

AI-powered autocomplete. Rate limited to 30 requests/minute/IP.

Response: string[] (4 suggestions)
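The in-memory sliding-window limiter shared by both endpoints can be sketched as follows — an assumed shape, not the actual rate-limit.ts; note that on serverless deploys each instance keeps its own window, a known tradeoff of skipping external state:

```typescript
// Sketch of a per-IP sliding window: keep recent request timestamps
// and reject once the window is full.
const windows = new Map<string, number[]>();

function isAllowed(
  ip: string,
  limit: number,
  windowMs: number = 60_000,
  now: number = Date.now(),
): boolean {
  const recent = (windows.get(ip) ?? []).filter((t) => now - t < windowMs);
  if (recent.length >= limit) {
    windows.set(ip, recent); // prune expired entries even on rejection
    return false;
  }
  recent.push(now);
  windows.set(ip, recent);
  return true;
}
```

The validate route would call `isAllowed(ip, 10)` and the suggest route `isAllowed(ip, 30)`, returning 429 with a Retry-After header on rejection.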

Cost optimization:

LLM Provider Adapter

interface DocumentPart {
  buffer: Buffer;
  mimeType: string;  // "application/pdf" | "image/jpeg"
}

interface LLMProvider {
  validateDocument(parts: DocumentPart[], expectation: string): Promise<ValidatorResponse>;
  classifyDocument(parts: DocumentPart[], expectation: string): Promise<ClassifyResponse>;
  extractFields(parts: DocumentPart[], expectation: string): Promise<ExtractResponse>;
}

Current implementation: GeminiProvider with automatic retry on each method. Shared constants (GEMINI_MODEL, getGeminiClient()) ensure consistency across the validate and suggest routes.

Latency Analysis

Where the time goes

| Step | Time | User sees |
| --- | --- | --- |
| File upload + processing | ~150ms | Spinner |
| Stage 1: Classify (Gemini) | 2-3.5s | Verdict appears |
| Stage 2: Extract fields (Gemini) | 3-5s more | Fields fade in |
| Time to verdict | ~3s | |
| Total time | 7-8s | |

The user gets the answer (match/no match) in ~3 seconds. Field extraction runs in the background and populates progressively.

Why Gemini Flash takes 3-6 seconds

What paid alternatives could achieve

| Provider | Model | Est. latency | Native PDF | Cost | Notes |
| --- | --- | --- | --- | --- | --- |
| Google (paid) | Gemini 2.0 Flash | 2-3s | Yes | ~$0.001/call | Same model, dedicated capacity |
| Google (paid) | Gemini 2.0 Pro | 4-8s | Yes | ~$0.005/call | Higher quality, slower |
| OpenAI | GPT-4o-mini | 1-2s | No | ~$0.002/call | Fast, but needs PDF-to-image (+500ms) |
| OpenAI | GPT-4o | 2-4s | No | ~$0.01/call | Highest quality vision |
| Anthropic | Claude 3.5 Haiku | 1-2s | No | ~$0.001/call | Fast text, vision adds latency |

Conclusion

The ~3s time-to-verdict with two-stage streaming is a practical tradeoff for zero cost, native PDF support, and full field extraction. A paid Gemini tier would bring this to ~2s with zero code changes. OpenAI GPT-4o-mini could reach ~1s for the classify stage but requires adding a PDF-to-image conversion step.

Optimizations applied

Security & Hardening

Test Results

Tested against 10 real-world PDF documents with specific expectations designed to test both correct matches and strict rejections.

| # | Document | Expectation | Expected | Result | Time to verdict | Total time |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | W-2 blank (IRS) | "A blank IRS W-2 form" | Match | Match (0.95) | 3.2s | 7.4s |
| 2 | W-2 filled (Pitt) | "A recent monthly pay stub" | No Match | No Match | 3.1s | 7.0s |
| 3 | Invoice (Sliced) | "A commercial invoice" | Match | Match (0.95) | 3.3s | 7.7s |
| 4 | Utility bill (CRWWD) | "A utility bill with account number" | Match | Match (0.95) | 3.0s | 7.2s |
| 5 | Utility bill (Wheaton) | "An electricity bill from ConEd" | No Match | No Match (0.95) | 3.2s | 7.6s |
| 6 | 1040 instructions (IRS) | "A completed Form 1040" | No Match | No Match (0.95) | 3.5s | 8.4s |
| 7 | Passport (Malaysia) | "A passport scan with photo" | Match | Match (0.92) | 3.0s | 7.1s |
| 8 | Passport (Ultracamp) | "A US driver's license" | No Match | No Match (0.95) | 3.4s | 7.6s |
| 9 | 1099 form (IRS) | "An IRS Form 1099" | Match | Match (0.95) | 3.3s | 7.8s |
| 10 | W-4 form (IRS) | "A completed W-2" | No Match | No Match (0.95) | 3.6s | 8.4s |

Pass rate: 10/10

Average time to verdict: 3.3s (user sees Match/No Match here)

Average total time: 7.6s (fields fully populated)

Key observations:

UI Design

Brand

Layout

Components

  1. Expectation input — Text field with ghost text completion (Tab to accept), AI suggestion dropdown (debounced 400ms), and 8 quick-pick badge chips (4 visible + expandable)
  2. Upload zone — Native <button> wrapping a drag-and-drop area with file icon + extension badge thumbnail
  3. Result card — Status badge (Match/No Match/Uncertain), category badge, confidence + time with Tooltips. "Why" section with colored left border accent. Summary and extracted fields as sub-components. Shows skeleton during field extraction.
  4. Result skeleton — Pulse loading placeholder shown during initial LLM processing
  5. Empty state — "How it works" 3-step guide (Describe, Upload, Validate)
  6. History list — Session-persisted, expandable entries. Collapsible on mobile, always visible on desktop.
  7. Theme toggle — Sun/Moon in header, light/dark via next-themes
  8. Design doc page — /docs renders this markdown at build time
  9. Test suite page — /tests runs 10 test documents with live pass/fail dashboard

Design System (shadcn/ui)

Primitives: Button, Input, Card, Badge, Label, Alert, Tooltip. All use semantic CSS variables that adapt to light/dark themes.

Status colors:

Error Handling

| Error | User message | Code | Strategy |
| --- | --- | --- | --- |
| File > 5MB | "File exceeds 5MB limit" | validation_error | Client + server validation |
| Bad format | "Unsupported file type" | validation_error | Client MIME + server validation |
| PDF > 3 pages | Truncated, badge shown | — | pdf-lib extracts first 3 pages |
| LLM parse failure | Auto-retry once | — | Control char stripping, then parse_error |
| Empty LLM response | Auto-retry once | — | Guard + clear error message |
| Field extraction failure | Graceful degradation | — | Verdict still shown, summary says "not available" |
| Rate limit exceeded | "Too many requests" | rate_limit | 429 with Retry-After header |
| API key issue | "AI service config error" | provider_error | Logged server-side |
| Network error | Error alert | — | UI error state |

File Structure

doc-validator/
├── src/
│   ├── app/
│   │   ├── layout.tsx              # Root layout, fonts, providers
│   │   ├── page.tsx                # Main page, two-column layout, SSE state
│   │   ├── globals.css             # Tailwind v4, shadcn theme, custom palette
│   │   ├── icon.svg                # Favicon
│   │   ├── docs/                   # Design doc page (server-rendered markdown)
│   │   ├── tests/                  # Live test suite (server actions + dashboard)
│   │   └── api/
│   │       ├── validate/route.ts   # SSE streaming validation
│   │       └── suggest/route.ts    # AI autocomplete (cached + rate limited)
│   ├── components/
│   │   ├── expectation-input.tsx   # Input + ghost text + dropdown + badges
│   │   ├── upload-zone.tsx         # Drag-and-drop + file icon
│   │   ├── file-thumbnail.tsx      # File icon with extension badge
│   │   ├── result-card.tsx         # Two-stage result display
│   │   ├── result-skeleton.tsx     # Loading skeleton
│   │   ├── history-list.tsx        # Collapsible session history
│   │   ├── empty-state.tsx         # First-time guide
│   │   ├── theme-toggle.tsx        # Light/dark switcher
│   │   ├── providers.tsx           # ThemeProvider + QueryClient + TooltipProvider
│   │   └── ui/                     # shadcn/ui primitives
│   ├── hooks/
│   │   ├── use-validate.ts         # SSE stream hook (verdict + fields)
│   │   └── use-suggestions.ts      # Debounced autocomplete query
│   ├── lib/
│   │   ├── schemas.ts              # Zod schemas (classify + extract + combined)
│   │   ├── api-client.ts           # SSE stream consumer
│   │   ├── sanitize.ts             # Input sanitization
│   │   ├── rate-limit.ts           # Sliding window rate limiter
│   │   ├── utils.ts                # cn() helper
│   │   ├── llm/
│   │   │   ├── types.ts            # LLMProvider interface
│   │   │   ├── constants.ts        # Shared model name + client factory
│   │   │   ├── provider.ts         # Provider factory
│   │   │   ├── gemini-provider.ts  # Gemini with retry (classify + extract + validate)
│   │   │   └── prompt.ts           # Prompts with injection defense
│   │   └── document/
│   │       ├── image-processor.ts  # JPEG resize via sharp
│   │       └── pdf-processor.ts    # Page count + truncation via pdf-lib
│   └── lib/__tests__/              # Vitest tests
├── test-docs/                      # Generated test PDFs
├── test-docs/real/                 # Real-world PDFs
├── docs/design.md                  # This document
└── package.json

Deployment

Scope

Built:

Out of scope (future):