Skip to content
← Back to work
Case Study / Tenancy

Tenancy

Lease abstraction agent for multifamily operators — extracts nine structured sections from residential lease PDFs with per-field source citations, click-to-highlight overlays anchored to OCR coordinates, a human-review exception queue, and grounded Q&A. The same agent is exposed as an MCP server.

9 lease sections · per-field citationsOCR-anchored highlight overlays (pdfplumber)Approve / edit / reject review queueFastAPI · LangGraph · Claude · MCPOpen ↗Source ↗
Translation artifact
2026Solo build
01Problem
Situation: Multifamily operators turn lease PDFs into structured data at portfolio scale — during acquisitions, audits, and renewals. An LLM can read a lease and emit the fields; that part is close to free. The schema was never the hard part.
Complication: The hard part is trust. A reviewer will not accept an extracted field — a rent amount, a deposit, a term date — without seeing where in the document it came from. Building that "click a field, highlight the source" link is where the system actually broke: twelve-plus iterations of fuzzy text matching produced confident, wrong-place highlights; a strict exact-match fixed wrong-place by failing silently; and asking the model to emit bounding boxes drifted three to eight percent and boxed entire section headers when the field was a blank template placeholder.
Question: Can the highlight be anchored to the document — trustworthy enough that a reviewer relies on it — given that a wrong highlight erodes trust faster than no highlight at all?
02Requirements
  • Property-management reviewer

    Every extracted field traceable to its page with a visible highlight on the source line. A highlight in the wrong place is worse than none — it teaches the reviewer to distrust the tool.

    Design rule: a silent miss beats a confident wrong-place highlight

  • Risk / compliance

    Nothing reaches the system of record until a human approves, edits, or rejects each flagged exception; a rejected blocker keeps the lease out of the ready state.

    ready_to_proceed derived per lease, gated on unresolved blockers

  • Extraction honesty

    When a field is genuinely blank in the lease, report a null value at high confidence rather than inventing a plausible one.

    match_type ∈ filled / blank / inferred / checkbox / absent

  • Engineer maintaining it

    The highlight's coordinate source must be swappable — model vision, OCR alignment, AWS Textract — without rewriting the renderer.

    Frontend consumes bboxes: BoundingBox[]; the source stays backend-internal

03Decision

OCR-anchored bboxes — pdfplumber word positions + rapidfuzz snippet alignment, one rect per line

chosen
  • meets criterion: Positional accuracy
  • meets criterion: Honest failure mode
  • meets criterion: Works on blank + scanned fields
  • partially meets criterion: Build effort

The model returns {value, snippet, match_type, section_label} and never coordinates; the backend aligns the snippet against pdfplumber's word-level positions in the OCR'd PDF and emits one bounding box per line, following the PDF spec's QuadPoints highlight model. The pattern I found across the OCR and document-layout tooling I looked at was consistent enough to commit to: let OCR own geometry, let the model own semantics, and never ask the model to emit coordinates from a raster — that is exactly where model-estimated boxes drift. The renderer consumes a bbox array and is agnostic to where the boxes came from, so the production-grade upgrade (Textract SELECTION_ELEMENT for checkbox and signature geometry) is a backend swap, not a rewrite. The build cost is higher than letting the model emit coordinates — but that option fails exactly where it matters, on the blank and scanned fields a reviewer most needs to verify.

Fuzzy text-layer matching (heuristic, 12+ iterations)

  • partially meets criterion: Positional accuracy
  • does not meet criterion: Honest failure mode
  • does not meet criterion: Works on blank + scanned fields
  • does not meet criterion: Build effort

Strict exact-normalized text match (silent miss on no match)

  • partially meets criterion: Positional accuracy
  • meets criterion: Honest failure mode
  • does not meet criterion: Works on blank + scanned fields
  • meets criterion: Build effort

Model-emitted bounding boxes (Claude vision)

  • partially meets criterion: Positional accuracy
  • partially meets criterion: Honest failure mode
  • partially meets criterion: Works on blank + scanned fields
  • meets criterion: Build effort
04Solution

A lease abstraction agent that extracts nine structured sections with per-field provenance, anchors every highlight to OCR coordinates rather than model estimation, gates output behind a human-review exception queue, and answers grounded questions over the result — exposed as both a SaaS UI and an MCP server over the same backend.

Page-image-grounded extraction
Claude Sonnet extracts nine sections (parties, property, term, rent, deposits, utilities, pets, special clauses, compliance) over OCR'd text plus a rendered PNG of each page. The image is ground truth for visual fields — checkboxes, signatures, hand-fill — and the OCR text is ground truth for dense prose. That split fixed a class of false-positive checkbox reads where OCR noise looked like a mark.
OCR-anchored highlight overlays
The backend aligns each field's source snippet against pdfplumber word positions and emits one bounding box per line; the frontend draws absolutely-positioned overlays scaled to the rendered page. Filled values land tight, multi-line values render as stacked rectangles, and blank template fields anchor on the labeled blank line rather than the section header above it.
Exception queue with real resolve semantics
Validation rules (required-field presence, date order, confidence thresholds) flag exceptions as blocking or warning. Approve accepts the current value; edit rewrites the extraction at the field path and bumps confidence to 1.0; reject closes the row for audit but keeps the blocker material. A derived ready_to_proceed flag gates the lease.
Grounded Q&A + MCP surface
Claude Haiku answers questions over the stored extraction — every answer cites field path, page, and snippet, and the citation clicks through to the same highlight. The whole agent is also a six-tool MCP server (list, get, extract, query, list-exceptions, resolve), so it runs inside Claude Desktop against the same Railway backend.
05Outcome
  • Extraction

    9 sections, per-field citations

    Page, snippet, and confidence on every value; honest null on blank fields

  • Highlight

    OCR-anchored, not model-estimated

    Filled tight · multi-line stacked · blank fields on the blank line · verified end-to-end on a fresh lease

  • Coverage, stated honestly

    Checkboxes / signatures not highlighted

    Deferred to a Textract follow-up — a silent miss beats a wrong-place box

  • Surfaces

    SaaS UI + 6-tool MCP server

    One FastAPI / LangGraph backend · Sonnet extract · Haiku Q&A

Overview

Multifamily operators turn lease PDFs into structured data at portfolio scale — during acquisitions, audits, and renewals. A language model can read a lease and emit the fields; that part is close to free. The hard part is making a reviewer trust the output, which means showing — for every rent amount, deposit, and term date — exactly where in the document it came from.

Tenancy extracts nine structured sections from a residential lease, flags anything ambiguous into a human-review queue, and on every field offers a click-to-highlight back to the source page. The design bet is narrow and load-bearing: the highlight has to be anchored to the document, because a highlight in the wrong place erodes a reviewer's trust faster than no highlight at all.

RoleStrategy, design, and engineering (frontend + backend + MCP)
Year2026
DomainAI document extraction · lease abstraction
StackNext.js 16 · FastAPI · LangGraph · Claude (claude-sonnet-4-6 extract, claude-haiku-4-5 Q&A) · ocrmypdf + pdfplumber + pypdfium2 · Neon Postgres · MCP
StatusShipped

Problem framing

Three observations shape the design, and all three are about the highlight, not the extraction:

  1. The extraction is the easy part; the highlight is the product. Claude reads a Texas Apartment Association lease and returns the nine sections — parties, property, term, rent, deposits, utilities, pets, special clauses, compliance — without much coaxing. What a property-management reviewer actually needs is to click a field and see the source line light up in the PDF. That link is where the system repeatedly broke.
  2. A wrong-place highlight is worse than none. The first approach was text matching: find the extracted snippet in the PDF's text layer and highlight it. Twelve-plus iterations of increasingly elaborate fuzzy matching still produced confident highlights on the wrong line — and a reviewer who catches the tool highlighting the wrong place once stops trusting every highlight after. The honest fallback (highlight nothing when you're not sure, and say so) beats a clever wrong guess.
  3. Letting the model emit coordinates fails exactly where it matters. The obvious fix — ask Claude's vision to return a bounding box per field — drifted three to eight percent on filled values and boxed entire section headers when the field was a blank template placeholder. The blank and scanned fields are precisely the ones a reviewer most needs to verify, so a method that degrades there is a method that fails the job.

Solution

The pipeline ingests and OCRs the PDF, extracts the nine sections grounded on page images, derives highlight coordinates from OCR positions rather than model estimation, validates the result into an exception queue, and answers grounded questions over the stored extraction. The same backend is exposed as an MCP server.

Page-image-grounded extraction

Each section is extracted by Claude Sonnet over the OCR'd text plus a rendered PNG of each page. The split is explicit in the prompt: the image is ground truth for visual fields — checkboxes, signatures, hand-fill — and the OCR text is ground truth for dense prose. This fixed a class of false-positive checkbox reads where Tesseract had read scan noise as a stray mark and the model trusted the text.

The highlight pivot — OCR-anchored bounding boxes

The model's response contract carries no coordinates. It returns {value, snippet, match_type, section_label} and nothing geometric. The backend then aligns each snippet against pdfplumber's word-level positions in the OCR'd PDF using rapidfuzz partial-ratio matching, groups the matched words into lines by their vertical coordinate, and emits one bounding box per line — the PDF specification's QuadPoints highlight model, the same shape Adobe and Mendeley render. The frontend draws absolutely-positioned overlays scaled to the rendered page width. Filled values land tight on the value; multi-line values render as stacked rectangles; blank template fields anchor on the labeled blank line rather than the section header above it.

Exception queue with real resolve semantics

Validation rules — required-field presence, date ordering, confidence thresholds — flag exceptions as blocking or warning. The three resolve actions are genuinely distinct: approve accepts the current value and clears the blocker; edit walks the extraction JSON to the exception's field_path, rewrites the leaf value, and bumps confidence to 1.0 (a human said so); reject closes the row for audit but keeps the blocking flag material. A derived ready_to_proceed boolean on every lease is true only when the lease is complete and no blocking exception is left unresolved-or-rejected.

Grounded Q&A and the MCP surface

Claude Haiku answers natural-language questions over the stored extraction — never the raw PDF — so every answer cites a field_path, page, and snippet, and the citation clicks through to the same highlight overlay. The whole agent is also a six-tool MCP server (list_leases, get_lease, extract_lease, query_lease, list_exceptions, resolve_exception), so it runs inside Claude Desktop against the same Railway backend the SaaS UI uses.

Implementation considerations

The strict matcher was the right retreat, not the answer. After the fuzzy-matching iterations, the interim ship was exact-normalized-match-only: highlight only on an exact text-layer hit, and fail silently otherwise. That made the failure mode honest — no more wrong-place highlights — but it couldn't highlight a scanned page or a blank field at all, because there was no text-layer string to match. It bought time to build the real thing without shipping a dishonest one.

The model must not emit coordinates. The pattern I found across the OCR and document-layout tooling I looked at was consistent: let OCR own geometry and let the model own semantics. Coordinate generation from a raster is exactly where model-emitted boxes drift — it is the part to take away from the model, not tune. The pivot was to stop asking Claude for boxes and start deriving them from OCR positions the model never sees.

A many-image API limit bit late, on a rotated scan. Page renders are capped at 1950px on the longest side because Anthropic's many-image requests cap each image at 2000px — and every extraction sends nine section calls, each a many-image request. Letter at 150 DPI is fine; a legal-size or rotated-and-OCR'd scan would 400 the whole extraction with an image-dimensions error until the per-page scale was clamped.

The renderer is bbox-source-agnostic by construction. The frontend consumes bboxes: BoundingBox[] and is blind to where the boxes came from. The vision-to-OCR pivot touched zero renderer code, and the eventual production-grade upgrade — AWS Textract SELECTION_ELEMENT for checkbox and signature geometry — is the same clean backend swap.

Reflections

  • The saga was the lesson. Twelve-plus iterations of fuzzy text matching, then a strict-match retreat that failed silently, then model-emitted boxes that drifted, then OCR-anchored boxes that held. The answer turned out to be an architecture — OCR for geometry, model for semantics — not a better heuristic. The detour lives in the commit history and the status log; I left it there, because the wrong turns are the part of the story that's actually instructive.
  • Verified end-to-end, not measured. On a fresh lease I checked field by field: filled values land tight, a multi-line owner address renders as two stacked rectangles, and a blank monthly-rent field anchors on the labeled blank line in the rent section rather than the section header. There is no formal eval set yet, so I describe the highlight quality from direct verification — not a single accuracy percentage I couldn't defend under questioning.
  • The coverage gap is stated where the user sees it, not just in the docs. Checkboxes and signatures aren't highlighted; that geometry is deferred to a Textract follow-up. Rather than guess a box, the app navigates to the right page and shows no overlay — the same silent-miss-over-wrong-place rule that governs the whole feature. A reviewer is told the truth at the moment of the click.
  • Two surfaces, one backend. The SaaS frontend (Next.js on Vercel) and the six-tool MCP server both hit the same FastAPI + LangGraph backend (Railway). The MCP turns "show me what's flagged on this lease and fix the start date to January 1st" into a multi-tool exchange in Claude Desktop — list_exceptions, then get_lease for context, then resolve_exception with the correction — against live production data.
  • What's deferred is deferred honestly. No multi-tenant auth, no eval set, no closed-loop feedback yet. The exception queue already stores human corrections as first-class data; feeding those corrections back into the extraction prompt is the obvious next step and is explicitly not built. Calling that out is cheaper than discovering later that the demo implied it.

Closing observation

The most useful principle carried straight over from the rest of this portfolio's work:

Make the failure visible. A lease abstraction that highlights nothing when it isn't sure — and says so — is more trustworthy than one that confidently boxes the wrong line.

The highlight is the part a reviewer actually checks. Getting it honest mattered more than getting it everywhere — and that is the decision the whole system is organized around.