
Sift

A news aggregator with civic footnotes. Reads ~50 outlets across the political spectrum, AI-summarizes today's stories across 10 categories — and on top, every politician, organization, bill, and political term in an article links to a structured dossier sourced from public records.

10 categories · ~50 outlets · AI summaries
Topic search + multi-source compare
Dossiers · primer · inline glossary
FastAPI + LangGraph + Anthropic
Translation artifact
2024–2026 · Solo build
01 Problem
Situation: News aggregators optimize for volume — more sources, more headlines, more frequent updates. I started Sift believing the gap was upstream of the reader: that AI-curated summaries across categories, AI-powered topic search, and multi-source comparison could deliver what wire feeds couldn't.
Complication: Sift shipped the aggregator and the AI features worked, but once it was in real use, the AI-summary layer alone wasn't the differentiator I thought it would be. Most readers can read a wire description and an AI summary and not really tell the difference. The actual bottleneck surfaced underneath: most readers don't know who the players are. They can read five outlets on the same Senate vote and still not know who the senator is, what the bill does, or who funds the relevant lobbying body.
Question: What can be added on top of a working news aggregator that turns it into a daily-driver people actually learn from?
02 Requirements
  • Engaged reader

    Civic scaffolding inside the reading flow — who the senator is, what the bill does, what the lobbying group wants — without leaving the article.

    The bottleneck turned out to be context, not volume

  • Daily-driver user

    Sift has to work as a real news app first — 10 categories, summaries, topic search, comparison, bookmarks. The civic layer adds value; it doesn't replace the daily browsing experience.

  • Methodology defender

    Every civic claim sourced from public records (OpenSecrets, GovTrack, ProPublica, FARA, FEC, Vote Smart) with citations one click away. AllSides + MBFC surfaced verbatim; no computed bias judgments.

  • Latency-sensitive UX

    Browse experience stays ~50ms. Heavier AI work (compare, topic search) lives on its own path and streams, accepting ~10–15s because the user is asking for analysis, not browsing.

    Two-path AI architecture: pre-computed for browse, live for compare/search

03 Decision

AI-powered aggregator with a civic-literacy layer on top — both shipped, AI split by SLA

Chosen option
  • meets criterion: Daily-driver fit
  • meets criterion: Civic context legibility
  • meets criterion: Live latency on browse
  • partially meets criterion: Build effort

The aggregator is the daily-driver experience that builds the habit; the civic layer is what makes the daily reading worth doing. Stacked together, every article comes with the civic context the news assumes the reader already has — politicians, organizations, bills, outlets, terms, comparisons — without losing the categorized-feed UX. AI splits by SLA: browse is pre-computed and served from Postgres in ~50ms; compare and topic search run live and accept ~10–15s. The build is heavier than either layer alone, but the surface is unfakeable: anyone with an API key can build AI summaries; the dossier graph, the public-records sourcing, and the methodology are the part that has to be earned.

AI-curated aggregator only — better summaries, topic search, compare; no civic layer

  • partially meets criterion: Daily-driver fit
  • does not meet criterion: Civic context legibility
  • meets criterion: Live latency on browse
  • meets criterion: Build effort

Civic-literacy reader only — dossiers and glossary without the daily news flow

  • does not meet criterion: Daily-driver fit
  • meets criterion: Civic context legibility
  • meets criterion: Live latency on browse
  • partially meets criterion: Build effort
04 Solution

Two layers — a working news aggregator (foundation) plus a civic-literacy layer (differentiator) — over a Next.js + FastAPI + LangGraph stack with AI split between a background pipeline and live endpoints.

The reader surface
News across 10 categories from ~50 vetted outlets. AI-generated summaries on every article (pipeline-side, not click-side). Topic search via Voyage AI vector similarity with SSE streaming and Claude web-search fallback. Multi-source comparison via a LangGraph fan-out workflow that pulls coverage across outlets, extracts claims, and shows the framing side-by-side. Bookmarks (Clerk-synced), dark/light themes, auth.
The civic-literacy layer
"What you should know first" is an adaptive primer above each story with the key terms and context the article assumes you already have. Inline glossary on every civic term, with chip tooltips and click-through to the full dossier. Civic dossiers for politicians (committees, top industries by PAC contributions, interest-group ratings), organizations (political lean, finances, funders, FARA registration), bills (status, sponsor, cosponsors, lobbying spend), and news outlets (ownership, AllSides + MBFC ratings), all sourced from public records. Cross-spectrum framing shows how Left / Center / Right outlets covered the same story.
AI split by SLA
The browse path is pre-computed in a background pipeline (FastAPI + LangGraph + Anthropic on Railway, 10-minute cadence) and served from Neon Postgres in ~50ms. The live AI path (compare and topic search) runs AI on request and accepts ~10–15s because the user is asking for analysis. The pipeline runs ten services: primer generation, entity extraction, entity linking, summarization, story synthesis, story clustering, civic context generation, a batched API client, the comparison workflow, and usage tracking.
Public-records sourcing
Every civic claim cites its source — OpenSecrets, GovTrack, ProPublica Nonprofit Explorer, FARA, FEC, Vote Smart. Outlet political-lean and factual-reporting come from AllSides + MBFC, shown verbatim. Sift never computes its own ratings; the methodology is public at /methodology.
05 Outcome
  • Reader surface

    10 categories · ~50 outlets

    AI summaries · topic search · multi-source compare

  • Civic-literacy layer

    Primer + glossary + 4 dossier types

    OpenSecrets · GovTrack · ProPublica · FARA · FEC

  • Browse latency

    ~50ms

    ~15s → ~50ms

    AI moved to background; compare/search on a separate live path

  • Pipeline

    10 LangGraph services

    Primer · entity extraction · linking · synthesis · compare

Overview

A news aggregator with civic footnotes. Sift reads from ~50 outlets across the political spectrum, AI-summarizes today's stories across 10 categories, and lets you search any topic or compare coverage across sources — and on top of that, every politician, organization, bill, and political term in an article links to a structured dossier sourced from public records (OpenSecrets, GovTrack, ProPublica, FARA, FEC). The aggregator is the foundation. The civic-literacy layer is what makes it different.

Role: Strategy, design, and engineering (frontend + backend)
Year: 2024–2026
Domain: News + civic-literacy media
Stack: Next.js · FastAPI · LangGraph · Anthropic · Neon Postgres + pgvector
Status: Shipped

The hypothesis I started with — and what got added

I started building an AI-curated alternative to wire-feed aggregators: a hundred-plus sources, AI summaries across 10 categories, semantic topic search, side-by-side multi-source comparison of how outlets covered the same event. The aggregator shipped and worked. But once it was in real use, the AI-summary layer alone wasn't the differentiator I thought it would be — most readers can read a wire description and an AI summary and not really tell them apart.

The bottleneck wasn't the summary. It was that most readers don't already know who the players are. They can read five outlets on the same Senate vote and still have no idea who the senator is, what the bill does, what the relevant lobbying body wants, or how the framing has shifted from the last time the question came up.

So I kept the aggregator and added a civic-literacy layer on top. Same engineering substrate, expanded unit of value.

What's there now

The reader surface — foundation

  • 10 news categories — Top, Technology, Business, Science, Energy, World, Health, Politics, Sports, Entertainment.
  • AI summaries on every article, generated in the background pipeline.
  • Topic search — vector similarity over a pre-built article index (Voyage AI + pgvector), SSE streaming, Claude web-search fallback for niche queries.
  • Multi-source comparison — LangGraph fan-out across outlets, claim extraction, side-by-side framing.
  • Bookmarks with Clerk server sync; dark/light themes ("Late Edition" / "Newsprint"); auth.
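
The topic-search bullet above can be sketched as a similarity ranking with a fallback signal. This is a minimal, stdlib-only sketch rather than Sift's code: in the shipped system the embeddings come from Voyage AI and the nearest-neighbor query runs in pgvector. The toy vectors, the `rank_articles` helper, and the `min_score` threshold are all assumptions made for illustration.

```python
import math
from typing import NamedTuple


class Hit(NamedTuple):
    article_id: str
    score: float


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_articles(query_vec, index, top_k=3, min_score=0.5):
    """Return (hits, needs_fallback).

    index maps article_id -> embedding. needs_fallback is True when no
    article clears min_score, which is where a web-search fallback
    would take over in this sketch.
    """
    hits = sorted(
        (Hit(aid, cosine(query_vec, vec)) for aid, vec in index.items()),
        key=lambda h: h.score,
        reverse=True,
    )
    strong = [h for h in hits[:top_k] if h.score >= min_score]
    return strong, not strong


# Toy 3-dimensional embeddings (hypothetical values).
INDEX = {
    "senate-vote": [0.9, 0.1, 0.0],
    "fed-rates": [0.1, 0.9, 0.0],
    "world-cup": [0.0, 0.1, 0.9],
}

hits, needs_fallback = rank_articles([1.0, 0.0, 0.0], INDEX)
```

Returning the fallback flag alongside the hits keeps the decision to call web search in the caller, so the ranking function stays a pure computation that is easy to test.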

The civic-literacy layer — differentiator

  • "What you should know first" — adaptive primer above each story. Key terms and context the article assumes you already have, AI-generated at ingest, expandable when the story sits on top of complex policy (the Inflation Reduction Act, debt-ceiling mechanics, FTC consent decrees).
  • Inline glossary — civic terms surface contextually inside the article itself. Chip tooltips with previews; click-through to the full dossier.
  • Civic dossiers — every politician, organization, bill, and news outlet in a story links to a structured page: committee assignments, top industries by PAC contributions, interest-group ratings, ownership, funding, FARA registration, AllSides political-lean rating, MBFC factual-reporting tier. All sourced from public records. Citations one click away.
  • Cross-spectrum framing — when multiple outlets cover the same story, what each Left / Center / Right outlet chose to emphasize. AllSides + MBFC shown verbatim; Sift never computes its own ratings.
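
The inline glossary described above amounts to finding known civic terms in article text and attaching a dossier link to each match. A minimal sketch follows, assuming a hypothetical `GLOSSARY` mapping and `/dossier/...` paths, neither of which comes from Sift's codebase. The shipped pipeline uses AI entity extraction and linking; plain string matching is swapped in here only to show the shape of the output.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TermMatch:
    term: str      # text as it appears in the article
    start: int     # character offset of the match
    dossier: str   # hypothetical dossier path


# Hypothetical glossary: surface form -> dossier path.
GLOSSARY = {
    "Inflation Reduction Act": "/dossier/bill/inflation-reduction-act",
    "FTC": "/dossier/org/ftc",
    "debt ceiling": "/dossier/term/debt-ceiling",
}


def find_terms(text: str, glossary: dict[str, str]) -> list[TermMatch]:
    """Find glossary terms in text, in order of appearance, without overlaps."""
    # Longest-first alternation so multi-word terms win over shorter ones.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    lookup = {t.lower(): path for t, path in glossary.items()}
    return [
        TermMatch(m.group(0), m.start(), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


matches = find_terms(
    "The FTC cited the Inflation Reduction Act during the debt ceiling talks.",
    GLOSSARY,
)
```

Each `TermMatch` carries the offset, which is what a frontend would need to wrap the term in a chip tooltip without re-scanning the text.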

Pipeline — AI split by SLA

Early on, the AI work happened at click time, and some endpoints took 15+ seconds per load. I split it into two paths by SLA:

  • Browse path: pre-computed in a background pipeline on a 10-minute cadence; frontend reads enriched content from Postgres in ~50ms. The whole category-browsing experience is a database read.
  • Live AI path: multi-source compare and topic search still run AI live (fan-out across outlets, claim extraction, web search). They accept ~10–15s and stream as they go, because the user is asking for analysis, not browsing.
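
The live path's fan-out-and-stream behavior can be sketched with plain asyncio. This is a stand-in for the LangGraph workflow under stated assumptions, not a copy of it: `fetch_claims` fakes the per-outlet AI call with a sleep, and the outlet names and delays are made up.

```python
import asyncio
from typing import AsyncIterator


async def fetch_claims(outlet: str, delay: float) -> dict:
    """Stand-in for one per-outlet branch (fetch coverage, extract claims)."""
    await asyncio.sleep(delay)  # simulates model and network latency
    return {"outlet": outlet, "claims": [f"{outlet}: example claim"]}


async def compare(outlets: dict[str, float]) -> AsyncIterator[dict]:
    """Fan out one task per outlet and yield each result as it finishes."""
    tasks = [
        asyncio.create_task(fetch_claims(name, delay))
        for name, delay in outlets.items()
    ]
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def collect(outlets: dict[str, float]) -> list[str]:
    """Consume the stream; a real endpoint would emit each item over SSE."""
    return [result["outlet"] async for result in compare(outlets)]


# Hypothetical outlets with simulated latencies in seconds.
ORDER = asyncio.run(
    collect({"outlet-a": 0.30, "outlet-b": 0.10, "outlet-c": 0.20})
)
```

Because results are yielded in completion order rather than submission order, the fastest outlet reaches the reader first, which is what makes a 10 to 15 second total wait tolerable.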

The pipeline (FastAPI + LangGraph + Anthropic on Railway) runs ten services: primer generation, entity extraction, entity linking to dossiers, summarization, story synthesis, story clustering, civic context generation, batched API client, comparison workflow, usage tracking.
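
The browse-path half of that split comes down to doing the expensive work in a scheduled pass and leaving the request handler with a single read. A minimal sketch, using in-memory SQLite as a stand-in for Neon Postgres; the table layout, the `summarize` stub, and the function names are hypothetical.

```python
import sqlite3


def summarize(body: str) -> str:
    """Stub for the model call; a real pipeline would call an LLM here."""
    return body.split(".")[0].strip() + "."


def run_pipeline_pass(db: sqlite3.Connection, raw_articles: list[dict]) -> int:
    """Background job: enrich new articles and store the results."""
    rows = [
        (a["id"], a["category"], a["title"], summarize(a["body"]))
        for a in raw_articles
    ]
    with db:  # commits on success, rolls back on error
        db.executemany(
            "INSERT OR REPLACE INTO enriched (id, category, title, summary) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


def browse(db: sqlite3.Connection, category: str) -> list[tuple[str, str]]:
    """Request path: a plain read, with no model call in the loop."""
    cur = db.execute(
        "SELECT title, summary FROM enriched WHERE category = ? ORDER BY id",
        (category,),
    )
    return cur.fetchall()


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE enriched ("
    "id TEXT PRIMARY KEY, category TEXT, title TEXT, summary TEXT)"
)
run_pipeline_pass(db, [
    {"id": "a1", "category": "Politics", "title": "Senate vote",
     "body": "The Senate passed the bill. Details follow."},
    {"id": "a2", "category": "Sports", "title": "Final score",
     "body": "The home team won. Fans celebrated."},
])
politics = browse(db, "Politics")
```

Keying the write on the article ID means a pass that re-processes the same article overwrites its row instead of duplicating it, so the scheduled job can be rerun safely.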

Implementation considerations

Source curation became a quieter problem than I expected. The corpus is curated mainstream — Reuters, AP, BBC, NYT, WSJ, Bloomberg, Axios, Politico, plus trade press by sector. That's fine for civic decoding (dossiers, glossary, primers work the same regardless of source), but it neutralized the more aggressive features I had built (trust scoring, propaganda tagging, extremism flags). Mainstream outlets do not give those features anything to do. I left them out of the shipped surface rather than apply judgment claims I could not defend at the corpus level.

The dossier graph proved more interesting than the cluster graph. Cluster-based comparison was the original organizing unit — group articles about the same event, show the side-by-sides. What proved more useful in usage was the entity graph: every politician, organization, and bill becomes a node, every story becomes a path, and a reader following any story gets pulled into the surrounding civic structure. The cluster surface still ships. The dossier surface is what gets daily use.
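
The entity-graph idea can be made concrete with a small bipartite structure: stories on one side, entities on the other. A minimal sketch with made-up story and entity IDs; `EntityGraph` and its methods are hypothetical names, not Sift's schema.

```python
from collections import defaultdict


class EntityGraph:
    """Bipartite graph linking stories to the entities they mention."""

    def __init__(self) -> None:
        self.entities_by_story: dict[str, set[str]] = defaultdict(set)
        self.stories_by_entity: dict[str, set[str]] = defaultdict(set)

    def add_story(self, story_id: str, entity_ids: list[str]) -> None:
        for entity_id in entity_ids:
            self.entities_by_story[story_id].add(entity_id)
            self.stories_by_entity[entity_id].add(story_id)

    def related_stories(self, story_id: str) -> dict[str, set[str]]:
        """Map each other story to the entities it shares with story_id."""
        shared: dict[str, set[str]] = defaultdict(set)
        for entity_id in self.entities_by_story.get(story_id, set()):
            for other in self.stories_by_entity[entity_id]:
                if other != story_id:
                    shared[other].add(entity_id)
        return dict(shared)


graph = EntityGraph()
graph.add_story("story-1", ["senator-x", "bill-y"])
graph.add_story("story-2", ["bill-y", "lobby-z"])
graph.add_story("story-3", ["lobby-z"])

related = graph.related_stories("story-1")
```

Here a reader on story-1 is led to story-2 through the shared bill, and from there to story-3 through the lobbying group, which is the "every story becomes a path" behavior in miniature.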

Reflections

  • Per-paragraph primer triggers. Primers currently attach to whole articles by topic match. A reader-pacing model that surfaces primers in the reading flow, not above the fold, is the next move; instrumentation is in flight.
  • Outlet observation, not outlet judgment. Cross-spectrum compare shows side-by-side framing. The version worth shipping next describes what each outlet chose to emphasize, without claiming any of them are wrong. Observation is honest; labeling is brittle.
  • A trust layer for the corpus it was built for. The flags I built (trust scoring, propaganda tagging, extremism flags) produce nothing meaningful when applied to mainstream sources. The genuinely useful version of that layer would apply to propaganda outlets: a different product on a different corpus with a different audience.

Closing observation

The combination matters more than either piece alone. An aggregator without civic context is a commodity — every news app shows the same Reuters and AP stories in roughly the same order. A civic-context tool without the daily news flow is a research database — useful when you need it, not the daily-driver experience that builds a habit. An aggregator with civic footnotes — the news app that adds the civic context the news assumes you already have — is what changes the reading experience.

I started building the aggregator. The civic layer is what made it Sift.