A daily editorial pipeline disguised as a web app
Forty-eight RSS feeds in. One curated, risk-scored, entity-graphed, sentiment-tracked, regulation-aware briefing out — every morning, across web, email, push, Slack, and the Play Store, with the same Claude call doing the editorial work a newsroom would do at four in the morning.
What this is
§01 Neural Oversight exists to answer one question, every morning, for AI governance professionals, policy makers, safety researchers, CTOs and venture capitalists:
What happened in AI today that I need to know about — and what does it mean?
Where a typical feed reader gives you a chronological list, Neural Oversight runs an editorial pipeline on your behalf. Once a day (and on demand) it pulls articles, de-duplicates them against everything it has already seen, asks Claude to act as a senior intelligence analyst, clusters related stories together, generates a structured newsletter, then broadcasts that across web, email, push and Slack, with a Trusted Web Activity wrapper shipping the same app to the Play Store.
- Audience
- AI governance professionals, policy makers, safety researchers, CTOs, and venture capitalists. People who need a 2-minute morning read, not a feed to scroll.
- Business model
- Free, no paywall. Hosted on Vercel Pro; Supabase + Resend on cloud tiers. Costs controlled by consolidating LLM work into a single selection call per ingest.
- Stack
- Next.js 14 (App Router) + TypeScript + Tailwind + Radix UI + Supabase (Postgres / Auth / Storage) + Anthropic Claude (Haiku 4.5) + Resend + Web Push (VAPID) + Bubblewrap TWA.
- Position
- Two products in one repo: the Next.js app (deployed to Vercel, hosted at app.neuraloversight.com + neuraloversight.com) and a Bubblewrap Android shell that wraps the PWA as a Trusted Web Activity for the Play Store.
The mental model that makes the rest make sense
§02 If you only remember one thing about Neural Oversight, remember this: it is a daily editorial pipeline disguised as a web app. Everything you see in the UI (the dashboard, the chat, the trends, the regulation tracker, the entity graph) is a view onto a single curated stream produced once a day by GET /api/ingest.
One endpoint runs the entire pipeline. Every other surface is downstream of that endpoint. Understand the cron, understand the product.
That single endpoint is gated by a timing-safe Bearer-token comparison against CRON_SECRET, declares maxDuration = 300 (Vercel Pro's hard cap), and is wired up to Vercel Cron at 08:00 UTC every day in vercel.json. It can also be triggered manually from a button on the dashboard via /api/ingest/trigger.
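The bearer check described above could be sketched as follows. This is a minimal illustration rather than the project's code: `isAuthorizedCron` is a hypothetical name, and hashing both sides first is just one common way to meet `crypto.timingSafeEqual`'s requirement that both buffers have the same length.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical helper: true only when the Authorization header carries
// exactly "Bearer <secret>".
function isAuthorizedCron(
  authHeader: string | null,
  secret: string | undefined,
): boolean {
  // Fail closed when either side is missing (for example CRON_SECRET unset).
  if (!authHeader || !secret) return false;

  // timingSafeEqual throws when the buffers differ in length, so both
  // values are reduced to fixed-size SHA-256 digests before comparing.
  const expected = createHash("sha256").update(`Bearer ${secret}`).digest();
  const received = createHash("sha256").update(authHeader).digest();
  return timingSafeEqual(expected, received);
}
```

A route handler would call this with the incoming `Authorization` header and `process.env.CRON_SECRET`, returning 401 on `false`.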
The ingestion pipeline, stage by stage
§03 The pipeline is twelve stages long, runs end-to-end in under five minutes, and is wrapped in try/catch at every fragile boundary so a flaky source, a bad AI response, or a 410-Gone push endpoint cannot block tomorrow's run.
- 01 Load active sources
Pulls every row from sources where is_active = true. Currently ~48 hand-picked AI feeds: Anthropic, OpenAI, DeepMind, MIT Tech Review, Wired, arXiv cs.AI, Stanford HAI, EU AI Act news, FLI, MIRI, Brookings, RAND, regulators, safety institutes, substacks.
- 02 Fetch RSS in batches of 10
rss-parser wrapped with an 8-second timeout, raced against a 10-second hard kill. HTML entities decoded; tags stripped into a clean snippet. Per-source errors are collected but never fail the job.
- 03 Recency filter (48h)
Articles older than 48 hours by RSS published_at are dropped. Missing dates pass through — let the AI decide.
- 04 URL-level deduplication
Queries articles in batches of 200 to find which URLs already exist. Only new URLs survive into the candidate pool.
- 05 Single Claude call: selection + categorisation
Up to 60 candidates sent in ONE prompt (title + first 200 chars + source + date). Claude (Haiku 4.5) acts as a senior AI intelligence analyst, returning up to 20 selected articles with exactly 5 marked as top stories, plus category, subcategory, 1–2 sentence summary, risk score (1–10), relevance score (1–10), sentiment score (-1.0…+1.0), up to 5 entities with salience, 1–3 topics, and a reasoning string.
- 06 Upsert
Each selected article is upserted into articles with onConflict: 'url', ignoreDuplicates: true. Idempotent — running the cron twice is safe.
- 07 Post-processing (best-effort)
Five passes run in parallel, each in its own try/catch: canonicalised entity extraction → article_entities link table; topic aggregation into topic_trends; sentiment-shift detection vs a 7-day baseline → sentiment_alerts; heuristic regulation detection (~20 keywords + jurisdiction + status) → regulations; trigram clustering of titles via find_similar_cluster RPC → story_clusters.
- 08 Daily briefing generation
Two parallel Claude calls. First — the legacy briefing (plain-text headline + 4–6 bullets + a 'Watch:' analysis). Second — full-text extraction of top-story URLs via Mozilla Readability (3000 chars max, 5 concurrent workers, 8s timeout), fed into generateStructuredBriefing to produce a 15-word headline, 200–300-word deep dive on the #1 story, top stories with why_it_matters, 5–8 quick hits, tools & launches, trend watch, and key themes.
- 09 Email broadcast
Paginates through all Supabase auth users; filters out anyone in email_preferences.unsubscribed. Per-recipient send (not BCC) via Resend, each with their own signed one-click unsubscribe URL. RFC 8058 List-Unsubscribe + List-Unsubscribe-Post headers so Gmail honours the one-click.
- 10 Webhook + Web Push
POSTs to every active Slack/webhook in notification_channels. Fans out to every row in push_subscriptions via web-push using VAPID. 404/410 endpoints auto-pruned in the same pass — no orphaned subscriptions.
- 11 Weekly roundup (Mondays only)
getPreviousWeekRange() returns null on non-Mondays. On Mondays: emerging/declining topics vs the previous week, entity movers, regulatory updates, AI-written executive summary and outlook. One row upserted into weekly_roundups, keyed by week_start.
- 12 Response
{ success, new_articles, fetched, reasoning, errors, timestamp }. Errors collected but non-fatal — the response is the audit log for that run.
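Stages 03 and 04 above are simple enough to sketch as pure functions. The names (`FeedItem`, `filterRecent`, `chunk`, `dropKnownUrls`) are illustrative assumptions, not identifiers from the codebase; in the real pipeline the set of known URLs would come from querying articles in batches of 200.

```typescript
interface FeedItem {
  url: string;
  title: string;
  publishedAt?: Date; // absent when the feed omits the date
}

const RECENCY_WINDOW_MS = 48 * 60 * 60 * 1000;

// Stage 03: drop items older than 48 hours; undated items pass through.
function filterRecent(items: FeedItem[], now: Date): FeedItem[] {
  const cutoff = now.getTime() - RECENCY_WINDOW_MS;
  return items.filter(
    (item) =>
      item.publishedAt === undefined || item.publishedAt.getTime() >= cutoff,
  );
}

// Stage 04 helper: split values into fixed-size batches (200 URLs per query).
function chunk<T>(values: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < values.length; i += size) {
    batches.push(values.slice(i, i + size));
  }
  return batches;
}

// Stage 04: keep only items whose URL is not already stored.
function dropKnownUrls(items: FeedItem[], known: Set<string>): FeedItem[] {
  return items.filter((item) => !known.has(item.url));
}
```

Keeping these steps free of I/O makes them easy to unit-test separately from the RSS fetch and the database lookup.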
Why this design, not the obvious one
§04 There are four big architectural decisions in the pipeline, all of them counter-intuitive at first glance.
One AI call for selection, not one per article
The obvious design is per-article: send each candidate to the LLM, get back metadata, store. That's hundreds of calls a day with no shared context. Instead Neural Oversight sends up to 60 candidates in a single prompt and asks Claude to act as a global editor — selecting the top 20, marking exactly 5 as top stories, deduplicating across sources, balancing categories. One call is dramatically cheaper AND produces better editorial judgement because the model can see the whole front page at once.
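A rough sketch of what the single-call approach implies in code: one function that packs the capped candidate list into a prompt, and one that checks the model's answer against the editorial constraints. The prompt wording, type names and function names here are assumptions for illustration; only the limits (60 candidates, 200-character snippets, up to 20 selected, exactly 5 top stories, scores 1 to 10) come from the description above.

```typescript
interface Candidate {
  title: string;
  snippet: string;
  source: string;
  publishedAt: string;
}

interface Selection {
  index: number; // position of the article in the candidate list
  isTopStory: boolean;
  riskScore: number;
}

const MAX_CANDIDATES = 60;
const SNIPPET_CHARS = 200;
const MAX_SELECTED = 20;
const TOP_STORIES = 5;

// Pack every candidate into ONE prompt so the model sees the whole front page.
function buildSelectionPrompt(candidates: Candidate[]): string {
  const lines = candidates
    .slice(0, MAX_CANDIDATES)
    .map(
      (c, i) =>
        `[${i}] ${c.title} | ${c.source} | ${c.publishedAt} | ${c.snippet.slice(0, SNIPPET_CHARS)}`,
    );
  return [
    "You are a senior AI intelligence analyst.", // placeholder wording
    `Select up to ${MAX_SELECTED} articles and mark exactly ${TOP_STORIES} as top stories.`,
    "Candidates:",
    ...lines,
  ].join("\n");
}

// Return a list of constraint violations; an empty list means the answer is usable.
function validateSelection(selected: Selection[], candidateCount: number): string[] {
  const problems: string[] = [];
  if (selected.length > MAX_SELECTED) {
    problems.push(`too many articles selected: ${selected.length}`);
  }
  const topCount = selected.filter((s) => s.isTopStory).length;
  if (topCount !== TOP_STORIES) {
    problems.push(`expected ${TOP_STORIES} top stories, got ${topCount}`);
  }
  for (const s of selected) {
    if (s.index < 0 || s.index >= candidateCount) {
      problems.push(`index out of range: ${s.index}`);
    }
    if (s.riskScore < 1 || s.riskScore > 10) {
      problems.push(`risk score out of range: ${s.riskScore}`);
    }
  }
  return problems;
}
```

Validating the response before the upsert is what lets a malformed AI answer be treated as a recoverable error rather than corrupt data.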
Best-effort post-processing, not transactional
Entity extraction, trend aggregation, sentiment alerts, regulation detection and clustering are each wrapped in their own try/catch. A flaky migration or a malformed entity row cannot break the whole ingest. The article gets in, and the metadata is enriched where enrichment succeeds; a failed pass is recorded as an error rather than aborting the run.
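The isolation pattern can be sketched in a few lines. `Pass` and `runBestEffort` are hypothetical names; the point is only that each pass has its own try/catch, the passes run concurrently, and failures come back as strings instead of exceptions.

```typescript
interface Pass {
  name: string;
  run: () => Promise<void>;
}

// Run every pass concurrently; a failing pass contributes an error message
// and never prevents the others from completing.
async function runBestEffort(passes: Pass[]): Promise<string[]> {
  const outcomes = await Promise.all(
    passes.map(async (pass) => {
      try {
        await pass.run();
        return null;
      } catch (err) {
        const reason = err instanceof Error ? err.message : String(err);
        return `${pass.name}: ${reason}`;
      }
    }),
  );
  return outcomes.filter((o): o is string => o !== null);
}
```

The returned strings can be appended to the `errors` array that the ingest response already carries.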
Idempotent by construction
onConflict: 'url' on articles. onConflict: 'date' on briefings. onConflict: 'topic,date' on topic_trends. unique(week_start) on weekly_roundups. The cron can fire twice in a row, or be manually re-triggered, with no duplicates, no double-emails, no broken state. This matters more than it sounds: because a re-run converges on the same rows, the pipeline is safe to retry after a failure.
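To make the idempotency argument concrete, here is an in-memory stand-in for a table with a unique constraint on `url`. It mimics the behaviour described for the Supabase upsert with `onConflict: 'url'` and `ignoreDuplicates: true`; the `Map`-based table and the function name are illustrative assumptions, not the project's code.

```typescript
interface ArticleRow {
  url: string;
  title: string;
}

// Insert rows keyed by url; on conflict, leave the existing row untouched.
// Returns how many rows were actually inserted.
function upsertIgnoringDuplicates(
  table: Map<string, ArticleRow>,
  rows: ArticleRow[],
): number {
  let inserted = 0;
  for (const row of rows) {
    if (table.has(row.url)) continue; // conflict on url: skip silently
    table.set(row.url, row);
    inserted += 1;
  }
  return inserted;
}

// Running the same batch twice leaves the table in the same state.
const articles = new Map<string, ArticleRow>();
const batch: ArticleRow[] = [
  { url: "https://example.com/a", title: "A" },
  { url: "https://example.com/b", title: "B" },
];
const firstRun = upsertIgnoringDuplicates(articles, batch);
const secondRun = upsertIgnoringDuplicates(articles, batch);
```

After both calls the table holds two rows: the first run inserts both, the second inserts none.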
Cluster summaries regenerated at thresholds, not every insert
When a new article joins an existing cluster, the cluster's weighted-average sentiment is recomputed cheaply in SQL. But the AI-written cluster summary is only regenerated when the cluster crosses size thresholds of 2, 5, 10 or 20. A cluster of 7 articles uses the summary written when it had 5. This caps cost and keeps summaries stable, so users don't see the wording flicker.
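The threshold rule reduces to a one-line predicate. This sketch assumes the check compares the cluster size before and after an insert, so that a jump across a threshold (say from 4 to 6) still triggers a regeneration; the function name is hypothetical.

```typescript
const SUMMARY_THRESHOLDS = [2, 5, 10, 20];

// True when growing from previousSize to newSize crosses at least one threshold.
function shouldRegenerateSummary(previousSize: number, newSize: number): boolean {
  return SUMMARY_THRESHOLDS.some((t) => previousSize < t && newSize >= t);
}
```

With these thresholds a cluster that grows from 1 to 25 articles one at a time triggers four summary calls rather than twenty-four.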
The data model
§05 Sixteen numbered migrations live under supabase/migrations/. The Supabase database is the single source of truth: Next.js carries no ORM models and no shadow schema. The tables below are the ones load-bearing enough to know about.
- articles
- id, title, url (unique), source, published_at, ingested_at, summary, raw_content, category, subcategory, risk_score, relevance_score, sentiment_score, sentiment_label, entities JSONB, topics TEXT[], cluster_id, is_top_story. Full-text GIN index on title+summary+content; partial indexes on is_top_story = true and risk_score >= 8; trigram via pg_trgm.
- sources
- type (rss/scrape), category, fetch_interval_hours, last_fetched_at, plus credibility_score, bias_label, factual_rating, credibility_notes (migration 011).
- briefings
- date PK, content (legacy flat text), key_themes TEXT[], article_count, structured_content JSONB containing { headline, deep_dive, top_stories, quick_hits, tools_and_launches, trend_watch, key_themes }.
- story_clusters
- Created by the find_similar_cluster RPC using pg_trgm trigram similarity against clusters created in the last 72 hours. Threshold 0.35. Carries weighted-average sentiment + an AI summary regenerated at size thresholds 2/5/10/20.
- entities, article_entities
- entities deduplicates companies/people/orgs/technologies/legislation. article_entities is the many-to-many with per-occurrence salience and sentiment. entities.article_count + latest_sentiment are recomputed each ingest.
- topic_trends
- topic + date PK. Article count, avg sentiment, sample article IDs. Powers the trend dashboards and the sentiment-shift detector.
- sentiment_alerts
- Generated when a topic's sentiment shifts by >= 0.3 vs the 7-day baseline. Surfaced in the Alerts inbox.
- regulations
- Heuristic extraction (~20 keywords). Jurisdiction + status (proposed / committee / passed / enacted) + related article links.
- weekly_roundups
- week_start unique. JSONB content payload with executive summary, top stories, emerging/declining topics, entity movers, regulatory updates, outlook.
- flags
- User saves with priority (low/normal/high/urgent) and status (open/reviewing/resolved/dismissed). Strict RLS: read public, write only your own.
- chat_conversations, chat_messages
- Stores conversation history with cited_article_ids per assistant turn. Demo user gated by per-session + global daily caps.
- push_subscriptions
- endpoint + p256dh + auth keyed to user_id. Daily cron fans out to all rows via web-push; 410/404 endpoints auto-pruned.
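The sentiment-shift rule that feeds the alerts table can be illustrated with a small function. How the 7-day baseline is computed is not spelled out above, so this sketch assumes a plain mean of the most recent seven daily averages and treats a shift in either direction as alert-worthy; the names are hypothetical.

```typescript
const SHIFT_THRESHOLD = 0.3;
const BASELINE_DAYS = 7;

interface ShiftResult {
  baseline: number;
  delta: number;
  alert: boolean;
}

// Compare today's average sentiment for a topic with the mean of up to
// seven previous daily averages (oldest first). Returns null when there
// is no history to compare against.
function detectSentimentShift(
  todayAvg: number,
  previousDailyAvgs: number[],
): ShiftResult | null {
  const window = previousDailyAvgs.slice(-BASELINE_DAYS);
  if (window.length === 0) return null;
  const baseline = window.reduce((sum, v) => sum + v, 0) / window.length;
  const delta = todayAvg - baseline;
  return { baseline, delta, alert: Math.abs(delta) >= SHIFT_THRESHOLD };
}
```

A topic that has hovered around 0.1 all week and lands at 0.5 today would raise an alert; one that lands at 0.2 would not.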
RLS — strict by default
Every table has RLS enabled. Reads are gated by authenticated; writes are gated by service_role (the pipeline) or auth.uid() = user_id (flags, queue, push subs). On top of that, migration 016 adds RESTRICTIVE deny policies for the demo user — defence in depth against any API path forgetting to check.
Two domains, one deployment
§06 One Next.js deployment serves both a marketing site and a gated app from different hosts. src/middleware.ts is the routing brain.
- neuraloversight.com
- Marketing. / is rewritten (not redirected) to /marketing — URL bar stays clean. /privacy, /terms, /about, /sources-directory rewrite to /marketing/*. Anything else redirects to app.<host>.
- app.neuraloversight.com
- The app. /marketing/* is blocked (redirects to /). Public paths: /login, /auth/*, /unsubscribed, /demo. Every other path runs Supabase SSR auth — no user, no entry, straight to /login.
- Security headers
- STRICT_HEADERS on /login, /auth, /marketing, /api/*: X-Frame-Options: DENY + CSP frame-ancestors 'none'. FRAMABLE_HEADERS elsewhere with frame-ancestors 'self' https://tomphillips.uk https://www.tomphillips.uk https://*.vercel.app — which is exactly what lets this portfolio embed Neural Oversight in an iframe.
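The host-based routing rules above can be modelled as a pure decision function, separate from any Next.js types. This is a sketch of the rules as described, not the contents of src/middleware.ts: the `Decision` type and `decide` function are hypothetical, it assumes the redirect to the app host preserves the path, and it ignores API routes and static assets.

```typescript
type Decision =
  | { kind: "rewrite"; to: string }
  | { kind: "redirect"; to: string }
  | { kind: "public" }
  | { kind: "require-auth" };

const MARKETING_HOST = "neuraloversight.com";
const APP_HOST = "app.neuraloversight.com";
const MARKETING_PAGES = ["/privacy", "/terms", "/about", "/sources-directory"];
const PUBLIC_APP_PATHS = ["/login", "/unsubscribed", "/demo"];

function decide(host: string, path: string): Decision {
  if (host === MARKETING_HOST) {
    // Rewrites keep the URL bar clean; the marketing pages live under /marketing.
    if (path === "/") return { kind: "rewrite", to: "/marketing" };
    if (MARKETING_PAGES.includes(path)) {
      return { kind: "rewrite", to: `/marketing${path}` };
    }
    // Anything else belongs to the app (path preservation is an assumption).
    return { kind: "redirect", to: `https://${APP_HOST}${path}` };
  }

  // App host: marketing pages are blocked.
  if (path === "/marketing" || path.startsWith("/marketing/")) {
    return { kind: "redirect", to: "/" };
  }
  if (PUBLIC_APP_PATHS.includes(path) || path.startsWith("/auth/")) {
    return { kind: "public" };
  }
  // Every other path needs a Supabase session; no session means /login.
  return { kind: "require-auth" };
}
```

A real middleware would map each `Decision` onto the corresponding `NextResponse` call and attach the strict or framable header set afterwards.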
The dashboard — fourteen queries in parallel
§07 The signed-in homepage is a server component that fires fourteen Supabase queries via Promise.all([...]) and assembles a rich editorial layout. Most queries use count: 'exact', head: true, so they return a row count with no row data, and because all fourteen are issued concurrently the page waits roughly one database round trip rather than fourteen.
- Time-of-day-aware personalised greeting, picked deterministically from day-of-epoch so the same greeting sticks for the whole day. First name resolved from user_metadata.first_name → full_name → name → email local-part.
- Colour temperature — the page subtly tints based on today's avgRiskScore: calm (emerald), moderate (blue), warm (amber), elevated (red). Border, glow, accent dot all shift together.
- Breaking news banner — polls /api/breaking for any article with risk_score ≥ 8 ingested in the last 6 hours.
- Ticker tape — articles today, delta vs yesterday, risk level, high-risk count, open flags, active sources, top source, all-time total. The 'Articles' cell carries a 7-day sparkline.
- High-risk alert card — appears whenever any article scores ≥ 7. Lists the top 3 with inline score badges.
- Editorial top-stories block — hero (story #1) + two secondary + a compact row of #4–5.
- Daily Briefing card — renders the rich structured_content if present (headline → deep dive teaser → quick hits → tools & launches → trend watch → key themes), or falls back to the legacy flat text format.
- Sentiment Pulse — live 7-day aggregate sentiment chart.
- Latest articles — 20 most recent non-top-story articles, each rendered as an ArticleCard, with staggered fade-in animations.
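Two of the smaller dashboard behaviours, the first-name fallback chain and the greeting that stays fixed for a whole day, are easy to show in isolation. The greeting strings, the UTC-based time-of-day split and the "there" fallback are placeholder assumptions; only the fallback order and the day-of-epoch selection come from the description above.

```typescript
interface UserLike {
  email?: string;
  user_metadata?: { first_name?: string; full_name?: string; name?: string };
}

// Fallback chain: first_name, then full_name, then name, then email local-part.
function resolveFirstName(user: UserLike): string {
  const meta = user.user_metadata ?? {};
  const emailLocal = user.email ? user.email.split("@")[0] : "";
  return meta.first_name || meta.full_name || meta.name || emailLocal || "there";
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Placeholder greeting pools, keyed by time of day.
const GREETINGS = {
  morning: ["Good morning", "Morning", "Rise and shine"],
  afternoon: ["Good afternoon", "Afternoon"],
  evening: ["Good evening", "Evening"],
};

// Pick the pool by hour, then index into it with the day-of-epoch so the
// choice is deterministic and stable for the whole (UTC) day.
function greetingFor(now: Date): string {
  const hour = now.getUTCHours();
  const pool =
    hour < 12 ? GREETINGS.morning : hour < 18 ? GREETINGS.afternoon : GREETINGS.evening;
  const dayOfEpoch = Math.floor(now.getTime() / MS_PER_DAY);
  return pool[dayOfEpoch % pool.length];
}
```

Because the index depends only on the date, two renders on the same morning always agree, with no stored state and no randomness.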
The API surface — 27 routes
§08 All API responses carry Cache-Control: no-store and the strict CSP and X-Frame-Options headers. A small handful do the interesting work; the rest are thin Supabase views.
| Method | Route | Purpose |
|---|---|---|
| GET | /api/ingest | Full pipeline (cron-authenticated, 5min cap) |
| POST | /api/ingest/trigger | Authenticated manual ingest from dashboard |
| GET | /api/articles | Filtered feed; /export for CSV/JSON |
| GET | /api/breaking | risk_score ≥ 8, last 6h — drives the banner |
| GET | /api/briefing | Today's briefing (legacy + structured) |
| GET | /api/clusters | Story clusters with summaries |
| GET | /api/trends | Topic timeseries |
| GET | /api/entities | Catalogue + per-entity drill-down |
| GET | /api/graph | Entity co-occurrence graph data |
| GET | /api/regulatory | Regulation tracker |
| GET | /api/intelligence | Frontier-lab competitive aggregates |
| GET | /api/sentiment-pulse | 7-day sentiment for dashboard widget |
| GET | /api/credibility | Per-source credibility metadata |
| GET, PATCH | /api/sources | List + toggle sources (admin) |
| POST, PATCH, DELETE | /api/flags | User flags (RLS-protected) |
| GET, POST, DELETE | /api/queue | Reading queue CRUD |
| GET, PATCH | /api/alerts | Sentiment-shift alerts (read/mark-read) |
| GET | /api/weekly | Weekly roundup retrieval |
| GET | /api/videos | Curated YouTube videos |
| POST | /api/chat | Streaming SSE Ask the Feed |
| GET, POST, DELETE | /api/notifications | Slack/webhook channels |
| POST | /api/push/subscribe | Persist Web Push subscription |
| GET | /api/unsubscribe | One-click email unsubscribe (RFC 8058) |
| GET | /api/proxy | Server-side fetch proxy for cross-origin assets |
| GET | /api/proxy/extract | Readability full-text for in-app viewer |
| GET | /api/health | Liveness probe |
| POST | /api/account/delete | GDPR-compliant deletion |
Ask the Feed — the chat that knows what was reported today
§09 The most interesting endpoint is /api/chat. It's a streaming SSE interface that turns today's articles into context and lets the user ask questions over them.
- 01 Verify the user via Supabase SSR cookies.
- 02 If demo user: enforce per-session cap (default 3 messages, tracked via demo-session-id cookie) AND global daily cap (default 500/day) using demo_chat_usage + demo_chat_daily_cap. Increment BEFORE the Anthropic call to prevent race conditions under concurrency.
- 03 Validate the body: message required, ≤ 2000 chars.
- 04 Pull today's 50 most recent articles. Build a context block per article: title, source, category, sentiment label, summary, entities, topics.
- 05 Resolve or create chat_conversations for the user; append the user message to chat_messages.
- 06 Stream Claude (Haiku 4.5) back to the browser as Server-Sent Events. Emit data: { conversation_id } first, then data: { text } chunks, then data: { done: true, full_text }. Persist the assistant turn after streaming completes.
Why SSE rather than WebSockets? One-direction streams, no handshake complexity, plays nicely with Vercel's serverless functions, and the Anthropic SDK already speaks it. WebSockets would have been overkill for what is functionally an unbounded HTTP response.
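The event sequence for one assistant turn can be sketched as plain string framing. The three payload shapes come from the steps above; `sseFrame` and `framesFor` are hypothetical names, and the sketch builds all frames up front for clarity whereas a real handler would write each frame to the response as the model produces it.

```typescript
type ChatEvent =
  | { conversation_id: string }
  | { text: string }
  | { done: true; full_text: string };

// One SSE message: a "data:" line followed by a blank line.
// JSON.stringify escapes newlines, so the payload always fits on one line.
function sseFrame(event: ChatEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Frames for a whole turn: conversation id, then text chunks, then done.
function framesFor(conversationId: string, chunks: string[]): string[] {
  const frames = [sseFrame({ conversation_id: conversationId })];
  let fullText = "";
  for (const chunk of chunks) {
    fullText += chunk;
    frames.push(sseFrame({ text: chunk }));
  }
  frames.push(sseFrame({ done: true, full_text: fullText }));
  return frames;
}
```

On the client, each frame is split on the blank line, the `data: ` prefix is stripped, and the remainder is parsed as JSON.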
Demo mode — the shared, read-only sandbox
§10 The demo is a clever piece of plumbing that lets this portfolio embed a fully working Neural Oversight in an iframe, without giving every visitor an account. It's also the model used for the Narrate demo on this same site.
- 01 A real Supabase Auth user exists with email demo@neuraloversight.com. Its UUID is stored in demo_config.demo_user_id and process.env.DEMO_USER_ID.
- 02 Visiting /demo (directly or via iframe) triggers a server-side magic-link generation: supabaseAdmin.auth.admin.generateLink({ type: 'magiclink', email }) returns a token_hash without sending an email.
- 03 supabase.auth.verifyOtp({ type: 'magiclink', token_hash }) exchanges the token for a session.
- 04 All auth cookies are written with SameSite=None; Secure; HttpOnly; Partitioned so they survive cross-origin iframe requests. A demo-session-id UUID cookie is set for per-visitor rate limiting.
- 05 Redirect to / with a valid session. Layout shows the DemoBanner. The user is in.
Write protection — three layers deep
- App layer — lib/demo.ts exports isDemoUser() and helper response builders. Every mutating endpoint checks and returns 403 DEMO_READONLY if the demo user attempts a write.
- Database layer — RLS RESTRICTIVE policies on flags and push_subscriptions use the is_demo_user(auth.uid()) function to block writes regardless of which API path called.
- Chat — rate-limited per session AND per day, with the counter incremented BEFORE the Anthropic call to defeat concurrent demo attempts.
The demo is never reset. Writes are blocked, so seeded state (a few flagged articles, a sample conversation) lives forever and every visitor sees the same curated view.
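The increment-before-call rule for demo chat can be sketched with in-memory counters. The real counters live in demo_chat_usage and demo_chat_daily_cap, where the check and the increment would need to be a single atomic database operation; the `DemoUsage` shape and the function name here are illustrative assumptions.

```typescript
interface DemoUsage {
  perSession: Map<string, number>; // messages used per demo-session-id
  today: number; // messages used across all demo sessions today
}

const SESSION_CAP = 3;
const DAILY_CAP = 500;

// Reserve one message BEFORE calling the model. Returns false when either
// cap is already exhausted, in which case no model call should be made.
function tryReserveMessage(
  usage: DemoUsage,
  sessionId: string,
  sessionCap: number = SESSION_CAP,
  dailyCap: number = DAILY_CAP,
): boolean {
  const used = usage.perSession.get(sessionId) ?? 0;
  if (used >= sessionCap || usage.today >= dailyCap) return false;
  usage.perSession.set(sessionId, used + 1);
  usage.today += 1;
  return true;
}
```

Reserving first means that several concurrent requests from one session cannot all pass the check and then each trigger a paid model call.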
The email pipeline
§11 The daily email goes out once per recipient (never BCC), each with their own signed one-click unsubscribe URL. Three files orchestrate it: email/send.ts (the loop), email/template.ts (HTML + plain text), and email/unsubscribe.ts (the signed URL mint).
- Per-recipient send — each user gets a personalised unsubscribe URL embedded in their email. BCC would have been simpler; this is correct.
- RFC 8058 one-click — List-Unsubscribe + List-Unsubscribe-Post: List-Unsubscribe=One-Click. Gmail honours this without a round-trip; deliverability stays high.
- /api/unsubscribe flips email_preferences.unsubscribed = true and redirects to /unsubscribed.
- Subject format: 'Neural Oversight - 20 May 2026'.
- Templates are aware of both formats — structured_content if present (rich newsletter), legacy flat text otherwise.
- Resend From address is configurable via RESEND_FROM_EMAIL; default is briefing@neuraloversight.com.
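A signed unsubscribe link and its matching headers might look like the sketch below. The signing scheme is not described above, so HMAC-SHA256 over the user id is an assumption, as are the `uid` and `sig` query parameter names and every function name; the two header names and the `List-Unsubscribe=One-Click` value are the ones RFC 8058 defines.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme: hex-encoded HMAC-SHA256 of the user id.
function sign(userId: string, secret: string): string {
  return createHmac("sha256", secret).update(userId).digest("hex");
}

// Mint a per-recipient URL for /api/unsubscribe.
function unsubscribeUrl(baseUrl: string, userId: string, secret: string): string {
  const url = new URL("/api/unsubscribe", baseUrl);
  url.searchParams.set("uid", userId);
  url.searchParams.set("sig", sign(userId, secret));
  return url.toString();
}

// Verify a signature from an incoming request in constant time.
function verifyUnsubscribe(userId: string, sig: string, secret: string): boolean {
  const expected = Buffer.from(sign(userId, secret), "hex");
  const received = Buffer.from(sig, "hex");
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Headers that advertise one-click unsubscribe to mail clients.
function listUnsubscribeHeaders(url: string): Record<string, string> {
  return {
    "List-Unsubscribe": `<${url}>`,
    "List-Unsubscribe-Post": "List-Unsubscribe=One-Click",
  };
}
```

Because each URL is bound to one user id, sending per recipient rather than by BCC is what makes the one-click link safe to honour without a login.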
Push notifications, and the Android TWA
§12 VAPID keys live in NEXT_PUBLIC_VAPID_PUBLIC_KEY and VAPID_PRIVATE_KEY, generated via scripts/generate-vapid-keys.mjs.
- PushNotifications component (in layout.tsx) registers public/sw.js and calls /api/push/subscribe to persist the { endpoint, p256dh, auth } triple.
- Daily ingest calls sendBriefingPushNotification(articleCount, topStories) which fans out to every row via Promise.allSettled so one bad endpoint cannot fail the broadcast.
- 404/410 endpoints are pruned in the same pass — no orphan subscriptions.
- Android TWA inherits notifications because Bubblewrap proxies them straight to Android's native notification system.
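The fan-out and pruning behaviour in the list above can be sketched with the sender injected as a function, which keeps the example runnable without the web-push package. It assumes a failed send rejects with an object carrying a numeric `statusCode`, which is how web-push reports HTTP errors; `fanOut` and the result shape are hypothetical.

```typescript
interface PushSubscriptionRow {
  endpoint: string;
  p256dh: string;
  auth: string;
}

type SendFn = (sub: PushSubscriptionRow, payload: string) => Promise<void>;

interface FanOutResult {
  sent: number;
  failed: number;
  stale: string[]; // endpoints that answered 404/410 and should be deleted
}

async function fanOut(
  subs: PushSubscriptionRow[],
  payload: string,
  send: SendFn,
): Promise<FanOutResult> {
  // allSettled never rejects, so one bad endpoint cannot fail the broadcast.
  const results = await Promise.allSettled(
    subs.map(async (sub) => send(sub, payload)),
  );
  const outcome: FanOutResult = { sent: 0, failed: 0, stale: [] };
  results.forEach((result, i) => {
    if (result.status === "fulfilled") {
      outcome.sent += 1;
      return;
    }
    const status = (result.reason as { statusCode?: number } | null)?.statusCode;
    if (status === 404 || status === 410) {
      outcome.stale.push(subs[i].endpoint);
    } else {
      outcome.failed += 1;
    }
  });
  return outcome;
}
```

The caller would then delete every endpoint in `stale` from push_subscriptions in the same pass.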
The Android wrapper
neural-oversight-android/ is a Bubblewrap-generated Trusted Web Activity. It is a minimal Android shell that boots into a Chrome Custom Tab pointed at app.neuraloversight.com. The whole UX is the PWA. There is no native code. The wrapper exists for one reason — to ship to the Play Store.
- twa-manifest.json defines package id (com.neuraloversight.app), host, theme colour (#37352F), icon URLs, signing key reference.
- app-release-bundle.aab is the Play Store upload artifact. app-release-signed.apk is sideload-ready.
- Digital Asset Links served at /.well-known/assetlinks.json by Next.js prove domain ownership to Chrome — that's what hides the URL bar in the TWA.
- scripts/setup-android.sh regenerates the Bubblewrap project when the web manifest changes.
Security posture
§13 The app handles AI governance news, not customer data, but governance professionals don't trust tools that don't look like they understand security. Every layer is hardened deliberately.
- Auth — Supabase magic-link, SSR cookies handled by @supabase/ssr. Middleware blocks anonymous access to every non-public path.
- RLS — strict policies on every table. Reads gated by authenticated; writes by service_role or auth.uid() = user_id. Demo user has RESTRICTIVE deny.
- CSP — frame-ancestors allowlists strictly limit which origins can iframe the app. Strict (no framing) on /login, /auth, /api, /marketing.
- Headers everywhere — X-Content-Type-Options: nosniff, Referrer-Policy: strict-origin-when-cross-origin, Permissions-Policy: camera=(), microphone=(), geolocation=().
- Cron auth uses a timing-safe (crypto.timingSafeEqual) Bearer-token comparison on /api/ingest, so response timing does not reveal how much of a guessed token matched.
- API cache — Cache-Control: no-store on every /api/* response.
- DOMPurify sanitises any AI-generated HTML before render.
- Idle timeout signs users out after extended inactivity.
- Unsubscribe state is checked before any send, not after: anyone flagged in email_preferences.unsubscribed is filtered out of the recipient list.
Performance and reliability notes
§14
- Dashboard fires ~14 queries in parallel; most are count-only (head: true) and return no row data. The page waits roughly one round trip.
- articles has partial indexes on the hot paths — is_top_story = true and risk_score ≥ 8.
- RSS fetches batch in groups of 10 with a 10-second hard timeout per source. One slow feed cannot block the whole job.
- Full-text extractor caps at 3000 chars and 5MB response size, with an 8-second abort.
- AI calls are deliberately consolidated — one selection call, one briefing call, one structured briefing call per ingest. Cluster summaries only regenerate at thresholds (2/5/10/20).
- Push notifications use Promise.allSettled so one bad endpoint cannot fail the broadcast. 410/404 pruned in the same pass.
- Every ingestion stage is wrapped in try/catch with errors collected and returned in the response — a flaky AI response does not block tomorrow's run.
The signed-in surfaces
§15 Beyond the dashboard, the sidebar gives access to fourteen views onto the same curated stream. They split into three groups.
- Dashboard — editorial home
- Feed — full feed + category filters
- Flagged — your flagged articles
- Digest — full daily briefing
- Weekly — weekly roundup
- Sources — manage RSS + credibility scores
- Watch — saved watchlist / topic monitor
- Trends — topic dashboards from topic_trends
- Entities — catalogue + sentiment
- Regulatory — by jurisdiction + status
- Connections — entity co-occurrence graph
- Competitive — frontier-lab aggregates
- Alerts — sentiment-shift inbox
- Ask the Feed — Claude chat over today's articles
- Reading Queue — save for later
Deployment, cost, and the cron
§16
- Hosting
- Vercel Pro. Two domains point at one project: neuraloversight.com (marketing) and app.neuraloversight.com (the app). Middleware routes them to the right pages — one deployment, two products.
- Cron
- vercel.json registers one job: 0 8 * * * → GET /api/ingest. Vercel automatically attaches Authorization: Bearer $CRON_SECRET. The timing-safe comparison inside the endpoint accepts only that exact bearer.
- Function limits
- maxDuration = 300 on /api/ingest. maxDuration = 60 on /api/chat. Both rely on Vercel Pro's extended execution limits.
- Database
- Hosted Supabase (free or pro tier). All persistent state lives here. Migrations applied in order via Supabase SQL editor.
- Email
- Resend for all transactional + briefing email. Per-recipient send, RFC 8058 one-click, signed unsubscribe URLs.
- Cost lever
- The single AI selection call per ingest is the cost lever. Everything else is downstream of that call.
Where to make changes
§17 A working map. If the next change is X, the file to open is Y.