Voice Legacy · Build notes

A voice for after

Record your voice while you still can. Speak in it when you can't.

~8 min read · Next.js 16 · Supabase · ElevenLabs · Capacitor · Live in private beta

What this is

§01

Voice Legacy is a web app (also wrappable as iOS and Android apps via Capacitor) for people who have been told they will lose their voice, and for the families around them. It is, deliberately, one straight-line idea:

Record your voice while you still have it. From then on, type — or tap a saved phrase — and hear it played back in your own voice.

The voice clone is built by ElevenLabs. The app is the considered wrapper around that capability: a recording flow that keeps quality high, a typing surface that respects motor and cognitive limits, a dashboard that gives you control over what you've banked, and an architecture that keeps your audio out of any database under our control.

The name is meant literally. The thing you leave for the people who knew your voice is, in part, your voice.

Who it's for

§02

The product is sharpest for progressive voice loss.

ALS / MND: The canonical case. Bulbar onset can take a voice in months; ALS forums recommend voice banking as one of the first things to do at diagnosis.
Throat / oesophageal cancer: Laryngectomy patients can plan ahead and bank their voice before surgery.
Primary Progressive Aphasia (PPA): Language fades but cognition remains. Typing surfaces work; conversational AAC apps don't.
Parkinson's: Speech volume and clarity decline; banking early provides a prosthesis for the later stages.

It is less sharp for stroke aphasia (language production is impaired, so typing-as-a-bridge doesn't work) or congenital speech-language impairments (the user never had the voice we're trying to preserve). That distinction matters: the name is misleadingly broad, and the landing copy makes the progressive-loss framing explicit for that reason.

The two phases

§03

The product has exactly two screens that matter, and they map to two phases of the user's life.

Phase 1: Bank

While the user still has their voice, they record samples. A scripted reader with a live waveform and live volume monitoring; soft warning at 100s, hard auto-stop at 120s; explicit consent gate before submission.

Audio sent to ElevenLabs' voice cloning endpoint
ElevenLabs returns a voice_id
We save the voice_id — not the audio
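
The recording limits in the Bank phase could be modelled as a small pure function that the recorder checks on each tick. This is a minimal sketch, not the app's actual code: the names recordingStatus, SOFT_WARN_SEC, and HARD_STOP_SEC are hypothetical, and only the 100 s and 120 s thresholds come from the text.

```typescript
// Thresholds come from the text; the constant and function names are hypothetical.
const SOFT_WARN_SEC = 100;
const HARD_STOP_SEC = 120;

type RecordingStatus = "recording" | "warn" | "stop";

// Maps elapsed recording time (in seconds) to what the UI should do next.
function recordingStatus(elapsedSec: number): RecordingStatus {
  if (elapsedSec >= HARD_STOP_SEC) return "stop"; // caller stops the recorder
  if (elapsedSec >= SOFT_WARN_SEC) return "warn"; // caller shows the soft warning
  return "recording";
}
```

In a browser, the caller would typically call stop() on the MediaRecorder instance once the status becomes "stop". Keeping the decision in a pure function makes the thresholds easy to unit-test without a microphone.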

Phase 2: Speak

When typing replaces speaking, the user types a phrase or taps a saved one. We call ElevenLabs TTS with the user's voice_id and play the MP3 back through a waveform visualisation.

Typing surface + quick-tap phrases
12 sensible defaults seeded per account
Utterance history; one-tap to re-speak
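
The upstream call behind the Speak phase can be sketched as a request builder. This is an illustration under assumptions, not the app's route handler: buildTtsRequest and TtsRequest are hypothetical names, and the endpoint path, the xi-api-key header, and the eleven_multilingual_v2 model id reflect the public ElevenLabs text-to-speech API as commonly documented, so they should be checked against the current API reference.

```typescript
// Plain description of the HTTP request a server-side route might send upstream.
interface TtsRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds the request for one utterance in one voice.
// The API key must only ever be supplied on the server.
function buildTtsRequest(elevenVoiceId: string, text: string, apiKey: string): TtsRequest {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(elevenVoiceId)}`,
    method: "POST",
    headers: {
      "xi-api-key": apiKey,
      "Content-Type": "application/json",
      Accept: "audio/mpeg", // ask for MP3 back
    },
    body: JSON.stringify({ text, model_id: "eleven_multilingual_v2" }),
  };
}
```

A route handler would pass these fields to fetch and relay the MP3 bytes to the browser, which plays them back.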

Design decisions

§04

A handful of choices were made on purpose, and they're worth enumerating because they shaped the codebase more than any framework did.

Aesthetic: Archive

A warm-paper, deep-ink, single-oxblood-accent visual identity with Fraunces as the display serif and IBM Plex Sans as the body. Editorial layouts, hairline rules, numbered headings — no glassmorphism, no gradients, no centred-on-radial-bg SaaS landing. The product carries weight; the chrome shouldn't be a generic-startup template. It also distances the app from sci-fi voice AI connotations, which would be a worse frame for a family at diagnosis than a literary one.

Voice provider: ElevenLabs

The competitive set includes Resemble.ai, Play.ht, Microsoft Custom Neural Voice, and Acapela's my-own-voice. ElevenLabs wins on three axes: quality (accents and laughter come through; competitor TTS still sounds trained), a stable public API, and instant voice cloning from a few minutes of audio. It is the central dependency, so it is kept abstracted behind two API routes; those routes are the migration surface if pricing or service changes.

Backend: Supabase

Auth, Postgres, RLS, magic-link emails, and OAuth in one. RLS is the right primitive here — 'the user can only see their own voices.' Magic-link is the right auth primitive for the user base — 'an older person who shouldn't need to remember a password.' The two come bundled.

Auth: magic link + Google

Passwords were never considered. The target user is often older, often under stress from a diagnosis, often using an unfamiliar device. A password they'll forget is a barrier; an email link is closer to how a normal product works for them. Google was added because it's frictionless for users already on Gmail. Apple is omitted for now because the $99/yr developer fee isn't yet justified.

Why audio is never stored at rest

Voice samples go directly from the browser to /api/clone, are forwarded to ElevenLabs, and the response (voice_id) is the only thing kept. Raw audio is never written to Supabase storage, never written to disk in a serverless function, never logged. Voice samples are identity-grade biometric data. Holding them creates a target for breach and legal exposure under medical-context GDPR interpretations. Not holding them is simpler.
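
The flow described here might look roughly like the sketch below, with the upstream call and the database write injected as dependencies. All names (handleClone, CloneDeps, cloneUpstream, saveVoice) are hypothetical; the point is only that the persistence function has no audio parameter, so samples cannot reach the database through it.

```typescript
// Hypothetical dependencies; a real route would wrap ElevenLabs and Supabase.
interface CloneDeps {
  // Forwards the samples upstream and resolves to ElevenLabs' voice_id.
  cloneUpstream(samples: Uint8Array[], name: string): Promise<string>;
  // Persists metadata only; there is deliberately no audio field.
  saveVoice(row: {
    user_id: string;
    eleven_voice_id: string;
    name: string;
    sample_count: number;
  }): Promise<void>;
}

async function handleClone(
  deps: CloneDeps,
  userId: string,
  name: string,
  samples: Uint8Array[],
  consentGiven: boolean,
): Promise<{ eleven_voice_id: string }> {
  if (!consentGiven) throw new Error("consent required");
  if (samples.length === 0) throw new Error("no samples");
  // Samples live in memory for the duration of this call only.
  const elevenVoiceId = await deps.cloneUpstream(samples, name);
  await deps.saveVoice({
    user_id: userId,
    eleven_voice_id: elevenVoiceId,
    name,
    sample_count: samples.length,
  });
  return { eleven_voice_id: elevenVoiceId };
}
```

Injecting the two dependencies also keeps the handler testable without network access.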

Why RLS-first, not application-code-first

Every table has Row-Level Security enabled at creation, scoped to auth.uid(). If app code ever has an authorization bug (a forgotten .eq('user_id', user.id)), the worst case is still no data leak: Postgres withholds the row anyway. It is defence-in-depth at almost no cost.
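
The reasoning can be illustrated with a toy model. The class below is not Postgres and not Supabase; it is a hypothetical in-memory stand-in (RlsTable, VoiceRow) that applies a policy of the form user_id = auth.uid() before any filter the application supplies, which is the ordering that makes a forgotten application filter harmless.

```typescript
interface VoiceRow {
  id: string;
  user_id: string;
  name: string;
}

// Toy stand-in for a table with an RLS policy of the form user_id = auth.uid().
class RlsTable {
  private readonly rows: VoiceRow[];

  constructor(rows: VoiceRow[]) {
    this.rows = rows;
  }

  // authUid models the authenticated user; appFilter models the query's own
  // WHERE clause, which defaults to "no filter" to mimic the forgotten .eq().
  select(authUid: string, appFilter: (r: VoiceRow) => boolean = () => true): VoiceRow[] {
    return this.rows
      .filter((r) => r.user_id === authUid) // policy: always applied
      .filter(appFilter); // application filter: may be missing
  }
}

const table = new RlsTable([
  { id: "v1", user_id: "alice", name: "Alice's voice" },
  { id: "v2", user_id: "bob", name: "Bob's voice" },
]);

// The buggy query forgets its filter, yet only Alice's row comes back.
const visibleToAlice = table.select("alice").map((r) => r.id);
```

One caveat for the real system: Supabase's service-role key bypasses RLS, so the guarantee only holds for queries made with the user's own session.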

The stack

§05

Frontend

Next.js 16 (App Router, Turbopack)
React 19
TypeScript 5 strict
Tailwind CSS v4
Mona Sans + IBM Plex Mono

Backend

Supabase (Postgres + Auth + Storage)
Row-Level Security on every table
Server Actions for mutations
Route handlers for upstream proxies

Voice

ElevenLabs Instant Voice Cloning
ElevenLabs TTS (multilingual v2)
MediaRecorder API + AnalyserNode for live waveform

Native

Capacitor 8 (iOS + Android shells)
WebView wrapper of deployed URL
Single codebase across web, PWA, and native

Infra

Vercel hosting
GitHub-driven deploys
Supabase Auth: magic link + 6-digit OTP + Google OAuth

Data model

§06

Four tables, every one RLS-enforced. Worth showing because the shape is the product: voices are owned, phrases are scoped, utterances are logged, prefs travel with the user.

profiles · RLS enforced

One row per authenticated user; mirrors auth.users.id.

id	uuid (PK, FK → auth.users.id)
full_name	text
condition	text	free-text, optional
default_voice_id	uuid (FK → voices.id)
prefers_reduced_motion	boolean
text_scale	real	0.85–1.6

voices · RLS enforced

One row per banked voice; a user can have many.

id	uuid (PK)
user_id	uuid (FK → auth.users.id)
eleven_voice_id	text	ElevenLabs' opaque external ID
name	text	user-editable
sample_count	int
total_duration_sec	real
last_used_at	timestamptz	bumped on TTS

phrases · RLS enforced

User's quick-tap phrases, categorised and ordered.

id	uuid (PK)
user_id	uuid (FK → auth.users.id)
text	text	≤ 240 chars
category	text	default 'general'
position	integer	order within category

utterances · RLS enforced

Log of every TTS call; powers 'Recently spoken'.

id	uuid (PK)
user_id	uuid (FK → auth.users.id)
voice_id	uuid (FK → voices.id, nullable)
voice_label	text	snapshot, survives voice delete
text	text
created_at	timestamptz
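
For readers who think in types, the four tables could be mirrored in TypeScript roughly as follows. This is a sketch with assumptions: timestamps are modelled as ISO strings, default_voice_id is assumed nullable (a new account has no voice yet), and clampTextScale is a hypothetical helper. Only the nullable utterances.voice_id, the optional condition, and the 0.85 to 1.6 range come from the tables above.

```typescript
interface Profile {
  id: string; // same value as auth.users.id
  full_name: string;
  condition: string | null; // free-text, optional
  default_voice_id: string | null; // assumption: null until a voice exists
  prefers_reduced_motion: boolean;
  text_scale: number; // 0.85 to 1.6
}

interface Voice {
  id: string;
  user_id: string;
  eleven_voice_id: string; // ElevenLabs' opaque external ID
  name: string;
  sample_count: number;
  total_duration_sec: number;
  last_used_at: string; // assumption: timestamptz serialised as ISO 8601
}

interface Phrase {
  id: string;
  user_id: string;
  text: string; // at most 240 chars
  category: string; // default 'general'
  position: number; // order within category
}

interface Utterance {
  id: string;
  user_id: string;
  voice_id: string | null; // nullable: the voice may have been deleted
  voice_label: string; // snapshot that survives voice deletion
  text: string;
  created_at: string; // assumption: ISO 8601 string
}

// Hypothetical helper: keeps a requested scale inside the documented range.
function clampTextScale(requested: number): number {
  return Math.min(1.6, Math.max(0.85, requested));
}
```

The voice_label snapshot is what lets 'Recently spoken' keep a readable label after the referenced voice row is gone.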

Security posture

§07

What's actively defended. Plus an honest list of what's deferred — because a build journal that lists only successes isn't a build journal.

Actively defended

Server-only secrets — ElevenLabs API key and Supabase service role never imported into client code
RLS on every table — the database refuses any row that doesn't match auth.uid()
Auth on every API route — /api/clone and /api/tts both require a session
Voice ownership check on every TTS call — you can only speak in voices you own (or premade ones)
Input validation: text length, file size, file count, MIME types
HTTPS-only, HttpOnly cookies, SameSite=Lax, Origin check on sign-out
Strict security headers: HSTS preload, X-Content-Type-Options, X-Frame-Options DENY, Permissions-Policy restricting mic/camera/geolocation
PKCE magic links — bound to the device that requested them
No raw audio at rest, anywhere
Consent gate before any voice clone is created
No XSS surface — React escapes by default, no dangerouslySetInnerHTML
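
Three of the checks above (session required, text length validated, voice ownership verified) could be combined into one guard for the TTS route. The sketch below is illustrative only: authorizeTts, MAX_TTS_CHARS, and the status codes are assumptions, and the 240-character limit is borrowed from the phrases table rather than stated anywhere for TTS input.

```typescript
// Assumption: reuses the phrase limit; the real TTS limit is not given in the text.
const MAX_TTS_CHARS = 240;

type Check = { ok: true } | { ok: false; status: number; reason: string };

// Hypothetical guard run before any upstream TTS call.
function authorizeTts(
  sessionUserId: string | null,
  text: string,
  requestedVoiceId: string,
  ownedVoiceIds: ReadonlySet<string>,
  premadeVoiceIds: ReadonlySet<string>,
): Check {
  if (sessionUserId === null) {
    return { ok: false, status: 401, reason: "no session" };
  }
  const trimmed = text.trim();
  if (trimmed.length === 0 || trimmed.length > MAX_TTS_CHARS) {
    return { ok: false, status: 400, reason: "text length out of range" };
  }
  // A voice is usable only if the caller owns it or it is a premade voice.
  if (!ownedVoiceIds.has(requestedVoiceId) && !premadeVoiceIds.has(requestedVoiceId)) {
    return { ok: false, status: 403, reason: "voice not available to this user" };
  }
  return { ok: true };
}
```

Because RLS already scopes the voices table to the caller, the owned set in a real handler would come from a query that cannot return other users' rows.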

Known gaps

No per-user rate limit yet — mitigated by auth gates and upstream cost visibility
No strict Content-Security-Policy — needs nonce-aware middleware; deferred
No MFA UI — Supabase supports TOTP/WebAuthn; not exposed yet
Service-role key reserved for future webhook handlers; unused today

Privacy & ethics

§08

Voice cloning is a dual-use technology. The same model that lets a dying parent leave a message for their child also enables phone-fraud impersonation at scale. The product takes that seriously by design, not as an afterthought.

Consent gate. The user must affirm ownership or consent before any audio leaves the device.
Identity binding. Voices are owned by accounts; RLS prevents any user from speaking in another user's voice.
No audio retention. The raw recordings are never persisted on our side — only the resulting ElevenLabs voice_id.
Two-sided deletion. Removing a voice in the dashboard also deletes the model upstream on ElevenLabs — the artifact, not just the link.
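
Two-sided deletion could be sketched as below. The names (deleteVoice, DeleteDeps) are hypothetical, and the ordering is an assumption rather than the app's confirmed behaviour: deleting upstream first means a failed upstream call leaves the local row in place, so the user can retry instead of being left with an orphaned model on ElevenLabs.

```typescript
// Hypothetical dependencies; real implementations would call the ElevenLabs
// voice deletion endpoint and delete the row from the voices table.
interface DeleteDeps {
  deleteUpstream(elevenVoiceId: string): Promise<void>;
  deleteRow(voiceRowId: string): Promise<void>;
}

async function deleteVoice(
  deps: DeleteDeps,
  voice: { id: string; eleven_voice_id: string },
): Promise<void> {
  // Upstream first: if this throws, the local row is left untouched.
  await deps.deleteUpstream(voice.eleven_voice_id);
  // Only once the model is gone do we drop our reference to it.
  await deps.deleteRow(voice.id);
}
```

Past utterances keep their voice_label snapshot, so history stays readable after the voice row is removed.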

What we don't yet do, and what's on the table if the product moves out of private beta: identity verification (passport / government ID), liveness checks during recording, and watermarking on generated audio. Those become essential at scale.

Roadmap

§09

What's shipped. What's coming, in rough priority order. No date pressure — this is the product in private beta, not a public launch.

Shipped

Magic-link + 6-digit code + Google OAuth
Voice banking via MediaRecorder + scripts + consent
Voice cloning through ElevenLabs
TTS playback with live waveform
4 curated premade voices for instant try
Multi-voice management (rename / delete / set-default)
Categorised phrases with 12 sensible defaults seeded
Utterance history with one-tap re-speak
Accessibility prefs (reduce motion + 4 text-scale steps)
PWA install + Capacitor iOS/Android shells
Mobile-first responsive layout throughout
RLS-everywhere, auth on every endpoint

Coming

Stripe subscriptions (gates voice cloning)
Family viewer access — invite a relative
Speech-to-text recovery for residual but garbled speech
Per-user daily TTS character quota
Service worker + offline shell
Drag-and-drop phrase reordering
Apple Sign in (when there's reason to spend the $99)
MFA UI
Strict CSP via nonce middleware
Playwright end-to-end test suite