A voice for after
Record your voice while you still can. Speak in it when you can't.
§01 What this is
Voice Legacy is a web app, also wrappable for iOS and Android via Capacitor, for people who have been told they will lose their voice, and for the families around them. It is, deliberately, one straight-line idea:
Record your voice while you still have it. From then on, type — or tap a saved phrase — and hear it played back in your own voice.
The voice clone is built by ElevenLabs. The app is the considered wrapper around that capability: a recording flow that keeps quality high, a typing surface that respects motor and cognitive limits, a dashboard that gives you control over what you've banked, and an architecture that keeps your audio out of any database under our control.
The name is meant literally. The thing you leave for the people who knew your voice is, in part, your voice.
§02 Who it's for
The product is sharpest for progressive voice loss.
- ALS / MND: The canonical case. Bulbar onset can take a voice in months; ALS forums recommend voice banking as one of the first things to do at diagnosis.
- Throat / oesophageal cancer: Laryngectomy patients can plan ahead and bank their voice before surgery.
- Primary Progressive Aphasia (PPA): Language fades but cognition remains. Typing surfaces work; conversational AAC apps don't.
- Parkinson's: Speech volume and clarity decline; banking early provides a voice prosthesis for the later stages.
It is less sharp for stroke aphasia (language production is impaired, so typing-as-a-bridge doesn't work) or congenital speech-language impairments (the user never had the voice we're trying to preserve). That distinction matters: the name is misleadingly broad, and the landing copy makes the progressive-loss framing explicit for that reason.
§03 The two phases
The product has exactly two screens that matter, and they map to two phases of the user's life.
While the user still has their voice, they record samples in a scripted reader with a live waveform and live volume monitoring. There is a soft warning at 100 s, a hard auto-stop at 120 s, and an explicit consent gate before submission.
- Audio sent to ElevenLabs' voice cloning endpoint
- ElevenLabs returns a voice_id
- We save the voice_id — not the audio
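The recording limits above can be sketched as a small pure function that a timer tick calls. This is a minimal illustration, not the app's actual code: the names `recordingStatus`, `SOFT_WARNING_SEC` and `HARD_STOP_SEC` are hypothetical, and only the 100 s and 120 s thresholds come from the description.

```typescript
// Hypothetical names; only the two thresholds are taken from the text.
const SOFT_WARNING_SEC = 100;
const HARD_STOP_SEC = 120;

type RecordingStatus = "recording" | "warning" | "stopped";

// Maps elapsed recording time to what the UI should do on this tick.
function recordingStatus(elapsedSec: number): RecordingStatus {
  if (elapsedSec >= HARD_STOP_SEC) return "stopped"; // caller stops the recorder
  if (elapsedSec >= SOFT_WARNING_SEC) return "warning"; // caller shows a soft warning
  return "recording";
}
```

In a browser, the caller would presumably stop the MediaRecorder when the status becomes "stopped"; keeping the rule in a pure function makes the thresholds easy to test without a microphone.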
When typing replaces speaking, the user types a phrase or taps a saved one. We call ElevenLabs TTS with the user's voice_id and play the MP3 back through a waveform visualisation.
- Typing surface + quick-tap phrases
- 12 sensible defaults seeded per account
- Utterance history; one-tap to re-speak
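The TTS call described above might look roughly like the request built below. This is a sketch under assumptions: the endpoint path, the `xi-api-key` header and the `eleven_multilingual_v2` model ID reflect ElevenLabs' public API as we understand it and should be checked against the current reference. `buildTtsRequest` and `TtsRequest` are hypothetical names.

```typescript
// Hypothetical helper; builds the request but does not send it.
interface TtsRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assumption: ElevenLabs' text-to-speech endpoint keyed by voice ID,
// authenticated with an `xi-api-key` header, returning MP3 audio.
function buildTtsRequest(elevenVoiceId: string, text: string, apiKey: string): TtsRequest {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(elevenVoiceId)}`,
    method: "POST",
    headers: {
      "xi-api-key": apiKey, // server-only secret, never shipped to the client
      "Content-Type": "application/json",
      Accept: "audio/mpeg",
    },
    body: JSON.stringify({ text, model_id: "eleven_multilingual_v2" }),
  };
}
```

A server-side route handler could pass `url` and the remaining fields to `fetch` and stream the MP3 response back to the browser for playback.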
§04 Design decisions
A handful of choices were made on purpose, and they're worth enumerating because they shaped the codebase more than any framework did.
Aesthetic: Archive
A warm-paper, deep-ink, single-oxblood-accent visual identity with Fraunces as the display serif and IBM Plex Sans as the body. Editorial layouts, hairline rules, numbered headings — no glassmorphism, no gradients, no centred-on-radial-bg SaaS landing. The product carries weight; the chrome shouldn't be a generic-startup template. It also distances the app from sci-fi voice AI connotations, which would be a worse frame for a family at diagnosis than a literary one.
Voice provider: ElevenLabs
The competitive set includes Resemble.ai, Play.ht, Microsoft Custom Neural Voice, and Acapela's my-own-voice. ElevenLabs wins on three axes: quality (accents and laughter come through; competitor TTS still sounds trained), a stable public API, and instant voice cloning from a few minutes of audio. It is the central dependency, so it is kept behind two API routes, which become the only migration surface if pricing or service changes.
Backend: Supabase
Auth, Postgres, RLS, magic-link emails, and OAuth in one. RLS is the right primitive here — 'the user can only see their own voices.' Magic-link is the right auth primitive for the user base — 'an older person who shouldn't need to remember a password.' The two come bundled.
Auth: magic link + Google
Passwords were never considered. The target user is often older, often under stress from a diagnosis, often using an unfamiliar device. A password they'll forget is a barrier; an email link is closer to how a normal product works for them. Google was added because it's frictionless for users already on Gmail. Apple is omitted for now because the $99/yr developer fee isn't yet justified.
Why audio is never stored at rest
Voice samples go directly from the browser to /api/clone, are forwarded to ElevenLabs, and the response (voice_id) is the only thing kept. Raw audio is never written to Supabase storage, never written to disk in a serverless function, never logged. Voice samples are identity-grade biometric data. Holding them creates a target for breach and legal exposure under medical-context GDPR interpretations. Not holding them is simpler.
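The "only the voice_id is kept" rule can be sketched as the step that turns an upstream clone response into the row we persist. This is an illustration under assumptions: `toVoiceInsert` and `CloneResponse` are hypothetical names, the column names are borrowed from the voices table in the data model, and the extra response fields are invented to show that they are dropped.

```typescript
// Hypothetical shape of the clone response; only `voice_id` is named in the text.
interface CloneResponse {
  voice_id: string;
  [extra: string]: unknown;
}

// Columns mirror the `voices` table; note there is no audio field at all.
interface VoiceInsert {
  user_id: string;
  eleven_voice_id: string;
  name: string;
  sample_count: number;
  total_duration_sec: number;
}

// Builds the row to persist. Audio bytes never reach this function:
// it only sees per-sample durations and the upstream response.
function toVoiceInsert(
  userId: string,
  name: string,
  sampleDurationsSec: number[],
  upstream: CloneResponse,
): VoiceInsert {
  if (typeof upstream.voice_id !== "string" || upstream.voice_id.length === 0) {
    throw new Error("Clone response did not include a voice_id");
  }
  return {
    user_id: userId,
    eleven_voice_id: upstream.voice_id,
    name,
    sample_count: sampleDurationsSec.length,
    total_duration_sec: sampleDurationsSec.reduce((sum, s) => sum + s, 0),
  };
}
```

Because the persisted type has no slot for audio, accidentally storing a recording would require changing the type, which is easier to catch in review than a stray write.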
Why RLS-first, not application-code-first
Every table has Row-Level Security enabled at creation, scoped to auth.uid(). Even in the worst case, where app code has an authorization bug (a forgotten .eq('user_id', user.id)), no data leaks: Postgres refuses the row anyway. Defence-in-depth that costs nothing.
§05 The stack
- Next.js 16 (App Router, Turbopack)
- React 19
- TypeScript 5 strict
- Tailwind CSS v4
- Mona Sans + IBM Plex Mono
- Supabase (Postgres + Auth + Storage)
- Row-Level Security on every table
- Server Actions for mutations
- Route handlers for upstream proxies
- ElevenLabs Instant Voice Cloning
- ElevenLabs TTS (multilingual v2)
- MediaRecorder API + AnalyserNode for live waveform
- Capacitor 8 (iOS + Android shells)
- WebView wrapper of deployed URL
- Single codebase across web, PWA, and native
- Vercel hosting
- GitHub-driven deploys
- Supabase Auth: magic link + 6-digit OTP + Google OAuth
§06 Data model
Four tables, every one RLS-enforced. Worth showing because the shape is the product: voices are owned, phrases are scoped, utterances are logged, prefs travel with the user.
profiles (RLS enforced): One row per authenticated user; mirrors auth.users.id.
| Column | Type | Notes |
| --- | --- | --- |
| id | uuid (PK, FK → auth.users.id) | |
| full_name | text | |
| condition | text | free-text, optional |
| default_voice_id | uuid (FK → voices.id) | |
| prefers_reduced_motion | boolean | |
| text_scale | real | 0.85–1.6 |
voices (RLS enforced): One row per banked voice; a user can have many.
| Column | Type | Notes |
| --- | --- | --- |
| id | uuid (PK) | |
| user_id | uuid (FK → auth.users.id) | |
| eleven_voice_id | text | ElevenLabs' opaque external ID |
| name | text | user-editable |
| sample_count | int | |
| total_duration_sec | real | |
| last_used_at | timestamptz | bumped on TTS |
phrases (RLS enforced): User's quick-tap phrases, categorised and ordered.
| Column | Type | Notes |
| --- | --- | --- |
| id | uuid (PK) | |
| user_id | uuid (FK → auth.users.id) | |
| text | text | ≤ 240 chars |
| category | text | default 'general' |
| position | integer | order within category |
utterances (RLS enforced): Log of every TTS call; powers 'Recently spoken'.
| Column | Type | Notes |
| --- | --- | --- |
| id | uuid (PK) | |
| user_id | uuid (FK → auth.users.id) | |
| voice_id | uuid (FK → voices.id, nullable) | |
| voice_label | text | snapshot, survives voice delete |
| text | text | |
| created_at | timestamptz | |
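The constraints noted in the tables (text_scale within 0.85–1.6, phrases of at most 240 characters, a default category of 'general') could be mirrored in application types along these lines. This is a sketch, not the app's code: the interface and function names are hypothetical, and nullability is an assumption wherever the tables don't state it.

```typescript
// Hypothetical row types; nullability is assumed where the tables don't say.
interface ProfileRow {
  id: string;
  full_name: string | null;
  condition: string | null; // free-text, optional
  default_voice_id: string | null;
  prefers_reduced_motion: boolean;
  text_scale: number; // 0.85 to 1.6
}

interface PhraseRow {
  id: string;
  user_id: string;
  text: string; // at most 240 chars
  category: string;
  position: number; // order within category
}

const TEXT_SCALE_MIN = 0.85;
const TEXT_SCALE_MAX = 1.6;
const PHRASE_MAX_CHARS = 240;

// Keeps a requested scale inside the range the profiles table allows.
function clampTextScale(requested: number): number {
  return Math.min(TEXT_SCALE_MAX, Math.max(TEXT_SCALE_MIN, requested));
}

// Builds a phrase ready for insert; the database would supply `id`.
function newPhrase(
  userId: string,
  text: string,
  position: number,
  category: string = "general",
): Omit<PhraseRow, "id"> {
  const trimmed = text.trim();
  if (trimmed.length === 0 || trimmed.length > PHRASE_MAX_CHARS) {
    throw new Error(`Phrase must be 1 to ${PHRASE_MAX_CHARS} characters`);
  }
  return { user_id: userId, text: trimmed, category, position };
}
```

Checking these limits in application code is a convenience for error messages; the database constraints and RLS remain the authority.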
§07 Security posture
What's actively defended, plus an honest list of what's deferred, because a build journal that lists only successes isn't a build journal.
Actively defended
- Server-only secrets — ElevenLabs API key and Supabase service role never imported into client code
- RLS on every table — the database refuses any row that doesn't match auth.uid()
- Auth on every API route — /api/clone and /api/tts both require a session
- Voice ownership check on every TTS call — you can only speak in voices you own (or premade ones)
- Input validation: text length, file size, file count, MIME types
- HTTPS-only, HttpOnly cookies, SameSite=Lax, Origin check on sign-out
- Strict security headers: HSTS preload, X-Content-Type-Options, X-Frame-Options DENY, Permissions-Policy restricting mic/camera/geolocation
- PKCE magic links — bound to the device that requested them
- No raw audio at rest, anywhere
- Consent gate before any voice clone is created
- No XSS surface — React escapes by default, no dangerouslySetInnerHTML
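Two of the defences above, input validation and the voice ownership check, can be sketched as one decision function that a TTS route runs after confirming a session. This is an assumption-laden illustration: `authorizeTts` and its types are hypothetical names, and the 500-character limit is invented for the example because the text does not state the real one.

```typescript
// Assumed limit for illustration only; the real value is not given in the text.
const MAX_TTS_CHARS = 500;

interface VoiceOwnerRow {
  id: string;
  user_id: string;
}

interface TtsDecision {
  status: 200 | 400 | 403;
  reason: string;
}

// Decides whether a signed-in user may speak `text` in `voiceId`.
// Premade voices are open to everyone; banked voices only to their owner.
function authorizeTts(
  userId: string,
  voiceId: string,
  text: string,
  userVoices: VoiceOwnerRow[],
  premadeVoiceIds: ReadonlySet<string>,
): TtsDecision {
  const trimmed = text.trim();
  if (trimmed.length === 0 || trimmed.length > MAX_TTS_CHARS) {
    return { status: 400, reason: `text must be 1 to ${MAX_TTS_CHARS} characters` };
  }
  const isPremade = premadeVoiceIds.has(voiceId);
  const isOwned = userVoices.some((v) => v.id === voiceId && v.user_id === userId);
  if (!isPremade && !isOwned) {
    return { status: 403, reason: "voice is not owned by the caller" };
  }
  return { status: 200, reason: "ok" };
}
```

With RLS in place, `userVoices` would already contain only the caller's rows, so the explicit `user_id` comparison is a second, application-level check on top of the database's.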
Known gaps
- No per-user rate limit yet — mitigated by auth gates and upstream cost visibility
- No strict Content-Security-Policy — needs nonce-aware middleware; deferred
- No MFA UI — Supabase supports TOTP/WebAuthn; not exposed yet
- Service-role key reserved for future webhook handlers; unused today
§08 Privacy & ethics
Voice cloning is a dual-use technology. The same model that lets a dying parent leave a message for their child also enables phone-fraud impersonation at scale. The product takes that seriously, and not by accident.
- Consent gate. The user must affirm ownership or consent before any audio leaves the device.
- Identity binding. Voices are owned by accounts; RLS prevents any user from speaking in another user's voice.
- No audio retention. The raw recordings are never persisted on our side — only the resulting ElevenLabs voice_id.
- Two-sided deletion. Removing a voice in the dashboard also deletes the model upstream on ElevenLabs — the artifact, not just the link.
What we don't yet do, and what's on the table if the product moves out of private beta: identity verification (passport / government ID), liveness checks during recording, watermarking on generated audio. Those become essential at scale.
§09 Roadmap
What's shipped and what's coming, in rough priority order. No date pressure: this is the product in private beta, not a public launch.
Shipped
- Magic-link + 6-digit code + Google OAuth
- Voice banking via MediaRecorder + scripts + consent
- Voice cloning through ElevenLabs
- TTS playback with live waveform
- 4 curated premade voices for instant try
- Multi-voice management (rename / delete / set-default)
- Categorised phrases with 12 sensible defaults seeded
- Utterance history with one-tap re-speak
- Accessibility prefs (reduce motion + 4 text-scale steps)
- PWA install + Capacitor iOS/Android shells
- Mobile-first responsive layout throughout
- RLS-everywhere, auth on every endpoint
Coming
- Stripe subscriptions (gates voice cloning)
- Family viewer access — invite a relative
- Speech-to-text recovery for residual but garbled speech
- Per-user daily TTS character quota
- Service worker + offline shell
- Drag-and-drop phrase reordering
- Apple Sign in (when there's reason to spend the $99)
- MFA UI
- Strict CSP via nonce middleware
- Playwright end-to-end test suite