Voice AI Built for the Real World
Applied research at the intersection of voice, safety, and intelligent systems.
See What We're BuildingSafety & Performance
Compliance Ready
Privacy by Design
On-Device STT/TTS
No cloud audio streaming
<850ms TTFA
Time to first audio
20-Layer Safety
Multi-stage content filter
Our Research Pillars
Voice Pipeline Architecture
End-to-end investigation of low-latency STT → safety → LLM → TTS architectures. We optimize every millisecond from utterance to response.
Safety Filter Design
Multi-layer content safety systems with real-time input and output filtering, designed to operate with zero latency using pre-computed fallback responses.
Design-First Compliance
We build compliance into the architecture from day one, not as a legal afterthought. Our voice pipeline is engineered to meet COPPA and HIPAA standards by design: no personal information collection, on-device audio processing so voices are never stored in the cloud, parental consent and visibility built into the product, full data deletion rights, and encrypted data handling across every layer. Whether we're building for children's education or exploring healthcare and elder care applications, regulatory compliance is a product feature, not a checkbox.
The Intelligent Voice Pipeline
On-Device Speech-to-Text
Speech is captured and transcribed locally using native platform APIs on both iPhone and Android. Audio never leaves the device. This eliminates cloud audio streaming latency and ensures COPPA compliance from the first byte.
Safety Filter
Transcribed text is sent to a backend safety filter that evaluates content safety in real time. Harmful, off-topic, or personally identifying content is intercepted before reaching the LLM. If unsafe input is detected, a pre-recorded safe redirect audio response plays instantly.
LLM Response Generation
Safe prompts are routed to a large language model for fast, age-appropriate, contextually rich responses. The system maintains conversational context across multiple turns, ensuring natural back-and-forth dialogue with educational relevance.
On-Device Text-to-Speech
Responses are synthesized into natural speech on-device using native platform APIs. Character voices bring personality to each response, making learning feel like a conversation with a friend. On-device synthesis means zero cloud costs and instant playback.
Latest from the Lab
View all postsBuilding a Sub-Second Voice Pipeline with On-Device STT
How we achieved sub-second time-to-first-audio by moving speech recognition on-device, eliminating cloud round-trip latency entirely.
Multi-Layer Safety: Content Filtering for Children's Voice AI
A walkthrough of our multi-stage safety architecture : from real-time evaluation on device to server-side fallback triggers.
Character Voice Design for Kids
Designing the Memo companion character and the voice pipeline that brings its personality to life : warmth, pacing, and prosody that feels right for a child.