Voice AI Built for the Real World

Applied research at the intersection of voice, safety, and intelligent systems.

See What We're Building

Compliance Ready

Privacy by Design

On-Device STT/TTS

No cloud audio streaming

<850ms TTFA

Time to first audio

20-Layer Safety

Multi-stage content filter

Our Research Pillars

< 850ms TTFA

Voice Pipeline Architecture

End-to-end investigation of low-latency STT → safety → LLM → TTS architectures. We optimize every millisecond from utterance to response.

20-Layer

Safety Filter Design

Multi-layer content safety systems with real-time input and output filtering, designed to operate with zero latency using pre-computed fallback responses.

COPPA HIPAA

Design-First Compliance

We build compliance into the architecture from day one, not as a legal afterthought. Our voice pipeline is engineered to meet COPPA and HIPAA standards by design: no personal information collection, on-device audio processing so voices are never stored in the cloud, parental consent and visibility built into the product, full data deletion rights, and encrypted data handling across every layer. Whether we're building for children's education or exploring healthcare and elder care applications, regulatory compliance is a product feature, not a checkbox.

Flagship Product

Memo Kids

Voice AI your kids can trust

The first COPPA-compliant voice AI companion built for Pre-K to Grade 5. Natural conversation, on-device safety filtering, and curriculum-aligned intelligence.

✓ COPPA ✓ On-Device < 850ms

Learn more

Architecture

The Intelligent Voice Pipeline

On-Device Speech-to-Text

Speech is captured and transcribed locally using native platform APIs on both iPhone and Android. Audio never leaves the device. This eliminates cloud audio streaming latency and ensures COPPA compliance from the first byte.

Safety Filter

Transcribed text is sent to a backend safety filter that evaluates content safety in real time. Harmful, off-topic, or personally identifying content is intercepted before reaching the LLM. If unsafe input is detected, a pre-recorded safe redirect audio response plays instantly.

LLM Response Generation

Safe prompts are routed to a large language model for fast, age-appropriate, contextually rich responses. The system maintains conversational context across multiple turns, ensuring natural back-and-forth dialogue with educational relevance.

On-Device Text-to-Speech

Responses are synthesized into natural speech on-device using native platform APIs. Character voices bring personality to each response, making learning feel like a conversation with a friend. On-device synthesis means zero cloud costs and instant playback.

Latest from the Lab

View all posts

Architecture

Apr 28, 2026

Building a Sub-Second Voice Pipeline with On-Device STT

How we achieved sub-second time-to-first-audio by moving speech recognition on-device, eliminating cloud round-trip latency entirely.

Safety

Apr 20, 2026

Multi-Layer Safety: Content Filtering for Children's Voice AI

A walkthrough of our multi-stage safety architecture : from real-time evaluation on device to server-side fallback triggers.

Product

Apr 10, 2026

Character Voice Design for Kids

Designing the Memo companion character and the voice pipeline that brings its personality to life : warmth, pacing, and prosody that feels right for a child.