Speech Translator & Hindi Tutor
Real-time speech-to-speech translator and AI-generated Pimsleur-style Hindi lesson system — all on-device
Overview
Two ML-powered language tools built around on-device inference: every speech and translation model runs locally.
The Real-time Speech Translator captures microphone audio via WebSocket streaming, transcribes with Whisper, translates via MarianNMT (Helsinki-NLP), and synthesizes speech with Piper TTS. It also supports Meta's SeamlessM4T v2 as an alternative end-to-end speech-to-speech pipeline, selectable at runtime. Six languages are supported: English, Russian, German, French, Spanish, and Hindi.
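A minimal sketch of how the two pipelines could sit behind one interface so that the choice can be made at runtime. The class names, the `run` method, and the mode strings `"modular"` and `"seamless"` are assumptions for illustration, not the project's actual code. The stage callables stand in for Whisper, MarianNMT, Piper, and SeamlessM4T v2.

```python
from dataclasses import dataclass
from typing import Callable, Union

# Hypothetical stage signatures; real code would wrap the actual models.
Transcribe = Callable[[bytes], str]       # audio -> source-language text
Translate = Callable[[str], str]          # source text -> target text
Synthesize = Callable[[str], bytes]       # target text -> audio
SpeechToSpeech = Callable[[bytes], bytes]  # audio -> translated audio


@dataclass
class ModularPipeline:
    """Chains three separate stages: transcribe, translate, synthesize."""
    transcribe: Transcribe
    translate: Translate
    synthesize: Synthesize

    def run(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)
        translated = self.translate(text)
        return self.synthesize(translated)


@dataclass
class EndToEndPipeline:
    """Delegates the whole job to a single speech-to-speech model."""
    speech_to_speech: SpeechToSpeech

    def run(self, audio: bytes) -> bytes:
        return self.speech_to_speech(audio)


Pipeline = Union[ModularPipeline, EndToEndPipeline]


def select_pipeline(mode: str, modular: ModularPipeline,
                    end_to_end: EndToEndPipeline) -> Pipeline:
    """Pick a pipeline by name; the mode strings are assumed, not from the project."""
    pipelines = {"modular": modular, "seamless": end_to_end}
    if mode not in pipelines:
        raise ValueError(f"unknown pipeline mode: {mode!r}")
    return pipelines[mode]
```

Because both classes expose the same `run(audio)` method, the WebSocket handler would not need to know which pipeline is active.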
The Hindi Tutor is a daily lesson generation pipeline that pulls from Google Docs notes, homework corrections, and Pimsleur transcripts, then uses Claude to plan 30-minute lessons with drills, conversations, and spaced repetition. Kokoro TTS renders the final audio with separate instructor and Hindi speaker voices. 28+ lessons have been generated and practiced since March 2026.
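One way a single drill could be represented before it is handed to the TTS stage, following the slow-then-normal pattern with learner pauses listed under Key Features. This is a sketch only: the `Segment` type, the voice labels, the prompt wording, and the default speed and pause values are all assumptions, not taken from the project.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    voice: str            # "instructor", "hindi", or "pause" (hypothetical labels)
    text: str = ""        # empty for pauses
    speed: float = 1.0    # playback-rate multiplier for the TTS voice
    pause_s: float = 0.0  # silence length in seconds, used by pause segments


def build_drill(english: str, hindi: str,
                slow_speed: float = 0.7, pause_s: float = 4.0) -> List[Segment]:
    """Build one prompt/response drill: ask, wait, model slowly, then at normal speed."""
    return [
        Segment("instructor", f"How do you say: {english}?"),
        Segment("pause", pause_s=pause_s),         # learner attempts an answer
        Segment("hindi", hindi, speed=slow_speed),  # slow model pronunciation
        Segment("hindi", hindi),                    # normal-speed repetition
        Segment("pause", pause_s=pause_s),         # learner repeats
    ]
```

Keeping the script as plain data separates lesson planning from audio rendering, so the same drill list could be inspected, edited, or re-rendered with different voices.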
Screenshots

Live translator — English → Hindi with pipeline log

Hindi tutor — Lesson #29 with live transcript and audio player

Vocabulary cards with spaced repetition tracking
Key Features
Dual Translation Pipeline
Modular (Whisper → MarianNMT → Piper) or end-to-end (SeamlessM4T v2), selectable at runtime depending on quality and latency needs.
Daily Lesson Generation
Full pipeline: fetch sources → identify struggle words → plan lesson → render 30-min audio → schedule reviews. All automated.
Pimsleur-Style Drills
Slow-then-normal pronunciation, learner pauses, conversation role-play, and vocabulary recycling — modeled after the Pimsleur method.
Homework Integration
Pulls corrections from Google Docs, turns mistakes into targeted drills with spaced repetition at 1, 3, 7, 14, and 30 day intervals.
On-Device Inference
All models (Whisper, MarianNMT, Kokoro, Piper, SeamlessM4T) run locally. MLX-optimized for Apple Silicon, CUDA support for GPU machines.
6 Languages
English, Russian, German, French, Spanish, and Hindi. MarianNMT handles translation with a separate model for each source-target pair.
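The Helsinki-NLP MarianMT checkpoints on the Hugging Face Hub follow the naming pattern `Helsinki-NLP/opus-mt-{src}-{tgt}`, so a per-pair model name can be derived from two language codes. The sketch below only builds names; the helper names are hypothetical, and whether a checkpoint is actually published for every directed pair would need to be checked on the Hub.

```python
from itertools import permutations
from typing import List, Tuple

# ISO 639-1 codes for the six languages named above.
LANGUAGES = ("en", "ru", "de", "fr", "es", "hi")


def marian_model_name(src: str, tgt: str) -> str:
    """Return the Hub name a MarianMT checkpoint for src -> tgt would have."""
    if src not in LANGUAGES or tgt not in LANGUAGES:
        raise ValueError(f"unsupported language pair: {src}-{tgt}")
    if src == tgt:
        raise ValueError("source and target languages must differ")
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


def directed_pairs() -> List[Tuple[str, str]]:
    """All ordered (source, target) combinations of the six languages."""
    return list(permutations(LANGUAGES, 2))
```

Six languages give 30 directed pairs, which is why each direction needs its own model name rather than one shared checkpoint.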
Architecture
The translator uses a FastAPI backend with WebSocket streaming for real-time audio capture and playback. The modular pipeline chains Whisper (MLX for Apple Silicon, WhisperX with large-v3 for CUDA) for transcription, MarianNMT for translation, and Piper ONNX for speech synthesis.
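A rough sketch of how the transcription backend might be chosen per machine. The function name and the returned labels are hypothetical, the CPU fallback is an assumption that the description above does not mention, and CUDA availability is passed in as a flag rather than detected, to keep the example dependency-free.

```python
import platform
from typing import Optional


def pick_whisper_backend(system: Optional[str] = None,
                         machine: Optional[str] = None,
                         cuda_available: bool = False) -> str:
    """Choose a transcription backend label for the current hardware."""
    system = system or platform.system()     # e.g. "Darwin", "Linux"
    machine = machine or platform.machine()  # e.g. "arm64", "x86_64"

    # Apple Silicon reports Darwin + arm64.
    if system == "Darwin" and machine == "arm64":
        return "mlx-whisper"
    if cuda_available:
        return "whisperx-large-v3"
    # Assumed fallback; not described in the original architecture.
    return "cpu-fallback"
```

In a real setup the `cuda_available` flag would typically come from the deep learning framework in use.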
The Hindi Tutor pipeline is orchestrated by Claude CLI, which composes lessons from multiple source materials (Google Docs homework, Pimsleur transcripts, vocabulary lists). Kokoro 82M renders dual-voice audio (instructor + Hindi speaker). A custom spaced repetition tracker schedules vocabulary reviews at configurable intervals.
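The review scheduling could look roughly like the sketch below, using the 1, 3, 7, 14, and 30 day intervals from the Homework Integration feature as defaults. The class and method names are hypothetical, and the rules for advancing on a correct answer and resetting on a miss are assumptions, since the actual tracker logic is not described here.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Sequence

DEFAULT_INTERVALS = (1, 3, 7, 14, 30)  # days between reviews


@dataclass
class ReviewCard:
    word: str
    step: int   # index into the interval list
    due: date   # next scheduled review


class ReviewTracker:
    def __init__(self, intervals: Sequence[int] = DEFAULT_INTERVALS) -> None:
        if not intervals or any(days <= 0 for days in intervals):
            raise ValueError("intervals must be a non-empty list of positive days")
        self.intervals = tuple(intervals)
        self.cards: Dict[str, ReviewCard] = {}

    def add(self, word: str, today: date) -> None:
        """Start tracking a word; first review falls after the first interval."""
        due = today + timedelta(days=self.intervals[0])
        self.cards[word] = ReviewCard(word, 0, due)

    def record(self, word: str, correct: bool, today: date) -> date:
        """Advance one step on success, reset to the first interval on a miss."""
        card = self.cards[word]
        if correct:
            card.step = min(card.step + 1, len(self.intervals) - 1)
        else:
            card.step = 0
        card.due = today + timedelta(days=self.intervals[card.step])
        return card.due

    def due_on(self, today: date) -> List[str]:
        """Words whose review date has arrived, in sorted order."""
        return sorted(w for w, c in self.cards.items() if c.due <= today)
```

A daily lesson run could call `due_on(date.today())` to decide which vocabulary to recycle into that day's drills.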