ML / Language Learning

Speech Translator & Hindi Tutor

Real-time speech-to-speech translator and AI-generated Pimsleur-style Hindi lesson system, with speech and translation models running on-device

Tech Stack

Whisper (MLX / WhisperX), MarianNMT, Kokoro TTS, Piper TTS, Meta SeamlessM4T v2, Claude CLI, FastAPI + WebSockets, Python

Overview

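Two ML-powered language tools built around on-device inference: every speech and translation model runs locally. The exception is lesson planning in the Hindi Tutor, which calls Claude through the Claude CLI.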

The Real-time Speech Translator captures microphone audio via WebSocket streaming, transcribes it with Whisper, translates it with MarianNMT (Helsinki-NLP), and synthesizes speech with Piper TTS. It also supports Meta's SeamlessM4T v2 as an alternative end-to-end speech-to-speech pipeline, selectable at runtime. Six languages are supported: English, Russian, German, French, Spanish, and Hindi.

The Hindi Tutor is a daily lesson generation pipeline that pulls from Google Docs notes, homework corrections, and Pimsleur transcripts, then uses Claude to plan 30-minute lessons with drills, conversations, and spaced repetition. Kokoro TTS renders the final audio with separate instructor and Hindi speaker voices. 28+ lessons have been generated and practiced since March 2026.

Screenshots

Live translator — English → Hindi with pipeline log

Hindi tutor — Lesson #29 with live transcript and audio player

Vocabulary cards with spaced repetition tracking

Key Features

Dual Translation Pipeline

Modular (Whisper → MarianNMT → Piper) or end-to-end (SeamlessM4T v2), selectable at runtime depending on quality and latency needs.
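A minimal sketch of how runtime selection between the two pipelines could be structured, assuming each stage is wrapped as a plain callable. `ModularPipeline`, `EndToEndPipeline`, `select_pipeline`, and the `"modular"`/`"seamless"` keys are hypothetical names, and the lambdas are stand-ins for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

# Hypothetical stage signatures; real Whisper, MarianNMT, Piper, and
# SeamlessM4T wrappers would be adapted to these shapes.
Transcribe = Callable[[bytes, str], str]             # (audio, src) -> text
Translate = Callable[[str, str, str], str]           # (text, src, tgt) -> text
Synthesize = Callable[[str, str], bytes]             # (text, tgt) -> audio
SpeechToSpeech = Callable[[bytes, str, str], bytes]  # (audio, src, tgt) -> audio


@dataclass
class ModularPipeline:
    transcribe: Transcribe
    translate: Translate
    synthesize: Synthesize

    def run(self, audio: bytes, src: str, tgt: str) -> bytes:
        text = self.transcribe(audio, src)
        return self.synthesize(self.translate(text, src, tgt), tgt)


@dataclass
class EndToEndPipeline:
    speech_to_speech: SpeechToSpeech

    def run(self, audio: bytes, src: str, tgt: str) -> bytes:
        return self.speech_to_speech(audio, src, tgt)


Pipeline = Union[ModularPipeline, EndToEndPipeline]


def select_pipeline(name: str, registry: Dict[str, Pipeline]) -> Pipeline:
    """Look up a pipeline by name at runtime, rejecting unknown names."""
    if name not in registry:
        raise ValueError(f"unknown pipeline {name!r}; options: {sorted(registry)}")
    return registry[name]


# Stub stages so the sketch runs without any models installed.
registry: Dict[str, Pipeline] = {
    "modular": ModularPipeline(
        transcribe=lambda audio, src: audio.decode(),
        translate=lambda text, src, tgt: f"[{src}->{tgt}] {text}",
        synthesize=lambda text, tgt: text.encode(),
    ),
    "seamless": EndToEndPipeline(speech_to_speech=lambda audio, src, tgt: b"e2e:" + audio),
}
```

Because both pipeline types expose the same `run(audio, src, tgt)` method, the WebSocket handler would not need to know which one is active.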

Daily Lesson Generation

Full pipeline: fetch sources → identify struggle words → plan lesson → render 30-min audio → schedule reviews. All automated.
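One way the "identify struggle words" step could work, sketched under the assumption that homework corrections arrive as (learner sentence, corrected sentence) pairs: diff the two token sequences with the standard library's `difflib` and count the words the teacher had to supply. The function name and the romanized example sentences are illustrative, not taken from the project.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple


def struggle_words(
    corrections: Iterable[Tuple[str, str]], top_n: int = 10
) -> List[Tuple[str, int]]:
    """Rank corrected-side words by how often the learner got them wrong."""
    counts: Counter = Counter()
    for written, corrected in corrections:
        a, b = written.split(), corrected.split()
        matcher = SequenceMatcher(a=a, b=b, autojunk=False)
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
            # "replace" = wrong word used, "insert" = word the learner omitted
            if tag in ("replace", "insert"):
                counts.update(b[j1:j2])
    return counts.most_common(top_n)


corrections = [
    ("meri bhai lamba hai", "mera bhai lamba hai"),
    ("meri dost achchha hai", "mera dost achchha hai"),
    ("main kal jaata hoon", "main kal gaya tha"),
]
result = struggle_words(corrections, top_n=3)
print(result)  # 'mera' (gender agreement) ranks first with 2 corrections
```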

Pimsleur-Style Drills

Slow-then-normal pronunciation, learner pauses, conversation role-play, and vocabulary recycling — modeled after the Pimsleur method.
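A minimal sketch of how a single slow-then-normal drill could be laid out as a sequence of speech and silence segments before rendering. The `Say`/`Pause` types, the role names, the 0.7 slow-speed factor, and the 3-second pause are assumptions for illustration, not values from the project.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Say:
    voice: str          # "instructor" or "hindi" (hypothetical role names)
    text: str
    speed: float = 1.0  # assumed TTS speed multiplier; 1.0 = normal pace


@dataclass
class Pause:
    seconds: float      # silence left for the learner to respond


Segment = Union[Say, Pause]


def phrase_drill(english: str, hindi: str,
                 slow: float = 0.7, pause_s: float = 3.0) -> List[Segment]:
    """Prompt, pause for an attempt, then model the answer slowly and at normal speed."""
    return [
        Say("instructor", f"How would you say: {english}?"),
        Pause(pause_s),                   # learner attempts the phrase
        Say("hindi", hindi, speed=slow),  # slow model pronunciation
        Pause(pause_s),                   # learner repeats after the slow model
        Say("hindi", hindi),              # same phrase at normal speed
        Pause(pause_s),
    ]


drill = phrase_drill("I am hungry.", "मुझे भूख लगी है।")
```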

Homework Integration

Pulls corrections from Google Docs and turns mistakes into targeted drills with spaced repetition at 1-, 3-, 7-, 14-, and 30-day intervals.
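A small sketch of the review schedule using the intervals above. It assumes each interval is counted from the day a word was first drilled (rather than from the previous review); `review_dates`, `due_on`, and the example words are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Sequence

# Review offsets in days, as listed above. Assumption: each offset is counted
# from the day the word was first drilled, not from the previous review.
INTERVALS = (1, 3, 7, 14, 30)


def review_dates(first_drilled: date, intervals: Sequence[int] = INTERVALS) -> List[date]:
    return [first_drilled + timedelta(days=d) for d in intervals]


def due_on(words: Dict[str, date], day: date, intervals: Sequence[int] = INTERVALS) -> List[str]:
    """Words whose schedule includes `day`, sorted alphabetically."""
    return sorted(w for w, first in words.items() if day in review_dates(first, intervals))


words = {"mera": date(2026, 3, 1), "gaya": date(2026, 3, 6)}
print(due_on(words, date(2026, 3, 8)))  # ['mera'] (March 1 + 7 days)
```

Making `intervals` a parameter keeps the schedule configurable without touching the date arithmetic.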

On-Device Inference

All models (Whisper, MarianNMT, Kokoro, Piper, SeamlessM4T) run locally. MLX-optimized for Apple Silicon, CUDA support for GPU machines.
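A sketch of how the Apple Silicon vs. CUDA split might be decided at startup, using only the standard library's `platform` module. The function name, the returned labels, and the CPU fallback are assumptions; in practice `has_cuda` could come from `torch.cuda.is_available()`.

```python
import platform
from typing import Optional


def pick_whisper_backend(system: Optional[str] = None,
                         machine: Optional[str] = None,
                         has_cuda: bool = False) -> str:
    """Choose a transcription backend from the host platform."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        return "mlx"        # Apple Silicon: MLX-optimized Whisper
    if has_cuda:
        return "whisperx"   # NVIDIA GPU: WhisperX on CUDA
    return "cpu"            # hypothetical fallback; not described in the project
```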

6 Languages

English, Russian, German, French, Spanish, and Hindi. MarianNMT handles translation with language-specific model pairs.
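Helsinki-NLP publishes its MarianNMT checkpoints on the Hugging Face Hub one direction at a time, under the `opus-mt-{src}-{tgt}` naming scheme, so each language pair maps to its own model. The sketch below resolves which model ids to chain for a given direction. Routing non-English pairs through English is an assumption (direct checkpoints exist for some European pairs), as is the exact set of ids the project loads.

```python
from typing import List

SUPPORTED = {"en", "ru", "de", "fr", "es", "hi"}


def marian_route(src: str, tgt: str) -> List[str]:
    """Model ids to apply in order for src -> tgt (assumes an English pivot)."""
    if src not in SUPPORTED or tgt not in SUPPORTED:
        raise ValueError(f"unsupported direction {src}->{tgt}")
    if src == tgt:
        return []  # nothing to translate
    if "en" in (src, tgt):
        return [f"Helsinki-NLP/opus-mt-{src}-{tgt}"]
    # e.g. ru -> hi becomes ru -> en -> hi
    return [f"Helsinki-NLP/opus-mt-{src}-en", f"Helsinki-NLP/opus-mt-en-{tgt}"]


# Each id could then be loaded with transformers' MarianMTModel.from_pretrained(...)
# and MarianTokenizer.from_pretrained(...).
print(marian_route("ru", "hi"))
```

Pivoting doubles latency and can compound errors, which is one reason an end-to-end model such as SeamlessM4T v2 is worth keeping as a runtime option.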

Architecture

The translator uses a FastAPI backend with WebSocket streaming for real-time audio capture and playback. The modular pipeline chains Whisper (MLX for Apple Silicon, WhisperX with large-v3 for CUDA) for transcription, MarianNMT for translation, and Piper ONNX for speech synthesis.
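On the server side, streamed microphone frames must be grouped into windows long enough to transcribe. The following is a minimal sketch. It assumes the browser sends raw 16 kHz mono 16-bit PCM (Whisper models expect 16 kHz input) and that windows have a fixed length; the project may segment audio differently, for example with voice activity detection. In a FastAPI handler, each `frame` would come from `await websocket.receive_bytes()`. `AudioChunker` is a hypothetical name.

```python
from typing import List


class AudioChunker:
    """Accumulate raw PCM bytes from WebSocket frames into fixed-length windows."""

    def __init__(self, sample_rate: int = 16_000, sample_width: int = 2,
                 window_s: float = 1.0) -> None:
        # Bytes per window = samples per window * bytes per sample (mono).
        self.window_bytes = int(sample_rate * window_s) * sample_width
        self._buf = bytearray()

    def feed(self, frame: bytes) -> List[bytes]:
        """Add one frame; return every complete window now available."""
        self._buf.extend(frame)
        windows: List[bytes] = []
        while len(self._buf) >= self.window_bytes:
            windows.append(bytes(self._buf[: self.window_bytes]))
            del self._buf[: self.window_bytes]
        return windows

    def flush(self) -> bytes:
        """Return and clear any leftover partial window (e.g. when the socket closes)."""
        rest = bytes(self._buf)
        self._buf.clear()
        return rest
```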

The Hindi Tutor pipeline is orchestrated by Claude CLI, which composes lessons from multiple source materials (Google Docs homework, Pimsleur transcripts, vocabulary lists). Kokoro 82M renders dual-voice audio (instructor + Hindi speaker). A custom spaced repetition tracker schedules vocabulary reviews at configurable intervals.
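Before TTS can render dual-voice audio, the lesson script has to be split into per-speaker segments. The sketch below assumes a hypothetical line-tagged script format (`INSTRUCTOR:` / `HINDI:`) and hypothetical voice names. The project's actual script format and Kokoro voice choices are not specified here.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical script format: every non-blank line starts with a speaker tag.
TAGGED_LINE = re.compile(r"^(INSTRUCTOR|HINDI):\s*(.+)$")


def parse_script(script: str, voices: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn a tagged lesson script into (voice, text) segments for TTS."""
    segments: List[Tuple[str, str]] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TAGGED_LINE.match(line)
        if match is None:
            raise ValueError(f"untagged script line: {line!r}")
        role, text = match.groups()
        segments.append((voices[role], text))
    return segments


script = """
INSTRUCTOR: Listen and repeat.
HINDI: नमस्ते
"""
segments = parse_script(script, {"INSTRUCTOR": "instructor-voice", "HINDI": "hindi-voice"})
```

Each segment can then be synthesized with its own voice and the clips concatenated in script order.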
