Lucy — Local AI Voice Assistant

Why

Nothing leaves the machine unless a tool explicitly reaches the internet. That constraint — no cloud, no API keys — was the design brief, and an excuse to learn the local-AI stack hands-on, from voice activity detection to vector memory.

What it does

Conversation — type or talk from any browser on the LAN; a local LLM (Ollama) replies, spoken aloud by a local Piper voice.
Memory — long-term recall across sessions via Mem0 + ChromaDB with local embeddings.
Tools — web search, weather, news, maps, YouTube Music — all server-side, all keyless.
The orb — a Canvas-rendered ferrofluid orb driven by a Web Audio analyser, pulsing with the synthesized speech.

How the system works

Browser: mic + chat, WebSocket to server
STT: faster-whisper, int8 on CPU
LLM: Ollama, resident model, tool calls
Memory: Mem0 + ChromaDB embeddings
TTS: Piper ONNX → MP3 stream → orb
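
As a rough illustration of how the stages above hand work to one another, here is a minimal sketch in which every stage is a stub. The `Turn` container, the stage function names, and the choice to consult memory before the LLM call are all assumptions made for the example; none of the real libraries (faster-whisper, Ollama, Mem0, Piper) is called.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Turn:
    """State carried through one conversational turn (hypothetical container)."""
    audio_in: bytes
    transcript: str = ""
    memories: list[str] = field(default_factory=list)
    reply: str = ""
    audio_out: bytes = b""


async def stt(turn: Turn) -> Turn:
    # Placeholder for faster-whisper: pretend the audio bytes are already text.
    turn.transcript = turn.audio_in.decode("utf-8")
    return turn


async def recall(turn: Turn) -> Turn:
    # Placeholder for a Mem0 + ChromaDB lookup (assumed to run before the LLM).
    turn.memories = ["user prefers short answers"]
    return turn


async def llm(turn: Turn) -> Turn:
    # Placeholder for an Ollama chat call that would see transcript + memories.
    turn.reply = f"You said: {turn.transcript}"
    return turn


async def tts(turn: Turn) -> Turn:
    # Placeholder for Piper: real output would be encoded audio, not UTF-8 text.
    turn.audio_out = turn.reply.encode("utf-8")
    return turn


# Stage order is an assumption made for this sketch.
STAGES = [stt, recall, llm, tts]


async def run_turn(audio_in: bytes) -> Turn:
    """Pass one turn through every stage in order."""
    turn = Turn(audio_in=audio_in)
    for stage in STAGES:
        turn = await stage(turn)
    return turn
```

Calling `asyncio.run(run_turn(b"hello"))` returns a `Turn` whose fields have been filled in by each stub. The sketch runs the stages strictly one after another; a streaming design would instead let later stages start on partial output from earlier ones.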

The pipeline is fully streaming and async (FastAPI + asyncio): tokens, audio chunks and orb-amplitude data flow over one WebSocket. The hard part is latency on consumer hardware — keeping the model resident, quantizing STT, and streaming TTS so the orb starts speaking before the sentence is even finished generating.
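
One common way to let speech start before generation ends is to cut the token stream into sentences and hand each one to TTS as soon as it closes. The sketch below shows that idea under stated assumptions: the `sentences` helper, the naive punctuation rule, and the `fake_tokens` stand-in for a streaming LLM are all hypothetical, and Lucy's actual chunking may differ.

```python
import asyncio
import re
from typing import AsyncIterator

# Naive boundary rule: '.', '!' or '?' followed by whitespace.
# Abbreviations and decimals would need a smarter rule.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


async def sentences(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Yield each sentence as soon as the token stream completes it."""
    buffer = ""
    async for token in tokens:
        buffer += token
        parts = _BOUNDARY.split(buffer)
        # All parts except the last are finished sentences; keep the tail.
        for done in parts[:-1]:
            if done:
                yield done
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()


async def fake_tokens() -> AsyncIterator[str]:
    # Stand-in for a streaming LLM response (hypothetical token boundaries).
    for tok in ["Hel", "lo the", "re. How", " are", " you? ", "Fine"]:
        yield tok


async def collect() -> list[str]:
    """Gather the sentences; a real server would send each one to TTS instead."""
    return [s async for s in sentences(fake_tokens())]
```

With the fake stream above, `asyncio.run(collect())` produces `["Hello there.", "How are you?", "Fine"]`: the first sentence is available for synthesis while later tokens are still arriving.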

Stack

Python 3.11, FastAPI, WebSockets / asyncio, Ollama, faster-whisper, Piper TTS, Mem0 + ChromaDB, Web Audio API, Canvas 2D