Side project

Lucy

A voice and chat assistant that runs entirely on my own LAN — no cloud, no API keys. Speak from any browser in the house; a local model answers, a local voice replies, and a ferrofluid orb pulses in time with the speech.

Role: Solo (full stack, AI pipeline, frontend)
Platform: Python · FastAPI · self-hosted
Status: Working, v0.2
Links: Source private; demo on request

Why

Nothing leaves the machine unless a tool explicitly reaches the internet. That constraint — no cloud, no API keys — was the design brief, and an excuse to learn the local-AI stack hands-on, from voice activity detection to vector memory.

What it does

  • Conversation — type or talk from any browser on the LAN; a local LLM (Ollama) replies, spoken aloud by a local Piper voice.
  • Memory — long-term recall across sessions via Mem0 + ChromaDB with local embeddings.
  • Tools — web search, weather, news, maps, YouTube Music — all server-side, all keyless.
  • The orb — a Canvas-rendered ferrofluid orb driven by a Web Audio analyser, pulsing with the synthesized speech.

How the system works

  • Browser: mic + chat, WebSocket to server
  • STT: faster-whisper, int8 on CPU
  • LLM: Ollama, resident model, tool calls
  • Memory: Mem0 + ChromaDB embeddings
  • TTS: Piper ONNX → MP3 stream → orb
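
The hand-off from the LLM stage to the TTS stage can be sketched with plain async generators. This is an illustration only, not the project's code: `fake_llm_tokens` and `fake_tts` are hypothetical stand-ins for the Ollama and Piper calls, and splitting on sentence-ending punctuation is an assumed chunking rule. The point is that each sentence is handed to synthesis as soon as it is complete, instead of waiting for the whole reply.

```python
import asyncio
import re
from typing import AsyncIterator

# Hypothetical stand-in for a streaming LLM; a real stage would call Ollama.
async def fake_llm_tokens(prompt: str) -> AsyncIterator[str]:
    """Yield a canned reply token by token, as a streaming model would."""
    for token in ["Sure", ".", " The", " kettle", " is", " on", "!",
                  " Anything", " else", "?"]:
        await asyncio.sleep(0)  # yield control so other tasks can run
        yield token

# Assumed chunking rule: a sentence ends at '.', '!' or '?'.
SENTENCE_END = re.compile(r"[.!?]\s*$")

async def sentences(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Group streamed tokens into sentences so synthesis can start early."""
    buffer = ""
    async for token in tokens:
        buffer += token
        if SENTENCE_END.search(buffer):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        yield buffer.strip()  # flush a trailing fragment without punctuation

# Hypothetical stand-in for TTS; a real stage would return synthesized audio.
async def fake_tts(sentence: str) -> bytes:
    """Placeholder synthesis: returns the text as bytes instead of audio."""
    await asyncio.sleep(0)
    return sentence.encode("utf-8")

async def respond(prompt: str) -> list[bytes]:
    """Run the LLM -> sentence splitter -> TTS chain and collect the chunks."""
    chunks = []
    async for sentence in sentences(fake_llm_tokens(prompt)):
        chunks.append(await fake_tts(sentence))
    return chunks

if __name__ == "__main__":
    print(asyncio.run(respond("hello")))
```

In a real server each chunk would be sent to the browser as soon as it is produced rather than collected into a list; the list here only keeps the sketch easy to check.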

The pipeline is fully streaming and async (FastAPI + asyncio): tokens, audio chunks and orb-amplitude data flow over one WebSocket. The hard part is latency on consumer hardware — keeping the model resident, quantizing STT, and streaming TTS so the orb starts speaking before the sentence is even finished generating.
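
One way to carry tokens, audio chunks and amplitude values over a single WebSocket is a small typed envelope per message. The sketch below makes several assumptions that the description above does not confirm: the JSON schema (`type` / `data`), the base64 encoding of audio, and the idea of computing an RMS level from 16-bit PCM on the server before it is encoded for streaming. All function names are hypothetical.

```python
import base64
import json
import math
import struct

def rms_amplitude(pcm16: bytes) -> float:
    """Root-mean-square level of little-endian 16-bit PCM, scaled to 0..1."""
    count = len(pcm16) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm16[: count * 2])
    mean_square = sum(s * s for s in samples) / count
    return math.sqrt(mean_square) / 32768.0

def encode(kind: str, payload) -> str:
    """Wrap a payload in a typed JSON envelope (hypothetical schema)."""
    if isinstance(payload, bytes):
        # JSON cannot carry raw bytes, so audio is base64-encoded here.
        payload = base64.b64encode(payload).decode("ascii")
    return json.dumps({"type": kind, "data": payload})

def decode(message: str):
    """Unwrap an envelope, restoring bytes for 'audio' messages."""
    envelope = json.loads(message)
    kind, data = envelope["type"], envelope["data"]
    if kind == "audio":
        data = base64.b64decode(data)
    return kind, data

if __name__ == "__main__":
    pcm = struct.pack("<4h", 16384, -16384, 16384, -16384)
    for frame in (encode("token", "Hi"),
                  encode("audio", pcm),
                  encode("amplitude", rms_amplitude(pcm))):
        print(decode(frame)[0], len(frame))
```

Base64 inflates audio by roughly a third; sending audio as binary WebSocket frames and keeping JSON for tokens and amplitude would be a reasonable alternative under the same single-connection design.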

Stack

Python 3.11 · FastAPI · WebSockets / asyncio · Ollama · faster-whisper · Piper TTS · Mem0 + ChromaDB · Web Audio API · Canvas 2D