Project Context: Self-Hosted AI Smart Speaker (The "Brain" Project)
Role for AI: You are an expert Linux System Administrator and AI Engineer. We are building a high-performance, local-first smart speaker system designed to replace cloud assistants with 100% private, GPU-accelerated intelligence.
🏗️ Hardware Environment
- Hypervisor: Proxmox VE.
- Physical Server: High-performance build with 32GB System RAM.
- GPU: NVIDIA GeForce RTX 3080 Ti (12GB VRAM).
- I/O Configuration: Intel VT-d enabled; `intel_iommu=on` configured in GRUB.
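
The GRUB setting above can be confirmed on the running Proxmox host by inspecting the kernel command line. The following is a minimal Python sketch, not the project's actual tooling: the helper names `iommu_enabled` and `read_cmdline` are hypothetical, and it assumes the standard Linux `/proc/cmdline` file is readable.

```python
from pathlib import Path


def iommu_enabled(cmdline: str) -> bool:
    """Return True if the kernel command line contains the exact token intel_iommu=on."""
    # Kernel parameters are whitespace-separated, so compare whole tokens
    # rather than substrings (avoids matching e.g. "intel_iommu=on_something").
    return "intel_iommu=on" in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    return Path(path).read_text()


# Usage on the Proxmox host (not inside the guest VM):
#   print(iommu_enabled(read_cmdline()))
```

If this returns `False` after editing GRUB, the usual suspects are a missing `update-grub` run or a missing reboot; the check only reflects the kernel that is currently running.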
🐧 Virtual Machine Architecture (The "Brain" VM)
- Guest OS: Ubuntu 24.04 LTS (Noble Numbat).
- BIOS/Firmware: SeaBIOS (Chosen specifically to bypass UEFI/Secure Boot/MOK signature complexities for NVIDIA drivers).
- CPU Configuration: 1 Socket, 4 Cores, Type: `host` (for maximum instruction set compatibility).
- Memory: 16GB RAM, Ballooning and KSM disabled (to ensure deterministic performance for AI workloads).
- Storage/Disk: VirtIO SCSI Single controller; LVM-based disk management with expansion capability.
- Networking: VirtIO (paravirtualized) for low-latency communication.
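
The VM settings listed above can be cross-checked against the VM's Proxmox config file (normally `/etc/pve/qemu-server/<vmid>.conf`). The sketch below is illustrative only: the `EXPECTED` values are an assumed translation of the bullets into config keys (for instance, 16GB is assumed to mean `memory: 16384`), and the helper names are hypothetical. Keys left at their defaults may be absent from the file, so a missing key is reported as `None` rather than treated as an error. KSM is not covered by this check.

```python
# Assumed mapping of the VM description onto Proxmox config keys.
EXPECTED = {
    "bios": "seabios",
    "sockets": "1",
    "cores": "4",
    "cpu": "host",
    "memory": "16384",   # assumption: 16GB expressed in MiB
    "balloon": "0",      # 0 disables memory ballooning
    "scsihw": "virtio-scsi-single",
}


def parse_vm_config(text: str) -> dict[str, str]:
    """Parse the 'key: value' lines of a Proxmox VM config into a dict."""
    config: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):
            break  # snapshot sections follow; only the live config is wanted
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, since values such as MAC addresses
        # contain colons themselves.
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def find_mismatches(config: dict[str, str], expected: dict[str, str] = EXPECTED):
    """Return {key: (expected, actual)} for every key that differs or is missing."""
    return {
        key: (want, config.get(key))
        for key, want in expected.items()
        if config.get(key) != want
    }
```

An empty result from `find_mismatches` means every listed key matches; a `None` in the second tuple position means the key is not set explicitly and deserves a manual look in the Proxmox UI.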
🛠️ Software Stack & Completed Milestones
- GPU Passthrough: Successfully isolated the RTX 3080 Ti from Proxmox using `vfio-pci` and assigned it to the Ubuntu VM via PCI Passthrough.
- Driver Layer: Installed NVIDIA Driver version `580.126.09` (and CUDA 13.0) directly on the Ubuntu Guest.
- Containerization: Docker Engine installed.
- The "Bridge" (Crucial): Successfully configured the NVIDIA Container Toolkit.
  - Note: We had to use a workaround for Ubuntu 24.04 by pointing the `apt` repository to the `ubuntu22.04` stable path because the `noble` path was missing/broken on NVIDIA's servers.
- Orchestration: Deployed a `docker-compose` stack containing:
  - Ollama: Running as the LLM engine (GPU-accelerated).
  - Open WebUI: Running as the frontend interface for text-based testing.
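
To confirm that the driver and container layers above actually expose the GPU, the output of `nvidia-smi` can be queried and parsed from Python. This is a rough sketch with hypothetical helper names; the `prefix` argument is an assumption that lets the same query run through `docker exec` against a container (the container name `ollama` in the usage note is a placeholder, not taken from the stack definition).

```python
import subprocess

# Ask nvidia-smi for machine-readable CSV without a header row.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,driver_version",
    "--format=csv,noheader",
]


def parse_gpu_rows(output: str) -> list[dict[str, str]]:
    """Turn 'name, memory, driver' CSV lines into one dict per GPU."""
    rows = []
    for line in output.strip().splitlines():
        name, memory, driver = (field.strip() for field in line.split(","))
        rows.append(
            {"name": name, "memory_total": memory, "driver_version": driver}
        )
    return rows


def query_gpus(prefix: tuple[str, ...] = ()) -> list[dict[str, str]]:
    """Run the query, optionally prefixed (e.g. with a 'docker exec' command)."""
    result = subprocess.run(
        [*prefix, *QUERY], capture_output=True, text=True, check=True
    )
    return parse_gpu_rows(result.stdout)
```

Calling `query_gpus()` on the Ubuntu guest checks the driver layer, while `query_gpus(("docker", "exec", "ollama"))` would check the same thing from inside a running container, which exercises the NVIDIA Container Toolkit as well.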
🎯 Current Objective & Next Steps
We have successfully verified that nvidia-smi works inside a Docker container. The "Text-to-Text" pipeline is functional and running on the RTX 3080 Ti.
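
The text-to-text path can also be exercised without Open WebUI by calling Ollama's HTTP API directly. The following is a minimal sketch using only the standard library. It assumes Ollama is reachable on its default port 11434 on `localhost`, and the model name `llama3` is a placeholder, since no specific model is named here.

```python
import json
import urllib.request

# Assumption: the Ollama container publishes its default port on localhost.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(
    prompt: str, model: str = "llama3", url: str = OLLAMA_URL
) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(prompt: str, model: str = "llama3", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# Usage (requires the stack to be running and the model to be pulled):
#   print(ask_ollama("Say hello in one short sentence."))
```

Setting `"stream": False` keeps the example simple by returning one JSON object; a voice assistant would likely switch to streaming later to reduce perceived latency.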
The next phases are:
- Phase 7 (Audio Input): Integrating a microphone array/stream into the Linux environment.
- Phase 8 (ASR - The Ears): Deploying `faster-whisper` in a Docker container to transcribe audio to text.
- Phase 9 (The Logic): Writing the Python "Glue" code to pipe audio from the mic → Whisper → Ollama → Home Assistant API for automation execution.
Current Task: Verify the text-based interaction in Open WebUI and begin planning the integration of the Whisper ASR engine.
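
As a starting point for planning Phases 8 and 9, the mic → Whisper → Ollama → Home Assistant chain could be sketched as below. This is an outline under several assumptions, not a finished design: the function names are hypothetical, the `small` model size and `float16` compute type are placeholder choices, the audio is assumed to already exist as a file, and how the LLM reply gets mapped to a specific Home Assistant service call is left open. Each stage is passed in as a callable so the flow can be tested without a GPU or network.

```python
import json
import urllib.request
from typing import Callable


def transcribe_file(path: str, model_size: str = "small") -> str:
    """Transcribe an audio file with faster-whisper on the GPU."""
    # Imported lazily so the rest of the module works without the package.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cuda", compute_type="float16")
    segments, _info = model.transcribe(path)
    return " ".join(segment.text.strip() for segment in segments)


def build_ha_service_request(
    base_url: str, token: str, domain: str, service: str, entity_id: str
) -> urllib.request.Request:
    """Build a Home Assistant REST call: POST /api/services/<domain>/<service>."""
    url = f"{base_url.rstrip('/')}/api/services/{domain}/{service}"
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",  # long-lived access token
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str],
    ask_llm: Callable[[str], str],
    act: Callable[[str], None],
) -> str:
    """Chain the stages: audio file -> text -> LLM reply -> action."""
    text = transcribe(audio_path)
    reply = ask_llm(text)
    act(reply)
    return reply
```

In a real run, `transcribe` would be `transcribe_file`, `ask_llm` would wrap the Ollama API, and `act` would parse the reply and send a request built by `build_ha_service_request` with `urllib.request.urlopen`. Loading `WhisperModel` once at startup rather than per call would be the obvious next refinement, since model loading dominates latency.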