Project Context: Self-Hosted AI Smart Speaker (The "Brain" Project)

Role for AI: You are an expert Linux System Administrator and AI Engineer. We are building a high-performance, local-first smart speaker system designed to replace cloud assistants with 100% private, GPU-accelerated intelligence.

🏗️ Hardware Environment

Hypervisor: Proxmox VE.
Physical Server: High-performance build with 32GB System RAM.
GPU: NVIDIA GeForce RTX 3080 Ti (12GB VRAM).
I/O Configuration: Intel VT-d enabled; intel_iommu=on configured in GRUB.
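
The IOMMU setting above can be sanity-checked from the Proxmox host. The following is a minimal sketch, assuming the flag appears verbatim on the kernel command line exposed at /proc/cmdline; the function names are illustrative and not part of the project.

```python
from pathlib import Path


def iommu_enabled(cmdline: str) -> bool:
    """Return True if the kernel command line contains intel_iommu=on."""
    # Kernel parameters are whitespace-separated tokens.
    return "intel_iommu=on" in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    return Path(path).read_text()


if __name__ == "__main__":
    print("intel_iommu=on present:", iommu_enabled(read_cmdline()))
```

This only confirms that the GRUB change reached the running kernel; it does not verify that VT-d is enabled in firmware or that the IOMMU groups are usable.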

🐧 Virtual Machine Architecture (The "Brain" VM)

Guest OS: Ubuntu 24.04 LTS (Noble Numbat).
BIOS/Firmware: SeaBIOS (Chosen specifically to bypass UEFI/Secure Boot/MOK signature complexities for NVIDIA drivers).
CPU Configuration: 1 Socket, 4 Cores, Type: host (exposes the host CPU's full instruction set to the guest).
Memory: 16GB RAM, Ballooning and KSM disabled (to ensure deterministic performance for AI workloads).
Storage/Disk: VirtIO SCSI Single controller; LVM-based disk management with expansion capability.
Networking: VirtIO (paravirtualized) for low-latency communication.

🛠️ Software Stack & Completed Milestones

GPU Passthrough: Successfully isolated the RTX 3080 Ti from Proxmox using vfio-pci and assigned it to the Ubuntu VM via PCI Passthrough.
Driver Layer: Installed NVIDIA Driver version 580.126.09 (and CUDA 13.0) directly on the Ubuntu Guest.
Containerization: Docker Engine installed.
The "Bridge" (Crucial): Successfully configured the NVIDIA Container Toolkit.
- Note: We had to use a workaround for Ubuntu 24.04 by pointing the apt repository to the ubuntu22.04 stable path because the noble path was missing/broken on NVIDIA's servers.
Orchestration: Deployed a docker-compose stack containing:
- Ollama: Running as the LLM engine (GPU-accelerated).
- Open WebUI: Running as the frontend interface for text-based testing.
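
For testing the LLM engine without Open WebUI, a request can be sent straight to Ollama's HTTP API. This is a minimal sketch, assuming the compose stack publishes Ollama on its default port 11434 on localhost and that a model has already been pulled; the model name `llama3` is a placeholder, not something the project specifies.

```python
import json
import urllib.request

# Assumption: Ollama's default port is published on the Docker host.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming generate request for Ollama."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(prompt: str, model: str = "llama3", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print(ask_ollama("Reply with one short sentence."))
```

With `"stream": False`, Ollama returns a single JSON object whose `response` field holds the full completion, which keeps the client code simple for scripted checks.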

🎯 Current Objective & Next Steps

We have successfully verified that nvidia-smi works inside a Docker container. The "Text-to-Text" pipeline is functional and running on the RTX 3080 Ti.
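
The GPU check can also be scripted so it is repeatable after driver or toolkit updates. The sketch below assumes `nvidia-smi` is on the PATH wherever the script runs (the guest or a GPU-enabled container) and uses its CSV query output; the function names are illustrative.

```python
import subprocess

# Query name and memory figures as plain CSV without header or units (MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line: str) -> dict:
    """Parse one CSV line of nvidia-smi query output."""
    name, total, used = [field.strip() for field in line.split(",")]
    return {
        "name": name,
        "memory_total_mib": int(total),
        "memory_used_mib": int(used),
    }


def query_gpus() -> list:
    """Run nvidia-smi and return one dict per visible GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

Watching `memory_used_mib` rise while a prompt is running is a quick way to confirm that Ollama is actually using the card rather than falling back to the CPU.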

The next phases are:

Phase 7 (Audio Input): Integrating a microphone array/stream into the Linux environment.
Phase 8 (ASR - The Ears): Deploying faster-whisper in a Docker container to transcribe audio to text.
Phase 9 (The Logic): Writing the Python "Glue" code to pipe audio from the mic → Whisper → Ollama → Home Assistant API for automation execution.
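
As a planning aid for Phases 8 and 9, the glue logic could be structured roughly as below. This is a sketch under several assumptions that the project has not yet decided: audio arrives as a file path, the LLM is prompted to reply with a JSON object containing `domain`, `service`, and `entity_id`, and Home Assistant is reached through its REST service endpoint with a long-lived access token. All function names, the model size, and the action format are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Optional


def transcribe_with_faster_whisper(audio_path: str, model_size: str = "small") -> str:
    """Transcribe an audio file on the GPU (requires the faster-whisper package)."""
    from faster_whisper import WhisperModel  # imported lazily; third-party

    model = WhisperModel(model_size, device="cuda", compute_type="float16")
    segments, _info = model.transcribe(audio_path)
    return " ".join(segment.text.strip() for segment in segments)


def parse_action(llm_reply: str) -> dict:
    """Extract the assumed JSON action from the LLM reply."""
    data = json.loads(llm_reply)
    return {key: data[key] for key in ("domain", "service", "entity_id")}


def build_ha_request(base_url: str, token: str, action: dict) -> urllib.request.Request:
    """Build a Home Assistant service call request."""
    url = f"{base_url}/api/services/{action['domain']}/{action['service']}"
    body = json.dumps({"entity_id": action["entity_id"]}).encode("utf-8")
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str],
    ask_llm: Callable[[str], str],
    execute: Callable[[dict], None],
) -> Optional[dict]:
    """Run mic audio -> text -> LLM -> action; return the action or None."""
    text = transcribe(audio_path).strip()
    if not text:
        return None  # nothing was said, so skip the LLM call
    action = parse_action(ask_llm(text))
    execute(action)
    return action
```

Passing the three stages in as callables keeps each one replaceable and lets the control flow be tested with stubs before the microphone, Whisper container, or Home Assistant token exist.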

Current Task: Verify the text-based interaction in Open WebUI and begin planning the integration of the Whisper ASR engine.
