CUDA

Self-Hosted AI Inference Cluster

A multi-user local LLM inference platform running on two NVIDIA RTX 3090s (48 GB of combined VRAM), serving Qwen 3.6 35B via vLLM.
