Self-Hosted AI Inference Cluster
Multi-user local LLM inference platform running on dual NVIDIA RTX 3090s with 48GB combined VRAM, serving Qwen 3.6 35B via vLLM.
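A rough budget helps explain why this setup needs both cards. The sketch below estimates how much memory the weights of a 35B-parameter model take at a few precisions and compares that with the 48 GB the two RTX 3090s provide. It makes several assumptions that the project description does not state. The precisions (16-, 8- and 4-bit) are hypothetical. The parameter count is read literally from "35B". The weights are assumed to be sharded evenly across both GPUs with tensor parallelism, which vLLM exposes as `tensor_parallel_size`. KV cache, activations and runtime overhead are ignored. The helper names are illustrative only.

```python
# Back-of-envelope VRAM check for the weights of a ~35B-parameter model on
# two 24 GB GPUs. Precisions are hypothetical; KV cache, activations, and
# CUDA/runtime overhead are deliberately ignored.

GPU_VRAM_GB = 24      # RTX 3090: 24 GB per card
NUM_GPUS = 2          # dual-GPU setup -> 48 GB combined
NUM_PARAMS = 35e9     # taken literally from "35B"


def weight_footprint_gb(num_params: float, bits_per_param: float) -> float:
    """Memory needed for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def per_gpu_share_gb(total_gb: float, tensor_parallel_size: int) -> float:
    """Assumes tensor parallelism splits the weights evenly across GPUs."""
    return total_gb / tensor_parallel_size


def fits(total_gb: float, num_gpus: int = NUM_GPUS, vram_gb: float = GPU_VRAM_GB) -> bool:
    """True if the weights alone fit in the combined VRAM, ignoring KV cache and overhead."""
    return total_gb < num_gpus * vram_gb


if __name__ == "__main__":
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        total = weight_footprint_gb(NUM_PARAMS, bits)
        share = per_gpu_share_gb(total, NUM_GPUS)
        status = "fits" if fits(total) else "does not fit"
        print(f"{label}: {total:.1f} GB total, {share:.1f} GB per GPU -> {status}")
```

At 16-bit precision the weights alone come to about 70 GB, which exceeds the 48 GB combined. A deployment like this one therefore presumably uses a quantized checkpoint, although the description does not say which. The VRAM left over after loading the weights is what remains for the KV cache that serves concurrent users.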