Running Qwen3.6 27B Locally on Dual RTX 3090s with vLLM v0.19
How I went from a blank Docker template to 116+ tok/s with speculative decoding, FlashInfer, and a 160k context window on dual 3090s.

Staff engineer and solutions architect with 10+ years leading teams and delivering production systems at scale. I bring the technical depth to design the right solution and the people skills to align teams, unblock engineers, and drive it across the finish line. From payment platforms handling millions of daily transactions to AI and LLM tooling — I own the problem end-to-end, from whiteboard to production.