Blog

  • Hello Local AI World

    Like every good engineer, my first instinct is to overthink and overgeneralize everything. So my first thought with AI – seeing the direction that everything is going with pay-to-play, restrictions, potential privacy issues – was looking into local AI.

    Dipping in the Toes

    I started looking around the time OpenClaw started gaining popularity. At that time, everyone was talking about using a Mac Mini to safely sandbox it, so I went ahead and got a low-end Mac Mini (I didn’t have a personal Max device so this also could double for that at worst case).

    On that, I figured out that the easy way to get into local LLM tinkering was Ollama. With only 16GB RAM total since I invested low, my options were very limited, but after trying a few models I managed to get started with Qwen 3.5 2B, which could run reasonably fast, got tool calls right (unlike Llama 3) and was enough to get a taste for what it could do.

    I saw that OpenClaw was a security nightmare at the time, and doubly so if you’re running with a smaller (read: less intelligent) model. So I decided as my learning exercise, I would just start building up my own harness in python calling into Ollama. This was a great learning exercise to understand things like tool calls, context windows, harnessing workflows, etc.

    At this point though there was no good local coding model available, so I used Claude Code to do most of the coding. This was somewhat irritating because (at the $20/mo tier) it had significant limits per-time-of-day so I could code for a bit then had to stop and wait until the next time slot.

    Leveling Up

    In the meanwhile, since I wasn’t doing anything risky, I started migrating some of this to my main PC, what has an RTX 3080 w/ 10GB VRAM. This was enough to run a slightly larger Qwen 3.5 model (9B), as well as load up ComfyUI and start playing around with image and video generation.

    With this I started to be able to get more value out of both the local LLM (which between the maturing harness and better model could start to perform tasks fairly consistently) and the image/video generation. Between generating images and then using image-to-video, I was able to have some fun making a Pokemon trainer video clip for my daughter to test it out.

    Going All In

    The release of Qwen 3.6 was a gamechanger for local LLMs on two fronts:

    • For general local LLM usage, the 35B MOE model gets near-frontier performance attainable on consumer-level hardware.
    • For coding, the 27B model gets near-frontier performance attainable on consumer-level hardware.

    To get into the game, I decided to drop for a refurbished RTX 3090, the universally-acknowledged entry level into mainstream local LLM hosting with 24GB VRAM. With that slotted in, I was able to:

    • Host it with unsloth, which combines ease of hosting with a bit more flexibility than Ollama.
    • Harness it for coding with OpenCode for coding, which I then started stacking with skills, starting with Superpowers and adding Awesome Claude Skills on top of that (which Superpowers contains context for bootstrapping)

    With that, I was able to move all my coding off of Claude Code to local. For my “hello world” I had it create a Node.js based Pong game, which you can find at pong.djohnson.ai – this was created with a single prompt, and just slightly tweaked with one more to have it slow down the AI paddle since it was pretty much unbeatable. I even was able to get an MCP server hooked up to have it do the deployment to my site directly from OpenCode.

    The Future

    This is the worst that local models will ever be. Folks over at r/LocalLlama are already salivating over Qwen 3.7 which benchmarks competitively with closed frontier models. We’re continuing to see better models on low hardware requirements. While the cloud space is going to be volatile between regulations and monetization pressure and conflicts over datacenters, having a local setup that will be immune to those issues is looking less like premature optimization and more like a solid investment.

  • A new horizon

    Things are moving fast in big tech, and the world is changing faster than anyone can keep up. What used to take many hours of work and expertise can now be done faster than ever thanks to the productivity multiplier of AI. As someone with a day job and a family, it can often be hard to find the time for side projects, but thanks to breakthroughs in generating code, assets, and even project plans, I’ve found new opportunities to learn and grow. And I’d like to share that journey with you. Follow along as I share what I’m doing and learning, and let’s explore this new world together.