It’s been a busy week in the world of large language models. OpenAI’s GPT-5 is now live, but that’s not all from the LLM leader.
For the first time since GPT-2 over six years ago, the company has also released fully licensed open-weight models. In one move, OpenAI has given everyday builders a smarter flagship model and the freedom to run smaller versions on their own hardware. No cloud required.
Why GPT-5 matters
GPT-5 isn’t just a spec-bump over GPT-4. It has the potential to rewrite how we build, test and ship software. Faster reasoning, cleaner code on the first pass, and a new family of open-weight models mean development moves from API calls in the cloud to fully owned stacks on your own metal.
- One model family, three sizes. GPT-5 replaces the alphabet soup of 3-series and 4-series options with a clear line-up: GPT-5, 5 Mini and 5 Nano. Most users will default to GPT-5. Heavy traffic is directed to Mini. Nano is reserved for ultra-low-latency edge cases. The simplification alone will save dev teams hours of head-scratching.
- Built-in ‘thinking’. GPT-5 decides when to engage its reasoning chain instead of forcing you to pick a separate “think” model. In practice, that means fewer toggles and more consistent answers in ChatGPT and the API.
- Sharper code on the first try. OpenAI claims GPT-5 delivers more “zero-shot” solutions—code that runs correctly without iterative prompting. As someone who has spent late nights wrestling with earlier models, I’m keen to test that promise.
- Choose your voice. You can now select personalities such as Robot, Nerd, or Listener for voice or text sessions, replacing the single default persona, which was a little too friendly and flirty for some tastes.
And yes, there were still a ton of em dashes in the demo.
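For teams wiring the new line-up into an app, that tiering can be expressed as a simple routing rule. Here is a minimal sketch; the model IDs `gpt-5`, `gpt-5-mini`, and `gpt-5-nano` are assumptions for illustration, not confirmed API identifiers:

```python
# Route requests to a GPT-5 tier based on latency and traffic needs.
# The model IDs below are placeholder assumptions, not confirmed API names.

def pick_model(latency_budget_ms: int, high_volume: bool = False) -> str:
    """Nano for ultra-low latency, Mini for heavy traffic, flagship otherwise."""
    if latency_budget_ms < 200:   # ultra-low-latency edge cases
        return "gpt-5-nano"
    if high_volume:               # cost-sensitive, heavy traffic
        return "gpt-5-mini"
    return "gpt-5"                # default: full quality

print(pick_model(100))                       # edge case -> nano
print(pick_model(1000, high_volume=True))    # heavy traffic -> mini
print(pick_model(1000))                      # default -> flagship
```

The point of the simplified family is exactly this: the routing decision fits in ten lines instead of a matrix of 3-series and 4-series trade-offs.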
The open-weight drop
The open-weight models (20B and 120B parameters) are available under an Apache 2.0 license. OpenAI pitches them as a way for governments, start-ups and hobbyists to “run and customise AI on their own infrastructure,” broadening access beyond the cloud paywall.
I’ve already pulled the weights and run them on a 24 GB GPU workstation. The experience is eye-opening but comes with caveats:
- You’ll need RAM or cash. Expect at least 24 GB of VRAM to run the 20B model comfortably, or something like a Mac with 128 GB of unified memory to load the larger checkpoint.
- Power spikes are real. Each prompt can push a desktop GPU to ~400 W for a few seconds, which works out to roughly the cost of a cup of coffee a day in electricity if you hammer it nonstop.
- Guardrails can be re-trained out. Like other open models, these weights will be fine-tuned in the wild, both for beneficial and malicious purposes. Openness is a feature, not a guarantee.
Part of this release is playing catch-up, and part is saving face. OpenAI has ‘open’ in its name, but for years it hasn’t been open.
OpenAI was the last major player to hold out. Meta, Mistral, and several Chinese labs have been shipping open weights for months. Releasing GPT-OSS now keeps OpenAI in the credibility race, while giving builders like us new deployment options for privacy-sensitive projects.
What should teams do next?
- Prototype with GPT-5, budget with 5 Mini. Start with the flagship to validate quality, then benchmark Mini for production cost per token.
- Spin up an on-prem sandbox. If data residency or latency is critical, test the 20 B open-weight model on a local GPU server.
- Stay compliant. Open weights don’t absolve you of content policy or safety obligations. Implement your own filters before shipping any customer-facing content.
What’s next?
GPT-5 gives us a smarter, more opinionated chat partner. The open weights give us sovereignty over where and how that intelligence runs. Together, they move the industry closer to software defined by conversation rather than code.
We’re ready to start folding GPT-5 and its open cousins into prototypes at BitBakery. If you’re weighing when—or whether—to make the leap, drop us a line. Let’s build something extraordinary.