OpenCode: Open-Source Claude Code Alternative
Published on 2026.07.01 by DeepInfra

Open your cloud bill after a month of heavy agent use and the number stops being abstract. Teams report coding-assistant costs in the hundreds of dollars per developer, and some now set token budgets the way they once rationed cloud compute. Then in June 2026 the US government barred non-Americans from Anthropic’s Fable 5, and Anthropic switched off access for everyone. For the first time, access to a frontier coding model depended on a government’s say-so.

That is the backdrop against which OpenCode, the open-source Claude Code alternative from the SST team, stopped looking like a side project. OpenCode decouples the agent from the model. Claude Code runs Claude and only Claude. OpenCode runs whatever you point it at, including open-weight models you can host yourself or rent by the token. The result is an agent that answers to your budget and your hardware, not one vendor’s roadmap or one government’s export policy. A model-agnostic agent is worth switching to under one condition: the models underneath it must be open, cheap, and capable. We will cover what OpenCode does, how it compares to Claude Code, and which open-weight models on DeepInfra make the swap pay off.

What OpenCode Is, and Why It Counts as an Open-Source Claude Code Alternative

OpenCode is an open-source coding agent released under the MIT license, and by mid-2026 it had become the most-starred coding agent on GitHub, passing 178,000 stars on a curve that turned sharply upward early in the year. The SST team built it as a client/server program, not a single CLI binary, so one backend drives a terminal TUI, a desktop app, and IDE extensions for VS Code and Cursor.

The design choice that matters most is the one in the name. OpenCode separates the agent harness from the model. Claude Code is tuned end to end around Anthropic’s models and talks to them and only them. OpenCode connects to 75+ providers through Models.dev, including Anthropic, OpenAI, Google, Moonshot, Z.ai, local runtimes through Ollama, and any OpenAI-compatible endpoint you hand it. That last category is the one that matters here: it is how a hosted open-weight model becomes a first-class backend.

Two more features make OpenCode read like a serious open-source Claude Code alternative rather than a thin wrapper. It spawns Language Server Protocol servers and feeds compiler diagnostics back to the model after each edit, so a fresh TypeScript type error lands in the next turn and the model corrects itself. And it keeps Git-based snapshots, so an aggressive edit-and-run loop stays recoverable: when a change goes wrong, you roll it back with /undo rather than babysitting every shell call.

OpenCode vs Claude Code: The Differences That Matter

The two agents share a goal and disagree on almost every default. The clearest split is model freedom. Claude Code is optimized around a single vendor, which buys tight integration and predictable quality. OpenCode treats the model as a swappable backend, which buys leverage: route a planning step to a stronger model and the bulk edits to a cheaper one.

Execution philosophy differs too. Claude Code defaults to read-only and asks before destructive actions, so you approve each command, edit, and shell call. OpenCode flips the default toward action and leans on Git snapshots and /undo for safety. Neither is strictly safer. One front-loads human review, the other front-loads speed and trusts version control to catch the misses.

The third axis is context efficiency, which quietly drives cost. MCP servers are convenient, but a handful of active ones can eat roughly a quarter of a 200,000-token window before any real work starts. The two agents attack this differently. Claude Code automates the cleanup: its tool-search approach has been measured cutting a 77,000-token tool payload to about 8,700 tokens, a reduction of nearly 89 percent. OpenCode hands you the controls instead. It warns you to be choosy about MCP servers and lets you disable any of them in config, compacts long sessions automatically as they near the limit, and exposes its tools and prompts openly so you can see exactly what is eating the window.
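The window arithmetic behind those figures can be sketched in a few lines. This is only an illustration: it assumes the 77,000-token payload is measured against the same 200,000-token window mentioned above, which the source does not state outright, and the function names are made up for the example.

```python
# Context budget arithmetic using the figures quoted above.
# Assumption: the tool payload sits inside a 200,000-token window.
WINDOW = 200_000          # context window size in tokens
PAYLOAD_BEFORE = 77_000   # tool definitions loaded up front
PAYLOAD_AFTER = 8_700     # tool definitions after tool search


def remaining(window: int, overhead: int) -> int:
    """Tokens left for real work once tool definitions are loaded."""
    return window - overhead


def share(window: int, overhead: int) -> float:
    """Fraction of the window consumed by tool definitions."""
    return overhead / window


for label, payload in (("before", PAYLOAD_BEFORE), ("after", PAYLOAD_AFTER)):
    print(f"{label}: {share(WINDOW, payload):.1%} of window used, "
          f"{remaining(WINDOW, payload):,} tokens left")
print(f"tokens freed: {PAYLOAD_BEFORE - PAYLOAD_AFTER:,}")
```

The same two functions work for any agent: plug in the token count of your own active MCP tool definitions to see how much of the window is gone before the first prompt.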

| Dimension | Claude Code | OpenCode |
| --- | --- | --- |
| License | Proprietary | MIT, open source |
| Models | Anthropic only | 75+ providers, any OpenAI-compatible endpoint |
| Default execution | Ask before acting | Act, roll back with /undo |
| Safety net | Permission prompts | Git snapshots |
| LSP diagnostics fed back | No | Yes |
| Interface | CLI | TUI, desktop, IDE extensions |

On raw speed and instruction-following with its native model, Claude Code still leads. In Builder.io’s head-to-head test running the same Claude Sonnet 4.5 through both, Claude Code finished four tasks in 9 minutes 9 seconds to OpenCode’s 16 minutes 20 seconds, though OpenCode wrote more tests.

Why an Open-Source Claude Code Alternative Only Pays Off With Open Weights

Model freedom is a feature only when the models are worth running. This is where the macro story matters. In June 2026 a Beijing lab, Zhipu (Z.ai), shipped GLM-5.2, which the research firm Artificial Analysis ranked as the most intelligent open-source model on the market, fourth overall behind three closed American systems. The Economist reported it runs at less than a tenth of the price of Anthropic’s Fable 5. Anthropic keeps a real quality edge, roughly 17 percent across an average of benchmarks, but that is no longer the chasm it was a year ago.

Open weights change the risk profile, not only the price. A model whose weights are published can run on hardware you control, beyond any vendor’s billing portal or export rule. When Anthropic switched off Fable 5 during the June ban, anyone standardized on it had no fallback inside the same tool. Zhipu’s cofounder framed the contrast as “radical openness,” and argued that external blockades leave closed systems “subject to revocation at any moment.”

For a coding agent you might run thousands of times a day, that combination, competitive capability at a fraction of the price with no revocation risk, is the whole case for pairing an open agent with open weights. Our breakdown of open vs closed source AI models covers the wider tradeoffs, but the short version is that the capability gap has closed faster than the price gap. That is exactly the regime where open weights win on coding workloads.

Best Open-Weight Models to Run in OpenCode on DeepInfra

OpenCode will call any of these through DeepInfra’s OpenAI-compatible API. The picks below map to the jobs a coding agent actually does. Coding is capability-sensitive, so current generations usually earn their keep, with one cheap model held back for high-volume grunt work.

For the lead reasoning slot, GLM-5.2 is the strongest open-weight option in 2026. It carries the capability crown from the GLM line that our GLM-5.1 overview tracked through SWE-Bench Pro and NL2Repo, and its 1M-token context window fits a large repository map without aggressive pruning. Use it for planning, multi-file refactors, and any task that needs sustained judgement across a long session.

For the write-and-edit loop, Qwen3-Coder-480B-A35B-Instruct-Turbo is the value pick. It is a 480B mixture-of-experts model with 35B active parameters, so you pay for a fraction of the network per token, and at $0.30 in and $1.00 out per million tokens it is cheap enough to fire on every edit. Our rundown of the best models for OpenClaw agentic workloads lands on the same family for code generation and pull-request review.

For long-horizon autonomy, Kimi-K2.6 is built for the job. The Kimi K2.6 overview describes a 1T-parameter MoE with 32B active and a 262K context, tuned for multi-step loops that run for many turns without losing the thread. Reach for it when the agent chains dozens of tool calls.

For the cost floor, DeepSeek-V3.1-Terminus is the grunt-work model. At $0.27 in and $0.95 out per million tokens it is the cheapest of the four, and for triage, summarizing diffs, or routing it clears the bar without reasoning-model spend.

| Model | In / Out per 1M tokens | Context | Best role |
| --- | --- | --- | --- |
| GLM-5.2 | $1.00 / $4.00 | 1M | Lead reasoning, refactors |
| Qwen3-Coder-480B-A35B-Instruct-Turbo | $0.30 / $1.00 | 262K | Code edits, PR review |
| Kimi-K2.6 | $0.75 / $3.50 | 262K | Long-horizon agent loops |
| DeepSeek-V3.1-Terminus | $0.27 / $0.95 | 164K | Triage, routing, grunt work |
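One way to read the table is as a lookup: given a prompt size, which of the four models can hold it, cheapest first? The sketch below is illustrative only. The context sizes are the rounded figures from the table rather than exact limits, and `reserve_for_output` is a hypothetical headroom parameter, not an OpenCode or DeepInfra setting.

```python
# Rounded context sizes and per-1M-token prices copied from the table above.
MODELS = {
    "GLM-5.2": {"context": 1_000_000, "in": 1.00, "out": 4.00},
    "Qwen3-Coder-480B-A35B-Instruct-Turbo": {"context": 262_000, "in": 0.30, "out": 1.00},
    "Kimi-K2.6": {"context": 262_000, "in": 0.75, "out": 3.50},
    "DeepSeek-V3.1-Terminus": {"context": 164_000, "in": 0.27, "out": 0.95},
}


def models_that_fit(prompt_tokens: int, reserve_for_output: int = 16_000) -> list:
    """Names of models whose window holds the prompt plus an output reserve,
    sorted by input price (cheapest first)."""
    needed = prompt_tokens + reserve_for_output
    fits = [name for name, spec in MODELS.items() if needed <= spec["context"]]
    return sorted(fits, key=lambda name: MODELS[name]["in"])


# A 200,000-token repository map rules out the 164K model.
print(models_that_fit(200_000))
```

Sorting by input price is a deliberate simplification, since agent sessions are usually input-heavy; a stricter version would weigh output price and task difficulty as well.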

Wiring DeepInfra Into OpenCode

The connection point is the custom provider block in opencode.json. OpenCode reads OpenAI-compatible endpoints through the @ai-sdk/openai-compatible package, so DeepInfra slots in as one provider. Drop the following into opencode.json in your project root, or the global file at ~/.config/opencode/, set the provider id to deepinfra, and point baseURL at DeepInfra’s OpenAI-compatible route.

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "deepinfra": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "DeepInfra",
      "options": {
        "baseURL": "https://api.deepinfra.com/v1/openai",
        "apiKey": "{env:DEEPINFRA_API_TOKEN}"
      },
      "models": {
        "zai-org/GLM-5.2": { "name": "GLM-5.2" },
        "Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo": { "name": "Qwen3 Coder 480B Turbo" },
        "moonshotai/Kimi-K2.6": { "name": "Kimi K2.6" },
        "deepseek-ai/DeepSeek-V3.1-Terminus": { "name": "DeepSeek V3.1 Terminus" }
      }
    }
  }
}

The model ids keep their vendor prefix and slash, exactly as DeepInfra lists them, and the {env:DEEPINFRA_API_TOKEN} syntax pulls the key from your shell environment rather than hardcoding it. Each entry under models becomes a selectable backend, so all four picks show up in one switcher and you can swap the reasoning model mid-session. Before launching the TUI, confirm the key works with a quick OpenAI SDK call, since the SDK and OpenCode talk to the same base URL.

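import os
from openai import OpenAI

# Point the OpenAI SDK at DeepInfra's OpenAI-compatible route.
client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key=os.environ["DEEPINFRA_API_TOKEN"],  # raises KeyError if the token is not exported
)

# One short completion is enough to show that the key and the model id both resolve.
resp = client.chat.completions.create(
    model="zai-org/GLM-5.2",
    messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
)

print(resp.choices[0].message.content)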
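The provider block itself can also be sanity-checked offline before the first launch. The sketch below is not OpenCode's own validation; it only encodes the conventions this article describes (an https baseURL, an {env:...} key reference, model ids with a vendor prefix). `check_provider` and `SAMPLE` are hypothetical names for the example.

```python
import os
import re

# Matches the {env:VAR_NAME} reference syntax used in the provider block.
ENV_REF = re.compile(r"^\{env:([A-Za-z_][A-Za-z0-9_]*)\}$")


def check_provider(config: dict, provider_id: str = "deepinfra") -> list:
    """Return human-readable problems; an empty list means the block looks sane."""
    provider = config.get("provider", {}).get(provider_id)
    if provider is None:
        return [f"no provider block named {provider_id!r}"]

    problems = []
    options = provider.get("options", {})
    if not str(options.get("baseURL", "")).startswith("https://"):
        problems.append("baseURL is missing or not https")

    match = ENV_REF.match(str(options.get("apiKey", "")))
    if match is None:
        problems.append("apiKey is not an {env:VAR} reference")
    elif not os.environ.get(match.group(1)):
        problems.append(f"environment variable {match.group(1)} is not set")

    models = provider.get("models", {})
    if not models:
        problems.append("no models listed")
    for model_id in models:
        if "/" not in model_id:
            problems.append(f"model id {model_id!r} has no vendor prefix")
    return problems


# A trimmed copy of the provider block shown earlier, as a Python dict.
SAMPLE = {
    "provider": {
        "deepinfra": {
            "options": {
                "baseURL": "https://api.deepinfra.com/v1/openai",
                "apiKey": "{env:DEEPINFRA_API_TOKEN}",
            },
            "models": {"zai-org/GLM-5.2": {"name": "GLM-5.2"}},
        }
    }
}

for problem in check_provider(SAMPLE) or ["provider block looks OK"]:
    print(problem)
```

To check a real file, load it with `json.load` and pass the result to `check_provider`; this assumes the file is plain JSON without comments.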

Install OpenCode with the official script, export the token, then start the agent and switch models with /models.

curl -fsSL https://opencode.ai/install | bash

The same provider-block pattern shows up in our OpenClaw setup and provider workflow guide, so a provider you wire once is portable across agents.

Cost Math: A Coding Session Priced Out

Per-token prices only tell half the story. The Economist flagged a caveat that matters for agents specifically: open models often spend more tokens to reach the same answer. A Georgia Tech study found that on the same tasks a DeepSeek model burned 23 times more tokens than its OpenAI rival to reach essentially the same result, which is why the honest comparison is the total cost of the tokens used, not the sticker price per million. On a benchmark built to test software engineering, GLM-5.2 ended up costing more than competing systems from Anthropic and OpenAI once those token counts were in.
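The caveat reduces to a break-even ratio: a cheaper-per-token model stays cheaper only while its token overhead is below the price ratio. The sketch below is a rough illustration that looks at output tokens only. The $4.00 and $50 per-million prices are the GLM-5.2 and closed-frontier figures quoted elsewhere in this article, and applying the 23x multiplier from the DeepSeek study to that pair is purely hypothetical.

```python
def total_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of a token count at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million


def break_even_multiplier(open_price: float, closed_price: float) -> float:
    """Token multiple at which the cheaper-per-token model stops being cheaper."""
    return closed_price / open_price


OPEN_OUT = 4.00      # USD per 1M output tokens (GLM-5.2 price quoted in this article)
CLOSED_OUT = 50.00   # USD per 1M output tokens (closed frontier figure quoted in this article)
BASE_TOKENS = 12_000 # output tokens the closed model needs for a task (illustrative)

print(f"break-even at {break_even_multiplier(OPEN_OUT, CLOSED_OUT):.1f}x the tokens")
for multiple in (1, 5, 23):  # 23x is the multiplier the study reported for a different model pair
    open_cost = total_cost(BASE_TOKENS * multiple, OPEN_OUT)
    closed_cost = total_cost(BASE_TOKENS, CLOSED_OUT)
    print(f"{multiple:>2}x tokens: open ${open_cost:.3f} vs closed ${closed_cost:.3f}")
```

Measuring the real multiplier on your own tasks matters more than either sticker price, because the break-even point moves with every price change.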

So treat the following as a rough estimate, not a benchmark we ran. Picture a moderately heavy day: 40 sessions, each reading and editing code with roughly 80,000 input tokens and 12,000 output tokens after context caching. Multiply the per-million prices out and the four open models land far below a closed frontier API billed near $50 per million output tokens.

| Model | Cost / session (est.) | Cost / 40-session day (est.) |
| --- | --- | --- |
| GLM-5.2 | ~$0.13 | ~$5.12 |
| Qwen3-Coder-480B-A35B-Instruct-Turbo | ~$0.04 | ~$1.44 |
| Kimi-K2.6 | ~$0.10 | ~$4.08 |
| DeepSeek-V3.1-Terminus | ~$0.03 | ~$1.32 |
| Closed frontier (~$50/1M out) | ~$0.60 output alone | ~$24+ |
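The estimates in the table follow directly from the session shape described above, and the arithmetic can be reproduced with a short script. It uses only the prices and token counts given in this article; the function name is illustrative.

```python
INPUT_TOKENS = 80_000      # input tokens per session, after context caching
OUTPUT_TOKENS = 12_000     # output tokens per session
SESSIONS_PER_DAY = 40

# (input price, output price) in USD per 1M tokens, from the model table.
PRICES = {
    "GLM-5.2": (1.00, 4.00),
    "Qwen3-Coder-480B-A35B-Instruct-Turbo": (0.30, 1.00),
    "Kimi-K2.6": (0.75, 3.50),
    "DeepSeek-V3.1-Terminus": (0.27, 0.95),
}


def session_cost(price_in: float, price_out: float,
                 input_tokens: int = INPUT_TOKENS,
                 output_tokens: int = OUTPUT_TOKENS) -> float:
    """Dollar cost of one session at the given per-1M-token prices."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


for model, (price_in, price_out) in PRICES.items():
    per_session = session_cost(price_in, price_out)
    print(f"{model}: ${per_session:.3f} per session, "
          f"${per_session * SESSIONS_PER_DAY:.2f} per day")
```

Changing `INPUT_TOKENS`, `OUTPUT_TOKENS`, or `SESSIONS_PER_DAY` to match your own usage gives a quick first-pass estimate, before accounting for the token-efficiency caveat.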

The lever OpenCode hands you is per-step routing. Run planning on GLM-5.2, push the high-volume edits to Qwen3 Coder, and drop triage onto DeepSeek-V3.1-Terminus, and the blended day lands between roughly $1.30 and $5, a fraction of the $24-plus a closed frontier API would bill for output alone. If OpenCode is not the only agent on your shortlist, our OpenClaw alternatives roundup weighs other open-weight agent frameworks that swap models freely through DeepInfra.
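The per-step routing described above can be priced with a weighted sum. In the sketch below the traffic shares (15 percent planning, 60 percent edits, 25 percent triage) are hypothetical and chosen only for illustration; the prices and the 40-session day come from this article.

```python
# A 40-session day at 80,000 input and 12,000 output tokens per session.
DAY_INPUT = 40 * 80_000
DAY_OUTPUT = 40 * 12_000

# Prices are USD per 1M tokens from the model table; shares are hypothetical.
ROUTES = {
    "planning": {"model": "GLM-5.2", "in": 1.00, "out": 4.00, "share": 0.15},
    "edits": {"model": "Qwen3-Coder-480B-A35B-Instruct-Turbo", "in": 0.30, "out": 1.00, "share": 0.60},
    "triage": {"model": "DeepSeek-V3.1-Terminus", "in": 0.27, "out": 0.95, "share": 0.25},
}


def blended_day_cost(routes: dict, day_input: int = DAY_INPUT,
                     day_output: int = DAY_OUTPUT) -> float:
    """Daily cost when each route handles its share of the day's tokens."""
    if abs(sum(route["share"] for route in routes.values()) - 1.0) > 1e-9:
        raise ValueError("route shares must sum to 1")
    total = 0.0
    for route in routes.values():
        full_day = (day_input * route["in"] + day_output * route["out"]) / 1_000_000
        total += route["share"] * full_day
    return total


print(f"blended day: ${blended_day_cost(ROUTES):.2f}")
```

Shifting share from planning to edits or triage lowers the total, which is the lever in practice: the expensive model is reserved for the steps that need it.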

Which to Choose, and How to Start

The decision comes down to what you are optimizing. Go with Claude Code if you want the fastest managed experience and the best instruction-following on its native model, and you are comfortable that one vendor controls the model, the price, and the access. Go with OpenCode plus DeepInfra if you want model freedom, per-token cost control, and an agent backed by open weights that cannot be switched off from outside. For most teams running agents at scale, that second option is where the budget and the leverage live. A useful middle path keeps both: reach for Claude Code on the occasional hard reasoning task where its speed and quality are worth the premium, and run OpenCode on DeepInfra for the high-volume editing, triage, and long-horizon loops that otherwise dominate the bill.

Pick a model on the DeepInfra catalog, wire the provider block above, and run your first session against GLM-5.2 or Qwen3-Coder-480B-A35B-Instruct-Turbo. The full API reference lives in the DeepInfra docs. When you have it running, tell us what you shipped: join the community on Discord, send a note to feedback@deepinfra.com, or follow @DeepInfra for new model launches as they land.
