stepfun-ai/Step-3.5-Flash

Pricing (per 1M tokens): $0.10 in, $0.30 out, $0.02 cached
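
As a rough illustration of how the per-token prices above translate into a per-request cost, here is a minimal sketch. The function name `estimate_cost` is hypothetical, and the billing rule it uses (cached tokens are a subset of input tokens and are charged at the cached rate instead of the input rate) is an assumption, not something the listing states.

```python
# Prices in USD per 1M tokens, copied from the listing above.
PRICE_PER_M = {"input": 0.10, "output": 0.30, "cached": 0.02}


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    Assumption: cached_tokens is the part of input_tokens served from cache,
    billed at the cached rate rather than the regular input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * PRICE_PER_M["input"]
        + cached_tokens * PRICE_PER_M["cached"]
        + output_tokens * PRICE_PER_M["output"]
    )
    return total / 1_000_000


# 200K prompt tokens (half of them cached) plus 50K generated tokens.
print(f"${estimate_cost(200_000, 50_000, cached_tokens=100_000):.4f}")
```

Actual invoices may differ if the provider counts cached or reasoning tokens differently, so treat the result as an estimate.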

Step 3.5 Flash is an open-source reasoning model by StepFun with 196B total parameters (11B active) using Mixture of Experts. It features a 256K context window, deep reasoning, tool calling, and agentic capabilities, achieving 97.3 on AIME 2025 and 74.4% on SWE-bench Verified.

Model Information

Step 3.5 Flash

Step 3.5 Flash is an open-source frontier reasoning model by StepFun. Built on a sparse Mixture of Experts (MoE) architecture, it activates only about 11B of its 196B total parameters per token, delivering state-of-the-art performance at a fraction of the compute cost of a comparably sized dense model.

Capabilities

  • Reasoning: Extended thinking with <think> blocks, controllable via the reasoning_effort parameter (none, low, medium, high)
  • Tool Calling: Native function calling support with parallel tool invocation
  • Long Context: 256K token context window with efficient Sliding Window Attention
  • JSON Mode: Structured output via response_format
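
To make the reasoning_effort and response_format options above more concrete, the following sketch builds the keyword arguments for a JSON-mode request and parses the reply. The helper names `build_json_request` and `parse_json_reply` are hypothetical, and the `{"type": "json_object"}` value is an assumption based on the OpenAI-style convention; it may need adjusting for this endpoint.

```python
import json

# Effort levels listed in the capabilities above.
ALLOWED_EFFORTS = ("none", "low", "medium", "high")


def build_json_request(prompt: str, reasoning_effort: str = "low") -> dict:
    """Build kwargs for client.chat.completions.create() in JSON mode."""
    if reasoning_effort not in ALLOWED_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {ALLOWED_EFFORTS}")
    return {
        "model": "stepfun-ai/Step-3.5-Flash",
        "messages": [
            {"role": "system", "content": "Reply with a single JSON object."},
            {"role": "user", "content": prompt},
        ],
        # Assumption: OpenAI-style JSON mode is accepted by this endpoint.
        "response_format": {"type": "json_object"},
        "extra_body": {"reasoning_effort": reasoning_effort},
    }


def parse_json_reply(content: str) -> dict:
    """Decode the assistant's message content as JSON."""
    return json.loads(content)


request = build_json_request("List three prime numbers.", reasoning_effort="none")
print(request["response_format"], request["extra_body"])
```

With a configured client, the request would be sent as `client.chat.completions.create(**request)` and the answer read with `parse_json_reply(response.choices[0].message.content)`.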

Benchmarks

Category   Benchmark             Score
Math       AIME 2025             97.3
Math       HMMT 2025 (Feb.)      98.4
Coding     LiveCodeBench-V6      86.4
Coding     SWE-bench Verified    74.4%
Agentic    Terminal-Bench 2.0    51.0%
Agentic    GAIA (no file)        84.5
Agentic    BrowseComp            51.6

Architecture

Total Parameters     196B
Active Parameters    ~11B per token
Context Window       256K tokens
Experts              288 routed + 1 shared per layer (Top-8 selection)
License              Apache 2.0
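
The sparsity implied by these figures can be worked out with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the 11B figure is approximate, and the count of nine experts per layer assumes the shared expert is always active alongside the Top-8 routed experts, which the table does not state explicitly.

```python
# Figures taken from the architecture table above.
TOTAL_PARAMS_B = 196
ACTIVE_PARAMS_B = 11      # approximate, per token
ROUTED_EXPERTS = 288
SHARED_EXPERTS = 1
TOP_K = 8

# Share of all parameters that participate in processing one token.
active_param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Share of routed experts selected per layer for one token.
routed_fraction = TOP_K / ROUTED_EXPERTS

# Assumption: the shared expert runs for every token in addition to the Top-8.
experts_used_per_layer = TOP_K + SHARED_EXPERTS

print(f"Active parameters: {active_param_fraction:.1%}")   # about 5.6%
print(f"Routed experts used: {routed_fraction:.1%}")       # about 2.8%
print(f"Experts evaluated per layer: {experts_used_per_layer}")
```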

Usage

from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key="YOUR_DEEPINFRA_TOKEN",
)

# Basic chat with reasoning
response = client.chat.completions.create(
    model="stepfun-ai/Step-3.5-Flash",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)
print(response.choices[0].message.reasoning_content)  # thinking
print(response.choices[0].message.content)             # answer

# Disable reasoning for faster responses
response = client.chat.completions.create(
    model="stepfun-ai/Step-3.5-Flash",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"reasoning_effort": "none"},
)

# Tool calling
response = client.chat.completions.create(
    model="stepfun-ai/Step-3.5-Flash",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)
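
The tool-calling request above only asks the model which function to call; the caller still has to run the function and return its result. The sketch below shows one way to do that, assuming the response follows the OpenAI SDK shape (`message.tool_calls`, each with `id`, `function.name`, and `function.arguments` as a JSON string). The `get_weather` body and the `run_tool_calls` helper are hypothetical stand-ins, and a fake tool call is used so the example runs without network access.

```python
import json
from types import SimpleNamespace


def get_weather(city: str) -> dict:
    """Hypothetical stand-in for a real weather lookup."""
    return {"city": city, "forecast": "unknown"}


# Maps tool names declared in the request to local Python callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(tool_calls) -> list:
    """Run each requested tool and build the 'tool' messages to send back."""
    messages = []
    for call in tool_calls or []:
        func = TOOLS[call.function.name]
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(func(**args)),
        })
    return messages


# Fake tool call mimicking the shape of response.choices[0].message.tool_calls[0].
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"city": "Paris"}'),
)
tool_messages = run_tool_calls([fake_call])
print(tool_messages)
```

In a real exchange, the assistant message containing the tool calls and the resulting tool messages would be appended to `messages`, and `client.chat.completions.create` would be called again so the model can produce its final answer.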

Links

- https://huggingface.co/stepfun-ai/Step-3.5-Flash
- https://github.com/stepfun-ai/Step-3.5-Flash
- https://arxiv.org/abs/2602.10604