stepfun-ai/Step-3.5-Flash

Pricing (per 1M tokens): $0.09 in, $0.30 out, $0.02 cached
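The per-token prices listed above can be turned into a rough per-request cost estimate. The sketch below is illustrative only: the function name `estimate_cost` is hypothetical, and it assumes that cached tokens are a subset of the input tokens and are billed at the cached rate instead of the regular input rate, which the listing does not state explicitly.

```python
# Prices in USD per 1M tokens, copied from the listing above.
PRICE_PER_M = {"input": 0.09, "output": 0.30, "cached": 0.02}


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    Assumption (not confirmed by the listing): cached_tokens is the part of
    input_tokens served from cache, billed at the cached rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * PRICE_PER_M["input"]
        + cached_tokens * PRICE_PER_M["cached"]
        + output_tokens * PRICE_PER_M["output"]
    )
    return total / 1_000_000


# Example: a 20K-token prompt, half of it cached, with a 2K-token reply.
print(f"${estimate_cost(20_000, 2_000, cached_tokens=10_000):.6f}")
```

Actual billing may round or count reasoning tokens differently, so treat the result as an estimate.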

Step 3.5 Flash is an open-source reasoning model by StepFun with 196B total parameters (11B active) using Mixture of Experts. It features a 256K context window, deep reasoning, tool calling, and agentic capabilities, achieving 97.3 on AIME 2025 and 74.4% on SWE-bench Verified.

Quantization: fp8
Context length: 262,144 tokens
Function calling: supported

Model Information

Step 3.5 Flash

Step 3.5 Flash is an open-source frontier reasoning model by StepFun. Built on a sparse Mixture of Experts (MoE) architecture, it activates only 11B of its 196B total parameters per token — delivering state-of-the-art performance at a fraction of the cost of dense models.

Capabilities

- Reasoning: Extended thinking with <think> blocks, controllable via the reasoning_effort parameter (none, low, medium, high)
- Tool Calling: Native function calling support with parallel tool invocation
- Long Context: 256K token context window with efficient Sliding Window Attention
- JSON Mode: Structured output via response_format
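When working with raw model output that still contains the thinking inline, it can help to separate the reasoning from the final answer. The following is a minimal sketch, assuming the raw text wraps reasoning in literal `<think>...</think>` tags; the helper name `split_thinking` is hypothetical. Through the API shown under Usage, the reasoning is already exposed separately, so this is only relevant for unprocessed output.

```python
import re

# Non-greedy match so that several <think> blocks are handled independently.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from raw text containing <think> blocks."""
    reasoning = "\n".join(block.strip() for block in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


reasoning, answer = split_thinking("<think>2 is even, so n = 2k.</think>Yes, it is even.")
print(reasoning)
print(answer)
```

Text without any `<think>` block comes back unchanged as the answer, with an empty reasoning string.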

Benchmarks

Category	Benchmark	Score
Math	AIME 2025	97.3
Math	HMMT 2025 (Feb.)	98.4
Coding	LiveCodeBench-V6	86.4
Coding	SWE-bench Verified	74.4%
Agentic	Terminal-Bench 2.0	51.0%
Agentic	GAIA (no file)	84.5
Agentic	BrowseComp	51.6

Architecture


Property	Value
Total Parameters	196B
Active Parameters	~11B per token
Context Window	256K tokens
Experts	288 routed + 1 shared per layer (Top-8 selection)
License	Apache 2.0
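The sparsity figures in the table can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the numbers quoted above; the variable names are illustrative, and it assumes "256K" means 256 × 1024 tokens, which matches the 262,144 figure shown in the listing.

```python
# Figures taken from the architecture table above.
TOTAL_PARAMS_B = 196    # total parameters, in billions
ACTIVE_PARAMS_B = 11    # approximate active parameters per token, in billions
ROUTED_EXPERTS = 288    # routed experts per layer
TOP_K = 8               # routed experts selected per token

# Share of all parameters that participates in a single token's forward pass.
active_param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Share of routed experts chosen per layer (the shared expert is always on).
routed_expert_fraction = TOP_K / ROUTED_EXPERTS

# Assumption: "256K" is 256 * 1024 tokens.
context_tokens = 256 * 1024

print(f"active parameters: {active_param_fraction:.1%}")
print(f"routed experts used per layer: {routed_expert_fraction:.1%}")
print(f"context window: {context_tokens:,} tokens")
```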

Usage

from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key="YOUR_DEEPINFRA_TOKEN",
)

# Basic chat with reasoning
response = client.chat.completions.create(
    model="stepfun-ai/Step-3.5-Flash",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)
print(response.choices[0].message.reasoning_content)  # thinking
print(response.choices[0].message.content)             # answer

# Disable reasoning for faster responses
response = client.chat.completions.create(
    model="stepfun-ai/Step-3.5-Flash",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"reasoning_effort": "none"},
)
print(response.choices[0].message.content)

# Tool calling
response = client.chat.completions.create(
    model="stepfun-ai/Step-3.5-Flash",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)
print(response.choices[0].message.tool_calls)  # tool calls requested by the model, if any
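The tool-calling request above only returns the model's request to call `get_weather`; the application still has to run the function and send the result back. Below is a minimal sketch of that local step. The `get_weather` body, the `TOOLS` registry, and the helper names are hypothetical placeholders, and the message shape follows the OpenAI-compatible chat format (a `tool` role message carrying `tool_call_id`).

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical stand-in; a real implementation would query a weather service."""
    return f"Sunny in {city}"


# Registry mapping tool names (as declared in `tools`) to local callables.
TOOLS = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one tool call; the model supplies arguments as a JSON string."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    arguments = json.loads(arguments_json)
    return TOOLS[name](**arguments)


def tool_result_message(tool_call_id: str, content: str) -> dict:
    """Build the message that reports a tool result back to the model."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}


print(run_tool_call("get_weather", '{"city": "Paris"}'))
```

In a full loop, each entry in `response.choices[0].message.tool_calls` provides `call.id`, `call.function.name`, and `call.function.arguments`. After appending the assistant message and one `tool_result_message(...)` per call to `messages`, a second `client.chat.completions.create(...)` request lets the model produce the final answer.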

Links

- https://huggingface.co/stepfun-ai/Step-3.5-Flash
- https://github.com/stepfun-ai/Step-3.5-Flash
- https://arxiv.org/abs/2602.10604