stepfun-ai/Step-3.7-Flash

$0.20 in / $1.15 out / $0.04 cached (per 1M tokens)
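
To make the per-million-token prices concrete, here is a minimal cost estimate sketch. The function name `estimate_cost` is hypothetical, and the code assumes that cached tokens are input tokens billed at the cached rate instead of the regular input rate; check the provider's billing rules before relying on it.

```python
# Prices in USD per 1M tokens, copied from the listing above.
PRICE_IN = 0.20
PRICE_OUT = 1.15
PRICE_CACHED = 0.04
TOKENS_PER_UNIT = 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    Assumption: cached_tokens is the part of input_tokens served from cache,
    billed at the cached rate instead of the regular input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * PRICE_IN
        + cached_tokens * PRICE_CACHED
        + output_tokens * PRICE_OUT
    )
    return total / TOKENS_PER_UNIT


# 100K input tokens and 20K output tokens, nothing cached.
print(f"${estimate_cost(100_000, 20_000):.4f}")
```

Under these assumptions, a request with 100K input tokens and 20K output tokens comes to about $0.043.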

Step 3.7 Flash is an open-source multimodal reasoning model by StepFun with 198B total parameters (11B active) using Mixture of Experts. It accepts text and image inputs and features a 256K context window, selectable reasoning effort, tool calling, and agentic capabilities for coding and search workflows, scoring 80.9% on GPQA Diamond and 56.3% on SWE-bench Pro.

Model Information

Step 3.7 Flash

Step 3.7 Flash is an open-source multimodal reasoning model by StepFun. Built on a sparse Mixture of Experts (MoE) architecture, it activates only ~11B of its 198B total parameters per token and pairs its language backbone with a vision encoder for native image understanding. Because only a small share of the weights runs for each token, per-token compute is much lower than for a dense model of the same total size.

Capabilities

  • Multimodal: Native image understanding — send text and images together via the standard image_url content format
  • Reasoning: Extended thinking in <think> blocks, with depth selectable via reasoning_effort (low, medium, high). Reasoning is always on for this model
  • Tool Calling: Native function calling support with parallel tool invocation
  • Long Context: 256K token context window
  • Structured Output: JSON via response_format
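
The structured output capability can be sketched as follows. This is a hedged example, not the provider's documented recipe: it assumes JSON mode is requested with the OpenAI-style `{"type": "json_object"}` value, and the helper names `build_json_request` and `parse_json_reply` are hypothetical.

```python
import json

MODEL = "stepfun-ai/Step-3.7-Flash"


def build_json_request(prompt: str) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "Reply with a single JSON object."},
            {"role": "user", "content": prompt},
        ],
        # Assumption: JSON mode uses the OpenAI-style json_object type.
        "response_format": {"type": "json_object"},
    }


def parse_json_reply(content: str) -> dict:
    """Parse the assistant reply; raise ValueError if it is not a JSON object."""
    data = json.loads(content)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


# Usage with an OpenAI-compatible client (not executed here):
# response = client.chat.completions.create(**build_json_request("List three colors."))
# data = parse_json_reply(response.choices[0].message.content)
```

Validating the reply before using it is a deliberate choice: even in JSON mode, a reply cut off by the token limit may not parse.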

Benchmarks

| Category | Benchmark | Score |
|---|---|---|
| Reasoning | GPQA Diamond | 80.9% |
| Coding | SWE-bench Pro | 56.3% |
| Agentic | Terminal-Bench 2.1 | 59.6% |

Architecture

Total Parameters: 198B
Active Parameters: ~11B per token
Context Window: 256K tokens
Modality: Text + Image
Reasoning: Always on; selectable effort (low / medium / high)
License: Apache 2.0
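
As a quick worked calculation from the figures above, the share of parameters active per token and the context window in raw tokens can be computed as below. The constant names are illustrative, and the sketch assumes "256K" means 256 × 1024 tokens.

```python
TOTAL_PARAMS_B = 198   # total parameters, in billions
ACTIVE_PARAMS_B = 11   # approximate active parameters per token, in billions

# Share of the weights that participate in each forward pass.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Assumption: "256K" is 256 * 1024 tokens.
CONTEXT_TOKENS = 256 * 1024

print(f"Active per token: {active_fraction:.1%}")
print(f"Context window: {CONTEXT_TOKENS:,} tokens")
```

Roughly one parameter in eighteen is used for any given token.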

Usage

from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key="YOUR_DEEPINFRA_TOKEN",
)

# Chat with reasoning
response = client.chat.completions.create(
    model="stepfun-ai/Step-3.7-Flash",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)
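message = response.choices[0].message
# reasoning_content is not a declared field in the OpenAI SDK, so read it defensively
print(getattr(message, "reasoning_content", None))  # thinking
print(message.content)                               # answer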

# Image understanding (multimodal)
response = client.chat.completions.create(
    model="stepfun-ai/Step-3.7-Flash",
    messages=[{"role": "user", "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
    ]}],
)

# Control reasoning depth
response = client.chat.completions.create(
    model="stepfun-ai/Step-3.7-Flash",
    messages=[{"role": "user", "content": "Plan a 3-day trip to Tokyo."}],
    extra_body={"reasoning_effort": "high"},
)

# Tool calling
response = client.chat.completions.create(
    model="stepfun-ai/Step-3.7-Flash",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)
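
The tool calling request above only asks the model which function to call; the caller still has to run the function and send the result back. Below is a minimal sketch of that step, assuming the response follows the OpenAI chat completions shape (`tool_calls` entries with `id`, `function.name`, and `function.arguments` as a JSON string). The local `get_weather` implementation and the helpers `run_tool_call` and `tool_messages` are hypothetical.

```python
import json
from types import SimpleNamespace


def get_weather(city: str) -> str:
    """Hypothetical local tool; a real one would query a weather service."""
    return f"Sunny in {city}"


# Registry mapping tool names (as declared in `tools`) to local callables.
TOOLS = {"get_weather": get_weather}


def run_tool_call(name: str, arguments: str) -> str:
    """Run one tool call; `arguments` is the JSON string produced by the model."""
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments or "{}")
        return str(TOOLS[name](**kwargs))
    except (ValueError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the function signature.
        return json.dumps({"error": str(exc)})


def tool_messages(tool_calls) -> list:
    """Build one role="tool" message per call, linked by tool_call_id."""
    return [
        {
            "role": "tool",
            "tool_call_id": call.id,
            "content": run_tool_call(call.function.name, call.function.arguments),
        }
        for call in tool_calls
    ]


# Stand-in for a tool call returned by the API, so the sketch runs offline.
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"city": "Paris"}'),
)
print(tool_messages([fake_call]))
```

In a real exchange, the assistant message containing `tool_calls` is appended to the conversation, followed by the messages from `tool_messages(...)`, and the conversation is sent again so the model can write its final answer. Because the model supports parallel tool invocation, the helper returns one message per call.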

Links

- https://huggingface.co/stepfun-ai/Step-3.7-Flash
- https://github.com/stepfun-ai/Step-3.7-Flash