

LangChain improvements: async and streaming
Published on 2023.10.25 by Iskren Chernev

Starting with LangChain v0.0.322, you can run efficient async generation and stream tokens from DeepInfra models.

Async generation

The DeepInfra wrapper now supports native async calls, so you can expect better performance from your async pipelines, since invocations no longer need a thread each.

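import asyncio

from langchain.llms.deepinfra import DeepInfra

async def async_predict():
    # Requires a DeepInfra API token in the DEEPINFRA_API_TOKEN environment variable
    llm = DeepInfra(model_id="meta-llama/Llama-2-7b-chat-hf")
    # apredict awaits the HTTP call natively instead of running it in a thread
    output = await llm.apredict("What is 2 + 2?")
    print(output)

asyncio.run(async_predict())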
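Native async pays off most when many requests are in flight at once. Here is a minimal sketch of fanning out several prompts on a single event loop. The helper `predict_many` and the `AsyncPredictor` protocol are hypothetical names introduced for illustration, not part of LangChain; the only assumption about the LLM object is that it exposes an awaitable `apredict(text)` returning a string, as the DeepInfra wrapper does above.

```python
import asyncio
from typing import List, Protocol


class AsyncPredictor(Protocol):
    """Hypothetical protocol: anything with an async apredict(text) -> str."""

    async def apredict(self, text: str) -> str: ...


async def predict_many(
    llm: AsyncPredictor, prompts: List[str], max_concurrency: int = 4
) -> List[str]:
    """Run apredict over many prompts concurrently on one event loop.

    A semaphore caps the number of in-flight requests, and asyncio.gather
    returns results in the same order as `prompts`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def predict_one(prompt: str) -> str:
        async with semaphore:
            return await llm.apredict(prompt)

    return list(await asyncio.gather(*(predict_one(p) for p in prompts)))
```

With the wrapper from the previous example this would be called as `asyncio.run(predict_many(DeepInfra(model_id="meta-llama/Llama-2-7b-chat-hf"), ["What is 2 + 2?", "What is 3 + 3?"]))`. The concurrency cap is a design choice to avoid flooding the API with requests; tune or drop it as needed.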

Response streaming

Streaming lets you receive each token of the response as it is generated. This is indispensable in user-facing applications, where users can start reading before the full response has finished.

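def streaming():
    llm = DeepInfra(model_id="meta-llama/Llama-2-7b-chat-hf")
    # stream() yields text chunks as the model produces them
    for chunk in llm.stream("[INST] Hello [/INST] "):
        # flush so each chunk appears immediately
        print(chunk, end='', flush=True)
    print()

streaming()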
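In a real application you usually want to show chunks as they arrive and also keep the complete response, for example to store it in chat history. A minimal sketch, assuming only that the stream is an iterable of string chunks like the one `llm.stream(...)` returns above; `consume_stream` and `print_chunk` are hypothetical helper names.

```python
import sys
from typing import Callable, Iterable, List


def print_chunk(chunk: str) -> None:
    """Write a chunk to stdout immediately, without waiting for a newline."""
    sys.stdout.write(chunk)
    sys.stdout.flush()


def consume_stream(
    chunks: Iterable[str], on_chunk: Callable[[str], None] = print_chunk
) -> str:
    """Forward each streamed chunk to `on_chunk` as it arrives, then return the full text."""
    parts: List[str] = []
    for chunk in chunks:
        on_chunk(chunk)
        parts.append(chunk)
    return "".join(parts)
```

Usage with the wrapper: `full_text = consume_stream(llm.stream("[INST] Hello [/INST] "))`. Passing a different `on_chunk` (for example, a function that pushes to a websocket) keeps the display logic separate from the accumulation.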

You can also use the asynchronous streaming API, which is natively implemented under the hood.

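async def async_streaming():
    llm = DeepInfra(model_id="meta-llama/Llama-2-7b-chat-hf")
    # astream() is an async iterator, so other tasks can run between chunks
    async for chunk in llm.astream("[INST] Hello [/INST] "):
        print(chunk, end='', flush=True)
    print()

asyncio.run(async_streaming())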
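Because the async stream does not block the event loop, several streams can be consumed concurrently in one thread. The sketch below assumes only that `make_stream(prompt)` returns an async iterator of string chunks, which is how `llm.astream` behaves in the example above; `collect` and `stream_many` are hypothetical helper names.

```python
import asyncio
from typing import AsyncIterator, Callable, List


async def collect(chunks: AsyncIterator[str]) -> str:
    """Drain one async stream of chunks into a single string."""
    parts: List[str] = []
    async for chunk in chunks:
        parts.append(chunk)
    return "".join(parts)


async def stream_many(
    make_stream: Callable[[str], AsyncIterator[str]], prompts: List[str]
) -> List[str]:
    """Consume one stream per prompt concurrently; results keep the order of `prompts`."""
    return list(await asyncio.gather(*(collect(make_stream(p)) for p in prompts)))
```

With the wrapper this would look like `asyncio.run(stream_many(llm.astream, ["[INST] Hello [/INST] ", "[INST] Hi there [/INST] "]))`, where each stream's chunks are interleaved on the same event loop as they arrive.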