Custom Models

Deep Infra supports custom container-based models. At the moment Cog is supported, but we're planning to add more options.


Cog is a container format for models and a tool to build such containers. In its basic configuration it contains a single YAML file with metadata, including package dependencies, and a single Python file with two functions: setup and predict. Once run, the container serves an HTTP server that accepts inference (prediction) requests (one at a time), dispatches them to the predict function, and returns the results (with some metadata).
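The setup/predict lifecycle can be sketched in plain Python. The class below is purely illustrative (it is not the real cog API): setup runs once when the container starts, predict runs once per request.

```python
# illustrative sketch of the lifecycle cog expects (not the real cog API):
# setup() runs once at container start, predict() once per request
class EchoPredictor:
    def setup(self):
        # expensive one-time work, e.g. loading model weights
        self.prefix = "echo: "

    def predict(self, prompt: str) -> str:
        # cheap per-request work using the state prepared in setup()
        return self.prefix + prompt

predictor = EchoPredictor()
predictor.setup()                  # once, at startup
print(predictor.predict("hello"))  # echo: hello
```

Keeping all the slow work in setup is what makes serving many predictions from one container efficient.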


# download a release executable for the correct platform
sudo curl -o /usr/local/bin/cog -L \
    "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)"
sudo chmod +x /usr/local/bin/cog

# create a new project dir
mkdir cog-example
cd cog-example
# initialize cog (creates a sample cog.yaml and predict.py)
cog init

Also make sure you have a working Docker installation at hand (e.g. apt install docker.io on Ubuntu).

Implement custom inference

We'll set up a simple GPT-2 model using the transformers library for our example.

# example cog.yaml for gpt2
build:
  # set to true if your model requires a GPU
  gpu: true

  # python version in the form '3.8' or '3.8.12'
  python_version: "3.8"

  # a list of packages in the format <package-name>==<version>
  python_packages:
    - "transformers[torch]==4.29.2"

# defines how predictions are run on your model
predict: "predict.py:Predictor"
# example predict.py for gpt2

import os

# store/load weights in the local folder;
# must be set before transformers is imported
os.environ['TRANSFORMERS_CACHE'] = 'w-gpt2'

import transformers
from cog import BasePredictor, Input


class Predictor(BasePredictor):
    def setup(self):
        """Load the model into memory to make running multiple predictions efficient"""
        self.pipe = transformers.pipeline('text-generation', model='gpt2')

    def predict(
        self,
        prompt: str = Input(description="model prompt"),
    ) -> str:
        return self.pipe(prompt)[0]['generated_text']

Testing our code

Building and deploying the container might take a long time (the image can be huge because of large model weights), so ideally you should test your code beforehand.

# run an inference (through the container)
cog predict -i prompt="hello world"

Making changes to the code (predict.py and its dependencies) is fast because no container rebuild is required. Changing the yaml file will re-download the dependencies, which can take a bit more time.

You can also use cog run $CMD to run a command inside the container, like bash, or a python script or anything really.

Building the container

Once the code is ready for production, it's time to build the container. This step bundles the contents of the project directory inside the container. If you'd like to bundle the weights, this is a good time to ensure they're present (in the current directory) and that no extra weights (e.g. previous iterations, junk) are left in it.
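Since everything in the project directory ends up in the image, a quick pre-build check can save a rebuild. Here's an illustrative helper (our own, not part of cog) that flags unexpectedly large files:

```python
# illustrative pre-build check (not part of cog): list files over a size
# threshold so stale weight files don't get bundled into the image by accident
import os

def large_files(root=".", limit_mb=100):
    hits = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) > limit_mb * 1024 * 1024:
                hits.append(path)
    return sorted(hits)

print(large_files("."))  # anything listed here will bloat the image
```

Anything the check flags that isn't a weight file you actually want to ship is a candidate for deletion before running cog build.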

cog build -t cog-example:latest

After the build you can run the container and try it out via curl to do a proper end-to-end test:

# run it in one terminal
docker run --rm -p 5000:5000 cog-example:latest

# in another terminal query it
curl http://localhost:5000/predictions \
    --data '{"input": {"prompt": "Hello world"}}' \
    -H 'Content-Type: application/json' \
    | python -m json.tool # prettify response
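You can do the same check from Python if you prefer it to curl. The helper below is our own sketch, assuming the container started with docker run above is listening on localhost:5000 and returns its result under the "output" key:

```python
# query a running cog container from Python (equivalent to the curl call);
# assumes the container is listening on localhost:5000
import json
import urllib.request

def predict(prompt, url="http://localhost:5000/predictions"):
    body = json.dumps({"input": {"prompt": prompt}}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["output"]

# with the container running:
# print(predict("Hello world"))
```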

Push the container to deepinfra

Your lowercase GitHub username is used as the namespace for all your published models (similar to private GitHub repos). deepctl push takes care of that; you only need to specify a name (or it will default to the container name).

# push the local cog-example:latest container under USERNAME/my-cog
deepctl push cog-example:latest my-cog

If you get an error about permissions, you can try deepctl auth set-token $(deepctl auth token). This sets up the docker credentials necessary to talk to our registry.

# check that it's available
deepctl model list --visibility private | grep my-cog
# run an inference (auto-deploys if necessary)
deepctl infer -m USERNAME/my-cog -i prompt="Hello world"