Deep Infra supports custom container-based models. At the moment Cog is supported, but we're planning to add more options.
Cog is a container format for models and a tool for building such containers. In its basic configuration it consists of a single YAML file with metadata, including package dependencies, and a single Python file with two functions: setup and predict. Once run, the container serves an HTTP server that accepts inference (prediction) requests, one at a time, dispatches them to the predict function, and returns the results (with some metadata).
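Conceptually, the server wraps your predict function in a small JSON request/response envelope. The sketch below is illustrative only (it is not the actual cog implementation; the exact response metadata may differ), but it shows the request and response shapes you'll see later when querying the container:

```python
import json

# Illustrative sketch of how a cog container dispatches a request:
# the request body carries predict()'s keyword arguments under "input",
# and the response wraps the return value under "output".
def handle_prediction(predict, request_body: str) -> str:
    payload = json.loads(request_body)      # {"input": {...predict kwargs...}}
    output = predict(**payload["input"])    # dispatched to your predict()
    return json.dumps({"status": "succeeded", "output": output})

# toy predict function with a single "prompt" input
def predict(prompt: str) -> str:
    return prompt.upper()

print(handle_prediction(predict, '{"input": {"prompt": "hello"}}'))
# prints a JSON body whose "output" field is "HELLO"
```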
# download a release executable for the correct platform
sudo curl -o /usr/local/bin/cog -L \
https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
sudo chmod +x /usr/local/bin/cog
# create a new project dir
mkdir cog-example
cd cog-example
# initialize cog (creates a sample cog.yaml and predict.py)
cog init
Also make sure you have a working Docker installation at hand: apt install docker.io on Ubuntu.
We'll set up a simple GPT-2 model using the transformers library for our example.
# example cog.yaml for gpt2
build:
  # set to true if your model requires a GPU
  gpu: true
  # python version in the form '3.8' or '3.8.12'
  python_version: "3.8"
  # a list of packages in the format <package-name>==<version>
  python_packages:
    - "transformers[torch]==4.29.2"
# predict.py defines how predictions are run on your model
predict: "predict.py:Predictor"
# example predict.py for gpt2
from cog import BasePredictor, Input
import os
import transformers

# store/load weights in the local folder
os.environ['TRANSFORMERS_CACHE'] = 'w-gpt2'

class Predictor(BasePredictor):
    def setup(self):
        """Load the model into memory to make running multiple predictions efficient"""
        self.pipe = transformers.pipeline('text-generation', model='gpt2')

    def predict(
        self,
        prompt: str = Input(description="model prompt"),
    ) -> str:
        return self.pipe(prompt)[0]['generated_text']
Building/deploying the container can take a long time (the image is big because of the model weights), so ideally you should test it beforehand.
# run an inference (through predict.py)
cog predict -i prompt="hello world"
Making changes to the code (predict.py and its dependencies) is fast because no container rebuilds are required. Changing the yaml file will re-download the dependencies, which can take a bit more time.
You can also use cog run $CMD to run a command inside the container: bash, a python script, or anything really.
Once the code is ready for production it's time to build the container. This step bundles the contents of the project directory inside the container. If you'd like to bundle the weights, this is a good time to ensure they're present in the current directory and that no extra weights (e.g. previous iterations, junk) are left in it.
cog build -t cog-example:latest
After the build you can run the container and try it out via curl to do a proper end-to-end test:
# run it in one terminal
docker run --rm -p 5000:5000 cog-example:latest
# in another terminal query it
curl http://127.0.0.1:5000/predictions \
  --data '{"input": {"prompt": "Hello world"}}' \
  -H 'Content-Type: application/json' \
  | python -m json.tool # prettify response
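The same end-to-end check can be scripted. Here's a minimal sketch using only Python's standard library; it assumes the container from the previous step is still listening on 127.0.0.1:5000, and the helper names are our own:

```python
import json
import urllib.request

def build_request(prompt: str) -> bytes:
    """Encode the prompt into the JSON body the /predictions endpoint expects."""
    return json.dumps({"input": {"prompt": prompt}}).encode()

def run_prediction(prompt: str, url: str = "http://127.0.0.1:5000/predictions") -> str:
    """POST a prediction to a running cog container and return the generated text."""
    req = urllib.request.Request(
        url,
        data=build_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # cog wraps the predict() return value in the "output" field
    return result["output"]

if __name__ == "__main__":
    print(run_prediction("Hello world"))
```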
Your lowercase GitHub username is used as the namespace for all your published models (similar to private GitHub repos). deepctl push takes care of that; you only need to specify a name (or it will default to the container name).
# push the local cog-example:latest container under USERNAME/my-cog
deepctl push cog-example:latest my-cog
If you get an error about permissions, try deepctl auth set-token $(deepctl auth token). This sets up the docker credentials necessary to talk to our registry.
# check that it's available
deepctl model list --visibility private | grep my-cog
# run an inference (auto-deploys if necessary)
deepctl infer -m USERNAME/my-cog -i prompt="Hello world"