Deploying LiteLLM Proxy
Quick Start Docker Image: GitHub Container Registry
Pull the litellm ghcr docker image
See the latest available ghcr docker image here: https://github.com/berriai/litellm/pkgs/container/litellm
docker pull ghcr.io/berriai/litellm:main-v1.10.0
Run the Docker Image
docker run ghcr.io/berriai/litellm:main-v1.10.0
Run the Docker Image with LiteLLM CLI args
See all supported CLI args here:
Here's how you can run the docker image and pass your config to litellm
docker run ghcr.io/berriai/litellm:main-v1.10.0 --config your_config.yaml
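The --config path is resolved inside the container, so the config file has to be made available there first. The following is a minimal sketch, assuming the file lives on the host and that /app/config.yaml is a suitable location inside the container (the same path the docker compose example on this page uses). The helper name run_with_config is hypothetical.

```shell
# Hypothetical helper: bind-mount a local config file into the container
# and pass its in-container path to --config.
run_with_config() {
  config_file="$1"   # path to the config file on the host
  # Use an absolute host path so Docker treats it as a bind mount
  # rather than a named volume.
  config_dir="$(cd "$(dirname "$config_file")" && pwd)"
  docker run \
    -v "${config_dir}/$(basename "$config_file"):/app/config.yaml" \
    ghcr.io/berriai/litellm:main-v1.10.0 \
    --config /app/config.yaml
}
```

It could then be called as: run_with_config your_config.yaml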
Here's how you can run the docker image and start litellm on port 8002 with num_workers=8
docker run ghcr.io/berriai/litellm:main-v1.10.0 --port 8002 --num_workers 8
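The --port value is the port the proxy listens on inside the container. To reach it from the host, that port usually also needs to be published with Docker's -p flag. The following is a minimal sketch, assuming the same port number is wanted on both the host and the container. The helper name run_on_port is hypothetical.

```shell
# Hypothetical helper: start the proxy on a given port and publish
# that port to the host under the same number.
run_on_port() {
  port="$1"      # port LiteLLM listens on inside the container
  workers="$2"   # value passed to --num_workers
  docker run \
    -p "${port}:${port}" \
    ghcr.io/berriai/litellm:main-v1.10.0 \
    --port "$port" --num_workers "$workers"
}
```

It could then be called as: run_on_port 8002 8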
Run the Docker Image using docker compose
Step 1
(Recommended) Use the example file docker-compose.example.yml given in the project root, e.g. https://github.com/BerriAI/litellm/blob/main/docker-compose.example.yml
Rename the file docker-compose.example.yml to docker-compose.yml.
Here's an example docker-compose.yml file:
version: "3.9"
services:
  litellm:
    image: ghcr.io/berriai/litellm:main
    ports:
      - "8000:8000" # Map the container port to the host, change the host port if necessary
    volumes:
      - ./litellm-config.yaml:/app/config.yaml # Mount the local configuration file
    # You can change the port or number of workers as per your requirements or pass any other supported CLI argument. Make sure the port passed here matches the container port defined above in the `ports` value
    command: [ "--config", "/app/config.yaml", "--port", "8000", "--num_workers", "8" ]

# ...rest of your docker-compose config if any
Step 2
Create a litellm-config.yaml file with your LiteLLM config in the same directory as your docker-compose.yml file.
Check the config doc here
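As a starting point, a minimal litellm-config.yaml might look like the sketch below. The model_list / litellm_params layout and the os.environ/ prefix are assumed to match the config format of your LiteLLM version. The model name and the OPENAI_API_KEY variable are placeholders chosen for illustration, so check the config doc before relying on them.

```shell
# Write a minimal config next to docker-compose.yml.
# Assumption: a single OpenAI model, with the API key read from the
# OPENAI_API_KEY environment variable inside the container.
cat > litellm-config.yaml <<'EOF'
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY
EOF
```

With this layout, the environment variable would also need to be passed into the container, for example through an environment entry for the service in docker-compose.yml.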
Step 3
Run the command docker-compose up or docker compose up, as per your docker installation.
Use the -d flag to run the container in detached mode (background), e.g. docker compose up -d
Your LiteLLM container should now be running on the defined port, e.g. 8000.
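To confirm the container is actually accepting connections on that port, one option is to poll it with curl. The sketch below treats any HTTP response as a sign that the proxy is listening and does not assume a particular LiteLLM endpoint. The helper name wait_for_proxy and the default of 30 attempts are hypothetical choices.

```shell
# Hypothetical helper: poll the proxy until it answers over HTTP.
# Any HTTP response (even an error status) counts as "listening".
wait_for_proxy() {
  url="${1:-http://localhost:8000/}"   # address published by docker compose
  attempts="${2:-30}"                  # number of tries, one second apart
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Without -f, curl exits 0 whenever the server sends back a response.
    if curl -s -o /dev/null --max-time 2 "$url"; then
      echo "proxy is answering at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "no response from $url after $attempts attempts" >&2
  return 1
}
```

It could then be called as: wait_for_proxy http://localhost:8000/ 30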
Deploy on Render https://render.com/
LiteLLM Proxy Performance
LiteLLM proxy has been load tested to handle 1500 req/s.
Throughput - 30% Increase
LiteLLM proxy + Load Balancer gives a 30% increase in throughput compared to the raw OpenAI API.
Latency Added - 0.00325 seconds
LiteLLM proxy adds 0.00325 seconds (3.25 ms) of latency compared to using the raw OpenAI API.
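To put the added latency in context, it can be converted to milliseconds and compared with the duration of a whole request. A small sketch follows. The 0.5 second baseline is a made-up figure for illustration, not a measurement from the load test.

```shell
added=0.00325    # seconds added by the proxy (figure quoted above)
baseline=0.5     # hypothetical latency of a direct API call, in seconds

# Convert to milliseconds and compute the overhead relative to the baseline.
overhead_ms="$(awk -v a="$added" 'BEGIN { printf "%.2f", a * 1000 }')"
overhead_pct="$(awk -v a="$added" -v b="$baseline" 'BEGIN { printf "%.2f", a / b * 100 }')"

echo "added latency: ${overhead_ms} ms (${overhead_pct}% of a ${baseline}s call)"
```

Swapping in a latency measured against your own provider gives a more meaningful percentage.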