How to Locally Deploy and Optimize the DeepSeek Model Using Ollama

You can deploy the DeepSeek model locally with Ollama by following the steps below. Ollama is a tool for running large language models (LLMs) locally and supports a wide range of models, including DeepSeek.


1. Install Ollama

Ollama supports macOS, Linux, and Windows. Here are the installation steps:


macOS

1. Open Terminal.

2. Run the following command to install Ollama:

 brew install ollama
Alternatively, download the macOS package (https://ollama.com/download/Ollama-darwin.zip) and install it locally.

Linux

1. Open Terminal.

2. Run the following command to install Ollama:

 curl -fsSL https://ollama.com/install.sh | sh

Windows

Ollama now provides a native Windows installer (https://ollama.com/download). Alternatively, you can run Ollama under WSL (Windows Subsystem for Linux): after installing WSL, follow the Linux steps above.
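If you take the WSL route, here is a minimal sketch: the first command runs in an elevated PowerShell and installs WSL with Ubuntu by default, and the second runs inside the WSL shell afterwards.

 wsl --install
 curl -fsSL https://ollama.com/install.sh | sh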

2. Download DeepSeek Model

Ollama pulls models from its own model library by default, and can also load them from other sources such as Hugging Face. Here are the steps to download and load the DeepSeek model:

1. Ensure Ollama is installed and running.

2. Use the following command to download the DeepSeek model:

 ollama pull deepseek-r1:1.5b

If the DeepSeek variant you want is not in Ollama's model library, you can import a local model file by hand; see the sketch below.
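As a minimal sketch of such an import, assuming you have already downloaded a GGUF weights file (the file name below is a placeholder), write a Modelfile whose FROM line points at it:

 # Modelfile — FROM points at your downloaded weights (placeholder path)
 FROM ./deepseek-r1-1.5b.gguf

Then register it under a name of your choice and run it:

 ollama create my-deepseek -f Modelfile
 ollama run my-deepseek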

3. Run DeepSeek Model

Once the download is complete, you can run the DeepSeek model using the following command:

 ollama run deepseek-r1:1.5b

After running, Ollama will launch an interactive terminal where you can directly interact with the DeepSeek model.
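Besides the interactive terminal, you can pass a prompt directly on the command line for a one-shot reply, for example:

 ollama run deepseek-r1:1.5b "Introduce yourself in one sentence."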

4. Invoke DeepSeek Model via API

Ollama provides a local API to interact with the model via HTTP requests.

Start API Service

1. Run the following command to start Ollama’s API service:

 ollama serve

By default, the API will run at http://localhost:11434.
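To verify the server is up, request the base URL; a running instance answers with a short status message:

 curl http://localhost:11434
 # -> Ollama is running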

Send Request

Use curl or programming languages like Python to send requests. Here is an example:

 curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:1.5b",
  "prompt": "Hello, introduce yourself",
  "stream": false
}'
Python Example

 import requests

 # Ollama's local generate endpoint
 url = "http://localhost:11434/api/generate"
 data = {
     "model": "deepseek-r1:1.5b",
     "prompt": "Hello, introduce yourself",
     "stream": False,  # return one complete JSON object instead of a stream
 }

 response = requests.post(url, json=data)
 response.raise_for_status()
 print(response.json()["response"])
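If you set "stream": True instead, the endpoint returns newline-delimited JSON chunks as tokens are generated. Here is a minimal sketch of consuming that stream:

 import json
 import requests

 url = "http://localhost:11434/api/generate"
 data = {
     "model": "deepseek-r1:1.5b",
     "prompt": "Hello, introduce yourself",
     "stream": True,
 }

 # Each line of the response body is one JSON chunk with a "response" fragment
 with requests.post(url, json=data, stream=True) as response:
     for line in response.iter_lines():
         if line:
             chunk = json.loads(line)
             print(chunk.get("response", ""), end="", flush=True)
             if chunk.get("done"):
                 break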

5. Performance Optimization

If the model runs slowly, you can try the following optimization methods:

  • Use GPU: Ensure that your system has CUDA (NVIDIA) or ROCm (AMD) installed; Ollama offloads model layers to the GPU automatically when it detects one.
  • Quantize the model: Run a lower-precision variant (e.g., 8-bit or 4-bit) to reduce memory usage and improve speed.
  • Adjust parameters: Cap the number of generated tokens (Ollama's num_predict option) to reduce per-request computation; see the sketch after this list.
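As a minimal sketch of adjusting these parameters over the API, the generate endpoint accepts an options object; num_predict caps output length and num_gpu controls how many layers are offloaded to the GPU (the values below are illustrative, not tuned):

 import requests

 data = {
     "model": "deepseek-r1:1.5b",
     "prompt": "Hello, introduce yourself",
     "stream": False,
     "options": {
         "num_predict": 128,  # cap generated tokens to bound latency
         "num_gpu": 99,       # illustrative: offload as many layers as possible
     },
 }

 response = requests.post("http://localhost:11434/api/generate", json=data)
 print(response.json()["response"])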

6. Troubleshooting

DeepSeek model cannot be downloaded

  • Check your network connection to ensure you can reach the model source (the Ollama registry or Hugging Face).
  • Manually download the model file and specify its path (see the Modelfile sketch in step 2).

Insufficient performance

  • Ensure your hardware meets the requirements (e.g., sufficient memory and a capable GPU).
  • Try quantizing the model or using a smaller model variant.

DeepSeek model API is not accessible

  • Check whether the Ollama service is running properly.
  • Ensure that port 11434 is not occupied by another process; see the check below.
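A quick check for both failure modes (assuming macOS or Linux):

 # A healthy server answers on the base URL
 curl http://localhost:11434
 # Show which process, if any, holds port 11434
 lsof -i :11434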

7. Graphical Interface

Use Chatbox: download it from https://chatboxai.app/en

Select the model provider: OLLAMA API

Select the model: deepseek-r1:1.5b
