mats/dia-api-server

matst80 b911db1e07 add dockercopose

2026-03-04 22:39:28 +01:00

Dia-1.6B API Server

API server for nari-labs/Dia-1.6B, a 1.6 billion-parameter text-to-speech (TTS) model designed for realistic dialogue generation.

Features

🗣️ Realistic Dialogue: Directly generates natural-sounding conversations from transcripts.
🎭 Emotion and Tone: Supports non-verbal cues like (laughs), (coughs), and (clears throat).
👥 Multi-Speaker Support: Uses tags like [S1] and [S2] to alternate between speakers.
🎙️ Audio Prompting: Supports voice conditioning and cloning via audio prompts.
🚀 FastAPI Implementation: High-performance, documented API endpoints.
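As a rough illustration of the speaker tags and non-verbal cues listed above, the sketch below assembles a two-speaker transcript. `build_transcript` is a hypothetical helper, not part of this repository, and it assumes speakers strictly alternate starting with `[S1]`.

```python
from typing import Iterable


def build_transcript(turns: Iterable[str]) -> str:
    """Join dialogue turns, alternating [S1] and [S2] speaker tags.

    Hypothetical helper: assumes the first turn belongs to [S1] and that
    speakers strictly alternate after that.
    """
    tagged = []
    for index, turn in enumerate(turns):
        speaker = "[S1]" if index % 2 == 0 else "[S2]"
        tagged.append(f"{speaker} {turn.strip()}")
    return " ".join(tagged)


# Non-verbal cues such as (laughs) are written inline in the turn text.
transcript = build_transcript([
    "Did you get the server running?",
    "Yes, on the first try. (laughs)",
])
print(transcript)
```

The resulting string is what would be sent as the `text` field of a generation request.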

Prerequisites

Python 3.9+
NVIDIA GPU (Recommended): 10GB+ VRAM for optimal performance.
CUDA 12.6+ (Mandatory for inference).
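The prerequisites above can be checked before starting the server with a short script along these lines. This is a sketch, not part of the repository: `check_prerequisites` is a hypothetical name, and it assumes PyTorch is among the installed dependencies, which this README does not confirm. The VRAM comparison is approximate, since drivers often report slightly less than the nominal capacity.

```python
import sys


def check_prerequisites(min_vram_gb: float = 10.0) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Python 3.9+ is listed as a prerequisite.
    if sys.version_info < (3, 9):
        problems.append(f"Python 3.9+ required, found {sys.version.split()[0]}.")

    # Assumption: PyTorch is installed via requirements.txt. Import lazily so
    # the Python version check still works when it is not.
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed; cannot check CUDA.")
        return problems

    if not torch.cuda.is_available():
        problems.append("CUDA is not available to PyTorch.")
        return problems

    # total_memory is reported in bytes; convert to GiB for a rough comparison.
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    if vram_gb < min_vram_gb:
        problems.append(f"GPU 0 has about {vram_gb:.1f} GiB VRAM, below {min_vram_gb} GiB.")
    return problems


for problem in check_prerequisites():
    print("WARNING:", problem)
```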

Installation

Clone the repository and navigate into the folder:
```
git clone <repo-url>
cd dia-api-server
```

Create a virtual environment:

```
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

Install dependencies:
```
pip install -r requirements.txt
```

Usage

Running the Server

```
python main.py
```

The server will be available at http://localhost:8000.

API Documentation

Once the server is running, you can access the interactive documentation at:

Swagger UI: http://localhost:8000/docs
Redoc: http://localhost:8000/redoc
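If the server takes a while to become reachable after launch (for example, while the model loads, which is an assumption about this server's startup), a script can poll the documentation URL before sending requests. The sketch below uses only the standard library; `http_ok` and `wait_until_ready` are hypothetical helpers, not part of this repository.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if a GET request to the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and connection errors are all OSError subclasses.
        return False


def wait_until_ready(
    url: str = "http://localhost:8000/docs",
    timeout: float = 300.0,
    interval: float = 2.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll the URL until the probe succeeds or the timeout (in seconds) elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_ready()` before the first request returns `True` once the Swagger UI page responds, or `False` if it does not within the timeout.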

Example Endpoint: `/generate` (POST)

Parameters:

text (Form data): The transcript including speaker tags.
audio_prompt (Form file, optional): An audio file to condition the generation.

Response: A StreamingResponse carrying an audio/wav binary stream.
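A minimal client for this endpoint might look like the following sketch. It assumes the third-party `requests` package is installed and that the server runs at the default address. `build_form` and `generate` are hypothetical helper names, and the `audio/wav` content type for the prompt file is an assumption.

```python
from pathlib import Path
from typing import Optional, Tuple


def build_form(text: str, audio_prompt: Optional[str] = None) -> Tuple[dict, dict]:
    """Return (data, files) in the shape requests.post expects."""
    data = {"text": text}
    files = {}
    if audio_prompt is not None:
        prompt = Path(audio_prompt)
        # Assumption: the prompt is a WAV file; adjust the content type otherwise.
        files["audio_prompt"] = (prompt.name, prompt.read_bytes(), "audio/wav")
    return data, files


def generate(
    text: str,
    out_path: str = "output.wav",
    audio_prompt: Optional[str] = None,
    base_url: str = "http://localhost:8000",
) -> Path:
    """POST the transcript to /generate and save the returned WAV bytes."""
    import requests  # third-party; imported here so build_form works without it

    data, files = build_form(text, audio_prompt)
    response = requests.post(
        f"{base_url}/generate",
        data=data,
        files=files or None,
        timeout=600,  # generous timeout, since generation can be slow
    )
    response.raise_for_status()
    output = Path(out_path)
    output.write_bytes(response.content)
    return output
```

For example, `generate("[S1] Hello there. [S2] Hi! (laughs)")` would write the audio to `output.wav` in the current directory.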

Test Script

You can use test_api.py to verify the server:

```
python test_api.py
```
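The contents of `test_api.py` are not shown here. As an independent sanity check, the bytes returned by `/generate` can be parsed with the standard-library `wave` module. The sketch below assumes the server returns a PCM WAV file with a complete header; a streamed response whose header carries placeholder sizes may not parse this way. `wav_duration_seconds` is a hypothetical helper.

```python
import io
import wave


def wav_duration_seconds(data: bytes) -> float:
    """Parse WAV bytes and return the audio duration in seconds.

    Raises wave.Error if the bytes are not a WAV file the module can read.
    """
    with wave.open(io.BytesIO(data), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()
```

A duration greater than zero is a quick signal that the response is playable audio rather than an error page.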

Docker Deployment (Recommended)

Developing and running locally may be complicated due to CUDA requirements. Here is a sample Dockerfile for deployment:

```
FROM nvidia/cuda:12.6.0-devel-ubuntu22.04

WORKDIR /app

RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    git \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip3 install -r requirements.txt

COPY . .

CMD ["python3", "main.py"]
```

License

Refer to the nari-labs/Dia-1.6B license on Hugging Face.
