Dia-1.6B API Server
API server for nari-labs/Dia-1.6B, a 1.6 billion-parameter text-to-speech (TTS) model designed for realistic dialogue generation.
Features
- 🗣️ Realistic Dialogue: Directly generates natural-sounding conversations from transcripts.
- 🎭 Emotion and Tone: Supports non-verbal cues like (laughs), (coughs), and (clears throat).
- 👥 Multi-Speaker Support: Uses tags like [S1] and [S2] to alternate between speakers.
- 🎙️ Audio Prompting: Supports voice conditioning and cloning via audio prompts.
- 🚀 FastAPI Implementation: High-performance, documented API endpoints.
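To illustrate how speaker tags and non-verbal cues combine in a transcript, here is a minimal sketch of a helper that assembles one. The function name `build_transcript` and the `(speaker, line)` input shape are hypothetical and not part of this repository; only the `[S1]`/`[S2]` tags and the parenthesised cues come from the feature list above.

```python
def build_transcript(turns):
    """Join (speaker_number, line) pairs into a single tagged transcript.

    Hypothetical helper for illustration: each turn is prefixed with a
    tag such as [S1] or [S2], and turns are separated by single spaces.
    """
    return " ".join(f"[S{speaker}] {line.strip()}" for speaker, line in turns)


turns = [
    (1, "Did you hear the news?"),
    (2, "(laughs) I did."),
    (1, "(clears throat) Well, anyway."),
]
transcript = build_transcript(turns)
print(transcript)
```

The resulting string can be sent as the `text` parameter of the /generate endpoint.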
Prerequisites
- Python 3.9+
- NVIDIA GPU (Recommended): 10GB+ VRAM for optimal performance.
- CUDA 12.6+ (Mandatory for inference).
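Before installing, it may help to check these prerequisites programmatically. The following is a rough sketch, assuming the server runs on PyTorch (not stated in this README); `check_environment` is a hypothetical helper, and the thresholds simply mirror the list above.

```python
import sys


def check_environment(min_python=(3, 9), min_vram_gb=10):
    """Report whether the prerequisites listed above appear to be met."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "cuda_available": None,  # None means PyTorch is missing, so unknown
        "cuda_version": None,
        "vram_gb": None,
        "vram_ok": None,
    }
    try:
        import torch  # assumption: PyTorch is among the server's dependencies
    except ImportError:
        return report

    report["cuda_available"] = torch.cuda.is_available()
    report["cuda_version"] = torch.version.cuda  # CUDA version PyTorch was built with
    if report["cuda_available"]:
        total_bytes = torch.cuda.get_device_properties(0).total_memory
        report["vram_gb"] = round(total_bytes / 1024**3, 1)
        report["vram_ok"] = report["vram_gb"] >= min_vram_gb
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which is not necessarily the toolkit version installed on the system.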
Installation
- Clone the repository and navigate into the folder:
    git clone <repo-url>
    cd dia-api-server
- Create a virtual environment:
    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
- Install dependencies:
    pip install -r requirements.txt
Usage
Running the Server
python main.py
The server will be available at http://localhost:8000.
API Documentation
Once the server is running, you can access the interactive documentation at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Example Endpoint: /generate (POST)
Parameters:
- text (Form data): The transcript including speaker tags.
- audio_prompt (Form file, optional): An audio file to condition the generation.
Response:
Returns a StreamingResponse as an audio/wav binary stream.
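As an illustration of calling this endpoint, here is a minimal client sketch that uses only the Python standard library. The field names `text` and `audio_prompt` and the default URL come from this README; the helper names `encode_multipart` and `generate`, the output filename, and the `application/octet-stream` content type for the prompt file are assumptions made for the example. It is not the repository's test_api.py.

```python
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(text, audio_prompt=None):
    """Build a multipart/form-data body for the /generate endpoint.

    `audio_prompt` is an optional (filename, file_bytes) tuple.
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    text_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="text"\r\n'
        "\r\n"
        f"{text}\r\n"
    )
    body = text_part.encode("utf-8")
    if audio_prompt is not None:
        filename, data = audio_prompt
        file_head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="audio_prompt"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n"
            "\r\n"
        )
        body += file_head.encode("utf-8") + data + b"\r\n"
    body += f"--{boundary}--\r\n".encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def generate(text, out_path="output.wav", prompt_path=None,
             base_url="http://localhost:8000"):
    """POST a transcript to /generate and save the returned WAV bytes."""
    audio_prompt = None
    if prompt_path is not None:
        prompt = Path(prompt_path)
        audio_prompt = (prompt.name, prompt.read_bytes())
    body, content_type = encode_multipart(text, audio_prompt)
    request = urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        Path(out_path).write_bytes(response.read())
    return out_path
```

With the server running, calling `generate("[S1] Hello there. [S2] Hi! (laughs)")` should write the audio to output.wav. A third-party HTTP client would shorten this considerably; the standard library is used here only to keep the sketch dependency-free.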
Test Script
You can use test_api.py to verify the server:
python test_api.py
Docker Deployment (Recommended)
Developing and running locally may be complicated due to CUDA requirements. Here is a sample Dockerfile for deployment:
FROM nvidia/cuda:12.6.0-devel-ubuntu22.04
WORKDIR /app
RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    git \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip3 install -r requirements.txt
COPY . .
CMD ["python3", "main.py"]
License
Refer to the nari-labs/Dia-1.6B license on Hugging Face.