Commit a86357c190 by matst80, 2026-03-04 22:21:47 +01:00

# Dia-1.6B API Server
API server for [nari-labs/Dia-1.6B](https://huggingface.co/nari-labs/Dia-1.6B), a 1.6 billion-parameter text-to-speech (TTS) model designed for realistic dialogue generation.
## Features
- 🗣️ **Realistic Dialogue**: Directly generates natural-sounding conversations from transcripts.
- 🎭 **Emotion and Tone**: Supports non-verbal cues like `(laughs)`, `(coughs)`, and `(clears throat)`.
- 👥 **Multi-Speaker Support**: Uses tags like `[S1]` and `[S2]` to alternate between speakers.
- 🎙️ **Audio Prompting**: Supports voice conditioning and cloning via audio prompts.
- 🚀 **FastAPI Implementation**: HTTP endpoints with auto-generated interactive documentation (Swagger UI and ReDoc).
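A transcript is a plain string. Each turn starts with a speaker tag, and non-verbal cues appear inline. The helper below is a hypothetical convenience: `build_transcript` is not part of this server or of Dia. It assumes only the two tags `[S1]` and `[S2]` exist, and it does not check that speakers alternate.

```python
from typing import Iterable, Tuple

# Assumption: the model understands exactly these two speaker tags.
SPEAKER_TAGS = {1: "[S1]", 2: "[S2]"}


def build_transcript(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker, line) turns into one transcript string for the `text` field."""
    parts = []
    for speaker, line in turns:
        if speaker not in SPEAKER_TAGS:
            raise ValueError(f"unsupported speaker {speaker!r}; expected 1 or 2")
        # Non-verbal cues such as "(laughs)" are left inline, exactly as written.
        parts.append(f"{SPEAKER_TAGS[speaker]} {line.strip()}")
    return " ".join(parts)


print(build_transcript([
    (1, "Did you hear that? (laughs)"),
    (2, "I did. (clears throat) Let's keep going."),
]))
# prints: [S1] Did you hear that? (laughs) [S2] I did. (clears throat) Let's keep going.
```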
## Prerequisites
- **Python 3.9+**
- **NVIDIA GPU (Recommended)**: 10 GB+ VRAM for optimal performance.
- **CUDA 12.6+**: Required for GPU inference.
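After the dependencies are installed, a quick check of these prerequisites can prevent a failed first run. The sketch below assumes PyTorch is the inference backend pulled in by `requirements.txt`. `check_environment` is a hypothetical helper. `torch.version.cuda` reports the CUDA version PyTorch was built against, which is not the driver version shown by `nvidia-smi`.

```python
import sys
from typing import List

MIN_PYTHON = (3, 9)
MIN_CUDA = (12, 6)
MIN_VRAM_GB = 10  # decimal gigabytes, matching the "10 GB+" recommendation


def check_environment() -> List[str]:
    """Return a list of problems found; an empty list means the basics look fine."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {sys.version.split()[0]} found; 3.9+ is required")
    try:
        import torch  # assumption: PyTorch is installed via requirements.txt
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if torch.version.cuda is None:
        problems.append("this PyTorch build has no CUDA support (CPU-only or ROCm build)")
        return problems
    if not torch.cuda.is_available():
        problems.append("PyTorch cannot see a CUDA GPU")
        return problems
    # Compare (major, minor) of the CUDA version this PyTorch build targets, e.g. "12.6".
    built_cuda = tuple(int(part) for part in torch.version.cuda.split(".")[:2])
    if built_cuda < MIN_CUDA:
        problems.append(f"PyTorch was built for CUDA {torch.version.cuda}; 12.6+ is expected")
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1e9
    if vram_gb < MIN_VRAM_GB:
        problems.append(f"{props.name} has {vram_gb:.1f} GB VRAM; 10 GB+ is recommended")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks ready.")
```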
## Installation
1. **Clone the repository and navigate into the folder:**
   ```bash
   git clone <repo-url>
   cd dia-api-server
   ```
2. **Create a virtual environment:**
   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```
3. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```
## Usage
### Running the Server
```bash
python main.py
```
The server will be available at `http://localhost:8000`.
### API Documentation
Once the server is running, you can access the interactive documentation at:
- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`
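Loading the model can make startup slow, so a script that calls the API may need to wait until the server responds. The sketch below polls the Swagger UI route listed above. `wait_for_server` and its default timings are hypothetical.

```python
import http.client
import time
import urllib.request


def wait_for_server(
    base_url: str = "http://localhost:8000", timeout: float = 300.0, interval: float = 2.0
) -> bool:
    """Poll GET {base_url}/docs until it returns 200; give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/docs", timeout=5) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused/reset, socket timeout, or an HTTP error status
            # (urllib.error.URLError and HTTPError are OSError subclasses): retry.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, call `wait_for_server()` before the first request to `/generate`. It returns `False` if the server is still unreachable after five minutes.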
### Example Endpoint: `/generate` (POST)
**Parameters:**
- `text` (Form data): The transcript including speaker tags.
- `audio_prompt` (Form file, optional): An audio file to condition the generation.
**Response:**
Returns the generated audio as an `audio/wav` binary stream (a FastAPI `StreamingResponse`).
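The client sketch below uses only the Python standard library. It assumes the form field names listed above and a WAV audio prompt. `generate` and `encode_multipart` are hypothetical helpers. The long default timeout is also an assumption, since generation time depends on the GPU and the transcript length. Note that urllib applies it to each blocking socket operation, not to the whole request.

```python
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Optional, Tuple


def encode_multipart(
    fields: Dict[str, str], files: Dict[str, Tuple[str, bytes, str]]
) -> Tuple[bytes, str]:
    """Encode form fields and (filename, data, mime) files as multipart/form-data."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        )
        chunks.append(value.encode("utf-8") + b"\r\n")
    for name, (filename, data, mime) in files.items():
        chunks.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\nContent-Type: {mime}\r\n\r\n'.encode()
        )
        chunks.append(data + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def generate(
    text: str,
    out_path: str,
    audio_prompt: Optional[str] = None,
    base_url: str = "http://localhost:8000",
    timeout: float = 600.0,
) -> Path:
    """POST a transcript (and optional audio prompt) to /generate and save the WAV reply."""
    files = {}
    if audio_prompt is not None:
        prompt = Path(audio_prompt)
        # Assumption: the prompt is a WAV file; adjust the MIME type for other formats.
        files["audio_prompt"] = (prompt.name, prompt.read_bytes(), "audio/wav")
    body, content_type = encode_multipart({"text": text}, files)
    request = urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on non-2xx responses (e.g. validation errors).
    with urllib.request.urlopen(request, timeout=timeout) as response, open(out_path, "wb") as out:
        # Copy the streamed body to disk in 64 KiB chunks instead of buffering it all.
        while True:
            chunk = response.read(64 * 1024)
            if not chunk:
                break
            out.write(chunk)
    return Path(out_path)
```

For example, `generate("[S1] Hello there. [S2] Hi! (laughs)", "dialogue.wav")` writes the returned audio to `dialogue.wav`. Pass `audio_prompt="voice.wav"` to condition the output on a reference clip. If the third-party `requests` package is available, the same upload can be written as `requests.post(url, data=..., files=...)`.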
### Test Script
You can use `test_api.py` to verify the server:
```bash
python test_api.py
```
## Docker Deployment (Recommended)
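Setting up CUDA locally can be difficult, so running the server in a container is the recommended option. Here is a sample `Dockerfile`. Start the resulting image with GPU access and the API port published, e.g. `docker run --gpus all -p 8000:8000 <image>`. GPU access requires the NVIDIA Container Toolkit on the host.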
```dockerfile
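FROM nvidia/cuda:12.6.0-devel-ubuntu22.04

# Avoid interactive prompts (e.g. tzdata) during apt-get install
ENV DEBIAN_FRONTEND=noninteractive

WORKDIR /app

RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    git \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# Install dependencies first so this layer is cached across source changes
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

COPY . .

# The server listens on port 8000
EXPOSE 8000
CMD ["python3", "main.py"]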
```
## License
Refer to the [nari-labs/Dia-1.6B](https://huggingface.co/nari-labs/Dia-1.6B#🪪-license) license on Hugging Face.