# Dia-1.6B API Server

API server for [nari-labs/Dia-1.6B](https://huggingface.co/nari-labs/Dia-1.6B), a 1.6 billion-parameter text-to-speech (TTS) model designed for realistic dialogue generation.

## Features
- 🗣️ **Realistic Dialogue**: Directly generates natural-sounding conversations from transcripts.
- 🎭 **Emotion and Tone**: Supports non-verbal cues like `(laughs)`, `(coughs)`, and `(clears throat)`.
- 👥 **Multi-Speaker Support**: Uses tags like `[S1]` and `[S2]` to alternate between speakers.
- 🎙️ **Audio Prompting**: Supports voice conditioning and cloning via audio prompts.
- 🚀 **FastAPI Implementation**: High-performance, documented API endpoints.

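
To make the speaker tags and non-verbal cues concrete, here is a minimal sketch of how a transcript string might be assembled before it is sent to the server. The helper name `build_transcript` and the sample dialogue are hypothetical, and the exact formatting the model expects is assumed from the tag examples above rather than confirmed.

```python
from typing import Iterable, Tuple


def build_transcript(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker_number, line) pairs into one tagged transcript string.

    Hypothetical helper: each turn is prefixed with its [S<n>] tag, and
    non-verbal cues such as (laughs) are left inline exactly as written.
    """
    parts = []
    for speaker, line in turns:
        parts.append(f"[S{speaker}] {line.strip()}")
    return " ".join(parts)


transcript = build_transcript([
    (1, "Did you get the server running?"),
    (2, "Yes, on the first try. (laughs)"),
    (1, "(clears throat) Then let's generate some audio."),
])
print(transcript)
```

The resulting string is what would go into the `text` field of a generation request.
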
## Prerequisites
- **Python 3.9+**
- **NVIDIA GPU (Recommended)**: 10 GB+ VRAM for optimal performance.
- **CUDA 12.6+**: Required for GPU inference.

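
A quick environment check can catch the most common setup problems before installing anything. The sketch below uses only the standard library. It verifies the Python version and looks for the `nvidia-smi` executable as a rough signal that an NVIDIA driver is installed. It does not check the CUDA version or the available VRAM, and the function name `check_prerequisites` is hypothetical.

```python
import shutil
import sys

MIN_PYTHON = (3, 9)  # minimum version from the prerequisites list


def check_prerequisites() -> dict:
    """Return a dict of coarse pass/fail checks for the local environment."""
    return {
        # sys.version_info compares correctly against a (major, minor) tuple
        "python_ok": sys.version_info >= MIN_PYTHON,
        # Presence of nvidia-smi on PATH only suggests a driver is installed
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```
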
## Installation

1. **Clone the repository and navigate into the folder:**
```bash
git clone <repo-url>
cd dia-api-server
```

2. **Create a virtual environment:**
```bash
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
```

3. **Install dependencies:**
```bash
pip install -r requirements.txt
```

## Usage

### Running the Server
```bash
python main.py
```
The server will be available at `http://localhost:8000`.

### API Documentation
Once the server is running, the interactive documentation is available at:
- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`

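
The same information can be read programmatically. FastAPI serves a machine-readable OpenAPI schema at `/openapi.json` by default, so the sketch below assumes this server has not disabled or moved it. The function names `fetch_schema` and `list_operations` are hypothetical.

```python
import json
from urllib.request import urlopen

# Keys in an OpenAPI path item that denote HTTP operations
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(schema: dict) -> list:
    """Return sorted "METHOD /path" strings found in an OpenAPI schema dict."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # Skip non-operation keys such as "parameters" or "summary"
            if key.lower() in HTTP_METHODS:
                operations.append(f"{key.upper()} {path}")
    return sorted(operations)


def fetch_schema(base_url: str = "http://localhost:8000") -> dict:
    """Download the OpenAPI schema (assumes the default FastAPI location)."""
    with urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)
```

With the server running, `list_operations(fetch_schema())` lists every route the API exposes, which is a quick way to confirm the endpoints described in this README.
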
### Example Endpoint: `/generate` (POST)
**Parameters:**
- `text` (form field): The transcript, including speaker tags.
- `audio_prompt` (form file, optional): An audio file used to condition the generation.

**Response:**
Returns a `StreamingResponse` that streams the generated audio as `audio/wav` binary data.

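
As an illustration of calling this endpoint, the following sketch posts the form fields with only the Python standard library. It assumes the endpoint accepts `multipart/form-data` with the field names `text` and `audio_prompt` listed above. The helper names `encode_multipart` and `generate`, the `application/octet-stream` part type, and the timeout value are hypothetical choices.

```python
import uuid
from typing import Optional, Tuple
from urllib.request import Request, urlopen


def encode_multipart(text: str,
                     audio: Optional[Tuple[str, bytes]] = None) -> Tuple[bytes, str]:
    """Build a multipart/form-data body; returns (body, content_type).

    `audio` is an optional (filename, file_bytes) pair for `audio_prompt`.
    """
    boundary = uuid.uuid4().hex
    chunks = [
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="text"\r\n\r\n',
        text.encode("utf-8"),
        b"\r\n",
    ]
    if audio is not None:
        filename, data = audio
        chunks += [
            f"--{boundary}\r\n".encode(),
            (f'Content-Disposition: form-data; name="audio_prompt"; '
             f'filename="{filename}"\r\n').encode(),
            b"Content-Type: application/octet-stream\r\n\r\n",
            data,
            b"\r\n",
        ]
    chunks.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def generate(text: str,
             audio: Optional[Tuple[str, bytes]] = None,
             url: str = "http://localhost:8000/generate",
             timeout: float = 600.0) -> bytes:
    """POST the transcript (and optional audio prompt); return the WAV bytes."""
    body, content_type = encode_multipart(text, audio)
    req = Request(url, data=body,
                  headers={"Content-Type": content_type}, method="POST")
    with urlopen(req, timeout=timeout) as resp:
        return resp.read()


# Example usage (requires a running server):
# wav_bytes = generate("[S1] Hello there. [S2] Hi! (laughs)")
# with open("output.wav", "wb") as f:
#     f.write(wav_bytes)
```

Generation can be slow on modest hardware, which is why the sketch uses a generous timeout. Adjust it to suit your setup.
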
### Test Script
You can use `test_api.py` to verify the server:
```bash
python test_api.py
```

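
The contents of `test_api.py` are not reproduced here. As an independent sanity check on a response, the sketch below parses the returned bytes with the standard `wave` module and reports basic properties. It assumes the server returns complete PCM-encoded WAV data with a correct header, since the `wave` module rejects other encodings. The function name `describe_wav` is hypothetical.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Parse WAV bytes and return channel count, sample rate, and duration.

    Raises wave.Error if the data is not a PCM WAV file.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }
```

Passing the bytes saved from a `/generate` response to `describe_wav` should yield a non-zero duration if generation succeeded.
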
## Docker Deployment (Recommended)
Developing and running locally may be complicated due to CUDA requirements. Here is a sample `Dockerfile` for deployment:

```dockerfile
FROM nvidia/cuda:12.6.0-devel-ubuntu22.04

WORKDIR /app

RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    git \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip3 install -r requirements.txt

COPY . .

CMD ["python3", "main.py"]
```

## License
Refer to the [nari-labs/Dia-1.6B](https://huggingface.co/nari-labs/Dia-1.6B#🪪-license) license on Hugging Face.