# Dia-1.6B API Server

API server for [nari-labs/Dia-1.6B](https://huggingface.co/nari-labs/Dia-1.6B), a 1.6-billion-parameter text-to-speech (TTS) model designed for realistic dialogue generation.

## Features
- 🗣️ **Realistic Dialogue**: Directly generates natural-sounding conversations from transcripts.
- 🎭 **Emotion and Tone**: Supports non-verbal cues like `(laughs)`, `(coughs)`, and `(clears throat)`.
- 👥 **Multi-Speaker Support**: Uses tags like `[S1]` and `[S2]` to alternate between speakers.
- 🎙️ **Audio Prompting**: Supports voice conditioning and cloning via audio prompts.
- 🚀 **FastAPI Implementation**: High-performance, documented API endpoints.
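
The tag and cue conventions listed above can be illustrated with a small helper that assembles a transcript. This is a minimal sketch: `build_transcript` is a hypothetical name, not part of this repository, and it assumes only the two speaker tags named above are valid.

```python
from typing import Iterable, Tuple


def build_transcript(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker_number, line) pairs into one tagged transcript string.

    Hypothetical helper for illustration only. Non-verbal cues such as
    "(laughs)" are written inline as part of the spoken line.
    """
    parts = []
    for speaker, line in turns:
        # Assumption: only [S1] and [S2] are accepted by the model.
        if speaker not in (1, 2):
            raise ValueError(f"unsupported speaker number: {speaker}")
        parts.append(f"[S{speaker}] {line.strip()}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_transcript([
        (1, "Did you hear the new voice model?"),
        (2, "I did. (laughs) It sounds almost too real."),
    ]))
```

The resulting string is what would be sent as the `text` field of a generation request.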

## Prerequisites
- **Python 3.9+**
- **NVIDIA GPU**: needed for inference, since CUDA is mandatory. 10GB+ VRAM is recommended for optimal performance.
- **CUDA 12.6+** (mandatory for inference).
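
Before installing, it can help to confirm that the machine meets the prerequisites above. The sketch below is not part of the repository; it assumes that PyTorch is the framework pulled in by `requirements.txt`, and it degrades gracefully if PyTorch is not installed yet.

```python
import sys


def check_environment() -> dict:
    """Report the Python version and, if PyTorch is installed, GPU details."""
    report = {
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info >= (3, 9),
        "torch_installed": False,
        "cuda_available": False,
        "cuda_version": None,
        "vram_gb": None,
    }
    try:
        import torch  # assumption: the server's dependencies include PyTorch
    except ImportError:
        return report

    report["torch_installed"] = True
    report["cuda_available"] = torch.cuda.is_available()
    report["cuda_version"] = torch.version.cuda  # None on CPU-only builds
    if report["cuda_available"]:
        # Total memory of the first visible GPU, converted from bytes to GiB.
        total = torch.cuda.get_device_properties(0).total_memory
        report["vram_gb"] = round(total / 1024**3, 1)
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```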

## Installation

1. **Clone the repository and navigate into the folder:**
   ```bash
   git clone <repo-url>
   cd dia-api-server
   ```

2. **Create a virtual environment:**
   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

3. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

## Usage

### Running the Server
```bash
python main.py
```
The server will be available at `http://localhost:8000`.
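
A server that loads a large model may take some time before it starts answering requests. The following sketch polls the server until it responds; `wait_for_server` is a hypothetical helper, and the choice of the `/docs` URL as a readiness probe is an assumption (the repository may or may not offer a dedicated health endpoint).

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:8000/docs",
                    timeout: float = 120.0,
                    interval: float = 2.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timed out, or non-2xx status: retry.
            pass
        time.sleep(interval)
    return False


if __name__ == "__main__":
    print("ready" if wait_for_server() else "server did not come up in time")
```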

### API Documentation
Once the server is running, you can access the interactive documentation at:
- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`
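
FastAPI applications also serve a machine-readable OpenAPI schema, by default at `/openapi.json`. The sketch below lists the operations found in that schema; it assumes the server keeps FastAPI's default schema URL, and `fetch_schema` and `list_operations` are hypothetical helper names.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(schema: dict) -> list:
    """Return sorted "METHOD /path" strings from a parsed OpenAPI schema."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-method keys such as "parameters".
            if key in HTTP_METHODS:
                operations.append(f"{key.upper()} {path}")
    return sorted(operations)


def fetch_schema(base_url: str = "http://localhost:8000") -> dict:
    """Download the OpenAPI schema (assumes the default /openapi.json URL)."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for operation in list_operations(fetch_schema()):
        print(operation)
```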

### Example Endpoint: `/generate` (POST)
**Parameters:**
- `text` (Form data): The transcript including speaker tags.
- `audio_prompt` (Form file, optional): An audio file to condition the generation.

**Response:**
Returns the generated audio as an `audio/wav` binary stream (a FastAPI `StreamingResponse`).
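
A client for this endpoint might look like the sketch below, which uses only the standard library. The field names `text` and `audio_prompt` come from the parameter list above; the helper names, the output path, and the timeout are assumptions made for illustration.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Optional, Tuple


def encode_multipart(text: str, audio_prompt: Optional[Path] = None) -> Tuple[bytes, str]:
    """Encode the form fields as multipart/form-data.

    Returns (body, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = [
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="text"',
        b"",
        text.encode("utf-8"),
    ]
    if audio_prompt is not None:
        mime = mimetypes.guess_type(audio_prompt.name)[0] or "application/octet-stream"
        parts += [
            f"--{boundary}".encode(),
            (f'Content-Disposition: form-data; name="audio_prompt"; '
             f'filename="{audio_prompt.name}"').encode(),
            f"Content-Type: {mime}".encode(),
            b"",
            audio_prompt.read_bytes(),
        ]
    parts += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def generate(text: str,
             out_path: str = "output.wav",
             audio_prompt: Optional[Path] = None,
             url: str = "http://localhost:8000/generate") -> Path:
    """POST the transcript and stream the WAV response to `out_path`."""
    body, content_type = encode_multipart(text, audio_prompt)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # Generation can be slow, so the timeout here is deliberately generous.
    with urllib.request.urlopen(request, timeout=600) as resp, open(out_path, "wb") as out:
        while chunk := resp.read(64 * 1024):
            out.write(chunk)
    return Path(out_path)


if __name__ == "__main__":
    print(generate("[S1] Hello there. [S2] Hi! (laughs) Nice to meet you."))
```

Writing the response in chunks avoids holding the whole audio file in memory while the server is still streaming it.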

### Test Script
You can use `test_api.py` to verify the server:
```bash
python test_api.py
```
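
The contents of `test_api.py` are not shown here. As an independent check, the sketch below inspects a response body with Python's `wave` module to confirm that it parses as WAV audio. `describe_wav` is a hypothetical helper, and it assumes the server returns PCM-encoded WAV data; for a streamed file the frame count in the header may not match the actual length.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Parse WAV bytes and return basic properties.

    Raises wave.Error if the data is not a WAV file the module can read.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / rate,
        }


if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Usage: python describe_wav.py path/to/saved_response.wav
    print(describe_wav(Path(sys.argv[1]).read_bytes()))
```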

## Docker Deployment (Recommended)
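Setting up a local environment can be difficult because of the CUDA requirements. The sample `Dockerfile` below packages the server on an NVIDIA CUDA base image. Giving the container access to the GPU requires the NVIDIA Container Toolkit on the host and the `--gpus all` flag on `docker run`.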

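```dockerfile
FROM nvidia/cuda:12.6.0-devel-ubuntu22.04

# Avoid interactive prompts (for example from tzdata) during the build.
ARG DEBIAN_FRONTEND=noninteractive

WORKDIR /app

# System packages: Python, pip, git, and ffmpeg for audio handling.
RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    git \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies first so this layer is cached across code changes.
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

COPY . .

# The server listens on port 8000; main.py must bind to 0.0.0.0
# for the port to be reachable from outside the container.
EXPOSE 8000

CMD ["python3", "main.py"]
```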

## License
Refer to the [nari-labs/Dia-1.6B](https://huggingface.co/nari-labs/Dia-1.6B#🪪-license) license on Hugging Face.