# Dia-1.6B API Server

An API server for [nari-labs/Dia-1.6B](https://huggingface.co/nari-labs/Dia-1.6B), a 1.6-billion-parameter text-to-speech (TTS) model designed for realistic dialogue generation.

## Features

- 🗣️ **Realistic Dialogue**: Generates natural-sounding conversations directly from transcripts.
- 🎭 **Emotion and Tone**: Supports non-verbal cues such as `(laughs)`, `(coughs)`, and `(clears throat)`.
- 👥 **Multi-Speaker Support**: Uses the tags `[S1]` and `[S2]` to alternate between speakers.
- 🎙️ **Audio Prompting**: Supports voice conditioning and cloning via audio prompts.
- 🚀 **FastAPI Implementation**: High-performance API endpoints with interactive documentation.

## Prerequisites

- **Python 3.9+**
- **NVIDIA GPU (recommended)**: 10 GB+ of VRAM for optimal performance.
- **CUDA 12.6+**: Required for GPU inference.

## Installation

1. **Clone the repository and navigate into the folder:**

   ```bash
   git clone <repository-url>
   cd dia-api-server
   ```

2. **Create and activate a virtual environment:**

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

3. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

## Usage

### Running the Server

```bash
python main.py
```

The server will be available at `http://localhost:8000`.

### API Documentation

Once the server is running, the interactive documentation is available at:

- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`

### Example Endpoint: `/generate` (POST)

**Parameters:**

- `text` (form field): The transcript, including speaker tags.
- `audio_prompt` (form file, optional): An audio file used to condition the generation.

**Response:** A `StreamingResponse` that returns the generated audio as an `audio/wav` binary stream.

### Test Script

You can use `test_api.py` to verify that the server is working:

```bash
python test_api.py
```

## Docker Deployment (Recommended)

Setting up a local environment can be complicated because of the CUDA requirements, so running the server in a container is the recommended approach.
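Whether the server runs locally or in Docker, clients call `/generate` the same way. The following is a minimal client sketch using only the Python standard library. It assumes the server listens on `http://localhost:8000` and accepts the `text` form field and the optional `audio_prompt` file described under the `/generate` endpoint. The helper names `build_multipart` and `generate` and the default output file `output.wav` are hypothetical and not part of this project.

```python
import uuid
import urllib.request
from pathlib import Path
from typing import Optional, Tuple

# Assumption: the server started with `python main.py` listens here.
BASE_URL = "http://localhost:8000"


def build_multipart(text: str, audio_prompt: Optional[Path] = None) -> Tuple[bytes, str]:
    """Encode `text` (and an optional `audio_prompt` file) as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = [
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="text"\r\n\r\n'
            f"{text}\r\n"
        ).encode("utf-8")
    ]
    if audio_prompt is not None:
        # File part: header, raw file bytes, then the line break before the next boundary.
        header = (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="audio_prompt"; '
            f'filename="{audio_prompt.name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + audio_prompt.read_bytes() + b"\r\n")
    # The closing boundary marks the end of the form.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def generate(
    text: str,
    audio_prompt: Optional[Path] = None,
    out_path: Path = Path("output.wav"),
    base_url: str = BASE_URL,
    timeout: float = 600.0,  # generation on long transcripts can be slow
) -> Path:
    """POST a transcript to /generate and save the streamed WAV response to disk.

    urllib raises urllib.error.HTTPError if the server returns an error status.
    """
    body, content_type = build_multipart(text, audio_prompt)
    request = urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response, open(out_path, "wb") as f:
        # Copy the streamed audio to disk in 64 KiB chunks instead of buffering it all.
        while chunk := response.read(64 * 1024):
            f.write(chunk)
    return out_path
```

For example, `generate("[S1] Welcome to Dia. [S2] Thanks! (laughs)")` would write the returned audio to `output.wav`, and passing `audio_prompt=Path("voice.wav")` would also upload a conditioning clip. The multipart body is built by hand only to avoid a third-party dependency; if `requests` is installed, its `data=` and `files=` arguments to `requests.post` do the same job.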
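Here is a sample `Dockerfile` for deployment:

```dockerfile
FROM nvidia/cuda:12.6.0-devel-ubuntu22.04

# Prevent apt (e.g. tzdata, pulled in by ffmpeg) from waiting for interactive input.
ENV DEBIAN_FRONTEND=noninteractive

WORKDIR /app

RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    git \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies first so this layer stays cached between code changes.
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

COPY . .

# The server listens on port 8000.
EXPOSE 8000

CMD ["python3", "main.py"]
```

To build and run the image with GPU access (this requires the NVIDIA Container Toolkit on the host), you can use commands such as `docker build -t dia-api-server .` followed by `docker run --gpus all -p 8000:8000 dia-api-server`. For the port mapping to work, the server in `main.py` needs to listen on `0.0.0.0` rather than `127.0.0.1` inside the container.

## License

Refer to the [nari-labs/Dia-1.6B](https://huggingface.co/nari-labs/Dia-1.6B#🪪-license) license on Hugging Face.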