initial

2026-03-04 22:21:47 +01:00
commit a86357c190
6 changed files with 312 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,87 @@
+# Dia-1.6B API Server
+
+API server for [nari-labs/Dia-1.6B](https://huggingface.co/nari-labs/Dia-1.6B), a 1.6 billion-parameter text-to-speech (TTS) model designed for realistic dialogue generation.
+
+## Features
+- 🗣️ **Realistic Dialogue**: Directly generates natural-sounding conversations from transcripts.
+- 🎭 **Emotion and Tone**: Supports non-verbal cues like `(laughs)`, `(coughs)`, and `(clears throat)`.
+- 👥 **Multi-Speaker Support**: Uses tags like `[S1]` and `[S2]` to alternate between speakers.
+- 🎙️ **Audio Prompting**: Supports voice conditioning and cloning via audio prompts.
+- 🚀 **FastAPI Implementation**: High-performance, documented API endpoints.
+
+## Prerequisites
+- **Python 3.9+**
+- **NVIDIA GPU (Recommended)**: 10GB+ VRAM for optimal performance.
+- **CUDA 12.6+** (Mandatory for inference).
+
+## Installation
+
+1. **Clone the repository and navigate into the folder:**
+   ```bash
+   git clone <repo-url>
+   cd dia-api-server
+   ```
+
+2. **Create a virtual environment:**
+   ```bash
+   python -m venv .venv
+   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+   ```
+
+3. **Install dependencies:**
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+## Usage
+
+### Running the Server
+```bash
+python main.py
+```
+The server will be available at `http://localhost:8000`.
+
+### API Documentation
+Once the server is running, you can access the interactive documentation at:
+- Swagger UI: `http://localhost:8000/docs`
+- ReDoc: `http://localhost:8000/redoc`
+
+### Example Endpoint: `/generate` (POST)
+**Parameters:**
+- `text` (Form data): The transcript including speaker tags.
+- `audio_prompt` (Form file, optional): An audio file to condition the generation.
+
+**Response:**
+Returns the generated audio as an `audio/wav` binary stream (a FastAPI `StreamingResponse`).
+
+### Test Script
+You can use `test_api.py` to verify the server:
+```bash
+python test_api.py
+```
+
+## Docker Deployment (Recommended)
+Developing and running locally may be complicated due to CUDA requirements. Here is a sample `Dockerfile` for deployment:
+
+```dockerfile
+FROM nvidia/cuda:12.6.0-devel-ubuntu22.04
+
+WORKDIR /app
+
+RUN apt-get update && apt-get install -y \
+    python3 \
+    python3-pip \
+    git \
+    ffmpeg \
+    && rm -rf /var/lib/apt/lists/*
+
+COPY requirements.txt .
+RUN pip3 install -r requirements.txt
+
+COPY . .
+
+CMD ["python3", "main.py"]
+```
+
+## License
+Refer to the [nari-labs/Dia-1.6B](https://huggingface.co/nari-labs/Dia-1.6B#🪪-license) license on Hugging Face.
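
The `/generate` request described in the README above can be sketched with the Python standard library alone. This is a minimal sketch, not the project's `test_api.py`: the helper names `build_generate_request` and `generate_speech` and the `output.wav` path are hypothetical, and it assumes the server accepts a `multipart/form-data` body that carries only the `text` field when the optional `audio_prompt` is omitted. The base URL and the field name come from the README.

```python
import urllib.request
import uuid


def build_generate_request(text, base_url="http://localhost:8000"):
    """Build (but do not send) a multipart POST to /generate.

    Hypothetical helper: only the `text` form field from the README is sent;
    the optional `audio_prompt` file part is left out.
    """
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="text"\r\n'
        "\r\n"
        f"{text}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def generate_speech(text, out_path="output.wav", base_url="http://localhost:8000"):
    """Send the request and write the returned audio/wav bytes to out_path."""
    request = build_generate_request(text, base_url)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
    return out_path
```

With the server running, a call such as `generate_speech("[S1] Hello there. [S2] Hi! (laughs)")` would save the response to `output.wav`. Keeping request construction separate from sending makes the request shape easy to inspect without a GPU or a live server.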