# Dia-1.6B API Server

API server for [nari-labs/Dia-1.6B](https://huggingface.co/nari-labs/Dia-1.6B), a 1.6 billion-parameter text-to-speech (TTS) model designed for realistic dialogue generation.

## Features
- 🗣️ **Realistic Dialogue**: Directly generates natural-sounding conversations from transcripts.
- 🎭 **Emotion and Tone**: Supports non-verbal cues like `(laughs)`, `(coughs)`, and `(clears throat)`.
- 👥 **Multi-Speaker Support**: Uses tags like `[S1]` and `[S2]` to alternate between speakers.
- 🎙️ **Audio Prompting**: Supports voice conditioning and cloning via audio prompts.
- 🚀 **FastAPI Implementation**: High-performance, documented API endpoints.
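
The speaker tags and non-verbal cues listed above are plain text inside the transcript. As a minimal sketch, a transcript could be assembled from a list of turns as shown here. `build_transcript` is a hypothetical helper, not part of this repository, and it assumes the speakers simply alternate between `[S1]` and `[S2]`.

```python
from typing import Iterable


def build_transcript(turns: Iterable[str]) -> str:
    """Join dialogue turns into one transcript, alternating [S1] and [S2].

    Hypothetical helper for illustration; the server only needs the final string.
    """
    tagged = []
    for index, turn in enumerate(turns):
        # Even-numbered turns go to speaker 1, odd-numbered turns to speaker 2.
        speaker = "[S1]" if index % 2 == 0 else "[S2]"
        tagged.append(f"{speaker} {turn.strip()}")
    return " ".join(tagged)


transcript = build_transcript([
    "Did you hear the news?",
    "I did! (laughs) I still can't believe it.",
])
print(transcript)
```

Non-verbal cues such as `(laughs)` are written inline, exactly where they should occur in the speech.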

## Prerequisites
- **Python 3.9+**
- **NVIDIA GPU (Recommended)**: 10 GB+ VRAM for optimal performance.
- **CUDA 12.6+** (Mandatory for inference).
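
Before installing, the prerequisites above can be roughly checked from Python. This is a sketch under stated assumptions: `check_prerequisites` is a hypothetical helper, and finding `nvidia-smi` on the `PATH` only hints that an NVIDIA driver is present. It does not verify the CUDA version or the available VRAM.

```python
import shutil
import sys


def check_prerequisites() -> dict:
    """Report whether the interpreter and GPU tooling look usable."""
    return {
        # The README asks for Python 3.9 or newer.
        "python_ok": sys.version_info >= (3, 9),
        # nvidia-smi ships with the NVIDIA driver; this is a hint, not proof of CUDA 12.6+.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'yes' if ok else 'no'}")
```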

## Installation

1. **Clone the repository and navigate into the folder:**
   ```bash
   git clone <repo-url>
   cd dia-api-server
   ```

2. **Create a virtual environment:**
   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

3. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

## Usage

### Running the Server
```bash
python main.py
```
The server will be available at `http://localhost:8000`.

### API Documentation
Once the server is running, you can access the interactive documentation at:
- Swagger UI: `http://localhost:8000/docs`
- Redoc: `http://localhost:8000/redoc`
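
The same information is also available in machine-readable form. The sketch below assumes the server keeps FastAPI's default `/openapi.json` route enabled, which this README does not confirm. `list_operations` and `fetch_schema` are hypothetical helper names.

```python
import json
import urllib.request
from typing import List

# Keys of an OpenAPI path item that denote HTTP operations.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(schema: dict) -> List[str]:
    """Return sorted 'METHOD /path' strings for every operation in an OpenAPI schema."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            if key in HTTP_METHODS:
                operations.append(f"{key.upper()} {path}")
    return sorted(operations)


def fetch_schema(base_url: str = "http://localhost:8000") -> dict:
    """Download the schema; assumes the default /openapi.json route is enabled."""
    with urllib.request.urlopen(f"{base_url}/openapi.json") as response:
        return json.load(response)
```

With the server running, `list_operations(fetch_schema())` lists the endpoints it exposes.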

### Example Endpoint: `/generate` (POST)
**Parameters:**
- `text` (Form data): The transcript including speaker tags.
- `audio_prompt` (Form file, optional): An audio file to condition the generation.

**Response:**
Returns a `StreamingResponse` that delivers the generated audio as an `audio/wav` binary stream.
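
A request to this endpoint can be sketched with the Python standard library alone. The field names `text` and `audio_prompt` come from the parameter list above. The helper names, the `application/octet-stream` part type, and the default URL are assumptions made for illustration, not the repository's own client.

```python
import urllib.request
import uuid
from typing import Optional, Tuple


def encode_multipart(text: str, audio: Optional[Tuple[str, bytes]] = None) -> Tuple[bytes, str]:
    """Encode `text` and an optional (filename, data) `audio_prompt` as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    body = bytearray()
    body += (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="text"\r\n'
        "\r\n"
        f"{text}\r\n"
    ).encode("utf-8")
    if audio is not None:
        filename, data = audio
        body += (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="audio_prompt"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n"
            "\r\n"
        ).encode("utf-8")
        body += data + b"\r\n"
    # The closing boundary marks the end of the form.
    body += f"--{boundary}--\r\n".encode("utf-8")
    return bytes(body), f"multipart/form-data; boundary={boundary}"


def generate(text: str, audio: Optional[Tuple[str, bytes]] = None,
             url: str = "http://localhost:8000/generate") -> bytes:
    """POST the form to the server and return the WAV bytes of the response."""
    body, content_type = encode_multipart(text, audio)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the server running, `generate("[S1] Hello. [S2] Hi there.")` returns the audio bytes, which can then be written to a `.wav` file opened in binary mode.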

### Test Script
You can use `test_api.py` to verify the server:
```bash
python test_api.py
```
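
The contents of `test_api.py` are not shown here. As an independent sketch, one check such a script could perform is confirming that the returned bytes parse as WAV audio. `describe_wav` is a hypothetical helper, and it assumes PCM-encoded WAV data, since the standard `wave` module rejects other encodings.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Parse WAV bytes and return basic properties; raises wave.Error if invalid."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }
```

A non-zero `duration_seconds` is a quick sign that the server produced audio rather than an empty file.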

## Docker Deployment (Recommended)
Setting up CUDA for local development can be complicated, so running the server in a container is recommended. The container needs GPU access at run time, which requires the NVIDIA Container Toolkit on the host and the `--gpus all` flag for `docker run`. Here is a sample `Dockerfile` for deployment:

```dockerfile
# CUDA 12.6 development image, matching the CUDA prerequisite above
FROM nvidia/cuda:12.6.0-devel-ubuntu22.04

WORKDIR /app

# System packages: Python, pip, git, and ffmpeg
RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    git \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies first so this layer is cached between code changes
COPY requirements.txt .
RUN pip3 install -r requirements.txt

COPY . .

CMD ["python3", "main.py"]
```

## License
Refer to the [nari-labs/Dia-1.6B](https://huggingface.co/nari-labs/Dia-1.6B#🪪-license) license on Hugging Face.