Dia-1.6B API Server

API server for nari-labs/Dia-1.6B, a 1.6-billion-parameter text-to-speech (TTS) model designed for realistic dialogue generation.

Features

  • 🗣️ Realistic Dialogue: Directly generates natural-sounding conversations from transcripts.
  • 🎭 Emotion and Tone: Supports non-verbal cues like (laughs), (coughs), and (clears throat).
  • 👥 Multi-Speaker Support: Uses tags like [S1] and [S2] to alternate between speakers.
  • 🎙️ Audio Prompting: Supports voice conditioning and cloning via audio prompts.
  • 🚀 FastAPI Implementation: High-performance, documented API endpoints.
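
As a rough illustration of the speaker tags and non-verbal cues listed above, the sketch below assembles a transcript string from a list of turns. The helper name `build_transcript` is hypothetical and not part of this server; only the `[S1]`/`[S2]` tags and the parenthesized cues come from this README.

```python
# Hypothetical helper for illustration only; it is not part of the server code.
def build_transcript(turns):
    """Join (speaker_number, line) pairs into one tagged transcript string."""
    parts = []
    for speaker, line in turns:
        # This README only mentions the [S1] and [S2] tags, so reject anything else.
        if speaker not in (1, 2):
            raise ValueError("speaker must be 1 or 2")
        parts.append(f"[S{speaker}] {line.strip()}")
    return " ".join(parts)


transcript = build_transcript([
    (1, "Did the server start?"),
    (2, "It did. (laughs) On the first try."),
])
print(transcript)
# [S1] Did the server start? [S2] It did. (laughs) On the first try.
```

The resulting string is the kind of value the text parameter of /generate is described as accepting.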

Prerequisites

  • Python 3.9+
  • NVIDIA GPU (recommended): 10 GB+ VRAM for optimal performance.
  • CUDA 12.6+ (required for GPU inference).

Installation

  1. Clone the repository and navigate into the folder:

    git clone <repo-url>
    cd dia-api-server
    
  2. Create a virtual environment:

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    

Usage

Running the Server

python main.py

The server will be available at http://localhost:8000.

API Documentation

Once the server is running, you can access the interactive documentation at:

  • Swagger UI: http://localhost:8000/docs
  • Redoc: http://localhost:8000/redoc
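
Loading the model may take a while, so a script that calls the API right after startup may want to wait until the server answers. The following is a minimal sketch that polls the Swagger UI URL listed above using only the standard library. The function name `wait_for_server` and its default timings are assumptions made for this example.

```python
import time
import urllib.error
import urllib.request


# Hypothetical helper; the name and default timings are illustrative.
def wait_for_server(url="http://localhost:8000/docs", timeout=60.0, interval=1.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # connection refused, timed out, or non-2xx status: retry
        time.sleep(interval)
    return False


# Example call (commented out so importing this sketch does not block):
# ready = wait_for_server()
```

The function returns False instead of raising, so the caller decides whether a server that never came up is fatal.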

Example Endpoint: /generate (POST)

Parameters:

  • text (Form data): The transcript including speaker tags.
  • audio_prompt (Form file, optional): An audio file to condition the generation.

Response: A StreamingResponse that delivers the generated audio as an audio/wav binary stream.
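
A client call to this endpoint might look like the sketch below, which uses only the Python standard library. It assumes the server accepts multipart/form-data with the field names text and audio_prompt described above. The function names, the output path, and the audio/wav content type given to the prompt file are assumptions made for this example, not confirmed details of the server.

```python
import shutil
import urllib.request
import uuid


def encode_multipart(text, audio_prompt=None, audio_filename="prompt.wav"):
    """Encode the form fields as multipart/form-data.

    Returns a (body_bytes, content_type_header) tuple.
    """
    boundary = uuid.uuid4().hex
    lines = [
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="text"',
        b"",
        text.encode("utf-8"),
    ]
    if audio_prompt is not None:
        disposition = (
            'Content-Disposition: form-data; name="audio_prompt"; '
            f'filename="{audio_filename}"'
        )
        lines += [
            f"--{boundary}".encode(),
            disposition.encode(),
            b"Content-Type: audio/wav",  # assumption: the prompt is a WAV file
            b"",
            audio_prompt,
        ]
    lines += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def generate(text, audio_prompt=None,
             url="http://localhost:8000/generate", out_path="output.wav"):
    """POST the transcript (and optional prompt bytes) and save the WAV reply."""
    body, content_type = encode_multipart(text, audio_prompt)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        # Copy the streamed response body to disk in chunks.
        shutil.copyfileobj(response, out)
    return out_path


# Example call against a running server (commented out on purpose):
# generate("[S1] Hello there. [S2] Hi! (laughs)")
```

Because the response is streamed, the sketch copies it to disk in chunks instead of reading the whole body into memory.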

Test Script

You can use test_api.py to verify the server:

python test_api.py

Setting up CUDA for local development can be difficult. The following sample Dockerfile packages the server for deployment:

FROM nvidia/cuda:12.6.0-devel-ubuntu22.04

WORKDIR /app

RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    git \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip3 install -r requirements.txt

COPY . .

CMD ["python3", "main.py"]

License

Refer to the nari-labs/Dia-1.6B license on Hugging Face.
