# 🎙️ CosyVoice Flask WebSocket Speech Synthesis Service

A ready-to-use speech synthesis service for Alibaba Cloud CosyVoice, built on Flask and Socket.IO with WebSocket streaming. It starts with a single command.

## 📦 Project Features

- ✅ **One-Click Start** - Automatically checks environment, installs dependencies, and starts service
- ✅ **Cross-Platform Support** - Compatible with Windows, macOS, and Linux
- ✅ **Real-Time Synthesis** - Streams audio over WebSocket as it is generated, so playback can begin quickly
- ✅ **Multiple Voice Options** - Supports all Alibaba Cloud CosyVoice voice models
- ✅ **User-Friendly Interface** - Simple web operation interface

## 📁 Project Structure

```
cosyvoiceFlask_en/
├── 📜 start.sh                           # Shell startup script
├── 🔧 my_cosyvoice_websocket_server.py  # Flask server main program
├── 📋 requirements.txt                   # Python dependencies list
├── 🔑 get_api_key.py                    # API Key retrieval guide script
└── 📂 templates/
    └── my_cosyvoice_client.html         # Frontend web interface
```

## 🚀 Quick Start

### Prerequisites

- Python 3.7 or higher
- Alibaba Cloud DashScope API Key

### Step 1: Get API Key

Obtain an API Key by following the official Alibaba Cloud DashScope documentation, then configure it as the `DASHSCOPE_API_KEY` environment variable (see "Set Environment Variable" below).

### Step 2: Start Service

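Choose one of the following methods. Note that the project structure above lists only `start.sh`; use the `start.py` or `start.bat` launcher only if your copy includes it.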

#### macOS/Linux (Recommended)

```bash
./start.sh
```

#### Cross-Platform Python Script

```bash
python3 start.py
```

#### Windows

Double-click `start.bat` or run in command line:

```cmd
start.bat
```

### Step 3: Use the Service

After the service starts, open the following address in your browser:

```
http://localhost:9000
```

## 🎯 Usage

### Web Interface Usage

1. Open a browser and go to `http://localhost:9000`
2. Enter the text to synthesize in the text box
3. Select a voice (optional; the default is `longanyang`)
4. Click the "Synthesize Speech" button
5. Wait for the audio to be generated; it plays automatically

### Set Environment Variable (Optional)

If you don't want to manually enter the API Key each time, you can set an environment variable:

**macOS/Linux:**

```bash
export DASHSCOPE_API_KEY='sk-xxxxxxxxxxxx'
```

**Windows (CMD):**

```cmd
set DASHSCOPE_API_KEY=sk-xxxxxxxxxxxx
```

**Windows (PowerShell):**

```powershell
$env:DASHSCOPE_API_KEY='sk-xxxxxxxxxxxx'
```
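
On the server side, the key can be read back from the environment at startup. The following is a minimal sketch, not the project's actual code: the helper name `load_api_key` is hypothetical, and the `sk-` prefix check relies only on the key format described in the Troubleshooting section.

```python
import os


def load_api_key(env=None):
    """Return the DashScope API key from the environment, or raise if unusable.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("DASHSCOPE_API_KEY is not set")
    if not key.startswith("sk-"):
        # Assumption taken from this README: valid keys start with "sk-".
        raise RuntimeError("DASHSCOPE_API_KEY does not start with 'sk-'")
    return key
```

Failing early with a clear message is usually easier to diagnose than an authentication error that surfaces only during the first synthesis request.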


## ⚙️ Technical Architecture

### Core Technology Stack

- **Flask** - Lightweight web framework
- **Socket.IO** - Real-time bidirectional communication
- **WebSocket** - Communication with Alibaba Cloud DashScope API
- **CosyVoice** - Alibaba Cloud speech synthesis model

### Workflow

```
┌─────────┐    Socket.IO    ┌──────────┐    WebSocket    ┌──────────────┐
│ Web UI  │ ◄─────────────► │Flask Srv │ ◄─────────────► │Alibaba Cloud │
└─────────┘                 └──────────┘                 └──────────────┘
    │                            │                              │
    │ 1. Send text request       │                              │
    ├──────────────────────────►│                              │
    │                            │ 2. Establish WebSocket      │
    │                            ├────────────────────────────►│
    │                            │ 3. Send text data           │
    │                            ├────────────────────────────►│
    │                            │ 4. Stream audio data back   │
    │                            │◄────────────────────────────┤
    │ 5. Forward audio stream    │                              │
    │◄───────────────────────────┤                              │
    │ 6. Play audio              │                              │
```
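
The relay in steps 2 to 5 can be illustrated without any network code. This is a simplified sketch under stated assumptions: `fake_upstream_audio` is a stand-in for the DashScope WebSocket stream, `emit` stands in for whatever Socket.IO emit function the server uses, and the event names `audio_chunk` and `audio_end` are hypothetical rather than taken from the project.

```python
from typing import Callable, Iterator


def fake_upstream_audio(text: str, chunk_size: int = 4) -> Iterator[bytes]:
    """Stand-in for steps 2-4: yield 'audio' chunks for the given text."""
    data = text.encode("utf-8")  # placeholder bytes, not real audio
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def relay_synthesis(text: str, emit: Callable[[str, bytes], None]) -> int:
    """Step 5: forward each upstream chunk to the client as soon as it arrives.

    Returns the number of chunks forwarded.
    """
    forwarded = 0
    for chunk in fake_upstream_audio(text):
        emit("audio_chunk", chunk)  # hypothetical event name
        forwarded += 1
    emit("audio_end", b"")  # hypothetical end-of-stream marker
    return forwarded
```

Forwarding each chunk immediately, rather than buffering the full response, is what lets the browser begin playback before synthesis has finished.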


## 🔧 Configuration

### Modify Port

Edit the `app.run(...)` call at the end of `my_cosyvoice_websocket_server.py`:

```python
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=9000)  # Change the port number here
```
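
If editing the source for every port change is inconvenient, the port could instead be read from an environment variable. The sketch below is one possible approach, not part of the project: the helper `resolve_port` and the variable name `COSYVOICE_PORT` are both hypothetical.

```python
import os


def resolve_port(default=9000, env=None):
    """Return the port from COSYVOICE_PORT (hypothetical name), else `default`."""
    env = os.environ if env is None else env
    raw = env.get("COSYVOICE_PORT")
    if raw is None or raw.strip() == "":
        return default
    port = int(raw)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port
```

With such a helper in place, the startup call would become `app.run(host='0.0.0.0', port=resolve_port())`.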

### Modify Voice

To change the voice, edit lines 670 and 722 of `templates/my_cosyvoice_client.html`.


## 🛠️ Troubleshooting

### Issue 1: Port Already in Use

**Symptom:** Startup shows port 9000 is already in use

**Solution:**

```bash
# Find the process using the port
lsof -i :9000        # macOS/Linux
netstat -ano | findstr :9000  # Windows

# Terminate the process
kill -9 <PID>        # macOS/Linux
taskkill /PID <PID> /F  # Windows
```

Alternatively, change the service to a different port as described under "Modify Port".
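
The same check can be made from Python on any platform before starting the server. This is a small sketch using only the standard library; the function name `is_port_in_use` is illustrative.

```python
import socket


def is_port_in_use(port, host="127.0.0.1"):
    """Return True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

For example, `is_port_in_use(9000)` returning `True` indicates that another process is already listening on the default port.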


### Issue 2: Invalid API Key

**Symptom:** Authentication failure during synthesis

**Solution:**

1. Confirm that the API Key format is correct (it starts with `sk-`)
2. Check the API Key status in the Alibaba Cloud console
3. Confirm that the account has a balance or remaining free quota
4. Obtain a new API Key and set it again

### Issue 3: WebSocket Connection Failed

**Symptom:** Console shows WebSocket errors

**Possible Causes and Solutions:**

1. **Network Issue** - Check the network connection and confirm that Alibaba Cloud services are reachable
2. **Firewall Blocking** - Temporarily disable the firewall to test, then re-enable it
3. **Proxy Settings** - If you use a proxy, confirm that it is configured correctly
4. **API Key Expired** - Obtain a new API Key

### Issue 4: Python Version Too Low

**Symptom:** Startup script shows Python version not supported

**Solution:**

Install Python 3.7 or higher:

- **macOS:** `brew install python@3.11`
- **Ubuntu/Debian:** `sudo apt install python3.11`
- **Windows:** Download and install from [python.org](https://www.python.org/downloads/)
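
To confirm which interpreter version is actually being used, a check along these lines can be run with the same `python3` command that starts the service. It is a minimal sketch; the name `python_is_supported` is illustrative and the minimum of 3.7 comes from the prerequisites above.

```python
import sys

MIN_PYTHON = (3, 7)  # minimum version stated in the prerequisites


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    current = tuple((version_info or sys.version_info)[:2])
    return current >= minimum


if not python_is_supported():
    print("Python %d.%d or higher is required" % MIN_PYTHON)
```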

## 📊 Performance Optimization Suggestions

1. **Use Local Cache** - Cache audio files for identical text
2. **Connection Reuse** - Maintain WebSocket connection for long-term use
3. **Text Preprocessing** - Optimize text segmentation logic to improve synthesis efficiency
4. **Concurrency Control** - Limit the number of simultaneous synthesis tasks
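
As an illustration of suggestions 1 and 4, the sketch below caches audio by text and voice and caps the number of simultaneous synthesis calls. It is not part of the project: the class `SynthesisCache` is hypothetical, the cache is in memory only and never evicts entries, and `synthesize` is any caller-supplied function that returns audio bytes.

```python
import hashlib
import threading


class SynthesisCache:
    """In-memory audio cache with a cap on concurrent synthesis calls."""

    def __init__(self, max_concurrent=2):
        self._audio = {}
        self._lock = threading.Lock()
        self._slots = threading.BoundedSemaphore(max_concurrent)

    @staticmethod
    def _key(text, voice):
        # Hash voice and text together so the same text in another voice misses.
        return hashlib.sha256(f"{voice}\x00{text}".encode("utf-8")).hexdigest()

    def get_or_synthesize(self, text, voice, synthesize):
        key = self._key(text, voice)
        with self._lock:
            cached = self._audio.get(key)
        if cached is not None:
            return cached
        with self._slots:  # blocks while max_concurrent calls are in flight
            audio = synthesize(text, voice)
        with self._lock:
            self._audio[key] = audio
        return audio
```

Two simultaneous requests for the same uncached text would both call `synthesize` in this version; deduplicating in-flight requests would require extra bookkeeping.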

## 🔒 Security Recommendations

1. **Don't Commit the API Key** - Keep the key out of source files; if you store it in a local file such as `.env`, add that file to `.gitignore`
2. **Use Environment Variables** - Avoid hardcoding sensitive information in code
3. **Restrict Access** - Add an authentication mechanism in production environments
4. **HTTPS** - Serve the application over HTTPS in production environments

## 📝 Development Guide

### Adding New Features

1. Modify `my_cosyvoice_websocket_server.py` to add server-side logic
2. Modify `templates/my_cosyvoice_client.html` to add frontend interface

### Logging

The server writes detailed logs to the console, including:

- Client connection/disconnection events
- Synthesis request details (text, voice, session ID)
- WebSocket status changes
- Error stack traces
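
When adding log lines for new features, a setup based on the standard `logging` module might look like the sketch below. The logger name `cosyvoice`, the helper names, and the message format are assumptions, not the project's actual configuration.

```python
import logging


def build_logger(name="cosyvoice", level=logging.INFO):
    """Return a console logger; repeated calls do not add duplicate handlers."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s [%(name)s] %(message)s")
        )
        logger.addHandler(handler)
    return logger


def log_request(logger, session_id, voice, text, preview=40):
    """Log a synthesis request, truncating the text to keep lines short."""
    logger.info(
        "synthesis request sid=%s voice=%s text=%r",
        session_id, voice, text[:preview],
    )
```

Inside an `except` block, `logger.exception("synthesis failed")` records the message together with the stack trace.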

### Debug Mode

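Modify `my_cosyvoice_websocket_server.py` to enable Flask debug mode. Use this for local development only, because the interactive debugger allows arbitrary code execution: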

```python
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=9000, debug=True)
```
