Highlights of GPT-SoVITS-WebUI
1. Powerful Voice Conversion and TTS Capabilities
- Zero-shot TTS: With just a 5-second vocal sample, users can achieve instant text-to-speech conversion.
- Few-shot TTS: Fine-tuning the model with only 1 minute of training data can improve voice similarity and realism.
2. Cross-lingual Support
It supports inference in multiple languages, including English, Japanese, Korean, Cantonese, and Chinese, so users can perform voice conversion and TTS in languages other than the one used in the training dataset.
3. Comprehensive WebUI Tools
The integrated WebUI includes useful tools such as voice accompaniment separation, automatic training set segmentation, Chinese ASR, and text labeling. These tools assist beginners in creating training datasets and GPT/SoVITS models.
4. User-friendly Installation
- Multiple Environments: Tested on various Python and PyTorch versions across different devices such as CUDA, Apple silicon, and CPU.
- Windows: Integrated package available for easy download and startup with a double-click on go-webui.bat.
- Linux and macOS: Installable via conda and bash scripts, with specific instructions for different systems.
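Since the project is tested on CUDA, Apple silicon, and CPU, it can help to check which backend PyTorch will use on a given machine before launching the WebUI. The following is a minimal sketch, not the project's own device-selection code: the function names pick_device and detect_device are hypothetical, and the CUDA > MPS > CPU preference order is an assumption.

```python
def pick_device(has_cuda: bool, has_mps: bool) -> str:
    """Return a PyTorch device string, preferring CUDA, then Apple MPS, then CPU.

    The preference order is an assumption made for this sketch.
    """
    if has_cuda:
        return "cuda"
    if has_mps:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe the local PyTorch install; fall back to CPU if PyTorch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds have no torch.backends.mps module, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    has_mps = mps is not None and mps.is_available()
    return pick_device(torch.cuda.is_available(), has_mps)


if __name__ == "__main__":
    print("Selected device:", detect_device())
```

Keeping the decision in a pure function (pick_device) separate from the hardware probe makes the fallback order easy to test on machines without a GPU.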
5. Regular Updates and Improvements
- Output Text Support: Added support for Chinese-English mixed and Japanese-English mixed output texts.
- Output Mode: Optional segmentation mode for output.
- Bug Fixes: Fixed issues like UVR5 directory reading and multiple newline inference errors.
- Optimization: Removed redundant logs in the inference WebUI and optimized the processing logic for numbers and English in text.
6. Community and Online Resources
- Online Demos: Available on platforms such as Colab and Hugging Face, allowing users to try the features quickly.
- Community Support: A Discord community is provided for users to communicate and get help.
- User Guides: Offered in multiple languages, including Chinese and English, to help users work with the tool.