How to Use TTS Voice Cloning AI Locally - Text to Speech AI Voice Cloning Locally on Your PC.

This tutorial will show you how to use RVC (Retrieval-based Voice Conversion) voice models to create text-to-speech (TTS) audio in anyone's voice. The tool and the plugins that come with it let you create and clone voices for text-to-speech locally on your own computer, so you won't have to pay huge amounts of money to services like Eleven Labs. You'll need a fair bit of space set aside for everything we're going to use, so make sure your hard drive isn't full.

How to Download and Setup Applio on Windows.

If it does, accept and Applio will open in your browser. If it doesn't, you can copy and paste the address into a browser window to get access.

How to Add Voice Models to Applio for Text to Speech. Joe Biden, Morgan Freeman, Taylor Swift, etc.

Now that you have Applio set up you can add custom voice models to it. This will allow you to create text-to-speech content in anyone's voice, so long as there is a sample.

As with everything AI at the moment, you will probably have to experiment with the options in the TTS Voices tab, as some seem to work better with certain voices than others. Just don't mix and match languages and genders between the TTS voice and the voice model. You'll get some weird results if you do.

How to Clone Voices for Text to Speech Locally on your PC.

Here is where things get a little tricky and a lot more time-consuming. You'll also need a fairly high-end GPU. For example, cloning a voice locally on my laptop with an RTX 3050 took about 8 hours, so if you have anything less, you might be waiting for quite a long time. The voice output file was also 60GB in total, so make sure you have at least 80GB of free storage space if you are planning on using Applio.
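Before you kick off an 8-hour run, it's worth checking that the drive actually has room. Here is a minimal Python sketch of that check. The 80GB threshold comes from the figure above, the function names are made up for this example, and the path is an assumption: point it at whichever drive holds your Applio folder.

```python
import shutil

# Threshold taken from the figure quoted in this guide; adjust to suit your setup.
REQUIRED_GB = 80


def free_space_gb(path="."):
    """Return the free space on the drive containing `path`, in GB (10^9 bytes)."""
    usage = shutil.disk_usage(path)
    return usage.free / 1e9


def has_enough_space(path=".", required_gb=REQUIRED_GB):
    """Return True if the drive containing `path` has at least `required_gb` GB free."""
    return free_space_gb(path) >= required_gb


if __name__ == "__main__":
    # "." means the current folder; run this from inside your Applio folder,
    # or replace it with the path to that folder.
    free = free_space_gb(".")
    print(f"Free space: {free:.1f} GB")
    if has_enough_space("."):
        print("Looks like enough room for a training run.")
    else:
        print(f"Less than {REQUIRED_GB} GB free, clear some space first.")
```

This only looks at free space when you run it, so leave yourself some headroom if other things are writing to the same drive.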

The rest of the process is a waiting game. As I mentioned above, it will take a very, very long time for Python and the AI tools to analyse and clone your voice sample. There are also quite a lot of other checkboxes and variables throughout this process that you can experiment with. For your first run-through, though, it's best to just use the default settings and see what sort of results you get from the local voice cloning process. If you encounter any weird errors with ports, a quick PC restart will usually solve those problems.
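If you'd rather confirm a port problem before restarting, a small Python sketch like the one below can tell you whether something is already listening on a given local port. The port number here is purely a placeholder and not taken from Applio itself, so swap in whichever port shows up in the address or error message on your machine. The function name is also made up for this example.

```python
import socket

# Placeholder only: replace with the port from the address or error you see.
EXAMPLE_PORT = 6969


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken.
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(EXAMPLE_PORT):
        print(f"Port {EXAMPLE_PORT} is busy. An old session may still be running.")
    else:
        print(f"Port {EXAMPLE_PORT} is free.")
```

If the port shows as busy and you don't have Applio open, a leftover Python process is the likely culprit, which is why a restart tends to clear it.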

When your voice is ready you'll find it listed as a Voice Model along with the Index file, so you'll be able to use them with any TTS transcript. The good news is that the actual text-to-speech voice component doesn't take all that long to process. It's not quite as good as the results coming out of Eleven Labs, at least not yet, but for an entirely local process it's pretty damn impressive.
