Voice Cloning
This week we experiment with AI-powered voice cloning tools.
I asked ChatGPT for recommendations and got back 10 suggestions for software tools that can either generate a unique AI voice or clone an existing voice from a good-quality sample.
I opted to use ElevenLabs (https://elevenlabs.io) for this experiment.
Although regular text-to-speech using a set of pre-defined voices is available on the free plan, voice cloning is only available on a paid plan. The lowest-cost option is around £10 per month, which covers around 120 minutes of generated audio per month.
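To put that pricing in perspective, here is a rough back-of-the-envelope sketch in Python. The figures are the approximate ones quoted above (around £10 for around 120 minutes), and the function name and the 5-minute narration length are made up for illustration, so treat the result as an estimate rather than an actual price list.

```python
def cost_per_minute(monthly_price_gbp: float, minutes_included: float) -> float:
    """Return the effective price per generated minute, in pounds."""
    if minutes_included <= 0:
        raise ValueError("minutes_included must be positive")
    return monthly_price_gbp / minutes_included


# Approximate figures from the text above; real plan details may differ.
rate = cost_per_minute(10.0, 120.0)

# Hypothetical example: a 5-minute narration of one article.
narration_minutes = 5
print(f"Roughly £{rate:.3f} per minute, "
      f"or about £{rate * narration_minutes:.2f} for a {narration_minutes}-minute narration")
```

At these numbers a minute of generated audio works out at roughly 8p, assuming the whole monthly allowance actually gets used.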
The first step is to upload some good-quality samples of your voice. The suggestion is at least 1 minute of audio, but I uploaded 4 files of around a minute each, in which I read the introductions to some of my product-growth-themed articles.
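If you want to check that your recordings clear the one-minute suggestion before uploading, a small script can total them up. This is only a sketch using Python's standard `wave` module, and it assumes the samples are saved as uncompressed WAV files (other formats such as MP3 would need a different library). The function names and the 60-second threshold constant are my own, not anything the tool provides.

```python
import wave

# The "at least 1 minute" suggestion mentioned above, in seconds.
MIN_TOTAL_SECONDS = 60.0


def wav_duration_seconds(path):
    """Return the length of an uncompressed WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def total_sample_seconds(paths):
    """Add up the durations of all the sample recordings."""
    return sum(wav_duration_seconds(p) for p in paths)


def enough_audio(paths, minimum=MIN_TOTAL_SECONDS):
    """True if the combined recordings meet the suggested minimum length."""
    return total_sample_seconds(paths) >= minimum


# Example usage with hypothetical file names:
#   samples = ["intro1.wav", "intro2.wav", "intro3.wav", "intro4.wav"]
#   print(total_sample_seconds(samples), enough_audio(samples))
```

With four files of around a minute each, the total comes comfortably above the suggested minimum.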
Here is one of the samples from the original recordings.
It only takes a minute or so to analyse the files and produce a voice that is ready to be selected from the drop-down list of voices.
The user interface supports a number of different use cases, including text-to-speech, speech-to-speech conversion, video dubbing, automatic website-to-speech narration and more.
I took the opening paragraphs from one of the same articles I used in the training set and produced the following AI-narrated version.
The results are pretty good considering the minimal training and the short time taken to generate this initial voice. I don't think it is going to fool close relatives, but it is absolutely "good enough" for quite a few different applications.
Leaving the ethical and ownership concerns aside, I could absolutely see how this could be useful for content producers. Some of the advantages include:
- "Record" narrations even in noisy environments.
- Create audio with a consistent tone and style, and without mistakes.
- Generate more audio/video content to bring written work alive.
- Piece together different parts of a conversation into a cohesive whole.
Of course, there are a number of disadvantages too, including:
- Can still be recognised as AI-generated.
- Lack of authenticity and possible trust issues.
- Difficult to reinforce particular points.
- Irritating to listen to for extended periods.
Got to go now as I need to phone my Mum...I wonder.