Voice synthesis on ISR

Page 6/36

By Grauw

Ascended (10768)

29-05-2016, 21:48

MSX-AUDIO and the YM2151 OPM have a CSM mode that may be worth looking into as well.

By Louthrax

Prophet (2465)

29-05-2016, 22:05

ARTRAG wrote:

The real problem is how to generate the data from a speech sample.
My matlab script basically does this

- read a wav file
- band-pass filter the input to 500Hz-6000Hz
- segment the audio into chunks of 1/60 of a second
- compute the power spectrum of each chunk (via FFT)
- find the 8 biggest local maxima of the power spectrum, making sure they do not mask each other
- encode their frequencies and amplitudes as MSX periods and volumes

Simple but with some manual tweaking on the encoding side
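
The steps quoted above can be sketched roughly as follows. This is not ARTRAG's script: it is a minimal pure-Python illustration that uses a naive DFT instead of an FFT, replaces the filter with a simple restriction to the 500-6000 Hz bins, and leaves out both the masking check and the volume encoding. The function names and the PSG period formula (f = clock / (16 * period), with the usual MSX PSG clock) are assumptions made for the example.

```python
import math

PSG_CLOCK_HZ = 3579545 / 2  # assumed MSX PSG input clock


def chunk_power_spectrum(chunk, sample_rate, f_lo=500.0, f_hi=6000.0):
    """Return [(freq_hz, power)] for the DFT bins between f_lo and f_hi."""
    n = len(chunk)
    spectrum = []
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        if not f_lo <= freq <= f_hi:
            continue
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(chunk))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(chunk))
        spectrum.append((freq, re * re + im * im))
    return spectrum


def biggest_peaks(spectrum, count=8):
    """Pick the `count` strongest local maxima, strongest first."""
    peaks = [
        spectrum[i]
        for i in range(1, len(spectrum) - 1)
        if spectrum[i - 1][1] < spectrum[i][1] >= spectrum[i + 1][1]
    ]
    return sorted(peaks, key=lambda p: p[1], reverse=True)[:count]


def psg_period(freq_hz):
    """Tone period for a frequency, clamped to the 12-bit register range."""
    return max(1, min(4095, round(PSG_CLOCK_HZ / (16 * freq_hz))))


def encode_chunk(chunk, sample_rate):
    """One 1/60 s chunk in, up to 8 PSG periods out (loudest first)."""
    spectrum = chunk_power_spectrum(chunk, sample_rate)
    return [psg_period(freq) for freq, _ in biggest_peaks(spectrum)]
```

A chunk is simply `sample_rate // 60` consecutive samples, so the whole file is encoded by calling `encode_chunk` on each chunk in turn.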

Thanks Artrag for the details, simple and effective, I love it :)

That MATLAB script could probably be ported to C code. There are lots of C FFT sources on the net. Finding the peaks and generating the data (do we need to define a special format?) should not be too difficult. We could have a nice generic command line tool (using only stdio / stdlib) that would work on all platforms...

My SofaCas tool already has the basic framework for that (parsing a .WAV file and analyzing frequencies; it just uses Goertzel instead of FFT...). Sources are available on my website. I'd be tempted to give that project a try, but that will not be immediately...

By wouter_

Hero (525)

29-05-2016, 22:10

@ARTRAG: Do you take into account that the PSG plays square waves instead of sine waves? Maybe you could, after you have found the first spectral peak, subtract this peak including all the harmonics caused by the square wave (instead of a sine wave) before looking for the 2nd peak. This *might* improve sound quality a bit.

Actually, how often do you find peaks that are harmonics of each other? In the case of SCC you could create custom waveforms that replicate those harmonics. Of course that would take more data, but it should allow matching the full spectra more closely.
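
A rough sketch of what such a square-wave-aware peak search could look like, purely as an illustration of the idea. It assumes an amplitude (not power) spectrum on a regular frequency grid, an ideal square wave whose odd harmonic h has 1/h of the fundamental's amplitude, and it ignores phase entirely. The function name and parameters are hypothetical.

```python
def pick_peaks_square_aware(magnitudes, bin_hz, count=8):
    """Greedy peak picking on an amplitude spectrum.

    magnitudes[k] is the amplitude at frequency k * bin_hz. After each pick,
    the odd harmonics that an ideal square wave of that pitch would add
    (amplitude / 3 at 3f, amplitude / 5 at 5f, ...) are subtracted from the
    residual spectrum before the next peak is searched.
    """
    residual = list(magnitudes)
    picked = []
    for _ in range(count):
        # Strongest remaining bin (bin 0, the DC term, is skipped).
        k = max(range(1, len(residual)), key=residual.__getitem__)
        amp = residual[k]
        if amp <= 0:
            break
        picked.append((k * bin_hz, amp))
        # h = 1 removes the peak itself, h = 3, 5, ... its square-wave harmonics.
        for h in range(1, (len(residual) - 1) // k + 1, 2):
            residual[k * h] = max(0.0, residual[k * h] - amp / h)
    return picked
```

With this scheme a bin that is fully explained by a harmonic of an earlier pick no longer uses up one of the 8 channels.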

By Manuel

Ascended (19468)

29-05-2016, 22:12

ARTRAG wrote:

No idea, this is the main script, it compiles too.
Just replace the path where wav files are with your own path.
Someone with octave could tell if it works.

Called it sample2.m and used /tmp/ as path which contained fc.wav, but alas:

>> sample2
ii =  1
/tmp/fc.waverror: findpeaks: argument 'SORTSTR' is not a valid parameter
error: called from
    error at line 480 column 7
    parse at line 397 column 13
    findpeaks at line 136 column 5
    sample2 at line 46 column 18

By ARTRAG

Enlighted (6935)

29-05-2016, 22:22

Goertzel to detect the FSK data from cassettes? Magnificent! My French is too rusty to go on.

Here, when looking for the local maxima, resorting to the FFT could be more efficient.
Anyway, if you want to reuse Goertzel, you could span the whole bandwidth in 5Hz steps and look for the maxima.
I think the result could be acceptable, even if the resulting coder wouldn't be very efficient.

By ARTRAG

Enlighted (6935)

29-05-2016, 22:32

wouter_ wrote:

@ARTRAG: Do you take into account that the PSG plays square waves instead of sine waves? Maybe you could, after you have found the first spectral peak, subtract this peak including all the harmonics caused by the square wave (instead of a sine wave) before looking for the 2nd peak. This *might* improve sound quality a bit.

Actually, how often do you find peaks that are harmonics of each other? In the case of SCC you could create custom waveforms that replicate those harmonics. Of course that would take more data, but it should allow matching the full spectra more closely.

I do not see harmonics among the maxima, so it is very hard to take the higher harmonics of the square waves into account.
I see very little chance that the higher harmonics of a square wave fall in the right place at the right amplitude.

These are the data for "double".
If the maxima were harmonics, there would be some sort of progression among the periods.
Periods are sorted by amplitude; the volumes (PSG in the upper nibble, SCC in the lower) are on every second line.

    dw 311,55,85,35,233,67,110,45 
    db 0x71,0x71,0x71,0x71,0x81,0x81,0x81,0x92 
    dw 186,75,621,143,47,62,67,373 
    db 0x71,0x81,0x91,0x92,0x92,0xA2,0xB3,0xE9 
    dw 56,62,75,621,69,133,233,311 
    db 0xA2,0xB3,0xB3,0xC4,0xD5,0xD6,0xD7,0xEA 
    dw 47,621,155,311,67,75,133,207 
    db 0xA2,0xB3,0xC5,0xD7,0xD7,0xD7,0xE9,0xFF 
    dw 621,155,81,110,311,72,207,124 
    db 0xC4,0xC5,0xD6,0xD6,0xD7,0xE9,0xEA,0xEB 
    dw 93,72,169,110,311,81,124,207 
    db 0xB3,0xB3,0xC5,0xD6,0xD7,0xE8,0xE8,0xE9 
    dw 621,75,311,98,169,233,85,133 
    db 0xA2,0xB3,0xC4,0xC5,0xD6,0xD6,0xE8,0xEA 
    dw 37,621,373,186,85,233,98,143 
    db 0x71,0x92,0xB3,0xB3,0xB3,0xC5,0xD6,0xD7 
    dw 89,621,186,266,373,124,143,110 
    db 0x81,0xA2,0xB3,0xB3,0xB3,0xC4,0xC5,0xC5 
    dw 55,81,93,110,155,207,266,466 
    db 0x20,0x30,0x30,0x40,0x60,0x71,0x81,0xB3 
    dw 169,133,81,58,62,104,311,466 
    db 0x00,0x20,0x20,0x20,0x20,0x40,0x40,0xA2 
    dw 75,155,98,85,266,117,133,466 
    db 0x40,0x50,0x50,0x60,0x61,0x61,0x71,0x81 
    dw 89,104,207,155,117,466,133,266 
    db 0x20,0x50,0x71,0x91,0xA2,0xA2,0xB3,0xC5 
    dw 72,85,110,621,373,155,133,266 
    db 0x30,0x50,0x91,0x92,0x92,0xA3,0xB3,0xD7 
    dw 93,38,104,466,186,155,133,233 
    db 0x00,0x10,0x20,0x81,0x81,0xA2,0xB3,0xD6 
    dw 37,32,81,466,124,169,143,233 
    db 0x10,0x20,0x20,0x81,0x91,0x91,0xA2,0xC4 
    dw 85,37,98,466,133,155,311,207 
    db 0x30,0x30,0x40,0x81,0x81,0x81,0x92,0x92 
    dw 33,98,36,31,37,124,143,233 
    db 0x10,0x10,0x20,0x20,0x40,0x50,0x92,0x92 
    dw 37,110,621,124,169,207,143,266 
    db 0x30,0x30,0x30,0x40,0x71,0x71,0x71,0x81 
    dw 35,36,110,932,124,373,155,266 
    db 0x00,0x00,0x00,0x10,0x10,0x50,0x61,0x71 
    dw 1864,37,35,110,373,169,207,266 
    db 0x00,0x00,0x00,0x00,0x30,0x50,0x61,0x71 
    dw 49,98,85,36,133,466,155,266 
    db 0x00,0x00,0x00,0x00,0x20,0x30,0x60,0x71 
    dw 78,38,35,93,104,37,155,233 
    db 0x00,0x00,0x00,0x00,0x00,0x00,0x50,0x71 
    dw 93,85,1864,104,117,373,155,266 
    db 0x00,0x00,0x00,0x00,0x00,0x20,0x50,0x60 
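
To read the table above in Hz, one dw/db pair can be decoded as in the sketch below. It assumes the periods follow the usual MSX PSG relation f = 3579545 / (32 * period), which the post does not state explicitly, and it splits each volume byte into its nibbles as described (PSG upper, SCC lower). The `decode_frame` helper is hypothetical.

```python
CLOCK_HZ = 3579545  # MSX master clock; assumed PSG tone f = CLOCK_HZ / (32 * period)


def decode_frame(periods, volumes):
    """Return (freq_hz, psg_volume, scc_volume) for each entry of one dw/db pair."""
    return [
        (CLOCK_HZ / (32 * period), vol >> 4, vol & 0x0F)
        for period, vol in zip(periods, volumes)
    ]


# First frame of "double" from the listing above.
frame = decode_frame(
    [311, 55, 85, 35, 233, 67, 110, 45],
    [0x71, 0x71, 0x71, 0x71, 0x81, 0x81, 0x81, 0x92],
)
for freq, psg_vol, scc_vol in frame:
    print(f"{freq:7.1f} Hz  psg={psg_vol:2d}  scc={scc_vol:2d}")
```

Under that assumption the first entry (period 311) comes out at roughly 360 Hz.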

By Louthrax

Prophet (2465)

29-05-2016, 22:35

ARTRAG wrote:

Goertzel to detect the FSK data from cassettes? Magnificent! My French is too rusty to go on.

:) Yeah, Goertzel seems to be the recommended method for FSK.

ARTRAG wrote:

Here, when looking for the local maxima, resorting to the FFT could be more efficient.
Anyway, if you want to reuse Goertzel, you could span the whole bandwidth in 5Hz steps and look for the maxima.
I think the result could be acceptable, even if the resulting coder wouldn't be very efficient.

That's what I do in the header detection phase (not using fixed steps but a binary search for the most powerful frequency).

Running several Goertzel filters in parallel is supposed to be less efficient than an FFT, but I was lazy for SofaCas. Do you have an idea of the frequency range we should check? Anyway, CPU time is not really a concern, as this is a command line tool running on modern PCs...

By ARTRAG

Enlighted (6935)

29-05-2016, 22:35

In a sense I do take into account the fact that the PSG does not produce pure tones.
I enable low-bandwidth noise on the louder PSG channel.
This tends to mask artifacts due to the higher harmonics of the square wave.

By ARTRAG

Enlighted (6935)

29-05-2016, 22:40

Louthrax wrote:
ARTRAG wrote:

Goertzel to detect the FSK data from cassettes? Magnificent! My French is too rusty to go on.

:) Yeah, Goertzel seems to be the recommended method for FSK.

ARTRAG wrote:

Here, when looking for the local maxima, resorting to the FFT could be more efficient.
Anyway, if you want to reuse Goertzel, you could span the whole bandwidth in 5Hz steps and look for the maxima.
I think the result could be acceptable, even if the resulting coder wouldn't be very efficient.

That's what I do in the header detection phase (not using fixed steps but a binary search for the most powerful frequency).

Running several Goertzel filters in parallel is supposed to be less efficient than an FFT, but I was lazy for SofaCas. Do you have an idea of the frequency range we should check? Anyway, CPU time is not really a concern, as this is a command line tool running on modern PCs...

I agree, CPU time isn't a problem here. For human voice you should span from 450Hz to 5000Hz (going up to 6000Hz may give better results for female voices). I would use 5Hz steps or less (errors of 3Hz are usually not perceived).
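
For illustration, a Goertzel sweep over that range could look like the sketch below. It is not SofaCas code: the function names are made up for the example, and the defaults simply mirror the 450-5000 Hz range and 5 Hz step mentioned above. The recurrence is the standard Goertzel one, which also works for frequencies that are not exact DFT bins.

```python
import math


def goertzel_power(samples, sample_rate, freq_hz):
    """Squared magnitude of the signal's spectrum at freq_hz (Goertzel)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def sweep(samples, sample_rate, f_lo=450.0, f_hi=5000.0, step=5.0):
    """Evaluate Goertzel on a fixed frequency grid; return [(freq_hz, power)]."""
    count = int((f_hi - f_lo) / step) + 1
    return [
        (f_lo + i * step, goertzel_power(samples, sample_rate, f_lo + i * step))
        for i in range(count)
    ]
```

The local maxima of the list returned by `sweep` then play the role of the FFT peaks in the original script.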

By ARTRAG

Enlighted (6935)

29-05-2016, 22:43

Manuel wrote:
ARTRAG wrote:

No idea, this is the main script, it compiles too.
Just replace the path where wav files are with your own path.
Someone with octave could tell if it works.

Called it sample2.m and used /tmp/ as path which contained fc.wav, but alas:

>> sample2
ii =  1
/tmp/fc.waverror: findpeaks: argument 'SORTSTR' is not a valid parameter
error: called from
    error at line 480 column 7
    parse at line 397 column 13
    findpeaks at line 136 column 5
    sample2 at line 46 column 18

This is always the problem with Octave: many commands are not complete.
Anyway, if you want to use Octave, I could avoid SORTSTR by explicitly sorting the results from findpeaks.
Let me see what I can do.
