MSX-AUDIO and the YM2151 (OPM) have a CSM mode that may be worth looking into as well.
The real problem is how to generate the data from a speech sample.
My MATLAB script basically does this:
- read a WAV file
- band-pass filter the input to 500Hz-6000Hz
- segment the audio into chunks of 1/60 of a second
- compute the power spectrum of each chunk (via FFT)
- find the 8 biggest local maxima of the power spectrum, making sure they do not mask each other
- encode their frequencies and amplitudes as MSX periods and volumes
Simple, but with some manual tweaking on the encoding side.
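The spectrum and peak-picking steps above could be sketched roughly like this in Python. It is not the MATLAB script: the naive DFT stands in for a real FFT, the `min_gap` spacing rule is only a crude placeholder for the masking check, the filtering and volume encoding are left out, and the PSG clock value and all function names are assumptions made for illustration.

```python
import cmath
import math

PSG_CLOCK_HZ = 1789772.5  # assumption: standard MSX PSG clock (3.579545 MHz / 2)

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (slow, but dependency-free)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def top_peaks(spectrum, count=8, min_gap=2):
    """Return the bins of the `count` largest local maxima.

    min_gap is a hypothetical stand-in for the masking check: a peak is
    skipped when it lies closer than min_gap bins to a louder one.
    """
    maxima = [k for k in range(1, len(spectrum) - 1)
              if spectrum[k - 1] < spectrum[k] >= spectrum[k + 1]]
    maxima.sort(key=lambda k: spectrum[k], reverse=True)
    chosen = []
    for k in maxima:
        if all(abs(k - c) >= min_gap for c in chosen):
            chosen.append(k)
        if len(chosen) == count:
            break
    return chosen

def psg_period(freq_hz):
    """PSG tone period for a frequency, clamped to the 12-bit register range."""
    return max(1, min(4095, round(PSG_CLOCK_HZ / (16 * freq_hz))))

def analyse_frame(frame, sample_rate):
    """Return (psg_period, power) pairs for one chunk, loudest first."""
    spec = power_spectrum(frame)
    bin_hz = sample_rate / len(frame)
    return [(psg_period(k * bin_hz), spec[k]) for k in top_peaks(spec)]
```

With 1/60 s chunks the bins are 60Hz apart, so `analyse_frame` can only return periods that correspond to multiples of 60Hz.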
Thanks Artrag for the details, simple & effective, I love it.
That MATLAB script could probably be ported to C code. There are lots of C FFT sources on the net. Finding the peaks and generating the data (do we need to define a special format?) should not be too difficult. We could have a nice generic command line tool (using only stdio / stdlib) that would work on all platforms...
My SofaCas tool already has the basic framework for that (parsing a .WAV file and analyzing frequencies, it's just using Goertzel instead of FFT...). Sources are available on my website. I'd be tempted to give that project a try, but that won't happen immediately...
@ARTRAG: Do you take into account that the PSG plays square waves instead of sine waves? Maybe you could, after you have found the first spectral peak, subtract this peak including all the harmonics caused by the square wave (instead of a sine wave) before looking for the 2nd peak. This *might* improve sound quality a bit.
Actually, how often do you find peaks that are harmonics of each other? In case of SCC you could create custom waveforms that replicate those harmonics. Of course that would take more data, but it should allow to more closely match the full spectra.
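The subtraction idea could look something like the sketch below. It relies on the fact that an ideal square wave at frequency f contains only odd harmonics (3f, 5f, ...) whose amplitudes are 1/3, 1/5, ... of the fundamental. It works on an amplitude (not power) spectrum indexed by FFT bin, ignores phase completely, and the function name is hypothetical, so treat it as an illustration of the suggestion rather than a tested encoder.

```python
def pick_peaks_square_aware(amplitude, count=8):
    """Greedy peak picking on an amplitude spectrum indexed by FFT bin.

    After each pick, the odd harmonics that a square wave at that bin
    would add (amplitude a/3 at 3k, a/5 at 5k, ...) are subtracted from
    the residual before the next peak is searched.
    """
    residual = list(amplitude)
    picks = []
    for _ in range(count):
        k = max(range(1, len(residual)), key=lambda i: residual[i])
        if residual[k] <= 0.0:
            break  # nothing left to encode
        a = residual[k]
        picks.append((k, a))
        residual[k] = 0.0
        n = 3
        while n * k < len(residual):
            residual[n * k] = max(0.0, residual[n * k] - a / n)
            n += 2
    return picks
```

Here a peak at bin 15 with one third of the amplitude of a peak at bin 5 would be dropped entirely, since the square wave at bin 5 already supplies it.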
No idea, this is the main script, it compiles too.
Just replace the path where wav files are with your own path.
Someone with octave could tell if it works.
Called it sample2.m and used /tmp/ as path, which contained fc.wav, but alas:
>> sample2
ii = 1
/tmp/fc.wav
error: findpeaks: argument 'SORTSTR' is not a valid parameter
error: called from
    error at line 480 column 7
    parse at line 397 column 13
    findpeaks at line 136 column 5
    sample2 at line 46 column 18
Goertzel to detect the FSK data from cassettes? Magnificent! My French is too rusty to continue.
Here, since we are looking for local maxima, resorting to the FFT could be more efficient.
Anyway, if you want to reuse Goertzel, you could sweep the whole bandwidth in 5Hz steps and look for the maxima.
I think the result could be acceptable, even if the resulting coder wouldn't be very efficient.
I do not see harmonics among the maxima, so it is very hard to take the higher harmonics of the square waves into account.
I see very little chance that the higher harmonics of a square wave fall in the right place at the right amplitude.
These are the data for "double".
If the maxima were harmonics, I would have seen some sort of progression among the periods.
Periods are sorted by amplitude; the volumes (PSG in the upper nibble, SCC in the lower) are on every second line.
dw 311,55,85,35,233,67,110,45
db 0x71,0x71,0x71,0x71,0x81,0x81,0x81,0x92
dw 186,75,621,143,47,62,67,373
db 0x71,0x81,0x91,0x92,0x92,0xA2,0xB3,0xE9
dw 56,62,75,621,69,133,233,311
db 0xA2,0xB3,0xB3,0xC4,0xD5,0xD6,0xD7,0xEA
dw 47,621,155,311,67,75,133,207
db 0xA2,0xB3,0xC5,0xD7,0xD7,0xD7,0xE9,0xFF
dw 621,155,81,110,311,72,207,124
db 0xC4,0xC5,0xD6,0xD6,0xD7,0xE9,0xEA,0xEB
dw 93,72,169,110,311,81,124,207
db 0xB3,0xB3,0xC5,0xD6,0xD7,0xE8,0xE8,0xE9
dw 621,75,311,98,169,233,85,133
db 0xA2,0xB3,0xC4,0xC5,0xD6,0xD6,0xE8,0xEA
dw 37,621,373,186,85,233,98,143
db 0x71,0x92,0xB3,0xB3,0xB3,0xC5,0xD6,0xD7
dw 89,621,186,266,373,124,143,110
db 0x81,0xA2,0xB3,0xB3,0xB3,0xC4,0xC5,0xC5
dw 55,81,93,110,155,207,266,466
db 0x20,0x30,0x30,0x40,0x60,0x71,0x81,0xB3
dw 169,133,81,58,62,104,311,466
db 0x00,0x20,0x20,0x20,0x20,0x40,0x40,0xA2
dw 75,155,98,85,266,117,133,466
db 0x40,0x50,0x50,0x60,0x61,0x61,0x71,0x81
dw 89,104,207,155,117,466,133,266
db 0x20,0x50,0x71,0x91,0xA2,0xA2,0xB3,0xC5
dw 72,85,110,621,373,155,133,266
db 0x30,0x50,0x91,0x92,0x92,0xA3,0xB3,0xD7
dw 93,38,104,466,186,155,133,233
db 0x00,0x10,0x20,0x81,0x81,0xA2,0xB3,0xD6
dw 37,32,81,466,124,169,143,233
db 0x10,0x20,0x20,0x81,0x91,0x91,0xA2,0xC4
dw 85,37,98,466,133,155,311,207
db 0x30,0x30,0x40,0x81,0x81,0x81,0x92,0x92
dw 33,98,36,31,37,124,143,233
db 0x10,0x10,0x20,0x20,0x40,0x50,0x92,0x92
dw 37,110,621,124,169,207,143,266
db 0x30,0x30,0x30,0x40,0x71,0x71,0x71,0x81
dw 35,36,110,932,124,373,155,266
db 0x00,0x00,0x00,0x10,0x10,0x50,0x61,0x71
dw 1864,37,35,110,373,169,207,266
db 0x00,0x00,0x00,0x00,0x30,0x50,0x61,0x71
dw 49,98,85,36,133,466,155,266
db 0x00,0x00,0x00,0x00,0x20,0x30,0x60,0x71
dw 78,38,35,93,104,37,155,233
db 0x00,0x00,0x00,0x00,0x00,0x00,0x50,0x71
dw 93,85,1864,104,117,373,155,266
db 0x00,0x00,0x00,0x00,0x00,0x20,0x50,0x60
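To read the table, a small decoder like the following could help. It assumes the `dw` values are PSG tone periods for the standard MSX PSG clock (1789772.5 Hz, with frequency = clock / (16 * period)); the function name is hypothetical. The nibble split follows the description above.

```python
PSG_CLOCK_HZ = 1789772.5  # assumption: standard MSX PSG clock

def decode_frame(periods, volumes):
    """Return (frequency_hz, psg_volume, scc_volume) for each peak of a frame."""
    decoded = []
    for period, vol in zip(periods, volumes):
        freq = PSG_CLOCK_HZ / (16 * period)
        decoded.append((freq, vol >> 4, vol & 0x0F))  # upper nibble PSG, lower SCC
    return decoded

# First frame of the data for "double".
periods = [311, 55, 85, 35, 233, 67, 110, 45]
volumes = [0x71, 0x71, 0x71, 0x71, 0x81, 0x81, 0x81, 0x92]
for freq, psg_vol, scc_vol in decode_frame(periods, volumes):
    print(f"{freq:7.1f} Hz  PSG vol {psg_vol:2d}  SCC vol {scc_vol:2d}")
```

Under that assumption the first entry (period 311) comes out at roughly 360Hz, and the larger periods in the table (621, 466, 373, ...) land close to multiples of 60Hz, which is what 1/60 s chunks would give.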
Yeah, Goertzel seems like the recommended method for FSK.
That's what I do in the header detection phase (not using fixed steps but a binary search for the most powerful frequency).
Running several Goertzel filters in parallel is supposed to be less efficient than an FFT, but I was lazy for SofaCas. Do you have an idea of the frequency range we should check? Anyway, CPU time is not really a concern as this is a command line tool running on modern PCs...
In a sense I do take into account the fact that the PSG does not produce pure tones.
I enable low bandwidth noise on the louder PSG channel.
This tends to mask artifacts due to the higher harmonics of the square wave.
I agree, CPU time isn't a problem here. For human voice you should span from 450Hz to 5000Hz (maybe 6000Hz gives better results for female voices). I would use 5Hz steps or less (errors of 3Hz are usually not perceived).
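A Goertzel sweep over that range could be sketched as follows. This is not SofaCas code: the function names are hypothetical, and the defaults simply reuse the 450Hz-5000Hz range and 5Hz step suggested above.

```python
import math

def goertzel_power(samples, sample_rate, freq_hz):
    """Signal power at one (not necessarily bin-aligned) frequency."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    coeff = 2.0 * math.cos(w)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def sweep(samples, sample_rate, lo_hz=450.0, hi_hz=5000.0, step_hz=5.0):
    """Return (frequency, power) pairs for every step in [lo_hz, hi_hz]."""
    steps = int(round((hi_hz - lo_hz) / step_hz)) + 1
    freqs = [lo_hz + i * step_hz for i in range(steps)]
    return [(f, goertzel_power(samples, sample_rate, f)) for f in freqs]

def local_maxima(points):
    """Keep the sweep points that are larger than both neighbours."""
    return [points[i] for i in range(1, len(points) - 1)
            if points[i - 1][1] < points[i][1] >= points[i + 1][1]]
```

Sorting the output of `local_maxima(sweep(chunk, rate))` by power and keeping the strongest eight would then give candidates for the period/volume encoding.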
This is always the problem with Octave. Many commands are not complete.
Anyway, if you want to use Octave, I could avoid SORTSTR by explicitly sorting the results from findpeaks.
Let me see what I can do.