Voice synthesis on ISR

Page 7/36

By Louthrax

Prophet (2492)

29-05-2016, 22:51

ARTRAG wrote:

I agree, cpu time here isn't a problem. For human voice you should span from 450Hz to 5000Hz (maybe 6000Hz for female voices gives better results). I would use 5Hz steps or less (errors of 3Hz are usually not perceived).

Mmmm, that makes about 1000 parallel Goertzel filters; it might be a bit slow... I'm tempted to first add a real FFT to SofaCas for clean header frequency detection. Converting the project to "WAV2MSXVOICE (??)" would then be easy.

By ARTRAG

Enlighted (6976)

29-05-2016, 22:56

@Manuel
Try this version in Octave:


close all;
clear;
path = 'wav\SKYJAGUAR\';

names = dir([path '*.wav']);
nfiles = size(names,1);

for ii = 1:nfiles 
    ii
    name = [ path names(ii).name];

    [Y,FS] = audioread(name);   % wavread() is no longer available in recent Octave/MATLAB releases
    if size(Y,2)>1
     X = Y(:,1)+Y(:,2);
    else
     X = Y;
    end

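    % butter() expects cutoffs normalised to the Nyquist frequency FS/2
    % (in Octave, butter and findpeaks need "pkg load signal")
    Wn = [450, 6000]/(FS/2);
    
    [Bbp,Abp]=butter(5,Wn); 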

    Tntsc = 1/60;

    Nntsc = fix(Tntsc*FS);

    Nblk = fix(length(X)/Nntsc);
    X = X(1:Nblk*Nntsc);
    L = length(X);

    t = [1:Nntsc]';

    Y = zeros(Nblk*Nntsc,1);
    f = zeros(Nblk,8);
    a = zeros(Nblk,8);
    p = zeros(1,8);
    
    X = filter(Bbp,Abp,X);
    
    for i=1:Nblk
        x = X((i-1)*Nntsc+1:i*Nntsc);

        XF = abs(fft(x));
        [pks,locs]= findpeaks(XF(1:round(Nntsc/2))); 
        [y,j]=sort(pks,'descend');
        pks = pks(j);
        locs = locs(j);

        if size(pks,1)<8
            y = zeros(1,8);
            y(1:length(pks)) = pks;
            pks = y;
            y = round(Nntsc/2)-7:round(Nntsc/2);
            y(1:length(locs)) = locs;
            locs = y;
        end
        
         pks = pks(8:-1:1);
         locs = locs(8:-1:1);
        
        y = zeros(size(x));
        freq = zeros(1,8);
        amp  = zeros(1,8);
        
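        for  ti=1:8
            j = locs(ti);   
            freq(ti) = (j-1)/Nntsc*FS;      % FFT bin index -> Hz
            amp(ti) = abs(XF(j))/Nntsc;
            y = y + amp(ti)*(sin(2*pi*freq(ti)*t/FS+p(ti)));
            % accumulate the phase so that slot ti starts the next block where this one ended
            p(ti) = mod(p(ti) + 2*pi*freq(ti)*t(end)/FS, 2*pi);
        end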

        Y((i-1)*Nntsc+1:i*Nntsc) =  y;
        f(i,:) = freq;
        a(i,:) = amp;
    end

    sound(X,FS)
    sound(Y,FS)

%     figure (ii)
%     subplot(2,1,1),plot(abs(fftshift(fft(x))));
%     subplot(2,1,2),plot(abs(fftshift(fft(y))));
%     plot(abs(fftshift(fft(X))));
%     hold on
%     plot(abs(fftshift(fft(Y))),'r');
%     pause(0.5)    
%    wavwrite(Y,FS,NBITS,[name 'out.wav'])

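    % tone period values for a 3579545 Hz clock (period is inversely proportional to f)
    TP = uint16(3579545./(32*f));

    m = max(a(:));
    nscc = uint8(a/m*15);           % SCC volume: linear, 0..15

    npsg = 2*log2(a/m)+15;          % PSG volume: logarithmic, one step per factor sqrt(2)
    npsg(isinf(npsg))=0;            % silent channels (a == 0) give -Inf
    npsg = uint8(ceil(npsg));

    n = npsg*16+nscc;               % PSG and SCC volumes packed in the same byte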
	
    fid = fopen([name 'frm_scc3.txt'],'w');
    for i = 1:Nblk
        fprintf(fid,'    dw %d,%d,%d,%d,%d,%d,%d,%d \n',TP(i,1),TP(i,2),TP(i,3),TP(i,4),TP(i,5),TP(i,6),TP(i,7),TP(i,8));
        fprintf(fid,'    db 0x%s,0x%s,0x%s,0x%s,0x%s,0x%s,0x%s,0x%s \n',dec2hex(n(i,1),2),dec2hex(n(i,2),2),dec2hex(n(i,3),2),dec2hex(n(i,4),2),dec2hex(n(i,5),2),dec2hex(n(i,6),2),dec2hex(n(i,7),2),dec2hex(n(i,8),2));
    end
    fclose(fid);

end

fid = fopen('frm_scc3.txt','w');
fprintf(fid,'nfiles: equ  %d \n\n',nfiles);

fprintf(fid,'   page 0\n');

fprintf(fid,'frames: \n');


for ii = 1:nfiles
    fprintf(fid,'   dw frame%d\n',ii-1);
    fprintf(fid,'   db :frame%d\n',ii-1);
end

for ii = 1:nfiles 
    fprintf(fid,'   page 1..31\n');
    name = [ path names(ii).name];
    fprintf(fid,'frame%d: \n',ii-1);
    fprintf(fid,'   include %s \n',[name 'frm_scc3.txt']);
    fprintf(fid,'   db	080h\n');
end
fclose(fid);

% "!" is a MATLAB shell escape; under Octave call the assembler through system()
system('sjasm -Iasm -s sccLOFI3.asm');
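
The conversion at the end of the script (frequency to tone period, amplitude to PSG and SCC volume) can be checked by hand with a small Python sketch. The function names are made up for illustration; the formulas are the ones in the script, and the clamping to 0..15 is an explicit version of what the uint8 conversion and the 4-bit packing imply there.

```python
import math

CLOCK_HZ = 3579545  # clock constant used in the Octave script


def tone_period(freq_hz):
    """Period value as in the script: clock / (32 * f), rounded to nearest."""
    return round(CLOCK_HZ / (32 * freq_hz))


def psg_volume(rel_amp):
    """Logarithmic volume 0..15, two steps per halving of amplitude (rel_amp in 0..1)."""
    if rel_amp <= 0:
        return 0
    return max(0, min(15, math.ceil(2 * math.log2(rel_amp) + 15)))


def scc_volume(rel_amp):
    """Linear volume 0..15 (rel_amp in 0..1)."""
    return max(0, min(15, round(rel_amp * 15)))


def pack_volumes(rel_amp):
    """PSG volume in the high nibble, SCC volume in the low nibble."""
    return psg_volume(rel_amp) * 16 + scc_volume(rel_amp)
```

For example, `tone_period(440)` gives 254, and a component at half the maximum amplitude gets PSG volume 13.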
 

By ARTRAG

Enlighted (6976)

29-05-2016, 23:17

Louthrax wrote:
ARTRAG wrote:

I agree, cpu time here isn't a problem. For human voice you should span from 450Hz to 5000Hz (maybe 6000Hz for female voices gives better results). I would use 5Hz steps or less (errors of 3Hz are usually not perceived).

Mmmm, that makes about 1000 parallel Goertzel filters; it might be a bit slow... I'm tempted to first add a real FFT to SofaCas for clean header frequency detection. Converting the project to "WAV2MSXVOICE (??)" would then be easy.

Why not run the binary search for maxima 50 times, each in a different 100 Hz interval, and keep the 8 highest results?
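
A rough Python sketch of that idea, under stated assumptions: the band is split into 100 Hz intervals, the strongest frequency in each interval is found with a Goertzel filter, and the 8 strongest intervals are kept. A plain 5 Hz scan stands in for the binary search here, and the function names and default limits are hypothetical.

```python
import math


def goertzel_power(samples, freq_hz, fs):
    """Squared DFT magnitude of `samples` at one frequency (Goertzel recurrence)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / fs)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def strongest_tones(samples, fs, f_lo=450.0, f_hi=5450.0,
                    band=100.0, step=5.0, keep=8):
    """Best frequency per `band`-wide interval, then the `keep` strongest overall."""
    per_band = []
    n_steps = int(round(band / step))
    f0 = f_lo
    while f0 < f_hi:
        scored = [(goertzel_power(samples, f0 + k * step, fs), f0 + k * step)
                  for k in range(n_steps)]
        per_band.append(max(scored))  # (power, frequency) of the band's maximum
        f0 += band
    per_band.sort(reverse=True)
    return [freq for _, freq in per_band[:keep]]


# Two test tones: 1000 Hz at full level and 2500 Hz at half level.
fs = 22050
samples = [math.sin(2 * math.pi * 1000 * i / fs)
           + 0.5 * math.sin(2 * math.pi * 2500 * i / fs) for i in range(441)]
top = strongest_tones(samples, fs)
```

Each frame still costs one Goertzel pass per candidate frequency, so this does not reduce the work compared with a full 5 Hz grid; a real search inside each interval would be needed for that.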

By Manuel

Ascended (19676)

29-05-2016, 23:26

ARTRAG wrote:

@Manuel
Try this version in octave

!sjasm -Iasm -s sccLOFI3.asm
 

If I leave out the sjasm line, I get output :) Not sure whether it's correct, though :)

By ARTRAG

Enlighted (6976)

29-05-2016, 23:31

I've sent you a link to my asm files.
Try them with your output included.

By NYYRIKKI

Enlighted (6089)

30-05-2016, 01:01

ARTRAG wrote:

Totally true! This is why I was asking for snippets to detect OPLL and play tones at variable volumes and frequencies. OPLL is still obscure to me, but it should be perfectly suitable for the purpose.

I don't have any ready-made snippets for you, but it's good that you are good at math. :) The frequency finally played is the result of a frequency number (F-number), an octave (BLOCK) and a multiplier (MUL).

See here: https://www.msx.org/wiki/MSX-Music_programming#FM-PAC
and here: http://www.smspower.org/maxim/Documents/YM2413ApplicationMan...
(Volume explained in page 17)
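
As a rough illustration of that relation, here is a Python sketch that picks an F-number and BLOCK for a target frequency. It relies on my reading of the YM2413 application manual (9-bit F-number, 3-bit BLOCK, sample rate of clock/72) and assumes MUL is set so that the multiplier is 1 and the MSX clock is 3579545 Hz; the function name is made up.

```python
def opll_fnum_block(freq_hz, clock_hz=3579545.0):
    """Return (F-number, BLOCK) for freq_hz, assuming a frequency multiplier of 1.

    Assumed relation: freq = fnum * fsam / 2**(19 - block), with fsam = clock / 72.
    The lowest usable BLOCK is chosen because it gives the finest frequency step.
    """
    fsam = clock_hz / 72.0
    for block in range(8):
        fnum = round(freq_hz * 2 ** (19 - block) / fsam)
        if fnum <= 511:  # F-number must fit in 9 bits
            return fnum, block
    raise ValueError("frequency too high for F-number/BLOCK with multiplier 1")


def opll_frequency(fnum, block, clock_hz=3579545.0):
    """Inverse of the relation above, for checking the rounding error."""
    return fnum * (clock_hz / 72.0) / 2 ** (19 - block)
```

With these assumptions, 440 Hz maps to F-number 290 in BLOCK 4, and BLOCK 7 reaches roughly 6.2 kHz, which would cover the voice band discussed earlier in the thread.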

By hit9918

Prophet (2932)

30-05-2016, 00:43

wouter_ wrote:

@ARTRAG: Do you take into account that the PSG plays square waves instead of sine waves? Maybe you could, after you have found the first spectral peak, subtract this peak together with all the harmonics caused by the square wave (instead of a sine wave) before looking for the 2nd peak. This *might* improve sound quality a bit.

I wondered about this one, too.
I think Fourier is "multiply the wave with a question-sine; the integral is the amplitude of that question frequency".
Now what if one uses question-squares instead of sines?
Maybe that's all, one needs to do no more about it, and the resulting peak graph is ideal for the PSG.
In that graph a 100 Hz square would give a peak at 100 Hz and no peak at 300 Hz:
"only a 100 Hz instruction, the 300 Hz is implied", the square wave machine's diagram :)
The whole distribution might turn out slightly different, with different tones coming out as the most important.
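
The idea is easy to try numerically. The Python sketch below correlates a square wave with square "question" waves over one full period of the signal; the function names and the 48 kHz sample rate are arbitrary choices for illustration.

```python
def square(n, period):
    """+1 for the first half of each period, -1 for the second half."""
    return 1 if (n % period) < period // 2 else -1


def square_correlation(f_signal, f_probe, fs=48000, n_samples=480):
    """Normalised correlation between a square signal and a square probe wave."""
    p_sig = fs // f_signal
    p_probe = fs // f_probe
    total = sum(square(n, p_sig) * square(n, p_probe) for n in range(n_samples))
    return total / n_samples


at_100 = square_correlation(100, 100)  # probe at the signal frequency
at_200 = square_correlation(100, 200)  # even multiple
at_300 = square_correlation(100, 300)  # odd multiple
```

In this sketch the 100 Hz probe gives 1, the 200 Hz probe gives 0, and the 300 Hz probe still gives one third, because square waves at odd multiples are not orthogonal to each other. So the 300 Hz peak does not vanish on its own, and a subtraction step like the one wouter_ describes may still be needed.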

By hit9918

Prophet (2932)

30-05-2016, 04:05

poke &hfd10,&h38 : x = usr(0)
Poke it before every call to usr.

This poke improves the PSG a lot :D
The noise channel was on.
With that and the improvements the last ROM brought, a naked PSG now does better than the SCC did before.
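
For readers wondering what &h38 means: assuming the byte at &hFD10 is the value the player writes to PSG register 7 (the mixer), which the thread does not state explicitly, it can be decoded with a short Python sketch. In the AY-3-8910 mixer a 0 bit enables a source; bits 0-2 are tone A-C and bits 3-5 are noise A-C.

```python
def decode_psg_mixer(value):
    """Decode the low six bits of an AY-3-8910 mixer value (0 bit = enabled)."""
    channels = "ABC"
    tone = {ch: ((value >> i) & 1) == 0 for i, ch in enumerate(channels)}
    noise = {ch: ((value >> (i + 3)) & 1) == 0 for i, ch in enumerate(channels)}
    return tone, noise


tone_on, noise_on = decode_psg_mixer(0x38)
```

Under that assumption, &h38 leaves all three tone channels enabled and switches noise off on all of them.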

By ARTRAG

Enlighted (6976)

30-05-2016, 08:11

Noise was added on purpose to mask the higher harmonics of the square waves.
Maybe one can tune it by adding it to a weaker channel, but it improves the perceived quality (unless your audio system is severely filtering high frequencies).

By ARTRAG

Enlighted (6976)

30-05-2016, 08:49

Wouter,
I looked more closely at the data and, allowing for some error, one can perhaps see harmonics.
The fact is that taking this aspect into account is not easy. One could design the SCC wave to match those higher harmonics.
This would give a better spectral approximation but, analysis apart (the algorithm is still to be invented), the player would have to mix tones at different frequencies with different amplitudes and update the 4 waves at each interrupt.
In this way one could encode in the data only the relative amplitudes of the higher harmonics and encode voice without making the data explode.
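
A minimal Python sketch of what such a player-side step could look like, assuming the data stores relative harmonic amplitudes and the wave is rebuilt as a 32-sample signed 8-bit SCC waveform. The function name and the normalisation to a peak of 127 are illustrative choices, not taken from the thread.

```python
import math


def scc_wave(harmonic_amps, length=32):
    """Build one SCC waveform (signed 8-bit samples) from relative harmonic amplitudes.

    harmonic_amps[0] is the fundamental, harmonic_amps[1] the 2nd harmonic, and so on.
    """
    raw = [sum(a * math.sin(2 * math.pi * (h + 1) * i / length)
               for h, a in enumerate(harmonic_amps))
           for i in range(length)]
    peak = max(abs(v) for v in raw) or 1.0  # avoid dividing by zero for silence
    return [int(round(127 * v / peak)) for v in raw]


pure = scc_wave([1.0])             # fundamental only
rich = scc_wave([1.0, 0.5, 0.25])  # fundamental plus two weaker harmonics
```

With only 32 samples per period, harmonics above the 15th cannot be represented, so the amplitude list would stay short.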
