Robert Keller – Feedback using a phase vocoder

Hi, for my feedback with found systems assignment, I used a phase vocoder that I improved (initially from http://sethares.engr.wisc.edu/vocoders/matlabphasevocoder.html). A phase vocoder is essentially a system that slows down or speeds up audio without altering the pitch of the audio. Designing an industry standard phase vocoder is rather difficult, and the implementation below is not without it’s shortcomings. Namely, the process of speeding down audio using the vocoder below introduces a “phasey” aspect to the resulting audio. The phasey artifacts introduced by the vocoder are initially passable, but if one were to say, slow the initial audio down by a factor of 2, then speed the audio up by a factor of 2, and then repeat the process 20+ times, one would drastically alter the original quality of the audio. The final result is incredibly phasey. The initial piece of audio I used was from the song “Message in a Bottle” by the Police. Below is the resulting audio and code from Matlab.

function [finalvector] = vocoder(inputvector,speedupfactor,slowdownfactor,time)%#codegen
%vocoder takes in 3 inputs, a vector and 2 speed factors returns a vector
% vector input has 1 row and a bunch of columns
%that has been slowed down or sped up
if nargin == 3
time=0; % total time to process
end
hopin=speedupfactor; % hop length for input
hopout=slowdownfactor; % hop length for output
all2pi=2*pi*(0:100); % all multiples of 2 pi (used in PV-style freq search)
max_peak=50; % parameters for peak finding: number of peaks
eps_peak=0.005; % height of peaks
nfft=2^12; nfft2=nfft/2; % fft length
win=hanning(nfft)'; % windows and windowing variables
%[y,sr]=audioread(infile); % read song file
sr = 44100;
y = inputvector;
%siz=size(y); % length of song in samples
%stmo=min(siz); leny=max(siz); % stereo (stmo=2) or mono (stmo=1)
leny = length(y);
%if(siz(1)>siz(2))
%finalsignallength = round(siz(1) * (slowdownfactor/speedupfactor)); %this line needs to adapt
%else
%finalsignallength = round(siz(2) * (slowdownfactor/speedupfactor));
%end
%does not work if siz(2)<siz(1)
%if siz(2)<siz(1), y=y'; end
if time==0, time = 100000; end
tt=zeros(1,ceil(hopout/hopin)*min(leny,time*sr)); % place for output
lenseg=floor((min(leny,time*sr)-nfft)/hopin); % number of nfft segments to process
ssf=sr*(0:nfft2)/nfft; % frequency vector
phold=zeros(1,nfft2+1); phadvance=zeros(1,nfft2+1);
outbeat=zeros(1,nfft); pold1=[]; pold2=[];
dtin=hopin/sr; % time advances dt per hop for input
dtout=hopout/sr; % time advances dt per hop for output
for k=1:lenseg-1 % main loop - process each beat separately
%if k/1000==round(k/1000), disp(k), end % for display so I know where we are
indin=round(((k-1)*hopin+1):((k-1)*hopin+nfft));
% do L/R channels separately
s=win.*y(1,indin); % get this frame and take FFT
ffts=fft(s);
mag=abs(ffts(1:nfft2+1));
ph=angle(ffts(1:nfft2+1));
% find peaks to define spectral mapping
peaks=findPeaks4(mag, max_peak, eps_peak, ssf);
[dummy,inds]=sort(mag(peaks(:,2)));
peaksort=peaks(inds,:);
pc=peaksort(:,2);
bestf=zeros(size(pc));
for tk=1:length(pc) % estimate frequency using PV strategy
dtheta=(ph(pc(tk))-phold(1,pc(tk)))+all2pi;
fest=dtheta./(2*pi*dtin); % see pvanalysis.m for same idea
[er,indf]=min(abs(ssf(pc(tk))-fest));
bestf(tk)=fest(indf); % find best freq estimate for each row
end
% generate output mag and phase
magout=mag; phout=ph;
for tk=1:length(pc)
fdes=bestf(tk); % reconstruct with original frequency
freqind=(peaksort(tk,1):peaksort(tk,3)); % indices of the surrounding bins
% specify magnitude and phase of each partial
magout(freqind)=mag(freqind);
phadvance(1,peaksort(tk,2))=phadvance(1,peaksort(tk,2))+2*pi*fdes*dtout;
pizero=pi*ones(1,length(freqind));
pcent=peaksort(tk,2)-peaksort(tk,1)+1;
indpc=(2-mod(pcent,2)):2:length(freqind);
pizero(indpc)=zeros(1,length(indpc));
phout(freqind)=phadvance(1,peaksort(tk,2))+pizero;
end
% reconstruct time signal (stretched or compressed)
compl=complex(magout.*exp(sqrt(complex(-1))*phout));
compl(nfft2+1)=ffts(nfft2+1); %right hand is complex
compl=[compl,fliplr(conj(compl(2:(nfft2))))];
wave=real(ifft(compl));
outbeat(1,:)=wave;
phold(1,:)=ph;
% end stereo
indout=round(((k-1)*hopout+1):((k-1)*hopout+nfft));
tt(:,indout)=tt(:,indout)+outbeat;
end
tt=0.8*tt/max(max(abs(tt)));
%vectorbeforetruncation = tt;
%tt = tt(1:2,1:finalsignallength); %fixthis
%testvector = tt(1,1:end);
i1 = find(tt,1,'last');
finalvector = tt(1,1:i1);
%finalvector = newtt;
%finalvector = tt;
end
view raw phasevocoder.m hosted with ❤ by GitHub

Here is the script I used to generate the audio:

[a,sr]=audioread('messageinabottlemono.wav');
b = a(882000:1323000,1);
audiowrite('messageinabottle1.wav',b,44100);
sliceofsignal = b';
resultsloweddown = vocoder(sliceofsignal,250,500);
resultspedup = vocoder(resultsloweddown,500,250);
audiowrite('messageinabottle2.wav',resultspedup,44100);
resultsloweddown = vocoder(resultspedup,250,500);
resultspedup = vocoder(resultsloweddown,500,250);
audiowrite('messageinabottle3.wav',resultspedup,44100);
resultsloweddown = vocoder(resultspedup,250,500);
resultspedup = vocoder(resultsloweddown,500,250);
audiowrite('messageinabottle4.wav',resultspedup,44100);
resultsloweddown = vocoder(resultspedup,250,500);
resultspedup = vocoder(resultsloweddown,500,250);
audiowrite('messageinabottle5.wav',resultspedup,44100);
for c = 1:4
resultsloweddown = vocoder(resultspedup,250,500);
resultspedup = vocoder(resultsloweddown,500,250);
end
audiowrite('messageinabottle9.wav',resultspedup,44100);
for c = 1:11
resultsloweddown = vocoder(resultspedup,250,500);
resultspedup = vocoder(resultsloweddown,500,250);
end
audiowrite('messageinabottle20.wav',resultspedup,44100);

Original Audio:

Audio Player

Audio passed through phase vocoder once:

Audio Player

Audio passed through twice:

Audio Player

Audio passed through three times:

Audio Player

Audio passed through nine times:

Audio Player

Audio passed through fifteen times:

Audio Player

Audio passed through twenty times:

Audio Player

Audio passed through thirty times:

Audio Player