Hi, for my feedback with found systems assignment, I used a phase vocoder that I improved (initially from http://sethares.engr.wisc.edu/vocoders/matlabphasevocoder.html). A phase vocoder is essentially a system that slows down or speeds up audio without altering the pitch of the audio. Designing an industry standard phase vocoder is rather difficult, and the implementation below is not without it’s shortcomings. Namely, the process of speeding down audio using the vocoder below introduces a “phasey” aspect to the resulting audio. The phasey artifacts introduced by the vocoder are initially passable, but if one were to say, slow the initial audio down by a factor of 2, then speed the audio up by a factor of 2, and then repeat the process 20+ times, one would drastically alter the original quality of the audio. The final result is incredibly phasey. The initial piece of audio I used was from the song “Message in a Bottle” by the Police. Below is the resulting audio and code from Matlab.
function [finalvector] = vocoder(inputvector,speedupfactor,slowdownfactor,time)%#codegen | |
%vocoder takes in 3 inputs, a vector and 2 speed factors returns a vector | |
% vector input has 1 row and a bunch of columns | |
%that has been slowed down or sped up | |
if nargin == 3 | |
time=0; % total time to process | |
end | |
hopin=speedupfactor; % hop length for input | |
hopout=slowdownfactor; % hop length for output | |
all2pi=2*pi*(0:100); % all multiples of 2 pi (used in PV-style freq search) | |
max_peak=50; % parameters for peak finding: number of peaks | |
eps_peak=0.005; % height of peaks | |
nfft=2^12; nfft2=nfft/2; % fft length | |
win=hanning(nfft)'; % windows and windowing variables | |
%[y,sr]=audioread(infile); % read song file | |
sr = 44100; | |
y = inputvector; | |
%siz=size(y); % length of song in samples | |
%stmo=min(siz); leny=max(siz); % stereo (stmo=2) or mono (stmo=1) | |
leny = length(y); | |
%if(siz(1)>siz(2)) | |
%finalsignallength = round(siz(1) * (slowdownfactor/speedupfactor)); %this line needs to adapt | |
%else | |
%finalsignallength = round(siz(2) * (slowdownfactor/speedupfactor)); | |
%end | |
%does not work if siz(2)<siz(1) | |
%if siz(2)<siz(1), y=y'; end | |
if time==0, time = 100000; end | |
tt=zeros(1,ceil(hopout/hopin)*min(leny,time*sr)); % place for output | |
lenseg=floor((min(leny,time*sr)-nfft)/hopin); % number of nfft segments to process | |
ssf=sr*(0:nfft2)/nfft; % frequency vector | |
phold=zeros(1,nfft2+1); phadvance=zeros(1,nfft2+1); | |
outbeat=zeros(1,nfft); pold1=[]; pold2=[]; | |
dtin=hopin/sr; % time advances dt per hop for input | |
dtout=hopout/sr; % time advances dt per hop for output | |
for k=1:lenseg-1 % main loop - process each beat separately | |
%if k/1000==round(k/1000), disp(k), end % for display so I know where we are | |
indin=round(((k-1)*hopin+1):((k-1)*hopin+nfft)); | |
% do L/R channels separately | |
s=win.*y(1,indin); % get this frame and take FFT | |
ffts=fft(s); | |
mag=abs(ffts(1:nfft2+1)); | |
ph=angle(ffts(1:nfft2+1)); | |
% find peaks to define spectral mapping | |
peaks=findPeaks4(mag, max_peak, eps_peak, ssf); | |
[dummy,inds]=sort(mag(peaks(:,2))); | |
peaksort=peaks(inds,:); | |
pc=peaksort(:,2); | |
bestf=zeros(size(pc)); | |
for tk=1:length(pc) % estimate frequency using PV strategy | |
dtheta=(ph(pc(tk))-phold(1,pc(tk)))+all2pi; | |
fest=dtheta./(2*pi*dtin); % see pvanalysis.m for same idea | |
[er,indf]=min(abs(ssf(pc(tk))-fest)); | |
bestf(tk)=fest(indf); % find best freq estimate for each row | |
end | |
% generate output mag and phase | |
magout=mag; phout=ph; | |
for tk=1:length(pc) | |
fdes=bestf(tk); % reconstruct with original frequency | |
freqind=(peaksort(tk,1):peaksort(tk,3)); % indices of the surrounding bins | |
% specify magnitude and phase of each partial | |
magout(freqind)=mag(freqind); | |
phadvance(1,peaksort(tk,2))=phadvance(1,peaksort(tk,2))+2*pi*fdes*dtout; | |
pizero=pi*ones(1,length(freqind)); | |
pcent=peaksort(tk,2)-peaksort(tk,1)+1; | |
indpc=(2-mod(pcent,2)):2:length(freqind); | |
pizero(indpc)=zeros(1,length(indpc)); | |
phout(freqind)=phadvance(1,peaksort(tk,2))+pizero; | |
end | |
% reconstruct time signal (stretched or compressed) | |
compl=complex(magout.*exp(sqrt(complex(-1))*phout)); | |
compl(nfft2+1)=ffts(nfft2+1); %right hand is complex | |
compl=[compl,fliplr(conj(compl(2:(nfft2))))]; | |
wave=real(ifft(compl)); | |
outbeat(1,:)=wave; | |
phold(1,:)=ph; | |
% end stereo | |
indout=round(((k-1)*hopout+1):((k-1)*hopout+nfft)); | |
tt(:,indout)=tt(:,indout)+outbeat; | |
end | |
tt=0.8*tt/max(max(abs(tt))); | |
%vectorbeforetruncation = tt; | |
%tt = tt(1:2,1:finalsignallength); %fixthis | |
%testvector = tt(1,1:end); | |
i1 = find(tt,1,'last'); | |
finalvector = tt(1,1:i1); | |
%finalvector = newtt; | |
%finalvector = tt; | |
end |
Here is the script I used to generate the audio:
[a,sr]=audioread('messageinabottlemono.wav'); | |
b = a(882000:1323000,1); | |
audiowrite('messageinabottle1.wav',b,44100); | |
sliceofsignal = b'; | |
resultsloweddown = vocoder(sliceofsignal,250,500); | |
resultspedup = vocoder(resultsloweddown,500,250); | |
audiowrite('messageinabottle2.wav',resultspedup,44100); | |
resultsloweddown = vocoder(resultspedup,250,500); | |
resultspedup = vocoder(resultsloweddown,500,250); | |
audiowrite('messageinabottle3.wav',resultspedup,44100); | |
resultsloweddown = vocoder(resultspedup,250,500); | |
resultspedup = vocoder(resultsloweddown,500,250); | |
audiowrite('messageinabottle4.wav',resultspedup,44100); | |
resultsloweddown = vocoder(resultspedup,250,500); | |
resultspedup = vocoder(resultsloweddown,500,250); | |
audiowrite('messageinabottle5.wav',resultspedup,44100); | |
for c = 1:4 | |
resultsloweddown = vocoder(resultspedup,250,500); | |
resultspedup = vocoder(resultsloweddown,500,250); | |
end | |
audiowrite('messageinabottle9.wav',resultspedup,44100); | |
for c = 1:11 | |
resultsloweddown = vocoder(resultspedup,250,500); | |
resultspedup = vocoder(resultsloweddown,500,250); | |
end | |
audiowrite('messageinabottle20.wav',resultspedup,44100); |
Original Audio:
Audio PlayerAudio passed through phase vocoder once:
Audio PlayerAudio passed through twice:
Audio PlayerAudio passed through three times:
Audio PlayerAudio passed through nine times:
Audio PlayerAudio passed through fifteen times:
Audio PlayerAudio passed through twenty times:
Audio PlayerAudio passed through thirty times:
Audio Player