[ieee 2011 ieee international symposium on circuits and systems (iscas) - rio de janeiro, brazil...

Subband Blind Source Separation Considering Acoustic Reverberation Characteristics

Paulo Bulkool Batalheiro
State University of Rio de Janeiro
FEN/DETEL/PROSAICO
20559-900 Rio de Janeiro, Brazil
Email: [email protected]

Mariane Rembold Petraglia
Federal University of Rio de Janeiro
PEE/COPPE
CP 68504, 21945-970 Rio de Janeiro, Brazil
Email: [email protected]

Diego Barreto Haddad
Federal Center for Technological Education, CEFET-RJ
26041-271, Nova Iguacu, Rio de Janeiro, Brazil
Email: [email protected]

Abstract: Subband blind source separation methods have been recently proposed with the objective of reducing the computational complexity and improving the convergence rate of the online algorithms. Oversampled subband structures with DFT filter banks are usually employed in order to avoid aliasing effects and keep enough samples to estimate the statistics of the subband signals. In this paper we present a critically sampled subband structure, composed of cosine-modulated filter banks and reduced-order adaptive subfilters, for convolutive blind source separation. Its performance is compared to those of the fullband algorithm, of an oversampled subband algorithm and of a frequency-domain algorithm. We also evaluate, through computer simulations, the impact of reducing the order of the high-frequency subfilters on the separation results for different reverberation characteristics.

I. INTRODUCTION

Blind source separation (BSS) techniques have been extensively investigated recently, allowing the extraction of the signal of a desired source $s_q(n)$ from mixed signals $x_p(n)$ of more than one source without any other knowledge of the original sources, such as their positions or spectral contents, nor of the mixing process. Examples of applications of BSS are speech enhancement/recognition (cocktail party problem) and digital communications, among others. This paper considers convolutive mixtures of speech signals, which take into account the reverberation in echoic environments. In such cases, finite impulse response (FIR) separation filters of large orders are usually required, making the separation task very difficult. To solve this problem, several time-domain and frequency-domain methods based on independent component analysis (ICA) have been proposed in the literature.

Some of these solutions employ FIR separation filters, whose coefficients are estimated by an ICA-based algorithm directly in the time domain. In real applications, the separation filters have thousands of coefficients and, therefore, such algorithms present large computational complexity and slow convergence [1], [2]. To mitigate such difficulties, frequency-domain BSS methods were proposed, where convolutive mixtures can be treated as instantaneous mixtures in each frequency bin [3], [4]. The disadvantages of such methods are permutations and scaling by different factors among the bins, besides the need to use long windows of data for implementing high-order separation filters. Owing to the non-stationarity of the speech signals and mixing systems, the estimates of the needed statistics for each bin might not be correct for long data windows [1]. Such problems, which can severely degrade the performance of frequency-domain algorithms, can be reduced by employing the minimal distortion principle [5], a multivariate score function [6] and/or estimations of the direction of arrival (DOA) of the sources [7]. In this scenario, subband methods can be employed mainly due to their characteristics of breaking the high-order separation filters into independent smaller-order ones and allowing the reduction of the data sampling rate. Such methods usually employ oversampled DFT filter banks with complex coefficients and single sideband (SSB) modulation to work with the real part of the subband signals only [8], reducing the computational complexity.

In this paper, we present a subband BSS method which employs critically sampled uniform filter banks and reduced-order FIR separation filters. The coefficients of the subband separation filters are adjusted independently by a time-domain adaptive algorithm [2] using second-order statistics. The proposed structure applies multirate processing and extra filters to cancel aliasing among adjacent bands [9]. Since filter banks with real coefficients are used, the algorithm is suitable for DSP implementations. An experimental evaluation of the performance of the proposed method for speech signals is conducted, considering different filter banks and acoustic reverberation characteristics.

II. UNIFORM SUBBAND STRUCTURE WITH CRITICAL SAMPLING

Figure 1 shows the $k$th channel of the linear TITO (two sources and two sensors) configuration for the $M$-channel subband BSS. In this structure the $p$th observed signal ($x_p(n)$) is decomposed by the direct-path filters ($h_{k,k}(n)$) and the extra filters ($h_{k,k-1}(n)$ and $h_{k,k+1}(n)$). The resulting signals ($x_p^{k,k}(n)$, $x_p^{k,k-1}(n)$ and $x_p^{k,k+1}(n)$) are downsampled by the critical decimation factor ($M$) and applied to the separation filters ($w_{qp}^{k}(n)$, $w_{qp}^{k-1}(n)$ and $w_{qp}^{k+1}(n)$, respectively). The corresponding output signals are up-sampled and recombined by the synthesis filters ($f_k(n)$) to restore the fullband output signals (estimated sources). With perfect reconstruction (PR) filter banks, this structure can exactly model any FIR unmixing system [9].

978-1-4244-9474-3/11/$26.00 ©2011 IEEE

[Figure 1: block diagram. The fullband observed signals $x_1(n)$ and $x_2(n)$ are each filtered by $h_{k,k-1}(n)$, $h_{k,k}(n)$ and $h_{k,k+1}(n)$ and decimated by $M$, giving $x_p^{k,k-1}(m)$, $x_p^{k,k}(m)$ and $x_p^{k,k+1}(m)$. These feed the subfilters $w_{qp}^{k-1}(m)$, $w_{qp}^{k}(m)$ and $w_{qp}^{k+1}(m)$, whose summed outputs $y_1^k(m)$ and $y_2^k(m)$ are upsampled by $M$, filtered by $f_k(n)$ and added over the bands $0, \ldots, M-1$ to form the fullband output signals $y_1(n)$ and $y_2(n)$.]

Fig. 1. k-th channel of the linear TITO configuration for BSS in subbands.

Assuming that $h_p(n)$ is the impulse response of a prototype filter of length $N_P$ of a cosine-modulated multirate system having $M$ bands, in order to implement separation filters of length $S$, the number of coefficients of the subfilter at the $k$th subband should be at least [9]

$$K = \left\lceil \frac{S + N_P}{M} \right\rceil. \tag{1}$$
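As a quick numerical check of Eq. (1), the following minimal sketch evaluates the bound for the configuration reported in Section III ($S = 1024$, $N_P = 128$, $M = 8$). The function name `subfilter_length` is an illustrative choice and not part of the original work.

```python
def subfilter_length(S: int, N_P: int, M: int) -> int:
    """Minimum number of subfilter coefficients K according to Eq. (1).

    S   : length of the fullband separation filter to be modelled
    N_P : length of the prototype filter of the cosine-modulated bank
    M   : number of bands (equal to the decimation factor)
    """
    # Integer ceiling of (S + N_P) / M without floating-point rounding.
    return -(-(S + N_P) // M)


K = subfilter_length(S=1024, N_P=128, M=8)
print(K)  # 144, the subfilter length quoted in Section III
```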

The filters $h_{k,i}(n)$ of Fig. 1, which decompose the observed signals $x_p(n)$, have impulse responses $h_{k,i}(n) = h_k(n) * h_i(n)$, where $*$ is the convolution operation.

The observed subband signals $x_p^{k,i}(m)$ of Fig. 1 can be expressed as

$$x_p^{k,i}(m) = \mathbf{x}_p^T(m)\,\mathbf{h}_{k,i}, \tag{2}$$

where $\mathbf{x}_p(m) = [x_p(mM), x_p(mM-1), \ldots, x_p(mM-N_H+1)]^T$ is the vector that contains the latest $N_H$ samples of the $p$th sensor signal and $\mathbf{h}_{k,i} = [h_{k,i}(0), h_{k,i}(1), \ldots, h_{k,i}(N_H-1)]^T$ is the vector that contains the $N_H = 2N_P - 1$ coefficients of the analysis filter $h_{k,i}(n)$.
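The inner product of Eq. (2) is simply the output of the filter $h_{k,i}(n)$ sampled at every $M$th instant. The sketch below illustrates this equivalence with NumPy. The filters and the sensor signal are random placeholders (an assumption for illustration only, not a cosine-modulated bank), and the sizes are not those used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N_P = 4, 16                      # illustrative sizes, not the paper's
h_k = rng.standard_normal(N_P)      # placeholder analysis filter of band k
h_i = rng.standard_normal(N_P)      # placeholder analysis filter of band i
h_ki = np.convolve(h_k, h_i)        # h_{k,i}(n) = h_k(n) * h_i(n)
N_H = h_ki.size                     # equals 2 * N_P - 1

x_p = rng.standard_normal(400)      # placeholder sensor signal x_p(n)


def subband_sample(x, h, m, M):
    """Evaluate x_p^{k,i}(m) = x_p(m)^T h_{k,i} as in Eq. (2)."""
    n = m * M
    # Latest len(h) samples, newest first; requires n >= len(h) - 1.
    window = x[n - h.size + 1 : n + 1][::-1]
    return window @ h


m = 20
direct = subband_sample(x_p, h_ki, m, M)
# Same quantity obtained by filtering and then keeping every M-th sample.
filtered = np.convolve(x_p, h_ki)[::M]
print(np.isclose(direct, filtered[m]))
```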

For colored and non-stationary signals, such as speech, the BSS problem can be solved by diagonalizing the output correlation matrix considering multiple blocks in different time instants.

From Fig. 1, assuming that there is no overlap between the frequency responses of non-adjacent analysis filters $h_k(n)$, we observe that the $q$th output signal at the $k$th subband is given by

$$y_q^k(m) = \sum_{p=1}^{P} \sum_{i=k-1}^{k+1} [\mathbf{w}_{qp}^i]^T \mathbf{x}_p^{k,i}(m), \tag{3}$$

where $\mathbf{x}_p^{k,i}(m)$ is the vector that contains the latest $K$ samples of the $p$th sensor at the $i$th subband, $x_p^{k,i}(m)$, and $\mathbf{w}_{qp}^i = [w_{qp}^i(0), w_{qp}^i(1), \ldots, w_{qp}^i(K-1)]^T$ is the vector with the $K$ coefficients of the subband separation subfilter $w_{qp}^i(m)$. The vector $\mathbf{x}_p^{k,i}(m)$ can be written as

$$\mathbf{x}_p^{k,i}(m) = \mathbf{X}_p(m)\,\mathbf{h}_{k,i}, \tag{4}$$

where the $K \times N_H$ matrix $\mathbf{X}_p(m)$ is given by

$$\mathbf{X}_p(m) = \begin{bmatrix} \mathbf{x}_p(mM)^T \\ \mathbf{x}_p((m-1)M)^T \\ \vdots \\ \mathbf{x}_p((m-K+1)M)^T \end{bmatrix}. \tag{5}$$
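Equations (3) to (5) can be illustrated with a short NumPy sketch. It assumes that row $r$ of $\mathbf{X}_p(m)$ holds the $N_H$ newest fullband samples ending at sample $(m-r)M$, which is one reading of Eq. (5). The signals, filter and sizes are random placeholders, and `sensor_matrix` is a hypothetical helper name.

```python
import numpy as np


def sensor_matrix(x, m, M, K, N_H):
    """K x N_H matrix X_p(m) of Eq. (5).

    Row r holds the N_H newest fullband samples (newest first) that end
    at fullband sample (m - r) * M. Requires (m - K + 1) * M >= N_H - 1.
    """
    rows = []
    for r in range(K):
        n = (m - r) * M
        rows.append(x[n - N_H + 1 : n + 1][::-1])
    return np.vstack(rows)


rng = np.random.default_rng(1)
M, K, N_H = 4, 5, 7                  # illustrative sizes only
x_p = rng.standard_normal(200)       # placeholder sensor signal
h_ki = rng.standard_normal(N_H)      # placeholder analysis filter h_{k,i}
w_qp = rng.standard_normal(K)        # placeholder subfilter w_qp^i

m = 30
X = sensor_matrix(x_p, m, M, K, N_H)
x_vec = X @ h_ki      # Eq. (4): latest K subband samples, newest first
y_term = w_qp @ x_vec # one (p, i) term of the double sum in Eq. (3)
```

Summing `y_term` over the sensors $p$ and the three bands $i = k-1, k, k+1$ gives the subband output $y_q^k(m)$ of Eq. (3).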

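In the generic block time-domain subband BSS algorithm, defining $N$ as the block size and $D$ as the number of blocks which are used in the correlation estimates ($1 \le D \le K$), we can write the $k$th output vectors of block index $\ell$ as

$$\mathbf{y}_q^k(\ell) = \sum_{p=1}^{P} \sum_{i=k-1}^{k+1} [\mathbf{w}_{qp}^i]^T \hat{\mathbf{X}}_p^{k,i}(\ell) \tag{6}$$

with the $K \times N$ matrix $\hat{\mathbf{X}}_p^{k,i}(\ell)$ expressed as

$$\hat{\mathbf{X}}_p^{k,i}(\ell) = \left[\mathbf{X}_p(\ell K), \mathbf{X}_p(\ell K+1), \cdots, \mathbf{X}_p(\ell K+N-1)\right] \mathbf{H}_{k,i}, \tag{7}$$

where the $N_H N \times N$ matrix $\mathbf{H}_{k,i}$ has the first column formed by the coefficients of $h_{k,i}(n)$ followed by $(N-1)N_H$ zeros, and the remaining columns are circularly shifted by $N_H$ positions from one column to the next. The $D \times N$ matrices $\mathbf{Y}_q^k(\ell)$ contain the corresponding $D$ subsequent output vectors and can be written as

$$\mathbf{Y}_q^k(\ell) = \sum_{p=1}^{P} \sum_{i=k-1}^{k+1} \mathbf{W}_{qp}^i(\ell)\, \mathbf{X}_p^{k,i}(\ell), \tag{8}$$

with the stacked matrix

$$\mathbf{X}_p^{k,i}(\ell) = \left[\hat{\mathbf{X}}_p^{k,i}(\ell), \hat{\mathbf{X}}_p^{k,i}(\ell-1)\right]^T \tag{9}$$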

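and $\mathbf{W}_{qp}^i(\ell)$ is a $D \times 2K$ Sylvester-type matrix given by

$$\mathbf{W}_{qp}^i(\ell) = \begin{bmatrix} w_{qp,0}^i & w_{qp,1}^i & \cdots & w_{qp,K-1}^i & 0 & \cdots & 0 \\ 0 & w_{qp,0}^i & w_{qp,1}^i & \cdots & w_{qp,K-1}^i & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & w_{qp,0}^i & w_{qp,1}^i & \cdots & w_{qp,K-1}^i \end{bmatrix}, \tag{10}$$

for $D = K$. Combining all $P$ outputs of each subband, we can rewrite Eq. (3) concisely as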
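The Sylvester-type structure of Eq. (10) can be generated programmatically. The sketch below assumes that each row is the previous one shifted right by one column, as the layout of Eq. (10) suggests. The helper name `sylvester_matrix` and the three-tap example are illustrative only.

```python
import numpy as np


def sylvester_matrix(w, D):
    """Build a D x 2K Sylvester-type matrix from the K coefficients in w.

    Row r contains w shifted right by r columns, the rest being zero.
    The constraint 1 <= D <= K follows the range stated in the text.
    """
    K = w.size
    if not 1 <= D <= K:
        raise ValueError("D must satisfy 1 <= D <= K")
    W = np.zeros((D, 2 * K))
    for r in range(D):
        W[r, r : r + K] = w
    return W


w = np.array([1.0, 2.0, 3.0])   # toy subfilter with K = 3 coefficients
W = sylvester_matrix(w, D=3)    # 3 x 6 matrix
print(W)
```

Multiplying such a matrix by a vector of $2K$ consecutive samples yields $D$ successive inner products with the same subfilter, which is why the block formulation can reuse one set of coefficients for all rows.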

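$$\mathbf{Y}^k(\ell) = \left[\mathbf{Y}_1^k(\ell), \cdots, \mathbf{Y}_P^k(\ell)\right]^T = \sum_{i=k-1}^{k+1} \mathbf{W}^i(\ell)\, \mathbf{X}^{k,i}(\ell), \tag{11}$$

where

$$\mathbf{X}^{k,i}(\ell) = \left[\mathbf{X}_1^{k,i}(\ell), \ldots, \mathbf{X}_P^{k,i}(\ell)\right]^T, \tag{12}$$

$$\mathbf{W}^i(\ell) = \begin{bmatrix} \mathbf{W}_{11}^i(\ell) & \cdots & \mathbf{W}_{1P}^i(\ell) \\ \vdots & \ddots & \vdots \\ \mathbf{W}_{P1}^i(\ell) & \cdots & \mathbf{W}_{PP}^i(\ell) \end{bmatrix}. \tag{13}$$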

The batch-type block off-line algorithm for adjusting the coefficients of the separation subfilters of each subband, considering a TITO system and omitting the index $\ell$ of the correlation submatrices to simplify the notation, is given by

$$\mathbf{W}^k(i) = \mathbf{W}^k(i-1) - \mu \nabla^{GN}_{\mathbf{W}^k} \mathcal{J}^k(i-1), \tag{14}$$


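$$\nabla^{GN}_{\mathbf{W}^k} \mathcal{J}^k(i) = 2 \sum_{\ell=1}^{b} \frac{1}{b} \begin{bmatrix} [\mathbf{R}_{11}^k]^{-1} [\mathbf{R}_{21}^k]^T \mathbf{W}_{21}^k & [\mathbf{R}_{11}^k]^{-1} [\mathbf{R}_{21}^k]^T \mathbf{W}_{22}^k \\ [\mathbf{R}_{22}^k]^{-1} [\mathbf{R}_{12}^k]^T \mathbf{W}_{11}^k & [\mathbf{R}_{22}^k]^{-1} [\mathbf{R}_{12}^k]^T \mathbf{W}_{12}^k \end{bmatrix}, \tag{15}$$

where $\mathbf{R}_{qp}^k(\ell)$ is a submatrix of dimension $D \times D$ given by

$$\mathbf{R}_{qp}^k(\ell) = \mathbf{Y}_q^k(\ell) [\mathbf{Y}_p^k(\ell)]^T, \tag{16}$$

$\mu$ is the step-size of the adaptation algorithm, and $i$ is the off-line iteration index. To reduce the computational complexity of the algorithm, the matrices $[\mathbf{R}_{qq}^k(\ell)]^{-1}$ in Eq. (15) can be replaced by the inverses of the diagonal matrices $\tilde{\mathbf{R}}_{qq}^k(\ell) = \left[\mathbf{y}_q^k(\ell) [\mathbf{y}_q^k(\ell)]^T + \delta\right] \mathbf{I}$, with $\mathbf{y}_q^k(\ell)$ given in Eq. (6). Each of these matrices contains on its diagonal the power of the most recent block of the $k$th output subband signal $y_q^k(m)$, and $\delta$ is a small positive constant which avoids ill-conditioning.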
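The correlation estimate of Eq. (16) and its diagonal substitute can be sketched as follows. The output matrices are random placeholders, the value of $\delta$ is an arbitrary illustrative choice, and taking the first row of $\mathbf{Y}_q^k$ as the most recent output block is an assumption of this sketch.

```python
import numpy as np


def correlation(Y_q, Y_p):
    """R_qp = Y_q Y_p^T of Eq. (16); Y_q and Y_p are D x N matrices."""
    return Y_q @ Y_p.T


def diagonal_autocorrelation(y_q, D, delta=1e-6):
    """Diagonal substitute for R_qq: (y_q y_q^T + delta) I.

    y_q   : most recent length-N output block (1-D array)
    delta : small positive constant that avoids ill-conditioning
            (the default value is an arbitrary choice for this sketch)
    """
    return (y_q @ y_q + delta) * np.eye(D)


rng = np.random.default_rng(2)
D, N = 4, 8                              # illustrative sizes only
Y1 = rng.standard_normal((D, N))         # placeholder for Y_1^k
Y2 = rng.standard_normal((D, N))         # placeholder for Y_2^k

R12 = correlation(Y1, Y2)                # D x D cross-correlation
R11_diag = diagonal_autocorrelation(Y1[0], D)
# A diagonal matrix is inverted element by element on its diagonal.
R11_inv = np.diag(1.0 / np.diag(R11_diag))
```

The diagonal form replaces a $D \times D$ matrix inversion by $D$ scalar divisions, which is where the complexity reduction comes from.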

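The corresponding block online algorithm for adjusting the coefficients of the separation subfilters of each subband is given by

$$\mathbf{W}^k(\ell) = \mathbf{W}^k(\ell-1) - \mu \left[ \lambda \nabla^{GN}_{\mathbf{W}^k} \mathcal{J}^k(\ell-1) + (1-\lambda) \nabla^{GN}_{\mathbf{W}^k} \mathcal{J}^k(\ell) \right], \tag{17}$$

$$\nabla^{GN}_{\mathbf{W}^k} \mathcal{J}^k(\ell) = 2 \begin{bmatrix} [\mathbf{R}_{11}^k]^{-1} [\mathbf{R}_{21}^k]^T \mathbf{W}_{21}^k & [\mathbf{R}_{11}^k]^{-1} [\mathbf{R}_{21}^k]^T \mathbf{W}_{22}^k \\ [\mathbf{R}_{22}^k]^{-1} [\mathbf{R}_{12}^k]^T \mathbf{W}_{11}^k & [\mathbf{R}_{22}^k]^{-1} [\mathbf{R}_{12}^k]^T \mathbf{W}_{12}^k \end{bmatrix}, \tag{18}$$

where $\lambda$ is the forgetting factor.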
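One step of the online recursion of Eqs. (17) and (18) for a single subband may be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: the matrices are random placeholders, the forgetting factor value is hypothetical, and any step that maps the updated matrices back to the Sylvester structure is omitted because the text does not describe it.

```python
import numpy as np


def tito_gradient(W, R):
    """Gradient block matrix of Eq. (18) for a TITO system.

    W[q][p] : D x 2K Sylvester-type matrices of the subband
    R[q][p] : D x D output correlation matrices, Eq. (16)
    Indices are 0-based, so W[0][1] stands for W_12.
    """
    inv11 = np.linalg.inv(R[0][0])
    inv22 = np.linalg.inv(R[1][1])
    top = [inv11 @ R[1][0].T @ W[1][0], inv11 @ R[1][0].T @ W[1][1]]
    bottom = [inv22 @ R[0][1].T @ W[0][0], inv22 @ R[0][1].T @ W[0][1]]
    return [[2 * g for g in top], [2 * g for g in bottom]]


def online_update(W_prev, grad_prev, grad_curr, mu, lam):
    """Eq. (17): W(l) = W(l-1) - mu [lam grad(l-1) + (1 - lam) grad(l)]."""
    return [[W_prev[q][p]
             - mu * (lam * grad_prev[q][p] + (1 - lam) * grad_curr[q][p])
             for p in range(2)] for q in range(2)]


rng = np.random.default_rng(3)
D, K = 3, 3                                   # illustrative sizes only
W = [[rng.standard_normal((D, 2 * K)) for _ in range(2)] for _ in range(2)]
eye, zero = np.eye(D), np.zeros((D, D))
R_uncorr = [[eye, zero], [zero, eye]]         # outputs already decorrelated

g = tito_gradient(W, R_uncorr)
# mu = 0.02 is the step-size quoted in Section III; lam = 0.2 is hypothetical.
W_new = online_update(W, g, g, mu=0.02, lam=0.2)
```

When the cross-correlations $\mathbf{R}_{12}^k$ and $\mathbf{R}_{21}^k$ vanish, the gradient is zero and the coefficients stop changing, which matches the goal of diagonalizing the output correlation matrix.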

III. EXPERIMENTAL RESULTS

In our experiments, speech signals sampled at $F_s = 16$ kHz were used and the subband coefficients $w_{qp}^i(n)$ were initialized with zeros, except for $p = q$, for which the coefficients were initialized with 1. The number of data blocks used to estimate the correlations (Eq. (18)) was $D = K$, the length of the output blocks was $N = 2K$ and the lengths of the separation filters were equal to the lengths of the mixture filters ($S = U$). During our experiments we monitored the correlations among the estimates of the sources in the several subbands in order to avoid permutation problems, but the algorithm proved very robust and did not require any correction.

The speech signals were convolved with synthetic impulse responses obtained by simulating the acoustics of a virtual room of dimensions 3.55 m × 4.55 m × 2.5 m, adopted in the 2006 Signal Separation Evaluation Campaign [10], which presents a reverberation time of 250 ms. The distance between the two microphones was 5 cm and the sources were positioned 1 m from the center point of the microphones, at directions of −50° and 45°. Even though the original scenario had four distinct sources in different directions, our simulations employed only two sources in order to reduce the computational complexity of the BSS methods.

The performance of the BSS algorithms was evaluated through the Signal to Interference Ratio (SIR) [11], using the toolbox function available in [12].
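For orientation, the SIR of [11] is the ratio in dB between the energy of the target component and the energy of the interference component of an estimated source. The sketch below is not the toolbox of [12]: it assumes that the two components are already available as separate arrays, whereas the toolbox obtains them by decomposing the estimated source. The function name `sir_db` is illustrative.

```python
import numpy as np


def sir_db(s_target, e_interf):
    """Simplified SIR in dB: 10 log10(||s_target||^2 / ||e_interf||^2).

    s_target : component of the output due to the desired source
    e_interf : component of the output due to the other source(s)
    Both are assumed to be known here, which the real toolbox does not need.
    """
    return 10.0 * np.log10(np.sum(s_target ** 2) / np.sum(e_interf ** 2))


# Toy example: interference with one tenth of the target amplitude.
target = np.ones(100)
interference = 0.1 * np.ones(100)
print(sir_db(target, interference))  # approximately 20 dB
```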

A. Off-line Algorithms

In this experiment we compare the performance of the off-line subband block algorithm presented in Section II with those of the fullband algorithm [2] and two other off-line algorithms. The first one is a subband-based BSS technique, which employs an oversampled uniform DFT filter bank and single sideband (SSB) modulation/demodulation in the analysis/synthesis banks in order to execute a time-domain BSS on real-valued subband signals [8]. The second one is the frequency-domain BSS that exploits higher order frequency dependencies (EHOD) of the source signals and uses a multivariate cost function [6].

TABLE I
STEADY-STATE SIR (IN DB).

Structures                    Final SIR
Fullband                      12.38
Critically sampled subband    12.30
Oversampled subband           9.15
Frequency domain              9.88

Two speech signals (each 10 seconds long) were used. These signals were convolved with the artificial impulse responses generated for the previously described room, considering their first $S = 1024$ samples. The lengths of the separation filters were set equal to the lengths of the mixing filters ($U = S$).
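The convolutive mixing described above can be sketched in a few lines. The room impulse responses of [10] are not reproduced here, so the example uses short placeholder responses and white-noise sources (assumptions for illustration only). The function name `convolutive_mix` is hypothetical.

```python
import numpy as np


def convolutive_mix(sources, impulse_responses, U):
    """Mix the sources through FIR responses truncated to U samples.

    sources              : list of 1-D arrays of equal length
    impulse_responses[p][q] : response from source q to sensor p
    Returns one mixture per sensor, each with the length of the sources.
    """
    n = sources[0].size
    mixtures = []
    for h_p in impulse_responses:
        x_p = np.zeros(n)
        for s_q, h_pq in zip(sources, h_p):
            # Keep only the first U taps, then trim the convolution tail.
            x_p += np.convolve(s_q, h_pq[:U])[:n]
        mixtures.append(x_p)
    return mixtures


rng = np.random.default_rng(4)
s1 = rng.standard_normal(1000)          # placeholder for a speech signal
s2 = rng.standard_normal(1000)          # placeholder for a speech signal
h = [[np.array([1.0, 0.5]), np.array([0.0, 0.3])],   # placeholder responses
     [np.array([0.0, 0.2]), np.array([1.0, 0.4])]]
x1, x2 = convolutive_mix([s1, s2], h, U=1024)
```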

The uniform subband structure was implemented using cosine-modulated filter banks with $M = 8$ and $N_P = 128$ yielding perfect reconstruction, separation subfilters of length $K = 144$, adaptation step-size $\mu = 0.02$, number of time-lags for the block correlation calculations (see Eq. (16)) equal to $D = K = 144$, and output signal blocks of length $N = 2K = 288$. The fullband algorithm was implemented using separation filters of length $S = 1024$, $\mu = 0.005$, $D = 1024$ and $N = 2048$. The oversampled subband structure was implemented using $N = 16$ bands, down-sampling rate $R = 4$, analysis filters of length $6N$, synthesis filters of length $6R$, adaptation step-size $\alpha = 0.04$, separation subfilters with 256 coefficients, $D = 256$ and $N = 512$. In all time-domain algorithms the number of time-lags used in the block correlation calculations was $D = K$, the length of the output signal blocks was $N = 2K$ and the block shift was $D$ samples. The frequency-domain EHOD approach used a 1024-point fast Fourier transform (FFT) and a Hanning window to convert the time-domain signals to the frequency domain. The length of the window was 1024 samples, the shift size was 256 samples and the algorithm step-size was $\eta = 0.08$. In all simulations, the step-size values were chosen such that each algorithm presented the best final SIR.

Figure 2 shows the signal to interference ratio (SIR) evolution and Table I contains the steady-state SIR. It can be observed that the proposed critically sampled subband structure obtained a performance as good as the fullband structure and better than the oversampled subband and frequency-domain structures.

B. On-line Algorithms

The reverberation time is an important factor for the sound quality of the signals captured by the microphones. In [13], several experiments in rooms of different dimensions were conducted, demonstrating that the average reverberation time decreases for high frequencies, mainly due to propagation losses caused by the air viscosity for frequencies above 1 kHz [14]. Based on these results, we explore the flexibility of the subband BSS structure, which allows the independent adaptation of the separation subfilters and the use of different lengths for such filters. The lengths of the high-frequency subfilters were reduced and the results of the separation procedure were compared to those of the fullband algorithm ($M = 1$) presented in [2], for mixture filters of lengths $U = 256$, 512 and 1024. The subband structure was first simulated with all subfilters of the same length, given in Eq. (1). Subsequently, the lengths of the subfilters in the highest-frequency quarter of the bands were reduced to half of their initial values. The results of these experiments are presented in Table II. From such results, we conclude that the reduction in the number of coefficients in the high-frequency bands has little effect on the performance of the separation algorithm (the SIR changes by at most 0.6 dB in Table II).

[Figure 2: mean SIR (dB, vertical axis, 0 to 14) versus iteration number (horizontal axis, 0 to 800), with curves for the fullband, critically sampled subband, frequency-domain and oversampled subband algorithms.]

Fig. 2. SIR evolution for fullband (dotted line), critically sampled subband (solid line), oversampled subband (dashed line) and frequency domain (dash-dot line) algorithms with mixing filters lengths U=1024.

TABLE II
SIR WITH DIFFERENT SUBFILTER LENGTHS

SIR (dB) without reducing subfilter lengths
M     U = 256   U = 512   U = 1024
1     12.2      10.1      7.4
4     13.1      9.9       7.2
8     12.5      10.7      7.2
16    14.2      12.4      8.2

SIR (dB) with reduced subfilter lengths in the highest frequency bands
M     U = 256   U = 512   U = 1024
1     12.2      10.1      7.4
4     13.5      9.9       7.1
8     13.1      10.7      7.2
16    14.2      12.4      8.2
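The length-reduction rule described in this subsection can be expressed as a small helper. The sketch assumes that band 0 is the lowest-frequency band and that the number of reduced bands is $M/4$ rounded down. The prototype length $N_P = 128$ is only reported for $M = 8$, so it is passed as a parameter. The function name `subfilter_lengths` is illustrative.

```python
def subfilter_lengths(S, N_P, M, reduce_high=True):
    """Per-band subfilter lengths, lowest-frequency band first.

    All bands start with K from Eq. (1). If reduce_high is set, the
    highest-frequency quarter of the bands is cut to half of K.
    """
    K = -(-(S + N_P) // M)              # integer ceiling, Eq. (1)
    lengths = [K] * M
    if reduce_high:
        n_reduced = M // 4              # one fourth of the bands
        for k in range(M - n_reduced, M):
            lengths[k] = K // 2
    return lengths


full = subfilter_lengths(1024, 128, 8, reduce_high=False)
reduced = subfilter_lengths(1024, 128, 8)
print(reduced)                  # [144, 144, 144, 144, 144, 144, 72, 72]
print(sum(full), sum(reduced))  # 1152 1008
```

For $M = 1$ no band is reduced under this rule, which is consistent with the identical $M = 1$ rows in the two halves of Table II.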

IV. CONCLUSIONS

In this paper we presented a subband blind source separation algorithm that employs uniform filter banks to decompose the signals from the sensors. The separation filters, applied to the subband observed signals, work at the critical sampling rate. The adaptation is performed by a natural-gradient type algorithm in each subband. The use of extra filters in the subband algorithm avoided aliasing effects, resulting in a subband BSS algorithm with similar steady-state performance and smaller computational cost when compared to the fullband algorithm. Computer simulations with speech signals were presented, showing the advantages of the subband structure with respect to the final signal to interference ratio over the time-domain and frequency-domain algorithms, and illustrating the versatility of the structure, which takes into account the characteristics of the acoustic reverberation. It was verified that a reduction in the number of coefficients of the subfilters which operate on the high-frequency components does not degrade the performance of the subband BSS algorithm.

ACKNOWLEDGMENT

The authors would like to thank CNPq and FAPERJ, Brazil, for providing partial support.

REFERENCES

[1] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. John Wiley & Sons, 2001.

[2] H. Buchner, R. Aichner, and W. Kellermann, “A generalization of blind source separation algorithms for convolutive mixtures based on second-order statistics,” IEEE Trans. on Speech and Audio Process., vol. 13, no. 1, pp. 120–134, Jan. 2005.

[3] H. Saruwatari, T. Kawamura, T. Nishikawa, A. Lee, and K. Shikano, “Blind source separation based on a fast-convergence algorithm combining ICA and beamforming,” IEEE Trans. on Audio, Speech, and Language Process., vol. 14, no. 2, pp. 666–678, Mar. 2006.

[4] P. Smaragdis, “Blind separation of convolved mixtures in the frequency domain,” Neurocomputing, vol. 22, pp. 21–34, 1998.

[5] K. Matsuoka and S. Nakashima, “Minimal distortion principle for blind source separation,” in Proc. Int. Conf. Independent Component Analysis and Signal Separation, 2001, pp. 722–727.

[6] T. Kim, H. T. Attias, S.-Y. Lee, and T.-W. Lee, “Blind source separation exploiting higher-order frequency dependencies,” IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 1, pp. 70–78, 2007.

[7] H. Sawada, R. Mukai, S. Araki, and S. Makino, “A robust and precise method for solving the permutation problem of frequency-domain blind source separation,” IEEE Transactions on Speech and Audio Processing, vol. 12, no. 5, pp. 530–538, 2004.

[8] S. Araki, S. Makino, R. Aichner, T. Nishikawa, and H. Saruwatari, “Subband-based blind separation for convolutive mixtures of speech,” IEICE Trans. Fundamentals, vol. E88-A, pp. 3593–3603, 2005.

[9] M. R. Petraglia and P. B. Batalheiro, “Filter bank design for a subband adaptive filtering structure with critical sampling,” IEEE Trans. on Circuits and Systems I - Fundamental Theory and Applications, vol. 51, pp. 1194–1202, Jun. 2004.

[10] http://www.irisa.fr/metiss/SASSEC07/?show=intro

[11] E. Vincent, R. Gribonval, and C. Févotte, “Performance measurement in blind audio source separation,” IEEE Transactions on Audio, Speech, and Language Process., vol. 14, no. 4, pp. 1462–1469, 2006.

[12] http://www.irisa.fr/metiss/bss_eval/

[13] F. A. Everest and K. Pohlmann, Master Handbook of Acoustics. McGraw-Hill, 2001.

[14] L. Beranek, Music, Acoustics and Architecture. John Wiley & Sons, 1962.
