
  • EM ALGORITHM

    • The EM algorithm is a general iterative method of maximum likelihood estimation for incomplete data

    • Used to tackle a wide variety of problems, some of which would not usually be viewed as incomplete data problems

  • Natural situations

    – Missing data problems

    – Grouped data problems

    – Truncated and censored data problems

    • Not so obvious situations

    – Variance component estimation

    – Latent variable situations and random effects models

    – Mixture models

  • Areas of application

    – Image analysis

    – Epidemiology and Medicine

    – Engineering

    – Genetics and Biology

  • Seminal Paper

    Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). JRSS B 39: 1-38.

  • The EM algorithm is closely related to the following ad hoc process of handling missing data:

    1. Fill in the missing values by their estimated values

    2. Estimate the parameters for this completed dataset

    3. Use the estimated parameters to re-estimate the missing values

    4. Re-estimate the parameters from this updated completed dataset

    Alternate between steps 3 and 4 until convergence of the parameter estimates

  • The EM algorithm formalises this approach

    The essential idea behind the EM algorithm is to calculate the maximum likelihood estimates for the incomplete data problem by working with the complete data likelihood instead of the observed likelihood, which may be complicated or numerically infeasible to maximise.

    To do this, we augment the observed data with manufactured data so as to create a complete data likelihood that is computationally more tractable. At each iteration we then replace the missing data appearing in the sufficient statistics for the parameters of the complete data likelihood by their conditional expectations given the observed data and the current parameter estimates (Expectation step: E-step)

  • The new parameter estimates are obtained from these replaced sufficient statistics as though they had come from the complete sample (Maximisation step: M-step)

    By alternating E- and M-steps, the sequence of estimates often converges to the MLEs under very general conditions
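    In symbols (a standard formulation, filling in what the slides describe in words):

    $$Q(\theta, \theta^{(m)}) = \mathrm{E}\left[\log L_C(\theta \mid X) \,\middle|\, y;\ \theta^{(m)}\right] \quad \text{(E-step)}$$

    $$\theta^{(m+1)} = \arg\max_{\theta}\, Q(\theta, \theta^{(m)}) \quad \text{(M-step)}$$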

  • EXAMPLES

    1. Genetic Linkage Model

    2. Censored (survival) data

    3. Mixture of two univariate normals

  • Genetic Linkage Model

    197 animals distributed into four categories

    Y is postulated to have arisen from a multinomial distribution with cell probabilities

    $$\left( \tfrac{1}{2} + \tfrac{\theta}{4},\ \tfrac{1}{4}(1-\theta),\ \tfrac{1}{4}(1-\theta),\ \tfrac{\theta}{4} \right)$$

    $$y = (y_1, y_2, y_3, y_4) = (125, 18, 20, 34)$$

  • Augment the observed data by splitting the first cell into $x_1$ and $x_2$, with probabilities $1/2$ and $\theta/4$ respectively, so that the complete data likelihood is tractable.

    Take $\theta^{(0)} = 0.5$ as the initial estimate.

    E-step: replace $x_2$ by its conditional expectation to obtain an initial estimate:

    $$x_2^{(0)} = \mathrm{E}\left[X_2 \mid Y;\ \theta^{(0)}\right] = 125 \times \frac{0.25 \times 0.5}{0.5 + 0.25 \times 0.5} = 25$$

    M-step: treat $x_2^{(0)} = 25$ as 'real' data.

  • Update the estimate of $\theta$:

    $$\theta^{(1)} = \frac{x_2^{(0)} + y_4}{x_2^{(0)} + y_4 + y_2 + y_3} = \frac{25 + 34}{25 + 34 + 18 + 20} = 0.608$$

    Obtain an improved estimate of $x_2$:

    $$x_2^{(1)} = \mathrm{E}\left[X_2 \mid Y;\ \theta^{(1)}\right] = 125 \times \frac{0.608}{2 + 0.608} = 29.14$$
  • Alternate E- and M-steps:

    step   θ^(m)
    0      0.5
    1      0.6082
    2      0.6243
    3      0.6265
    4      0.6268
    5      0.6268
    6      0.6268
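  • The whole iteration fits in a few lines of R. This is a sketch of my own (the names y, x2 and theta are not from the slides), reproducing the table above:

    # EM for the genetic linkage model: y = (125, 18, 20, 34), cell
    # probabilities (1/2 + theta/4, (1-theta)/4, (1-theta)/4, theta/4).
    y <- c(125, 18, 20, 34)
    theta <- 0.5                                   # initial estimate theta^(0)
    for (m in 1:6) {
      # E-step: expected count in the theta/4 part of the first cell
      x2 <- y[1] * (theta / 4) / (1/2 + theta / 4)
      # M-step: update theta treating x2 as observed data
      theta <- (x2 + y[4]) / (x2 + y[4] + y[2] + y[3])
      cat(sprintf("%d  %.4f\n", m, theta))
    }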

  • Survival time data: right-censored exponential(θ) data

    3 uncensored observations: t1 = 0.5, t2 = 1.5 and t3 = 4

    2 right-censored observations: t4 = 1* and t5 = 3*

    Recall the lack-of-memory property:

    $$\mathrm{E}(T \mid T > t) = t + \frac{1}{\theta}$$

  • Ignore the censored data to obtain an initial estimate of the rate:

    $$\theta^{(0)} = \frac{3}{0.5 + 1.5 + 4} = 0.5$$

    E-step: replace the censored data by their conditional expectations:

    $$t_4^{(0)} = \mathrm{E}\left(T_4 \mid T_4 > 1\right) = 1 + \frac{1}{\theta^{(0)}} = 1 + 2 = 3$$

    $$t_5^{(0)} = \mathrm{E}\left(T_5 \mid T_5 > 3\right) = 3 + \frac{1}{\theta^{(0)}} = 3 + 2 = 5$$

  • M-step: treat $t_4^{(0)} = 3$ and $t_5^{(0)} = 5$ as 'real' data.

    Update the estimate of the rate:

    $$\theta^{(1)} = \frac{5}{0.5 + 1.5 + 4 + 3 + 5} = 0.3571$$

    Obtain improved estimates:

    $$t_4^{(1)} = \mathrm{E}\left(T_4 \mid T_4 > 1\right) = 1 + \frac{1}{\theta^{(1)}} = 1 + 2.8 = 3.8$$

    $$t_5^{(1)} = \mathrm{E}\left(T_5 \mid T_5 > 3\right) = 3 + \frac{1}{\theta^{(1)}} = 3 + 2.8 = 5.8$$

  • step   θ^(m)

    0      0.5
    1      0.3571
    2      0.3205
    3      0.3079
    4      0.3031
    5      0.3012
    6      0.3005
    7      0.3002
    8      0.3001
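  • As before, this EM is a few lines of R (my sketch; the names t_obs and t_cens are not from the slides):

    # EM for right-censored exponential data.
    # Uncensored: 0.5, 1.5, 4; censored at 1 and 3.
    t_obs  <- c(0.5, 1.5, 4)
    t_cens <- c(1, 3)
    theta  <- length(t_obs) / sum(t_obs)   # initial estimate, ignoring censoring
    for (m in 1:8) {
      # E-step: lack of memory gives E(T | T > c) = c + 1/theta
      t_fill <- t_cens + 1 / theta
      # M-step: exponential MLE on the completed data set
      theta <- 5 / (sum(t_obs) + sum(t_fill))
      cat(sprintf("%d  %.4f\n", m, theta))
    }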

  • Mixture of two univariate normals (Old Faithful’s eruptions)

    step   pi          mu[1]       mu[2]       sigma2[1]   sigma2[2]
    0      0.35        2           4.3         0.1         0.2
    1      0.34920810  2.02049523  4.27511436  0.05694792  0.18870948
    2      0.34889072  2.01974514  4.27441727  0.05637577  0.18961652
    3      0.34869541  2.01928712  4.27398638  0.05602933  0.19018038
    4      0.34857750  2.01901127  4.27372586  0.05582123  0.19052192
    5      0.34850700  2.01884660  4.27357000  0.05569720  0.19072650
    6      0.34846512  2.01874885  4.27347731  0.05562365  0.19084823
    7      0.34844032  2.01869101  4.27342243  0.05558015  0.19092034
    8      0.34842567  2.01865686  4.27339000  0.05555447  0.19096296
    9      0.34841703  2.01863671  4.27337087  0.05553933  0.19098811
    10     0.34841194  2.01862484  4.27335959  0.05553041  0.19100294
    11     0.34840893  2.01861784  4.27335294  0.05552515  0.19101167
    12     0.34840717  2.01861372  4.27334903  0.05552205  0.19101682
    13     0.34840613  2.01861129  4.27334672  0.05552023  0.19101985
    14     0.34840551  2.01860986  4.27334537  0.05551916  0.19102164
    15     0.34840515  2.01860902  4.27334457  0.05551852  0.19102269
    16     0.34840494  2.01860853  4.27334410  0.05551815  0.19102331
    17     0.34840481  2.01860823  4.27334382  0.05551793  0.19102367
    18     0.34840470  2.01860810  4.27334370  0.05551780  0.19102390
    19     0.34840470  2.01860796  4.27334356  0.05551773  0.19102401
    20     0.34840467  2.01860790  4.27334350  0.05551768  0.19102409
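  • An R sketch of this two-component fit (my illustration, not from the slides; starting values are row 0 of the table, and faithful is a built-in R data set):

    # EM for a two-component univariate normal mixture fitted to the
    # Old Faithful eruption durations.
    y <- faithful$eruptions
    pi1 <- 0.35; mu <- c(2, 4.3); s2 <- c(0.1, 0.2)  # starting values
    for (m in 1:20) {
      # E-step: responsibilities w_i = P(component 1 | y_i, current fit)
      d1 <- pi1 * dnorm(y, mu[1], sqrt(s2[1]))
      d2 <- (1 - pi1) * dnorm(y, mu[2], sqrt(s2[2]))
      w  <- d1 / (d1 + d2)
      # M-step: weighted proportions, means and variances
      pi1   <- mean(w)
      mu[1] <- sum(w * y) / sum(w)
      mu[2] <- sum((1 - w) * y) / sum(1 - w)
      s2[1] <- sum(w * (y - mu[1])^2) / sum(w)
      s2[2] <- sum((1 - w) * (y - mu[2])^2) / sum(1 - w)
      cat(m, pi1, mu[1], mu[2], s2[1], s2[2], "\n")
    }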

  • [Figure: density of faithful$eruptions (x-axis roughly 1 to 6, y-axis 0.0 to 0.8), with the fitted two-component normal mixture density overlaid on a kernel density estimate. Legend: Normal mixture; Kernel density.]

  • EM ALGORITHM FOR THE REGULAR EXPONENTIAL FAMILY

    Let $X^T = (Y^T, Z^T)$ be distributed (wlog) from

    $$g_C(x; \theta) = b(x) \exp\{\theta^T t(x)\} / a(\theta)$$

    The E-step requires the computing of

    $$t^{(m)} = \mathrm{E}\left[t(X) \mid y;\ \theta^{(m)}\right], \text{ say}$$

    The M-step requires solving

    $$\mathrm{E}_\theta\left[t(X)\right] = t^{(m)}$$
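  • For instance (my gloss, tying this to the censored-data example above): for an exponential(θ) sample of size $n$, the natural sufficient statistic is the total time, so the two steps specialise to

    $$t^{(m)} = \sum_{i\ \text{unc.}} t_i + \sum_{i\ \text{cens.}} \left(c_i + \frac{1}{\theta^{(m)}}\right), \qquad \mathrm{E}_\theta\!\left[\sum_i T_i\right] = \frac{n}{\theta} = t^{(m)} \ \Rightarrow\ \theta^{(m+1)} = \frac{n}{t^{(m)}}$$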

  • EM ALGORITHM FOR THE FINITE MIXTURE PROBLEM

    Let $X^T = (Y^T, Z^T)$ be the complete data vector. Y is the observed data vector and Z the unobserved data vector.

    The observed likelihood is

    $$L(\theta \mid y) = \prod_{i=1}^{n} \sum_{j=1}^{k} \pi_j\, g_j(y_i; \psi_j)$$

    which is difficult to maximise

  • Define $Z^T = (z_1^T, \ldots, z_n^T)$, where $z_i^T = (z_{i1}, \ldots, z_{ik})$ and $z_{ij} = \mathrm{I}\{y_i \in j\text{th component}\}$.

    Thus,

    $$L_C(\theta \mid x) = \prod_{i=1}^{n} \prod_{j=1}^{k} \left\{\pi_j\, g_j(y_i; \psi_j)\right\}^{z_{ij}}$$

    and

    $$l_C(\theta \mid x) = \log L_C(\theta \mid x) = \sum_{i=1}^{n} z_i^T \left\{v(\pi) + u_i(\psi)\right\}$$

  • where

    $$v(\pi) = (\log \pi_1, \ldots, \log \pi_k)^T$$

    $$u_i(\psi) = \left(\log g_1(y_i; \psi_1), \ldots, \log g_k(y_i; \psi_k)\right)^T$$

    In the E-step, we compute

    $$Q(\theta, \theta^{(m)}) = \sum_{i=1}^{n} w_i(\theta^{(m)})^T v(\pi) + \sum_{i=1}^{n} w_i(\theta^{(m)})^T u_i(\psi)$$

    where

    $$w_i(\theta^{(m)}) = \mathrm{E}\left[z_i \mid y_i;\ \theta^{(m)}\right]$$

  • and

    $$w_{ij}(\theta^{(m)}) = \frac{\pi_j^{(m)}\, g_j(y_i; \psi_j^{(m)})}{\sum_{l=1}^{k} \pi_l^{(m)}\, g_l(y_i; \psi_l^{(m)})}$$

    In the M-step, we simply maximise $Q(\theta, \theta^{(m)})$

  • PROPERTIES OF THE EM ALGORITHM

    • Stability/Monotonicity: the observed log-likelihood never decreases from one iteration to the next, i.e. $l(\theta^{(m+1)} \mid y) \ge l(\theta^{(m)} \mid y)$

    • Under suitable regularity conditions, if the $\theta^{(m)}$'s converge then they converge to a stationary point of $l(\theta \mid y)$

    • The EM algorithm converges at a linear rate, with the rate depending on the proportion of information about $\theta$ in the observed density

  • STANDARD ERRORS OF PARAMETERS

    Louis (1982) showed that

    $$\mathrm{I}(\hat\theta; y) = I_C(\hat\theta; y) - \mathrm{E}\left[\frac{\partial \log L_C(x \mid \theta)}{\partial \theta}\, \frac{\partial \log L_C(x \mid \theta)}{\partial \theta^T} \,\Bigg|\, y;\ \hat\theta\right]_{\theta = \hat\theta}$$

    $$\phantom{\mathrm{I}(\hat\theta; y)} = I_C(\hat\theta; y) - \mathrm{cov}\left\{\frac{\partial \log L_C(x \mid \theta)}{\partial \theta} \,\Bigg|\, y;\ \hat\theta\right\}_{\theta = \hat\theta}$$

    Invert to get an approximate covariance matrix for the parameter estimates

  • Returning to Example 1 (Genetic Linkage),

    $$\frac{\partial \log L_C(x \mid \theta)}{\partial \theta} = \frac{x_2 + y_4}{\theta} - \frac{y_2 + y_3}{1 - \theta}$$

    $$-\frac{\partial^2 \log L_C(x \mid \theta)}{\partial \theta^2} = \frac{x_2 + y_4}{\theta^2} + \frac{y_2 + y_3}{(1 - \theta)^2}$$

    Therefore,

    $$I_C(\hat\theta; y) = \frac{\mathrm{E}[X_2 \mid y;\ \hat\theta] + y_4}{\hat\theta^2} + \frac{y_2 + y_3}{(1 - \hat\theta)^2} = \frac{125\,\hat\theta/(2 + \hat\theta) + y_4}{\hat\theta^2} + \frac{y_2 + y_3}{(1 - \hat\theta)^2} = 435.5$$

  • and

    $$\mathrm{cov}\left\{\frac{\partial \log L_C(x \mid \theta)}{\partial \theta} \,\Bigg|\, y;\ \hat\theta\right\}_{\theta = \hat\theta} = \frac{\mathrm{var}(X_2 \mid y;\ \hat\theta)}{\hat\theta^2} = \frac{125 \left(\dfrac{\hat\theta}{2 + \hat\theta}\right)\left(\dfrac{2}{2 + \hat\theta}\right)}{\hat\theta^2} = 57.8$$

    Thus,

    $$\mathrm{I}(\hat\theta; y) = 435.5 - 57.8 = 377.7$$

    and the standard error of $\hat\theta$ is equal to $1/\sqrt{377.7} = 0.05$.
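  • These numbers are easy to check in R (my verification, using θ̂ = 0.6268 from the earlier iteration table):

    # Louis standard-error computation for the linkage example.
    y <- c(125, 18, 20, 34)
    theta <- 0.6268                                # MLE from the EM iterations
    Ex2 <- y[1] * theta / (2 + theta)              # E[X2 | y; theta-hat]
    Vx2 <- y[1] * (theta / (2 + theta)) * (2 / (2 + theta))  # var(X2 | y; theta-hat)
    Ic  <- (Ex2 + y[4]) / theta^2 + (y[2] + y[3]) / (1 - theta)^2  # approx. 435.5
    Iobs <- Ic - Vx2 / theta^2                     # approx. 435.5 - 57.8 = 377.7
    1 / sqrt(Iobs)                                 # standard error, approx. 0.05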
