
  • EM ALGORITHM

    • The EM algorithm is a general iterative method of maximum likelihood estimation for incomplete data

    • Used to tackle a wide variety of problems, some of which would not usually be viewed as incomplete data problems

  • Natural situations

    – Missing data problems

    – Grouped data problems

    – Truncated and censored data problems

    • Not so obvious situations

    – Variance component estimation

    – Latent variable situations and random effects models

    – Mixture models

  • Areas of application

    – Image analysis

    – Epidemiology and Medicine

    – Engineering

    – Genetics and Biology

  • Seminal Paper

    Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). JRSS B 39: 1-38.

  • The EM algorithm is closely related to the following ad hoc process of handling missing data:

    1. Fill in the missing values by their estimated values

    2. Estimate the parameters for this completed dataset

    3. Use the estimated parameters to re-estimate the missing values

    4. Re-estimate the parameters from this updated completed dataset

    Alternate between steps 3 and 4 until convergence of the parameter estimates

  • The EM algorithm formalises this approach

    The essential idea behind the EM algorithm is to calculate the maximum likelihood estimates for the incomplete data problem by working with the complete data likelihood instead of the observed likelihood, which may be complicated or numerically infeasible to maximise.

    To do this, we augment the observed data with manufactured data so as to create a complete data likelihood that is computationally more tractable. At each iteration we then replace the missing data appearing in the sufficient statistics for the parameters of the complete data likelihood by their conditional expectations given the observed data and the current parameter estimates (Expectation step: E-step)

  • The new parameter estimates are obtained from these replaced sufficient statistics as though they had come from the complete sample (Maximisation step: M-step)

    By alternating E- and M-steps, the sequence of estimates often converges to the MLEs under very general conditions
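    In symbols (a standard formulation, filling in what the slides describe in words):

    $$Q(\theta, \theta^{(m)}) = \mathrm{E}\left[\log L_C(\theta \mid X) \,\middle|\, y;\ \theta^{(m)}\right] \quad \text{(E-step)}$$

    $$\theta^{(m+1)} = \arg\max_{\theta}\, Q(\theta, \theta^{(m)}) \quad \text{(M-step)}$$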

  • EXAMPLES

    1. Genetic Linkage Model

    2. Censored (survival) data

    3. Mixture of two univariate normals

  • Genetic Linkage Model

    197 animals distributed into four categories

    Y is postulated to have arisen from a multinomial distribution with cell probabilities

    $$\left( \tfrac{1}{2} + \tfrac{\theta}{4},\ \tfrac{1}{4}(1-\theta),\ \tfrac{1}{4}(1-\theta),\ \tfrac{\theta}{4} \right)$$

    $$y = (y_1, y_2, y_3, y_4) = (125, 18, 20, 34)$$

  • Augment the observed data by splitting the first cell into $x_1$ and $x_2$, with probabilities $1/2$ and $\theta/4$ respectively, so that the complete data likelihood is tractable.

    Take $\theta^{(0)} = 0.5$ as the initial estimate.

    E-step: replace $x_2$ by its conditional expectation to obtain an initial estimate:

    $$x_2^{(0)} = \mathrm{E}\left[X_2 \mid Y;\ \theta^{(0)}\right] = 125 \times \frac{0.25 \times 0.5}{0.5 + 0.25 \times 0.5} = 25$$

    M-step: treat $x_2^{(0)} = 25$ as 'real' data.

  • Update the estimate of $\theta$:

    $$\theta^{(1)} = \frac{x_2^{(0)} + y_4}{x_2^{(0)} + y_4 + y_2 + y_3} = \frac{25 + 34}{25 + 34 + 18 + 20} = 0.608$$

    Obtain an improved estimate of $x_2$:

    $$x_2^{(1)} = \mathrm{E}\left[X_2 \mid Y;\ \theta^{(1)}\right] = 125 \times \frac{0.608}{2 + 0.608} = 29.14$$
  • Alternate E- and M-steps:

    step   θ^(m)
    0      0.5
    1      0.6082
    2      0.6243
    3      0.6265
    4      0.6268
    5      0.6268
    6      0.6268
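  • The whole iteration fits in a few lines of R. This is a sketch of my own (the names y, x2 and theta are not from the slides), reproducing the table above:

    # EM for the genetic linkage model: y = (125, 18, 20, 34), cell
    # probabilities (1/2 + theta/4, (1-theta)/4, (1-theta)/4, theta/4).
    y <- c(125, 18, 20, 34)
    theta <- 0.5                                   # initial estimate theta^(0)
    for (m in 1:6) {
      # E-step: expected count in the theta/4 part of the first cell
      x2 <- y[1] * (theta / 4) / (1/2 + theta / 4)
      # M-step: update theta treating x2 as observed data
      theta <- (x2 + y[4]) / (x2 + y[4] + y[2] + y[3])
      cat(sprintf("%d  %.4f\n", m, theta))
    }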

  • Survival time data: right-censored exponential(θ) data

    3 uncensored observations: t1 = 0.5, t2 = 1.5 and t3 = 4

    2 right-censored observations: t4 = 1* and t5 = 3*

    Recall the lack-of-memory property:

    $$\mathrm{E}(T \mid T > t) = t + \frac{1}{\theta}$$

  • Ignore the censored data to obtain an initial estimate of the rate:

    $$\theta^{(0)} = \frac{3}{0.5 + 1.5 + 4} = 0.5$$

    E-step: replace the censored data by their conditional expectations:

    $$t_4^{(0)} = \mathrm{E}\left(T_4 \mid T_4 > 1\right) = 1 + \frac{1}{\theta^{(0)}} = 1 + 2 = 3$$

    $$t_5^{(0)} = \mathrm{E}\left(T_5 \mid T_5 > 3\right) = 3 + \frac{1}{\theta^{(0)}} = 3 + 2 = 5$$

  • M-step: treat $t_4^{(0)} = 3$ and $t_5^{(0)} = 5$ as 'real' data.

    Update the estimate of the rate:

    $$\theta^{(1)} = \frac{5}{0.5 + 1.5 + 4 + 3 + 5} = 0.3571$$

    Obtain improved estimates:

    $$t_4^{(1)} = \mathrm{E}\left(T_4 \mid T_4 > 1\right) = 1 + \frac{1}{\theta^{(1)}} = 1 + 2.8 = 3.8$$

    $$t_5^{(1)} = \mathrm{E}\left(T_5 \mid T_5 > 3\right) = 3 + \frac{1}{\theta^{(1)}} = 3 + 2.8 = 5.8$$

  • step   θ^(m)

    0      0.5
    1      0.3571
    2      0.3205
    3      0.3079
    4      0.3031
    5      0.3012
    6      0.3005
    7      0.3002
    8      0.3001
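  • As before, this EM is a few lines of R (my sketch; the names t_obs and t_cens are not from the slides):

    # EM for right-censored exponential data.
    # Uncensored: 0.5, 1.5, 4; censored at 1 and 3.
    t_obs  <- c(0.5, 1.5, 4)
    t_cens <- c(1, 3)
    theta  <- length(t_obs) / sum(t_obs)   # initial estimate, ignoring censoring
    for (m in 1:8) {
      # E-step: lack of memory gives E(T | T > c) = c + 1/theta
      t_fill <- t_cens + 1 / theta
      # M-step: exponential MLE on the completed data set
      theta <- 5 / (sum(t_obs) + sum(t_fill))
      cat(sprintf("%d  %.4f\n", m, theta))
    }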

  • Mixture of two univariate normals (Old Faithful’s eruptions)

    step   pi          mu[1]       mu[2]       sigma2[1]   sigma2[2]
    0      0.35        2           4.3         0.1         0.2
    1      0.34920810  2.02049523  4.27511436  0.05694792  0.18870948
    2      0.34889072  2.01974514  4.27441727  0.05637577  0.18961652
    3      0.34869541  2.01928712  4.27398638  0.05602933  0.19018038
    4      0.34857750  2.01901127  4.27372586  0.05582123  0.19052192
    5      0.34850700  2.01884660  4.27357000  0.05569720  0.19072650
    6      0.34846512  2.01874885  4.27347731  0.05562365  0.19084823
    7      0.34844032  2.01869101  4.27342243  0.05558015  0.19092034
    8      0.34842567  2.01865686  4.27339000  0.05555447  0.19096296
    9      0.34841703  2.01863671  4.27337087  0.05553933  0.19098811
    10     0.34841194  2.01862484  4.27335959  0.05553041  0.19100294
    11     0.34840893  2.01861784  4.27335294  0.05552515  0.19101167
    12     0.34840717  2.01861372  4.27334903  0.05552205  0.19101682
    13     0.34840613  2.01861129  4.27334672  0.05552023  0.19101985
    14     0.34840551  2.01860986  4.27334537  0.05551916  0.19102164
    15     0.34840515  2.01860902  4.27334457  0.05551852  0.19102269
    16     0.34840494  2.01860853  4.27334410  0.05551815  0.19102331
    17     0.34840481  2.01860823  4.27334382  0.05551793  0.19102367
    18     0.34840470  2.01860810  4.27334370  0.05551780  0.19102390
    19     0.34840470  2.01860796  4.27334356  0.05551773  0.19102401
    20     0.34840467  2.01860790  4.27334350  0.05551768  0.19102409
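  • An R sketch of this two-component fit (my illustration, not from the slides; starting values are row 0 of the table, and faithful is a built-in R data set):

    # EM for a two-component univariate normal mixture fitted to the
    # Old Faithful eruption durations.
    y <- faithful$eruptions
    pi1 <- 0.35; mu <- c(2, 4.3); s2 <- c(0.1, 0.2)  # starting values
    for (m in 1:20) {
      # E-step: responsibilities w_i = P(component 1 | y_i, current fit)
      d1 <- pi1 * dnorm(y, mu[1], sqrt(s2[1]))
      d2 <- (1 - pi1) * dnorm(y, mu[2], sqrt(s2[2]))
      w  <- d1 / (d1 + d2)
      # M-step: weighted proportions, means and variances
      pi1   <- mean(w)
      mu[1] <- sum(w * y) / sum(w)
      mu[2] <- sum((1 - w) * y) / sum(1 - w)
      s2[1] <- sum(w * (y - mu[1])^2) / sum(w)
      s2[2] <- sum((1 - w) * (y - mu[2])^2) / sum(1 - w)
      cat(m, pi1, mu[1], mu[2], s2[1], s2[2], "\n")
    }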

  • [Figure: density of faithful$eruptions (x-axis roughly 1 to 6, y-axis 0.0 to 0.8), with the fitted two-component normal mixture density overlaid on a kernel density estimate. Legend: Normal mixture; Kernel density.]

  • EM ALGORITHM FOR THE REGULAR EXPONENTIAL FAMILY

    Let $X^T = (Y^T, Z^T)$ be distributed (wlog) from

    $$g_C(x; \theta) = b(x) \exp\{\theta^T t(x)\} / a(\theta)$$

    The E-step requires the computing of

    $$t^{(m)} = \mathrm{E}\left[t(X) \mid y;\ \theta^{(m)}\right], \text{ say}$$

    The M-step requires solving

    $$\mathrm{E}_\theta\left[t(X)\right] = t^{(m)}$$
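  • For instance (my gloss, tying this to the censored-data example above): for an exponential(θ) sample of size $n$, the natural sufficient statistic is the total time, so the two steps specialise to

    $$t^{(m)} = \sum_{i\ \text{unc.}} t_i + \sum_{i\ \text{cens.}} \left(c_i + \frac{1}{\theta^{(m)}}\right), \qquad \mathrm{E}_\theta\!\left[\sum_i T_i\right] = \frac{n}{\theta} = t^{(m)} \ \Rightarrow\ \theta^{(m+1)} = \frac{n}{t^{(m)}}$$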

  • EM ALGORITHM FOR THE FINITE MIXTURE PROBLEM

    Let $X^T = (Y^T, Z^T)$ be the complete data vector. Y is the observed data vector and Z the unobserved data vector.

    The observed likelihood is

    $$L(\theta \mid y) = \prod_{i=1}^{n} \sum_{j=1}^{k} \pi_j\, g_j(y_i; \psi_j)$$

    which is difficult to maximise

  • Define $Z^T = (z_1^T, \ldots, z_n^T)$, where $z_i^T = (z_{i1}, \ldots, z_{ik})$ and $z_{ij} = \mathrm{I}\{y_i \in j\text{th component}\}$.

    Thus,

    $$L_C(\theta \mid x) = \prod_{i=1}^{n} \prod_{j=1}^{k} \left\{\pi_j\, g_j(y_i; \psi_j)\right\}^{z_{ij}}$$

    and

    $$l_C(\theta \mid x) = \log L_C(\theta \mid x) = \sum_{i=1}^{n} z_i^T \left\{v(\pi) + u_i(\psi)\right\}$$

  • where

    $$v(\pi) = (\log \pi_1, \ldots, \log \pi_k)^T$$

    $$u_i(\psi) = \left(\log g_1(y_i; \psi_1), \ldots, \log g_k(y_i; \psi_k)\right)^T$$

    In the E-step, we compute

    $$Q(\theta, \theta^{(m)}) = \sum_{i=1}^{n} w_i(\theta^{(m)})^T v(\pi) + \sum_{i=1}^{n} w_i(\theta^{(m)})^T u_i(\psi)$$

    where

    $$w_i(\theta^{(m)}) = \mathrm{E}\left[z_i \mid y_i;\ \theta^{(m)}\right]$$

  • and

    $$w_{ij}(\theta^{(m)}) = \frac{\pi_j^{(m)}\, g_j(y_i; \psi_j^{(m)})}{\sum_{l=1}^{k} \pi_l^{(m)}\, g_l(y_i; \psi_l^{(m)})}$$

    In the M-step, we simply maximise $Q(\theta, \theta^{(m)})$

  • PROPERTIES OF THE EM ALGORITHM

    • Stability/Monotonicity: the observed log-likelihood never decreases from one iteration to the next, i.e. $l(\theta^{(m+1)} \mid y) \ge l(\theta^{(m)} \mid y)$

    • Under suitable regularity conditions, if the $\theta^{(m)}$'s converge then they converge to a stationary point of $l(\theta \mid y)$

    • The EM algorithm converges at a linear rate, with the rate depending on the proportion of information about $\theta$ in the observed density

  • STANDARD ERRORS OF PARAMETERS

    Louis (1982) showed that

    $$\mathrm{I}(\hat\theta; y) = I_C(\hat\theta; y) - \mathrm{E}\left[\frac{\partial \log L_C(x \mid \theta)}{\partial \theta}\, \frac{\partial \log L_C(x \mid \theta)}{\partial \theta^T} \,\Bigg|\, y;\ \hat\theta\right]_{\theta = \hat\theta}$$

    $$\phantom{\mathrm{I}(\hat\theta; y)} = I_C(\hat\theta; y) - \mathrm{cov}\left\{\frac{\partial \log L_C(x \mid \theta)}{\partial \theta} \,\Bigg|\, y;\ \hat\theta\right\}_{\theta = \hat\theta}$$

    Invert to get an approximate covariance matrix for the parameter estimates

  • Returning to Example 1 (Genetic Linkage),

    $$\frac{\partial \log L_C(x \mid \theta)}{\partial \theta} = \frac{x_2 + y_4}{\theta} - \frac{y_2 + y_3}{1 - \theta}$$

    $$-\frac{\partial^2 \log L_C(x \mid \theta)}{\partial \theta^2} = \frac{x_2 + y_4}{\theta^2} + \frac{y_2 + y_3}{(1 - \theta)^2}$$

    Therefore,

    $$I_C(\hat\theta; y) = \frac{\mathrm{E}[X_2 \mid y;\ \hat\theta] + y_4}{\hat\theta^2} + \frac{y_2 + y_3}{(1 - \hat\theta)^2} = \frac{125\,\hat\theta/(2 + \hat\theta) + y_4}{\hat\theta^2} + \frac{y_2 + y_3}{(1 - \hat\theta)^2} = 435.5$$

  • and

    $$\mathrm{cov}\left\{\frac{\partial \log L_C(x \mid \theta)}{\partial \theta} \,\Bigg|\, y;\ \hat\theta\right\}_{\theta = \hat\theta} = \frac{\mathrm{var}(X_2 \mid y;\ \hat\theta)}{\hat\theta^2} = \frac{125 \left(\dfrac{\hat\theta}{2 + \hat\theta}\right)\left(\dfrac{2}{2 + \hat\theta}\right)}{\hat\theta^2} = 57.8$$

    Thus,

    $$\mathrm{I}(\hat\theta; y) = 435.5 - 57.8 = 377.7$$

    and the standard error of $\hat\theta$ is equal to $1/\sqrt{377.7} = 0.05$.
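  • These numbers are easy to check in R (my verification, using θ̂ = 0.6268 from the earlier iteration table):

    # Louis standard-error computation for the linkage example.
    y <- c(125, 18, 20, 34)
    theta <- 0.6268                                # MLE from the EM iterations
    Ex2 <- y[1] * theta / (2 + theta)              # E[X2 | y; theta-hat]
    Vx2 <- y[1] * (theta / (2 + theta)) * (2 / (2 + theta))  # var(X2 | y; theta-hat)
    Ic  <- (Ex2 + y[4]) / theta^2 + (y[2] + y[3]) / (1 - theta)^2  # approx. 435.5
    Iobs <- Ic - Vx2 / theta^2                     # approx. 435.5 - 57.8 = 377.7
    1 / sqrt(Iobs)                                 # standard error, approx. 0.05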
