Linear Associative Memory
Ruy Luiz Milidiú
Linear Regression

Objective: examine the linear associative memory model, its advantages and limitations.
Outline: simple linear memory; multiple linear memory; multiple linear memories (vector output); cross-validation
Simple Linear Memory

Examples: (x1, y1), (x2, y2), …, (xn, yn), with xi ∈ R and yi ∈ R
Linear neuron: ŷ = w0 + w1·x; w0, w1 = ?
Performance: error E = (ŷ1 − y1)² + … + (ŷn − yn)²
Example

[Figure (two slides): scatter plots of the example data, x from 0 to 15, y from 0 to 20.]
Supervised Learning

Minimize the error:
$$E(w_0, w_1) = (\hat{y}_1 - y_1)^2 + \dots + (\hat{y}_n - y_n)^2$$
$$E(w_0, w_1) = \sum_i (w_0 + w_1 x_i - y_i)^2$$
Differentiating and setting the derivatives to zero:
$$2\,(w_0 + w_1 x_1 - y_1) + \dots + 2\,(w_0 + w_1 x_n - y_n) = 0$$
$$2\,x_1(w_0 + w_1 x_1 - y_1) + \dots + 2\,x_n(w_0 + w_1 x_n - y_n) = 0$$
Normal Equations

A system of linear equations:
$$(w_0 + w_1 x_1 - y_1) + \dots + (w_0 + w_1 x_n - y_n) = 0$$
$$x_1(w_0 + w_1 x_1 - y_1) + \dots + x_n(w_0 + w_1 x_n - y_n) = 0$$
which rearranges to
$$n\,w_0 + (x_1 + \dots + x_n)\,w_1 = y_1 + \dots + y_n$$
$$(x_1 + \dots + x_n)\,w_0 + (x_1^2 + \dots + x_n^2)\,w_1 = x_1 y_1 + \dots + x_n y_n$$

Normal Equations

Solution by substitution:
$$w_0 = (y_1 + \dots + y_n)/n - [(x_1 + \dots + x_n)/n]\,w_1$$
Substituting into the second equation:
$$(x_1 + \dots + x_n)\,\{(y_1 + \dots + y_n)/n - [(x_1 + \dots + x_n)/n]\,w_1\} + (x_1^2 + \dots + x_n^2)\,w_1 = x_1 y_1 + \dots + x_n y_n$$
so that $w_1 = A/B$ with
$$A = x_1 y_1 + \dots + x_n y_n - (x_1 + \dots + x_n)(y_1 + \dots + y_n)/n$$
$$B = (x_1^2 + \dots + x_n^2) - (x_1 + \dots + x_n)^2/n$$
$$w_0 = (y_1 + \dots + y_n)/n - [(x_1 + \dots + x_n)/n]\,(A/B)$$
Normal Equations

In matrix form:
$$\begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix}\begin{pmatrix} w_0 \\ w_1 \end{pmatrix} = \begin{pmatrix} \sum_i y_i \\ \sum_i x_i y_i \end{pmatrix}$$
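A minimal sketch of this closed form in Python/NumPy; the function name and the synthetic data are illustrative, not from the slides:

```python
import numpy as np

def fit_simple_linear_memory(x, y):
    """Closed-form least squares for y ≈ w0 + w1*x, using w1 = A/B as above."""
    n = len(x)
    A = np.sum(x * y) - np.sum(x) * np.sum(y) / n
    B = np.sum(x ** 2) - np.sum(x) ** 2 / n
    w1 = A / B
    w0 = np.mean(y) - np.mean(x) * w1
    return w0, w1

# Illustrative data: y = 2 + 1.5x plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 15, 30)
y = 2.0 + 1.5 * x + rng.normal(0, 1, size=x.shape)
print(fit_simple_linear_memory(x, y))   # close to (2, 1.5)
```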
Multiple Linear Memory

Examples: (x1, y1), (x2, y2), …, (xn, yn), with xi ∈ R^k and yi ∈ R
Linear neuron: ŷ = wᵀ·x; w = ?
Performance: error E = (ŷ1 − y1)² + … + (ŷn − yn)²
Supervised Learning

$$E(w) = \sum_i (w^T x_i - y_i)^2$$
$$E(w) = v^T v \quad\text{where } v_i = w^T x_i - y_i = x_i^T w - y_i$$
$$E(w) = \sum_i (x_i^T w - y_i)^T (x_i^T w - y_i)$$
$$E(w) = \sum_i y_i^2 - 2 \sum_i y_i\, x_i^T w + w^T \Bigl(\sum_i x_i x_i^T\Bigr) w$$
With $X = [x_1, \dots, x_n]$:
$$E(w) = y^T y - 2\,(X y)^T w + w^T X X^T w$$
$$\partial E/\partial w = 2\, X X^T w - 2\, X y = 0$$
Supervised Learning

$$X X^T w = X y \quad\text{(Normal Equation)}$$
$$w = (X X^T)^{-1} X y$$
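A minimal NumPy sketch of this solution (the function name is illustrative); it solves the normal equation directly rather than forming the explicit inverse:

```python
import numpy as np

def fit_linear_memory(X, y):
    """Solve the normal equation X Xᵀ w = X y.
    X is k x n with one example per column; y holds the n targets."""
    # np.linalg.solve is preferred to computing (X Xᵀ)⁻¹ explicitly
    return np.linalg.solve(X @ X.T, X @ y)
```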
Fast Adaptation

$$w = (X X^T)^{-1} X y$$
$$X^{(n)} = [x_1, \dots, x_n]$$
$$X^{(n)} X^{(n)T} = [x_1, \dots, x_n]\,[x_1, \dots, x_n]^T = \sum_i x_i x_i^T$$
$$X^{(n)} X^{(n)T} = X^{(n-1)} X^{(n-1)T} + x_n x_n^T$$
$$\bigl[X^{(n)} X^{(n)T}\bigr]^{-1} = \bigl[X^{(n-1)} X^{(n-1)T} + x_n x_n^T\bigr]^{-1}$$
Incremental Inverse

$$(A + x x^T)^{-1} = ?, \qquad A^T = A$$
$$A + x x^T = (I + x x^T A^{-1})\,A$$
$$(I + x x^T A^{-1})^{-1} = (I + x v^T)^{-1} = ?, \quad\text{where } v = A^{-1} x$$
$$(I + U)^{-1} = I - U + U^2 - U^3 + \dots$$
$$(x v^T)^2 = c\, x v^T \ \text{where } c = v^T x; \qquad (x v^T)^r = c^{r-1}\, x v^T,\ r = 1, 2, \dots$$
$$(I + x v^T)^{-1} = I - (1+c)^{-1}\, x v^T$$

Incremental Inverse

$$(A + x x^T)^{-1} = A^{-1} - (1+c)^{-1}\, v v^T$$
where $v = A^{-1} x$ and $c = v^T x$.
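This identity is the Sherman–Morrison formula specialized to a symmetric A. A sketch of the update (function name illustrative):

```python
import numpy as np

def rank_one_inverse_update(A_inv, x):
    """Given A⁻¹ with A symmetric, return (A + x xᵀ)⁻¹ = A⁻¹ − (1+c)⁻¹ v vᵀ."""
    v = A_inv @ x          # v = A⁻¹ x
    c = v @ x              # c = vᵀ x
    return A_inv - np.outer(v, v) / (1.0 + c)
```

A quick sanity check is to compare the result with np.linalg.inv(A + np.outer(x, x)).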
Fast Adaptation

$$w = (X X^T)^{-1} X y$$
$$X^{(n)} = [x_1, \dots, x_n] = [X^{(n-1)},\, x_n]$$
$$X^{(n)} X^{(n)T} = \varepsilon I + \sum_i x_i x_i^T, \quad \varepsilon > 0 \ \text{(small term keeping the matrix invertible)}$$
$$X^{(n)} X^{(n)T} = A + x_n x_n^T, \qquad v_n = A^{-1} x_n$$
$$w^{(n)} = \bigl\{A^{-1} - (1 + x_n^T v_n)^{-1}\, v_n v_n^T\bigr\}\,\bigl(X^{(n-1)} y^{(n-1)} + y_n x_n\bigr)$$
Using
$$A^{-1} X^{(n-1)} y^{(n-1)} = w^{(n-1)}, \qquad A^{-1} y_n x_n = y_n v_n,$$
$$v_n^T X^{(n-1)} y^{(n-1)} = x_n^T A^{-1} X^{(n-1)} y^{(n-1)} = x_n^T w^{(n-1)}, \qquad v_n^T y_n x_n = y_n\, x_n^T v_n,$$
the update simplifies to
$$w^{(n)} = w^{(n-1)} + \bigl(y_n - w^{(n-1)T} x_n\bigr)\,(1 + x_n^T v_n)^{-1}\, v_n$$
or, as an in-place assignment,
$$w \leftarrow w + (y_n - w^T x_n)\,(1 + x_n^T v_n)^{-1}\, v_n$$
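A sketch of the resulting recursive least squares procedure, assuming the εI initialization above; rls_fit and eps are illustrative names:

```python
import numpy as np

def rls_fit(X, y, eps=1e-3):
    """Fast adaptation (recursive least squares): process the examples
    one at a time, maintaining A⁻¹ = (εI + Σᵢ xᵢxᵢᵀ)⁻¹ incrementally."""
    k, n = X.shape
    A_inv = np.eye(k) / eps            # (εI)⁻¹, the inverse before any example
    w = np.zeros(k)
    for i in range(n):
        x_i, y_i = X[:, i], y[i]
        v = A_inv @ x_i                      # vₙ = A⁻¹ xₙ
        gain = v / (1.0 + x_i @ v)           # (1 + xₙᵀvₙ)⁻¹ vₙ
        w = w + (y_i - w @ x_i) * gain       # the update rule above
        A_inv = A_inv - np.outer(gain, v)    # incremental inverse update
    return w
```

For small eps the result agrees with the batch solution np.linalg.solve(eps * np.eye(k) + X @ X.T, X @ y).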
Slow Adaptation

$$E(w) = \sum_i (w^T x_i - y_i)^2$$
$$E^{(n)}(w) = E^{(n-1)}(w) + (w^T x_n - y_n)^2$$
$$\partial E^{(n)}/\partial w = \partial E^{(n-1)}/\partial w + 2\,(w^T x_n - y_n)\,x_n$$
$$\partial E^{(n)}/\partial w \approx 0 + 2\,(w_{n-1}^T x_n - y_n)\,x_n$$
Gradient step with learning rate $\eta$:
$$w_n = w_{n-1} + \eta\,(y_n - w_{n-1}^T x_n)\,x_n$$
ADALINE

$$E(w) = \sum_i (w^T x_i - y_i)^2$$
ADAptive LInear NEuron: slow adaptation; gradient method; online learning, one example at a time; learning distributed over the weight vector.
$$w_n = w_{n-1} + \eta\,(y_n - w_{n-1}^T x_n)\,x_n$$
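A sketch of the ADALINE update loop; eta, epochs, and the function name are illustrative choices, and the learning rate must be small enough for the iteration to converge:

```python
import numpy as np

def adaline_fit(X, y, eta=0.01, epochs=50):
    """ADALINE / LMS: wₙ = wₙ₋₁ + η (yₙ − wᵀxₙ) xₙ, one example at a time."""
    k, n = X.shape
    w = np.zeros(k)
    for _ in range(epochs):          # several passes over the data
        for i in range(n):
            w = w + eta * (y[i] - w @ X[:, i]) * X[:, i]
    return w
```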
Multiple Linear Memories

Examples: (x1, y1), (x2, y2), …, (xn, yn), with xi ∈ R^k and yi ∈ R^s
Linear neuron: ŷ = W·x; W = ?

Optimal Memory

$$W = Y X^T (X X^T)^{-1}$$
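A minimal sketch of this solution (function name illustrative); since X Xᵀ is symmetric, it solves for Wᵀ instead of inverting:

```python
import numpy as np

def fit_optimal_memory(X, Y):
    """Optimal memory W = Y Xᵀ (X Xᵀ)⁻¹.
    X is k x n (one example per column), Y is s x n; returns W of shape s x k."""
    # Solve (X Xᵀ) Wᵀ = X Yᵀ, then transpose back.
    return np.linalg.solve(X @ X.T, X @ Y.T).T
```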
Polynomial Regression

$$f: \mathbb{R} \to \mathbb{R}, \qquad f(x; w) = w_0 + w_1 x + w_2 x^2 + \dots + w_m x^m$$
$$w = (X^T X)^{-1} X^T y$$
$$X = \begin{pmatrix} 1 & x^{(1)} & (x^{(1)})^2 & \cdots & (x^{(1)})^m \\ 1 & x^{(2)} & (x^{(2)})^2 & \cdots & (x^{(2)})^m \\ \vdots & & & & \vdots \\ 1 & x^{(n)} & (x^{(n)})^2 & \cdots & (x^{(n)})^m \end{pmatrix}, \quad w = \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_m \end{pmatrix}, \quad y = \begin{pmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{pmatrix}$$

Example model orders:
$$f(x; w) = w_0 + w_1 x \qquad\qquad f(x; w) = w_0 + w_1 x + w_2 x^2 + w_3 x^3$$
$$f(x; w) = w_0 + w_1 x + w_2 x^2 + \dots + w_5 x^5 \qquad f(x; w) = w_0 + w_1 x + w_2 x^2 + \dots + w_{10} x^{10}$$
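A sketch of this fit in NumPy, under the rows-as-examples convention used on this slide (helper names fit_polynomial and predict are illustrative); np.vander builds the design matrix above:

```python
import numpy as np

def fit_polynomial(x, y, m):
    """Least-squares fit of a degree-m polynomial: w = (XᵀX)⁻¹ Xᵀ y."""
    X = np.vander(x, m + 1, increasing=True)     # columns 1, x, x², …, xᵐ
    w, *_ = np.linalg.lstsq(X, y, rcond=None)    # numerically stabler than inverting
    return w

def predict(x, w):
    """Evaluate the fitted polynomial at the points in x."""
    return np.vander(x, len(w), increasing=True) @ w
```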
[Figure: three panels showing polynomial fits of increasing order to the same data; horizontal axis from −1.5 to 1.5, vertical axis from 0 to 8.]
Regression with polynomials: fit improves with increased order
We want to fit the training set, but as model complexity
increases, we run the risk of over-fitting.
As the model order increases, the mean-squared error on the training set can be driven toward zero:
$$\frac{1}{n}\sum_{i=1}^{n}\left(y^{(i)} - f\bigl(x^{(i)};\,\hat{w}\bigr)\right)^2 \;\to\; 0$$
Over-fitting

[Figure: two panels, "Train set" and "Leave out", showing a high-order polynomial fit to the data with and without a single point; axes from −1.5 to 1.5 and 0 to 8.]

When the model is over-fitting, leaving a single data point out of the training set can drastically change the fit.
Cross-Validation

We want to fit the training set, but we also want to generalize correctly. To measure generalization, we leave out one data point (the test point), fit the remaining data, and then measure the error on the test point. The average error over all possible test points is the cross-validation error:
$$CV = \frac{1}{n}\sum_{i=1}^{n}\left(y^{(i)} - f\bigl(x^{(i)};\, w^{(!i)}\bigr)\right)^2$$
where $w^{(!i)}$ denotes the weights estimated from a training set that does not include the i-th data point.
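A self-contained sketch of this leave-one-out cross-validation error for the polynomial model (function name illustrative):

```python
import numpy as np

def loocv_error(x, y, m):
    """Leave-one-out CV error for a degree-m polynomial fit."""
    n = len(x)
    errs = []
    for i in range(n):
        mask = np.arange(n) != i                            # leave the i-th point out
        Xtr = np.vander(x[mask], m + 1, increasing=True)
        w, *_ = np.linalg.lstsq(Xtr, y[mask], rcond=None)   # w^(!i)
        pred = np.vander(x[i:i+1], m + 1, increasing=True) @ w
        errs.append((y[i] - pred[0]) ** 2)
    return np.mean(errs)

# Model selection: pick the order that minimizes the CV error, e.g.
# best_m = min(range(1, 11), key=lambda m: loocv_error(x, y, m))
```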
Model Selection

[Figure: mean-squared error on the training set and the cross-validation error, each plotted against model order.]

(The actual data was generated by a 2nd-order polynomial process.)