yun-nung (vivian) chenyvchen/s105-icb/doc/170613_recenttr… · chen, et al., ^end-to-end memory...

22
1 YUN-NUNG (VIVIAN) CHEN HTTP://VIVIANCHEN.IDV.TW HAKKANI-TUR, TUR, GAO, DENG

Upload: others

Post on 17-Jun-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

1

Y U N - N U N G ( V I V I A N ) C H E N

H T T P : / / V I V I A N C H E N . I D V . T W

H A K K A N I - T U R , T U R , G A O , D E N G

Page 2: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

Outline

Introduction

Spoken Dialogue System

Spoken/Natural Language Understanding (SLU/NLU)

Contextual Spoken Language Understanding

Model Architecture

End-to-End Training

Experiments

Conclusion & Future Work

2

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

Page 3: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

Outline

Introduction

Spoken Dialogue System

Spoken/Natural Language Understanding (SLU/NLU)

Contextual Spoken Language Understanding

Model Architecture

End-to-End Training

Experiments

Conclusion & Future Work

3

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

Page 4: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

Dialogue System Pipeline

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

4

ASRLanguage Understanding (LU)• User Intent Detection• Slot Filling

Dialogue Management (DM)• Dialogue State Tracking• Policy Decision

Output Generation

Hypothesisare there any action movies to see this weekend

Semantic Frame (Intents, Slots)request_moviegenre=actiondate=this weekend

System Actionrequest_locaion

Text responseWhere are you located?

Screen Displaylocation?

Text InputAre there any action movies to see this weekend?

Speech Signal

Page 5: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

LU Importance

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

5

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0

15

30

45

60

75

90

10

5

12

0

13

5

15

0

16

5

18

0

19

5

21

0

22

5

24

0

25

5

27

0

28

5

30

0

31

5

33

0

34

5

36

0

37

5

39

0

40

5

42

0

43

5

45

0

46

5

48

0

49

5

Succ

ess

Rat

e

Simulation Epoch

Learning Curve of System Performance

Upper Bound DQN - 0.00 DQN - 0.05 Rule - 0.00 Rule - 0.05

RL Agent w/o LU errors

Rule Agent w/o LU errors

Page 6: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

LU Importance

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

6

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0

15

30

45

60

75

90

10

5

12

0

13

5

15

0

16

5

18

0

19

5

21

0

22

5

24

0

25

5

27

0

28

5

30

0

31

5

33

0

34

5

36

0

37

5

39

0

40

5

42

0

43

5

45

0

46

5

48

0

49

5

Succ

ess

Rat

e

Simulation Epoch

Learning Curve of System Performance

Upper Bound DQN - 0.00 DQN - 0.05 Rule - 0.00 Rule - 0.05

RL Agent w/o LU errors

RL Agent w/ 5% LU errors

Rule Agent w/o LU errors

Rule Agent w/ 5% LU errors

>5% performance drop

The system performance is sensitive to LU errors, for both rule-based and reinforcement learning agents.

Page 7: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

Dialogue System Pipeline

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

7

SLU usually focuses on understanding single-turn utterances

The understanding result is usually influenced by 1) local observations 2) global knowledge.

ASRLanguage Understanding (LU)• User Intent Detection• Slot Filling

Dialogue Management (DM)• Dialogue State Tracking• Policy Decision

Output Generation

Hypothesisare there any action movies to see this weekend

Semantic Frame (Intents, Slots)request_moviegenre=actiondate=this weekend

System Actionrequest_locaion

Text responseWhere are you located?

Screen Displaylocation?

Text InputAre there any action movies to see this weekend?

Speech Signal

current bottleneck error propagation

Page 8: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

Spoken Language Understanding

8

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

just sent email to bob about fishing this weekend

O O O OB-contact_name

O

B-subject I-subject I-subject

U

S

I send_email

D communication

send_email(contact_name=“bob”, subject=“fishing this weekend”)

are we going to fish this weekend

U1

S2

send_email(message=“are we going to fish this weekend”)

send email to bob

U2

send_email(contact_name=“bob”)

B-messageI-message

I-message I-message I-messageI-message I-message

B-contact_nameS1

Domain Identification Intent Prediction Slot Filling

Page 9: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

Outline

Introduction

Spoken Dialogue System

Spoken/Natural Language Understanding (SLU/NLU)

Contextual Spoken Language Understanding

Model Architecture

End-to-End Training

Experiments

Conclusion & Future Work

9

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

Page 10: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

MODEL ARCHITECTURE

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

10

u

Knowledge Attention Distributionpi

mi

Memory Representation

Weighted Sum

h

∑ Wkg

oKnowledge Encoding

Representation

history utterances {xi}

current utterance

c

Inner Product

Sentence Encoder

RNNin

x1 x2 xi…

Contextual Sentence Encoder

x1 x2 xi…

RNNmem

slot tagging sequence y

ht-1 ht

V V

W W W

wt-1 wt

yt-1 yt

U U

RNN Tagger

M M

Idea: additionally incorporating contextual knowledge during slot tagging

Chen, et al., “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding,” in Interspeech, 2016.

1. Sentence Encoding 2. Knowledge Attention 3. Knowledge Encoding

Page 11: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

MODEL ARCHITECTURE

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

11

u

Knowledge Attention Distributionpi

mi

Memory Representation

Weighted Sum

h

∑ Wkg

oKnowledge Encoding

Representation

history utterances {xi}

current utterance

c

Inner Product

Sentence Encoder

RNNin

x1 x2 xi…

Contextual Sentence Encoder

x1 x2 xi…

RNNmem

slot tagging sequence y

ht-1 ht

V V

W W W

wt-1 wt

yt-1 yt

U U

RNN Tagger

M M

Idea: additionally incorporating contextual knowledge during slot tagging

Chen, et al., “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding,” in Interspeech, 2016.

1. Sentence Encoding 2. Knowledge Attention 3. Knowledge Encoding

CNN

CNN

Page 12: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

END-TO-END TRAINING

• Tagging Objective

• RNN Tagger

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

12

slot tag sequence contextual utterances & current utterance

ht-1 ht+1ht

V V V

W W W W

wt-1 wt+1wt

yt-1 yt+1yt

U U U

o

M M M

Automatically figure out the attention distribution without explicit supervision

Page 13: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

Outline

Introduction

Spoken Dialogue System

Spoken/Natural Language Understanding (SLU/NLU)

Contextual Spoken Language Understanding

Model Architecture

End-to-End Training

Experiments

Conclusion & Future Work

13

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

Page 14: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

EXPERIMENTS

• Dataset: Cortana communication session data– GRU for all RNN

– adam optimizer

– embedding dim=150

– hidden unit=100

– dropout=0.5

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

14

Model Training SetKnowledge Encoding

Sentence Encoder

First Turn Other Overall

RNN Taggersingle-turn x x 60.6 16.2 25.5

The model trained on single-turn data performs worse for non-first turns due to mismatched training data

Page 15: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

EXPERIMENTS

• Dataset: Cortana communication session data– GRU for all RNN

– adam optimizer

– embedding dim=150

– hidden unit=100

– dropout=0.5

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

15

Model Training SetKnowledge Encoding

Sentence Encoder

First Turn Other Overall

RNN Taggersingle-turn x x 60.6 16.2 25.5

multi-turn x x 55.9 45.7 47.4

Treating multi-turn data as single-turn for training performs reasonable

Page 16: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

EXPERIMENTS

• Dataset: Cortana communication session data– GRU for all RNN

– adam optimizer

– embedding dim=150

– hidden unit=100

– dropout=0.5

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

16

Model Training SetKnowledge Encoding

Sentence Encoder

First Turn Other Overall

RNN Taggersingle-turn x x 60.6 16.2 25.5

multi-turn x x 55.9 45.7 47.4

Encoder-Tagger

multi-turn current utt (c) RNN 57.6 56.0 56.3

multi-turn history + current (x, c) RNN 69.9 60.8 62.5

Encoding current and history utterances improves the performance but increases the training time

Page 17: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

EXPERIMENTS

• Dataset: Cortana communication session data– GRU for all RNN

– adam optimizer

– embedding dim=150

– hidden unit=100

– dropout=0.5

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

17

Model Training SetKnowledge Encoding

Sentence Encoder

First Turn Other Overall

RNN Taggersingle-turn x x 60.6 16.2 25.5

multi-turn x x 55.9 45.7 47.4

Encoder-Tagger

multi-turn current utt (c) RNN 57.6 56.0 56.3

multi-turn history + current (x, c) RNN 69.9 60.8 62.5Proposed multi-turn history + current (x, c) RNN 73.2 65.7 67.1

Applying memory networks significantly outperforms all approaches with much less training time

Page 18: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

EXPERIMENTS

• Dataset: Cortana communication session data– GRU for all RNN

– adam optimizer

– embedding dim=150

– hidden unit=100

– dropout=0.5

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

18

Model Training SetKnowledge Encoding

Sentence Encoder

First Turn Other Overall

RNN Taggersingle-turn x x 60.6 16.2 25.5

multi-turn x x 55.9 45.7 47.4

Encoder-Tagger

multi-turn current utt (c) RNN 57.6 56.0 56.3

multi-turn history + current (x, c) RNN 69.9 60.8 62.5

Proposedmulti-turn history + current (x, c) RNN 73.2 65.7 67.1

multi-turn history + current (x, c) CNN 73.8 66.5 68.0

CNN produces comparable results for sentence encoding with shorter training time

Page 19: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

Outline

Introduction

Spoken Dialogue System

Spoken/Natural Language Understanding (SLU/NLU)

Contextual Spoken Language Understanding

Model Architecture

End-to-End Training

Experiments

Conclusion & Future Work

19

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

Page 20: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

Conclusion

• The proposed end-to-end memory networks store

contextual knowledge, which can be exploited dynamically

based on an attention model for manipulating knowledge

carryover for multi-turn understanding

• The end-to-end model performs the tagging task instead of

classification

• The experiments show the feasibility and robustness of

modeling knowledge carryover through memory networks

20

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

Page 21: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

Future Work

• Leveraging not only local observation but also global

knowledge for better language understanding

– Syntax or semantics can serve as global knowledge to guide

the understanding model

– “Knowledge as a Teacher: Knowledge-Guided Structural

Attention Networks,” arXiv preprint arXiv: 1609.03286

21

End

-to-En

d M

emo

ry Netw

orks fo

r Mu

lti-Turn

Spo

ken Lan

guage U

nd

erstand

ing Yu

n-N

un

g (Vivian

) Ch

en

Page 22: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in

Q & AT H A N K S F O R Y O U R AT T E N T I O N !

22

The code will be available at https://github.com/yvchen/ContextualSLU