Search
- Refined by:
- Date: 2018
21 - 30 of 243 search results for KaKaoTalk:po03 op, where 0 match all words and 243 match some words.
Results that match 1 of 2 words
- A Network-based End-to-End Trainable Task-oriented Dialogue System…
  mi.eng.cam.ac.uk/~sjy/papers/wgmv17.pdf
  20 Feb 2018: On-line active reward learning for policy optimisation in spoken dialogue systems.
- Semantically Conditioned LSTM-based Natural Language Generation…
  mi.eng.cam.ac.uk/~sjy/papers/wgms15.pdf
  20 Feb 2018: Recent work by Graves et al. (2014) has demonstrated that an NN structure augmented with a carefully designed memory block and differentiable read/write operations can learn to mimic computer programs. Moreover, the
- Multi-domain Neural Network Language Generation for Spoken Dialogue…
  mi.eng.cam.ac.uk/~sjy/papers/wgmr16.pdf
  20 Feb 2018: By optimising directly against the desired objective function such as BLEU score (Auli and Gao, 2014) or Word Error Rate (Kuo et al., 2002), the model can explore its output space
- Training a real-world POMDP-based Dialogue System Blaise Thomson,…
  mi.eng.cam.ac.uk/~sjy/papers/tswy07.pdf
  20 Feb 2018: Hence defining an optimal summary policy is not so obvious. If f is chosen well, however, then one could hope that the optimal action is dependent only on f(b).
- Reward Estimation for Dialogue Policy Optimisation Pei-Hao Su, Milica …
  mi.eng.cam.ac.uk/~sjy/papers/sugy18.pdf
  20 Feb 2018: Note that the reward model and the dialogue policy are being jointly optimised during the sequence of dialogues.
- On-line Active Reward Learning for Policy Optimisation in Spoken…
  mi.eng.cam.ac.uk/~sjy/papers/sgmb16.pdf
  20 Feb 2018: This Gaussian process operates on a continuous space dialogue representation generated in an unsupervised fashion using a recurrent neural network encoder-decoder.
- IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. ...
  mi.eng.cam.ac.uk/~sjy/papers/scyo09.pdf
  20 Feb 2018: increases. Pop operations are then performed where possible, the tree is pruned and identical nodes are joined so that the number stays constant or decreases. ... Error bars indicate 99% confidence intervals. This demonstrates the competitiveness of the
- Sample-efficient Actor-Critic Reinforcement Learning with Supervised…
  mi.eng.cam.ac.uk/~sjy/papers/sbug17.pdf
  20 Feb 2018: A comparison between the three options is included in the experimental evaluation. ... whilst suffering initially. We hypothesise that the optimised SL pre-trained parameters distributed very differently to the optimal A2C ER parameters.
- k-Nearest Neighbor Monte-Carlo Control Algorithm for POMDP-based…
  mi.eng.cam.ac.uk/~sjy/papers/lgjk09.pdf
  20 Feb 2018: In Section 3, the grid-based approach to policy optimisation is introduced followed by a presentation of the k-nn Monte-Carlo policy optimization in Section 4, along with an ... 5 Conclusion In this paper, an extension to a grid-based policy
- crosseval_diff-reward2b.ps
  mi.eng.cam.ac.uk/~sjy/papers/kgjm10.pdf
  20 Feb 2018: The options for each random decision point are reasonable in the context in which it is encountered, but a uniform distribution of outcomes might not reflect real user behaviour. ... Many of the decisions involved are deterministic, allowing only one