Abstract: Part I of this article explains the principles of natural language processing. Part II explains how chatbots work, their solutions and challenges, and walks through the model architecture of seq2seq with an attention mechanism. Part III shows how to train a chatbot from scratch.
Author/speaker: Li Jiaxuan, author of 《TensorFlow技术解析与实战》, founder of a TensorFlow technology community, active in major Chinese developer communities, and a programming-question answerer on Zhihu. Specializes in the architecture and source-code analysis of deep learning frameworks and their applications across domains, with hands-on deep learning experience in image processing, sentiment analysis of social text, and data mining; participated in a Hackathon on a deep-learning-based 2D perception system for autonomous driving, and formerly worked as an R&D engineer at Baidu. Currently researching NLP, chatbots, and the architecture design and implementation of cloud deep learning platforms.
I. Principles of Natural Language Processing
II. How Chatbots Work: Solutions and Challenges
- Answers are highly readable
- Answers are diverse
- When an irrelevant answer shows up, it is easy to analyze and locate the bug
- End-to-end training, relatively easy to implement
- Avoids maintaining a large Q-A dataset
- No need to tune each module separately, which avoids cascading errors between modules
- How to use information from the previous turns of a conversation in the current turn
- How to merge in content from existing knowledge bases
- Whether personalization is achievable, with tailored responses for every individual user
- Can hold a sustained conversation with a human
- Can give appropriate answers to different questions
- Accounts for individual differences and gives differentiated answers (for example, the same question should be answered slightly differently for men, women, the old, and the young)
- QA system: a domain bot that answers factoid questions (e.g., how tall is Mount Everest) as well as non-factoid questions (e.g., why, how, and other opinion questions)
- Dialog system: mostly goal-driven, though in recent years these have gradually been taking on chatbot capabilities as well
- Online customer service: e.g., Taobao's 小蜜 assistant, which in most cases behaves like an automated FAQ
- Answers to non-factoid questions
- Community QA systems (e.g., Baidu Zhidao), where questions and answers are strongly matched, and multiple answers to one question are scored and ranked
- Good QA corpora mined from online systems
- Similarity and relatedness are not the same thing. The various methods for computing similarity do not apply to relatedness; we need to build a dedicated set of methods for computing short-text relatedness.
- Some relatedness-computation methods carried over from early chatbot construction:
  - Word co-occurrence statistics
  - Relatedness computation based on machine translation
  - Similarity computation with topic models (LDA)
- Word2vec, GloVe (see the sketch after this list)
- CNN, LSTM, GRU
- Seq2Seq
- Attention mechanism
- Deep reinforcement learning
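To make the word-vector representation concrete, here is a minimal sketch: represent each short text as the average of its word vectors and compare texts by cosine. The tiny emb table and the tokens are made up for illustration; a real system would load pretrained Word2vec or GloVe vectors. Note that this measures similarity; the relatedness computation discussed above needs further modeling on top of such representations.

import numpy as np

# hypothetical pretrained word vectors (word -> 3-d vector); a real system
# would load Word2vec or GloVe embeddings instead
emb = {
    'how': np.array([0.1, 0.3, 0.5]),
    'tall': np.array([0.7, 0.2, 0.1]),
    'high': np.array([0.6, 0.3, 0.1]),
    'is': np.array([0.2, 0.2, 0.2]),
    'everest': np.array([0.9, 0.8, 0.1]),
}

def text_vector(tokens):
    # a short text as the average of its word vectors
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

q1 = text_vector(['how', 'tall', 'is', 'everest'])
q2 = text_vector(['how', 'high', 'is', 'everest'])
print(cosine(q1, q2))  # close to 1.0 for similar texts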
- Whether it is easy to respond to (a good generated reply should be one a person can easily follow up on)
- A good reply should move the conversation forward and avoid repetition
- Semantic coherence: how plausibly the query can be inferred back from the generated reply, which ensures the reply is semantically consistent and coherent
- Lack of public training datasets; foreign datasets are mostly used at present:
  - Ubuntu Dialogue Corpus (a subset of the Ubuntu Corpus)
  - Reddit dataset (good readability and quality)
  - Corpora from Chinese SNS (Weibo)
- Test sets are not yet standardized
- Evaluation metrics: good metrics are hard to design (the current ones, BLEU and ROUGE, are borrowed from machine translation and automatic summarization, but it is an open question whether they capture how good a chatbot is and whether they can steer chatbot technology in the right direction)
- How to switch smoothly between a chatbot's general chit-chat and task-oriented dialogue:
  - Smooth switching matters a great deal for the user experience
  - The switching technique relies on sentiment analysis and context analysis
  - The user should not have to give explicit feedback. For example, if in one sentence I say Lu Han is great, and in the next that I dislike the Korean style, the chatbot needs to recognize this correctly
- Remaining problems:
  - Sentence-level and fragment-level semantic modeling is not yet as good as word-level modeling (word embeddings)
  - Generative models still produce safe, bland answers
  - More work is needed on representing and incorporating knowledge
Related papers:
- Neural Responding Machine for Short-Text Conversation (2015-03)
- A Neural Conversational Model (2015-06)
- A Neural Network Approach to Context-Sensitive Generation of Conversational Responses (2015-06)
- The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems (2015-06)
- Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models (2015-07)
- A Diversity-Promoting Objective Function for Neural Conversation Models (2015-10)
- Attention with Intention for a Neural Network Conversation Model (2015-10)
- Improved Deep Learning Baselines for Ubuntu Corpus Dialogs (2015-10)
- A Survey of Available Corpora for Building Data-Driven Dialogue Systems (2015-12)
- Incorporating Copying Mechanism in Sequence-to-Sequence Learning (2016-03)
- A Persona-Based Neural Conversation Model (2016-03)
- How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation (2016-03)
III. Hands-On: Implementing an Intelligent Chatbot
1) Library imports and hyperparameter definitions
import numpy as np #matrix math
import tensorflow as tf #machine learning
import helpers #for formatting data into batches and generating random sequence data
tf.reset_default_graph() #Clears the default graph stack and resets the global default graph.
sess = tf.InteractiveSession()
PAD = 0
EOS = 1
vocab_size = 10
input_embedding_size = 20 #embedding dimension
encoder_hidden_units = 20 #number of hidden units per LSTM cell
decoder_hidden_units = encoder_hidden_units * 2 #doubled because the bidirectional encoder's states are concatenated
#input placeholders
encoder_inputs = tf.placeholder(shape=(None, None), dtype=tf.int32, name='encoder_inputs')
encoder_inputs_length = tf.placeholder(shape=(None,), dtype=tf.int32, name='encoder_inputs_length')
decoder_targets = tf.placeholder(shape=(None, None), dtype=tf.int32, name='decoder_targets')
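Both placeholders are fed time-major, i.e., with shape [max_time, batch_size] rather than [batch_size, max_time]; this matches the time_major=True flag passed to the RNN below. A tiny made-up example of the expected layout:

import numpy as np
# two sequences, [5, 6, 7] and [2, 3], padded with PAD = 0 and laid out
# time-major: shape (max_time, batch_size) = (3, 2)
example_encoder_inputs = np.array([[5, 2],
                                   [6, 3],
                                   [7, 0]])
example_lengths = np.array([3, 2])  # true lengths before padding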
2) Vector representation of the input text
embeddings = tf.Variable(tf.random_uniform([vocab_size, input_embedding_size], -1.0, 1.0), dtype=tf.float32)
#this thing could get huge in a real world application
encoder_inputs_embedded = tf.nn.embedding_lookup(embeddings, encoder_inputs)
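tf.nn.embedding_lookup simply gathers rows of the embedding matrix by id, much like fancy indexing in numpy. A tiny illustration with made-up values:

import numpy as np
emb_matrix = np.array([[0.0, 0.1, 0.2],   # row for id 0 (PAD)
                       [1.0, 1.1, 1.2],   # row for id 1 (EOS)
                       [2.0, 2.1, 2.2],
                       [3.0, 3.1, 3.2]])
ids = np.array([2, 0, 3])
print(emb_matrix[ids])  # rows 2, 0, 3 -- what embedding_lookup returns for these ids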
3)Encoder
from tensorflow.contrib.rnn import LSTMCell, LSTMStateTuple
encoder_cell = LSTMCell(encoder_hidden_units)
((encoder_fw_outputs,
encoder_bw_outputs),
(encoder_fw_final_state,
encoder_bw_final_state)) = (
tf.nn.bidirectional_dynamic_rnn(cell_fw=encoder_cell,
cell_bw=encoder_cell,
inputs=encoder_inputs_embedded,
sequence_length=encoder_inputs_length,
dtype=tf.float32, time_major=True)  # float32 to match the embeddings defined above
)
encoder_outputs = tf.concat((encoder_fw_outputs, encoder_bw_outputs), 2)
encoder_final_state_c = tf.concat(
(encoder_fw_final_state.c, encoder_bw_final_state.c), 1)
encoder_final_state_h = tf.concat(
(encoder_fw_final_state.h, encoder_bw_final_state.h), 1)
encoder_final_state = LSTMStateTuple(
c=encoder_final_state_c,
h=encoder_final_state_h
)
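Because the encoder runs in both directions, its outputs and final states are concatenations of the forward and backward halves; this is why decoder_hidden_units was set to twice encoder_hidden_units. A quick sanity check of the static shapes (given the hyperparameters above):

print(encoder_outputs)        # shape (?, ?, 40): [max_time, batch_size, 2 * encoder_hidden_units]
print(encoder_final_state.c)  # shape (?, 40): [batch_size, 2 * encoder_hidden_units]
print(encoder_final_state.h)  # shape (?, 40)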
4)Decoder
decoder_cell = LSTMCell(decoder_hidden_units)
encoder_max_time, batch_size = tf.unstack(tf.shape(encoder_inputs))
decoder_lengths = encoder_inputs_length + 3  # +1 for the EOS appended to targets, +2 extra PAD steps
#weights
W = tf.Variable(tf.random_uniform([decoder_hidden_units, vocab_size], -1, 1), dtype=tf.float32)
#bias
b = tf.Variable(tf.zeros([vocab_size]), dtype=tf.float32)
assert EOS == 1 and PAD == 0
eos_time_slice = tf.ones([batch_size], dtype=tf.int32, name='EOS')
pad_time_slice = tf.zeros([batch_size], dtype=tf.int32, name='PAD')
#retrieves rows of the params tensor. The behavior is similar to using indexing with arrays in numpy
eos_step_embedded = tf.nn.embedding_lookup(embeddings, eos_time_slice)
pad_step_embedded = tf.nn.embedding_lookup(embeddings, pad_time_slice)
def loop_fn_initial():
    initial_elements_finished = (0 >= decoder_lengths)  # all False at the initial step
    #start decoding from the end-of-sentence token
    initial_input = eos_step_embedded
    #initialize the decoder cell with the encoder's final state
    initial_cell_state = encoder_final_state
    initial_cell_output = None
    initial_loop_state = None  # we don't need to pass any additional information
    return (initial_elements_finished,
            initial_input,
            initial_cell_state,
            initial_cell_output,
            initial_loop_state)
def loop_fn_transition(time, previous_output, previous_state, previous_loop_state):
    def get_next_input():
        output_logits = tf.add(tf.matmul(previous_output, W), b)
        #tf.argmax returns the index with the largest value across axes of a tensor
        prediction = tf.argmax(output_logits, axis=1)
        #embed the prediction to use as the next input
        next_input = tf.nn.embedding_lookup(embeddings, prediction)
        return next_input
    elements_finished = (time >= decoder_lengths)  # boolean tensor of [batch_size]
                                                   # marking sequences that have ended
    #computes the "logical and" of elements across dimensions of a tensor
    finished = tf.reduce_all(elements_finished)  # -> boolean scalar
    #feed PAD once every sequence has finished, otherwise the embedded previous prediction
    next_input = tf.cond(finished, lambda: pad_step_embedded, get_next_input)
    #pass the previous state and output through unchanged
    state = previous_state
    output = previous_output
    loop_state = None
    return (elements_finished,
            next_input,
            state,
            output,
            loop_state)
def loop_fn(time, previous_output, previous_state, previous_loop_state):
    if previous_state is None:  # time == 0
        assert previous_output is None and previous_state is None
        return loop_fn_initial()
    else:
        return loop_fn_transition(time, previous_output, previous_state, previous_loop_state)
decoder_outputs_ta, decoder_final_state, _ = tf.nn.raw_rnn(decoder_cell, loop_fn)
decoder_outputs = decoder_outputs_ta.stack()
decoder_max_steps, decoder_batch_size, decoder_dim = tf.unstack(tf.shape(decoder_outputs))
#flattened output tensor
decoder_outputs_flat = tf.reshape(decoder_outputs, (-1, decoder_dim))
#project the flattened tensor through the output layer
decoder_logits_flat = tf.add(tf.matmul(decoder_outputs_flat, W), b)
#reshape logits back to [max_time, batch_size, vocab_size]
decoder_logits = tf.reshape(decoder_logits_flat, (decoder_max_steps, decoder_batch_size, vocab_size))
decoder_prediction = tf.argmax(decoder_logits, 2)
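Note that this decoder is a plain seq2seq decoder; the attention mechanism discussed in Part II is not wired in here. As a hedged sketch of what additive (Bahdanau-style) attention over encoder_outputs could look like (attn_size, W_enc, W_dec, v, and attention_context are illustrative names, not part of the original code), get_next_input could then concatenate this context vector with the embedded prediction:

attn_size = 20
W_enc = tf.Variable(tf.random_uniform([encoder_hidden_units * 2, attn_size], -1, 1), dtype=tf.float32)
W_dec = tf.Variable(tf.random_uniform([decoder_hidden_units, attn_size], -1, 1), dtype=tf.float32)
v = tf.Variable(tf.random_uniform([attn_size], -1, 1), dtype=tf.float32)
def attention_context(decoder_state_h):
    # score every encoder position against the current decoder state
    enc_proj = tf.tensordot(encoder_outputs, W_enc, axes=[[2], [0]])  # [max_time, batch, attn_size]
    dec_proj = tf.matmul(decoder_state_h, W_dec)                      # [batch, attn_size]
    scores = tf.reduce_sum(v * tf.tanh(enc_proj + dec_proj), axis=2)  # [max_time, batch]
    weights = tf.nn.softmax(scores, dim=0)                            # normalize over time
    # weighted sum of encoder outputs -> context vector, [batch, 2 * encoder_hidden_units]
    return tf.reduce_sum(tf.expand_dims(weights, 2) * encoder_outputs, axis=0)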
5)Optimizer
stepwise_cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
labels=tf.one_hot(decoder_targets, depth=vocab_size, dtype=tf.float32),
logits=decoder_logits,
)
#loss function
loss = tf.reduce_mean(stepwise_cross_entropy)
#train it
train_op = tf.train.AdamOptimizer().minimize(loss)
sess.run(tf.global_variables_initializer())
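One caveat: the loss above averages over PAD positions too. A common refinement (a sketch, not part of the original listing) is to mask padding out of the loss using decoder_lengths:

# tf.sequence_mask yields [batch_size, max_time]; transpose to time-major [max_time, batch_size]
mask = tf.transpose(tf.sequence_mask(decoder_lengths, decoder_max_steps, dtype=tf.float32))
masked_loss = tf.reduce_sum(stepwise_cross_entropy * mask) / tf.reduce_sum(mask)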
6)Training on the toy task
batch_size = 100
batches = helpers.random_sequences(length_from=3, length_to=8,
vocab_lower=2, vocab_upper=10,
batch_size=batch_size)
print('head of the batch:')
for seq in next(batches)[:10]:
    print(seq)
def next_feed():
    batch = next(batches)
    encoder_inputs_, encoder_input_lengths_ = helpers.batch(batch)
    decoder_targets_, _ = helpers.batch(
        [sequence + [EOS] + [PAD] * 2 for sequence in batch]
    )
    return {
        encoder_inputs: encoder_inputs_,
        encoder_inputs_length: encoder_input_lengths_,
        decoder_targets: decoder_targets_,
    }
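The helpers module is an external file from the tutorial this walkthrough is based on and is not reproduced in the article. Below is a minimal sketch of the two functions used here, under the assumption that they behave as the calls above suggest (the real file may differ), e.g. saved as helpers.py:

import random
import numpy as np
def batch(sequences):
    # pad to a common length and lay out time-major: [max_time, batch_size]
    lengths = [len(seq) for seq in sequences]
    inputs = np.zeros((max(lengths), len(sequences)), dtype=np.int32)  # PAD = 0
    for i, seq in enumerate(sequences):
        for t, element in enumerate(seq):
            inputs[t, i] = element
    return inputs, np.array(lengths, dtype=np.int32)
def random_sequences(length_from, length_to, vocab_lower, vocab_upper, batch_size):
    # endless generator of batches of random integer sequences;
    # vocab_upper is treated as exclusive so token ids stay < vocab_size
    while True:
        yield [
            [random.randint(vocab_lower, vocab_upper - 1)
             for _ in range(random.randint(length_from, length_to))]
            for _ in range(batch_size)
        ]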
loss_track = []
max_batches = 3001
batches_in_epoch = 1000
try:
    for batch in range(max_batches):
        fd = next_feed()
        _, l = sess.run([train_op, loss], fd)
        loss_track.append(l)
        if batch == 0 or batch % batches_in_epoch == 0:
            print('batch {}'.format(batch))
            print('  minibatch loss: {}'.format(sess.run(loss, fd)))
            predict_ = sess.run(decoder_prediction, fd)
            for i, (inp, pred) in enumerate(zip(fd[encoder_inputs].T, predict_.T)):
                print('  sample {}:'.format(i + 1))
                print('    input     > {}'.format(inp))
                print('    predicted > {}'.format(pred))
                if i >= 2:
                    break
            print()
except KeyboardInterrupt:
    print('training interrupted')
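After training, it helps to look at the loss curve; a small sketch using matplotlib (assumed installed; not part of the original listing):

import matplotlib.pyplot as plt
plt.plot(loss_track)
plt.xlabel('batch')
plt.ylabel('minibatch loss')
plt.show()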
GitChat is a new kind of interactive reading/writing product. A Chat consists of an article plus a dedicated online discussion between the article's readers and its author. This article comes from the Chat topic 《用TensorFlow实现机器人的原理及如何实现一个对话机器人》.