FastQA Study Notes | CinKate's Blogs

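Most current reading-comprehension systems are built top-down: they start from a fairly complex architecture (the classic stack is embedding, encoding, interaction, and answer layers) and then run ablation studies, removing modules one at a time to validate each idea. Most of the novelty sits in the interaction layer, as in BiDAF and R-Net, where a large body of work innovates on query-aware representations of the question-passage interaction. These designs mimic how people approach reading-comprehension questions, such as "rereading the passage several times" or "reading the passage with the question in mind", and the common "reading strategies" have all been implemented by now. The authors of this paper found that many seemingly complex questions can in fact be solved with a simple context/type matching heuristic. The procedure selects answer spans that satisfy the following conditions: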

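They match the answer type implied by the question, for example a "when" question is answered with a time;
They are positioned close to important question words;
In addition, whether a question word appears in the passage is added as an "important" feature;
Without any complex question-context interaction, the model achieves results close to the state of the art on the SQuAD leaderboard. Since this paper, later MRC work has also added such basic features to the embeddings and trained them jointly. Open-source link.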
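
The word-overlap feature in the list above can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' code: the function name `word_in_question`, the whitespace tokenization, and the lowercasing are all assumptions made for the example.

```python
def word_in_question(context_tokens, question_tokens):
    """Return one 0/1 flag per context token: 1 if the token also occurs
    in the question (compared case-insensitively), else 0."""
    question_vocab = {tok.lower() for tok in question_tokens}
    return [1 if tok.lower() in question_vocab else 0 for tok in context_tokens]


# Hypothetical sentence pair; splitting on whitespace is an assumption.
question = "When was the bridge built ?".split()
context = "The bridge was built in 1932 .".split()

flags = word_in_question(context, question)
print(list(zip(context, flags)))
```

A flag like this would typically be concatenated to each context word's embedding, so that the encoder sees the overlap information directly.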

Below are some notes from reading the source code:

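1. Using the Highway Network

The Highway Network mainly addresses the difficulty of training deep networks: as depth grows, the flow of gradient information back through the layers is impeded.

As a network gets deeper, its training error can actually rise; adding a Highway Network alleviates this. Deep networks are generally hard to train because gradients are blocked on their way back, so the shallow layers may never get adjusted. Inspired by the LSTM, the Highway Network adds a gating function so that the layer's output is made of two parts: the layer's input passed through directly, and a transformed version of that input.

In the network, this layer is placed right after the embedding layer.

Image source: https://www.cnblogs.com/jie-dcai/p/5803220.html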

import tensorflow as tf
from keras import backend as K
from keras.engine.topology import Layer
from keras.layers import Lambda, Wrapper

class Highway(Layer):
    def __init__(self, hidden_size, **kwargs):
        self.hidden_size = hidden_size
        super().__init__(**kwargs)

    def build(self, input_shape):
        self.projection = self.add_weight(name='projection',
                                          shape=(1, input_shape[-1], self.hidden_size),
                                          initializer='glorot_uniform')
        self.W_h = self.add_weight(name='W_h',
                                   shape=(1, self.hidden_size, self.hidden_size),
                                   initializer='glorot_uniform')
        self.b_h = self.add_weight(name='b_h',
                                   shape=(self.hidden_size,),
                                   initializer='zeros')
        self.W_t = self.add_weight(name='W_t',
                                   shape=(1, self.hidden_size, self.hidden_size),
                                   initializer='glorot_uniform')
        self.b_t = self.add_weight(name='b_t',
                                   shape=(self.hidden_size,),
                                   initializer='zeros')

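    def call(self, x):
        # A width-1 convolution acts as a per-timestep dense projection to hidden_size.
        x = K.conv1d(x, self.projection)
        # H: non-linear transform of the projected input.
        H = tf.nn.tanh(K.bias_add(K.conv1d(x, self.W_h), self.b_h))
        # T: sigmoid gate with values in (0, 1).
        T = tf.nn.sigmoid(K.bias_add(K.conv1d(x, self.W_t), self.b_t))
        # Here T weights the untransformed x and (1 - T) weights H.
        return T * x + (1 - T) * H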

    def compute_output_shape(self, input_shape):
        batch, seq_len, d = input_shape
        return (batch, seq_len, self.hidden_size)
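
To see what the gate does numerically, here is a small pure-Python sketch of the mixing step alone. It follows the same combination as the layer above (gate times input plus one-minus-gate times transform), but it leaves out the learned weights: the function name `highway_combine`, the input values, and the gate logits are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def highway_combine(x, h, gate_logits):
    """Mix input x and transformed input h element-wise:
    out = T * x + (1 - T) * h, with T = sigmoid(gate_logits)."""
    out = []
    for x_i, h_i, g_i in zip(x, h, gate_logits):
        t = sigmoid(g_i)
        out.append(t * x_i + (1.0 - t) * h_i)
    return out


x = [1.0, -2.0, 0.5]            # hypothetical projected input
h = [math.tanh(v) for v in x]   # stand-in for the transformed input H

print(highway_combine(x, h, [50.0, 50.0, 50.0]))     # gate near 1: output follows x
print(highway_combine(x, h, [-50.0, -50.0, -50.0]))  # gate near 0: output follows h
print(highway_combine(x, h, [0.0, 0.0, 0.0]))        # gate at 0.5: average of x and h
```

When the gate saturates toward passing the input through, the layer behaves almost like an identity mapping, which is what keeps gradients flowing in deep stacks.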

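2. tf.sequence_mask

This op is similar to one-hot encoding, but instead of an index you specify how many entries, counted from the start, are True. It returns a boolean tensor of True and False values.

import tensorflow as tf
sq_mask = tf.sequence_mask([1, 3, 2], 5)
with tf.Session() as sess:
    print(sess.run(sq_mask))

Output: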

[[True, False, False, False, False],
[True, True, True, False, False],
[True, True, False, False, False]]
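
The same behavior can be written out in plain Python, which makes the rule explicit: position j in row i is True exactly when j is smaller than the i-th length. This is only a sketch for 1-D length lists; the helper name `sequence_mask` mirrors the TensorFlow op, and the padded batch at the end is a made-up example.

```python
def sequence_mask(lengths, maxlen=None):
    """Pure-Python analogue of tf.sequence_mask for a 1-D list of lengths.
    If maxlen is omitted, the largest length is used."""
    if maxlen is None:
        maxlen = max(lengths)
    return [[position < length for position in range(maxlen)]
            for length in lengths]


mask = sequence_mask([1, 3, 2], 5)
for row in mask:
    print(row)

# Typical use: zero out padded positions of a batch of sequences.
padded = [[7, 0, 0, 0, 0], [4, 5, 6, 0, 0], [8, 9, 0, 0, 0]]  # hypothetical token ids
masked = [[value if keep else 0 for value, keep in zip(seq, row)]
          for seq, row in zip(padded, mask)]
```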

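3. tf.expand_dims()

In TensorFlow, to add a dimension you can use tf.expand_dims(input, dim, name=None). The more common tf.reshape(input, shape=[]) can achieve the same effect, but sometimes, while the graph is being built, the placeholder has not yet been fed a concrete value, and reshape then raises the following error: TypeError: Expected binary or unicode string, got 1

In that situation, expand_dims can be used to add the dimension instead. In my own code, for example, after reducing an image tensor to two dimensions for a specific operation, I needed to restore it to four dimensions, [batch, height, width, channels], adding one dimension at the front and one at the back. Using reshape there failed for the reason above.

The official example: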

# 't' is a tensor of shape [2]
shape(expand_dims(t, 0)) ==> [1, 2]
shape(expand_dims(t, 1)) ==> [2, 1]
shape(expand_dims(t, -1)) ==> [2, 1]

# 't2' is a tensor of shape [2, 3, 5]
shape(expand_dims(t2, 0)) ==> [1, 2, 3, 5]
shape(expand_dims(t2, 2)) ==> [2, 3, 1, 5]
shape(expand_dims(t2, 3)) ==> [2, 3, 5, 1]

Args: 
input: A Tensor. 
dim: A Tensor. Must be one of the following types: int32, int64. 0-D (scalar). Specifies the dimension index at which to expand the shape of input. 
name: A name for the operation (optional).

Returns: 
A Tensor. Has the same type as input. Contains the same data as input, but its shape has an additional dimension of size 1 added.
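
The shape rule in the examples above can be checked with a small shape-only sketch in plain Python. It does not touch any tensor data; the helper name `expand_dims_shape` is hypothetical, and the 28x28 image size is a made-up value.

```python
def expand_dims_shape(shape, axis):
    """Return the shape produced by inserting a size-1 dimension at `axis`.

    Negative axes count from the end, so -1 appends a trailing dimension.
    """
    rank = len(shape)
    if not -(rank + 1) <= axis <= rank:
        raise ValueError("axis %d is out of range for rank %d" % (axis, rank))
    if axis < 0:
        axis += rank + 1
    return shape[:axis] + [1] + shape[axis:]


print(expand_dims_shape([2], 0))
print(expand_dims_shape([2], -1))
print(expand_dims_shape([2, 3, 5], 2))

# Restoring a 2-D image to [batch, height, width, channels]:
# add one dimension at the front, then one at the back.
restored = expand_dims_shape(expand_dims_shape([28, 28], 0), -1)
print(restored)
```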

4. tf.tile()


tf.tile(  
    input,     # input tensor
    multiples,  # number of times to repeat along each dimension
    name=None  
)

import tensorflow as tf

a = tf.constant([[1, 2], [3, 4], [5, 6]], dtype=tf.float32)
a1 = tf.tile(a, [2, 3])
a2 = tf.tile(a, [1, 2])
with tf.Session() as sess:
    print(sess.run(a))
    print(sess.run(a1))
    print(sess.run(a2))

Output:

[[1. 2.]
 [3. 4.]
 [5. 6.]]
 
[[1. 2. 1. 2. 1. 2.]
 [3. 4. 3. 4. 3. 4.]
 [5. 6. 5. 6. 5. 6.]
 [1. 2. 1. 2. 1. 2.]
 [3. 4. 3. 4. 3. 4.]
 [5. 6. 5. 6. 5. 6.]]

[[1. 2. 1. 2.]
 [3. 4. 3. 4.]
 [5. 6. 5. 6.]]
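
For a 2-D input, the effect of `multiples` can be reproduced with nested lists: the second entry repeats each row's contents, and the first entry repeats the whole block of rows. The helper `tile_2d` below is a hypothetical sketch limited to the 2-D case.

```python
def tile_2d(matrix, multiples):
    """Pure-Python analogue of tf.tile for a 2-D nested list.

    multiples = [row_reps, col_reps]: each row is repeated col_reps times
    along its own axis, then the whole block of rows is repeated row_reps times.
    """
    row_reps, col_reps = multiples
    return [list(row) * col_reps
            for _ in range(row_reps)
            for row in matrix]


a = [[1, 2], [3, 4], [5, 6]]
print(tile_2d(a, [1, 2]))
print(tile_2d(a, [2, 3]))
```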

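5. tf.equal()

As the name suggests, tf.equal tests whether x and y are equal. It does not compare the two tensors as a whole; it compares them element by element, giving True where the elements are equal and False where they are not.

Because the comparison is element-wise, x and y must have the same shape, or shapes that can be broadcast against each other.

Example: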

import tensorflow as tf
a = [[1,2,3],[4,5,6]]
b = [[1,0,3],[1,5,1]]
with tf.Session() as sess:
    print(sess.run(tf.equal(a,b)))

Output:

[[ True False  True]
 [False  True False]]
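
The element-by-element comparison is easy to mirror with nested lists. The sketch below covers only the simplest case, two 2-D lists of identical shape, and the helper name `elementwise_equal` is hypothetical.

```python
def elementwise_equal(a, b):
    """Compare two same-shaped 2-D nested lists position by position."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("inputs must have the same shape in this sketch")
    return [[x == y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


a = [[1, 2, 3], [4, 5, 6]]
b = [[1, 0, 3], [1, 5, 1]]
result = elementwise_equal(a, b)
print(result)
```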

6. tf.reduce_any()

Computes the logical OR of the elements across the dimensions of a boolean tensor.

x = tf.constant([[True,  True], [False, False]])
with tf.Session() as sess:
    print(sess.run(tf.reduce_any(x)))  # True
    print(sess.run(tf.reduce_any(x, 0)))  # [True, True]
    print(sess.run(tf.reduce_any(x, 1)))  # [True, False]
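
The meaning of the axis argument is easiest to see on a small matrix: axis 0 collapses the rows (one result per column), axis 1 collapses the columns (one result per row), and no axis collapses everything. The following is a pure-Python sketch for 2-D nested lists only; the helper name `reduce_any` mirrors the TensorFlow op.

```python
def reduce_any(matrix, axis=None):
    """Logical OR over a 2-D nested list of booleans.

    axis=None reduces over all elements, axis=0 over rows (per column),
    axis=1 over columns (per row).
    """
    if axis is None:
        return any(any(row) for row in matrix)
    if axis == 0:
        return [any(column) for column in zip(*matrix)]
    if axis == 1:
        return [any(row) for row in matrix]
    raise ValueError("this sketch only supports axis None, 0 or 1")


x = [[True, True], [False, False]]
print(reduce_any(x))
print(reduce_any(x, 0))
print(reduce_any(x, 1))
```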

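7. tf.squeeze()

This function returns a tensor with all size-1 dimensions of the original input removed.
axis can be used to specify which size-1 dimensions to remove. Note that each specified dimension must actually have size 1, otherwise an error is raised.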

squeeze(
    input,
    axis=None,
    name=None,
    squeeze_dims=None
)

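Example:

# 't' is a tensor of shape [1, 2, 1, 3, 1, 1]
tf.shape(tf.squeeze(t))   # [2, 3]; by default all size-1 dimensions are removed

# 't' is a tensor of shape [1, 2, 1, 3, 1, 1]
tf.shape(tf.squeeze(t, [2, 4]))  # [1, 2, 3, 1]; indices start at zero, so only dimensions 2 and 4 are removed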
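
The two cases above, including the error for a dimension that is not of size 1, can be captured in a shape-only sketch. It works on shape lists rather than tensors, and the helper name `squeeze_shape` is hypothetical.

```python
def squeeze_shape(shape, axis=None):
    """Return the shape left after removing size-1 dimensions.

    With axis=None every size-1 dimension is dropped; otherwise only the
    listed dimensions are dropped, and each of them must have size 1.
    """
    rank = len(shape)
    if axis is None:
        return [dim for dim in shape if dim != 1]
    selected = set()
    for a in axis:
        if not -rank <= a < rank:
            raise ValueError("axis %d is out of range for rank %d" % (a, rank))
        a %= rank  # map negative indices to their positive position
        if shape[a] != 1:
            raise ValueError("cannot squeeze dimension %d of size %d" % (a, shape[a]))
        selected.add(a)
    return [dim for i, dim in enumerate(shape) if i not in selected]


t_shape = [1, 2, 1, 3, 1, 1]
print(squeeze_shape(t_shape))
print(squeeze_shape(t_shape, [2, 4]))
```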

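8. The RepeatVector layer

The RepeatVector layer repeats its input n times.

keras.layers.core.RepeatVector(n)

Arguments

n: integer, the number of repetitions

Input shape
2D tensor of shape (nb_samples, features)
Output shape
3D tensor of shape (nb_samples, n, features)

Example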

from keras.models import Sequential
from keras.layers import Dense, RepeatVector

model = Sequential()
model.add(Dense(32, input_dim=32))
# now: model.output_shape == (None, 32)
# note: `None` is the batch dimension

model.add(RepeatVector(3))
# now: model.output_shape == (None, 3, 32)
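
What the layer does to the data itself, as opposed to the shape, can be shown with nested lists. This is a pure-Python sketch with a made-up batch; the helper name `repeat_vector` is hypothetical.

```python
def repeat_vector(batch, n):
    """Repeat each sample's feature vector n times:
    (samples, features) -> (samples, n, features)."""
    return [[list(features) for _ in range(n)] for features in batch]


batch = [[1, 2, 3], [4, 5, 6]]       # shape (2, 3)
repeated = repeat_vector(batch, 2)   # shape (2, 2, 3)
print(repeated)
```

Each sample keeps its own vector; only a new middle axis of length n is introduced, which is handy when one vector per sample has to be combined with a sequence of n steps.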

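9. tf.gather()

Works like array indexing: it picks out the entries at the given indices and returns them as a new tensor, which is useful when the indices to extract are not contiguous. tf.gather indexes along a single axis (the first by default), so on a 2-D tensor it selects whole rows, as the example shows.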

import tensorflow as tf 
 
a = tf.Variable([[1,2,3,4,5], [6,7,8,9,10], [11,12,13,14,15]])
index_a = tf.Variable([0,2])
 
b = tf.Variable([1,2,3,4,5,6,7,8,9,10])
index_b = tf.Variable([2,4,6,8])
 
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(tf.gather(a, index_a)))
    print(sess.run(tf.gather(b, index_b)))
 
#  [[ 1  2  3  4  5]
#   [11 12 13 14 15]]
 
#  [3 5 7 9]
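
The indexing rule behind these results can be written in plain Python. The sketch below handles nested lists of at most two dimensions and only axes 0 and 1; the helper name `gather` mirrors the TensorFlow op, whose `axis` argument it imitates.

```python
def gather(params, indices, axis=0):
    """Pick entries of a list (or rows/columns of a 2-D nested list) by index."""
    if axis == 0:
        return [params[i] for i in indices]
    if axis == 1:
        return [[row[i] for i in indices] for row in params]
    raise ValueError("this sketch only supports axis 0 or 1")


a = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(gather(a, [0, 2]))          # rows 0 and 2
print(gather(b, [2, 4, 6, 8]))    # entries at non-contiguous positions
print(gather(a, [0, 2], axis=1))  # columns 0 and 2 of every row
```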

tf.gather_nd
Same as above, but it allows indexing across multiple dimensions. The example shows only one very simple usage: