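I am trying to write a Keras model (using the Tensorflow backend) that uses an LSTM to predict labels for sequences, as you would in a part-of-speech tagging task. The model I have written returns nan as the loss for all training epochs and for all label predictions. I suspect I have configured my model incorrectly, but I can't work out what I'm doing wrong.
The complete program follows.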
from random import shuffle, sample
from typing import Tuple, Callable

from numpy import arange, zeros, array, argmax, newaxis


def sequence_to_sequence_model(time_steps: int, labels: int, units: int = 16):
    from keras import Sequential
    from keras.layers import LSTM, TimeDistributed, Dense

    model = Sequential()
    model.add(LSTM(units=units, input_shape=(time_steps, 1), return_sequences=True))
    model.add(TimeDistributed(Dense(labels)))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model


def labeled_sequences(n: int, sequence_sampler: Callable[[], Tuple[array, array]]) -> Tuple[array, array]:
    """
    Create training data for a sequence-to-sequence labeling model.
    The features are an array of size samples * time steps * 1.
    The labels are a one-hot encoding of time step labels of size samples * time steps * number of labels.
    :param n: number of sequence pairs to generate
    :param sequence_sampler: a function that returns two numeric sequences of equal length
    :return: feature and label sequences
    """
    from keras.utils import to_categorical

    xs, ys = sequence_sampler()
    assert len(xs) == len(ys)
    x = zeros((n, len(xs)), int)
    y = zeros((n, len(ys)), int)
    for i in range(n):
        xs, ys = sequence_sampler()
        x[i] = xs
        y[i] = ys
    x = x[:, :, newaxis]
    y = to_categorical(y)
    return x, y


def digits_with_repetition_labels() -> Tuple[array, array]:
    """
    Return a random list of 10 digits from 0 to 9. Two of the digits will be repeated. The rest will be unique.
    Along with this list, return a list of 10 labels, where the label is 0 if the corresponding digit is unique and 1
    if it is repeated.
    :return: digits and labels
    """
    n = 10
    xs = arange(n)
    ys = zeros(n, int)
    shuffle(xs)
    i, j = sample(range(n), 2)
    xs[j] = xs[i]
    ys[i] = ys[j] = 1
    return xs, ys


def main():
    # Train
    x, y = labeled_sequences(1000, digits_with_repetition_labels)
    model = sequence_to_sequence_model(x.shape[1], y.shape[2])
    model.summary()
    model.fit(x, y, epochs=20, verbose=2)
    # Test
    x, y = labeled_sequences(5, digits_with_repetition_labels)
    y_ = model.predict(x, verbose=0)
    x = x[:, :, 0]
    for i in range(x.shape[0]):
        print(' '.join(str(n) for n in x[i]))
        print(' '.join([' ', '*'][int(argmax(n))] for n in y[i]))
        print(y_[i])


if __name__ == '__main__':
    main()
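My feature sequences are arrays of 10 digits from 0 to 9. My corresponding label sequences are arrays of 10 zeros and ones, where a zero indicates a unique digit and a one indicates a repeated digit. (The idea is to create a simple classification task that incorporates long-distance dependencies.)
Training looks like this: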
Epoch 1/20
- 1s - loss: nan
Epoch 2/20
- 0s - loss: nan
Epoch 3/20
- 0s - loss: nan
All the label array predictions look like this:
[[nan nan]
[nan nan]
[nan nan]
[nan nan]
[nan nan]
[nan nan]
[nan nan]
[nan nan]
[nan nan]
[nan nan]]
So clearly something is wrong.
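The feature matrix passed to model.fit has dimensions samples × time steps × 1. The label matrix has dimensions samples × time steps × 2, where the 2 comes from the one-hot encoding of the labels 0 and 1.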
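As a quick shape check, the layout described above can be reproduced with plain NumPy, without Keras. This is only a sketch: the sizes and the random values are placeholders chosen for illustration (they are not the repetition labels from the program), and the one-hot encoding is done by indexing an identity matrix rather than by calling to_categorical.

```python
import numpy as np

# Hypothetical sizes for illustration only.
samples, time_steps, num_labels = 4, 10, 2

rng = np.random.default_rng(0)
digits = rng.integers(0, 10, size=(samples, time_steps))          # placeholder features
labels = rng.integers(0, num_labels, size=(samples, time_steps))  # placeholder 0/1 labels

# Features: add a trailing axis so each time step holds one value.
x = digits[:, :, np.newaxis]

# Labels: one-hot encode by picking rows of the identity matrix.
y = np.eye(num_labels, dtype=int)[labels]

print(x.shape)  # samples x time steps x 1
print(y.shape)  # samples x time steps x number of labels
```

Each time step of y holds exactly one 1, and taking the argmax over the last axis recovers the integer labels.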
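I am using a time-distributed dense layer to predict sequences, following the Keras documentation and posts like this and this. As far as I can tell, the model topology defined in sequence_to_sequence_model above is correct. The model summary looks like this: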
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_1 (LSTM)                (None, 10, 16)            1152
_________________________________________________________________
time_distributed_1 (TimeDist (None, 10, 2)             34
=================================================================
Total params: 1,186
Trainable params: 1,186
Non-trainable params: 0
_________________________________________________________________
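Stack Overflow questions like this one make it sound as if nan results are a sign of numerical problems: runaway gradients and so on. However, since I am working with a tiny data set and every number that comes back from my model is nan, I suspect I am not seeing a numerical problem but rather a problem with how I have constructed the model.
Does the code above have the right model and data shapes for sequence-to-sequence learning? If so, why am I getting nans everywhere?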
By default, the Dense layer has no activation. If you specify one, the nan goes away. Change the following line in the code above.
model.add(TimeDistributed(Dense(labels, activation='softmax')))
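To see why the activation matters, here is a minimal NumPy sketch of a textbook cross-entropy, not Keras's actual implementation (which rescales and clips its inputs internally, so the exact failure path there may differ). The loss takes a logarithm of the predictions, so it expects probabilities; a linear layer can emit negative values, and the logarithm of a negative number is nan. The arrays `logits` and `y_true` are made-up values for illustration.

```python
import numpy as np

def softmax(z):
    # Subtract the row maximum for numerical stability before exponentiating.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(y_true, y_pred):
    # Textbook definition: -sum(y * log(p)) over the label axis.
    # Warnings are silenced so the nan shows up in the result instead.
    with np.errstate(invalid='ignore', divide='ignore'):
        return -(y_true * np.log(y_pred)).sum(axis=-1)

# Hypothetical raw outputs of a Dense layer with no activation.
logits = np.array([[-0.3, 0.2],
                   [0.1, -0.4]])
y_true = np.array([[1.0, 0.0],
                   [0.0, 1.0]])

raw_loss = cross_entropy(y_true, logits)            # log of a negative value
soft_loss = cross_entropy(y_true, softmax(logits))  # log of valid probabilities

print(raw_loss)
print(soft_loss)
```

With softmax applied, every row sums to 1 and every entry is strictly positive, so the logarithm is always defined.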
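If the model weights and the loss quickly become NaN, that is a sign of exploding gradients. I would add batch normalization after the LSTM layer and check whether it helps.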
from keras.layers import BatchNormalization
# [...]
model.add(LSTM(units=units, input_shape=(time_steps, 1), return_sequences=True))
model.add(BatchNormalization())
For me (on a classification problem), batch normalization solved this issue.
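First, check the predictions before training. If the model already gives you NaN at that point, there may also be a problem in your data:
- Check the dtype, and try double precision (i.e. tf.float64).
Otherwise, you can:
- Add layer normalization after the LSTM.
- Clip the gradients: model.compile(…, optimizer=Adam(clipnorm=1.0)). A norm of 1 is usually a good default.
- Try something as simple as SGD as the optimizer.
- Keep in mind that categorical_crossentropy does not handle 3D tensors well.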
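The idea behind clipping by norm can be sketched in a few lines of NumPy. This is an illustration of the technique only, not the optimizer's own code; the helper name `clip_by_norm` and the example gradients are made up for this sketch. A gradient whose norm exceeds the threshold is rescaled to have exactly that norm, and smaller gradients pass through unchanged.

```python
import numpy as np

def clip_by_norm(grad, max_norm=1.0):
    # Rescale the gradient only if its L2 norm exceeds max_norm;
    # the direction is preserved, only the length changes.
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

big = np.array([3.0, 4.0])    # L2 norm 5, will be scaled down
small = np.array([0.3, 0.4])  # L2 norm 0.5, left untouched

clipped_big = clip_by_norm(big)
clipped_small = clip_by_norm(small)

print(clipped_big, np.linalg.norm(clipped_big))
print(clipped_small, np.linalg.norm(clipped_small))
```

Because only the length of the update is capped, a single very large gradient can no longer push the weights to values where the loss overflows.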