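Asked by: 小点点

Stateful LSTM and stream predictions


I've trained an LSTM model (built with Keras and TF) on multiple batches of 7 samples with 3 features each, shaped like the sample below (the numbers are just placeholders for the purpose of explanation). Each batch is labeled 0 or 1:

Data: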

[
   [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
   [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
   [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
   ...
]

i.e. a batch of m sequences, each of length 7, whose elements are 3-dimensional vectors (so the batch has shape (m, 7, 3)).
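As a small illustration of that layout, here is a NumPy sketch that builds a dummy batch with the same placeholder values. The batch size `m = 4` and the label values are assumptions made only for this example.

```python
import numpy as np

m = 4                        # hypothetical number of sequences in the batch
seq_len, n_features = 7, 3

# Every timestep is the placeholder vector [1, 2, 3], as in the listing above.
data = np.tile(np.array([1, 2, 3], dtype="float32"), (m, seq_len, 1))

# One 0/1 label per sequence (values chosen arbitrarily for the example).
targets = np.array([[1], [0], [1], [0]], dtype="float32")

print(data.shape)     # (4, 7, 3)
print(targets.shape)  # (4, 1)
```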

Target:

[
   [1]
   [0]
   [1]
   ...
]

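In my real-life production data I have a stream of samples with 3 features ([1,2,3],[1,2,3]...). I would like to stream each sample to my model as it arrives and get the intermediate probability without waiting for the entire batch (7).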

One of my thoughts was to pad the batch with 0 for the missing samples, [[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[1,2,3]], but that seems inefficient.
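For concreteness, the padding idea could look like the sketch below. The helper `left_pad` is hypothetical (it is not part of Keras), and it assumes that at most 7 samples have been collected. Every newly arrived sample would require re-running all 7 timesteps through the model, which is where the inefficiency comes from.

```python
import numpy as np

SEQ_LEN, N_FEATURES = 7, 3

def left_pad(partial, seq_len=SEQ_LEN, n_features=N_FEATURES):
    """Zero-pad the samples seen so far on the left, up to seq_len timesteps."""
    partial = np.asarray(partial, dtype="float32").reshape(-1, n_features)
    padded = np.zeros((seq_len, n_features), dtype="float32")
    if len(partial):
        padded[-len(partial):] = partial   # newest samples go at the end
    return padded[np.newaxis]              # add the batch axis: (1, seq_len, n_features)

window = left_pad([[1, 2, 3]])
print(window.shape)  # (1, 7, 3)
```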

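I would appreciate any help that points me in the right direction on both saving the LSTM intermediate state in a persistent way while waiting for the next sample, and predicting with partial data on a model trained on a specific batch size.

Update, including model code: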

    opt = optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=10e-8, decay=0.001)
    model = Sequential()

    num_features = data.shape[2]
    num_samples = data.shape[1]

    first_lstm = LSTM(32, batch_input_shape=(None, num_samples, num_features), 
                      return_sequences=True, activation='tanh')
    model.add(first_lstm)
    model.add(LeakyReLU())
    model.add(Dropout(0.2))
    model.add(LSTM(16, return_sequences=True, activation='tanh'))
    model.add(Dropout(0.2))
    model.add(LeakyReLU())
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer=opt,
                  metrics=['accuracy', keras_metrics.precision(), 
                           keras_metrics.recall(), f1])

Model summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 100, 32)           6272      
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 100, 32)           0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 100, 32)           0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 100, 16)           3136      
_________________________________________________________________
dropout_2 (Dropout)          (None, 100, 16)           0         
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 100, 16)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 1600)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 1601      
=================================================================
Total params: 11,009
Trainable params: 11,009
Non-trainable params: 0
_________________________________________________________________

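3 Answers

Anonymous user

I think there might be an easier solution.

If your model doesn't have convolutional layers or any other layers that act upon the length/steps dimension, you can simply mark it as stateful=True.

The Flatten layer transforms the length dimension into a feature dimension. This will completely prevent you from achieving your goal: if the Flatten layer expects 7 steps, you will always need 7 steps.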
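The summary posted in the question makes this dependency visible. As a rough sketch (the two helper functions are written for this example, and the 16 input features are inferred from the 6,272 parameters of lstm_1 rather than stated anywhere), the parameter counts can be reproduced by hand. The LSTM counts never involve the number of steps, while the Dense layer that follows Flatten does.

```python
def lstm_params(units, input_dim):
    # 4 gates, each with an input kernel, a recurrent kernel and a bias
    return 4 * (units * input_dim + units * units + units)

def dense_params(inputs, outputs):
    return inputs * outputs + outputs

steps = 100  # number of timesteps shown in the posted summary

print(lstm_params(32, 16))          # 6272, matches lstm_1 if it receives 16 features
print(lstm_params(16, 32))          # 3136, matches lstm_2
print(dense_params(steps * 16, 1))  # 1601, matches dense_1
print(dense_params(7 * 16, 1))      # 113, what the same Dense would need for 7 steps
```

Because the Dense kernel is sized by `steps * 16`, weights trained at one sequence length cannot be reused at another as long as Flatten is in the model.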

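So, before applying my answer below, fix your model so that it does not use the Flatten layer. Instead, just remove return_sequences=True from the last LSTM layer.

The following code fixes that and also prepares a few things to be used with the answer below: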

def createModel(forTraining):

    #model for training, stateful=False, any batch size
    if forTraining == True:
        batchSize = None
        stateful = False

    #model for predicting, stateful=True, fixed batch size
    else:
        batchSize = 1
        stateful = True

    model = Sequential()

    #the steps dimension is None so the same layers accept 7 steps
    #when training and a single step when predicting
    first_lstm = LSTM(32,
        batch_input_shape=(batchSize, None, num_features),
        return_sequences=True, activation='tanh',
        stateful=stateful)

    model.add(first_lstm)
    model.add(LeakyReLU())
    model.add(Dropout(0.2))

    #this is the last LSTM layer, use return_sequences=False
    model.add(LSTM(16, return_sequences=False, stateful=stateful,  activation='tanh'))

    model.add(Dropout(0.2))
    model.add(LeakyReLU())

    #don't add a Flatten!!!
    #model.add(Flatten())

    model.add(Dense(1, activation='sigmoid'))

    if forTraining == True:
        compileThisModel(model)

    #return the model so the caller can train it or copy weights into it
    return model

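With this, you will be able to train with 7 steps and predict with one step. Otherwise it will not be possible.

First, train this new model again, because it has no Flatten layer: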

trainingModel = createModel(forTraining=True)
trainThisModel(trainingModel)

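Now, with this trained model, you can simply create a new model in exactly the same way you created the trained model, but with stateful=True in all of its LSTM layers. We then copy the weights over from the trained model.

Since these new layers need a fixed batch size (Keras' rules), I assumed it would be 1 (one single stream is coming in, not m streams) and added it to the model creation above.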

predictingModel = createModel(forTraining=False)
predictingModel.set_weights(trainingModel.get_weights())

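And voilà. Just predict the outputs of the model with a single step:

#pseudo loop: sampleStream is a placeholder for samples arriving one at a time
for sample in sampleStream:
    prob = predictingModel.predict_on_batch(sample)

    #where sample.shape == (1, 1, 3)

When you decide that you have reached the end of what you consider a continuous sequence, call predictingModel.reset_states() so you can safely start a new sequence without the model treating it as a continuation of the previous one.

To save and load the states, just get and set them, saving with h5py: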

def saveStates(model, saveName):

    f = h5py.File(saveName,'w')

    for l, lay in enumerate(model.layers):
        #if you have nested models, 
            #consider making this recurrent testing for layers in layers
        if isinstance(lay,RNN):
            for s, stat in enumerate(lay.states):
                f.create_dataset('states_' + str(l) + '_' + str(s),
                                 data=K.eval(stat), 
                                 dtype=K.dtype(stat))

    f.close()


def loadStates(model, saveName):

    f = h5py.File(saveName, 'r')
    allStates = list(f.keys())

    for stateKey in allStates:
        name, layer, state = stateKey.split('_')
        layer = int(layer)
        state = int(state)

        K.set_value(model.layers[layer].states[state], f.get(stateKey))

    f.close()

A working test for saving and loading the states:

import h5py, numpy as np
from keras.layers import RNN, LSTM, Dense, Input
from keras.models import Model
import keras.backend as K




def createModel():
    inp = Input(batch_shape=(1,None,3))
    out = LSTM(5,return_sequences=True, stateful=True)(inp)
    out = LSTM(2, stateful=True)(out)
    out = Dense(1)(out)
    model = Model(inp,out)
    return model


def saveStates(model, saveName):

    f = h5py.File(saveName,'w')

    for l, lay in enumerate(model.layers):
        #if you have nested models, consider making this recurrent testing for layers in layers
        if isinstance(lay,RNN):
            for s, stat in enumerate(lay.states):
                f.create_dataset('states_' + str(l) + '_' + str(s), data=K.eval(stat), dtype=K.dtype(stat))

    f.close()


def loadStates(model, saveName):

    f = h5py.File(saveName, 'r')
    allStates = list(f.keys())

    for stateKey in allStates:
        name, layer, state = stateKey.split('_')
        layer = int(layer)
        state = int(state)

        K.set_value(model.layers[layer].states[state], f.get(stateKey))

    f.close()

def printStates(model):

    for l in model.layers:
        #if you have nested models, consider making this recurrent testing for layers in layers
        if isinstance(l,RNN):
            for s in l.states:
                print(K.eval(s))   

model1 = createModel()
model2 = createModel()
model1.predict_on_batch(np.ones((1,5,3))) #changes model 1 states

print('model1')
printStates(model1)
print('model2')
printStates(model2)

saveStates(model1,'testStates5')
loadStates(model2,'testStates5')

print('model1')
printStates(model1)
print('model2')
printStates(model2)

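In your first model (if it is stateful=False), each of the m sequences is considered independent and not connected to the others. The model also considers that each batch contains unique sequences.

If this is not the case, you might want to train the stateful model instead (considering that each sequence is actually connected to the previous sequence). You would then need m batches of 1 sequence.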
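A minimal sketch of what "m batches of 1 sequence" means for the data layout, using NumPy only. The sizes and the dummy values are assumptions for illustration.

```python
import numpy as np

m, seq_len, n_features = 5, 7, 3
data = np.arange(m * seq_len * n_features, dtype="float32").reshape(m, seq_len, n_features)

# One batch per sequence, kept in chronological order, so that a stateful layer
# with batch size 1 carries its state from sequence k into sequence k + 1.
batches = [data[k:k + 1] for k in range(m)]

print(len(batches), batches[0].shape)  # 5 (1, 7, 3)
```

When training this way with Keras' `fit`, the usual precautions would be `batch_size=1` and `shuffle=False`, so that the order of the sequences is preserved.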

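Anonymous user

If I understood correctly, you have batches of m sequences, each of length 7, whose elements are 3-dimensional vectors (so a batch has shape (m, 7, 3)). In any Keras RNN you can set the return_sequences flag to True to obtain the intermediate states, i.e. for every batch, instead of the final prediction only, you get the corresponding 7 outputs, where output i represents the prediction at stage i given all inputs from 0 to i.

But you would still get them all at once at the end. As far as I know, Keras doesn't provide a direct interface for retrieving the throughput while the batch is being processed. This may be even more constrained if you are using any of the CUDNN-optimized variants. What you can do is regard your batch as 7 successive batches of shape (m, 1, 3) and feed them progressively to your LSTM, recording the hidden state and the prediction at each step. For that, you can either set return_state to True and do it manually, or simply set stateful to True and let the object keep track of it.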
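The re-slicing described here can be sketched with NumPy alone. The batch size `m = 4` is an assumption for the example, and no model is involved.

```python
import numpy as np

m, seq_len, n_features = 4, 7, 3
batch = np.tile(np.array([1, 2, 3], dtype="float32"), (m, seq_len, 1))  # (m, 7, 3)

# Slice with t:t + 1 (not t) so that the time axis is kept and each piece stays 3-D.
steps = [batch[:, t:t + 1, :] for t in range(seq_len)]

print(len(steps), steps[0].shape)  # 7 (4, 1, 3)

# Concatenating the pieces along the time axis gives back the original batch.
print(np.array_equal(np.concatenate(steps, axis=1), batch))  # True
```

Each element of `steps` is what would be fed to the recurrent model at one point in time, with the state carried over between calls.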

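The following Python 2 + Keras example should represent exactly what you want. Specifically:

  • saving the whole LSTM intermediate state in a persistent way
  • while waiting for the next sample
  • and predicting on a model trained on a specific batch size, which may be arbitrary and unknown

For that, it includes an example of stateful=True for the easiest training and of return_state=True for the most precise inference, so you get a flavor of both approaches. It also assumes that you are given a model that has already been serialized and about which you don't know much. The structure is closely related to the one in Andrew Ng's course, and he is definitely more authoritative than me on the topic. Since you don't specify how the model was trained, I assumed a many-to-one training setup, but this is easy to adapt.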

from __future__ import print_function
from keras.layers import Input, LSTM, Dense
from keras.models import Model, load_model
from keras.optimizers import Adam
import numpy as np

# globals
SEQ_LEN = 7
HID_DIMS = 32
OUTPUT_DIMS = 3 # each output is a 3-dimensional vector, matching the dummy targets below


##############################################################################
# define the model to be trained on a fixed batch size:
# assume many-to-one training setup (otherwise set return_sequences=True)
TRAIN_BATCH_SIZE = 20

x_in = Input(batch_shape=[TRAIN_BATCH_SIZE, SEQ_LEN, 3])
lstm = LSTM(HID_DIMS, activation="tanh", return_sequences=False, stateful=True)
dense = Dense(OUTPUT_DIMS, activation='linear')
m_train = Model(inputs=x_in, outputs=dense(lstm(x_in)))
m_train.summary()

# a dummy batch of training data of shape (TRAIN_BATCH_SIZE, SEQ_LEN, 3), with targets of shape (TRAIN_BATCH_SIZE, 3):
batch123 = np.repeat([[1, 2, 3]], SEQ_LEN, axis=0).reshape(1, SEQ_LEN, 3).repeat(TRAIN_BATCH_SIZE, axis=0)
targets = np.repeat([[123,234,345]], TRAIN_BATCH_SIZE, axis=0) # dummy [[1,2,3],,,]-> [123,234,345] mapping to be learned


# train the model on a fixed batch size and save it
print(">> INFERENCE BEFORE TRAINING MODEL:", m_train.predict(batch123, batch_size=TRAIN_BATCH_SIZE, verbose=0))
m_train.compile(optimizer=Adam(lr=0.5), loss='mean_squared_error', metrics=['mae'])
m_train.fit(batch123, targets, epochs=100, batch_size=TRAIN_BATCH_SIZE)
m_train.save("trained_lstm.h5")
print(">> INFERENCE AFTER TRAINING MODEL:", m_train.predict(batch123, batch_size=TRAIN_BATCH_SIZE, verbose=0))


##############################################################################
# Now, although we aren't training anymore, we want to do step-wise predictions
# that do alter the inner state of the model, and keep track of that.


m_trained = load_model("trained_lstm.h5")
print(">> INFERENCE AFTER RELOADING TRAINED MODEL:", m_trained.predict(batch123, batch_size=TRAIN_BATCH_SIZE, verbose=0))

# now define an analogous model that allows a flexible batch size for inference:
x_in = Input(shape=[SEQ_LEN, 3])
h_in = Input(shape=[HID_DIMS])
c_in = Input(shape=[HID_DIMS])
pred_lstm = LSTM(HID_DIMS, activation="tanh", return_sequences=False, return_state=True, name="lstm_infer")
h, cc, c = pred_lstm(x_in, initial_state=[h_in, c_in])
prediction = Dense(OUTPUT_DIMS, activation='linear', name="dense_infer")(h)
m_inference = Model(inputs=[x_in, h_in, c_in], outputs=[prediction, h,cc,c])

#  Let's confirm that this model is able to load the trained parameters:
# first, check that the performance from scratch is not good:
print(">> INFERENCE BEFORE SWAPPING MODEL:")
predictions, hs, zs, cs = m_inference.predict([batch123,
                                               np.zeros((TRAIN_BATCH_SIZE, HID_DIMS)),
                                               np.zeros((TRAIN_BATCH_SIZE, HID_DIMS))],
                                              batch_size=1)
print(predictions)


# import state from the trained model state and check that it works:
print(">> INFERENCE AFTER SWAPPING MODEL:")
for layer in m_trained.layers:
    if "lstm" in layer.name:
        m_inference.get_layer("lstm_infer").set_weights(layer.get_weights())
    elif "dense" in layer.name:
        m_inference.get_layer("dense_infer").set_weights(layer.get_weights())

predictions, _, _, _ = m_inference.predict([batch123,
                                            np.zeros((TRAIN_BATCH_SIZE, HID_DIMS)),
                                            np.zeros((TRAIN_BATCH_SIZE, HID_DIMS))],
                                           batch_size=1)
print(predictions)


# finally perform granular predictions while keeping the recurrent activations. Starting the sequence with zeros is a common practice, but depending on how you trained, you might have an <END_OF_SEQUENCE> character that you might want to propagate instead:
h, c = np.zeros((TRAIN_BATCH_SIZE, HID_DIMS)), np.zeros((TRAIN_BATCH_SIZE, HID_DIMS))
for i in range(len(batch123)):
    # about output shape: https://keras.io/layers/recurrent/#rnn
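    # h, cc, c hold the network's throughput. With return_state=True a Keras LSTM returns [output, state_h, state_c]:
    # h is the proper LSTM output, cc is the last hidden state (identical to h since return_sequences=False) and c is the cell state (the accumulator)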
    current_input = batch123[i:i+1] # the length of this feed is arbitrary, doesn't have to be 1
    pred, h, cc, c = m_inference.predict([current_input, h, c])
    print("input:", current_input)
    print("output:", pred)
    print(h.shape, cc.shape, c.shape)
    raw_input("do something with your prediction and hidden state and press any key to continue")

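Since we have two forms of state persistence:
1. The saved/trained parameters of the model, which are the same for every sequence
2. The a, c states (called h and c in the code), which evolve throughout a sequence and may be "restarted"

it is interesting to take a look at the guts of the LSTM object. In the Python example that I provide, the a and c states are handled explicitly, but the trained parameters aren't, and it may not be obvious how they are implemented internally or what they mean. They can be inspected as follows: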

for w in lstm.weights:
    print(w.name, w.shape)

In our case (32 hidden units) this returns the following:

lstm_1/kernel:0 (3, 128)
lstm_1/recurrent_kernel:0 (32, 128)
lstm_1/bias:0 (128,)

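We observe a dimensionality of 128. Why is that? The Keras LSTM implementation can be described with the following notation:

g is the recurrent activation, p is the activation, the Ws are the kernels, the Us are the recurrent kernels, h is the hidden variable, which is also the output, and the symbol * denotes element-wise multiplication.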
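For reference, the standard LSTM cell equations written in this notation would read as follows. This is a reconstruction that assumes the usual gate order (input, forget, candidate, output) and adds bias terms b, which the notation above does not name explicitly.

```latex
\begin{aligned}
i_t &= g(x_t W_i + h_{t-1} U_i + b_i) \\
f_t &= g(x_t W_f + h_{t-1} U_f + b_f) \\
\tilde{c}_t &= p(x_t W_c + h_{t-1} U_c + b_c) \\
o_t &= g(x_t W_o + h_{t-1} U_o + b_o) \\
c_t &= f_t * c_{t-1} + i_t * \tilde{c}_t \\
h_t &= o_t * p(c_t)
\end{aligned}
```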

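This explains why 128 = 32*4: these are the parameters of the affine transformations happening inside each of the 4 gates, concatenated:

  • A matrix of shape (3, 128) (named kernel) handles the input for a given sequence element.
  • A matrix of shape (32, 128) (named recurrent_kernel) handles the input from the last recurrent state h.
  • A vector of shape (128,) (named bias), as usual in any other NN setup.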
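To make the role of the three arrays concrete, here is a NumPy-only sketch of a single LSTM timestep that splits the 128 columns into four blocks of 32. It assumes the blocks are concatenated in the order i, f, c, o and uses a plain sigmoid as the recurrent activation; the random weights are stand-ins, not trained values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, kernel, recurrent_kernel, bias):
    """One LSTM timestep; gate blocks assumed concatenated in the order i, f, c, o."""
    z = x @ kernel + h @ recurrent_kernel + bias    # (batch, 4 * units)
    z_i, z_f, z_c, z_o = np.split(z, 4, axis=-1)    # four (batch, units) blocks
    i, f, o = sigmoid(z_i), sigmoid(z_f), sigmoid(z_o)
    c_new = f * c + i * np.tanh(z_c)                # update the cell state
    h_new = o * np.tanh(c_new)                      # hidden state, also the output
    return h_new, c_new

units, n_features = 32, 3
rng = np.random.default_rng(0)
kernel = rng.normal(size=(n_features, 4 * units))       # (3, 128)
recurrent_kernel = rng.normal(size=(units, 4 * units))  # (32, 128)
bias = np.zeros(4 * units)                              # (128,)

h = c = np.zeros((1, units))
h, c = lstm_step(np.array([[1.0, 2.0, 3.0]]), h, c, kernel, recurrent_kernel, bias)
print(h.shape, c.shape)  # (1, 32) (1, 32)
```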

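Anonymous user

Note: this answer assumes that your model in the training phase is not stateful. You must understand what a stateful RNN layer is and make sure that the training data has the corresponding statefulness properties. In short, it means that there is a dependency between the sequences, i.e. one sequence is the follow-up to another sequence, which you want to take into account in your model. If your model and training data are stateful, then I think the other answers, which set stateful=True for the RNN layers from the beginning, are simpler.

Update: no matter whether the training model is stateful or not, you can always copy its weights to the inference model and enable statefulness there. So I think the solutions based on setting stateful=True are shorter and better than mine. Their only drawback is that the batch size in those solutions must be fixed.

Note that the output of an LSTM layer over a single sequence is determined by its weight matrices, which are fixed, and by its internal states, which depend on the previously processed timesteps. Now, to get the output of the LSTM layer for a single sequence of length m, one obvious way is to feed the entire sequence to the LSTM layer in one go. However, as I said earlier, since its internal states depend on the previous timesteps, we can exploit this fact and feed that single sequence chunk by chunk, by getting the state of the LSTM layer at the end of processing a chunk and passing it to the LSTM layer for processing the next chunk. To make this clearer, suppose the sequence length is 7 (i.e. it has 7 timesteps of fixed-length feature vectors). As an example, this sequence could be processed like this: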

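  1. Feed timesteps 1 and 2 to the LSTM layer; get the final state (call it C1).
  2. Feed the next chunk of timesteps together with C1 as the initial state, get the new final state, and repeat until timestep 7 has been processed.

The final output is then equivalent to the output the LSTM layer would produce if we fed it all 7 timesteps at once.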
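The equivalence can be checked numerically without Keras. The sketch below uses a plain NumPy LSTM with random stand-in weights and sigmoid/tanh activations (assumptions for illustration); the chunk sizes 2, 3 and 2 are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run_lstm(chunk, h, c, W, U, b):
    """Run a plain LSTM over chunk (timesteps, features), starting from state (h, c)."""
    for x in chunk:
        z_i, z_f, z_c, z_o = np.split(x @ W + h @ U + b, 4)
        c = sigmoid(z_f) * c + sigmoid(z_i) * np.tanh(z_c)
        h = sigmoid(z_o) * np.tanh(c)
    return h, c

units, n_features = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(n_features, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = rng.normal(size=4 * units)
sequence = rng.normal(size=(7, n_features))
zeros = np.zeros(units)

# All 7 timesteps in one go.
h_full, _ = run_lstm(sequence, zeros, zeros, W, U, b)

# The same sequence in chunks of 2, 3 and 2 timesteps, passing the state along.
h, c = zeros, zeros
for chunk in (sequence[:2], sequence[2:5], sequence[5:]):
    h, c = run_lstm(chunk, h, c, W, U, b)

print(np.allclose(h_full, h))  # True
```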

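So, to implement this in Keras, you can set the return_state argument of the LSTM layer to True so that you can get the intermediate state. Further, don't specify a fixed timestep length when defining the input layer. Instead, use None so that the model can be fed sequences of arbitrary length, which lets us process each sequence progressively (it is fine if your input data at training time consists of fixed-length sequences).

Since you need this chunk-processing capability at inference time, we need to define a new model that shares the LSTM layer used in the training model and that can take the initial states as input and also return the resulting states as output. The following is a general sketch of how it could be done (note that the returned state of the LSTM layer is not used when training the model; we only need it at test time):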

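from keras.layers import Input, LSTM, Dense
from keras.models import Model
import numpy as np

# define training model (n_feats and n_units are yours to set, e.g. n_feats = 3)
train_input = Input(shape=(None, n_feats))   # note that the number of timesteps is None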
lstm_layer = LSTM(n_units, return_state=True)
lstm_output, _, _ =  lstm_layer(train_input) # note that we ignore the returned states
classifier = Dense(1, activation='sigmoid')
train_output = classifier(lstm_output)

train_model = Model(train_input, train_output)

# compile and fit the model on training data ...

# ==================================================

# define inference model
inf_input = Input(shape=(None, n_feats))
state_h_input = Input(shape=(n_units,))
state_c_input = Input(shape=(n_units,))

# we use the layers of previous model
lstm_output, state_h, state_c = lstm_layer(inf_input,
                                           initial_state=[state_h_input, state_c_input])
output = classifier(lstm_output)

inf_model = Model([inf_input, state_h_input, state_c_input],
                  [output, state_h, state_c])  # note that we return the states as output

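Now you can feed inf_model with the timesteps of a sequence as they become available. Note, however, that initially you must feed the states with all-zero vectors (which is the default initial value of the states). For example, if the sequence length is 7, a sketch of what happens when new stream data becomes available is as follows: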

state_h = np.zeros((1, n_units))
state_c = np.zeros((1, n_units))

# three new timesteps are available; timesteps has shape (1, 3, n_feats)
outputs = inf_model.predict([timesteps, state_h, state_c])

# predict returns a list of arrays: [output, state_h, state_c]
out = outputs[0][0, 0]  # you may ignore this output since the entire sequence has not been processed yet
state_h = outputs[1]
state_c = outputs[2]

# after some time another four new timesteps are available; timesteps has shape (1, 4, n_feats)
outputs = inf_model.predict([timesteps, state_h, state_c])

# we have processed 7 timesteps, so the output is valid
out = outputs[0][0, 0]  # store it, pass it to another thread or do whatever you want to do with it

# reinitialize the state to make them ready for the next sequence chunk
state_h = np.zeros((1, n_units))
state_c = np.zeros((1, n_units))

# to be continued...

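Of course, you need to do this in some kind of loop, or implement a control flow structure to process the data stream, but I think you get the general idea.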
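One possible shape for such a control structure is sketched below. The class `SequenceStreamer` and the `step_fn` callback are hypothetical names introduced for this example; `toy_step` is only a stand-in for the model so that the sketch runs without Keras.

```python
class SequenceStreamer:
    """Carries recurrent state across arriving chunks and resets it every seq_len timesteps."""

    def __init__(self, step_fn, initial_state, seq_len=7):
        self.step_fn = step_fn              # callable: (chunk, state) -> (output, new_state)
        self.initial_state = initial_state
        self.seq_len = seq_len
        self.reset()

    def reset(self):
        self.state = self.initial_state
        self.seen = 0

    def feed(self, chunk):
        """Process one chunk and return (output, is_final)."""
        output, self.state = self.step_fn(chunk, self.state)
        self.seen += len(chunk)
        is_final = self.seen >= self.seq_len
        if is_final:
            self.reset()                    # get ready for the next sequence
        return output, is_final


# Stand-in for the model: the "state" is a running sum and the "output" is its value.
def toy_step(chunk, state):
    new_state = state + sum(chunk)
    return new_state, new_state

streamer = SequenceStreamer(toy_step, initial_state=0)
print(streamer.feed([1, 1, 1]))     # (3, False)
print(streamer.feed([1, 1, 1, 1]))  # (7, True)
print(streamer.feed([5]))           # (5, False)
```

With the inference model above, `step_fn` would wrap a call such as `inf_model.predict([chunk, state_h, state_c])` and return the prediction together with the new `(state_h, state_c)` pair.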

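Finally, although your specific example is not a sequence-to-sequence model, I highly recommend reading the official Keras seq2seq tutorial; I think one can learn a lot of ideas from it.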