Saved model (random forest) does not work like the "freshly fitted" model: problem with categorical variables

Asked by: 小点点

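I built a model and saved it. I then loaded the model again and tried to apply it to the same dataset that was used for training. I got the error message: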

could not convert string to float

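This happens because I have several categorical variables. Before saving the model, however, I could apply it to this dataset without any error. The problem seems to be that the information about these categorical variables was not saved along with the model. In fact, I used LabelEncoder on these variables. Is there a way to save the information about these categorical variables so that the saved model works just like the "freshly fitted" one? Thanks in advance!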

1 answer

Anonymous user

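This is a typical use case for a Pipeline.

Build the whole workflow (encoding plus model) as a single pipeline, then save the pipeline rather than the bare model.

When you load the pipeline, you can get predictions on new data directly, without doing any encoding yourself, because the fitted encoder is stored inside it.

Also, LabelEncoder is not meant for transforming input data. As the name suggests, it is meant for the target variable.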
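To make that distinction concrete, here is a minimal sketch (the toy values are made up for illustration). LabelEncoder takes a 1-D array of target labels, whereas a feature encoder such as OrdinalEncoder takes a 2-D array and encodes each column separately.

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Toy data, invented for this illustration only.
y = ["no", "yes", "yes", "no"]                      # 1-D target labels
X = [["orange", "yes"], ["apple", "no"], ["orange", "no"]]  # 2-D features

# LabelEncoder: one 1-D array in, one 1-D array of integer codes out.
label_enc = LabelEncoder().fit(y)
y_encoded = label_enc.transform(y)

# OrdinalEncoder: a 2-D array in, one set of categories learned per column.
ord_enc = OrdinalEncoder().fit(X)
X_encoded = ord_enc.transform(X)

print(label_enc.classes_)   # categories learned for the target
print(ord_enc.categories_)  # categories learned for each feature column
print(y_encoded)
print(X_encoded)
```

Because OrdinalEncoder keeps one list of categories per column, a single fitted object covers all categorical features, which makes it easy to place inside a pipeline.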

If you need to convert categorical variables to ordinal codes, use OrdinalEncoder.

from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OrdinalEncoder
from sklearn.compose import make_column_transformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

X = [[1, 'orange', 'yes'], [1, 'apple', 'yes'],
     [-1, 'orange', 'no'], [-1, 'apple', 'no']]
y = [1, 1, 0, 0]  # 1-D target; a column vector would trigger a DataConversionWarning

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    random_state=42)
# Apply OrdinalEncoder to the 2nd and 3rd columns; pass the 1st through unchanged
pipe = Pipeline(
    [('encoder', make_column_transformer((OrdinalEncoder(), [1, 2]),
                                         remainder='passthrough')),
     ('rf', RandomForestClassifier(n_estimators=2, random_state=42))])

pipe.fit(X_train, y_train)

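import joblib

# Save the fitted pipeline (encoder and model together)
joblib.dump(pipe, 'pipe.pkl')

# Load it back and evaluate on raw, un-encoded data
loaded_pipe = joblib.load('pipe.pkl')
print(loaded_pipe.score(X_test, y_test))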
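As a rough check that the saved pipeline really behaves like the freshly fitted one, the sketch below refits the same toy pipeline, round-trips it through joblib, and compares predictions on raw string data. The temporary directory and the `dtype=object` NumPy array are choices made for this illustration, not part of the original answer; `'ordinalencoder'` is the name that make_column_transformer assigns automatically to the encoder step.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Same toy data as above, stored as an object array so mixed types are explicit.
X = np.array([[1, 'orange', 'yes'], [1, 'apple', 'yes'],
              [-1, 'orange', 'no'], [-1, 'apple', 'no']], dtype=object)
y = [1, 1, 0, 0]

pipe = Pipeline(
    [('encoder', make_column_transformer((OrdinalEncoder(), [1, 2]),
                                         remainder='passthrough')),
     ('rf', RandomForestClassifier(n_estimators=2, random_state=42))])
pipe.fit(X, y)

# Round-trip the fitted pipeline through a file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'pipe.pkl')
    joblib.dump(pipe, path)
    loaded_pipe = joblib.load(path)

# The fitted categories travel with the pipeline ...
encoder = loaded_pipe.named_steps['encoder'].named_transformers_['ordinalencoder']
print(encoder.categories_)

# ... so raw strings can be passed straight to predict().
same = np.array_equal(pipe.predict(X), loaded_pipe.predict(X))
print(same)
```

The loaded object can be given raw strings because the encoder's learned categories are pickled together with the forest. Saving only the RandomForestClassifier would lose that mapping, which is what leads to the string-to-float error in the question.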
