Asked by: 小点点

Plotting the mean learning curve of a tictactoe game agent with matplotlib


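I have written a Q-learning agent that plays tic-tac-toe against a random player. I want to play the game 20 times and plot a single mean learning curve with Matplotlib. The first for loop plays the game 20 times and produces a list of numpy.ndarrays. How do I get the mean reward so that I can plot it as one curve? Here is what I have so far: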

lines = []

#play tictactoe 20 times
for i in range(0, 20):

    # Instantiate environment        
    environment = tictactoe.Tictactoe(verbose=True)

    # play the game which returns the rewards gained in a number of episodes
    line = play_tictactoe(environment,
                               player_o=player_o,
                               player_x=player_x,
                               episodes=m)
    #line is a numpy.ndarray
    lines.append(line)

for j in lines:
    avg_line = #calculate the single mean learning curve (performance of the 20 agents)

x = np.arange(0, len(line))

fig, (ax2) = plt.subplots(1,1)

ax2.plot(x, avg_line, color="red", label="Q-Agent")
ax2.set_title("The mean performance of 20 Q-Learning Agents")
ax2.set_xlabel('Episodes')
ax2.set_ylabel('Rewards')

plt.legend()
plt.show()

1 answer

Anonymous user

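You can compute the mean of each array in lines with a list comprehension, store the results in a list, and then plot them. Note that this gives one value per run (20 points in total), so the x-axis below is the run index rather than the episode number: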

avg_line = [np.mean(j) for j in lines] # This is called list comprehension

x = np.arange(0, len(lines))
fig, (ax2) = plt.subplots(1,1)

ax2.plot(x, avg_line, color="red", label="Q-Agent")
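If what you want is one curve over episodes, averaged across the 20 runs, an alternative is to average along the run axis instead of within each run. Below is a minimal sketch of that idea, not the answer's method. It assumes every array in lines has the same length (the same number of episodes), and it uses synthetic random rewards as a stand-in for the output of play_tictactoe. The helper name mean_learning_curve is hypothetical.

```python
import numpy as np


def mean_learning_curve(lines):
    """Average a list of equal-length reward arrays episode by episode.

    Hypothetical helper: assumes every array in `lines` has the same length.
    """
    stacked = np.stack(lines)      # shape: (n_runs, n_episodes)
    return stacked.mean(axis=0)    # shape: (n_episodes,)


# Synthetic stand-in for the 20 arrays returned by play_tictactoe
# (made-up data, used only so the sketch runs on its own).
rng = np.random.default_rng(0)
n_runs, n_episodes = 20, 100
lines = [rng.choice([-1.0, 0.0, 1.0], size=n_episodes) for _ in range(n_runs)]

avg_line = mean_learning_curve(lines)  # one mean reward per episode
x = np.arange(len(avg_line))           # episode index for the x-axis
```

With avg_line and x computed this way, the plotting code from the question (ax2.plot(x, avg_line, ...)) can be used unchanged, and the x-axis label 'Episodes' matches the data.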