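I have written a Q-learning agent that plays tic-tac-toe against a random player. I want to play the game 20 times and plot a single mean learning curve with Matplotlib. The first for loop plays the game 20 times and produces a list of numpy.ndarrays. How can I get the mean reward so that I can plot it as one curve? Here is what I have so far: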
import numpy as np
import matplotlib.pyplot as plt

# tictactoe, play_tictactoe, player_o, player_x and m are defined elsewhere

lines = []
# play tictactoe 20 times
for i in range(0, 20):
    # Instantiate environment
    environment = tictactoe.Tictactoe(verbose=True)
    # play the game, which returns the rewards gained over a number of episodes
    line = play_tictactoe(environment,
                          player_o=player_o,
                          player_x=player_x,
                          episodes=m)
    # line is a numpy.ndarray
    lines.append(line)

for j in lines:
    avg_line = ...  # calculate the single mean learning curve (performance of the 20 agents)

x = np.arange(0, len(line))
fig, ax2 = plt.subplots(1, 1)
ax2.plot(x, avg_line, color="red", label="Q-Agent")
ax2.set_title("The mean performance of 20 Q-Learning Agents")
ax2.set_xlabel('Episodes')
ax2.set_ylabel('Rewards')
plt.legend()
plt.show()
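You can compute the mean of each array with a list comprehension, store the results in a list, and then plot the averages. Note that this gives one value per run (20 points in total), so the x-axis is the run index rather than the episode number: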
avg_line = [np.mean(j) for j in lines]  # list comprehension: one mean per run
x = np.arange(0, len(lines))
fig, ax2 = plt.subplots(1, 1)
ax2.plot(x, avg_line, color="red", label="Q-Agent")
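
If the goal is one averaged value per episode (a learning curve over episodes), one possible approach is to stack the runs into a 2-D array and average along axis 0. This is only a minimal sketch and not the method from the answer above. It assumes every array in lines has the same length (the same number of episodes). The names mean_learning_curve and demo_lines are hypothetical, and the demo rewards are made up purely for illustration:

```python
import numpy as np


def mean_learning_curve(lines):
    """Average several equal-length reward arrays episode by episode."""
    # Stack the runs into a 2-D array of shape (n_runs, n_episodes).
    rewards = np.stack(lines)
    # axis=0 averages across runs, leaving one value per episode.
    return rewards.mean(axis=0)


# Made-up rewards for 3 runs of 4 episodes each (illustration only,
# standing in for the list built by the question's first loop).
demo_lines = [
    np.array([0.0, 1.0, 1.0, 1.0]),
    np.array([0.0, 0.0, 1.0, 1.0]),
    np.array([0.0, 0.5, 1.0, 1.0]),
]

avg_line = mean_learning_curve(demo_lines)
x = np.arange(len(avg_line))  # one x position per episode
```

With the real data, passing lines instead of demo_lines should give an avg_line with one entry per episode, which can then be plotted with ax2.plot(x, avg_line, ...) as in the question. np.stack raises a ValueError if the arrays differ in length, so runs with different episode counts would need to be truncated or padded first.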