
AlphaZero Parallel Gomoku AI

December 14, 2018


alphazero-gomoku-mpi

link

github : alphazero-gomoku-mpi

overview

This repo is based on junxiaosong/alphazero_gomoku, for which I am sincerely grateful.

I did the following:

Implemented an asynchronous, parallel self-play training pipeline in the style of AlphaGo Zero
Wrote a root-parallel MCTS (the move is chosen by an ensemble vote)
Used a ResNet structure for the model and added a transfer learning API for training a larger-board model from a small-board model (a form of pre-training that saves time)
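
The ensemble vote in root-parallel MCTS can be sketched as follows. This is a minimal illustration, not the repo's code: it assumes each MPI process searches independently from the same root and reports its root visit counts, and that the counts are merged by summation. The name `vote_move` and the tie-breaking rule are hypothetical.

```python
from collections import Counter

def vote_move(worker_visit_counts):
    """Merge per-worker root visit counts and return the most-visited move.

    worker_visit_counts: list of dicts {move_index: visit_count},
    one dict per search process (assumed format).
    """
    total = Counter()
    for counts in worker_visit_counts:
        total.update(counts)  # adds the visit counts move by move
    # Iterate moves in sorted order so ties go to the smaller move index.
    return max(sorted(total), key=lambda move: total[move])

# Three hypothetical workers searching the same position.
workers = [
    {60: 250, 61: 150},
    {60: 180, 49: 220},
    {60: 210, 61: 190},
]
best = vote_move(workers)
```

Summing visit counts lets a move that is a solid second choice everywhere beat a move that only one worker favours; a plain majority vote over each worker's top move is another possible rule.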

strength

The current model plays on an 11x11 board and uses 400 playouts per move at test time
play with this model, can always win regardless of black or white
play with , can rank around 20th-30th for some rough tests
When I play white, I cannot beat the AI. When I play black, most of my games end in a tie or a loss

references

Mastering the Game of Go without Human Knowledge

blog

installation dependencies

python3
tensorflow>=1.8.0
tensorlayer>=1.8.5
mpi4py (parallel train and play)
pygame (gui)

how to install

tensorflow/tensorlayer/pygame install :

conda install tensorflow
conda install tensorlayer
conda install pygame

mpi4py install

mpi4py on windows

how to run

play with ai

python human_play.py

play with parallel ai (-np sets the number of processes; watch out for OOM!)

mpiexec -np 3 python -u human_play_mpi.py

train from scratch

python train.py

train in parallel

mpiexec -np 43 python -u train_mpi.py

algorithm

The algorithm is almost identical to AlphaGo Zero's, except that APV-MCTS is not used.
Slides can be found in the directory demo/slides

details

Most settings are the same as in AlphaGo Zero. The details are as follows:

network structure
- The current model uses 19 residual blocks. More blocks give more accurate predictions but slower inference
- The number of filters in each convolutional layer is shown in the following picture
feature planes
- In the AlphaGo Zero paper there are 17 feature planes: 8 for the current player's stones, 8 for the opponent's stones, and a final plane for the colour to play
- Here I use only 4 planes for each player. This can easily be changed in game_board.py
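
The plane layout described above can be sketched like this. It is an illustration under stated assumptions, not the code in game_board.py: the function name, the history format, and the presence of a final colour plane (giving 4 + 4 + 1 = 9 planes) are all assumed.

```python
def build_feature_planes(history, current_player, size, n_history=4):
    """Build binary input planes from the last n_history positions.

    history: list of dicts {(row, col): player}, oldest first; the last
             entry is the current position (assumed format).
    Players are numbered 1 and 2; player 1 is assumed to be black.
    Returns 2 * n_history + 1 planes, each a size x size list of 0/1.
    """
    opponent = 3 - current_player
    planes = []
    for player in (current_player, opponent):
        for t in range(n_history):
            plane = [[0] * size for _ in range(size)]
            idx = len(history) - 1 - t  # position t moves ago
            if idx >= 0:                # earlier than game start: all zeros
                for (r, c), p in history[idx].items():
                    if p == player:
                        plane[r][c] = 1
            planes.append(plane)
    # Final plane: constant 1 if black is to play, else 0.
    colour = 1 if current_player == 1 else 0
    planes.append([[colour] * size for _ in range(size)])
    return planes

# Two stones played so far; player 1 is to move on an 11x11 board.
history = [{}, {(5, 5): 1}, {(5, 5): 1, (5, 6): 2}]
planes = build_feature_planes(history, current_player=1, size=11)
```

Changing `n_history` to 8 gives the 17-plane layout of the paper.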
dirichlet noise
- I add Dirichlet noise at every node, unlike the paper, which adds noise only at the root node. My guess is that AlphaGo Zero discards the whole tree after each move and builds a new one, whereas here I keep the subtree under the chosen action, so the setup is slightly different
- The weights between prior probabilities and noise are unchanged here (0.75/0.25), though I think 0.8/0.2 or even 0.9/0.1 may be better because noise is added at every node
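
The mixing of priors and noise can be sketched as below. This is a stdlib-only illustration, not the repo's code: the function names are hypothetical, and the Dirichlet sample is drawn by normalising independent gamma variates. The defaults (alpha 0.3, noise weight 0.25) are the values from the parameter table.

```python
import random

def sample_dirichlet(alpha, k, rng=random):
    """Draw one sample from a symmetric Dirichlet(alpha) of dimension k."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # guard against numerical underflow of every draw
        return [1.0 / k] * k
    return [d / total for d in draws]

def mix_noise(priors, alpha=0.3, noise_weight=0.25, rng=random):
    """Return (1 - w) * prior + w * noise for each legal move."""
    noise = sample_dirichlet(alpha, len(priors), rng)
    return [(1 - noise_weight) * p + noise_weight * n
            for p, n in zip(priors, noise)]

priors = [0.7, 0.2, 0.1]
mixed = mix_noise(priors, rng=random.Random(0))
```

Passing `noise_weight=0.2` or `noise_weight=0.1` gives the 0.8/0.2 and 0.9/0.1 variants suggested above.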

parameters in detail

I try to keep the original parameters from the AlphaGo Zero paper in order to test how well it generalizes. I also take training time and my hardware into consideration.

| Parameter | Gomoku | AlphaGo Zero |
|---|---|---|
| mpi num | 43 | - |
| c_puct | 5 | 5 |
| n_playout | 400 | 1600 |
| blocks | 19 | 19/39 |
| buffer size | 500,000 (data) | 500,000 (games) |
| batch_size | 512 | 2048 |
| lr | 0.001 | annealed |
| optimizer | Adam | SGD with momentum |
| dirichlet noise | 0.3 | 0.03 |
| weight of noise | 0.25 | 0.25 |
| first n move | 12 | 30 |
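
For reference, c_puct enters the search through the PUCT selection rule of the AlphaGo Zero paper. The form below is the paper's; that this repo uses exactly the same expression is an assumption.

\[
a_t = \arg\max_a \bigl( Q(s_t, a) + U(s_t, a) \bigr), \qquad
U(s, a) = c_{\text{puct}} \, P(s, a) \, \frac{\sqrt{\sum_b N(s, b)}}{1 + N(s, a)}
\]

Here \(Q\) is the mean action value, \(P\) the prior probability from the network, and \(N\) the visit count. A larger \(c_{\text{puct}}\) gives more weight to the prior and to rarely visited moves.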

training details
- I trained the model for about 100,000 games, which took around 800 hours
- Hardware: 2 CPUs and 2 1080 Ti GPUs
- The computation gap with DeepMind is obvious. People with more resources can take this work further

some tips

network
- Zero-pad the input: sometimes the AI is unaware of a threat at the edge of the board, even when I already have three or four in a row there. Zero-padding the input data mitigates the problem
- Put the network on the GPU: if the network is shallow, it matters little whether the CPU or the GPU is used. Otherwise the GPU is faster for self-play
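
Zero-padding a single input plane can be sketched as follows. The source does not say how wide the border is or where in the pipeline the padding happens, so the one-cell border and the name `zero_pad` are assumptions.

```python
def zero_pad(plane, pad=1):
    """Surround a 2-D plane (list of rows) with a border of zeros."""
    cols = len(plane[0]) if plane else 0
    width = cols + 2 * pad
    padded = [[0] * width for _ in range(pad)]            # top border
    for row in plane:
        padded.append([0] * pad + list(row) + [0] * pad)  # side borders
    padded.extend([0] * width for _ in range(pad))        # bottom border
    return padded

padded = zero_pad([[1, 2], [3, 4]])
```

With `pad=1` an 11x11 plane becomes 13x13, so a stone on the edge of the board is no longer on the edge of the network input.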
dirichlet noise
- Add noise in the node: in junxiaosong/alphazero_gomoku, noise is added outside the tree, much like DQN's \(\epsilon\)-greedy exploration. That works when I test on 6x6 and 8x8 boards, but problems appear on 11x11. After long training on 11x11, the black player always plays the first stone in the centre with policy probability equal to 1. This is a very rational move for black, but it means the white player never sees a game record that opens anywhere else. So when I play black against the AI and open away from the centre, the AI plays very badly because it has never seen such a position. Adding noise in the node mitigates the problem
- Use a smaller noise weight: as said before, I think 0.8/0.2 or even 0.9/0.1 may be a better split between prior probabilities and noise, because noise is added at every node
randomness
- Dihedral reflection or rotation: when using the network to output probabilities and a value, it is better to do as the paper says: the leaf node \(s_l\) is added to a queue for neural network evaluation, \((d_i(p),v)=f_{\theta}(d_i(s_l))\), where \(d_i\) is a dihedral reflection or rotation selected uniformly at random from \(i\) in \([1..8]\)
- Add randomness at test time: I also apply the random dihedral reflection or rotation when playing against the AI, so that it does not play the same game every time
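
The random dihedral evaluation can be sketched as below. This is a stdlib-only illustration, not the repo's code: the network is a hypothetical callable `net(state)` that returns a policy grid and a value, the state is a single square grid, and the eight transforms are indexed 0..7 rather than 1..8. The policy comes back in the transformed frame, so it is mapped back with the inverse transform.

```python
import random

def rotate90(grid):
    """Rotate a square grid 90 degrees counter-clockwise."""
    n = len(grid)
    return [[grid[c][n - 1 - r] for c in range(n)] for r in range(n)]

def flip(grid):
    """Mirror a grid left to right."""
    return [list(reversed(row)) for row in grid]

def dihedral(grid, i):
    """Apply transform i in 0..7: i % 4 rotations, then a flip if i >= 4."""
    out = [list(row) for row in grid]
    for _ in range(i % 4):
        out = rotate90(out)
    return flip(out) if i >= 4 else out

def inverse_dihedral(grid, i):
    """Undo dihedral(grid, i): undo the flip first, then the rotations."""
    out = flip(grid) if i >= 4 else [list(row) for row in grid]
    for _ in range((4 - i % 4) % 4):
        out = rotate90(out)
    return out

def evaluate_with_symmetry(state, net, rng=random):
    """Evaluate state under a random symmetry; return policy in the original frame."""
    i = rng.randrange(8)
    policy, value = net(dihedral(state, i))
    return inverse_dihedral(policy, i), value

grid = [[1, 2], [3, 4]]
identity_net = lambda s: (s, 0.5)  # stand-in network for illustration
policy, value = evaluate_with_symmetry(grid, identity_net)
```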
tradeoffs
- Network depth: if the network is too shallow, the loss increases. If it is too deep, training and testing are slow. (My network is still a little slow to play against. I think 9 blocks may be enough)
- Buffer size: if the buffer is small, the network fits it easily, but performance cannot be guaranteed when learning from so little data. If it is too large, training takes much longer and a deeper network is needed
- Playout number: if it is small, a self-play game finishes quickly but the quality of the game records cannot be guaranteed. With more playouts the records are better, but each game takes longer
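
The buffer-size tradeoff can be made concrete with a fixed-capacity buffer that drops the oldest samples as new self-play data arrives. This is a minimal sketch, not the repo's code: the class name is hypothetical, and the defaults (500,000 samples, batches of 512) are taken from the parameter table.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of training samples; oldest samples are evicted first."""

    def __init__(self, capacity=500_000):
        self.data = deque(maxlen=capacity)

    def extend(self, samples):
        """Add the samples produced by one or more self-play games."""
        self.data.extend(samples)

    def sample(self, batch_size=512, rng=random):
        """Draw a mini-batch uniformly without replacement."""
        k = min(batch_size, len(self.data))
        return rng.sample(list(self.data), k)

buf = ReplayBuffer(capacity=3)
buf.extend(range(5))   # only the 3 newest samples are kept
batch = buf.sample(2)
```

A smaller capacity makes the data fresher but narrower; a larger one keeps older games from weaker networks in the training mix for longer.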

future work can try

continue to train (a larger board) and increase the playout number
try some other parameters for better performance
alter network structure
alter feature planes
implement apv-mcts
train on standard/renju rule
