Analyze the essence of the AlphaGo algorithm principle

While discussing algorithms with my son, he suggested I look into the principles behind AlphaGo and sent me a reference. Reading it, I found it very inspiring.

1. AlphaGo's Mathematical Model

1.1 The essence of the Go problem

The game of Go is essentially a data-classification problem. It can be stated as follows: given a board position, find the best point to play. In other words:

  • Input: a board position, which can be abstracted as a 19x19 matrix whose elements take the values -1, 0, 1.
  • Output: a move location, with 19x19 = 361 possibilities.

It is easy to see that, in essence, this is the problem of classifying a 19x19 "image" into one of 361 classes.
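A minimal sketch makes this abstraction concrete. The -1/0/+1 stone convention and the row-major class indexing below are illustrative assumptions, not AlphaGo's actual encoding:

```python
import numpy as np

# Encode a position: -1 white stone, 0 empty, +1 black stone (illustrative convention).
board = np.zeros((19, 19), dtype=np.int8)
board[3, 3] = 1     # a black stone
board[15, 15] = -1  # a white stone

# A move label is one of 361 classes: flatten (row, col) to a single index.
def move_to_class(row, col):
    return row * 19 + col

def class_to_move(idx):
    return divmod(idx, 19)

idx = move_to_class(3, 3)
assert class_to_move(idx) == (3, 3)
print(board.shape, idx)  # (19, 19) 60
```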

1.2 CNN model of image classification

Here is a schematic of a CNN for image classification: (figure: CNN image-classification network). In the figure, the input is an image and the output is a four-dimensional vector giving the probability distribution over the four target classes dog, cat, boat, and bird.

1.3 CNN model of Go

The CNN model for Go is essentially the same as the image-classification model. The input is a 19x19-resolution "image" of the board position, and the output is a 361-dimensional probability distribution vector. The position with the highest probability is the optimal move.
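A toy forward pass can make the shapes concrete. This is not AlphaGo's actual network (which stacks many convolutional layers over richer input features); it is a minimal NumPy sketch with one convolutional layer and a 361-way softmax head, with randomly initialized weights:

```python
import numpy as np

def conv2d(x, w, b):
    # Naive "valid" convolution: x is (C, H, W), w is (F, C, k, k), b is (F,).
    F, C, k, _ = w.shape
    H, W = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((F, H, W))
    for f in range(F):
        for i in range(H):
            for j in range(W):
                out[f, i, j] = np.sum(x[:, i:i+k, j:j+k] * w[f]) + b[f]
    return np.maximum(out, 0)  # ReLU

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
# One input channel: the -1/0/+1 board "image".
board = rng.choice([-1, 0, 1], size=(1, 19, 19)).astype(float)

w1 = rng.normal(0, 0.1, (8, 1, 3, 3)); b1 = np.zeros(8)    # 8 small filters
h = conv2d(board, w1, b1)                                   # -> (8, 17, 17)
w2 = rng.normal(0, 0.1, (361, h.size)); b2 = np.zeros(361)  # dense 361-way head
p = softmax(w2 @ h.ravel() + b2)                            # move distribution

print(p.shape, np.isclose(p.sum(), 1.0), int(p.argmax()))
```

With untrained weights the output is near-uniform noise; training on labeled positions is what turns `argmax(p)` into a sensible move.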

1.4 Google's Ingenious Research Team

Abstracting the Go problem into pure mathematics is the core of AlphaGo. Reducing the problem to a CNN model was a brilliant, sensational stroke.

Sometimes I cannot help but admire Google's research team. When I read Google's FaceNet paper, I felt completely outclassed. Truly, the triplet model is too clever: it constructs the loss function through an indirect method and avoids annotating a combinatorially massive number of samples (in fact, training samples for a face-feature-extraction model cannot be labeled by hand anyway). However long I had pondered, I would not have found such a clever approach. For this, Google has my full respect.
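The triplet idea mentioned above can be sketched in a few lines: the loss only compares distances between embeddings, so no per-pair label is ever needed. The margin value and toy embeddings below are illustrative:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Pull the anchor toward the positive (same identity) and push it away
    # from the negative (different identity) by at least `margin`.
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(d_pos - d_neg + margin, 0.0)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # close to the anchor
n = np.array([1.0, 1.0])   # far from the anchor
print(triplet_loss(a, p, n))  # 0.0 -- the margin constraint is already satisfied
```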

Google contributed greatly to the popularization of deep learning. It is a great company that has made tremendous contributions to the progress of human civilization. I hope our Chinese companies can likewise look beyond their own glory and shoulder the glorious mission of liberating all mankind.

2. Models and Training

2.1 Learning from Humans

Humans have accumulated a large number of game records showing how all kinds of positions were actually played, and these can serve as training samples. Google obtained human players' games from the KGS Go server. Each game record contains human moves for every position it passes through, making it a natural source of training data; in this way about 30 million training samples were obtained.

Google's experimental results showed that this method alone is not good enough, for two reasons:

  • Quantity: the number of samples is too small to cover all situations.
  • Quality: human experience is not necessarily correct; learning from weak players teaches bad moves.

2.2 MCTS: Monte Carlo Tree Search

The idea behind MCTS is wonderful: it requires no human games at all.

2.2.1 Decision Tree

We can imagine unfolding all possible positions and moves of Go into one enormous decision tree. Each node corresponds to a position, and each position has a recommended optimal move. If we could build this tree in full, the Go problem would be solved.

Since this decision tree is far too large to enumerate, we can construct only part of it, use that part as training samples, and let a trained convolutional neural network predict the rest.

2.2.2 MCTS Decision Tree

Before MCTS starts constructing the search tree, it first assumes that every move in every position has the same chance of winning. That is, whenever a new position is encountered, MCTS builds an initial 19x19 move-decision weight matrix for it, with the weight of every position initialized to the same value.

(figure: a board position and its decision-weight matrix)

Starting from an empty board, moves are played at random until the game ends. Every position that appears during the game is added to the decision tree, and each position's decision-weight matrix is adjusted according to the final outcome. The next game then consults these weights as a rough guide. By playing, say, 100,000 games this way, we build a decision tree with a certain degree of "intelligence".
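The bookkeeping described above can be sketched as follows. A real implementation would play random legal Go moves to the end of each game and score the result; here the playout outcome is a coin flip, purely to show how the per-position win/visit statistics are maintained and then reused to guide the next game:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 19
wins = np.zeros(N * N)    # wins credited to each move position
visits = np.zeros(N * N)  # how often each position was tried

def simulate(move):
    # Placeholder for a full random playout to the end of a Go game;
    # here the result is a coin flip just to exercise the bookkeeping.
    return rng.random() < 0.5

for game in range(5000):
    # All positions start equally weighted (Laplace smoothing gives a uniform
    # 0.5 estimate at first); later games prefer positions whose observed win
    # rate is higher, with a tiny random perturbation for exploration.
    win_rate = (wins + 1) / (visits + 2)
    move = int(np.argmax(win_rate + rng.random(N * N) * 0.01))
    visits[move] += 1
    wins[move] += simulate(move)

print(int(visits.sum()), float(win_rate.mean()))
```

Real MCTS uses a principled exploration bonus (e.g. UCB) rather than random noise, and maintains these statistics per node of the tree rather than in one flat matrix.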

2.2.3 MCTS Features Analysis

MCTS has the following characteristics:

  • MCTS can be used not only to build training samples; during an actual game it can also keep running "online" simulations based on the opponent's moves. It is easy to see that, after a large number of random simulations, the best move emerges. Google's paper calls this method Rollout.
  • MCTS parallelizes well: computation speed can be increased simply by adding computing resources.
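Because each rollout is an independent random playout, they parallelize trivially. Here is a hedged sketch using a thread pool, with the coin-flip rollout again standing in for a full random game:

```python
from concurrent.futures import ThreadPoolExecutor
import random

def rollout(seed):
    # Each rollout is independent: only the final win/loss matters.
    # A seeded coin flip stands in for playing a random game to the end.
    return random.Random(seed).random() < 0.5

# Rollouts share no state, so they can be farmed out to any number of workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(rollout, range(1000)))

win_rate = sum(results) / len(results)
print(len(results), round(win_rate, 2))
```

In production systems the rollouts would run on many machines or GPU workers; the aggregation step (summing wins per candidate move) is the same.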

2.3 Human Experience + MCTS: Enhancing the Algorithm

The MCTS move-weight matrix can be initialized from human games, which speeds up training. Experiments showed that human experience for roughly the first 20 moves of the opening is effective.
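One simple way to realize this initialization is to boost the weight of positions that humans actually played. The game records and counting scheme below are hypothetical, for illustration only:

```python
import numpy as np

# Hypothetical human game records: each game is a list of (row, col) moves.
human_games = [
    [(3, 3), (15, 15), (3, 15)],
    [(3, 3), (15, 3), (16, 16)],
]

prior = np.ones((19, 19))        # uniform baseline, as plain MCTS would use
for game in human_games:
    for row, col in game[:20]:   # only the first ~20 moves, per the text
        prior[row, col] += 1     # positions humans actually play get boosted

prior /= prior.sum()             # normalize into an initial weight matrix
print(prior[3, 3] > prior[0, 0])  # True -- (3, 3) appears in the human games
```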

3. Reinforcement Learning and Self-Play

The story does not end there: yet another new technique, reinforcement learning, was introduced, and from it AlphaGo Zero was born. Due to space limitations, reinforcement learning and AlphaGo Zero will be discussed in a later article.