This series walks through learning convolutional neural networks (CNNs) and uses TensorFlow to apply CNNs to image recognition.
Convolutional neural networks developed out of the ordinary fully connected BP network. A CNN focuses on solving the problem that a fully connected BP network becomes untrainable because it has too many weight parameters, and it introduces local connections, weight sharing, and pooling, all with the goal of reducing the number of network parameters. Consider the following example. Suppose our network model is: input layer -> one hidden layer -> output layer. The sample image is a 1000x1000 image; flattened into a row vector it becomes a 1x1,000,000 vector. If the hidden layer has 1,000,000 nodes, then by the definition of full connection every element of the input layer must connect to every node of the hidden layer, so the weight matrix between the input layer and the hidden layer has 1,000,000*1,000,000 = 10^12 entries, which is quite terrifying.
Now we use local connections to reduce the parameter count. We keep the hidden layer at 1,000,000 neurons, but let each hidden neuron connect only to a 10*10 patch of input pixels. For example, on the right side of Figure 1, the black neuron is related only to the block in the upper-right corner of the picture. Now each hidden neuron has only 100 parameters, so the number of parameters between the input layer and the hidden layer is 10^8, four orders of magnitude fewer, which is quite powerful. But 10^8 is still a lot, and 10^8 per layer still will not do. What next? That is where weight sharing comes in.
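The parameter counts above can be checked with a quick back-of-the-envelope calculation (a sketch of the arithmetic only, not a real network):

```python
# Parameter counts for the 1000x1000 image example above.
input_size = 1000 * 1000      # image flattened to a 1 x 1,000,000 vector
hidden_size = 1_000_000       # hidden layer with 1,000,000 neurons

# Fully connected: every hidden neuron sees every input pixel.
fc_weights = input_size * hidden_size
print(fc_weights)             # 10**12

# Locally connected: each hidden neuron sees only a 10x10 patch.
local_weights = hidden_size * 10 * 10
print(local_weights)          # 10**8
```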
So what exactly is weight sharing?
This is weight sharing! With this setting, only 100 parameters are left in this layer! Impressive! Designers later found that a single set of weights was not enough, so a theory grew up around the idea: the weight matrix shared within a layer is called a feature extractor, and we consider that each such extractor picks out one feature. That works fine, so the feature extractor of a layer is shared by all neurons in that layer. To extract different features, you use several extractors across several layers, and with more layers, more features emerge. The above is the plain-language understanding.
Local connections and pooling were inspired by modern research on biological neural networks. Weight sharing, by contrast, was defined at design time: the weights within a single layer are simply declared to be shared (don't ask why; this method greatly reduces the number of parameters in each layer). The starting point of the design is that if the weights of one layer are shared, then those weights extract a single feature; to extract different features, you set up a multi-layer network. This is the core idea of CNN.
Overview: the core idea of CNN is to reduce parameters, mainly through local connections, pooling, and weight sharing. Taking weight sharing as a given rule, we expand on the other ideas below.
A local connection means that a neuron in one layer is related only to some of the neurons in the previous layer. Which operation that we are familiar with behaves this way, with each output depending on only part of the input? Obviously, convolution is the typical representative.
So the idea in CNN is to use convolution operations to implement local connections. The following animation explains the locality of convolution very well:
At the same time, we also know that convolution can perform feature extraction on images, for example edge extraction with the Laplace operator. A convolution implemented as a local connection can likewise be viewed as a convolution kernel extracting a certain feature from the previous layer's output.
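A minimal sketch of this idea in NumPy: slide a 3x3 Laplace kernel over a toy 5x5 image containing a vertical edge. Each output value depends only on a 3x3 neighborhood of the input, which is exactly the local connection described above (the image and loop-based convolution here are illustrative, not an efficient implementation):

```python
import numpy as np

def conv2d_valid(img, kernel):
    """'Valid' 2D convolution: each output depends on one local patch."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

# Laplace operator, a classic edge-extraction kernel.
laplace = np.array([[0.,  1., 0.],
                    [1., -4., 1.],
                    [0.,  1., 0.]])

img = np.zeros((5, 5))
img[:, 2:] = 1.0                 # a vertical edge down the middle
edges = conv2d_valid(img, laplace)
print(edges)                     # nonzero only near the edge
```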
Pooling, also known as subsampling, aggregates the results of the local connections. Its main purposes are to reduce parameters and the number of output nodes, and in doing so to reduce overfitting. There are usually two ways to pool/sample:
- Max-Pooling: select the maximum value in the pooling window as the sampled value;
- Mean-Pooling: average all the values in the pooling window and use the average as the sampled value.
The schematic is as follows:
It aggregates the convolution result into a 2*2 output matrix; this aggregation ability is quite powerful. However! Precisely because pooling is so powerful, it shrinks the data too quickly. Therefore, many networks prefer pooling layers with weaker aggregation, or even skip the pooling layer entirely.
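The two pooling modes can be sketched in a few lines of NumPy. This example applies a 2x2 window with stride 2 to a made-up 4x4 feature map (the values are illustrative only):

```python
import numpy as np

def pool2x2(x, mode="max"):
    """2x2 pooling with stride 2: max or mean of each window."""
    h, w = x.shape
    out = np.zeros((h // 2, w // 2))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            window = x[i:i+2, j:j+2]
            out[i // 2, j // 2] = window.max() if mode == "max" else window.mean()
    return out

fmap = np.array([[1., 3., 2., 0.],
                 [4., 2., 1., 1.],
                 [0., 1., 5., 6.],
                 [2., 2., 7., 8.]])

max_out = pool2x2(fmap, "max")
mean_out = pool2x2(fmap, "mean")
print(max_out)    # [[4. 2.] [2. 8.]]
print(mean_out)   # [[2.5 1.] [1.25 6.5]]
```

Note how a 4x4 map collapses to 2x2 in one step, which is why the text warns that pooling reduces the data size very quickly.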
Loss Function Layer
The loss function layer (loss layer) determines how training "punishes" the difference between the network's predictions and the ground truth; it is usually the last layer of the network. Different loss functions suit different types of tasks. For example, the softmax cross-entropy loss is commonly used to choose one of K categories, the sigmoid cross-entropy loss is used for multiple independent binary classification problems, and the Euclidean loss is used for problems whose output can be any real number.
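As a small illustration of the first case, here is a sketch of the softmax cross-entropy loss for choosing one of K classes, written directly in NumPy (the logit values are made up; real frameworks provide this as a built-in op):

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Cross-entropy of the softmax distribution at the true label."""
    z = logits - logits.max()              # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()    # softmax probabilities
    return -np.log(probs[label])           # penalty for the true class

logits = np.array([2.0, 1.0, 0.1])         # raw network outputs, K = 3
loss = softmax_cross_entropy(logits, label=0)
print(loss)
```

The loss is small when the network assigns high probability to the correct class and grows without bound as that probability goes to zero, which is exactly the "punishment" described above.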
CNN's network structure: