DeepLearning | Classic Convolutional Neural Network VGG_Net

VGGNet is a deep convolutional neural network developed by researchers at the Visual Geometry Group of the University of Oxford together with Google DeepMind. VGGNet explored the relationship between the depth of a convolutional neural network and its performance. By repeatedly stacking small 3x3 convolution kernels and 2x2 max pooling layers, VGGNet successfully built convolutional neural networks 16 to 19 layers deep, and it won second place in the classification task and first place in the localization task of the ILSVRC 2014 competition. Its network structure and ideas are presented in the paper Very Deep Convolutional Networks for Large-Scale Image Recognition.

1. VGGNet Network Structure

All VGGNet configurations in the paper use 3x3 convolution kernels and 2x2 pooling kernels, and improve performance by steadily deepening the network. The following figure shows the network structure of each VGGNet configuration.


From A to E the network gets deeper, but the number of parameters does not grow much, because most of the parameters sit in the last three fully connected layers. Although the convolutional part is very deep, it holds relatively few parameters; it is still the most time-consuming part of training, because convolution is computationally intensive. Configurations D and E are what we usually call VGGNet-16 and VGGNet-19. C is interesting: compared with B, it adds several 1x1 convolutional layers. A 1x1 convolution is essentially a linear transformation of the channels at each position; here the number of input channels equals the number of output channels, so no dimensionality reduction takes place.
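The claim that the fully connected layers dominate the parameter count can be checked with a little arithmetic. The following is a minimal sketch, assuming the ImageNet setting of the paper (224x224 RGB input, 1000 classes, two 4096-unit hidden layers) and the five configurations A to E as I read them from the paper's configuration table; the names `block`, `CONFIGS`, `count_params` and `weight_layers` are made up for this illustration.

```python
def block(n_out, kernels):
    """One convolutional segment: a list of (kernel_size, out_channels)."""
    return [(k, n_out) for k in kernels]

# Each configuration is five segments; a 2x2 max pooling layer follows each one.
CONFIGS = {
    "A": [block(64, [3]), block(128, [3]), block(256, [3, 3]),
          block(512, [3, 3]), block(512, [3, 3])],
    "B": [block(64, [3, 3]), block(128, [3, 3]), block(256, [3, 3]),
          block(512, [3, 3]), block(512, [3, 3])],
    "C": [block(64, [3, 3]), block(128, [3, 3]), block(256, [3, 3, 1]),
          block(512, [3, 3, 1]), block(512, [3, 3, 1])],
    "D": [block(64, [3, 3]), block(128, [3, 3]), block(256, [3, 3, 3]),
          block(512, [3, 3, 3]), block(512, [3, 3, 3])],
    "E": [block(64, [3, 3]), block(128, [3, 3]), block(256, [3, 3, 3, 3]),
          block(512, [3, 3, 3, 3]), block(512, [3, 3, 3, 3])],
}

def count_params(blocks, in_channels=3, image_size=224,
                 fc_sizes=(4096, 4096, 1000)):
    """Return (conv_params, fc_params), counting weights and biases."""
    conv_params = 0
    n_in, size = in_channels, image_size
    for blk in blocks:
        for k, n_out in blk:
            conv_params += k * k * n_in * n_out + n_out
            n_in = n_out
        size //= 2  # each 2x2 pooling halves the spatial size
    fc_params = 0
    fan_in = size * size * n_in  # flattened output of the last pooling layer
    for n_out in fc_sizes:
        fc_params += fan_in * n_out + n_out
        fan_in = n_out
    return conv_params, fc_params

def weight_layers(blocks, num_fc=3):
    """Number of layers with weights: all conv layers plus the FC layers."""
    return sum(len(blk) for blk in blocks) + num_fc

for name, blocks in CONFIGS.items():
    conv, fc = count_params(blocks)
    print(name, weight_layers(blocks), "layers,",
          "conv: %d, fc: %d, total: %d" % (conv, fc, conv + fc))
```

Under these assumptions, configuration D comes to about 138 million parameters, of which roughly 124 million belong to the three fully connected layers, and going from A to E adds only about 11 million parameters. The implementation later in this post uses 190 output classes, so its totals differ slightly.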

VGGNet has 5 convolutional segments, each containing 2 to 3 convolutional layers in VGGNet-16, and each segment ends with a max pooling layer that reduces the size of the feature map. Within a segment every layer has the same number of convolution kernels, and later segments have more: 64-128-256-512-512. Several identical 3x3 convolutional layers are often stacked together, which is a very useful design. Two 3x3 convolutional layers in series have the same receptive field as one 5x5 convolutional layer, that is, each output pixel depends on the surrounding 5x5 input pixels. Three 3x3 convolutional layers in series have the same receptive field as one 7x7 convolutional layer. In addition, three stacked 3x3 convolutional layers apply more nonlinear transformations than a single 7x7 convolutional layer, which makes the CNN better at learning features.
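The receptive-field argument above can be verified with a few lines of arithmetic. This is a small sketch, not part of the original implementation; the function names are made up, and the weight count assumes that every layer has the same number C of input and output channels and ignores biases.

```python
def stacked_receptive_field(num_layers, kernel_size=3, stride=1):
    """Receptive field of num_layers identical conv layers applied in series."""
    rf = 1      # a single output pixel sees itself
    jump = 1    # distance in input pixels between neighbouring outputs
    for _ in range(num_layers):
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf

def conv_weights(kernel_size, channels, num_layers=1):
    """Weights of num_layers conv layers with `channels` inputs and outputs."""
    return num_layers * kernel_size * kernel_size * channels * channels

C = 512
print(stacked_receptive_field(2))   # two 3x3 layers
print(stacked_receptive_field(3))   # three 3x3 layers
print(conv_weights(3, C, num_layers=3), "vs", conv_weights(7, C))
```

Two stacked 3x3 layers give a receptive field of 5 and three give 7, while the three 3x3 layers need 27C² weights compared with 49C² for a single 7x7 layer, so the stack is both cheaper and more nonlinear.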

2. Techniques Used in VGGNet

VGGNet uses a small trick in training: it first trains the simple level A network, then reuses the weights of A to initialize the more complex models, so training converges faster. At prediction time, VGG uses a multi-scale method: the image is rescaled to a size Q and fed into the convolutional network; a sliding window over the last convolutional layer produces the classification predictions, the results of the different windows are averaged, and the results for different values of Q are averaged again to obtain the final result. This makes better use of the image and improves prediction accuracy. In training, VGGNet also uses a multi-scale method for data augmentation: the original image is rescaled to a varying size S and a 224x224 patch is then cropped at random, which increases the amount of data and is very effective at preventing overfitting.

The authors draw the following conclusions from comparing the network configurations: (1) the LRN layer has little effect; (2) the deeper the network, the better the result; (3) 1x1 convolution is also effective, but not as effective as 3x3 convolution, since larger convolution kernels can learn larger spatial features.
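The multi-scale training augmentation (rescale to a random size S, then take a random 224x224 crop) might look roughly like the sketch below. It is an illustration only, not the code used later in this post: the helper names are invented, the resize is a plain nearest-neighbour lookup in NumPy to keep the example dependency-free, and the default range of S from 256 to 512 is the range I recall from the paper, so treat it as an assumption.

```python
import numpy as np

def resize_shorter_side(img, target):
    """Nearest-neighbour resize so that the shorter side equals `target`."""
    h, w = img.shape[:2]
    scale = target / min(h, w)
    new_h = max(target, int(round(h * scale)))
    new_w = max(target, int(round(w * scale)))
    # For every output row/column, look up the nearest source row/column.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return img[rows][:, cols]

def multi_scale_crop(img, rng, s_min=256, s_max=512, crop=224):
    """Rescale to a random size S in [s_min, s_max], then crop at random."""
    s = int(rng.integers(s_min, s_max + 1))
    scaled = resize_shorter_side(img, s)
    h, w = scaled.shape[:2]
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    return scaled[top:top + crop, left:left + crop]

rng = np.random.default_rng(2018)
image = rng.random((300, 400, 3)).astype("float32")  # stand-in for a real photo
patch = multi_scale_crop(image, rng)
print(patch.shape)
```

Every call returns a different 224x224 patch of the same image, which is what effectively enlarges the training set. A real pipeline would normally use a proper interpolating resize such as the one in OpenCV or PIL.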

3. VGGNet TensorFlow Implementation

The dataset used is relatively large; leave an email address in the comments and it will be sent to you.

import tensorflow as tf
import os
import numpy as np
from PIL import Image
import pandas as pd
from sklearn import preprocessing
import cv2
from sklearn.model_selection import train_test_split

def load_Img(imgDir):
    # Load every image in imgDir (sorted by file name) into one float32 array
    imgs = os.listdir(imgDir)
    imgs = np.ravel(pd.DataFrame(imgs).sort_values(by=0).values)
    imgNum = len(imgs)
    data = np.empty((imgNum,image_size,image_size,3),dtype="float32")
    for i in range(imgNum):
        img = Image.open(imgDir+"/"+imgs[i])
        arr = np.asarray(img,dtype="float32")
        arr = cv2.resize(arr,(image_size,image_size))
        if len(arr.shape) == 2:
            # Grayscale image: copy the single channel into all three channels
            arr = np.stack([arr,arr,arr],axis=-1)
        data[i,:,:,:] = arr
    return data

def make_label(labelFile):
    # Read tab-separated (file name, class name) rows, sorted like the images
    label_list = pd.read_csv(labelFile,sep = '\t',header = None)
    label_list = label_list.sort_values(by=0)
    # Map class names to integer ids, then to one-hot vectors
    le = preprocessing.LabelEncoder()
    label = le.fit_transform(label_list[1])
    # Note: scikit-learn 1.2 and later call this argument sparse_output
    onehot = preprocessing.OneHotEncoder(sparse = False)
    label_onehot = onehot.fit_transform(label.reshape(-1,1))
    return label_onehot

def conv_op(input_op,name,kh,kw,n_out,dh,dw,p):
    # Convolution + bias + ReLU; the created variables are appended to p
    n_in = input_op.get_shape()[-1].value

    with tf.name_scope(name) as scope:
        # Arguments reconstructed to mirror the get_variable call sketched in fc_op
        kernel = tf.get_variable(scope+'W',
                                 shape = [kh,kw,n_in,n_out],
                                 dtype = tf.float32,
                                 initializer = tf.contrib.layers.xavier_initializer_conv2d())
#        kernel = tf.Variable(tf.truncated_normal([kh,kw,n_in,n_out],
#                                                  dtype=tf.float32,
#                                                  stddev=0.01), 
#                                                  name=scope+'W')
        conv = tf.nn.conv2d(input_op,kernel,(1,dh,dw,1),padding='SAME')
        bias_init_val = tf.constant(0.0,shape=[n_out],dtype=tf.float32)
        biases = tf.Variable(bias_init_val,trainable=True,name='b')
        z = tf.nn.bias_add(conv,biases)
        activation = tf.nn.relu(z,name=scope)
        p += [kernel,biases]
        return activation

def fc_op(input_op,name,n_out,p):
    # Fully connected layer with sigmoid activation; variables are appended to p
    n_in = input_op.get_shape()[-1].value 

    with tf.name_scope(name) as scope:
#        kernel = tf.get_variable(scope+'w',
#                                 shape = [n_in,n_out],
#                                 dtype = tf.float32,
#                                 initializer = tf.contrib.layers.xavier_initializer())
        # dtype, stddev and name mirror the commented-out initializer in conv_op
        kernel = tf.Variable(tf.truncated_normal([n_in,n_out],
                                                 dtype=tf.float32,
                                                 stddev=0.01),
                                                 name=scope+'w')
        biases = tf.Variable(tf.constant(0.1,shape=[n_out],dtype=tf.float32),name='b')

        activation = tf.nn.sigmoid(tf.matmul(input_op,kernel)+biases,name = 'ac')
#        activation = tf.nn.relu_layer(input_op,kernel,biases,name=scope)
        p += [kernel,biases]
        return activation

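def mpool_op(input_op,name,kh,kw,dh,dw):
    # Max pooling with a kh x kw window and stride (dh,dw); SAME padding is assumed
    return tf.nn.max_pool(input_op,
                          ksize=[1,kh,kw,1],
                          strides=[1,dh,dw,1],
                          padding='SAME',
                          name=name)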

def inference_op(input_op,y,keep_prob):

    p = []

    conv1_1 = conv_op(input_op,name='conv1_1',kh=3,kw=3,n_out=64,dh=1,dw=1,p=p)
    conv1_2 = conv_op(conv1_1,name='conv1_2',kh=3,kw=3,n_out=64,dh=1,dw=1,p=p)
    pool1 = mpool_op(conv1_2,name='pool1',kh=2,kw=2,dw=2,dh=2)

    conv2_1 = conv_op(pool1,name='conv2_1',kh=3,kw=3,n_out=128,dh=1,dw=1,p=p)
    conv2_2 = conv_op(conv2_1,name='conv2_2',kh=3,kw=3,n_out=128,dh=1,dw=1,p=p)
    pool2 = mpool_op(conv2_2,name='pool2',kh=2,kw=2,dw=2,dh=2)

    conv3_1 = conv_op(pool2,name='conv3_1',kh=3,kw=3,n_out=256,dh=1,dw=1,p=p)
    conv3_2 = conv_op(conv3_1,name='conv3_2',kh=3,kw=3,n_out=256,dh=1,dw=1,p=p)    
    conv3_3 = conv_op(conv3_2,name='conv3_3',kh=3,kw=3,n_out=256,dh=1,dw=1,p=p) 
    pool3 = mpool_op(conv3_3,name='pool3',kh=2,kw=2,dh=2,dw=2)

    conv4_1 = conv_op(pool3,name='conv4_1',kh=3,kw=3,n_out=512,dh=1,dw=1,p=p)
    conv4_2 = conv_op(conv4_1,name='conv4_2',kh=3,kw=3,n_out=512,dh=1,dw=1,p=p)    
    conv4_3 = conv_op(conv4_2,name='conv4_3',kh=3,kw=3,n_out=512,dh=1,dw=1,p=p) 
    pool4 = mpool_op(conv4_3,name='pool4',kh=2,kw=2,dh=2,dw=2)           

    conv5_1 = conv_op(pool4,name='conv5_1',kh=3,kw=3,n_out=512,dh=1,dw=1,p=p)
    conv5_2 = conv_op(conv5_1,name='conv5_2',kh=3,kw=3,n_out=512,dh=1,dw=1,p=p)    
    conv5_3 = conv_op(conv5_2,name='conv5_3',kh=3,kw=3,n_out=512,dh=1,dw=1,p=p) 
    pool5 = mpool_op(conv5_3,name='pool5',kh=2,kw=2,dh=2,dw=2)   

    shp = pool5.get_shape()
    flattened_shape = shp[1].value*shp[2].value*shp[3].value
    resh1 = tf.reshape(pool5,[-1,flattened_shape],name='resh1')

    fc6 = fc_op(resh1,name='fc6',n_out=4096,p=p)
    fc6_drop = tf.nn.dropout(fc6,keep_prob,name='fc6_drop')

    fc7 = fc_op(fc6_drop,name='fc7',n_out=4096,p=p)
    fc7_drop = tf.nn.dropout(fc7,keep_prob,name='fc7_drop')

    fc8 = fc_op(fc7_drop,name='fc8',n_out=190,p=p)  

    y_conv = tf.nn.softmax(fc8) 

    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(tf.clip_by_value(y_conv, 1e-10, 1.0)),reduction_indices=[1]))
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

    correct_prediction = tf.equal(tf.argmax(y_conv,1),tf.argmax(y,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))  

    return accuracy,train_step,cross_entropy,fc8,p

def run_benchmark():

    imgDir = '/Users/zhuxiaoxiansheng/Desktop/DatasetA_train_20180813/train'
    labelFile = '/Users/zhuxiaoxiansheng/Desktop/DatasetA_train_20180813/train.txt'  

    data = load_Img(imgDir)
    data = data/255.0
    label = make_label(labelFile)

    traindata,testdata,trainlabel,testlabel = train_test_split(data,label,test_size=100,random_state = 2018)   

    with tf.Graph().as_default():

        os.environ["CUDA_VISIBLE_DEVICES"] = '0'   # Specify the first available GPU
        config = tf.ConfigProto()
        config.gpu_options.per_process_gpu_memory_fraction = 1.0
        # The session runs on the default graph opened above, where the model is built
        sess = tf.InteractiveSession(config=config)

        keep_prob = tf.placeholder(tf.float32)
        input_op = tf.placeholder(tf.float32,[None,image_size,image_size,3])
        y = tf.placeholder(tf.float32,[None,190]) 

        accuracy,train_step,cross_entropy,fc8,p = inference_op(input_op,y,keep_prob)

        init = tf.global_variables_initializer()
        sess.run(init)

        num_train = traindata.shape[0]
        for i in range(num_batches):
            rand_index = np.random.choice(num_train,size=(batch_size))
            # One optimization step on a random batch; keep_prob 0.5 is an assumed dropout setting
            train_step.run(feed_dict={input_op:traindata[rand_index],y:trainlabel[rand_index],keep_prob:0.5})

            if i%100 == 0:
                rand_index = np.random.choice(num_train,size=(100))
                train_accuracy = accuracy.eval(feed_dict={input_op:traindata[rand_index],y:trainlabel[rand_index],keep_prob:1.0})
                print('step %d, training accuracy %g'%(i,train_accuracy))
        print("test accuracy %g"%accuracy.eval(feed_dict={input_op:testdata,y:testlabel,keep_prob:1.0}))

image_size = 224
batch_size = 64
num_batches = 10000

run_benchmark()
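To see where the flattened size fed into fc6 comes from, the feature-map shape can be traced through the five pooling layers without building the graph. This is a rough sketch with a made-up helper name; it assumes stride-1 convolutions with SAME padding (so they keep the spatial size) and 2x2 pooling with stride 2 and SAME padding (so each pooling rounds the halved size up).

```python
def trace_shapes(image_size=224, segment_channels=(64, 128, 256, 512, 512)):
    """Return the (height, width, channels) shape after each pooling layer."""
    shapes = []
    size = image_size
    for channels in segment_channels:
        size = -(-size // 2)  # ceiling division: stride-2 pooling, SAME padding
        shapes.append((size, size, channels))
    return shapes

shapes = trace_shapes()
for i, shape in enumerate(shapes, start=1):
    print("pool%d:" % i, shape)

h, w, c = shapes[-1]
print("flattened size fed to fc6:", h * w * c)
```

For a 224x224 input the spatial size goes 112, 56, 28, 14, 7, so pool5 has shape 7x7x512 and the flattened vector has 25088 elements, which is why the first fully connected layer holds most of the weights.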