A summary of commonly used Python libraries for data analysis

Python is exceptionally well suited to data analysis and data mining, largely because it has so many mature, effective libraries in this field, such as NumPy, SciPy, Matplotlib, Pandas, StatsModels, Scikit-Learn, Keras, Theano, and Gensim.
1) NumPy gives Python a true array type. It provides multi-dimensional arrays along with functions for processing data quickly. NumPy is also a dependency of many higher-level libraries, including SciPy, Matplotlib, and Pandas, which are described below.
2) SciPy makes Python "half a MATLAB". It provides a large number of objects and functions built on matrix operations, covering optimization, linear algebra, integration, interpolation, fitting, special functions, the fast Fourier transform, signal and image processing, the solution of ordinary differential equations, and other calculations common in science and engineering. SciPy depends on NumPy.
3) Matplotlib is the best-known plotting library for Python. It is mainly used for two-dimensional plotting, but it also supports simple three-dimensional plots.
4) Pandas is one of the most powerful tools for data analysis and exploration in Python. Its high-level data structures and well-designed tools make working with data in Python fast and simple. Pandas is built on top of NumPy, which makes it easy to use in NumPy-centric applications. The name comes from "panel data" and "Python data analysis". It was originally developed as a financial data analysis tool by AQR Capital Management in April 2008 and was open sourced at the end of 2009.
Pandas is very capable: it supports SQL-like operations for adding, deleting, querying, and modifying data, comes with a rich set of data processing functions, supports time series analysis, and handles messy real-world data flexibly. Pandas is complex enough to fill a book on its own. If you are interested, see "Python for Data Analysis" by Wes McKinney, one of the main authors of Pandas.
5) StatsModels. Where Pandas focuses on reading, processing, and exploring data, StatsModels focuses on statistical modeling and analysis, which gives Python something of the flavor of the R language. StatsModels exchanges data with Pandas directly, so the two together make a powerful data mining combination in Python.
6) Scikit-Learn is a powerful machine learning library for Python. It provides a complete machine learning toolbox, including data preprocessing, classification, regression, clustering, prediction, and model analysis. It depends on NumPy, SciPy, Matplotlib, and others.
7) Keras is used to build neural networks. It is not a simple neural network library but a powerful deep learning library originally built on Theano. Besides ordinary neural networks, it can be used to build a variety of deep learning models, such as autoencoders, recurrent neural networks, recursive neural networks, and convolutional neural networks. Because it runs on top of Theano, it is also quite fast.
8) Theano is also a Python library, developed by the lab led by deep learning expert Yoshua Bengio. It is used to define, optimize, and efficiently evaluate mathematical expressions involving multidimensional arrays. Its strengths include efficient symbolic differentiation, highly optimized speed, and numerical stability. Most importantly, it supports GPU acceleration, which can make data-intensive computation about ten times faster than on a CPU.
9) Gensim, "topic modelling for humans", is mainly used for language tasks such as text similarity calculation, LDA, and Word2Vec. Tasks in this area usually require a fair amount of background knowledge. Readers who already work in this area need no further explanation, and for readers who do not, a proper introduction is beyond what can be covered here.
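
To make the relationship between the first few libraries more concrete, here is a minimal sketch that uses NumPy, SciPy, and Pandas together. It is only an illustration, not a recommended workflow: the sample values, the column names "group" and "value", and all variable names are invented for this example, and it assumes the three libraries are installed.

```python
import numpy as np
import pandas as pd
from scipy import integrate

# NumPy: a true multi-dimensional array with fast, vectorized operations.
a = np.arange(6).reshape(2, 3)      # 2 rows x 3 columns holding 0..5
column_sums = a.sum(axis=0)         # sum down each column, no Python loop

# SciPy: scientific routines that operate on NumPy functions and arrays.
# Here we numerically integrate sin(x) from 0 to pi.
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# Pandas: labeled tables with SQL-like operations, built on NumPy.
# The data below is made up purely for illustration.
df = pd.DataFrame({
    "group": ["a", "b", "a", "b"],
    "value": [1.0, 2.0, 3.0, 4.0],
})
totals = df.groupby("group")["value"].sum()   # like SELECT SUM(value) ... GROUP BY group

# The values inside a Pandas column can be retrieved as a NumPy array.
raw_values = df["value"].to_numpy()

print(column_sums)
print(area)
print(totals)
print(type(raw_values))
```

The same pattern carries over to the other libraries in the list: StatsModels and Scikit-Learn, for example, accept NumPy arrays and Pandas objects as input, which is why NumPy sits at the bottom of the stack.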