1. The simplest way to draw a scatter plot
Drawing scatter plots is a common task in data analysis. The best-known plotting library in Python is matplotlib, and its scatter method makes scatter plots easy to draw. Let's start with a simple one.
The data format is as follows:
    0 746403
    1 1263043
    2 982360
    3 1202602
    ...
The first column is the X coordinate and the second column is the Y coordinate. Let's plot it.
    #!/usr/bin/env python
    #coding:utf-8
    import matplotlib.pyplot as plt

    def pltpicture():
        file = "xxx"
        xlist = []
        ylist = []
        with open(file, "r") as f:
            for line in f.readlines():
                lines = line.strip().split()
                # Skip malformed rows and rows whose Y value is below 100000.
                if len(lines) != 2 or int(lines[1]) < 100000:
                    continue
                x, y = int(lines[0]), int(lines[1])
                xlist.append(x)
                ylist.append(y)
        plt.xlabel('X')
        plt.ylabel('Y')
        plt.scatter(xlist, ylist)
        plt.show()

    if __name__ == "__main__":
        pltpicture()
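The parsing step in the script above can be separated from the plotting so it is easy to check on its own. The sketch below is only an illustration: `parse_xy` and `min_y` are hypothetical names, the last two sample rows are made up, and it assumes the 100000 threshold applies to the Y column.

```python
def parse_xy(lines, min_y=100000):
    """Return (xs, ys) parsed from whitespace-separated "x y" lines.

    Rows that do not have exactly two fields, or whose y value is below
    min_y (a hypothetical parameter name), are skipped.
    """
    xs, ys = [], []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue
        x, y = int(fields[0]), int(fields[1])
        if y < min_y:
            continue
        xs.append(x)
        ys.append(y)
    return xs, ys


# The first four rows come from the data format shown earlier;
# the last two are made-up rows that should be filtered out.
sample = ["0 746403", "1 1263043", "2 982360", "3 1202602",
          "bad row here", "4 99"]
xs, ys = parse_xy(sample)
print(xs, ys)
```

The two returned lists can be passed straight to `plt.scatter(xs, ys)`.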
2. A better-looking plot
The plot above is rough: it is the simplest approach, with no configuration at all. Below we use another dataset to draw a better-looking plot.
The dataset is a public dataset from the web, with the following format:
    40920 8.326976 0.953952 3
    14488 7.153469 1.673904 2
    26052 1.441871 0.805124 1
    75136 13.147394 0.428964 1
    ...
The first column is the number of frequent flyer miles earned each year;
The second column is the percentage of time spent playing video games;
The third column is the number of liters of ice cream consumed each week;
The fourth column is the label:
1 means a person who is not liked
2 means a moderately attractive person
3 means a very attractive person
Now take the number of frequent flyer miles earned each year as the X coordinate and the percentage of time spent playing video games as the Y coordinate, and draw the plot.
    from matplotlib import pyplot as plt

    file = "/home/mi/wanglei/data/datingTestSet2.txt"

    # One pair of X/Y lists per label.
    label1X, label1Y, label2X, label2Y, label3X, label3Y = [], [], [], [], [], []

    with open(file, "r") as f:
        for line in f:
            lines = line.strip().split()
            if len(lines) != 4:
                continue
            # Convert to float so the values are plotted as numbers, not as category strings.
            distance, rate, label = float(lines[0]), float(lines[1]), lines[3]
            if label == "1":
                label1X.append(distance)
                label1Y.append(rate)
            elif label == "2":
                label2X.append(distance)
                label2Y.append(rate)
            elif label == "3":
                label3X.append(distance)
                label3Y.append(rate)

    plt.figure(figsize=(8, 5), dpi=80)
    axes = plt.subplot(111)

    # One scatter call per label, each with its own marker size and color.
    label1 = axes.scatter(label1X, label1Y, s=20, c="red")
    label2 = axes.scatter(label2X, label2Y, s=40, c="green")
    label3 = axes.scatter(label3X, label3Y, s=50, c="blue")

    plt.xlabel("every year fly distance")
    plt.ylabel("play video game rate")
    axes.legend((label1, label2, label3), ("don't like", "attraction common", "attraction perfect"), loc=2)
    plt.show()
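The six separate lists in the script above can also be kept in one dictionary keyed by label. This is a minimal sketch of that alternative, not the original author's method: `group_by_label` is a hypothetical helper, and the sample rows are the four shown in the data format above.

```python
from collections import defaultdict


def group_by_label(lines):
    """Group (miles, game_rate) values by the label in the fourth column.

    Returns a dict mapping each label string to a pair of lists (xs, ys).
    Rows that do not have exactly four fields are skipped.
    """
    groups = defaultdict(lambda: ([], []))
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue
        miles, game_rate, label = float(fields[0]), float(fields[1]), fields[3]
        xs, ys = groups[label]
        xs.append(miles)
        ys.append(game_rate)
    return dict(groups)


sample = [
    "40920 8.326976 0.953952 3",
    "14488 7.153469 1.673904 2",
    "26052 1.441871 0.805124 1",
    "75136 13.147394 0.428964 1",
]
groups = group_by_label(sample)
for label in sorted(groups):
    print(label, groups[label])
```

Each entry can then be drawn in a loop with one `axes.scatter(xs, ys, ...)` call per label, which avoids the repeated if/elif branches when more labels are added.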
3. The scatter function in detail
Let's take a look at the signature of the scatter function:
    def scatter(self, x, y, s=None, c=None, marker=None, cmap=None, norm=None,
                vmin=None, vmax=None, alpha=None, linewidths=None,
                verts=None, edgecolors=None,
                **kwargs):
        """
        Make a scatter plot of `x` vs `y`

        Marker size is scaled by `s` and marker color is mapped to `c`

        Parameters
        ----------
        x, y : array_like, shape (n, )
            Input data

        s : scalar or array_like, shape (n, ), optional
            size in points^2. Default is `rcParams['lines.markersize'] ** 2`.

        c : color, sequence, or sequence of color, optional, default: 'b'
            `c` can be a single color format string, or a sequence of color
            specifications of length `N`, or a sequence of `N` numbers to be
            mapped to colors using the `cmap` and `norm` specified via kwargs
            (see below). Note that `c` should not be a single numeric RGB or
            RGBA sequence because that is indistinguishable from an array of
            values to be colormapped. `c` can be a 2-D array in which the
            rows are RGB or RGBA, however, including the case of a single
            row to specify the same color for all points.

        marker : `~matplotlib.markers.MarkerStyle`, optional, default: 'o'
            See `~matplotlib.markers` for more information on the different
            styles of markers scatter supports. `marker` can be either
            an instance of the class or the text shorthand for a particular
            marker.

        cmap : `~matplotlib.colors.Colormap`, optional, default: None
            A `~matplotlib.colors.Colormap` instance or registered name.
            `cmap` is only used if `c` is an array of floats. If None,
            defaults to rc `image.cmap`.

        norm : `~matplotlib.colors.Normalize`, optional, default: None
            A `~matplotlib.colors.Normalize` instance is used to scale
            luminance data to 0, 1. `norm` is only used if `c` is an array of
            floats. If `None`, use the default :func:`normalize`.

        vmin, vmax : scalar, optional, default: None
            `vmin` and `vmax` are used in conjunction with `norm` to normalize
            luminance data. If either are `None`, the min and max of the
            color array is used. Note if you pass a `norm` instance, your
            settings for `vmin` and `vmax` will be ignored.

        alpha : scalar, optional, default: None
            The alpha blending value, between 0 (transparent) and 1 (opaque)

        linewidths : scalar or array_like, optional, default: None
            If None, defaults to (lines.linewidth,).

        verts : sequence of (x, y), optional
            If `marker` is None, these vertices will be used to
            construct the marker. The center of the marker is located
            at (0,0) in normalized units. The overall marker is rescaled
            by ``s``.

        edgecolors : color or sequence of color, optional, default: None
            If None, defaults to 'face'

            If 'face', the edge color will always be the same as
            the face color.

            If it is 'none', the patch boundary will not
            be drawn.

            For non-filled markers, the `edgecolors` kwarg
            is ignored and forced to 'face' internally.

        Returns
        -------
        paths : `~matplotlib.collections.PathCollection`

        Other parameters
        ----------------
        kwargs : `~matplotlib.collections.Collection` properties

        See Also
        --------
        plot : to plot scatter plots when markers are identical in size and
            color

        Notes
        -----

        * The `plot` function will be faster for scatterplots where markers
          don't vary in size or color.

        * Any or all of `x`, `y`, `s`, and `c` may be masked arrays, in which
          case all masks will be combined and only unmasked points will be
          plotted.

          Fundamentally, scatter works with 1-D arrays; `x`, `y`, `s`, and `c`
          may be input as 2-D arrays, but within scatter they will be
          flattened. The exception is `c`, which will be flattened only if its
          size matches the size of `x` and `y`.

        Examples
        --------
        .. plot:: mpl_examples/shapes_and_collections/scatter_demo.py

        """
The specific parameters have the following meanings:
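x and y are arrays of the same length that give the coordinates of the points.
s can be a scalar or an array of the same length as x and y, and gives the marker size in points^2. According to the docstring above, the default is rcParams['lines.markersize'] ** 2, which is 36 with the stock settings; releases before matplotlib 2.0 used a fixed default of 20.
c is the color of the points: a single color string, a sequence of colors, or a sequence of numbers that are mapped to colors through cmap and norm.
marker is the shape of each point; the default is 'o' (a circle).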
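The parameters described above can be tried out on a handful of made-up points. The following is a rough sketch rather than a reference example: the data values and the function name `demo_scatter_params` are invented for illustration, and the Agg backend is selected only so the code runs without a display (remove that line to use an interactive window).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
from matplotlib.collections import PathCollection


def demo_scatter_params():
    """Draw two groups of made-up points to show s, c, marker, cmap and alpha."""
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 1, 5, 3]

    fig, ax = plt.subplots(figsize=(8, 5))

    # Scalar s and a single color string: every point looks the same.
    uniform = ax.scatter(x, y, s=36, c="red", marker="^", label="uniform")

    # Array s and numeric c: each point gets its own size, and the numbers
    # in c are mapped to colors through cmap, scaled between vmin and vmax.
    sizes = [20, 40, 60, 80, 100]
    values = [0.0, 0.25, 0.5, 0.75, 1.0]
    varied = ax.scatter(
        [v + 0.5 for v in x], y,
        s=sizes, c=values, cmap="viridis", vmin=0.0, vmax=1.0,
        marker="o", alpha=0.6, label="varied",
    )

    fig.colorbar(varied, ax=ax, label="value")
    ax.legend(loc="upper left")
    return fig, uniform, varied


fig, uniform, varied = demo_scatter_params()
print(type(varied).__name__)
```

Both calls return a PathCollection, as the docstring states. Call `fig.savefig(...)` with a path of your choice, or `plt.show()` with an interactive backend, to view the result.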