Draw a scatter plot using the scatter method in matplotlib

1. The easiest way to draw

Drawing scatter plots is a common task in data analysis. matplotlib is the best-known plotting library in Python, and its scatter method makes scatter plots easy to draw. Let's start with a simple one.
The data format is as follows:

0   746403
1   1263043
2   982360
3   1202602
...

The first column is the X coordinate and the second column is the Y coordinate. The following script plots them.

#!/usr/bin/env python
#coding:utf-8

import matplotlib.pyplot as plt

def pltpicture():
    file = "xxx"  # path to the data file
    xlist = []
    ylist = []
    with open(file, "r") as f:
        for line in f:
            lines = line.strip().split()
            # skip malformed lines and points whose Y value is below 100000
            if len(lines) != 2 or int(lines[1]) < 100000:
                continue
            x, y = int(lines[0]), int(lines[1])
            xlist.append(x)
            ylist.append(y)

    plt.xlabel('X')
    plt.ylabel('Y')
    plt.scatter(xlist, ylist)
    plt.show()

# the function must be called, otherwise nothing is drawn
if __name__ == "__main__":
    pltpicture()

[Figure: the resulting scatter plot]
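
The filtering rule in the loop above (skip lines that do not have exactly two fields, and skip points whose Y value is below 100000) can be pulled out into a small function so it can be checked without a data file. This is only a sketch: the name parse_points and the min_y parameter are hypothetical, and the sample lines are the ones shown in the data format above.

```python
def parse_points(lines, min_y=100000):
    """Return (xlist, ylist) parsed from an iterable of 'x y' text lines.

    parse_points and min_y are hypothetical names; min_y mirrors the
    100000 threshold used in the script above.
    """
    xlist, ylist = [], []
    for line in lines:
        fields = line.strip().split()
        if len(fields) != 2:
            continue  # skip malformed lines
        x, y = int(fields[0]), int(fields[1])
        if y < min_y:
            continue  # skip points below the threshold
        xlist.append(x)
        ylist.append(y)
    return xlist, ylist


# The sample rows from the data format shown above.
sample = ["0   746403", "1   1263043", "2   982360", "3   1202602"]
xs, ys = parse_points(sample)
print(xs, ys)  # [0, 1, 2, 3] [746403, 1263043, 982360, 1202602]
```

The two returned lists can be passed directly to plt.scatter(xs, ys), and an open file object can be passed in place of the sample list.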

2. A more polished way to draw

The picture above is rough: it is the simplest approach, with no configuration at all. Below we use another dataset to draw a better-looking picture.
The dataset is a public one from the web, with the following data format:

40920   8.326976    0.953952    3
14488   7.153469    1.673904    2
26052   1.441871    0.805124    1
75136   13.147394   0.428964    1
...

The first column is the number of frequent flyer miles earned each year;
The second column is the percentage of time spent playing video games;
The third column is the liters of ice cream consumed each week;
The fourth column is the label:
1 means a person who is not liked
2 means a person of average charm
3 means a very charming person

Now use the frequent flyer miles earned each year as the X coordinate and the percentage of time spent playing video games as the Y coordinate, and draw the plot.

from matplotlib import pyplot as plt

file = "/home/mi/wanglei/data/datingTestSet2.txt"
label1X, label1Y, label2X, label2Y, label3X, label3Y = [], [], [], [], [], []

with open(file, "r") as f:
    for line in f:
        lines = line.strip().split()
        if len(lines) != 4:
            continue
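        # convert to float; plotting the raw strings would make matplotlib treat them as categories
        distance, rate, label = float(lines[0]), float(lines[1]), lines[3]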
        if label == "1":
            label1X.append(distance)
            label1Y.append(rate)

        elif label == "2":
            label2X.append(distance)
            label2Y.append(rate)

        elif label == "3":
            label3X.append(distance)
            label3Y.append(rate)

plt.figure(figsize=(8, 5), dpi=80)
axes = plt.subplot(111)

label1 = axes.scatter(label1X, label1Y, s=20, c="red")
label2 = axes.scatter(label2X, label2Y, s=40, c="green")
label3 = axes.scatter(label3X, label3Y, s=50, c="blue")

plt.xlabel("every year fly distance")
plt.ylabel("play video game rate")
axes.legend((label1, label2, label3), ("don't like", "attraction common", "attraction perfect"), loc=2)

plt.show()

[Figure: final rendering of the labeled scatter plot]
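
The three if/elif branches and the six lists can also be written as one loop over a table of styles. The following is a sketch, not a replacement for the script above. It assumes the Agg backend so it runs without a display, uses the four sample rows shown earlier instead of the full file, and the names styles and points are made up for illustration. The sizes, colors, and legend texts are the ones from the script.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt

# The four sample rows from the data format shown above.
rows = [
    "40920   8.326976    0.953952    3",
    "14488   7.153469    1.673904    2",
    "26052   1.441871    0.805124    1",
    "75136   13.147394   0.428964    1",
]

# One style per label: marker size, color, and legend text.
styles = {
    "1": {"s": 20, "c": "red", "label": "don't like"},
    "2": {"s": 40, "c": "green", "label": "attraction common"},
    "3": {"s": 50, "c": "blue", "label": "attraction perfect"},
}

# points[label] holds a pair (x values, y values) for that label.
points = {key: ([], []) for key in styles}
for row in rows:
    fields = row.split()
    if len(fields) != 4 or fields[3] not in points:
        continue  # skip malformed rows and unknown labels
    points[fields[3]][0].append(float(fields[0]))
    points[fields[3]][1].append(float(fields[1]))

fig, axes = plt.subplots(figsize=(8, 5), dpi=80)
for key, style in styles.items():
    xs, ys = points[key]
    axes.scatter(xs, ys, **style)  # the label keyword feeds the legend

axes.set_xlabel("every year fly distance")
axes.set_ylabel("play video game rate")
legend = axes.legend(loc="upper left")  # same position as loc=2
```

In an interactive session, calling plt.show() without the Agg line displays the figure; fig.savefig with a file name of your choice writes it to disk.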

3. The scatter function in detail

Let's take a look at the signature of the scatter function:

    def scatter(self, x, y, s=None, c=None, marker=None, cmap=None, norm=None,
                vmin=None, vmax=None, alpha=None, linewidths=None,
                verts=None, edgecolors=None,
                **kwargs):
        """
        Make a scatter plot of `x` vs `y`

        Marker size is scaled by `s` and marker color is mapped to `c`

        Parameters
        ----------
        x, y : array_like, shape (n, )
            Input data

        s : scalar or array_like, shape (n, ), optional
            size in points^2.  Default is `rcParams['lines.markersize'] ** 2`.

        c : color, sequence, or sequence of color, optional, default: 'b'
            `c` can be a single color format string, or a sequence of color
            specifications of length `N`, or a sequence of `N` numbers to be
            mapped to colors using the `cmap` and `norm` specified via kwargs
            (see below). Note that `c` should not be a single numeric RGB or
            RGBA sequence because that is indistinguishable from an array of
            values to be colormapped.  `c` can be a 2-D array in which the
            rows are RGB or RGBA, however, including the case of a single
            row to specify the same color for all points.

        marker : `~matplotlib.markers.MarkerStyle`, optional, default: 'o'
            See `~matplotlib.markers` for more information on the different
            styles of markers scatter supports. `marker` can be either
            an instance of the class or the text shorthand for a particular
            marker.

        cmap : `~matplotlib.colors.Colormap`, optional, default: None
            A `~matplotlib.colors.Colormap` instance or registered name.
            `cmap` is only used if `c` is an array of floats. If None,
            defaults to rc `image.cmap`.

        norm : `~matplotlib.colors.Normalize`, optional, default: None
            A `~matplotlib.colors.Normalize` instance is used to scale
            luminance data to 0, 1. `norm` is only used if `c` is an array of
            floats. If `None`, use the default :func:`normalize`.

        vmin, vmax : scalar, optional, default: None
            `vmin` and `vmax` are used in conjunction with `norm` to normalize
            luminance data.  If either are `None`, the min and max of the
            color array is used.  Note if you pass a `norm` instance, your
            settings for `vmin` and `vmax` will be ignored.

        alpha : scalar, optional, default: None
            The alpha blending value, between 0 (transparent) and 1 (opaque)

        linewidths : scalar or array_like, optional, default: None
            If None, defaults to (lines.linewidth,).

        verts : sequence of (x, y), optional
            If `marker` is None, these vertices will be used to
            construct the marker.  The center of the marker is located
            at (0,0) in normalized units.  The overall marker is rescaled
            by ``s``.

        edgecolors : color or sequence of color, optional, default: None
            If None, defaults to 'face'

            If 'face', the edge color will always be the same as
            the face color.

            If it is 'none', the patch boundary will not
            be drawn.

            For non-filled markers, the `edgecolors` kwarg
            is ignored and forced to 'face' internally.

        Returns
        -------
        paths : `~matplotlib.collections.PathCollection`

        Other parameters
        ----------------
        kwargs : `~matplotlib.collections.Collection` properties

        See Also
        --------
        plot : to plot scatter plots when markers are identical in size and
            color

        Notes
        -----

        * The `plot` function will be faster for scatterplots where markers
          don't vary in size or color.

        * Any or all of `x`, `y`, `s`, and `c` may be masked arrays, in which
          case all masks will be combined and only unmasked points will be
          plotted.

          Fundamentally, scatter works with 1-D arrays; `x`, `y`, `s`, and `c`
          may be input as 2-D arrays, but within scatter they will be
          flattened. The exception is `c`, which will be flattened only if its
          size matches the size of `x` and `y`.

        Examples
        --------
        .. plot:: mpl_examples/shapes_and_collections/scatter_demo.py

        """
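
The Notes section of the docstring says that plot is faster when all markers share one size and one color. A minimal sketch of that alternative follows. The data values are made up for illustration, and the Agg backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt

# Made-up sample data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]

fig, ax = plt.subplots()
# The "o" format string draws circle markers only, with no connecting line,
# so the result looks like a scatter plot with identical markers.
(line,) = ax.plot(x, y, "o", color="red", markersize=5)
ax.set_xlabel("X")
ax.set_ylabel("Y")
```

Note that markersize in plot is given in points, while s in scatter is given in points^2.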

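The main parameters have the following meanings:

x, y are arrays of the same length, holding the coordinates of the points.
s is the marker size in points^2; it can be a scalar or an array of the same length as x and y. According to the docstring above, the default is rcParams['lines.markersize'] ** 2.
c is the color of the points; it can be a single color or a sequence with one entry per point.
marker is the shape of the points; the default is 'o'.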
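
To see these parameters together, here is a small sketch that passes a per-point size list as s, a list of numbers as c (mapped to colors through cmap), and a non-default marker. The data values are made up for illustration, and the Agg backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt

# Made-up sample data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]
sizes = [20, 40, 60, 80, 100]        # one size per point, in points^2
values = [0.1, 0.3, 0.5, 0.7, 0.9]   # numbers mapped to colors via cmap

fig, ax = plt.subplots()
paths = ax.scatter(
    x, y,
    s=sizes,            # array form of s
    c=values,           # array of floats, so cmap is used
    cmap="viridis",
    vmin=0.0, vmax=1.0, # range of values covered by the colormap
    marker="^",         # triangles instead of the default circles
    alpha=0.6,          # partly transparent
    edgecolors="none",  # do not draw marker outlines
)
fig.colorbar(paths, ax=ax)  # legend for the value-to-color mapping
```

scatter returns a PathCollection, which is why it can be handed to fig.colorbar here and to axes.legend in the earlier script.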