
Pandas is an open-source [[Python]] library for data analysis. It makes it easy to process, compute, analyze, store, and visualize data.

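==Introduction==
===Timeline===
*2008: Developer Wes McKinney began building pandas at AQR Capital Management. He needed a high-performance, flexible tool for quantitative analysis of financial data. Before leaving AQR, he convinced management to let him open-source the library.
*2012: Chang She, another AQR employee, joined the effort and became the library's second major contributor.
*2015: pandas became a fiscally sponsored project of NumFOCUS, a 501(c)(3) nonprofit charity in the United States.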

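===Installation and import===
Install pandas with pip:
 pip install pandas
If you use a scientific Python distribution such as Anaconda, pandas is already installed.

Import pandas at the top of a script. The conventional form is:
 import pandas as pd

Check the installed pandas version: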
 pd.__version__

==Data structures==
pandas defines two data types, Series and DataFrame. Most operations are performed on these two types.

{{了解更多
|[https://pandas.pydata.org/docs/user_guide/dsintro.html Pandas User Guide: Intro to data structures]
}}
===Series===
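A Series is a one-dimensional array with axis labels. It can hold any data type, such as integers, strings, floating-point numbers, or Python objects. The axis labels are collectively called the <code>index</code>. Because it maps labels to values, a Series is similar to a Python dictionary.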
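The following minimal sketch makes the dictionary analogy concrete; the labels and values are arbitrary. A Series built from a dict uses the dict keys as its index, and square-bracket lookup works by label. The <code>in</code> operator tests the index labels rather than the values, just as it tests the keys of a dict.

<syntaxhighlight lang="python" >
import pandas as pd

# Built from a dict, the keys become the index labels
s = pd.Series({"a": 1, "b": 2, "c": 3})

print(s["b"])          # 2: look up a value by label, like d["b"]
print("a" in s)        # True: `in` tests the index labels, like dict keys
print(1 in s)          # False: the values are not searched
print(list(s.keys()))  # ['a', 'b', 'c']: keys() is an alias for the index
</syntaxhighlight>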

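====Creating a Series====
The basic way to create a Series is to construct an object of the [[Pandas/pandas.Series|pandas.Series]] class. Its signature is:
 pd.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)
The index is optional. If you omit it, the labels default to integers starting from 0. Some examples: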
<syntaxhighlight lang="python" >
import pandas as pd

s = pd.Series(["foo", "bar", "foba"])
print(type(s))   # <class 'pandas.core.series.Series'>

s2 = pd.Series(["foo", "bar", "foba"], index=['b', 'd', 'c'])

# Create a date index
date_index = pd.date_range("2020-01-01", periods=3, freq="D")
s3 = pd.Series(["foo", "bar", "foba"], index=date_index)
</syntaxhighlight>
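The following minimal sketch shows a few other common construction patterns, using arbitrary example values. A scalar <code>data</code> value is repeated for every label in <code>index</code>. You can also fix <code>dtype</code> and <code>name</code> when you create the Series.

<syntaxhighlight lang="python" >
import pandas as pd

# A scalar is repeated for every label in the index
s4 = pd.Series(5, index=["a", "b", "c"])

# dtype and name can be fixed at construction time
s5 = pd.Series([1, 2, 3], dtype="float64", name="values")

print(s4.tolist())        # [5, 5, 5]
print(s5.dtype, s5.name)  # float64 values
print(list(s5.index))     # [0, 1, 2]: default labels when no index is given
</syntaxhighlight>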

====Series data operations====
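The following sketch shows the basic operations on arbitrary example data. Arithmetic between two Series is element-wise and aligned on index labels rather than positions. A label present in only one operand therefore yields NaN. Comparisons return a boolean Series, which you can use as a mask to select elements.

<syntaxhighlight lang="python" >
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Element-wise and aligned on labels: "y" and "z" are added,
# "x" and "w" exist in only one operand and become NaN
total = a + b
print(total)

# A comparison gives a boolean Series, usable as a mask
big = a[a > 1]
print(big.tolist())      # [2, 3]

# Vectorized arithmetic with a scalar applies to every element
doubled = a * 2
print(doubled.tolist())  # [2, 4, 6]
</syntaxhighlight>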

====Series attributes====
In the examples in the table below, s is the following Series:
<syntaxhighlight lang="python" >
>>> s = pd.Series(['a', 'b', 'c'])
>>> s
0    a
1    b
2    c
dtype: object
</syntaxhighlight>
{| class="wikitable" 
|-
!Attribute
!Description
!Example
!Result
|-
| T
| Returns the transpose, which for a Series is by definition the Series itself.
| s.T
| s itself
|-
| array
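| Returns the ExtensionArray holding the data of this Series or Index. For NumPy-backed data this is a PandasArray, which pandas 2.1 renamed to NumpyExtensionArray.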
| s.array
| <PandasArray><br \>['a', 'b', 'c']<br \>Length: 3, dtype: object
|-
| at
| Gets or sets a single value by label: the index label for a Series, or a row/column label pair for a DataFrame.
| s.at[1]<br \>s.at[2]='d'
|'b'
|-
| attrs
| Dictionary of global attributes of this object.
| s.attrs
| {}
|-
| axes
| Returns a list of the row axis labels.
| s.axes
| [RangeIndex(start=0, stop=3, step=1)]
|-
| dtype
| Returns the dtype object of the underlying data.
| s.dtype
| dtype('O')  
|-
| dtypes
| Returns the dtype object of the underlying data. For a Series this is the same as dtype.
| s.dtypes
| dtype('O') 
|-
| hasnans
| Returns True if there are any missing values (such as Python None or np.nan), otherwise False.
| s2 = pd.Series(['a', None, 'c']) <br \>s2.hasnans
| True
|-
| iat
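| Gets or sets a single value by integer position: the row position for a Series, or row and column positions for a DataFrame.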
| s.iat[1]<br \>s.iat[2]='d'
|'b'
|-
| iloc
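|Gets or sets values by integer position along the index (row axis).
|1. <code>s.iloc[2]</code> <br \>2. <code>s.iloc[:2]</code> <br \>3. <code><nowiki>s.iloc[[True,False,True]]</nowiki></code> <br \>4. <code>s.iloc[lambda x: x.index % 2 == 0]</code>
|1. 'c'<br \>2. the values at positions 0 and 1 (position 2 is excluded)<br \>3. the values at the positions marked True <br \>4. the values whose index label is even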
|-
| index
| The index (axis labels) of the Series.
|-
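| is_monotonic
| Return boolean if values in the object are monotonically increasing. (Deprecated in pandas 1.5 and removed in 2.0; use is_monotonic_increasing.)
|-
| is_monotonic_decreasing
| Return boolean if values in the object are monotonically decreasing.
|-
| is_monotonic_increasing
| Return boolean if values in the object are monotonically increasing.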
|-
| is_unique
| Return boolean if values in the object are unique.
|-
| loc
| Access a group of rows and columns by label(s) or a boolean array.
|-
| name
| Return the name of the Series.
|-
| nbytes
| Return the number of bytes in the underlying data.
|-
| ndim
| Number of dimensions of the underlying data, by definition 1.
|-
| shape
| Return a tuple of the shape of the underlying data.
|-
| size
| Return the number of elements in the underlying data.
|-
| values
| Return Series as ndarray or ndarray-like depending on the dtype.
|}
{{了解更多|[https://pandas.pydata.org/docs/reference/api/pandas.Series.html#pandas.Series Pandas API: pandas.Series]}}
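You can try the attributes above on the same three-element Series used in the table. In this sketch, the second Series <code>t</code> is an arbitrary example whose integer labels run opposite to its positions. In that case <code>loc</code>/<code>at</code> (by label) and <code>iloc</code>/<code>iat</code> (by position) give different answers.

<syntaxhighlight lang="python" >
import pandas as pd

s = pd.Series(["a", "b", "c"])
print(s.shape, s.size, s.ndim)   # (3,) 3 1
print(s.is_unique)               # True

# at reads or writes one value by label, iat by position
print(s.at[1], s.iat[2])         # b c
s.at[2] = "d"
print(s.tolist())                # ['a', 'b', 'd']

# Labels and positions diverge here, so loc and iloc disagree
t = pd.Series([10, 20, 30], index=[2, 1, 0])
print(t.loc[0], t.iloc[0])       # 30 10
</syntaxhighlight>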
====Series methods====
{| class="wikitable"
|-
! Method
! Description
! Example
! Result
|-
| abs()
| Returns a Series/DataFrame with the absolute value of each element.
| s.abs()
| 
|-
| add(other[, level, fill_value, axis])
| Return Addition of series and other, element-wise (binary operator add).
|
|
|-
| add_prefix(prefix)
| Prefix labels with string prefix.
|-
| add_suffix(suffix)
| Suffix labels with string suffix.
|-
| agg([func, axis])
| Aggregate using one or more operations over the specified axis.
|-
| aggregate([func, axis])
| Aggregate using one or more operations over the specified axis.
|-
| align(other[, join, axis, level, copy, …])
| Align two objects on their axes with the specified join method.
|-
| all([axis, bool_only, skipna, level])
| Return whether all elements are True, potentially over an axis.
|-
| any([axis, bool_only, skipna, level])
| Return whether any element is True, potentially over an axis.
|-
| append(to_append[, ignore_index, …])
| Concatenate two or more Series. (Removed in pandas 2.0; use pd.concat instead.)
|-
| apply(func[, convert_dtype, args])
| Invoke function on values of Series.
|-
| argmax([axis, skipna])
| Return int position of the largest value in the Series.
|-
| argmin([axis, skipna])
| Return int position of the smallest value in the Series.
|-
| argsort([axis, kind, order])
| Return the integer indices that would sort the Series values.
|-
| asfreq(freq[, method, how, normalize, …])
| Convert TimeSeries to specified frequency.
|-
| asof(where[, subset])
| Return the last row(s) without any NaNs before where.
|-
| astype(dtype[, copy, errors])
| Cast a pandas object to a specified dtype.
|-
| at_time(time[, asof, axis])
| Select values at particular time of day (e.g., 9:30AM).
|-
| autocorr([lag])
| Compute the lag-N autocorrelation.
|-
| backfill([axis, inplace, limit, downcast])
| Synonym for DataFrame.fillna() with method='bfill'.
|-
| between(left, right[, inclusive])
| Return boolean Series equivalent to left <= series <= right.
|-
| between_time(start_time, end_time[, …])
| Select values between particular times of the day (e.g., 9:00-9:30 AM).
|-
| bfill([axis, inplace, limit, downcast])
| Synonym for DataFrame.fillna() with method='bfill'.
|-
| bool()
| Return the bool of a single element Series or DataFrame.
|-
| cat
| alias of pandas.core.arrays.categorical.CategoricalAccessor
|-
| clip([lower, upper, axis, inplace])
| Trim values at input threshold(s).
|-
| combine(other, func[, fill_value])
| Combine the Series with a Series or scalar according to func.
|-
| combine_first(other)
| Combine Series values, choosing the calling Series’s values first.
|-
| compare(other[, align_axis, keep_shape, …])
| Compare to another Series and show the differences.
|-
| convert_dtypes([infer_objects, …])
| Convert columns to best possible dtypes using dtypes supporting pd.NA.
|-
| copy([deep])
| Make a copy of this object’s indices and data.
|-
| corr(other[, method, min_periods])
| Compute correlation with other Series, excluding missing values.
|-
| count([level])
| Return number of non-NA/null observations in the Series.
|-
| cov(other[, min_periods, ddof])
| Compute covariance with Series, excluding missing values.
|-
| cummax([axis, skipna])
| Return cumulative maximum over a DataFrame or Series axis.
|-
| cummin([axis, skipna])
| Return cumulative minimum over a DataFrame or Series axis.
|-
| cumprod([axis, skipna])
| Return cumulative product over a DataFrame or Series axis.
|-
| cumsum([axis, skipna])
| Return cumulative sum over a DataFrame or Series axis.
|-
| describe([percentiles, include, exclude, …])
| Generate descriptive statistics.
|-
| diff([periods])
| First discrete difference of element.
|-
| div(other[, level, fill_value, axis])
| Return Floating division of series and other, element-wise (binary operator truediv).
|-
| divide(other[, level, fill_value, axis])
| Return Floating division of series and other, element-wise (binary operator truediv).
|-
| divmod(other[, level, fill_value, axis])
| Return Integer division and modulo of series and other, element-wise (binary operator divmod).
|-
| dot(other)
| Compute the dot product between the Series and the columns of other.
|-
| drop([labels, axis, index, columns, level, …])
| Return Series with specified index labels removed.
|-
| drop_duplicates([keep, inplace])
| Return Series with duplicate values removed.
|-
| droplevel(level[, axis])
| Return DataFrame with requested index / column level(s) removed.
|-
| dropna([axis, inplace, how])
| Return a new Series with missing values removed.
|-
| dt
| alias of pandas.core.indexes.accessors.CombinedDatetimelikeProperties
|-
| duplicated([keep])
| Indicate duplicate Series values.
|-
| eq(other[, level, fill_value, axis])
| Return Equal to of series and other, element-wise (binary operator eq).
|-
| equals(other)
| Test whether two objects contain the same elements.
|-
| ewm([com, span, halflife, alpha, …])
| Provide exponential weighted (EW) functions.
|-
| expanding([min_periods, center, axis])
| Provide expanding transformations.
|-
| explode([ignore_index])
| Transform each element of a list-like to a row.
|-
| factorize([sort, na_sentinel])
| Encode the object as an enumerated type or categorical variable.
|-
| ffill([axis, inplace, limit, downcast])
| Synonym for DataFrame.fillna() with method='ffill'.
|-
| fillna([value, method, axis, inplace, …])
| Fill NA/NaN values using the specified method.
|-
| filter([items, like, regex, axis])
| Subset the dataframe rows or columns according to the specified index labels.
|-
| first(offset)
| Select initial periods of time series data based on a date offset.
|-
| first_valid_index()
| Return index for first non-NA/null value.
|-
| floordiv(other[, level, fill_value, axis])
| Return Integer division of series and other, element-wise (binary operator floordiv).
|-
| ge(other[, level, fill_value, axis])
| Return Greater than or equal to of series and other, element-wise (binary operator ge).
|-
| get(key[, default])
| Get item from object for given key (ex: DataFrame column).
|-
| groupby([by, axis, level, as_index, sort, …])
| Group Series using a mapper or by a Series of columns.
|-
| gt(other[, level, fill_value, axis])
| Return Greater than of series and other, element-wise (binary operator gt).
|-
| head([n])
| Return the first n rows.
|-
| hist([by, ax, grid, xlabelsize, xrot, …])
| Draw histogram of the input series using matplotlib.
|-
| idxmax([axis, skipna])
| Return the row label of the maximum value.
|-
| idxmin([axis, skipna])
| Return the row label of the minimum value.
|-
| infer_objects()
| Attempt to infer better dtypes for object columns.
|-
| interpolate([method, axis, limit, inplace, …])
| Fill NaN values using an interpolation method. Only method='linear' is supported for a DataFrame/Series with a MultiIndex.
|-
| isin(values)
| Whether elements in Series are contained in values.
|-
| isna()
| Detect missing values.
|-
| isnull()
| Detect missing values.
|-
| item()
| Return the first element of the underlying data as a python scalar.
|-
| items()
| Lazily iterate over (index, value) tuples.
|-
| iteritems()
| Lazily iterate over (index, value) tuples. (Removed in pandas 2.0; use items() instead.)
|-
| keys()
| Return alias for index.
|-
| kurt([axis, skipna, level, numeric_only])
| Return unbiased kurtosis over requested axis.
|-
| kurtosis([axis, skipna, level, numeric_only])
| Return unbiased kurtosis over requested axis.
|-
| last(offset)
| Select final periods of time series data based on a date offset.
|-
| last_valid_index()
| Return index for last non-NA/null value.
|-
| le(other[, level, fill_value, axis])
| Return Less than or equal to of series and other, element-wise (binary operator le).
|-
| lt(other[, level, fill_value, axis])
| Return Less than of series and other, element-wise (binary operator lt).
|-
| mad([axis, skipna, level])
| Return the mean absolute deviation of the values for the requested axis. (Removed in pandas 2.0.)
|-
| map(arg[, na_action])
| Map values of Series according to input correspondence.
|-
| mask(cond[, other, inplace, axis, level, …])
| Replace values where the condition is True.
|-
| max([axis, skipna, level, numeric_only])
| Return the maximum of the values for the requested axis.
|-
| mean([axis, skipna, level, numeric_only])
| Return the mean of the values for the requested axis.
|-
| median([axis, skipna, level, numeric_only])
| Return the median of the values for the requested axis.
|-
| memory_usage([index, deep])
| Return the memory usage of the Series.
|-
| min([axis, skipna, level, numeric_only])
| Return the minimum of the values for the requested axis.
|-
| mod(other[, level, fill_value, axis])
| Return Modulo of series and other, element-wise (binary operator mod).
|-
| mode([dropna])
| Return the mode(s) of the dataset.
|-
| mul(other[, level, fill_value, axis])
| Return Multiplication of series and other, element-wise (binary operator mul).
|-
| multiply(other[, level, fill_value, axis])
| Return Multiplication of series and other, element-wise (binary operator mul).
|-
| ne(other[, level, fill_value, axis])
| Return Not equal to of series and other, element-wise (binary operator ne).
|-
| nlargest([n, keep])
| Return the largest n elements.
|-
| notna()
| Detect existing (non-missing) values.
|-
| notnull()
| Detect existing (non-missing) values.
|-
| nsmallest([n, keep])
| Return the smallest n elements.
|-
| nunique([dropna])
| Return number of unique elements in the object.
|-
| pad([axis, inplace, limit, downcast])
| Synonym for DataFrame.fillna() with method='ffill'.
|-
| pct_change([periods, fill_method, limit, freq])
| Percentage change between the current and a prior element.
|-
| pipe(func, *args, **kwargs)
| Apply func(self, *args, **kwargs).
|-
| plot
| alias of pandas.plotting._core.PlotAccessor
|-
| pop(item)
| Return item and drop it from the series.
|-
| pow(other[, level, fill_value, axis])
| Return Exponential power of series and other, element-wise (binary operator pow).
|-
| prod([axis, skipna, level, numeric_only, …])
| Return the product of the values for the requested axis.
|-
| product([axis, skipna, level, numeric_only, …])
| Return the product of the values for the requested axis.
|-
| quantile([q, interpolation])
| Return value at the given quantile.
|-
| radd(other[, level, fill_value, axis])
| Return Addition of series and other, element-wise (binary operator radd).
|-
| rank([axis, method, numeric_only, …])
| Compute numerical data ranks (1 through n) along axis.
|-
| ravel([order])
| Return the flattened underlying data as an ndarray.
|-
| rdiv(other[, level, fill_value, axis])
| Return Floating division of series and other, element-wise (binary operator rtruediv).
|-
| rdivmod(other[, level, fill_value, axis])
| Return Integer division and modulo of series and other, element-wise (binary operator rdivmod).
|-
| reindex([index])
| Conform Series to new index with optional filling logic.
|-
| reindex_like(other[, method, copy, limit, …])
| Return an object with matching indices as other object.
|-
| rename([index, axis, copy, inplace, level, …])
| Alter Series index labels or name.
|-
| rename_axis(**kwargs)
| Set the name of the axis for the index or columns.
|-
| reorder_levels(order)
| Rearrange index levels using input order.
|-
| repeat(repeats[, axis])
| Repeat elements of a Series.
|-
| replace([to_replace, value, inplace, limit, …])
| Replace values given in to_replace with value.
|-
| resample(rule[, axis, closed, label, …])
| Resample time-series data.
|-
| reset_index([level, drop, name, inplace])
| Generate a new DataFrame or Series with the index reset.
|-
| rfloordiv(other[, level, fill_value, axis])
| Return Integer division of series and other, element-wise (binary operator rfloordiv).
|-
| rmod(other[, level, fill_value, axis])
| Return Modulo of series and other, element-wise (binary operator rmod).
|-
| rmul(other[, level, fill_value, axis])
| Return Multiplication of series and other, element-wise (binary operator rmul).
|-
| rolling(window[, min_periods, center, …])
| Provide rolling window calculations.
|-
| round([decimals])
| Round each value in a Series to the given number of decimals.
|-
| rpow(other[, level, fill_value, axis])
| Return Exponential power of series and other, element-wise (binary operator rpow).
|-
| rsub(other[, level, fill_value, axis])
| Return Subtraction of series and other, element-wise (binary operator rsub).
|-
| rtruediv(other[, level, fill_value, axis])
| Return Floating division of series and other, element-wise (binary operator rtruediv).
|-
| sample([n, frac, replace, weights, …])
| Return a random sample of items from an axis of object.
|-
| searchsorted(value[, side, sorter])
| Find indices where elements should be inserted to maintain order.
|-
| sem([axis, skipna, level, ddof, numeric_only])
| Return unbiased standard error of the mean over requested axis.
|-
| set_axis(labels[, axis, inplace])
| Assign desired index to given axis.
|-
| shift([periods, freq, axis, fill_value])
| Shift index by desired number of periods with an optional time freq.
|-
| skew([axis, skipna, level, numeric_only])
| Return unbiased skew over requested axis.
|-
| slice_shift([periods, axis])
| Equivalent to shift without copying data. (Removed in pandas 2.0.)
|-
| sort_index([axis, level, ascending, …])
| Sort Series by index labels.
|-
| sort_values([axis, ascending, inplace, …])
| Sort by the values.
|-
| sparse
| alias of pandas.core.arrays.sparse.accessor.SparseAccessor
|-
| squeeze([axis])
| Squeeze 1 dimensional axis objects into scalars.
|-
| std([axis, skipna, level, ddof, numeric_only])
| Return sample standard deviation over requested axis.
|-
| str
| alias of pandas.core.strings.StringMethods
|-
| sub(other[, level, fill_value, axis])
| Return Subtraction of series and other, element-wise (binary operator sub).
|-
| subtract(other[, level, fill_value, axis])
| Return Subtraction of series and other, element-wise (binary operator sub).
|-
| sum([axis, skipna, level, numeric_only, …])
| Return the sum of the values for the requested axis.
|-
| swapaxes(axis1, axis2[, copy])
| Interchange axes and swap values axes appropriately.
|-
| swaplevel([i, j, copy])
| Swap levels i and j in a MultiIndex.
|-
| tail([n])
| Return the last n rows.
|-
| take(indices[, axis, is_copy])
| Return the elements in the given positional indices along an axis.
|-
| to_clipboard([excel, sep])
| Copy object to the system clipboard.
|-
| to_csv([path_or_buf, sep, na_rep, …])
| Write object to a comma-separated values (csv) file.
|-
| to_dict([into])
| Convert Series to {label -> value} dict or dict-like object.
|-
| to_excel(excel_writer[, sheet_name, na_rep, …])
| Write object to an Excel sheet.
|-
| to_frame([name])
| Convert Series to DataFrame.
|-
| to_hdf(path_or_buf, key[, mode, complevel, …])
| Write the contained data to an HDF5 file using HDFStore.
|-
| to_json([path_or_buf, orient, date_format, …])
| Convert the object to a JSON string.
|-
| to_latex([buf, columns, col_space, header, …])
| Render object to a LaTeX tabular, longtable, or nested table/tabular.
|-
| to_list()
| Return a list of the values.
|-
| to_markdown([buf, mode, index])
| Print Series in Markdown-friendly format.
|-
| to_numpy([dtype, copy, na_value])
| A NumPy ndarray representing the values in this Series or Index.
|-
| to_period([freq, copy])
| Convert Series from DatetimeIndex to PeriodIndex.
|-
| to_pickle(path[, compression, protocol])
| Pickle (serialize) object to file.
|-
| to_sql(name, con[, schema, if_exists, …])
| Write records stored in a DataFrame to a SQL database.
|-
| to_string([buf, na_rep, float_format, …])
| Render a string representation of the Series.
|-
| to_timestamp([freq, how, copy])
| Cast to DatetimeIndex of Timestamps, at beginning of period.
|-
| to_xarray()
| Return an xarray object from the pandas object.
|-
| tolist()
| Return a list of the values.
|-
| transform(func[, axis])
| Call func on self producing a Series with transformed values.
|-
| transpose(*args, **kwargs)
| Return the transpose, which is by definition self.
|-
| truediv(other[, level, fill_value, axis])
| Return Floating division of series and other, element-wise (binary operator truediv).
|-
| truncate([before, after, axis, copy])
| Truncate a Series or DataFrame before and after some index value.
|-
| tshift([periods, freq, axis])
| (DEPRECATED; removed in pandas 2.0) Shift the time index, using the index's frequency if available.
|-
| tz_convert(tz[, axis, level, copy])
| Convert tz-aware axis to target time zone.
|-
| tz_localize(tz[, axis, level, copy, …])
| Localize tz-naive index of a Series or DataFrame to target time zone.
|-
| unique()
| Return unique values of Series object.
|-
| unstack([level, fill_value])
| Unstack, also known as pivot, Series with MultiIndex to produce DataFrame.
|-
| update(other)
| Modify Series in place using values from passed Series.
|-
| value_counts([normalize, sort, ascending, …])
| Return a Series containing counts of unique values.
|-
| var([axis, skipna, level, ddof, numeric_only])
| Return unbiased variance over requested axis.
|-
| view([dtype])
| Create a new view of the Series.
|-
| where(cond[, other, inplace, axis, level, …])
| Replace values where the condition is False.
|-
| xs(key[, axis, level, drop_level])
| Return cross-section from the Series/DataFrame.
|}
{{了解更多|[https://pandas.pydata.org/docs/reference/api/pandas.Series.html#pandas.Series Pandas API: pandas.Series]}}
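The following minimal sketch shows a handful of the methods above on arbitrary example data. <code>None</code> in numeric data becomes NaN, so the dtype is float64. The <code>fill_value</code> argument of flexible arithmetic methods such as <code>add</code> stands in for a value that is missing from one operand before the operation is applied.

<syntaxhighlight lang="python" >
import pandas as pd

s = pd.Series([3, -1, 2, -1, None])    # None becomes NaN, so the dtype is float64

clean = s.dropna()                     # drop missing values
print(clean.abs().tolist())            # [3.0, 1.0, 2.0, 1.0]
print(s.fillna(0).sum())               # 3.0
print(clean.value_counts().loc[-1.0])  # 2: how often -1 occurs
print(clean.sort_values().tolist())    # [-1.0, -1.0, 2.0, 3.0]

# Flexible arithmetic: fill_value replaces a value missing from one operand
a = pd.Series([1, 2], index=["x", "y"])
b = pd.Series([10], index=["y"])
print(a.add(b, fill_value=0).tolist())  # [1.0, 12.0]

# map applies a function element-wise; the .str accessor exposes string methods
words = pd.Series(["apple", "kiwi"])
print(words.map(len).tolist())         # [5, 4]
print(words.str.upper().tolist())      # ['APPLE', 'KIWI']
</syntaxhighlight>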
===DataFrame===
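A DataFrame is a labeled two-dimensional data structure whose columns can have different types. It consists of the data, the row labels (the index), and the column labels (the columns).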
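The following minimal sketch builds a DataFrame from a dict of columns; the column names, row labels and values are arbitrary examples. Each key becomes a column, and the <code>index</code> argument supplies the row labels. Selecting a single column returns a Series.

<syntaxhighlight lang="python" >
import pandas as pd

# Each dict key becomes a column; the columns hold different dtypes
df = pd.DataFrame(
    {"name": ["a", "b", "c"], "price": [1.5, 2.0, 3.25], "qty": [10, 0, 7]},
    index=["r1", "r2", "r3"],   # row labels
)

print(df.shape)              # (3, 3)
print(list(df.columns))      # ['name', 'price', 'qty']
col = df["price"]            # selecting one column returns a Series
print(type(col).__name__)    # Series
print(df.loc["r3", "qty"])   # 7: value at row label "r3", column "qty"
</syntaxhighlight>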

==Plotting with pandas==
pandas plotting is built on [[Matplotlib]]. Both DataFrame and Series have a plot method that quickly generates many kinds of charts.

{{了解更多
|[https://pandas.pydata.org/docs/user_guide/visualization.html pandas documentation: User Guide - Chart visualization]
}}
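===Basic plots===
====Line plot====
By default, plot draws a line plot. For example, suppose prices is a DataFrame with a close column of closing prices. The following plots the closing price as a line: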
<syntaxhighlight lang="python" >
s = prices['close']
s.plot() 

# Set the figure size with the figsize parameter
s.plot(figsize=(20,10)) 
</syntaxhighlight>
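The snippet above assumes an existing <code>prices</code> DataFrame. The self-contained sketch below keeps the names <code>prices</code> and <code>close</code> from the text, but the random-walk data is made up. It selects the Agg backend only so that the code also runs without a display. <code>plot</code> returns a matplotlib <code>Axes</code>, which you can customize further with ordinary matplotlib calls.

<syntaxhighlight lang="python" >
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import numpy as np
import pandas as pd

# Made-up stand-in for daily prices: a random walk around 100
rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=60, freq="D")
prices = pd.DataFrame({"close": 100 + rng.standard_normal(60).cumsum()}, index=idx)

s = prices["close"]
ax = s.plot(figsize=(8, 4), title="close")  # plot() returns a matplotlib Axes
ax.set_ylabel("price")                      # customize it with ordinary matplotlib calls
</syntaxhighlight>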

====Bar plot====
For data with discrete labels rather than a time series, you can draw a bar plot in either of two ways:
*call plot() with the kind parameter set to 'bar' or 'barh';
*call plot.bar() or plot.barh().

<syntaxhighlight lang="python" >
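df.plot(kind='bar')    # assume df holds daily stock data
df.plot.bar()
df.resample('A-DEC').mean().volume.plot(kind='bar')    # resample to the yearly mean volume and plot it as bars (volume is df's trading-volume column; pandas 2.2+ prefers the alias 'YE-DEC')

df.plot.bar(stacked=True)    # stacked=True draws a stacked bar plot
df.plot.barh(stacked=True)    # barh draws a horizontal bar plot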
</syntaxhighlight>
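====Histogram====
Histograms are drawn with the plot.hist() method. They are usually frequency histograms: the x axis is divided into bins, and the y axis shows the count in each bin. The bins parameter sets the number of bins, e.g. bins=20 for 20 bins.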
<syntaxhighlight lang="python" >
df.volume.plot.hist()    # frequency histogram of the volume column of the stock data df
df.plot.hist(alpha=0.5)    # alpha=0.5 makes the bars 50% transparent
df.plot.hist(stacked=True, bins=20)    # stacked=True stacks the histograms; bins=20 uses 20 bins
df.plot.hist(orientation='horizontal')    # orientation='horizontal' draws a horizontal histogram
df.plot.hist(cumulative=True)    # cumulative histogram

df['close'].diff().hist()    # apply diff() to the closing price, then plot its histogram
df.hist(color='k', bins=50)     # DataFrame.hist draws each column in its own subplot
</syntaxhighlight>

====Box plot====
Box plots can be drawn with plot.box() or with DataFrame.boxplot().
Parameters:
*color sets the colors and takes a dict, e.g. color={'boxes': 'DarkGreen', 'whiskers': 'DarkOrange', 'medians': 'DarkBlue', 'caps': 'Gray'}
*sym sets the outlier marker style, e.g. sym='r+' draws outliers as red '+' markers.
<syntaxhighlight lang="python" >
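import numpy as np
import pandas as pd

df.plot.box()    # assume df holds daily stock data
df[['close', 'open', 'high']].plot.box()
# Change the box colors by passing a color dict
color = {'boxes': 'DarkGreen', 'whiskers': 'DarkOrange', 'medians': 'DarkBlue', 'caps': 'Gray'}
df.plot.box(color=color, sym='r+')    # sym sets the outlier marker style; 'r+' means a red '+'
df.plot.box(positions=[1, 4, 5, 6, 8])    # positions sets where each box is drawn; with 5 columns, the first is drawn at x=1, the second at x=4, and so on
df.plot.box(vert=False)    # draw horizontal box plots
df.boxplot()

# Grouped box plots: the by keyword groups the rows, and a box plot is drawn for each group.
# In this example, each column gets one box for group A and one for group B.
df = pd.DataFrame(np.random.rand(10, 2), columns=['Col1', 'Col2'])
df['X'] = pd.Series(['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'])
df.boxplot(by='X')

# A second grouping column subdivides the groups further:
df['Y'] = pd.Series(['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'])
df.boxplot(column=['Col1', 'Col2'], by=['X', 'Y'])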
</syntaxhighlight>

====Scatter plot====
Scatter plots are drawn with the DataFrame.plot.scatter() method. The x and y parameters name the columns for the x and y axes.
<syntaxhighlight lang="python" >
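df.plot.scatter(x='close', y='volume')    # assuming df holds daily stock data: closing price vs. volume

# To draw two scatter plots on one chart, reuse the first plot's Axes via the ax parameter
ax = df.plot.scatter(x='close', y='volume', color='DarkBlue', label='Group 1')    # label sets the legend label
df.plot.scatter(x='open', y='volume', color='DarkGreen', label='Group 2', ax=ax)

# c colors each point by the value of the volume column, as a color gradient
df.plot.scatter(x='close', y='open', c='volume', s=50)    # s sets the marker size (area)
df.plot.scatter(x='close', y='open', s=df['volume'] / 50000)  # the marker size can also scale with a column's values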
</syntaxhighlight>

====Pie chart====
Pie charts are drawn with DataFrame.plot.pie() or Series.plot.pie(). Missing values (NaN) in the data are automatically filled with 0.
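The following minimal pie-chart sketch uses made-up category counts. For a DataFrame, pandas needs to know which column to draw: pass <code>y='column name'</code>, or pass <code>subplots=True</code> to draw one pie per column. <code>autopct</code> is passed through to matplotlib's <code>pie</code> and labels each wedge with its percentage.

<syntaxhighlight lang="python" >
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import pandas as pd

# Made-up category counts
s = pd.Series([3, 2, 1], index=["a", "b", "c"], name="share")
ax = s.plot.pie(autopct="%.0f%%", figsize=(4, 4))  # one wedge per label

# A DataFrame needs y="column" or subplots=True (one pie per column)
df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]}, index=["a", "b", "c"])
axes = df.plot.pie(subplots=True, figsize=(8, 4), legend=False)
</syntaxhighlight>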

===Other plotting functions===
These plotting functions come from the [https://pandas.pydata.org/pandas-docs/stable/reference/plotting.html pandas.plotting] module.

====Scatter matrix plot====
A scatter matrix plot is drawn with the scatter_matrix() function:
<syntaxhighlight lang="python" >
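from pandas.plotting import scatter_matrix     # import the function from the module before use
scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal='kde')    # assuming df holds daily stock data: plots every column against every other column (diagonal='kde' draws density plots on the diagonal and requires SciPy)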
</syntaxhighlight>

====Density plot====
Density (kernel density estimate) plots are drawn with Series.plot.kde() and DataFrame.plot.kde(). Both require SciPy.
 df.plot.kde()

====Andrews curves====
Andrews curves are drawn with the andrews_curves() function.
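The following minimal sketch uses made-up data in two classes; the column names <code>p</code> to <code>s</code> and the class column <code>label</code> are hypothetical. Each row is drawn as the curve f(t) = x1/√2 + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ... over -π < t < π and colored by its class. Rows from the same class therefore tend to form a band.

<syntaxhighlight lang="python" >
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import andrews_curves

# Made-up data: 20 rows per class, with class B shifted upward
rng = np.random.default_rng(0)
a = pd.DataFrame(rng.normal(0, 1, (20, 4)), columns=list("pqrs"))
b = pd.DataFrame(rng.normal(2, 1, (20, 4)), columns=list("pqrs"))
a["label"] = "A"
b["label"] = "B"
df = pd.concat([a, b], ignore_index=True)

fig, ax = plt.subplots()
ax = andrews_curves(df, class_column="label", ax=ax)  # one curve per row, colored by label
</syntaxhighlight>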
 
====Parallel coordinates====
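A parallel coordinates plot is drawn with the parallel_coordinates() function from pandas.plotting. In the sketch below, the data and the class column name <code>label</code> are made up. Each feature column becomes a vertical axis, and each row becomes a polyline that crosses them, colored by class.

<syntaxhighlight lang="python" >
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

# Made-up data: three feature columns plus a class column
df = pd.DataFrame({
    "p": [1.0, 1.2, 3.0, 3.1],
    "q": [2.0, 2.1, 0.5, 0.4],
    "r": [0.5, 0.7, 2.5, 2.4],
    "label": ["A", "A", "B", "B"],
})

fig, ax = plt.subplots()
# One polyline per row across one vertical axis per feature, colored by class
ax = parallel_coordinates(df, class_column="label", ax=ax)
</syntaxhighlight>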

====Lag plot====
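A lag plot is drawn with the lag_plot() function from pandas.plotting. It scatters y(t) against y(t + lag) to check a series for autocorrelation. Random data shows no structure. A smooth series, such as the made-up sine wave below, gives points that follow a clear pattern.

<syntaxhighlight lang="python" >
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import lag_plot

# Made-up, strongly autocorrelated series
s = pd.Series(np.sin(np.linspace(0, 20, 200)))

fig, ax = plt.subplots()
ax = lag_plot(s, lag=1, ax=ax)   # scatter of y(t) against y(t + 1): 199 points
</syntaxhighlight>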

====Autocorrelation plot====
Autocorrelation plots are drawn with the autocorrelation_plot() function.
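The following minimal sketch uses made-up white noise. The plot shows the autocorrelation at every lag. The horizontal lines mark the 95% (solid) and 99% (dashed) confidence bands, so for random data nearly all values should stay inside them.

<syntaxhighlight lang="python" >
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import autocorrelation_plot

# Made-up white noise: autocorrelations should stay near 0
s = pd.Series(np.random.default_rng(0).standard_normal(200))

fig, ax = plt.subplots()
ax = autocorrelation_plot(s, ax=ax)
</syntaxhighlight>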

====Bootstrap plot====
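A bootstrap plot is drawn with the bootstrap_plot() function from pandas.plotting. It draws many random subsets of the data (<code>size</code> points each, <code>samples</code> times) and shows how the mean, median, and midrange vary across them. It returns a matplotlib Figure with six subplots. The uniform data below is made up.

<syntaxhighlight lang="python" >
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import numpy as np
import pandas as pd
from pandas.plotting import bootstrap_plot

# Made-up data: 500 uniform random values
s = pd.Series(np.random.default_rng(0).uniform(size=500))

# 200 random subsets of 50 points each; returns a Figure
fig = bootstrap_plot(s, size=50, samples=200)
</syntaxhighlight>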

===Plot formatting===
====Setting a global plot style====
Since matplotlib 1.5, you can set a style before plotting with matplotlib.style.use(my_plot_style). For example, matplotlib.style.use('ggplot') produces ggplot-style plots.
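The following minimal sketch shows two ways to apply a built-in matplotlib style. <code>plt.style.use</code> changes the style for all later plots in the session. <code>plt.style.context</code> applies the style only inside a <code>with</code> block.

<syntaxhighlight lang="python" >
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

print("ggplot" in plt.style.available)  # True: ggplot is a built-in matplotlib style

# plt.style.use("ggplot") would restyle every later plot in the session;
# plt.style.context limits the change to this with block
with plt.style.context("ggplot"):
    ax = pd.Series([1, 3, 2]).plot()
</syntaxhighlight>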
====Style parameters====
Most plotting methods accept a set of keyword arguments, such as color, that control the colors of the plot.

====Legend====
Set the legend parameter to False to hide the legend, e.g.:
 df.plot(legend=False)

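====Scales====
*logy=True puts the y axis on a logarithmic scale.
*logx=True puts the x axis on a logarithmic scale.
*loglog=True puts both the x and y axes on logarithmic scales.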
 ts.plot(logy=True)

====Secondary y-axis====
When two series share the x axis but have y values on different scales, plot the second series with secondary_y=True to give it its own y axis.
<syntaxhighlight lang="python" >
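# e.g. to show the CCI indicator on the closing-price chart:
prices['close'].plot()
prices['cci'].plot(secondary_y=True)

# To put several columns on the secondary axis, pass their names as a list
ax = df.plot(secondary_y=['cci', 'RSI'], mark_right=False)    # by default, legend labels of right-axis columns get a "(right)" suffix; mark_right=False removes it
ax.set_ylabel('CD scale')     # label for the left y axis
ax.right_ax.set_ylabel('AB scale')    # label for the right y axis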
</syntaxhighlight>

====Subplots====
To draw each column of a DataFrame on its own axes, set the subplots parameter, e.g.:
 df.plot(subplots=True, figsize=(6, 6))

====Subplot layout====
The layout keyword sets how the subplots are arranged. It takes a (rows, columns) tuple.
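The following minimal sketch uses made-up data with four columns. With <code>layout=(2, 2)</code>, the returned array of Axes has shape (2, 2). You can give one dimension as -1, and pandas infers it from the number of columns.

<syntaxhighlight lang="python" >
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import numpy as np
import pandas as pd

# Made-up data: four random-walk columns
data = np.random.default_rng(0).standard_normal((50, 4)).cumsum(axis=0)
df = pd.DataFrame(data, columns=list("ABCD"))

# Four subplots arranged in 2 rows x 2 columns
axes = df.plot(subplots=True, layout=(2, 2), figsize=(8, 6))
print(axes.shape)   # (2, 2)

# -1 lets pandas infer that dimension: 1 row, so 4 columns
axes2 = df.plot(subplots=True, layout=(1, -1), figsize=(12, 3))
print(axes2.shape)  # (1, 4)
</syntaxhighlight>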
==Resources==
===Official sites===
*[https://pandas.pydata.org/ Pandas official website]
*[https://pandas.pydata.org/docs/ Pandas documentation]
*[https://pandas.pydata.org/docs/user_guide/10min.html Pandas User Guide - 10 minutes to pandas]
*[https://pandas.pydata.org/docs/user_guide/index.html Pandas User Guide]
*[https://pandas.pydata.org/docs/reference/index.html Pandas API reference]
*[https://github.com/pandas-dev/pandas Pandas on GitHub]

===Related sites===
*[https://quant.itiger.com/tquant/research/hub/classroom/detail?nid=4 Tiger Quant (老虎量化): Introduction to pandas]
*[https://www.pypandas.cn/docs/ pypandas.cn: Pandas documentation (Chinese)]
*[https://www.yiibai.com/pandas Yiibai Tutorials (易百教程): Pandas]

===Books===
''Python for Data Analysis'', 2nd edition (Chinese edition: 《利用Python进行数据分析 第2版》) - Wes McKinney


[[分类:数据分析]]
[[分类:数据可视化]]