Pandas

Latest revision as of 04:00, 4 April 2023 (Tuesday)

Pandas is an open-source Python library for data analysis. It makes it easy to process, compute, analyze, store and visualize data.

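Introduction

Timeline

2008: Developer Wes McKinney began building pandas at AQR Capital Management. He needed a high-performance, flexible tool for quantitative analysis of financial data. Before leaving AQR, he persuaded management to let him open-source the library.
October 24, 2011: Pandas 0.5 released.
2012: Chang She, another AQR employee, joined the effort and became the library's second major contributor.
2015: Pandas became a fiscally sponsored project of NumFOCUS, a 501(c)(3) nonprofit charity in the United States.
July 18, 2019: Pandas 0.25.0 released.
January 29, 2020: Pandas 1.0.0 released.
July 2, 2021: Pandas 1.3.0 released.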

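Installation and upgrade

Install Pandas with pip. Scientific computing distributions such as Anaconda already include the pandas library.

pip install pandas   # install the latest version
pip install pandas==0.25.0  # install a specific version

To verify the installation, import Pandas and check its version with the __version__ attribute: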

import pandas as pd

pd.__version__

Upgrade:

pip install --upgrade pandas

Learn more >> Pandas Getting started: Installation

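Data structures

pandas defines two data structures, Series and DataFrame. Most operations are performed on one of these two types.

Learn more >> Pandas User Guide: Intro to data structures

Series

A Series is a one-dimensional array with axis labels (an index). It can hold any data type: integers, strings, floats, Python objects, and so on. The axis labels are collectively called the index. A Series is similar to a Python dict that maps labels to values.

The basic way to create a Series is to instantiate the pandas.Series class: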

pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

The index is optional. If it is omitted, the axis labels default to integers starting at 0. Some examples:

s = pd.Series(["foo", "bar", "foba"])
print(type(s))   #<class 'pandas.core.series.Series'>

s2 = pd.Series(["foo", "bar", "foba"], index=['b','d','c'])

# create a date index
date_index = pd.date_range("2020-01-01", periods=3, freq="D")
s3 = pd.Series(["foo", "bar", "foba"], index=date_index)

DataFrame

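A DataFrame is a labeled two-dimensional data structure whose columns may have different types. It consists of the data, the row labels (the index) and the column labels (the columns). It is similar to a spreadsheet, an SQL table, or a dict of Series objects, and it is generally the most commonly used pandas object.

There are several ways to create a DataFrame:

The pandas.DataFrame() constructor
The pandas.DataFrame.from_dict() method, similar to the constructor
The pandas.DataFrame.from_records() method, similar to the constructor
Reader functions that import a file, e.g. pandas.read_csv() creates a DataFrame from a CSV file.

The signature of the pandas.DataFrame() constructor is:

pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

Example: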

df = pd.DataFrame([['foo', 22], ['bar', 25], ['test', 18]],columns=['name', 'age'])
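This sketch compares the from_dict() and from_records() routes listed above with the constructor. It assumes a pandas 1.x/2.x install. The row labels r1 and r2 are made up for illustration.

```python
import pandas as pd

# Constructor: a list of rows plus explicit column labels (the example above)
df = pd.DataFrame([["foo", 22], ["bar", 25], ["test", 18]], columns=["name", "age"])

# from_dict with the default orient="columns": dict keys become column labels
df_cols = pd.DataFrame.from_dict({"name": ["foo", "bar", "test"], "age": [22, 25, 18]})

# from_dict with orient="index": dict keys become row labels instead
df_rows = pd.DataFrame.from_dict(
    {"r1": ["foo", 22], "r2": ["bar", 25]}, orient="index", columns=["name", "age"]
)

# from_records: a sequence of tuples, one tuple per row
df_recs = pd.DataFrame.from_records(
    [("foo", 22), ("bar", 25), ("test", 18)], columns=["name", "age"]
)

print(df_rows)
```

All three column-oriented or row-oriented inputs end up as the same kind of object. Which one to use mostly depends on the shape the data already has.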

Viewing data

In the table examples below, s is a Series and df is a DataFrame:

>>> s = pd.Series(['a', 'b', 'c'])
>>> s
0    a
1    b
2    c
dtype: object

>>> df = pd.DataFrame([['foo', 22], ['bar', 25], ['test', 18]],columns=['name', 'age'])
>>> df
   name  age
0   foo   22
1   bar   25
2  test   18

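Attribute/method	Description	Applies to	Example
head()	Return the first n rows (5 by default)	Series DataFrame	`df.head()` returns the first 5 rows of df; `df.head(10)` returns the first 10 rows.
tail()	Return the last n rows (5 by default)	Series DataFrame	`df.tail()` returns the last 5 rows of df; `df.tail(10)` returns the last 10 rows.
dtypes	Return the data type. A Series returns a single NumPy dtype object; a DataFrame returns a Series with the dtype of each column	Series DataFrame	`s.dtypes` `df.dtypes`
dtype	Return the NumPy data type (a dtype object)	Series	`s.dtype`
array	Return the data of a Series or Index as a pandas ExtensionArray	Series	`s.array` returns: <PandasArray> ['a', 'b', 'c'] Length: 3, dtype: object
attrs	Dictionary of global attributes of this object	Series DataFrame	`s.attrs` returns {}
hasnans	Return True if there are any missing values (such as Python None or np.NaN), otherwise False	Series	`s.hasnans` returns False
values	Return the data as an ndarray (a NumPy n-dimensional array) or an ndarray-like object	Series DataFrame	`s.values` returns array(['a', 'b', 'c'], dtype=object)
ndim	Return the number of dimensions: 1 for a Series, 2 for a DataFrame	Series DataFrame	`s.ndim` returns 1; `df.ndim` returns 2
size	Return the number of elements	Series DataFrame	`s.size` returns 3; `df.size` returns 6
shape	Return a tuple with the dimensions (number of rows and columns)	Series DataFrame	`s.shape` returns (3,); `df.shape` returns (3, 2)
empty	Return True if the object is empty	Series DataFrame	`s.empty` returns False; `df.empty` returns False
name	Return the name of the Series	Series	`s.name` returns None
memory_usage()	Return the memory usage of the Series or DataFrame in bytes. The index parameter (default True) includes the memory of the index. The deep parameter (default False) controls whether object-dtype values are introspected to report their actual system-level memory consumption	Series DataFrame	`s.memory_usage()` returns 152; `df.memory_usage(index=False)`
info()	Print a concise summary of a DataFrame	DataFrame	`df.info()`
select_dtypes()	Return the subset of a DataFrame's columns whose dtypes match the given criteria	DataFrame	`df.select_dtypes(include=['float64'])`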
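This sketch checks several attributes from the table on the same s and df. The expected values in the comments follow from the 3-row, 2-column df defined earlier. It assumes a recent pandas 1.x/2.x.

```python
import pandas as pd

s = pd.Series(["a", "b", "c"])
df = pd.DataFrame([["foo", 22], ["bar", 25], ["test", 18]], columns=["name", "age"])

first_two = df.head(2)          # first 2 rows
column_types = df.dtypes        # a Series: one dtype per column
print(s.shape, df.shape)        # (3,) (3, 2)
print(s.size, df.size)          # 3 6
print(s.ndim, df.ndim)          # 1 2
numbers = df.select_dtypes(include="number")  # numeric columns only ("age")
df.info()                       # prints a concise summary to stdout
```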

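Indexing

Viewing the index

Attribute/method	Description	Applies to	Example
index	The index (row labels); can be read and set	Series DataFrame	`s.index` returns RangeIndex(start=0, stop=3, step=1); `s.index[0]` returns the first index value; `df.index`
columns	The column labels (not available on Series); can be read and set	DataFrame	`df.columns`
keys()	The column labels for a DataFrame, the index for a Series	Series DataFrame	`df.keys()` returns the column labels
axes	Return a list of the axis labels: [index] for a Series, [index, columns] for a DataFrame	Series DataFrame	`s.axes` returns [RangeIndex(start=0, stop=3, step=1)]; `df.axes` returns the index and the column labels.
idxmax()	Return the index label of the first occurrence of the maximum	Series DataFrame	`df.idxmax()`
idxmin()	Return the index label of the first occurrence of the minimum	Series DataFrame	`s.idxmin()`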

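Setting and resetting the index

The index of a Series, and the index and columns of a DataFrame, can be assigned through the .index and .columns attributes. They can also be set or reset with the following methods.

Attribute/method	Description	Applies to	Example
set_index()	Set a column as the index	DataFrame	`df.set_index('col_3')` makes column 'col_3' the index.
reset_index()	Reset the index to the default integers starting at 0. Parameters: `drop` whether to discard the old index instead of inserting it as a column (default False, i.e. keep it); `level` reset only one or more levels of a MultiIndex.	Series DataFrame
reindex()	Conform a Series or DataFrame to a new index. Labels in the new index that are absent from the old one are filled with NaN by default; labels in the old index that are absent from the new one are dropped.	Series DataFrame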
reindex_like()	Return an object with matching indices as other object.	Series DataFrame
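rename()	Alter axis (index or column) labels.	Series DataFrame Index	`df.rename(columns={"date": "Date", "A": "a"})` renames some of the columns; `df.rename(index={0: "x", 1: "y", 2: "z"})` changes the index labels 0, 1, 2 to x, y, z; `df.rename(index=str)` converts the index labels to strings; `df.rename(str.lower, axis='columns')` lowercases the column names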
rename_axis()	Set the name of the axis for the index or columns.	Series DataFrame
set_axis()	Assign desired index to given axis.	Series DataFrame	`df.set_axis(['a', 'b', 'c'], axis='index')` `df.set_axis(['I', 'II'], axis='columns')`
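add_prefix()	Add a prefix to the labels (the row labels of a Series, the column labels of a DataFrame)	Series DataFrame	`s.add_prefix('item_')` `df.add_prefix('col_')`
add_suffix()	Add a suffix to the labels (the row labels of a Series, the column labels of a DataFrame)	Series DataFrame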
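This sketch shows set_index(), reindex(), reset_index(), rename() and add_prefix() working together. The column names code and price are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"code": ["A", "B", "C"], "price": [10, 20, 30]})

by_code = df.set_index("code")         # the "code" column becomes the index

# Labels new to the index ("D") get NaN; old labels not requested ("B") are dropped
reindexed = by_code.reindex(["A", "C", "D"])

restored = by_code.reset_index()       # the index goes back to being a column
renamed = df.rename(columns={"price": "close"}).add_prefix("col_")
```

Each call returns a new object, so df itself is left unchanged throughout.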

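MultiIndex

Method	Description	Signature	Example
MultiIndex.from_arrays()	Create a MultiIndex from a list of arrays	pandas.MultiIndex.from_arrays(arrays, sortorder=None, names=NoDefault.no_default)	`arrays = [['phone', 'phone', 'phone', 'computer'], ['black', 'white', 'gray', 'black']]` `pd.MultiIndex.from_arrays(arrays, names=('category', 'color'))`
MultiIndex.from_tuples()	Create a MultiIndex from a list of tuples
MultiIndex.from_product()	Create a MultiIndex from the Cartesian product of several iterables
MultiIndex.from_frame()	Create a MultiIndex from a DataFrame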
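The four MultiIndex constructors build the same kind of index from different inputs. This sketch uses the category/color example from the table. The stock counts are hypothetical data.

```python
import pandas as pd

# from_arrays: one list per level, aligned position by position
arrays = [["phone", "phone", "phone", "computer"], ["black", "white", "gray", "black"]]
idx = pd.MultiIndex.from_arrays(arrays, names=("category", "color"))

# from_tuples: one tuple per entry; zip(*arrays) rebuilds the same pairs
idx_t = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=("category", "color"))

# from_product: every combination of the level values
idx_p = pd.MultiIndex.from_product([["phone", "computer"], ["black", "white"]],
                                   names=("category", "color"))

# from_frame: each DataFrame column becomes one level
idx_f = pd.MultiIndex.from_frame(pd.DataFrame({"category": arrays[0], "color": arrays[1]}))

stock = pd.Series([5, 3, 0, 7], index=idx)   # hypothetical stock counts
phones = stock.loc["phone"]                  # selecting on the outer level drops that level
```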

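Selection and iteration

Overview

Method	Description	Example
Indexing operator `[ ]`	In Python, `self[key]` on a sequence object calls the object's special method `__getitem__()`. Python's `[ ]` operator supports three common sequence operations: `self[i]` gets item i (counting from 0); `self[i:j]` is the slice from i to j; `self[i:j:k]` is the slice from i to j with step k. pandas also supports some NumPy extensions, such as boolean indexing `self[boolean_array]`, e.g. s[s>5]	`s[1]` gets the second value of s (with the default integer index); `df[1:-1]` is a row slice that returns a DataFrame with the second through the second-to-last rows of df
Attribute access `.`	Access a column as if it were an attribute (only works when the column name is a valid Python identifier)	`df.a` returns the column of df named a
Selection by label `loc[ ]`	Access the `.loc` attribute of the object, then apply the indexing operator `[]` to it.	`df.loc[2]` selects the row with index label 2; `df.loc[1:2]` selects the rows with labels 1 through 2; `df.loc[[1,2]]` selects the rows with labels 1 and 2; `df.loc[1,'name']` selects the single value at row label 1, column label 'name'; `df.loc[1:2,'name']` selects column 'name' of the rows with labels 1 through 2
Selection by position `iloc[ ]`	Purely integer-position-based indexing: access the `.iloc` attribute, then apply the indexing operator `[]` to it.	`s.iloc[2]` selects the value at position 2; `s.iloc[:2]` selects positions 0 and 1 (2 is excluded); `s.iloc[[True,False,True]]` selects the positions marked True; `s.iloc[lambda x: x.index % 2 == 0]` selects the values whose index is even
Single value by label `at[ ]`	Get or set a single value by a row/column label pair.	`s.at[1]` returns 'b'; `s.at[2]='d'` sets the value with label 2 to 'd'; `df.at[2, 'name']` gets the value at index 2, column 'name'
Single value by position `iat[ ]`	Get or set a single value by integer row/column positions.	`s.iat[1]` `s.iat[2]='d'`
Query `query()`	DataFrame method that selects rows using a boolean expression string. `DataFrame.query(expr, inplace=False, **kwargs)`	`df.query('A > B')` is equivalent to `df[df.A > df.B]`
Filter by labels `filter()`	Subset rows or columns by their labels, not by their contents. `Series.filter(items=None, like=None, regex=None, axis=None)` `DataFrame.filter(items=None, like=None, regex=None, axis=None)`	`df.filter(like='bbi', axis=0)` selects the rows whose label contains 'bbi'.
Cross-section of a MultiIndex `xs()`	Can only select data, not set values; `iloc[ ]` or `loc[ ]` can be used instead. `Series.xs(key, axis=0, level=None, drop_level=True)` `DataFrame.xs(key, axis=0, level=None, drop_level=True)`	`df.xs('a', level=1)`
Get a column `get()`	Get an item (a column for a DataFrame) by key, returning default if it is not found. `Series.get(key, default=None)` `DataFrame.get(key, default=None)`	`df.get('a')` returns column a
Return and remove `pop()`	Return an item (a column for a DataFrame) and delete it from the object; raise KeyError if the label is not found. `Series.pop(item)` `DataFrame.pop(item)`	`df.pop('a')` returns column a and removes it from df.
Drop labels `drop()`	Return the object with the specified row or column labels removed. `Series.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')` `DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')`
Sampling `sample()`	Return a random sample of items. `Series.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)` `DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)`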
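This sketch exercises several of the selection helpers from the overview. The columns A, B and name are made up. pop() mutates df, so it is called last.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 5, 3], "B": [2, 4, 6], "name": ["x", "y", "z"]})

by_query = df.query("A > B")               # same rows as df[df.A > df.B]
ab_only = df.filter(items=["A", "B"])      # selects by column label, not by value
without_name = df.drop(columns=["name"])   # returns a new object; df is unchanged
two_rows = df.sample(n=2, random_state=0)  # random_state makes the draw repeatable
first_name = df.get("name").iloc[0]
missing = df.get("no_such_column")         # get() returns None instead of raising
names = df.pop("name")                     # returns the column AND removes it from df
```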

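Selection by label

pandas provides label-based indexing: access the .loc attribute of the object, then apply the indexing operator [] to it. This method is strict: every label requested must be in the index, otherwise a KeyError is raised. When slicing, both the start bound and the stop bound are included if they are present in the index. Integers are valid labels, but they refer to the label, not to the position (the order in the index).

.loc input	Description	Series example	DataFrame example
Single label	e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, not as an integer position)	`s.loc['a']` returns the value of s with label 'a'	`df.loc['b']` returns the row of df with index label 'b' (a Series)
List or array of labels	e.g. ['a', 'c'] (note the two pairs of brackets `[[]]`: the inner pair builds the list, the outer pair is the indexing operation)	`s.loc[['a', 'c']]` returns the values of s with labels 'a' and 'c' (a Series)	`df.loc[['a', 'c']]` returns the rows of df with labels 'a' and 'c' (a DataFrame)
Slice with labels	e.g. 'a':'f' selects labels 'a' through 'f'; 'a':'f':2 selects labels 'a' through 'f' with step 2 (note: unlike Python slicing, both the start and the stop label are included). Other common forms: `'f':` from label 'f' to the end; `:'f'` from the beginning to label 'f'; `:` all labels	`s.loc['a':'c']` returns the values of s with labels 'a' through 'c'	`df.loc['b':'f']` returns the rows of df with labels 'b' through 'f' (a DataFrame)
Row label, column label	DataFrame only; the format is `row_label, column_label`, and either part may be a slice, a list, etc.	−	`df.loc['a','name']` selects the single value at index 'a', column 'name'; `df.loc['a':'c','name']` returns a Series; `df.loc['a':'c','id':'name']` returns a DataFrame
Boolean array	e.g. [True, False, True]. The boolean array must have the same length as the axis, otherwise an IndexError is raised.	`s.loc[[True, False, True]]` returns the 1st and 3rd values of s	`df.loc[[False, True, True]]` returns the 2nd and 3rd rows of df
Callable	A function that takes the calling object and returns one of the indexers above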
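This sketch tries the .loc input forms above on a DataFrame with string labels (made-up data). It highlights the two points most likely to surprise: label slices include the end label, and a missing label raises KeyError.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4], "name": ["w", "x", "y", "z"]},
                  index=["a", "b", "c", "d"])

row_b = df.loc["b"]                         # one row, returned as a Series
rows_ac = df.loc[["a", "c"]]                # list of labels -> DataFrame
a_to_c = df.loc["a":"c"]                    # label slice: the end label "c" IS included
cell = df.loc["a", "name"]                  # row label, column label -> scalar
name_a_to_c = df.loc["a":"c", "name"]       # -> Series
big_ids = df.loc[df["id"] > 2]              # boolean Series aligned on the index
not_x = df.loc[lambda d: d["name"] != "x"]  # the callable receives df itself

try:
    df.loc["e"]                             # every label must exist
    raised = False
except KeyError:
    raised = True
```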

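Selection by position

pandas also provides purely integer-position-based indexing: access the .iloc attribute of the object, then apply the indexing operator [] to it. Using a non-integer, even a valid label, raises an error. Positions are 0-based integers. When slicing, the start position is included and the stop position is excluded.

.iloc input	Description	Series example	DataFrame example
Single integer	e.g. 3	`s.iloc[0]` returns the value at position 0, i.e. the first value of s	`df.iloc[5]` returns the row at position 5 (a Series), i.e. the sixth row of df
List or array of integers	e.g. [0,5] (note the two pairs of brackets `[[]]`: the inner pair builds the list, the outer pair is the indexing operation)	`s.iloc[[0,5]]` returns the values at positions 0 and 5 (a Series)	`df.iloc[[2,5]]` returns the rows at positions 2 and 5 (a DataFrame)
Slice of integers	e.g. 3:5 selects positions 3 and 4 (5 is excluded); 0:5:2 selects positions 0 to 4 with step 2. Other common forms: `2:` from position 2 to the end; `:6` from the beginning up to, but excluding, position 6; `:` all positions	`s.iloc[3:5]` returns the values at positions 3 and 4	`df.iloc[3:5]` returns the rows at positions 3 and 4 (a DataFrame)
Row position, column position	DataFrame only; the format is `row_position, column_position`, and either part may be a slice, a list, etc.	−	`df.iloc[0, 2]` selects the single value in row 1, column 3; `df.iloc[2:5, 6]` returns column 7 of rows 3 to 5 (a Series); `df.iloc[2:5, 0:2]` returns columns 1 to 2 of rows 3 to 5 (a DataFrame)
Boolean array	e.g. [True, False, True]. The boolean array must have the same length as the axis, otherwise an IndexError is raised.	`s.iloc[[True, False, True]]` returns the 1st and 3rd values of s	`df.iloc[[False, True, True]]` returns the 2nd and 3rd rows of df
Callable	A function that takes the calling object and returns one of the indexers above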
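This sketch shows the difference between labels and positions. It uses a Series whose integer index runs in reverse (made-up data), so loc and iloc give different answers for the same number.

```python
import pandas as pd

# A deliberately reversed integer index, so labels and positions differ
s = pd.Series([10, 20, 30, 40, 50], index=[5, 4, 3, 2, 1])

by_label = s.loc[3]       # label 3 -> 30
by_position = s.iloc[3]   # position 3 (the 4th value) -> 40
first_two = s.iloc[:2]    # positions 0 and 1; the stop position is excluded
ends = s.iloc[[0, -1]]    # first and last positions
tail = s.iloc[3:100]      # out-of-range slice bounds are allowed

try:
    s.iloc[10]            # an out-of-range single position is not
    raised = False
except IndexError:
    raised = True
```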

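Iteration

Attribute/method	Description	Example
__iter__()	For a Series, an iterator over the values; for a DataFrame, an iterator over the column labels (the info axis). Series.__iter__() DataFrame.__iter__()	`s.__iter__()`
items()	For a Series, iterate over (index, value) pairs; for a DataFrame, iterate column by column over (column label, column Series) pairs. Series.items() DataFrame.items()	`s.items()` `df.items()` `for label, content in df.items():`
iteritems()	Iterate over key-value pairs: (index, value) for a Series, (column label, column) for a DataFrame. Deprecated in pandas 1.5 and removed in 2.0; use items() instead. Series.iteritems() DataFrame.iteritems()
iterrows()	Iterate over DataFrame rows as (index, Series) pairs. DataFrame.iterrows()
itertuples()	Iterate over DataFrame rows as namedtuples. DataFrame.itertuples(index=True, name='Pandas')
apply()	apply() can also be used	Print each column: `def test(x): print(x)` then `df.apply(test)`; print the price of each row: `def test(x): print(x['price'])` then `df.apply(test, axis=1)`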
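This sketch compares the iteration options above on a two-row DataFrame (made-up data). It only illustrates what each iterator yields. In practice, vectorized operations are usually preferable to explicit row loops.

```python
import pandas as pd

df = pd.DataFrame({"name": ["foo", "bar"], "price": [1.5, 2.0]})

labels = list(df)       # iterating a DataFrame yields its column labels
col_pairs = [(label, col.tolist()) for label, col in df.items()]

# iterrows(): (index, row) pairs; each row is a Series, so dtypes may be mixed/upcast
prices = [row["price"] for _, row in df.iterrows()]

# itertuples(): namedtuples whose fields are the column names; usually faster than iterrows()
names = [t.name for t in df.itertuples(index=False)]

# apply(axis=1) calls the function once per row
doubled = df.apply(lambda row: row["price"] * 2, axis=1)
```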

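Data processing

Duplicate data

To identify or remove duplicate rows, use the duplicated and drop_duplicates methods.

Method	Description	Signatures by object	Example
duplicated	Mark duplicate rows, returning a boolean Series. Parameter keep: the default `keep='first'` marks the first occurrence as False and the other duplicates as True; `keep='last'` marks the last occurrence as False and the others as True; `keep=False` marks all duplicates as True.	Series.duplicated(keep='first') DataFrame.duplicated(subset=None, keep='first')
drop_duplicates	Remove duplicate rows and return the result. Parameter keep: the default `keep='first'` keeps the first occurrence and removes the rest; `keep='last'` keeps the last occurrence and removes the rest; `keep=False` removes all duplicates.	Series.drop_duplicates(keep='first', inplace=False) DataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False) Index.drop_duplicates(keep='first')	`df.drop_duplicates()` removes rows whose values are identical in all columns; `df.drop_duplicates(['date', 'variety'])` removes rows that have the same values in both the date and variety columns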
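This sketch shows how the keep parameter changes the result. It uses made-up date/variety rows in which rows 0 and 1 are exact duplicates.

```python
import pandas as pd

df = pd.DataFrame({"date": ["d1", "d1", "d2", "d1"],
                   "variety": ["x", "x", "y", "z"],
                   "qty": [1, 1, 2, 3]})

first = df.duplicated()                  # keep="first": later copies are True
all_copies = df.duplicated(keep=False)   # every row of a duplicate group is True
unique_rows = df.drop_duplicates()       # rows equal in all columns are removed
by_keys = df.drop_duplicates(["date", "variety"], keep="last")  # compare only these columns
```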

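Missing values (NA)

Item	Description	Example
Representations of missing values	`NaN`: a Python float, which can be created with float('nan'); NaN stands for "not a number". NumPy's `np.nan` is also a Python float, and `np.NaN` and `np.NAN` are aliases of it (removed in NumPy 2.0). pandas uses NaN to represent missing values. `None`: Python's null object (of type NoneType). `NA`: starting with pandas 1.0, the experimental `pd.NA` can be used to represent missing values. `NaT`: "not a time", the missing value for datetime-like data.
Detecting missing values	A `NaN` missing value is a float and cannot be compared directly (NaN is not equal to itself). `pd.isnull()` and `pd.isna()` test a single value (they also accept array-like input). `df.isnull()` and `s.isnull()` return, for each element of a DataFrame or Series, whether it is missing. `s.isnull().any()` returns a single boolean telling whether s has any missing value, while `df.isnull().any()` returns one boolean per column	`pd.isna(pd.NA)`
Filling missing values	`fillna()` fills missing values. A commonly used parameter is `method`: 'pad' or 'ffill' fills forward, 'backfill' or 'bfill' fills backward. The `method` parameter is deprecated since pandas 2.1 in favor of `ffill()` and `bfill()`	`fillna(0)` fills missing values with 0; `df.fillna(method="pad")` fills missing values forward; `df.fillna(method="pad", limit=1)` fills forward, but at most one consecutive value
Dropping missing values	`dropna()` removes the rows (or, with axis=1, the columns) that contain missing values	`df.dropna()`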
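This sketch detects, fills and drops missing values (made-up data). It uses ffill() rather than fillna(method=...) because the method parameter is deprecated in recent pandas.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, None, 4.0])   # None is stored as NaN in a float Series

nan_equal = np.nan == np.nan               # False: NaN never equals itself
mask = s.isna()                            # element-wise test
any_missing = s.isna().any()               # a single boolean for a Series

zero_filled = s.fillna(0)
forward = s.ffill()                        # modern spelling of fillna(method="ffill")
forward_once = s.ffill(limit=1)            # fill at most one consecutive NaN
dropped = s.dropna()

df = pd.DataFrame({"a": [1.0, None], "b": [3, 4]})
per_column = df.isna().any()               # a boolean Series: one value per column
```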

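Type conversion

Text data

Series and Index are equipped with a set of string-processing methods that make it easy to operate on each element of the array. Perhaps most importantly, these methods automatically exclude missing/NA values. They are accessed through the str attribute.

Method	Description	Example
upper( )	Convert strings to uppercase	`s.str.upper( )` converts every string in s to uppercase
lower( )	Convert strings to lowercase	`s.str.lower( )` converts every string in s to lowercase; `df.columns.str.lower()` converts all column labels of df to lowercase
strip() lstrip() rstrip()	Remove the given characters from the start and end of each string (whitespace by default); `lstrip()` removes from the left only, `rstrip()` from the right only	`s.str.strip()` removes whitespace from both ends of s; `s.str.lstrip( )` removes all leading whitespace; `s.str.lstrip('12345.')` removes any leading characters from the set '12345.', so '1.start' becomes 'start'; `s.str.rstrip( )` removes all trailing whitespace; `s.str.rstrip('\n\t')` removes trailing '\n' and '\t' characters
split() rsplit()	Split strings; `rsplit()` splits starting from the end. Parameters: pat: the separator, a string or regular expression (whitespace by default); n: the maximum number of splits (all by default); expand: whether to expand the split parts into separate columns (False by default).	`s.str.split()` splits s on whitespace; `s.str.split('/', n=2)` splits s on only the first two '/'; `s.str.split('/', n=2, expand=True)` also expands the parts into columns; `s.str.rsplit('/', n=2)` splits s on only the last two '/'
contains( )	Test whether a pattern is contained in each string; the pattern is a regular expression by default. If there are missing values, use the na parameter to say whether they count as True or False; otherwise, using the result as a boolean mask fails with `ValueError: Cannot mask with non-boolean array containing NA / NaN values`	`df['code'].str.contains('TC', na=False)` tests whether each value of the code column contains 'TC', treating NaN as False, and returns a boolean Series; `df[df['code'].str.contains('TC', na=False)]` selects the rows of df whose 'code' column contains 'TC'; `s.str.contains('TC', regex=False)` treats the pattern as a literal string
match( )	Test whether each string matches a regular expression starting at its beginning, returning a boolean Series. contains() also returns True for a match in the middle of a string, whereas match() requires the match to start at the beginning of the string.	`s.str.match('abc|AF')` tests whether each string in s starts with 'abc' or 'AF'.
replace()	Replace occurrences of a pattern. Whether the pattern is treated as a regular expression is controlled by the regex parameter, whose default is True before pandas 2.0 and False from pandas 2.0 on.	`s.str.replace('f.', 'ba', regex=False)` replaces the literal string 'f.' with 'ba' in s.
extract( )	Extract the groups captured by a regular expression, as columns.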
extractall( )	Extract capture groups in the regex pat as columns in DataFrame.
findall( )	Find all occurrences of pattern or regular expression in the Series/Index.
cat( )	Concatenate strings in the Series/Index with given separator.
center( )	Pad left and right side of strings in the Series/Index.
count( )	Count occurrences of pattern in each string of the Series/Index.
decode()	Decode character string in the Series/Index using indicated encoding.
encode( )	Encode character string in the Series/Index using indicated encoding.
endswith( )	Test if the end of each string element matches a pattern.
find( )	Return lowest indexes in each strings in the Series/Index.
get(i)	Extract element from each component at specified position.
index( )	Return lowest indexes in each string in Series/Index.
join( )	Join lists contained as elements in the Series/Index with passed delimiter.
len( )	Compute the length of each element in the Series/Index.
ljust( )	Pad right side of strings in the Series/Index.
normalize( )	Return the Unicode normal form for the strings in the Series/Index.
pad( )	Pad strings in the Series/Index up to width.
partition( )	Split the string at the first occurrence of sep.
repeat( )	Duplicate each string in the Series or Index.
rfind( )	Return highest indexes in each strings in the Series/Index.
rindex( )	Return highest indexes in each string in Series/Index.
rjust( )	Pad left side of strings in the Series/Index.
rpartition( )	Split the string at the last occurrence of sep.
slice()	Slice substrings from each element in the Series or Index.
slice_replace( )	Replace a positional slice of a string with another value.
startswith( )	Test if the start of each string element matches a pattern.
swapcase( )	Convert strings in the Series/Index to be swapcased.
title( )	Convert strings in the Series/Index to titlecase.
translate( )	Map all characters in the string through the given mapping table.
wrap( )	Wrap strings in Series/Index at specified line width.
zfill( )	Pad strings in the Series/Index by prepending ‘0’ characters.
isalnum( )	Check whether all characters in each string are alphanumeric.
isalpha( )	Check whether all characters in each string are alphabetic.
isdigit( )	Check whether all characters in each string are digits.
isspace( )	Check whether all characters in each string are whitespace.
islower( )	Check whether all characters in each string are lowercase.
isupper( )	Check whether all characters in each string are uppercase.
istitle( )	Check whether all characters in each string are titlecase.
isnumeric( )	Check whether all characters in each string are numeric.
isdecimal( )	Check whether all characters in each string are decimal.
get_dummies( )	Return DataFrame of dummy/indicator variables for Series.
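capitalize( )	Convert to strings whose first character is uppercase and whose remaining characters are lowercase	`s.str.capitalize()`
casefold( )	Convert to casefolded strings, a more aggressive form of lowercasing intended for caseless matching	`s.str.casefold()`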
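This sketch applies several str methods to a Series that contains a missing value (made-up strings). It passes na=False to contains() and match(), so the results can be used directly as boolean masks.

```python
import pandas as pd

s = pd.Series(["abc-1", "AFx/2/3", None, "xTC/4"])

upper = s.str.upper()                            # missing values stay missing
has_tc = s.str.contains("TC", na=False)          # missing values count as False
starts = s.str.match("abc|AF", na=False)         # regex that must match at the start
parts = s.str.split("/", n=1, expand=True)       # at most one split -> two columns
dashless = s.str.replace("-", "_", regex=False)  # literal, not regex, replacement
lengths = s.str.len()
```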

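Data transformation

Method or attribute	Description	Signature	Example
replace()	Replace values. Parameters: `to_replace`, the values to replace, which can be: 1. a string, number or regular expression; 2. a list of the scalars in 1: if the replacement list has the same length, values are replaced pairwise in order, and if there is a single replacement value, all of them are replaced with it; 3. a dict. `value`, the replacement value. `inplace`, whether to modify the original data (default False)	Series.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad') DataFrame.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')	`df.replace(0, 5)` replaces 0 with 5 in df; `df.replace([1, 2, 3], 0)` replaces 1, 2, 3 with 0; `df.replace([1, 2, 3], [3, 2, 1])` replaces 1, 2, 3 with 3, 2, 1
apply()	Apply a function along the rows or columns; it can be an aggregation or a simple transformation. Parameters: `func`, the function to apply: a Python function (user-defined or lambda), a NumPy function (such as np.mean), or a function name (such as 'mean'); `axis`, the axis: the default axis=0 applies the function to each column, axis=1 to each row.	Series.apply(func, convert_dtype=True, args=(), **kwargs) DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)	`df.apply(np.mean)` returns the mean of each column of df; `df.apply(np.mean, axis=1)` returns the mean of each row; `df.apply(lambda x: x['price'] + 100, axis=1)` returns a Series with 100 added to each value of the price column; `df.apply(lambda x: x + 100)` adds 100 to every element of df; `df.apply(myfunc)`, where myfunc is a user-defined function, returns the result of processing df with myfunc; `df.apply(['mean', 'sum'])` returns the mean and the sum of each column.
applymap()	Apply a function to every element; aggregation functions make no sense here. Renamed to DataFrame.map() in pandas 2.1.	Not available on Series; DataFrame.applymap(func, na_action=None, **kwargs)	`df.applymap(lambda x: x + 100)` adds 100 to every element of df.
agg() aggregate()	Aggregate: summarize using one or more operations over the rows or columns.	Series.aggregate(func=None, axis=0, *args, **kwargs) DataFrame.aggregate(func=None, axis=0, *args, **kwargs)	`df.agg(np.mean)` returns the mean of each column of df; `df.agg([np.mean, np.sum])` returns the mean and the sum of each column; `df.agg({'A' : [np.mean, np.sum], 'B' : ['mean', 'max']})` computes the mean and sum of column A and the mean and maximum of column B.
transform()	Apply one or more operations along the rows or columns. The result must have the same shape as the input, so aggregation functions cannot be used.	Series.transform(func, axis=0, *args, **kwargs) DataFrame.transform(func, axis=0, *args, **kwargs)
pipe()	Pass the object itself (Series or DataFrame) to a function and return the result; useful for calling functions in a method chain. For example, df.pipe(myfunc, a=100) is equivalent to myfunc(df, a=100)	Series.pipe(func, *args, **kwargs) DataFrame.pipe(func, *args, **kwargs)	`df.agg(['mean', 'sum']).pipe(my_table_style, theme='light')` summarizes the data and then passes the result to the user-defined function my_table_style().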
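This sketch contrasts apply(), agg(), transform() and pipe() on a small numeric DataFrame (made-up data). add_total is a hypothetical helper written only for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

col_means = df.apply(lambda col: col.mean())                   # axis=0: once per column
row_sums = df.apply(lambda row: row["A"] + row["B"], axis=1)   # axis=1: once per row
summary = df.agg(["mean", "sum"])                              # one row per aggregation
per_column = df.agg({"A": "max", "B": "min"})                  # a different aggregation per column
centered = df.transform(lambda col: col - col.mean())          # result keeps df's shape

def add_total(frame, name="total"):
    """Hypothetical helper: return a copy of frame with a row-sum column."""
    return frame.assign(**{name: frame.sum(axis=1)})

with_total = df.pipe(add_total, name="AB")                     # same as add_total(df, name="AB")
```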

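Reshaping

Method or attribute	Description	Signature	Example
T	Transpose, i.e. swap rows and columns. A Series is unchanged by transposition.	Series.T DataFrame.T	`df.T` turns the rows of df into columns and the columns into rows.
stack	Stack: move column labels into the index. Useful for reshaping a DataFrame with multi-level columns; a DataFrame with a single column level becomes a Series after stacking. Parameter: `level`, the level(s) to stack, an integer or a list. The default level=-1 is the last, i.e. innermost, column level; level=0 is the first level.	Not available on Series; DataFrame.stack(level=-1, dropna=True)	`df.stack()` stacks the last column level into the index; `df.stack(0)` stacks the first column level into the index; `df.stack([0, 1])` stacks the first and second column levels into the index
unstack	Unstack: move index levels into the column labels.	Series.unstack(level=-1, fill_value=None) DataFrame.unstack(level=-1, fill_value=None)	`df.unstack()` moves the last index level into the columns; `df.unstack(0)` moves the first index level into the columns.
pivot	Pivot: reshape the data using the values of the specified index and columns.	DataFrame.pivot(index=None, columns=None, values=None)	`df.pivot(index='col_1', columns='col_2', values='col_3')` uses col_1 as the index, the values of col_2 as the column labels, and col_3 as the values.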
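This sketch runs pivot() and then stack()/unstack() on made-up long-format price data. With a single column level, stack() returns a Series, and unstack() reverses it.

```python
import pandas as pd

long = pd.DataFrame({"date": ["d1", "d1", "d2", "d2"],
                     "code": ["X", "Y", "X", "Y"],
                     "price": [1.0, 2.0, 3.0, 4.0]})

# pivot: one row per date, one column per code, cells taken from "price"
wide = long.pivot(index="date", columns="code", values="price")

stacked = wide.stack()      # single column level -> the result is a Series
back = stacked.unstack()    # inverse: the last index level becomes columns again
flipped = wide.T            # transpose: the dates become columns
```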

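Sorting

Method or attribute	Description	Signature	Example
sort_values()	Sort by values along either axis. Parameters: `axis`: the default axis=0 sorts the rows (by the values of one or more columns), axis=1 sorts the columns (by the values of one or more rows); `by`: the label or list of labels to sort by; `ascending`: whether to sort in ascending order, default ascending=True, ascending=False for descending order.	Series.sort_values(axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None) DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)	`s.sort_values()` sorts s by value in ascending order; `df.sort_values(by='col_1')` sorts df by the values of col_1 in ascending order; `df.sort_values(by=['col_1', 'col_2'], ascending=False)` sorts df by col_1 in descending order and, for equal col_1 values, by col_2 in descending order.
sort_index()	Sort by row labels or column labels.	Series.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None) DataFrame.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)	`s.sort_index()` sorts s by its index in ascending order; `df.sort_index(axis=1)` sorts the columns of df by their labels
nlargest()	Return the first n rows with the largest values. Equivalent to df.sort_values(columns, ascending=False).head(n), but more efficient.	Series.nlargest(n=5, keep='first') DataFrame.nlargest(n, columns, keep='first')	`df.nlargest(5, 'col_1')` returns the first 5 rows after sorting col_1 in descending order.
nsmallest()	Return the first n rows with the smallest values.	Series.nsmallest(n=5, keep='first') DataFrame.nsmallest(n, columns, keep='first')	`df.nsmallest(10, columns='col_2')` returns the first 10 rows after sorting col_2 in ascending order.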
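This sketch applies the sorting methods to made-up data with a tie in col_1. It passes kind="stable" to the single-column sort, so tied rows keep their original order.

```python
import pandas as pd

df = pd.DataFrame({"col_1": [3, 1, 3, 2], "col_2": [5, 7, 1, 9]},
                  index=["c", "a", "d", "b"])

asc = df.sort_values(by="col_1", kind="stable")       # tied rows c and d keep their order
both_desc = df.sort_values(by=["col_1", "col_2"], ascending=False)
mixed = df.sort_values(by=["col_1", "col_2"], ascending=[True, False])
by_label = df.sort_index()                            # sort the row labels
cols_sorted = df.sort_index(axis=1, ascending=False)  # sort the column labels
top2 = df.nlargest(2, "col_2")
```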

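Combining

Method	Description	Signature	Example
concat()	Concatenate Series or DataFrames along an axis. Parameters: `objs`, a list or dict of Series or DataFrames. `axis`, the axis {0, 1, …}; the default axis=0 concatenates along the index (stacking rows), axis=1 along the columns (side by side). `join`, {'inner', 'outer'}; the default 'outer' takes the union of the labels on the other axis, 'inner' takes the intersection. `ignore_index`, bool; the default False keeps the original labels (index) along the concatenation axis, True discards them and uses integers increasing from 0. `keys`, a list, default None; builds an outer level of labels (index) on top of the axis labels.	pandas.concat( objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True )	`pd.concat([df1, df2])` concatenates along the rows; `pd.concat([df1, df4], axis=1)` concatenates along the columns; `pd.concat([df1, df2, df3], keys=["x", "y", "z"])` concatenates along the rows and adds an outer index level made of x, y, z, so calling loc["y"] on the result selects the df2 data; `pd.concat([df1, df4], axis=1, join="inner")` concatenates along the columns, keeping only the intersection of the row labels; `pd.concat([s1, s2, s3], axis=1, keys=["time", "code", "price"])`
append()	Append. Series.append concatenates several Series; DataFrame.append adds the rows of other DataFrame objects and returns a new DataFrame. Deprecated in pandas 1.4 and removed in 2.0; use pandas.concat() instead.	Series.append(to_append, ignore_index=False, verify_integrity=False) DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=False)	`s1.append(s2)` appends s2 to s1; `df1.append(df2)` appends df2 to df1 and returns the resulting DataFrame; `df1.append(df2, ignore_index=True)` discards the original row labels, and the result uses integers increasing from 0.
merge()	Merge DataFrames or named Series with a database-style join. Parameters: `left`, a DataFrame or named Series. `right`, another DataFrame or named Series. `how`, the type of join, {'left', 'right', 'outer', 'inner', 'cross'}, default 'inner'. `on`, the join condition: the column or index level names to join on, which must have the same names on the left and right. `left_on`, `right_on`, the join keys specified separately when the column names differ.	pandas.merge( left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None )	`pd.merge(df1, df2, how='left', on=["year", "month"], suffixes=("_left", "_right"))` `df1.merge(df2, left_on='lkey', right_on='rkey')`
join()	Join the columns of another DataFrame.	DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
merge_ordered()	Merge with optional filling/interpolation, designed for ordered data such as time series.
merge_asof()	Merge by the nearest key rather than by an exact match (an "as-of" join on sorted keys).
assign()	Assign new columns to a DataFrame.	DataFrame.assign(**kwargs)
update()	Modify in place using non-NA values from another DataFrame.	Series.update(other) DataFrame.update(other, join='left', overwrite=True, filter_func=None, errors='ignore')
insert()	Insert a column at the specified position.	DataFrame.insert(loc, column, value, allow_duplicates=False)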
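This sketch runs concat() and merge() on two tiny made-up frames that share a key column. pd.concat() is also the replacement for append(), which was removed in pandas 2.0.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
df2 = pd.DataFrame({"key": ["b", "c"], "x": [3, 4]})

stacked = pd.concat([df1, df2], ignore_index=True)          # rows renumbered 0..3
labeled = pd.concat([df1, df2], keys=["first", "second"])   # adds an outer index level
side_by_side = pd.concat([df1, df2], axis=1)                # columns placed next to each other

# SQL-style joins on "key"; the overlapping "x" columns get the suffixes
inner = pd.merge(df1, df2, on="key", how="inner", suffixes=("_l", "_r"))
outer = pd.merge(df1, df2, on="key", how="outer", suffixes=("_l", "_r"))
```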

Comparison

Attribute/method	Description	Series	DataFrame	Example
compare()	Compare with another Series or DataFrame and return the differences; new in v1.1.0.	Series.compare(other, align_axis=1, keep_shape=False, keep_equal=False)	DataFrame.compare(other, align_axis=1, keep_shape=False, keep_equal=False)	`s1.compare(s2)` `df.compare(df2)`
isin()	Whether each element in the Series/DataFrame is contained in values.	Series.isin(values)	DataFrame.isin(values)
equals()	Test whether two objects contain the same elements.	Series.equals(other)	DataFrame.equals(other)	`df.equals(df2)`

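Grouping and aggregation

GroupBy: grouping and aggregation

The general steps of grouping with GroupBy:

Split: split the data into groups based on some criteria.
Apply: apply an aggregation function, a transformation or a filter to each group.
Combine: combine the per-group results into a single data structure.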
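This sketch shows the split-apply-combine steps in a few lines, using made-up ticker prices.

```python
import pandas as pd

df = pd.DataFrame({"code": ["AAPL", "MSFT", "AAPL", "MSFT", "AAPL"],
                   "price": [10.0, 20.0, 12.0, 22.0, 14.0]})

grouped = df.groupby("code")                           # split (lazy: nothing is computed yet)
sizes = {name: len(group) for name, group in grouped}  # iterate over (name, sub-DataFrame)
mean_price = grouped["price"].mean()                   # apply + combine: one value per group
members = grouped.groups["MSFT"].tolist()              # index labels belonging to a group
aapl = grouped.get_group("AAPL")                       # one group as a DataFrame
```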

Creating a GroupBy object

Class	Created by	Signature	Example
SeriesGroupBy	Series.groupby()	Series.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=<object object>, observed=False, dropna=True)
DataFrameGroupBy	DataFrame.groupby()	DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=<object object>, observed=False, dropna=True)	`df.groupby('code')` or `df.groupby(by='code')` groups by the code column and creates a GroupBy object

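Selection and iteration

Attribute/method	Description	Example
GroupBy.__iter__()	GroupBy iterator over (group name, group) pairs	`for name, group in grouped: print(name); print(group)`
GroupBy.groups	A dict {group name -> index labels of the group}
GroupBy.indices	A dict {group name -> integer positions of the group}
GroupBy.get_group(name, obj=None)	Select a single group by name, returned as a DataFrame.	`grouped.get_group('AAPL')`
pandas.Grouper(*args, **kwargs)	A groupby specification, e.g. for grouping by a column, an index level or a time frequency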

Function application

Attribute/method	Description	Series	DataFrame	Example
GroupBy.apply()	Apply: apply the function func group by group and combine the results.	GroupBy.apply(func, *args, **kwargs)	GroupBy.apply(func, *args, **kwargs)	`grouped['C'].apply(lambda x: x.describe())`
GroupBy.agg()	Aggregate; equivalent to aggregate()	GroupBy.agg(func, *args, **kwargs)	GroupBy.agg(func, *args, **kwargs)
aggregate()	Aggregate: summarize using one or more operations over the specified axis.	SeriesGroupBy.aggregate(func=None, *args, engine=None, engine_kwargs=None, **kwargs)	DataFrameGroupBy.aggregate(func=None, *args, engine=None, engine_kwargs=None, **kwargs)
transform()	Transform: call a function on each group and replace the original data with the transformed result, which has the same shape as the input	SeriesGroupBy.transform(func, *args, engine=None, engine_kwargs=None, **kwargs)	DataFrameGroupBy.transform(func, *args, engine=None, engine_kwargs=None, **kwargs)
GroupBy.pipe()	Apply the function func, with arguments, to the GroupBy object and return the function's result.	GroupBy.pipe(func, *args, **kwargs)	GroupBy.pipe(func, *args, **kwargs)
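This sketch runs agg(), transform(), apply() and pipe() on a GroupBy object (made-up data). The key difference: agg() returns one row per group, while transform() returns a result aligned with the original rows.

```python
import pandas as pd

df = pd.DataFrame({"code": ["A", "A", "B", "B"],
                   "price": [1.0, 3.0, 10.0, 30.0],
                   "qty": [1, 2, 3, 4]})
grouped = df.groupby("code")

# agg: different reductions per column -> one row per group, MultiIndex columns
stats = grouped.agg({"price": ["mean", "max"], "qty": "sum"})

# transform: the result is aligned with the original rows (same length as df)
share = grouped["price"].transform(lambda p: p / p.sum())

# apply: an arbitrary function on each group's Series
spread = grouped["price"].apply(lambda p: p.max() - p.min())

# pipe: passes the GroupBy object itself to the function
sizes = grouped.pipe(lambda g: g.size())
```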

Computations / descriptive statistics

Attribute/method	Description	Series	DataFrame	Example
GroupBy.all()	Return True if all values in the group are truthful, else False.	GroupBy.all(skipna=True)	DataFrameGroupBy.all(skipna=True)
GroupBy.any()	Return True if any value in the group is truthful, else False.	GroupBy.any(skipna=True)	DataFrameGroupBy.any(skipna=True)
GroupBy.backfill()	Backward fill the values.	GroupBy.backfill(limit=None)	DataFrameGroupBy.backfill(limit=None)
GroupBy.bfill()	同 GroupBy.backfill()	GroupBy.bfill(limit=None)	DataFrameGroupBy.bfill(limit=None)
GroupBy.count()	Count the values in each group, excluding missing values.	GroupBy.count()	DataFrameGroupBy.count()	`grouped.count()`
GroupBy.cumcount()	Number each item in each group from 0 to the length of that group - 1.	GroupBy.cumcount(ascending=True)	DataFrameGroupBy.cumcount(ascending=True)
GroupBy.cummax()	Cumulative max for each group.	GroupBy.cummax(axis=0, **kwargs)	DataFrameGroupBy.cummax(axis=0, **kwargs)
GroupBy.cummin()	Cumulative min for each group.	GroupBy.cummin(axis=0, **kwargs)	DataFrameGroupBy.cummin(axis=0, **kwargs)
GroupBy.cumprod()	Cumulative product for each group.	GroupBy.cumprod(axis=0, *args, **kwargs)	DataFrameGroupBy.cumprod(axis=0, *args, **kwargs)
GroupBy.cumsum()	Cumulative sum for each group.	GroupBy.cumsum(axis=0, *args, **kwargs)	DataFrameGroupBy.cumsum(axis=0, *args, **kwargs)
GroupBy.ffill()	Forward fill the values.	GroupBy.ffill(limit=None)	DataFrameGroupBy.ffill(limit=None)
GroupBy.first()	Compute first of group values.	GroupBy.first(numeric_only=False, min_count=-1)
GroupBy.head()	Return the first n rows of each group (5 by default)	GroupBy.head(n=5)
GroupBy.last()	Compute last of group values.	GroupBy.last(numeric_only=False, min_count=-1)
GroupBy.max()	Compute max of group values.	GroupBy.max(numeric_only=False, min_count=-1)
GroupBy.mean()	Compute mean of groups, excluding missing values.	GroupBy.mean(numeric_only=True)
GroupBy.median()	Compute median of groups, excluding missing values.	GroupBy.median(numeric_only=True)
GroupBy.min([numeric_only, min_count])	Compute min of group values.	GroupBy.min(numeric_only=False, min_count=-1)
GroupBy.ngroup([ascending])	Number each group from 0 to the number of groups - 1.	GroupBy.ngroup(ascending=True)
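GroupBy.nth(n[, dropna])	If n is an integer, take the nth row of each group; if n is a list of integers, take that subset of rows from each group.	GroupBy.nth(n, dropna=None)
GroupBy.ohlc()	Compute the open, high, low and close values of each group, excluding missing values.	GroupBy.ohlc()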
GroupBy.pad()	Forward fill the values.	GroupBy.pad(limit=None)	DataFrameGroupBy.pad(limit=None)
GroupBy.prod([numeric_only, min_count])	Compute prod of group values.	GroupBy.prod(numeric_only=True, min_count=0)
GroupBy.rank([method, ascending, na_option, …])	Provide the rank of values within each group.	GroupBy.rank(method='average', ascending=True, na_option='keep', pct=False, axis=0)	DataFrameGroupBy.rank(method='average', ascending=True, na_option='keep', pct=False, axis=0)
GroupBy.pct_change([periods, fill_method, …])	Calculate pct_change of each value to previous entry in group.	GroupBy.pct_change(periods=1, fill_method='pad', limit=None, freq=None, axis=0)	DataFrameGroupBy.pct_change(periods=1, fill_method='pad', limit=None, freq=None, axis=0)
GroupBy.size()	Compute group sizes.	GroupBy.size()	DataFrameGroupBy.size()
GroupBy.sem()	Compute standard error of the mean of groups, excluding missing values.	GroupBy.sem(ddof=1)
GroupBy.std()	Compute standard deviation of groups, excluding missing values.	GroupBy.std(ddof=1)
GroupBy.sum([numeric_only, min_count])	Compute sum of group values.	GroupBy.sum(numeric_only=True, min_count=0)
GroupBy.var([ddof])	Compute variance of groups, excluding missing values.	GroupBy.var(ddof=1)
GroupBy.tail()	Return the last n rows of each group (5 by default)	GroupBy.tail(n=5)

pivot_table: pivot tables

pandas also provides the pivot_table() function, which works like a pivot table in Excel.

Learn more >> pandas User Guide: Pivot tables
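This sketch calls pivot_table() on made-up sales data. Unlike pivot(), pivot_table() aggregates rows that share the same index/column pair (here with aggfunc="sum"), and margins=True adds "All" totals. The same data makes pivot() raise a ValueError because of the duplicate (S, x) pair.

```python
import pandas as pd

sales = pd.DataFrame({"region": ["N", "N", "S", "S", "S"],
                      "product": ["x", "y", "x", "x", "y"],
                      "amount": [10, 20, 30, 40, 50]})

table = pd.pivot_table(sales, index="region", columns="product", values="amount",
                       aggfunc="sum", fill_value=0, margins=True)

try:
    sales.pivot(index="region", columns="product", values="amount")
    pivot_failed = False
except ValueError:          # duplicate (region, product) pairs cannot be reshaped
    pivot_failed = True
```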

Computation and statistics

Computational / descriptive statistics

Binary operator functions

Attribute/method	Description	Series	DataFrame
add()	Get Addition of dataframe and other, element-wise (binary operator add).	Series.add(other, level=None, fill_value=None, axis=0)	DataFrame.add(other, axis='columns', level=None, fill_value=None)
sub()	Get Subtraction of dataframe and other, element-wise (binary operator sub).	Series.sub(other, level=None, fill_value=None, axis=0)	DataFrame.sub(other, axis='columns', level=None, fill_value=None)
mul()	Get Multiplication of dataframe and other, element-wise (binary operator mul).	Series.mul(other, level=None, fill_value=None, axis=0)	DataFrame.mul(other, axis='columns', level=None, fill_value=None)
div()	Get Floating division of dataframe and other, element-wise (binary operator truediv).	Series.div(other, level=None, fill_value=None, axis=0)	DataFrame.div(other, axis='columns', level=None, fill_value=None)
truediv()	Get Floating division of dataframe and other, element-wise (binary operator truediv).	Series.truediv(other, level=None, fill_value=None, axis=0)	DataFrame.truediv(other, axis='columns', level=None, fill_value=None)
floordiv()	Get Integer division of dataframe and other, element-wise (binary operator floordiv).	Series.floordiv(other, level=None, fill_value=None, axis=0)	DataFrame.floordiv(other, axis='columns', level=None, fill_value=None)
mod()	Get Modulo of dataframe and other, element-wise (binary operator mod).	Series.mod(other, level=None, fill_value=None, axis=0)	DataFrame.mod(other, axis='columns', level=None, fill_value=None)
pow()	Get Exponential power of dataframe and other, element-wise (binary operator pow).	Series.pow(other, level=None, fill_value=None, axis=0)	DataFrame.pow(other, axis='columns', level=None, fill_value=None)
dot()	Compute the matrix multiplication between the DataFrame and other.	Series.dot(other)	DataFrame.dot(other)
radd()	Get Addition of dataframe and other, element-wise (binary operator radd).	Series.radd(other, level=None, fill_value=None, axis=0)	DataFrame.radd(other, axis='columns', level=None, fill_value=None)
rsub()	Get Subtraction of dataframe and other, element-wise (binary operator rsub).	Series.rsub(other, level=None, fill_value=None, axis=0)	DataFrame.rsub(other, axis='columns', level=None, fill_value=None)
rmul()	Get Multiplication of dataframe and other, element-wise (binary operator rmul).	Series.rmul(other, level=None, fill_value=None, axis=0)	DataFrame.rmul(other, axis='columns', level=None, fill_value=None)
rdiv()	Get Floating division of dataframe and other, element-wise (binary operator rtruediv).	Series.rdiv(other, level=None, fill_value=None, axis=0)	DataFrame.rdiv(other, axis='columns', level=None, fill_value=None)
rtruediv()	Get Floating division of dataframe and other, element-wise (binary operator rtruediv).	Series.rtruediv(other, level=None, fill_value=None, axis=0)	DataFrame.rtruediv(other, axis='columns', level=None, fill_value=None)
rfloordiv()	Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).	Series.rfloordiv(other, level=None, fill_value=None, axis=0)	DataFrame.rfloordiv(other, axis='columns', level=None, fill_value=None)
rmod()	Get Modulo of dataframe and other, element-wise (binary operator rmod).	Series.rmod(other, level=None, fill_value=None, axis=0)	DataFrame.rmod(other, axis='columns', level=None, fill_value=None)
rpow()	Get Exponential power of dataframe and other, element-wise (binary operator rpow).	Series.rpow(other, level=None, fill_value=None, axis=0)	DataFrame.rpow(other, axis='columns', level=None, fill_value=None)
lt()	Get Less than of dataframe and other, element-wise (binary operator lt).	Series.lt(other, level=None, fill_value=None, axis=0)	DataFrame.lt(other, axis='columns', level=None)
gt()	Get Greater than of dataframe and other, element-wise (binary operator gt).	Series.gt(other, level=None, fill_value=None, axis=0)	DataFrame.gt(other, axis='columns', level=None)
le()	Get Less than or equal to of dataframe and other, element-wise (binary operator le).	Series.le(other, level=None, fill_value=None, axis=0)	DataFrame.le(other, axis='columns', level=None)
ge()	Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).	Series.ge(other, level=None, fill_value=None, axis=0)	DataFrame.ge(other, axis='columns', level=None)
ne()	Get Not equal to of dataframe and other, element-wise (binary operator ne).	Series.ne(other, level=None, fill_value=None, axis=0)	DataFrame.ne(other, axis='columns', level=None)
eq()	Get Equal to of dataframe and other, element-wise (binary operator eq).	Series.eq(other, level=None, fill_value=None, axis=0)	DataFrame.eq(other, axis='columns', level=None)
combine()	Perform column-wise combine with another DataFrame.	Series.combine(other, func, fill_value=None)	DataFrame.combine(other, func, fill_value=None, overwrite=True)
combine_first()	Update null elements with value in the same location in other.	Series.combine_first(other)	DataFrame.combine_first(other)
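Here is a minimal sketch of the flexible arithmetic and comparison methods in the table above. It runs on two small hypothetical Series that share an index, and it shows three things:

- the r* methods reverse the operand order;
- fill_value stands in for a value that is missing on only one side;
- combine_first() fills nulls from another object.

DataFrame methods match a Series against the column labels by default (axis='columns'). The sketch passes axis=0 to match it against the row index instead.

```python
import numpy as np
import pandas as pd

# Two hypothetical Series with the same index; each has one missing value
a = pd.Series([1.0, 2.0, np.nan], index=["x", "y", "z"])
b = pd.Series([10.0, np.nan, 30.0], index=["x", "y", "z"])

r = a.rsub(10)                   # reversed subtraction: 10 - a
q = a.rfloordiv(5)               # reversed floor division: 5 // a
total = a.add(b, fill_value=0)   # a value missing on only one side is treated as 0
greater = a.gt(1)                # element-wise a > 1; NaN compares as False
filled = a.combine_first(b)      # take a where it is not null, otherwise b

# DataFrame methods align a Series on the columns by default (axis='columns');
# axis=0 aligns it on the row index instead
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
row_offsets = pd.Series([1, 2])
diff = df.sub(row_offsets, axis=0)

print(r.tolist(), total.tolist(), greater.tolist(), filled.tolist())
print(diff)
```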

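Time series

Overview

pandas covers four general time-related concepts, represented by the seven classes listed below: a scalar class and an array class for each concept, except date offsets, which have no array class.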

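Concept	Description	Scalar class	Array class	pandas data type	Primary creation method	Examples
Date times	A specific date and time with time zone support. Similar to datetime.datetime in the Python standard library.	Timestamp	DatetimeIndex	datetime64[ns] or datetime64[ns, tz]	to_datetime() date_range()	`pd.to_datetime('2020-01-01')` returns Timestamp('2020-01-01 00:00:00'). `pd.to_datetime(df['date'], format='%Y%m%d')` converts a date column with values such as 20201220 into a Series of datetime64 values. `pd.date_range("2018-01-01", periods=5, freq="D")` returns a DatetimeIndex covering 2018-01-01 through 2018-01-05.
Time deltas	An absolute duration, i.e. the difference between two dates or times. Similar to datetime.timedelta in the Python standard library.	Timedelta	TimedeltaIndex	timedelta64[ns]	to_timedelta() timedelta_range()
Time spans	A span of time defined by a point in time and its associated frequency.	Period	PeriodIndex	period[freq]	Period() period_range()
Date offsets	A relative duration that respects calendar arithmetic.	DateOffset	None	None	DateOffset()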
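The sketch below creates one object for each concept in the table. The 'date' column and its values are hypothetical. The dtype check is deliberately loose, because the exact datetime64 resolution can differ between pandas versions.

```python
import pandas as pd

# Date times: a scalar Timestamp and an array-like DatetimeIndex
ts = pd.to_datetime("2020-01-01")
idx = pd.date_range("2018-01-01", periods=5, freq="D")

# Parse a hypothetical 'date' column stored as strings such as "20201220"
df = pd.DataFrame({"date": ["20201220", "20201221"]})
parsed = pd.to_datetime(df["date"], format="%Y%m%d")  # a Series with a datetime64 dtype

# Time deltas: an absolute duration
td = pd.to_timedelta("1 days 2 hours")

# Time spans: a period defined by a point in time and a frequency
p = pd.Period("2020-01", freq="M")

# Date offsets: calendar-aware increments
next_month = ts + pd.DateOffset(months=1)

print(ts, idx[-1], td, p, next_month, sep="\n")
```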

Learn more >> pandas User Guide: Time series

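Datetime properties

The following are some properties and methods of the Timestamp and DatetimeIndex classes. For a Series of datetimes, they are accessed through the .dt accessor. For example, df['date'].dt.month returns a Series with the month of each value in that column.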

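Property	Description	Example
year	Year	`s.dt.year` returns the year of each element of s; `pd.to_datetime('2020-01-01').year` returns 2020
month	Month	`s.dt.month` returns the month of each element of s
day	Day
hour	Hour
minute	Minute
second	Second
microsecond	Microsecond
nanosecond	Nanosecond
date	Date (without time zone information)
time	Time (without time zone information)
timetz()	Time, including the local time zone information
day_of_year / dayofyear	Ordinal day of the year
week / weekofyear	Ordinal week of the year (deprecated for Series.dt and DatetimeIndex in pandas 1.1 and removed in 2.0; use .dt.isocalendar().week instead)
day_of_week / dayofweek / weekday	Day of the week, with Monday=0 and Sunday=6
quarter	Quarter of the date: January to March = 1, April to June = 2, and so on
days_in_month	Number of days in the month of the date
is_month_start	Whether the date is the first day of the month (as defined by the frequency)
is_month_end	Whether the date is the last day of the month (as defined by the frequency)
is_quarter_start	Whether the date is the first day of a quarter (as defined by the frequency)
is_quarter_end	Whether the date is the last day of a quarter (as defined by the frequency)
is_year_start	Whether the date is the first day of the year (as defined by the frequency)
is_year_end	Whether the date is the last day of the year (as defined by the frequency)
is_leap_year	Whether the date falls in a leap year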
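Here is a short sketch of the .dt accessor on a hypothetical Series of three dates. It includes .dt.isocalendar().week, which replaces the removed .dt.week.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2020-01-01", "2020-02-29", "2021-12-31"]))

years = s.dt.year.tolist()
months = s.dt.month.tolist()
weekdays = s.dt.dayofweek.tolist()            # Monday=0 ... Sunday=6
quarters = s.dt.quarter.tolist()
month_ends = s.dt.is_month_end.tolist()
leap = s.dt.is_leap_year.tolist()
days = s.dt.days_in_month.tolist()
iso_weeks = s.dt.isocalendar().week.tolist()  # replacement for .dt.week

# A single Timestamp exposes the same attributes directly
single_year = pd.to_datetime("2020-01-01").year

print(years, months, weekdays, quarters, month_ends, leap, days, iso_weeks)
```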

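Date offsets

DateOffset objects represent calendar-aware date increments. Each offset class below has a frequency string, which can be passed wherever pandas expects a frequency, for example date_range(freq=...) or resample(). The strings listed are those of pandas 1.x. pandas 2.2 deprecated several of them in favor of new aliases, for example 'h' instead of 'H', 'min' instead of 'T', 's' instead of 'S', 'ME' instead of 'M', 'QE' instead of 'Q' and 'YE' instead of 'A'.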

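Date offset	Frequency string	Description
DateOffset	None	Generic offset class; defaults to 24 hours
Day	'D'	One day
Hour	'H'	One hour
Minute	'T' or 'min'	One minute
Second	'S'	One second
Milli	'L' or 'ms'	One millisecond
Micro	'U' or 'us'	One microsecond
Nano	'N'	One nanosecond
BDay or BusinessDay	'B'	Business day (weekday)
CDay or CustomBusinessDay	'C'	Custom business day
Week	'W'	One week, optionally anchored on a day of the week
WeekOfMonth	'WOM'	The x-th day of the y-th week of each month
LastWeekOfMonth	'LWOM'	The x-th day of the last week of each month
MonthEnd	'M'	Calendar month end
MonthBegin	'MS'	Calendar month begin
BMonthEnd or BusinessMonthEnd	'BM'	Business month end
BMonthBegin or BusinessMonthBegin	'BMS'	Business month begin
CBMonthEnd or CustomBusinessMonthEnd	'CBM'	Custom business month end
CBMonthBegin or CustomBusinessMonthBegin	'CBMS'	Custom business month begin
SemiMonthEnd	'SM'	The 15th (or another day_of_month) and the calendar month end
SemiMonthBegin	'SMS'	The calendar month begin and the 15th (or another day_of_month)
QuarterEnd	'Q'	Calendar quarter end
QuarterBegin	'QS'	Calendar quarter begin
BQuarterEnd	'BQ'	Business quarter end
BQuarterBegin	'BQS'	Business quarter begin
FY5253Quarter	'REQ'	Retail (aka 52-53 week) quarter
YearEnd	'A'	Calendar year end
YearBegin	'AS' or 'BYS'	Calendar year begin
BYearEnd	'BA'	Business year end
BYearBegin	'BAS'	Business year begin
FY5253	'RE'	Retail (aka 52-53 week) year
Easter	None	Easter holiday
BusinessHour	'BH'	Business hour
CustomBusinessHour	'CBH'	Custom business hour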
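The sketch below uses offsets both in arithmetic and as frequency strings, assuming pandas 1.x or later. It illustrates three behaviors:

- Adding DateOffset(months=1) to January 31 clips to the last day of February.
- MonthEnd rolls forward to the next month end.
- date_range() rolls its start forward to the first date that falls on the requested frequency.

Only 'MS' and 'B' are used as strings here, because the pandas 2.2 renames did not affect those aliases.

```python
import pandas as pd

ts = pd.Timestamp("2020-01-31")

plus_month = ts + pd.DateOffset(months=1)     # calendar month; clipped to Feb 29
plus_day = ts + pd.offsets.Day(1)
next_month_end = ts + pd.offsets.MonthEnd(1)  # ts is already a month end, so it rolls to the next one
friday = pd.Timestamp("2020-01-03")
next_bday = friday + pd.offsets.BDay(1)       # skips the weekend

# Frequency strings select offsets in functions such as date_range()
month_starts = pd.date_range("2020-01-15", periods=3, freq="MS")
business_days = pd.date_range("2020-01-03", periods=3, freq="B")

print(plus_month, next_month_end, next_bday, list(month_starts), sep="\n")
```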

Time-series-related methods

Property/method	Description	Series	DataFrame
asfreq()	Convert TimeSeries to specified frequency.	Series.asfreq(freq, method=None, how=None, normalize=False, fill_value=None)	DataFrame.asfreq(freq, method=None, how=None, normalize=False, fill_value=None)
asof()	Return the last row(s) without any NaNs before where.	Series.asof(where, subset=None)	DataFrame.asof(where, subset=None)
shift()	Shift index by desired number of periods with an optional time freq.	Series.shift(periods=1, freq=None, axis=0, fill_value=None)	DataFrame.shift(periods=1, freq=None, axis=0, fill_value=None)
slice_shift()	(DEPRECATED) Equivalent to shift without copying data.	Series.slice_shift(periods=1, axis=0)	DataFrame.slice_shift(periods=1, axis=0)
tshift()	(DEPRECATED) Shift the time index, using the index’s frequency if available.	Series.tshift(periods=1, freq=None, axis=0)	DataFrame.tshift(periods=1, freq=None, axis=0)
first_valid_index()	Return index for first non-NA/null value.	Series.first_valid_index()	DataFrame.first_valid_index()
last_valid_index()	Return index for last non-NA/null value.	Series.last_valid_index()	DataFrame.last_valid_index()
resample()	Resample time-series data.	Series.resample(rule, axis=0, closed=None, label=None, convention='start', kind=None, loffset=None, base=None, on=None, level=None, origin='start_day', offset=None)	DataFrame.resample(rule, axis=0, closed=None, label=None, convention='start', kind=None, loffset=None, base=None, on=None, level=None, origin='start_day', offset=None)
to_period()	Convert DataFrame from DatetimeIndex to PeriodIndex.	Series.to_period(freq=None, copy=True)	DataFrame.to_period(freq=None, axis=0, copy=True)
to_timestamp()	Cast to DatetimeIndex of timestamps, at beginning of period.	Series.to_timestamp(freq=None, how='start', copy=True)	DataFrame.to_timestamp(freq=None, how='start', axis=0, copy=True)
tz_convert()	Convert tz-aware axis to target time zone.	Series.tz_convert(tz, axis=0, level=None, copy=True)	DataFrame.tz_convert(tz, axis=0, level=None, copy=True)
tz_localize()	Localize tz-naive index of a Series or DataFrame to target time zone.	Series.tz_localize(tz, axis=0, level=None, copy=True, ambiguous='raise', nonexistent='raise')	DataFrame.tz_localize(tz, axis=0, level=None, copy=True, ambiguous='raise', nonexistent='raise')
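Here is a sketch of several of these methods on a hypothetical daily Series. Note the difference between resample() and asfreq():

- resample() aggregates every value in each bin.
- asfreq() only picks the values that fall on the new frequency.

```python
import pandas as pd

idx = pd.date_range("2020-01-01", periods=6, freq="D")
s = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

shifted = s.shift(1)                   # values move down one row; the first becomes NaN
two_day_sum = s.resample("2D").sum()   # aggregate into 2-day bins
every_other = s.asfreq("2D")           # pick the values at a 2-day frequency, no aggregation
first_valid = shifted.first_valid_index()

# Attach a time zone to the naive timestamps, then convert to another zone
shanghai = s.tz_localize("UTC").tz_convert("Asia/Shanghai")

print(two_day_sum, every_other, shanghai.index[0], sep="\n")
```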

Plotting

pandas plotting is built on Matplotlib. Both DataFrame and Series have a plot method that quickly produces many kinds of charts.

Learn more >> pandas User Guide: Visualization

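Basic plots

Line plot

plot() draws a line plot by default. For example, if prices is a DataFrame with a close (closing price) column, plot the closing prices as a line: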

s = prices['close']
s.plot() 

# set the figure size (width, height in inches) with the figsize parameter
s.plot(figsize=(20,10))
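Below is a self-contained version of the line plot. It assumes Matplotlib is installed, and the prices data are made up. The Agg backend renders without opening a window, which is handy in scripts and tests. plot() returns the Matplotlib Axes, so the chart can be customized further.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd

# Hypothetical closing prices indexed by date
prices = pd.DataFrame(
    {"close": [10.0, 10.5, 10.2, 10.8, 11.0]},
    index=pd.date_range("2020-01-01", periods=5, freq="D"),
)
s = prices["close"]
ax = s.plot(figsize=(8, 4), title="Closing price")  # plot() returns a Matplotlib Axes
ax.set_ylabel("close")
```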

Bar plot

For data with discrete labels rather than a time series, draw a bar plot in either of two ways:

Call plot() with kind='bar' or kind='barh'.
Call plot.bar() or plot.barh().

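df.plot(kind='bar')    # assuming df holds daily stock data
df.plot.bar()          # equivalent to the line above
df['volume'].resample('A-DEC').mean().plot(kind='bar')    # yearly mean of the volume column, drawn as bars ('A-DEC' is spelled 'YE-DEC' from pandas 2.2)

df.plot.bar(stacked=True)    # stacked=True draws a stacked bar plot
df.plot.barh(stacked=True)    # barh draws horizontal bars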
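Here is a sketch of grouped, stacked and horizontal bar plots on a small hypothetical DataFrame whose index supplies the discrete labels. Each call returns a Matplotlib Axes, and each bar segment is one rectangle in ax.patches.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import pandas as pd

# Hypothetical categorical data: one row per label, one column per series
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])

ax_grouped = df.plot(kind="bar")            # same as df.plot.bar()
ax_stacked = df.plot.bar(stacked=True)      # bars of each row stacked on top of each other
ax_horizontal = df.plot.barh(stacked=True)  # horizontal stacked bars
```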

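Histogram

Histograms are drawn with plot.hist() and usually show a frequency distribution: the x-axis is divided into bins and the y-axis shows the count in each bin. The bins parameter controls the number of bins, e.g. bins=20 for 20 bins.

df.volume.plot.hist()    # frequency histogram of the volume column of the stock data df
df.plot.hist(alpha=0.5)    # alpha=0.5 makes the bars 50% transparent
df.plot.hist(stacked=True, bins=20)    # stacked=True stacks the columns; bins=20 uses 20 bins
df.plot.hist(orientation='horizontal')    # orientation='horizontal' draws a horizontal histogram
df.plot.hist(cumulative=True)    # cumulative histogram

df['close'].diff().hist()    # apply diff() to the closing price, then draw a histogram of the result
df.hist(color='k', bins=50)     # DataFrame.hist draws each column in its own subplot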
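The sketch below applies the histogram options above to made-up volume data. When no ax is given, Series.plot draws on the current Matplotlib axes if a figure is already open. For that reason, the sketch calls plt.figure() to keep the cumulative histogram separate.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical daily volume values
df = pd.DataFrame({"volume": rng.normal(1000, 100, size=500)})

ax = df["volume"].plot.hist(bins=20, alpha=0.5)  # 20 bins, 50% transparent bars

plt.figure()  # new figure; otherwise the next Series plot reuses the current axes
ax_cum = df["volume"].plot.hist(bins=20, cumulative=True)  # cumulative counts

axes = df.hist(bins=20)  # DataFrame.hist: one subplot per column, returned as a 2-D array
```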

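Box plot

Box plots are drawn with plot.box() or DataFrame.boxplot(). Useful parameters:

color sets the colors of the box elements through a dict, e.g. color={'boxes': 'DarkGreen', 'whiskers': 'DarkOrange', 'medians': 'DarkBlue', 'caps': 'Gray'}
sym sets the outlier marker style, e.g. sym='r+' draws outliers as red '+' markers.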

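df.plot.box()
df[['close', 'open', 'high']].plot.box()
# change the box colors by passing a color dict
color = {'boxes': 'DarkGreen', 'whiskers': 'DarkOrange', 'medians': 'DarkBlue', 'caps': 'Gray'}
df.plot.box(color=color, sym='r+')    # sym sets the outlier marker style; 'r+' means a red '+'
df.plot.box(positions=[1, 4, 5, 6, 8])    # positions sets where each box is drawn: with 5 columns in df, the first is drawn at x=1, the second at x=4, and so on
df.plot.box(vert=False)    # horizontal box plot
df.boxplot()

# Grouped box plots: the by keyword splits the data into groups and draws one box per group.
# In the example below, each column is plotted separately for group A and group B.
import numpy as np
df = pd.DataFrame(np.random.rand(10, 2), columns=['Col1', 'Col2'])
df['x'] = pd.Series(['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'])
df.boxplot(by='x')

# a second grouping column subdivides the groups further, e.g.:
df['y'] = pd.Series(['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'])
df.boxplot(column=['Col1', 'Col2'], by=['x', 'y'])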

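Scatter plot

Scatter plots are drawn with DataFrame.plot.scatter(). The x and y parameters name the columns used for the x-axis and y-axis.

df.plot.scatter(x='close', y='volume')    # assuming df holds daily stock data: closing price against volume

# to draw two groups of points on one chart, pass the Axes returned by the first call through the ax parameter
ax = df.plot.scatter(x='close', y='volume', color='DarkBlue', label='Group 1')    # label sets the legend entry
df.plot.scatter(x='open', y='volume', color='DarkGreen', label='Group 2', ax=ax)

# c colors each point by the value of the volume column, on a color gradient
df.plot.scatter(x='close', y='open', c='volume', s=50)    # s sets the marker area
df.plot.scatter(x='close', y='open', s=df['volume']/50000)  # the marker size can also follow the values of a column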
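Here is a self-contained variant of the scatter examples, using a tiny hypothetical price table. Passing the Axes from the first call through ax overlays the second group on the same chart. c='volume' maps a numeric column onto a color scale.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import pandas as pd

# Hypothetical daily prices and volumes
df = pd.DataFrame({
    "open": [10.0, 10.4, 10.1, 10.9],
    "close": [10.5, 10.2, 10.8, 11.0],
    "volume": [120000, 90000, 150000, 110000],
})

ax = df.plot.scatter(x="close", y="volume", color="DarkBlue", label="close")
df.plot.scatter(x="open", y="volume", color="DarkGreen", label="open", ax=ax)  # same Axes

ax_colored = df.plot.scatter(x="close", y="open", c="volume", s=50)        # color follows volume
ax_sized = df.plot.scatter(x="close", y="open", s=df["volume"] / 50000)    # size follows volume
```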

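Pie plot

Pie plots are drawn with DataFrame.plot.pie() or Series.plot.pie(). For a DataFrame, either name the column to plot with y, or pass subplots=True to draw one pie per column. Missing values (NaN) in the data are automatically filled with 0.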
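The sketch below shows both pie-plot forms on hypothetical data:

- a Series pie with percentage labels (autopct);
- a DataFrame pie, drawn either per column with subplots=True or for a single column selected with y.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import pandas as pd

s = pd.Series([3, 2, 5], index=["A", "B", "C"], name="share")  # hypothetical shares
ax = s.plot.pie(autopct="%.0f%%", figsize=(4, 4))              # one wedge per index label

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["a", "b"])
axes = df.plot.pie(subplots=True, figsize=(8, 4))  # one pie per column
ax_y = df.plot.pie(y="y")                          # only column y
```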

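Other plotting functions

These plotting functions come from the pandas.plotting module.

Scatter matrix plot

A scatter matrix is drawn with scatter_matrix():

from pandas.plotting import scatter_matrix     # import the function from the module before use
scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal='kde')    # assuming df holds daily stock data: plots every column against every other column; diagonal='kde' requires SciPy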
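Because diagonal='kde' requires SciPy, this sketch uses diagonal='hist' instead, on random hypothetical data with three columns. scatter_matrix() returns an n by n array of Axes, one per pair of columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(0)
# Hypothetical data with three numeric columns
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["open", "close", "volume"])

axes = scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal="hist")  # histograms on the diagonal
```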

Density plot

Density plots (kernel density estimates) are drawn with Series.plot.kde() and DataFrame.plot.kde(); they require SciPy.

df.plot.kde()

Andrews curves

Parallel coordinates

Lag plot

Autocorrelation plot

Bootstrap plot
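The five plotting functions named above all live in pandas.plotting. The sketch below calls each one once on random hypothetical data. The 'group' column is an invented class label, which andrews_curves() and parallel_coordinates() require. Functions that draw on the current axes get a fresh figure first, while bootstrap_plot() creates and returns its own Figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import (
    andrews_curves,
    autocorrelation_plot,
    bootstrap_plot,
    lag_plot,
    parallel_coordinates,
)

rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.normal(size=(30, 3)), columns=["a", "b", "c"])
frame["group"] = ["g1", "g2", "g3"] * 10           # hypothetical class labels
series = pd.Series(rng.normal(size=200).cumsum())  # hypothetical random walk

ax_andrews = andrews_curves(frame, "group")         # one curve per row, colored by class
plt.figure()
ax_parallel = parallel_coordinates(frame, "group")  # one polyline per row across the columns
plt.figure()
ax_lag = lag_plot(series, lag=1)                    # y(t) against y(t + 1)
plt.figure()
ax_acf = autocorrelation_plot(series)               # autocorrelation at each lag
fig_boot = bootstrap_plot(series, size=50, samples=100)  # resampled mean, median and mid-range
```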

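Plot formatting

Preset plot styles

Since Matplotlib 1.5, a style can be chosen before plotting with matplotlib.style.use(my_plot_style). For example, matplotlib.style.use('ggplot') produces ggplot-style plots.

Style parameters

Most plotting methods accept a set of keyword arguments for setting colors, such as color and colormap.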

Legend

The legend can be hidden by setting legend=False, e.g.

df.plot(legend=False)

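Scales

logy=True uses a logarithmic scale for the y-axis
logx=True uses a logarithmic scale for the x-axis
loglog=True uses logarithmic scales for both the x-axis and the y-axis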

ts.plot(logy=True)

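Secondary y-axis

When two series share the x-axis but have different y ranges, pass secondary_y=True when plotting the second series to draw it against a second y-axis.

# e.g. to show the CCI indicator on a chart of closing prices:
prices['close'].plot()
prices['cci'].plot(secondary_y=True)

# to put several columns on the secondary axis, pass their names as a list
ax = df.plot(secondary_y=['cci', 'RSI'], mark_right=False)    # by default the legend appends "(right)" to columns on the right axis; mark_right=False turns this off
ax.set_ylabel('CD scale')     # label the left y-axis
ax.right_ax.set_ylabel('AB scale')    # label the right y-axis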
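Here is a self-contained version of the secondary-axis example. The close, cci and RSI columns are hypothetical random data standing in for a price and two indicators.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=50, freq="D")
# Hypothetical price and indicator columns
df = pd.DataFrame(
    {
        "close": 10 + rng.normal(size=50).cumsum(),
        "cci": rng.normal(0, 100, size=50),
        "RSI": rng.uniform(0, 100, size=50),
    },
    index=idx,
)

ax = df.plot(secondary_y=["cci", "RSI"], mark_right=False)  # cci and RSI use the right axis
ax.set_ylabel("close")                # left y-axis
ax.right_ax.set_ylabel("cci / RSI")   # right y-axis
```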

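Subplots

Each column of a DataFrame can be drawn on its own axes with the subplots parameter, for example:

df.plot(subplots=True, figsize=(6, 6))

Subplot layout

The arrangement of the subplots is set with the layout keyword, a (rows, columns) tuple. One of the two values may be -1, in which case pandas infers it from the number of columns.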
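The sketch below applies the layout keyword to a hypothetical four-column DataFrame. The returned array of Axes has the requested shape, and -1 lets pandas infer the missing dimension.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical data: four random walks
df = pd.DataFrame(rng.normal(size=(50, 4)).cumsum(axis=0), columns=["A", "B", "C", "D"])

axes = df.plot(subplots=True, layout=(2, 2), figsize=(6, 6))  # 4 columns on a 2 x 2 grid
axes_inferred = df.plot(subplots=True, layout=(1, -1))        # -1: number of columns inferred
```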

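Input/output

pandas reader functions are top-level functions, such as pandas.read_csv(), and generally return a pandas object. Writer functions are methods of the corresponding object; for example, DataFrame.to_csv() writes a DataFrame to a CSV file. The table below lists the available readers and writers.

Data description	Format type	Reader	Writer	Examples
CSV	text	read_csv	to_csv	`pd.read_csv('test.csv')` reads test.csv. `pd.read_csv('test.csv', sep='\t', header=0, dtype={'a': np.float64, 'b': np.int32, 'c': 'Int64'})` reads a tab-separated file, takes the first row as the header and sets the column dtypes. `df.to_csv('out.csv')` saves df to out.csv.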
Fixed-Width Text File	text	read_fwf
JSON	text	read_json	to_json
HTML	text	read_html	to_html
Local clipboard	text	read_clipboard	to_clipboard
MS Excel	binary	read_excel	to_excel	`pd.read_excel(r'D:\data\test.xlsx', sheet_name="Sheet1")` reads the Sheet1 sheet of test.xlsx. `pd.read_excel('test.xlsx', converters={'日期': lambda x: pd.to_datetime(x, unit='d', origin='1899-12-30')})` converts a date column (here named 日期) that would otherwise be read as Excel serial numbers.
OpenDocument	binary	read_excel
HDF5 Format	binary	read_hdf	to_hdf
Feather Format	binary	read_feather	to_feather
Parquet Format	binary	read_parquet	to_parquet
ORC Format	binary	read_orc
Msgpack	binary	read_msgpack	to_msgpack	(removed in pandas 1.0)
Stata	binary	read_stata	to_stata
SAS	binary	read_sas
SPSS	binary	read_spss
Python Pickle Format	binary	read_pickle	to_pickle
SQL	SQL	read_sql	to_sql
Google BigQuery	SQL	read_gbq	to_gbq
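The sketch below shows the reader/writer pairing from the table, using JSON and pickle in a temporary directory so nothing is left on disk. The data are the sample name/age table used earlier in this article.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["foo", "bar", "test"], "age": [22, 25, 18]})

with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "people.json")
    pkl_path = os.path.join(tmp, "people.pkl")

    # Writers are methods of the object ...
    df.to_json(json_path, orient="records")
    df.to_pickle(pkl_path)

    # ... readers are top-level pandas functions
    from_json = pd.read_json(json_path, orient="records")
    from_pickle = pd.read_pickle(pkl_path)
```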

Learn more >> pandas User Guide: IO tools

CSV

Reading CSV

Common read_csv parameters:

Parameter	Description	Example
sep	Delimiter; str, default ','	`pd.read_csv('test.csv', sep='\t')`
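The sketch below combines sep with header and dtype from the table above. It reads tab-separated text from memory instead of from a file. The column names and values are hypothetical. The nullable 'Int64' dtype keeps the missing value in column c as <NA> instead of forcing the column to float.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical tab-separated content; the first value in column c is empty
data = "a\tb\tc\n1.5\t2\t\n2.5\t3\t7\n"

df = pd.read_csv(
    io.StringIO(data),
    sep="\t",      # tab delimiter instead of the default ','
    header=0,      # the first row holds the column names
    dtype={"a": np.float64, "b": np.int32, "c": "Int64"},
)
print(df.dtypes)
```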

Writing CSV

DataFrame.to_csv() writes a DataFrame to a CSV file.
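The sketch below writes a CSV file to a temporary directory and reads it back; the data are hypothetical. index=False leaves out the row index. encoding='utf-8-sig' writes a UTF-8 byte-order mark, which is commonly used so that Excel detects UTF-8 files containing Chinese text.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["张三", "李四"], "age": [22, 25]})  # hypothetical data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv")
    df.to_csv(path, index=False, encoding="utf-8-sig")  # no index column; UTF-8 with BOM
    with open(path, "rb") as fh:
        head = fh.read(3)                               # the byte-order mark
    back = pd.read_csv(path, encoding="utf-8-sig")
```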

Common problems:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: invalid continuation byte

This error means the file passed to pd.read_csv() is not UTF-8 encoded. Either re-save the file as UTF-8, or try some common encodings, for example:

df = pd.read_csv('test.csv', encoding='gbk')
df = pd.read_csv('test.csv', encoding='gb18030')
df = pd.read_csv('test.csv', encoding='ISO-8859-1')
df = pd.read_csv('test.csv', encoding='utf-16')
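Trying encodings one after another can be wrapped in a small helper. read_csv_trying_encodings below is a hypothetical name, not a pandas function. ISO-8859-1 is deliberately left out of the default list, because it can decode any byte sequence. It therefore never raises an error, but it may silently produce garbled text.

```python
import io

import pandas as pd


def read_csv_trying_encodings(raw, encodings=("utf-8", "gbk", "gb18030")):
    """Hypothetical helper: return (DataFrame, encoding) for the first encoding that works."""
    for enc in encodings:
        try:
            return pd.read_csv(io.BytesIO(raw), encoding=enc), enc
        except UnicodeDecodeError:
            continue  # this encoding failed; try the next candidate
    raise ValueError("none of the candidate encodings could decode the data")


# GBK-encoded bytes, as written by many Chinese-locale tools
raw = "日期,收盘\n2020-01-02,10.5\n".encode("gbk")
df, used = read_csv_trying_encodings(raw)
print(used)
```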

A similar error can also mean the file is in the wrong format. If it is actually an Excel file, change its extension to .xls or .xlsx and read it with pd.read_excel().

Excel

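Options and settings

pandas provides an options API for customizing behavior such as how DataFrames are displayed; for example, pd.options.display.max_rows = 300 makes a DataFrame display at most 300 rows. Five related functions are available. All of them accept a regular-expression pattern that must match an unambiguous substring of the option name: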

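Name	Description	Example
get_option() set_option()	Get or set the value of a single option.	`pd.set_option("display.max_rows", 5)` or `pd.options.display.max_rows = 5` displays at most 5 rows. `pd.set_option("display.max_columns", None)` displays all columns.
reset_option()	Reset one or more options to their default values.	`pd.reset_option("display.max_rows")` resets the maximum number of displayed rows.
describe_option()	Print the descriptions of one or more options.
option_context()	Execute a code block with a set of options that are restored to their previous values afterwards.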
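Here is a sketch of the option functions. display.max_rows defaults to 60, and option_context() restores the previous value when the with block exits.

```python
import pandas as pd

pd.set_option("display.max_rows", 5)   # same as pd.options.display.max_rows = 5
after_set = pd.get_option("display.max_rows")
pd.reset_option("display.max_rows")    # back to the default
after_reset = pd.get_option("display.max_rows")

with pd.option_context("display.max_rows", 10, "display.max_columns", None):
    inside = pd.get_option("display.max_rows")   # 10 only inside the block
outside = pd.get_option("display.max_rows")      # restored when the block exits

print(after_set, after_reset, inside, outside)
```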

Learn more >> Pandas User Guide: Options and settings

Resources

Official sites

Pandas website: https://pandas.pydata.org/
Pandas documentation: https://pandas.pydata.org/docs/
Pandas User Guide - 10 minutes to pandas: https://pandas.pydata.org/docs/user_guide/10min.html
Pandas User Guide: https://pandas.pydata.org/docs/user_guide/index.html
Pandas API reference: https://pandas.pydata.org/docs/reference/index.html
Pandas source code: https://github.com/pandas-dev/pandas

Websites

Tutorials

Books

Python for Data Analysis, 2nd Edition (Chinese edition: 《利用Python进行数据分析第2版》) - Wes McKinney

@@ 第4行： / 第4行： @@
 ===时间轴===
 *2008年，开发者Wes McKinney在AQR Capital Management开始制作pandas来满足在财务数据上进行定量分析对高性能、灵活工具的需要。在离开AQR之前他说服管理者允许他将这个库开放源代码。
+*2011年10月24日，发布Pandas 0.5
 *2012年，另一个AQR雇员Chang She加入了这项努力并成为这个库的第二个主要贡献者。
 *2015年，Pandas签约了NumFOCUS的一个财务赞助项目，它是美国的501(c)(3)非营利慈善团体。
+*2019年7月18日，发布Pandas 0.25.0
+*2020年1月29日，发布Pandas 1.0.0
+*2020年7月2日，发布Pandas 1.3.0
-===安装和导入===
+{{了解更多
-使用pip安装Pandas
+|[https://pandas.pydata.org/docs/whatsnew/index.html Pandas 发布日志]
- pip install pandas
+|[https://github.com/pandas-dev/pandas/releases Pandas Github：发行]
-如果使用的是Anaconda等计算科学软件包，已经安装好了pandas库。
+}}
+===安装和升级===
+使用[[pip]]安装Pandas，如果使用的是[[Anaconda]]等计算科学软件包，已经包含了pandas库。
+<syntaxhighlight lang="python">
+pip install pandas   #安装最新版本
+pip install pandas==0.25.0  #安装特定版本
+</syntaxhighlight>
+验证是否安装好，可以导入Pandas，使用<code>__version__</code>属性查看Pandas版本：
+<syntaxhighlight lang="python">
+import pandas as pd
-导入Pandas，在脚本顶部导入，一般写法如下：
+pd.__version__
- import pandas as pd
+</syntaxhighlight>
-查看Pandas版本：
+升级：
-  pd.__version__
+  pip install --upgrade pandas
+{{了解更多
+|[https://pandas.pydata.org/docs/getting_started/install.html Pandas 开始：安装]
+}}
 ==数据结构==
@@ 第43行： / 第61行： @@
 {{了解更多
 |[https://pandas.pydata.org/docs/user_guide/dsintro.html#series Pandas 用户指南：Series ]
+|[https://pandas.pydata.org/docs/reference/series.html Pandas API：Series]
 }}
 ===DataFrame===
-DataFrame是有标记的二维的数据结构，具有可能不同类型的列。由数据，行标签（索引，index），列标签（列，columns）构成。您可以将其视为电子表格或SQL表，或Series对象的字典。它通常是最常用的Pandas对象。
+DataFrame是有标记的二维的数据结构，具有可能不同类型的列。由数据，行标签（索引，index），列标签（列，columns）构成。类似电子表格或SQL表或Series对象的字典。它通常是最常用的Pandas对象。
 创建DataFrame对象有多种方法：
@@ 第56行： / 第75行： @@
 构造方法<code>pandas.DataFrame()</code>的格式为：
   pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
+示例：
+<syntaxhighlight lang="python">
+df = pd.DataFrame([['foo', 22], ['bar', 25], ['test', 18]],columns=['name', 'age'])
+</syntaxhighlight>
-{{了解更多|[https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe Pandas 用户指南：DataFrame]}}
+{{了解更多
+|[https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe Pandas 用户指南：DataFrame]
+|[https://pandas.pydata.org/docs/reference/frame.html  Pandas API：DataFrame]
+}}
-===属性和方法===
-下面将Series和DataFrame的属性、方法按作用分类展示。
+==查看数据==
 表示例中s为一个Series对象，df为一个DataFrame对象：
 <syntaxhighlight lang="python" >
@@ 第75行： / 第100行： @@
 </syntaxhighlight>
-{{了解更多
-|[https://pandas.pydata.org/docs/reference/frame.html  Pandas API：DataFrame]
-|[https://pandas.pydata.org/docs/reference/series.html Pandas API：Series]}}
-====构造方法====
 {| class="wikitable"
 |-
-!方法名
+!属性/方法
 !描述
-!Series
+!支持对象
-!DataFrame
 !示例
 |-
-|构造方法
+| head()
-|创建一个Series对象或DataFrame对象
+| 返回前n行数据，默认前5行
-|pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)
+| Series DataFrame
-|pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
+| <code>df.head()</code>返回df前5行数据<br \><code>df.head(10)</code>返回df前10行数据。
-|<code>s = pd.Series(["a", "b", "c"])</code>  <br \><br \><code>df = pd.DataFrame([['foo', 22], ['bar', 25], ['test', 18]],columns=['name', 'age'])</code>
 |-
-|}
+| tail()
+| 返回最后n行数据，默认最后5行
-====属性和基本信息====
+| Series DataFrame
-{| class="wikitable"
+| <code>df.tail()</code>返回df最后5行数据<br \><code>df.tail(10)</code>返回df最后10行数据。
-|-
-!属性/方法
-!描述
-!Series
-!DataFrame
-!示例
-|-
-| index
-| 索引（行标签）
-|Series.index
-|DataFrame.index
-| <code>s.index</code>返回RangeIndex(start=0, stop=3, step=1) <br \> <code>df.index</code>
-|-
-| columns
-| 列标签，Series无
-| &minus;
-|DataFrame.columns
-| <code>df.columns</code>
-|-
-| axes
-| 返回轴标签（行标签和列标签）的列表。<br \>Series返回[index] <br \>DataFrame返回[index, columns]
-| Series.axes
-| DataFrame.axes
-| <code>s.axes</code>返回[RangeIndex(start=0, stop=3, step=1)]
 |-
 | dtypes
 | 返回数据的Numpy数据类型（dtype对象）
-|Series.index
+| Series DataFrame
-|DataFrame.index
 | <code>s.dtypes</code><br \> <code>df.dtypes</code>
 |-
 | dtype
 | 返回数据的Numpy数据类型（dtype对象）
-| Series.index
+| Series
-| &minus;
 | <code>s.dtype</code>
 |-
 | array
 | 返回 Series 或 Index 数据的数组，该数组为pangdas扩展的python数组.
-| Series.index
+| Series
-| &minus;
 | <code>s.array</code> <br \>返回：<PandasArray><br \>['a', 'b', 'c']<br \>Length: 3, dtype: object
 |-
 | attrs
 | 此对象全局属性字典。
-| Series.attrs
+| Series DataFrame
-| DataFrame.attrs
 | <code>s.attrs</code>返回{}
 |-
 | hasnans
 | 如果有任何空值（如Python的None，np.NaN）返回True，否则返回False。
-| Series.hasnans
+| Series
-| &minus;
 | <code>s.hasnans</code> <br \>返回False
 |-
 | values
 | 返回ndarray（NumPy的多维数组）或类似ndarray的形式。
-| Series.values
+| Series DataFrame
-| DataFrame.values
 | <code>s.values</code>返回array(['a', 'b', 'c'], dtype=object)
 |-
 | ndim
 | 返回数据的维数，Series返回1，DataFrame返回2
-| Series.ndim
+| Series DataFrame
-| DataFrame.ndim
 | <code>s.ndim</code>返回1 <br \><code>df.ndim</code>返回2
 |-
 | size
 | 返回数据中元素的个数
-| Series.size
+| Series DataFrame
-| DataFrame.size
 | <code>s.size</code>返回3 <br \><code>df.ndim</code>返回6
 |-
 | shape
 | 返回数据形状（行数和列数）的元组
-| Series.shape
+| Series DataFrame
-| DataFrame.shape
 | <code>s.shape</code>返回(3, ) <br \><code>df.shape</code>返回(3, 2)
 |-
 | empty
 | 返回是否为空，为空返回Ture
-| Series.empty
+| Series DataFrame
-| DataFrame.empty
 | <code>s.empty</code>返回False <br \><code>df.empty</code>返回False
 |-
 | name
 | 返回Series的名称。
-| Series.name
+| Series
-| &minus;
 | <code>s.name</code>返回空
 |-
 | memory_usage()
 | 返回Series或DataFrame的内存使用情况，单位Bytes。参数index默认为True，表示包含index。<br \>参数deep默认为False，表示不通过查询dtypes对象来深入了解数据的系统级内存使用情况
-| Series.memory_usage(index=True, deep=False)
+| Series DataFrame
-| DataFrame.memory_usage(index=True, deep=False)
 | <code>s.memory_usage()</code>返回空152 <br \><code>df.memory_usage(index=False)</code>
 |-
 | info()
 | 打印DataFrame的简要信息。
-| &minus;
+| DataFrame
-| DataFrame.info(verbose=True, buf=None, max_cols=None, memory_usage=True, null_counts=True)
 | <code>df.info()</code>
 |-
 | select_dtypes()
 | 根据列的dtypes返回符合条件的DataFrame子集
-| &minus;
+| DataFrame
-| DataFrame.select_dtypes(include=None, exclude=None)
 | <code>df.select_dtypes(include=['float64'])</code>
 |-
 |}
-====数据选取/索引标签/迭代====
+==索引==
+===查看索引===
 {| class="wikitable"
 |-
 !属性/方法
 !描述
-!Series
+!支持对象
-!DataFrame
 !示例
 |-
-| head()
+| index
-| 返回前n行数据，默认前5行
+| 索引（行标签），可以查看和设置
-| Series.head(n=5)
+| Series DataFrame
-| DataFrame.head(n=5)
+| <code>s.index</code>返回RangeIndex(start=0, stop=3, step=1) <br \> <code>s.index[0]</code> 返回第一个索引值  <br \> <code>df.index</code>
-| <code>df.head()</code>返回df前5行数据<br \><code>df.head(10)</code>返回df前10行数据。
 |-
-| tail()
+| columns
-| 返回最后n行数据，默认最后5行
+| 列标签，Series无，可以查看和设置
-| Series.tail(n=5)
+| DataFrame
-| DataFrame.tail(n=5)
+| <code>df.columns</code>
-| <code>df.tail()</code>返回df最后5行数据<br \><code>df.tail(10)</code>返回df最后10行数据。
+|-
+| keys()
+| 列标签，没有就返回索引
+| Series DataFrame
+| <code>df.keys()</code>返回列标签
 |-
-| at
+| axes
-| 通过行轴和列轴标签对获取或设置单个值。
+| 返回轴标签（行标签和列标签）的列表。<br \>Series返回[index] <br \>DataFrame返回[index, columns]
-| Series.at
+| Series DataFrame
-| DataFrame.at
+| <code>s.axes</code>返回[RangeIndex(start=0, stop=3, step=1)]  <br \><code>df.axes</code>返回索引和列名。
-| <code>s.at[1]</code>返回'b'<br \><code>s.at[2]='d'</code>设置索引位置为第三的值等于'd' <br \><code>df.at[2, 'name']'</code>获取index=2，columns='name'点的值
 |-
-| iat
+|idxmax()
-| 通过行轴和列轴整数位置获取或设置单个值。
+|返回第一次出现最大值的索引位置。
-| Series.iat
+| Series DataFrame
-| DataFrame.iat
+|<code>df.idxmax()</code>
-| <code>s.iat[1]</code><br \><code>s.iat[2]='d'</code>
+|-
+|idxmin()
+|返回第一次出现最小值的索引位置。
+| Series DataFrame
+|<code>s.idxmin()</code>
+|}
+===设置与重置索引===
+Series对象和DataFrame对象可以通过<code>.index</code>或<code>.columns</code>属性设置，还可以通过以下方法来设置与重置。
+{| class="wikitable"
 |-
-| loc
+!属性/方法
-| 通过标签值或布尔数组访问一组行和列。
+!描述
-| [https://pandas.pydata.org/docs/reference/api/pandas.Series.loc.html#pandas.Series.loc Series.loc]
+!支持对象
-| [https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html#pandas.DataFrame.loc DataFrame.loc]
+!示例
-|<code>df.loc[2]</code>选取索引（行标签）值为2的行 <br \><code>df.loc[1:2]</code> 选取索引值为1到2的行 <br \><code><nowiki>df.loc[[1,2]]</nowiki></code>选取索引值为1和2的行 <br \><code>df.loc[1,'name']</code>选取行标签值为1，列标签值为'name'的单个值<br \><code>df.loc[[1:2],'name']</code>选取行标签值为1到2，列标签值为'name'的数据
 |-
-| iloc
+|set_index()
-| 通过标签整数位置或布尔数组访问一组行和列。
+|将某列设置为索引
-| Series.iloc
+|DataFrame
-| DataFrame.iloc
+|<code>df.set_index('col_3')</code>将‘col_3’列设置为索引。
-|<code>s.iloc[2]</code>选取行标签位置为2的行 <br \><code>s.iloc[:2]</code> 选取索引为0到2（不包含2）的值 <br \><code><nowiki>s.iloc[[True,False,True]]</nowiki></code>选取索引位置为True的值 <br \><code>s.iloc[lambda x: x.index % 2 == 0]</code>选取索引为双数的值
 |-
-| insert
+|reset_index()
-| 在指定位置插入列。
+|重置索引，默认从0开始整数。参数：<br \><code>drop</code>是否删除原索引，默认不删除 <br \><code>level</code>重置多索引的一个或多个级别。
-| &minus;
+|Series DataFrame
-| DataFrame.insert(loc, column, value, allow_duplicates=False)
 |
 |-
-| __iter__()
+|reindex()
-| Series返回值的迭代器 <br \>DataFrame返回轴的迭代器
+| 用Series或DataFrame匹配新索引。对于新索引有旧索引无的默认使用NaN填充，新索引无旧索引有的删除。
-| Series.__iter__()
+|Series DataFrame
-| DataFrame.__iter__()
-| <code>s.__iter__()</code>
-|-
-| items()
-| Series遍历，返回索引和值的迭代器 <br \>DataFrame按列遍历，返回列标签和列的Series对迭代器。
-| Series.items()
-| DataFrame.items()
-| <code>s.items()</code> <br \> <code>df.items()</code> <br \> <code>for label, content in df.items():</code>
-|-
-| iteritems()
-| 返回可迭代的键值对，Series返回索引和值，DataFrame返回列名和列。
-|Series.iteritems()
-|DataFrame.iteritems()
 |
 |-
-| keys()
+|reindex_like()
-| Get the ‘info axis’
+|Return an object with matching indices as other object.
-|Series.keys()
+|Series DataFrame
-|DataFrame.keys()
 |
 |-
-| iterrows()
+|rename()
-| Iterate over DataFrame rows as (index, Series) pairs.
+|修改轴（索引或列）标签。
-| &minus;
+|[https://pandas.pydata.org/docs/reference/api/pandas.Series.rename.html Series] [https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rename.html DataFrame] [https://pandas.pydata.org/docs/reference/api/pandas.Index.rename.html Index]
-|DataFrame.iterrows()
+| <code>df.rename(columns={"date": "日期", "A": "a"})</code> 修改部分列名 <br /><code>df.rename(index={0: "x", 1: "y", 2: "z"})</code> 将原来索引012修改为xyz <br /><code>df.rename(index=str)</code> 将索引转换为字符串 <br /><code>df.rename(str.lower, axis='columns')</code>列名小写
-|
 |-
-| itertuples()
+|rename_axis()
-|Iterate over DataFrame rows as namedtuples.
+|Set the name of the axis for the index or columns.
-| &minus;
+|Series DataFrame
-|DataFrame.itertuples(index=True, name='Pandas')
 |
 |-
-| lookup()
+|set_axis()
-| Label-based “fancy indexing” function for DataFrame.
+|Assign desired index to given axis.
-| &minus;
+|Series DataFrame
-|DataFrame.lookup(row_labels, col_labels)
+|<code>df.set_axis(['a', 'b', 'c'], axis='index')</code><br \><code>df.set_axis(['I', 'II'], axis='columns')</code>
-|
 |-
-| pop()
+|add_prefix()
-| Return item and drop from frame.
+|索引或列标签添加前缀
-|Series.pop(item)
+|Series DataFrame
-|DataFrame.pop(item)
+|<code>s.add_prefix('item_')</code>  <br \><code>df.add_prefix('col_')</code>
-|
 |-
-| xs()
+|add_suffix()
-| Return cross-section from the Series/DataFrame.
+|索引或列标签添加后缀
-|Series.xs(key, axis=0, level=None, drop_level=True)
+|Series DataFrame
-|DataFrame.xs(key, axis=0, level=None, drop_level=True)
 |
+|}
+===多层索引===
+{| class="wikitable"
 |-
-| get()
+!属性/方法
-| Get item from object for given key (ex: DataFrame column).
+!描述
-|Series.get(key, default=None)
+!函数
-|DataFrame.get(key, default=None)
+!示例
-|
 |-
-| isin()
+| [https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.from_arrays.html#pandas.MultiIndex.from_arrays MultiIndex.from_arrays()]
-| Whether each element in the Series/DataFrame is contained in values.
+| 创建多层索引
-|Series.isin(values)
+|pandas.MultiIndex.from_arrays(arrays, sortorder=None, names=NoDefault.no_default)
-|DataFrame.isin(values)
+| <syntaxhighlight lang="python" >
-|
+arrays = [['手机', '手机', '手机', '电脑'], ['黑色', '白色', '灰色', '黑色']]
+pd.MultiIndex.from_arrays(arrays, names=('类别', '颜色'))
+</syntaxhighlight>
 |-
-| where()
+|[https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.from_tuples.html#pandas.MultiIndex.from_tuples MultiIndex.from_tuples()]
-| Replace values where the condition is False.
+| 创建多层索引
-|Series.where(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False)
-|DataFrame.where(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False)
 |
-|-
-| mask()
-| Replace values where the condition is True.
-|Series.mask(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False)
-|DataFrame.mask(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False)
 |
 |-
-|query()
+|[https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.from_product.html#pandas.MultiIndex.from_product MultiIndex.from_product()]
-|Query the columns of a DataFrame with a boolean expression.
+| 创建多层索引
-| &minus;
-|DataFrame.query(expr, inplace=False, **kwargs)
-|<code>df.query('A > B')</code>相当于<code>df[df.A > df.B]</code>
-|-
-|add_prefix()
-|索引或列标签添加前缀
-|Series.add_prefix(prefix)
-|DataFrame.add_prefix(prefix)
-|<code>s.add_prefix('item_')</code>  <br \><code>df.add_prefix('col_')</code>
-|-
-|add_suffix()
-|索引或列标签添加后缀
-|Series.add_suffix(suffix)
-|DataFrame.add_suffix(suffix)
 |
-|-
-|align()
-|Align two objects on their axes with the specified join method.
-|Series.align(other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0, broadcast_axis=None)
-|DataFrame.align(other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0, broadcast_axis=None)
 |
 |-
-|at_time()
+|[https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.from_frame.html#pandas.MultiIndex.from_frame MultiIndex.from_frame()]
-|select values at particular time of day (e.g., 9:30AM).
+| 创建多层索引
-|Series.at_time(time, asof=False, axis=None)
-|DataFrame.at_time(time, asof=False, axis=None)
 |
-|-
-|between_time()
-|Select values between particular times of the day (e.g., 9:00-9:30 AM).
-|Series.between_time(start_time, end_time, include_start=True, include_end=True, axis=None)
-|DataFrame.between_time(start_time, end_time, include_start=True, include_end=True, axis=None)
-|<code>df2.between_time('0:15', '0:45')</code>
-|-
-|drop()
-|Drop specified labels from rows or columns.
-|Series.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
-|DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
 |
 |-
-|drop_duplicates()
-|Return Series with duplicate values removed.<br \>Return DataFrame with duplicate rows removed.
-|Series.drop_duplicates(keep='first', inplace=False)
-|DataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)
 |
-|-
-|duplicated()
-|Indicate duplicate Series values.<br \>Return boolean Series denoting duplicate rows.
-|Series.duplicated(keep='first')
-|DataFrame.duplicated(subset=None, keep='first')
 |
+|
+|}
+{{了解更多
+|[https://pandas.pydata.org/docs/user_guide/advanced.html Pandas 用户指南：MultiIndex / advanced indexing]
+}}
+==选取与迭代==
+===概览===
+{| class="wikitable" style="width: 100%;
 |-
-|equals()
+! 方法
-|Test whether two objects contain the same elements.
+! 描述
-|Series.equals(other)
+! 示例
-|DataFrame.equals(other)
-|<code>df.equals(df2)</code>
 |-
-|filter()
+|索引运算符 <br \><code>[ ]</code>
-|Subset the dataframe rows or columns according to the specified index labels.
+|Python中序列对象使用<code>self[key]</code>是在调用对象的特殊方法<code>__getitem__()</code> 。Python运算符<code>[ ]</code>有3种通用序列操作：<br \> <code>self[i]</code> 取第i项(起始为0)<br \> <code>self[i:j]</code> 从 i 到 j 的切片<br \> <code>self[i:j:k]</code> s 从 i 到 j 步长为 k 的切片 <br \>Pandas支持NumPy扩展的一些操作：<br \><code>self[布尔索引]</code>，如s[s>5]
-|Series.filter(items=None, like=None, regex=None, axis=None)
+|<code>s[1]</code> 取s的第二个值<br \> <code>df[1:-1]</code>切片，返回df第二行到倒数第二行组成的DataFrame对象
-|DataFrame.filter(items=None, like=None, regex=None, axis=None)
-|<code>df.filter(like='bbi', axis=0)</code>选取行标签包含'bbi'的行。
 |-
-|first()
+|属性运算符<br \><code>.</code>
-|Select initial periods of time series data based on a date offset.
+|同Python字典属性获取
-|Series.first(offset)
+|<code>df.a</code>返回df的名称为a的列
-|DataFrame.first(offset)
-|
 |-
-|last()
+|按标签选择 <br \><code>loc[ ]</code>
-|Select final periods of time series data based on a date offset.
+|通过对象调用<code>.loc</code>属性生成序列对象，序列对象调用索引运算符<code>[]</code>。
-|Series.last(offset)
+|<code>df.loc[2]</code>选取索引（行标签）值为2的行 <br \><code>df.loc[1:2]</code> 选取索引值为1到2的行 <br \><code><nowiki>df.loc[[1,2]]</nowiki></code>选取索引值为1和2的行 <br \><code>df.loc[1,'name']</code>选取行标签值为1，列标签值为'name'的单个值<br \><code>df.loc[[1:2],'name']</code>选取行标签值为1到2，列标签值为'name'的数据
-|DataFrame.last(offset)
-|
 |-
-|idxmax()
+|按位置选择 <br \><code>iloc[ ]</code>
-|返回第一次出现最大值的轴标签。
+|纯粹基于整数位置的索引方法，通过对象调用<code>.iloc</code>属性生成序列对象，然后序列对象调用索引运算符<code>[]</code>。
-|Series.idxmax(axis=0, skipna=True, *args, **kwargs)
+|<code>s.iloc[2]</code>选取行标签位置为2的行 <br \><code>s.iloc[:2]</code> 选取索引为0到2（不包含2）的值 <br \><code><nowiki>s.iloc[[True,False,True]]</nowiki></code>选取索引位置为True的值 <br \><code>s.iloc[lambda x: x.index % 2 == 0]</code>选取索引为双数的值
-|DataFrame.idxmax(axis=0, skipna=True)
-|<code>df.idxmax()</code>
 |-
-|idxmin()
+|按标签选择单个 <br \><code>at[ ]</code>
-|返回第一次出现最小值的轴标签。
+|通过行轴和列轴标签对获取或设置单个值。
-|Series.idxmin(axis=0, skipna=True, *args, **kwargs)
+|<code>s.at[1]</code>返回'b'<br \><code>s.at[2]='d'</code>设置索引位置为第三的值等于'd' <br \><code>df.at[2, 'name']'</code>获取index=2，columns='name'点的值
-|DataFrame.idxmin(axis=0, skipna=True)
-|<code>s.idxmin()</code>
 |-
-|reindex()
+|按位置选择单个 <br \><code>iat[ ]</code>
-|Conform Series/DataFrame to new index with optional filling logic.
+|通过行轴和列轴整数位置获取或设置单个值。
-|Series.reindex(index=None, **kwargs)
+|<code>s.iat[1]</code><br \><code>s.iat[2]='d'</code>
-|DataFrame.reindex(**kwargs)
-|
 |-
-|reindex_like()
+|查询方法 <br \><code>query()</code>
-|Return an object with matching indices as other object.
+| DataFrame对象query()方法，使用表达式进行选择。<br \><code>DataFrame.query(expr, inplace=False, **kwargs)</code>
-|Series.reindex_like(other, method=None, copy=True, limit=None, tolerance=None)
+|<code>df.query('A > B')</code>相当于<code>df[df.A > df.B]</code>
-|DataFrame.reindex_like(other, method=None, copy=True, limit=None, tolerance=None)
-|
 |-
-|rename()
+|通过行列标签筛选 <br \><code>filter()</code>
-|Alter axes labels.
+|通过行列标签筛选 <br \> <code>Series.filter(items=None, like=None, regex=None, axis=None)</code> <br \> <code>DataFrame.filter(items=None, like=None, regex=None, axis=None)</code>
-|Series.rename(index=None, *, axis=None, copy=True, inplace=False, level=None, errors='ignore')
+|<code>df.filter(like='bbi', axis=0)</code>选取行标签包含'bbi'的行。
-|DataFrame.rename(**kwargs)
-|
-|-
-|rename_axis()
-|Set the name of the axis for the index or columns.
-|Series.rename_axis(**kwargs)
-|DataFrame.rename_axis(**kwargs)
-|
 |-
-|set_index()
+|多索引选择 <br \><code>xs()</code>
-|Set the DataFrame index using existing columns.
+| 只能用于选择数据，不能设置值。可以使用<code>iloc[ ]</code>或<code>loc[ ]</code>替换。<br \><code>Series.xs(key, axis=0, level=None, drop_level=True)</code> <br \> <code>DataFrame.xs(key, axis=0, level=None, drop_level=True)</code>
-|
+| df.xs('a', level=1)
-|DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
-|<code>df.set_index('col_3')</code>将‘col_3’列设置为索引。
 |-
-|reset_index()
+| 选择一列  <br \>get()
-|Reset the index, or a level of it.
+| 选择某一列 <br \> <code>Series.get(key, default=None)  </code> <br \> <code>DataFrame.get(key, default=None)</code>
-|Series.reset_index(level=None, drop=False, name=None, inplace=False)
+| <code>df.get('a')</code>返回a列
-|DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
-|
 |-
-|sample()
+| 选择指定标签列并删除 <br \><code>pop()</code>
-|Return a random sample of items from an axis of object.
+| 返回某一列，并从数据中删除，如果列名没找到抛出KeyError。<br \> <code>Series.pop(item) </code> <br \> <code>DataFrame.pop(item) </code>
-|Series.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)
+|<code> df.pop('a')</code>返回a列并从df中删除。
-|DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)
-|
 |-
-|set_axis()
-|Assign desired index to given axis.
-|Series.set_axis(labels, axis=0, inplace=False)
-|DataFrame.set_axis(labels, axis=0, inplace=False)
-|<code>df.set_axis(['a', 'b', 'c'], axis='index')</code><br \><code>df.set_axis(['I', 'II'], axis='columns')</code>
 |-
-|take()
+| 删除指定标签列 <br \><code>drop()</code>
-|Return the elements in the given positional indices along an axis.
+| 返回删除指定标签列后的数据 <br \> <code>Series.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')</code> <br \> <br \> <code>DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise') </code>
-|Series.take(indices, axis=0, is_copy=None, **kwargs)
-|DataFrame.take(indices, axis=0, is_copy=None, **kwargs)
 |
 |-
-|truncate()
+| 抽样 <br \><code>sample()</code>
-|Truncate a Series or DataFrame before and after some index value.
+| 从指定轴随机返回样本。<code>n</code>为抽取个数，<code>frac</code>为抽取比例（二者不能同时指定），<code>replace</code>为是否有放回抽样，<code>random_state</code>为随机种子。<br \> <code>Series.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)</code>  <br \><br \> <code>DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)</code>
-|Series.truncate(before=None, after=None, axis=None, copy=True)
-|DataFrame.truncate(before=None, after=None, axis=None, copy=True)
 |
-|-
 |}
-====计算/描述统计====
-{| class="wikitable"
+{{了解更多
+|[https://pandas.pydata.org/docs/user_guide/indexing.html Pandas 指南：索引与选择数据]
+|[https://docs.python.org/zh-cn/3/library/stdtypes.html#common-sequence-operations Python 3 文档：序列类型 - 通用序列操作]
+|[https://docs.python.org/zh-cn/3/reference/datamodel.html#special-method-names Python 3 文档：数据模型 -  特殊方法名称]
+|[https://numpy.org/doc/stable/user/absolute_beginners.html#indexing-and-slicing NumPy 文档：初学者基础知识 - 索引和切片]
+}}
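下面的最小示例把上表中的<code>xs()</code>、<code>get()</code>、<code>drop()</code>、<code>sample()</code>和<code>pop()</code>放在一起演示。示例数据（两层行索引，price和qty两列）是为演示而假设的，具体输出以所用的pandas版本为准。
<syntaxhighlight lang="python">
import pandas as pd

# 假设的示例数据：两层行索引，第1层为'x'/'y'，第2层为'a'/'b'
idx = pd.MultiIndex.from_tuples([("x", "a"), ("x", "b"), ("y", "a"), ("y", "b")])
df = pd.DataFrame({"price": [1, 2, 3, 4], "qty": [10, 20, 30, 40]}, index=idx)

# xs()：选取第2层索引（level=1）为'a'的行，默认去掉被选中的那一层索引
cross = df.xs("a", level=1)

# get()：按列标签取一列；标签不存在时返回default，不抛出错误
price = df.get("price")
missing = df.get("no_such_col", default="N/A")

# drop()：返回删除qty列后的新对象，df本身不变
dropped = df.drop(columns="qty")

# sample()：随机抽取2行，random_state固定随机种子，便于复现
sampled = df.sample(n=2, random_state=0)

# pop()：取出qty列并从df中删除，会直接修改df
qty = df.pop("qty")
</syntaxhighlight>
注意<code>pop()</code>会直接修改原对象，而<code>drop()</code>默认返回新对象，不改变原数据。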
+===按标签选择===
+pandas提供基于标签的索引方法：通过对象的<code>.loc</code>属性获得一个索引器，再对其使用索引运算符<code>[]</code>。该方法要求每个标签都必须存在于索引中，否则会抛出KeyError错误。切片时，如果起始标签和结束标签都存在于索引中，则两者都包含在结果内。整数也是有效的标签，但它引用的是标签，而不是位置（索引顺序）。
+{| class="wikitable" style="width: 100%;"
 |-
-!属性/方法
+! .loc索引输入值
-!描述
+! 描述
-!Series
+! Series示例
-!DataFrame
+! DataFrame示例
-!示例
 |-
-| abs()
+|单个标签
-| 返回 Series/DataFrame 每个元素的绝对值。
+|例如5或'a'（注意，5被解释为索引的标签，而不是整数位置。）
-| Series.abs()
+|<code>s.loc['a']</code> 返回s索引为'a'的值
-| DataFrame.abs()
+|<code>df.loc['b']</code> 返回df索引（行标签）为'b'的行（Series对象）
-| <code>s.abs()</code> <br \> <code>df.abs()</code>
 |-
-| all()
+|标签列表或标签数组
-| Return whether all elements are True, potentially over an axis.
+|如['a', 'c']（注意：这种方式会有两组方括号<code><nowiki>[[]]</nowiki></code>，里面是生成列表，外面是索引取值操作）
-| Series.all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
+|<code><nowiki>s.loc[['a', 'c']]</nowiki></code>返回s索引为'a'和'c'的值（Series对象）
-| DataFrame.all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
+|<code><nowiki>df.loc[['a', 'c']]</nowiki></code>返回df索引（行标签）为'a'和'c'的行（DataFrame对象）
-|
 |-
-| any()
+|带标签的切片对象
-| Return whether any element is True, potentially over an axis.
+|切片如 'a':'f'表示标签'a'到标签'f'，步长切片如 'a':'f':2表示标签'a'到标签'f'按步长2选取（注意：和Python切片不同，这里包含开始标签和结束标签），还有一些常用示例如：<br \><code>'f':</code>从标签'f'开始到最后<br \><code>:'f'</code>从最开始到标签'f'<br \><code>:</code>全部标签
-| Series.any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
+|<code>s.loc['a':'c']</code> 返回s索引'a'到'c'的值（包含'c'）
-| DataFrame.any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
+|<code>df.loc['b':'f']</code> 返回df索引（行标签）'b'到'f'的行（DataFrame对象）
-|
 |-
-| clip()
+|行标签,列标签
-| Trim values at input threshold(s).
+|只有DataFrame可用，格式<code>行标签,列标签</code>，行标签或列标签可以使用切片或数组等。
-| Series.clip(lower=None, upper=None, axis=None, inplace=False, *args, **kwargs)
+|&minus;
-| DataFrame.clip(lower=None, upper=None, axis=None, inplace=False, *args, **kwargs)
+|<code>df.loc['a','name']</code>选取索引为'a'，列标签为'name'的单个值。<br \><code>df.loc['a':'c','name' ]</code>返回Series对象<br \><code>df.loc['a':'c','id':'name' ]</code>返回DataFrame对象
-|
 |-
-| corr()
+|布尔数组
-| Compute pairwise correlation of columns, excluding NA/null values.
+|如[True, False, True]。注意布尔数组长度要与轴标签长度相同，否则会抛出IndexError错误。
-| Series.corr(other, method='pearson', min_periods=None)
+|<code><nowiki>s.loc[[True, False, True]]</nowiki></code> 返回s的第1个和第3个值
-| DataFrame.corr(method='pearson', min_periods=1)
+|<code><nowiki>df.loc[[False, True, True]]</nowiki></code> 返回df的第2行和第3行
-|
 |-
-| corrwith()
+|可调用对象（callable）
-| Compute pairwise correlation.
+|函数以调用它的Series或DataFrame为唯一参数，返回上面任意一种有效的索引形式，如<code>lambda d: d['age'] > 20</code>
 |
-| DataFrame.corrwith(other, axis=0, drop=False, method='pearson')
 |
 |-
-| count()
+|}
-|统计每行或每列值的个数，不包括NA值。
-| Series.count(level=None)
+{{了解更多
-| DataFrame.count(axis=0, level=None, numeric_only=False)
+|[https://pandas.pydata.org/docs/user_guide/indexing.html#selection-by-label Pandas 指南：索引与选择数据 - 按标签选择]
-|<code>s.count()</code><br \><code>df.count()</code><br \><code>df.count(axis='columns')</code>
+|[https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html Pandas 参考：DataFrame对象 - DataFrame.loc]
+|[https://pandas.pydata.org/docs/reference/api/pandas.Series.loc.html Pandas 参考：Series对象 - Series.loc]
+}}
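下面用一组假设的示例数据，把上表中各种<code>.loc</code>输入形式放在一起对照。这只是一个示意，注释中的结果是按示例数据推算的。
<syntaxhighlight lang="python">
import pandas as pd

# 假设的示例数据
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])
df = pd.DataFrame({"id": [1, 2, 3], "name": ["foo", "bar", "baz"]},
                  index=["a", "b", "c"])

single = s.loc["a"]                     # 单个标签，返回10
subset = s.loc[["a", "c"]]              # 标签列表，返回Series（10、30）
sliced = s.loc["a":"c"]                 # 标签切片，包含结束标签'c'（10、20、30）
cell = df.loc["a", "name"]              # 行标签,列标签，返回单个值'foo'
block = df.loc["a":"b", "id":"name"]    # 行、列都用切片，返回2行2列的DataFrame
masked = df.loc[[True, False, True]]    # 布尔数组，长度须等于行数，返回第1、3行
picked = df.loc[lambda d: d["id"] > 1]  # 可调用对象：接收df，返回布尔Series
</syntaxhighlight>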
+===按位置选择===
+pandas还提供纯粹基于整数位置的索引方法：通过对象的<code>.iloc</code>属性获得一个索引器，再对其使用索引运算符<code>[]</code>。位置是从0开始的整数。切片时，包含起始位置，不包含结束位置，与Python切片的规则相同。使用非整数（即使是有效的标签）会引发错误；单个整数越界时抛出IndexError，而切片越界不会报错。
+{| class="wikitable" style="width: 100%;"
 |-
-| cov()
+! .iloc索引输入值
-| Compute pairwise covariance of columns, excluding NA/null values.
+! 描述
-| Series.cov(other, min_periods=None, ddof=1)
+! Series示例
-| DataFrame.cov(min_periods=None, ddof=1)
+! DataFrame示例
-|
 |-
-| cummax()
+|单个整数
-| Return cumulative maximum over a DataFrame or Series axis.
+|例如3
-| Series.cummax(axis=None, skipna=True, *args, **kwargs)
+|<code>s.iloc[0]</code> 返回s位置为0的值，即第1个值
-| DataFrame.cummax(axis=None, skipna=True, *args, **kwargs)
+|<code>df.iloc[5]</code> 返回df位置为5的行（Series对象），即df的第6行
-|
 |-
-| cummin()
+|整数列表或数组
-| Return cumulative minimum over a DataFrame or Series axis.
+|如[0,5]（注意：这种方式会有两组方括号<code><nowiki>[[]]</nowiki></code>，里面是生成列表，外面是索引取值操作）
-| Series.cummin(axis=None, skipna=True, *args, **kwargs)
+|<code><nowiki>s.iloc[[0,5]]</nowiki></code>返回s位置为0和5的值（Series对象）
-| DataFrame.cummin(axis=None, skipna=True, *args, **kwargs)
+|<code><nowiki>df.iloc[[2,5]]</nowiki></code>返回df位置为2和5的行（DataFrame对象）
-|
 |-
-| cumprod()
+|整数切片对象
-| Return cumulative product over a DataFrame or Series axis.
+|切片如 3:5表示位置3到位置4（不包含结束位置5），步长切片如 0:5:2表示从位置0到位置5（不含）按步长2选取，即位置0、2、4，还有一些常用示例如：<br \><code>2:</code>从位置2开始到最后<br \><code>:6</code>从最开始到位置5（不含6）<br \><code>:</code>全部位置
-| Series.cumprod(axis=None, skipna=True, *args, **kwargs)
+|<code>s.iloc[3:5]</code> 返回s位置3和4的值（不包含位置5）
-| DataFrame.cumprod(axis=None, skipna=True, *args, **kwargs)
+|<code>df.iloc[3:5]</code> 返回df位置3和4的行（DataFrame对象）
-|
 |-
-| cumsum()
+|行位置索引,列位置索引
-| Return cumulative sum over a DataFrame or Series axis.
+|只有DataFrame可用，格式<code>行位置索引,列位置索引</code>，行位置或列位置可以使用切片或数组等。
-| Series.cumsum(axis=None, skipna=True, *args, **kwargs)
+|&minus;
-| DataFrame.cumsum(axis=None, skipna=True, *args, **kwargs)
+|<code>df.iloc[0, 2]</code>选取第1行第3列的单个值。<br \><code>df.iloc[2:5, 6]</code>返回第3行到第5行中的第7列（Series对象）<br \><code>df.iloc[2:5, 0:2]</code>返回第3行到第5行中的第1列到第2列（DataFrame对象）
-|
 |-
-| describe()
+|布尔数组
-| Generate descriptive statistics.
+|如[True, False, True]。注意布尔数组长度要与轴标签长度相同，否则会抛出IndexError错误。
-| Series.describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)
+|<code><nowiki>s.iloc[[True, False, True]]</nowiki></code> 返回s的第1个和第3个值
-| DataFrame.describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)
+|<code><nowiki>df.iloc[[False, True, True]]</nowiki></code> 返回df的第2行和第3行
-|
 |-
-| diff()
+|可调用对象（callable）
-| First discrete difference of element.
+|函数以调用它的Series或DataFrame为唯一参数，返回上面任意一种有效的位置索引形式
-| Series.diff(periods=1)
-| DataFrame.diff(periods=1, axis=0)
 |
-|-
-| eval()
-| Evaluate a string describing operations on DataFrame columns.
-|
-| DataFrame.eval(expr, inplace=False, **kwargs)
 |
 |-
-| kurt()
+|}
-| Return unbiased kurtosis over requested axis.
-| Series.kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
+{{了解更多
-| DataFrame.kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
+|[https://pandas.pydata.org/docs/user_guide/indexing.html#selection-by-position Pandas 指南：索引与选择数据 - 按位置选择]
-|
+|[https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html Pandas 参考：DataFrame对象 - DataFrame.iloc]
+|[https://pandas.pydata.org/docs/reference/api/pandas.Series.iloc.html Pandas 参考：Series对象 - Series.iloc]
+}}
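下面的示例（数据为假设）演示<code>.iloc</code>的几种常见输入。与<code>.loc</code>不同，这里的切片不包含结束位置；单个整数越界会抛出IndexError，而切片越界不会报错。
<syntaxhighlight lang="python">
import pandas as pd

# 假设的示例数据：6行2列，默认行索引0～5
df = pd.DataFrame({"x": range(6), "y": range(10, 16)})
s = df["x"]

first = s.iloc[0]            # 第1个值，即0
rows = df.iloc[3:5]          # 位置3和4两行，不包含结束位置5
stepped = s.iloc[0:5:2]      # 步长为2：位置0、2、4
cell = df.iloc[0, 1]         # 第1行第2列的单个值，即10
col = df.iloc[2:5, 1]        # 第3～5行的第2列，返回Series
tail = s.iloc[4:100]         # 切片越界不报错，只返回实际存在的部分

out_of_range = False
try:
    s.iloc[100]              # 单个整数越界会抛出IndexError
except IndexError:
    out_of_range = True
</syntaxhighlight>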
+===迭代===
+{| class="wikitable"
+|-
+!属性/方法
+!描述
+!示例
 |-
-| kurtosis()
+| __iter__()
-| Return unbiased kurtosis over requested axis.
+| Series返回值的迭代器 <br \>DataFrame返回列标签的迭代器 <br />Series.__iter__()<br />DataFrame.__iter__()
-| Series.kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
+| <code>s.__iter__()</code>
-| DataFrame.kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
-|
 |-
-| mad()
+| items()
-| Return the mean absolute deviation of the values for the requested axis.
+| Series遍历，返回(索引, 值)的迭代器 <br \>DataFrame按列遍历，返回(列标签, 列Series)的迭代器。<br />Series.items() <br />DataFrame.items()
-| Series.mad(axis=None, skipna=None, level=None)
+| <code>s.items()</code> <br \> <code>df.items()</code> <br \> <code>for label, content in df.items():</code>
-| DataFrame.mad(axis=None, skipna=None, level=None)
-|
 |-
-| max()
+| iteritems()
-| Return the maximum of the values for the requested axis.
+| 返回可迭代的键值对，Series返回索引和值，DataFrame返回列名和列，与items()相同。pandas 1.5起已弃用，2.0中已移除，请改用items()。<br />Series.iteritems()<br />DataFrame.iteritems()
-| Series.max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
-| DataFrame.max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
 |
 |-
-| mean()
+| iterrows()
-| Return the mean of the values for the requested axis.
+| Iterate over DataFrame rows as (index, Series) pairs.<br />DataFrame.iterrows()
-| Series.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
-| DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
 |
 |-
-| median()
+| itertuples()
-| Return the median of the values for the requested axis.
+|Iterate over DataFrame rows as namedtuples.<br />DataFrame.itertuples(index=True, name='Pandas')
-| Series.median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
-| DataFrame.median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
 |
 |-
-| min()
+| apply()
-| Return the minimum of the values for the requested axis.
+| 也可以使用apply()遍历：默认<code>axis=0</code>把每一列作为Series传给函数，<code>axis=1</code>把每一行作为Series传给函数。
-| Series.min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
+| 打印每一列：<syntaxhighlight lang="python" >
-| DataFrame.min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
+def test(x):
-|
+    print(x)
+df.apply(test)
+</syntaxhighlight>
+打印每一行的price：<syntaxhighlight lang="python" >
+def test(x):
+    print(x['price'])
+df.apply(test, axis=1)
+</syntaxhighlight>
+|}
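下面的示例（数据为假设）对比上表几种迭代方式每次返回的内容。逐行迭代一般较慢，数据量较大时建议优先使用向量化运算。
<syntaxhighlight lang="python">
import pandas as pd

# 假设的示例数据
df = pd.DataFrame({"name": ["foo", "bar"], "price": [1.5, 2.0]})

# 直接迭代DataFrame，得到的是列标签
labels = [col for col in df]

# items()：按列迭代，每次得到(列标签, 列Series)
col_lengths = {label: len(col) for label, col in df.items()}

# iterrows()：按行迭代，每次得到(行索引, 行Series)；
# 每行会被转成一个Series，不同类型的列可能被统一成object
row_names = [row["name"] for _, row in df.iterrows()]

# itertuples()：按行迭代，每次得到一个namedtuple，通常比iterrows()快
prices = [t.price for t in df.itertuples(index=False)]
</syntaxhighlight>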
+==处理==
+===重复数据===
+如果要标识或删除重复的行，可以使用<code>duplicated</code>和<code>drop_duplicates</code>方法。
+{| class="wikitable"  style="width: 100%;"
+! 方法
+! 描述
+! 不同对象的方法
+! 示例
 |-
-| mode()
+| duplicated
-| Get the mode(s) of each element along the selected axis.
+| 标识重复行，返回一个布尔值序列。参数：<br \>keep：默认<code>keep='first'</code>，第一次出现的项标记为False，之后的重复项标记为True；<code>keep='last'</code>，最后一次出现的项标记为False，其余重复项标记为True；<code>keep=False</code>，所有重复项都标记为True。
-| Series.mode(dropna=True)
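+| Series.duplicated(keep='first') <br \><br \>DataFrame.duplicated(subset=None, keep='first') <br \><br \>Index.duplicated(keep='first')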
-| DataFrame.mode(axis=0, numeric_only=False, dropna=True)
 |
 |-
-| pct_change()
+| drop_duplicates
-| Percentage change between the current and a prior element.
+| 删除重复行，返回删除后的对象。参数：<br \>keep：默认为<code>keep='first'</code>保留第一次出现的重复项，其他都删除。<code>keep='last'</code>保留最后出现的重复项，其他都删除。<code>keep=False</code>重复项都删除。
-| Series.pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)
+| Series.drop_duplicates(keep='first', inplace=False) <br \><br \>DataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False) <br \><br \>Index.drop_duplicates(keep='first')
-| DataFrame.pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)
+| <code>df.drop_duplicates()</code>删除df中所有列的值都相同的重复行（默认保留第一次出现的行）。<br \><code>df.drop_duplicates(['日期', '品种'])</code>只比较日期和品种两列，删除这两列都相同的重复行
-|
+|}
+{{了解更多
+|[https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#duplicate-data Pandas 指南：索引和数据选择 - 重复数据]
+|[https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html Pandas 参考：DataFrame.drop_duplicates]
+}}
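下面的示例用一组假设的行情数据，演示<code>duplicated()</code>和<code>drop_duplicates()</code>中<code>subset</code>与<code>keep</code>参数的效果。
<syntaxhighlight lang="python">
import pandas as pd

# 假设的示例数据：前两行的日期和品种相同，但价格不同
df = pd.DataFrame({
    "日期": ["2020-01-01", "2020-01-01", "2020-01-02"],
    "品种": ["A", "A", "A"],
    "价格": [10, 11, 12],
})

all_dup = df.duplicated()                          # 比较所有列：没有完全相同的行
key_dup = df.duplicated(["日期", "品种"])            # 只比较日期和品种：第2行是重复项
kept_first = df.drop_duplicates(["日期", "品种"])    # 保留第一次出现的行（价格10）
kept_last = df.drop_duplicates(["日期", "品种"], keep="last")  # 保留最后出现的行（价格11）
no_dup = df.drop_duplicates(["日期", "品种"], keep=False)      # 重复项全部删除，只剩第3行
</syntaxhighlight>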
+===缺失值 NA===
+{| class="wikitable"
+! 名称
+! 描述
+! 示例
 |-
-| prod()
+| 缺失值表示
-| Return the product of the values for the requested axis.
+| <code>NaN</code>：not a number的缩写，是Python的float类型，可以用<code>float('nan')</code>创建。NumPy中的<code>np.nan</code>同样是Python的float类型（<code>np.NaN</code>和<code>np.NAN</code>是它的别名，NumPy 2.0中已移除）。pandas默认用NaN表示浮点数据中的缺失值。<br /><br /><code>None</code>：Python的一种数据类型（NoneType），常出现在object类型的列中。<br /><br /><code>NA</code>：即<code>pd.NA</code>，pandas 1.0起实验性引入的缺失值标量，用于可空整数、布尔和字符串等类型。<br /><br /><code>NaT</code>：表示日期时间（datetime64、timedelta64）类型的缺失值。
-| Series.prod(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
+|
-| DataFrame.prod(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
-|
 |-
-| product()
+| 判断缺失值
-| Return the product of the values for the requested axis.
+| <code>NaN</code>与任何值（包括它自己）比较都不相等，所以不能用<code>==</code>判断。 <br /><br /><code>pd.isna()</code> 判断是否为缺失值，<code>pd.isnull()</code>是它的别名，既可以判断单个值，也可以判断数组。 <br /><br /><code>df.isnull()</code>、<code>s.isnull()</code> 判断DataFrame或Series中每个值是否为缺失值 <br /><code>s.isnull().any()</code> 返回单个布尔值，表示是否有缺失值；<code>df.isnull().any()</code>按列返回布尔Series，需要单个布尔值时用<code>df.isnull().any().any()</code>
-| Series.product(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
+| <code>pd.isna(pd.NA)</code>返回True <br /><code>pd.isna(float('nan'))</code>返回True
-| DataFrame.product(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
-|
 |-
-| quantile()
+|填充缺失值
-| Return values at the given quantile over requested axis.
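+| <code>fillna()</code>，填充缺失值。常用参数：<code>value</code> 填充值；<code>method</code> pad或ffill用前一个有效值填充，backfill或bfill用后一个有效值填充；<code>limit</code> 最多连续填充的个数。pandas 2.1起<code>method</code>参数已弃用，推荐改用<code>ffill()</code>、<code>bfill()</code>。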
-| Series.quantile(q=0.5, interpolation='linear')
+| <code>df.fillna(0)</code>缺失值填充0 <br /> <code>df.fillna(method="pad")</code>用前一个有效值填充 <br /><code>df.fillna(method="pad", limit=1)</code>用前一个有效值填充，但最多连续填充1个
-| DataFrame.quantile(q=0.5, axis=0, numeric_only=True, interpolation='linear')
-|
 |-
-| rank()
-| Compute numerical data ranks (1 through n) along axis.
-| Series.rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)
-| DataFrame.rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)
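-|
+|删除缺失值
-|-
+| <code>dropna()</code>，删除包含缺失值的行或列。常用参数：<code>axis</code> 0删除行，1删除列；<code>how</code> 'any'（默认）只要有缺失值就删除，'all'全部为缺失值才删除；<code>subset</code> 只检查指定的列
-| round()
+| <code>df.dropna()</code>删除含缺失值的行 <br /><code>df.dropna(how='all')</code>删除全部为缺失值的行 <br /><code>df.dropna(subset=['col_1'])</code>删除col_1列为缺失值的行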
-| Round a DataFrame to a variable number of decimal places.
+|}
-| Series.round(decimals=0, *args, **kwargs)
-| DataFrame.round(decimals=0, *args, **kwargs)
+{{了解更多
+|[https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html Pandas 指南：处理缺失数据]
+|[https://numpy.org/doc/stable/reference/constants.html#numpy.nan Numpy API：numpy.nan]
+}}
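下面是一个最小示例（数据为假设），演示判断、填充和删除缺失值。由于pandas 2.1起<code>fillna(method=...)</code>已弃用，示例中改用等价的<code>ffill()</code>。
<syntaxhighlight lang="python">
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

nan_equal = np.nan == np.nan       # False：NaN与任何值都不相等，包括它自己
mask = s.isna()                    # 逐元素判断是否为缺失值
has_na = s.isna().any()            # Series上any()返回单个布尔值
zero_filled = s.fillna(0)          # 缺失值填充0
forward = s.ffill()                # 用前一个有效值填充
forward_once = s.ffill(limit=1)    # 最多连续填充1个，第2个NaN保留
dropped = s.dropna()               # 删除缺失值

df = pd.DataFrame({"a": [1, None], "b": [None, None]})
per_column = df.isna().any()       # DataFrame上any()按列返回布尔Series
any_in_df = df.isna().any().any()  # 再调用一次any()得到单个布尔值
</syntaxhighlight>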
+===类型转换===
+{{了解更多
+|[https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#dtypes Pandas 指南：基础 - dtypes]
+|[https://numpy.org/doc/stable/reference/arrays.scalars.html Numpy 参考：标量 ]
+|[https://numpy.org/doc/stable/reference/arrays.dtypes.html Numpy 参考：数据类型对象(dtype)]
+|[https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html Pandas 参考：DataFrame.astype]
+}}
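下面是一个最小示例，假设有一列数字字符串和一列混有非数字文本的价格：用<code>astype()</code>转换类型，用<code>pd.to_numeric()</code>配合<code>errors='coerce'</code>把无法转换的值变为NaN。
<syntaxhighlight lang="python">
import pandas as pd

# 假设的示例数据：code是数字字符串，price中混有无法转换的'x'
df = pd.DataFrame({"code": ["1", "2", "3"], "price": ["1.5", "2", "x"]})

df["code"] = df["code"].astype(int)                        # 字符串转为整数
df["price"] = pd.to_numeric(df["price"], errors="coerce")  # 无法转换的'x'变为NaN
as_float = df.astype({"code": "float64"})                  # 用字典为指定列设置类型
</syntaxhighlight>
如果直接对price列调用<code>astype(float)</code>，遇到'x'会抛出ValueError，所以含脏数据的列更适合用<code>pd.to_numeric()</code>。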
+===文本数据===
+Series和Index通过<code>str</code>属性提供了一组字符串处理方法，可以方便地对数组中的每个元素进行操作。这些方法会自动跳过缺失值（NA），缺失值在结果中仍为缺失值。
+{| class="wikitable" style="width: 100%;"
+! 方法
+! 描述
+! 示例
+|-
+| upper( )
+| 字符串全部大写
+|<code>s.str.upper( )</code>s字符串全部转为大写
+|-
+| lower( )
+| 字符串全部小写
+|<code>s.str.lower( )</code>s字符串全部转为小写 <br /> <code>df.columns.str.lower()</code>df的列索引全部转为小写
+|-
+| strip() <br />lstrip() <br />rstrip()
+| 删除字符串开始和结束位置的指定字符，默认删除空白字符（空格、换行符、制表符等）。<code>lstrip()</code>只删除左边，<code>rstrip()</code>只删除右边
+|<code>s.str.strip()</code>删除s两端的空白字符。 <br /> <code>s.str.lstrip()</code> 删除开始位置的所有空白字符。<br /> <code>s.str.lstrip('12345.')</code>删除s开始位置属于'12345.'中任意字符的部分，如'1.开始'返回'开始'。 <br /><code>s.str.rstrip()</code>删除结束位置的所有空白字符。<br /><code>s.str.rstrip('\n\t')</code>删除字符串末尾的'\n'或'\t'
+|-
+| split()  <br />rsplit()
+| 字符拆分。<code>rsplit()</code>从结束位置开始拆分。参数：<br /> pat：拆分依据，字符串或正则表达式，默认按空白字符拆分。<br />n：最多拆分的次数，默认全部拆分。<br />expand：是否将拆分结果展开为多列（返回DataFrame），默认不展开，返回由列表组成的Series。
+|<code>s.str.split()</code>s按空白字符全部拆分。<br /><code>s.str.split('/', n=2)</code>s按'/'拆分，且只拆前面两个'/'。<br /><code>s.str.split('/', n=2, expand=True)</code>拆分后把各部分展开为多列。<br /><code>s.str.rsplit('/', n=2)</code>s按'/'拆分，且只拆最后两个'/'。
+|-
+| [https://pandas.pydata.org/docs/reference/api/pandas.Series.str.contains.html contains( )]
+| 测试每个字符串是否包含指定的模式，默认按正则表达式匹配。 <br />如果序列中有缺失值，结果中对应位置也是缺失值，此时把结果用于布尔索引会报错：<code>ValueError: Cannot mask with non-boolean array containing NA / NaN values</code>。可以用<code>na</code>参数把缺失值的结果指定为True或False。
+| <code>df['code'].str.contains('TC', na=False)</code> code列是否包含'TC'，遇到缺失值返回False，返回值是bool的序列。 <br /> <code>df[df['code'].str.contains('TC', na=False)]</code> 筛选出df的'code'列中包含'TC'的行 <br /><code>s.str.contains('TC', regex=False)</code> 把'TC'当作普通字符串（不作为正则表达式），测试是否包含
+|-
+| match( )
+| 从开头位置测试是否匹配正则表达式，返回值是bool的序列。 contains()在字符串中间位置匹配也会返回True，而match()需要从字符串开始位置匹配。
+| <code>df['code'].str.match('TC')</code> code列是否以'TC'开头 <br /><code><nowiki>s.str.match('abc|AF')</nowiki></code> s中是否以'abc'或'AF'开头。
+|-
+| replace()
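+| 替换。参数：<br /><code>pat</code> 要替换的字符串或正则表达式；<br /><code>repl</code> 替换值，字符串或可调用对象；<br /><code>regex</code> 是否把pat当作正则表达式，pandas 2.0起默认为False，此前默认为True。
+| <code>s.str.replace('f.', 'ba', regex=False)</code> 将s中的'f.'按普通字符串替换成'ba'。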
+|-
+| extract( )
+| 提取正则表达式中捕获分组匹配到的内容（只取第一个匹配），每个分组作为一列；<code>expand=False</code>且只有一个分组时返回Series。
 |
 |-
-| sem()
+| extractall( )
-| Return unbiased standard error of the mean over requested axis.
+| Extract capture groups in the regex pat as columns in DataFrame.
-| Series.sem(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
-| DataFrame.sem(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
 |
 |-
-| skew()
+| findall( )
-| Return unbiased skew over requested axis.
+| Find all occurrences of pattern or regular expression in the Series/Index.
-| Series.skew(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
+|
-| DataFrame.skew(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
+|-
+| cat( )
+| Concatenate strings in the Series/Index with given separator.
 |
 |-
-| sum()
+| center( )
-| Return the sum of the values for the requested axis.
+| Pad left and right side of strings in the Series/Index.
-| Series.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
-| DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
 |
 |-
-| std()
+| count( )
-| Return sample standard deviation over requested axis.
+| Count occurrences of pattern in each string of the Series/Index.
-| Series.std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
-| DataFrame.std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
 |
 |-
-| var()
+| decode()
-| Return unbiased variance over requested axis.
+| Decode character string in the Series/Index using indicated encoding.
-| Series.var(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
-| DataFrame.var(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
 |
 |-
-| nunique()
+| encode( )
-| Count distinct observations over requested axis.
+| Encode character string in the Series/Index using indicated encoding.
-| Series.nunique(dropna=True)
-| DataFrame.nunique(axis=0, dropna=True)
 |
 |-
-| value_counts()
+| endswith( )
-| Return a Series containing counts of unique rows in the DataFrame.
+| Test if the end of each string element matches a pattern.
-| Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
+|
-| DataFrame.value_counts(subset=None, normalize=False, sort=True, ascending=False)
+|-
+| find( )
+| Return lowest indexes in each strings in the Series/Index.
+|
+|-
+| get(i)
+| Extract element from each component at specified position.
+|
+|-
+| index( )
+| Return lowest indexes in each string in Series/Index.
 |
-|}
-==索引与轴标签==
-===查看索引===
-<syntaxhighlight lang="python">
- s.index  #Series 索引
- df.index  #DataFrame 索引（行标签）
- df.columns #DataFrame 列标签
-</syntaxhighlight>
-===重新设置索引===
-Series对象和DataFrame对象都由reindex()方法
-==数据选取==
-===选取概览===
-{| class="wikitable" style="width: 100%;
 |-
-! 方法
+| join( )
-! 描述
+| Join lists contained as elements in the Series/Index with passed delimiter.
-! 示例
+|
 |-
-|索引运算符<code>[ ]</code>
+| len( )
-|Python中序列对象使用<code>self[key]</code>是在调用对象的特殊方法<code>__getitem__()</code> 。Python运算符<code>[ ]</code>有3种通用序列操作：<br \> 1.<code>self[i]</code> 取第i项(起始为0)<br \> 2.<code>self[i:j]</code> 从 i 到 j 的切片<br \> 3.<code>self[i:j:k]</code> s 从 i 到 j 步长为 k 的切片 <br \>Pandas支持NumPy扩展的一些操作：<br \>1.<code>self[布尔索引]</code>，如s[s>5]
+| Compute the length of each element in the Series/Index.
-|<code>s[1]</code> 取s的第二个值<br \> <code>df[1:-1]</code>切片，返回df第二行到倒数第二行组成的DataFrame对象
+|
 |-
-|属性运算符<code>. </code>
+| ljust( )
-|同Python字典属性获取
+| Pad right side of strings in the Series/Index.
 |
 |-
-|按标签选择<code>loc[ ]</code>
+| normalize( )
-|基于标签的索引方法，通过对象调用<code>.loc</code>属性生成序列对象，序列对象调用索引运算符<code>[]</code>。
+| Return the Unicode normal form for the strings in the Series/Index.
 |
 |-
-|按位置选择<code>iloc[ ]</code>
+| pad( )
-|纯粹基于整数位置的索引方法，通过对象调用<code>.iloc</code>属性生成序列对象，然后序列对象调用索引运算符<code>[]</code>。
+| Pad strings in the Series/Index up to width.
 |
-|}
-{{了解更多
-|[https://pandas.pydata.org/docs/user_guide/indexing.html Pandas 指南：索引与选择数据]
-|[https://docs.python.org/zh-cn/3/library/stdtypes.html#common-sequence-operations Python 3 文档：序列类型 - 通用序列操作]
-|[https://docs.python.org/zh-cn/3/reference/datamodel.html#special-method-names Python 3 文档：数据模型 -  特殊方法名称]
-|[https://numpy.org/doc/stable/user/absolute_beginners.html#indexing-and-slicing NumPy 文档：初学者基础知识 - 索引和切片]
-}}
-===按标签选择===
-pandas提供基于标签的索引方法，通过对象调用<code>.loc</code>属性生成序列对象，序列对象调用索引运算符<code>[]</code>。该方法严格要求，每个标签都必须在索引中，否则会抛出KeyError错误。切片时，如果索引中存在起始边界和终止边界，则都将包括在内。整数是有效的标签，但它们引用的是标签，而不是位置（索引顺序）。
-{| class="wikitable" style="width: 100%;
 |-
-! .loc索引输入值
+| partition( )
-! 描述
+| Split the string at the first occurrence of sep.
-! Series示例
+|
-! DataFrame示例
 |-
-|单个标签
+| repeat( )
-|例如5或'a'（注意，5被解释为索引的标签，而不是整数位置。）
+| Duplicate each string in the Series or Index.
-|<code>s.loc['a']</code> 返回s索引为'a'的值
+|
-|<code>df.loc['b']</code> 返回df索引（行标签）为'b'的行（Series对象）
 |-
-|标签列表或标签数组
+| rfind( )
-|如['a', 'c']（注意：这种方式会有两组方括号<code><nowiki>[[]]</nowiki></code>，里面是生成列表，外面是索引取值操作）
+| Return highest indexes in each strings in the Series/Index.
-|<code><nowiki>s.loc[['a', 'c']]</nowiki></code>返回s索引为'a'和'c'的值（Series对象）
+|
-|<code><nowiki>df.loc[['a', 'c']]</nowiki></code>返回df索引（行标签）为'a'和'c'的行（DataFrame对象）
 |-
-|带标签的切片对象
+| rindex( )
-|切片如 'a':'f'表示标签'a'到标签'f'，步长切片如 'a':'f':2表示标签'a'到标签'f'按步长2选取（注意：和Python切片不同，这里包含开始标签和结束标签），还有一些常用示例如：<br \><code>'f':</code>从标签'f'开始到最后<br \><code>:'f'</code>从最开始到标签'f'<br \><code>:</code>全部标签
+| Return highest indexes in each string in Series/Index.
-|<code>s.loc[a:c]</code> 返回s索引'a'到'c'的值
+|
-|<code>df.loc[b:f]</code> 返回df索引（行标签）'b'到'f'的行（DataFrame对象）
+|-
+| rjust( )
+| Pad left side of strings in the Series/Index.
+|
+|-
+| rpartition( )
+| Split the string at the last occurrence of sep.
+|
 |-
-|行标签,列标签
+| slice()
-|只有DataFrame可用，格式<code>行标签,列标签</code>，行标签或列标签可以使用切片或数组等。
+| Slice substrings from each element in the Series or Index.
-|&minus;
+|
-|<code>df.loc['a','name']</code>选取索引为'a'，列标签为'name'的单个值。<br \><code>df.loc['a':'c','name' ]</code>返回Series对象<br \><code>df.loc['a':'c','id':'name' ]</code>返回DataFrame对象
 |-
-|布尔数组
+| slice_replace( )
-|如[True, False, True]。注意布尔数组长度要与轴标签长度相同，否则会抛出IndexError错误。
+| Replace a positional slice of a string with another value.
-|<code><nowiki>s.loc[[True, False, True]]</nowiki></code> 返回s的第1个和第3个值
+|
-|<code><nowiki>df.loc[[False, True, True]]</nowiki></code> 返回df的第2行和第3行
 |-
-|callable function
+| startswith( )
-|会返回上面的一种索引形式
+| Test if the start of each string element matches a pattern.
 |
+|-
+| swapcase( )
+| Convert strings in the Series/Index to be swapcased.
 |
 |-
-|}
+| title( )
+| Convert strings in the Series/Index to titlecase.
-{{了解更多
+|
-|[https://pandas.pydata.org/docs/user_guide/indexing.html#selection-by-label Pandas 指南：索引与选择数据 - 按标签选择]
+|-
-|[https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html Pandas 参考：DataFrame对象 - DataFrame.loc]
+| translate( )
-|[https://pandas.pydata.org/docs/reference/api/pandas.Series.loc.html Pandas 参考：Series对象 - Series.loc]
+| Map all characters in the string through the given mapping table.
-}}
+|
-===按位置选择===
-pandas还提供纯粹基于整数位置的索引方法，通过对象调用<code>.iloc</code>属性生成序列对象，然后序列对象调用索引运算符<code>[]</code>。尝试使用非整数，即使有效标签也会引发IndexError。索引是从0开始的整数。切片时，包含起始索引，不包含结束索引。
-{| class="wikitable" style="width: 100%;
 |-
-! .iloc索引输入值
+| wrap( )
-! 描述
+| Wrap strings in Series/Index at specified line width.
-! Series示例
+|
-! DataFrame示例
 |-
-|单个整数
+| zfill( )
-|例如3
+| Pad strings in the Series/Index by prepending ‘0’ characters.
-|<code>s.iloc[0]</code> 返回s位置索引为0的值，即第一值
+|
-|<code>df.iloc[5]</code> 返回df索引为5的行（Series对象），即df的第六行的
 |-
-|整数列表或数组
+| isalnum( )
-|如[0,5]（注意：这种方式会有两组方括号<code><nowiki>[[]]</nowiki></code>，里面是生成列表，外面是索引取值操作）
+| Check whether all characters in each string are alphanumeric.
-|<code><nowiki>s.iloc[[0,5]]</nowiki></code>返回s索引为0和5的值（Series对象）
+|
-|<code><nowiki>df.iloc[[2,5]]</nowiki></code>返回df索引为2和5的行（DataFrame对象）
 |-
-|带标签的切片对象
+| isalpha( )
-|切片如 3:5表示索引3到索引5，步长切片如 0:5:2表示索引0到索引5按步长2选取，还有一些常用示例如：<br \><code>2:</code>从索引2开始到最后<br \><code>:6</code>从最开始到索引6<br \><code>:</code>全部索引
+| Check whether all characters in each string are alphabetic.
-|<code>s.iloc[3:5]</code> 返回s索引3到索引5的值
+|
-|<code>df.iloc[3:5]</code> 返回df索引3到索引5的行（DataFrame对象）
+|-
+| isdigit( )
+| Check whether all characters in each string are digits.
+|
+|-
+| isspace( )
+| Check whether all characters in each string are whitespace.
+|
+|-
+| islower( )
+| Check whether all characters in each string are lowercase.
+|
+|-
+| isupper( )
+| Check whether all characters in each string are uppercase.
+|
 |-
-|行位置索引,列位置索引
+| istitle( )
-|只有DataFrame可用，格式<code>行位置索引,列位置索引</code>，行位置或列位置可以使用切片或数组等。
+| Check whether all characters in each string are titlecase.
-|&minus;
+|
-|<code>df.iloc[0, 2]</code>选取第1行第3列的单个值。<br \><code>df.iloc[2:5, 6 ]</code>返回第3行到5行中的第7列（Series对象）<br \><code>df.iloc[2:5, 0:2 ]</code>返回Data第3行到5行中的第1列到第2列（Frame对象）
 |-
-|布尔数组
+| isnumeric( )
-|如[True, False, True]。注意布尔数组长度要与轴标签长度相同，否则会抛出IndexError错误。
+| Check whether all characters in each string are numeric.
-|<code><nowiki>s.iloc[[True, False, True]]</nowiki></code> 返回s的第1个和第3个值
+|
-|<code><nowiki>df.iloc[[False, True, True]]</nowiki></code> 返回df的第2行和第3行
 |-
-|callable function
+| isdecimal( )
-|会返回上面的一种索引形式
+| Check whether all characters in each string are decimal.
 |
+|-
+| get_dummies( )
+| Return DataFrame of dummy/indicator variables for Series.
 |
 |-
+| capitalize( )
+| 转为首字母大写，其余全部小写的字符串
+|<code>s.str.capitalize()</code>
+|-
+| casefold( )
+| 转为casefold形式的小写，比<code>lower()</code>转换得更彻底（如德语'ß'转为'ss'），适合用于不区分大小写的比较
+| <code>s.str.casefold()</code>
 |}
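下面的示例（数据为假设，包含一个缺失值）演示上表中几个常用的<code>str</code>方法，以及如何用<code>na=False</code>让结果可以直接用于筛选。
<syntaxhighlight lang="python">
import pandas as pd

# 假设的示例数据，其中包含一个缺失值
s = pd.Series([" TC-01 ", "ab/cd/ef", None, "tc02"])

stripped = s.str.strip()                        # 删除两端空白，缺失值仍为缺失值
upper = s.str.upper()                           # 全部转为大写
has_tc = s.str.contains("TC", na=False)         # 缺失值的结果记为False，可直接用于筛选
filtered = s[has_tc]                            # 只保留包含'TC'的元素（区分大小写）
starts = stripped.str.match("TC", na=False)     # 从字符串开头匹配
parts = s.str.split("/", n=1, expand=True)      # 只拆第一个'/'，展开为两列
digits = s.str.extract(r"(\d+)", expand=False)  # 提取第一个数字分组，单个分组返回Series
</syntaxhighlight>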
-{{了解更多
+===数据转换===
-|[https://pandas.pydata.org/docs/user_guide/indexing.html#selection-by-position Pandas 指南：索引与选择数据 - 按位置选择]
+{| class="wikitable"  style="width: 100%;"
-|[https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html Pandas 参考：DataFrame对象 - DataFrame.iloc]
+! 方法或属性
-|[https://pandas.pydata.org/docs/reference/api/pandas.Series.iloc.html Pandas 参考：Series对象 - Series.iloc]
+! 描述
-}}
+! 格式
+! 示例
-==GroupBy分组==
+|-
-===创建GroupBy对象===
+|replace()
-{| class="wikitable" style="width: 100%;
+|替换。参数：<br /><code>to_replace</code> 需要替换的值，可以是：1. 字符串、数字或正则表达式；2. 由1中标量组成的列表，若替换值是等长的列表则按顺序一一替换，若替换值只有一个则全部替换为该值；3. 字典。<br /><code>value</code> 替换值 <br /><code>inplace</code> 是否直接修改原数据，默认False
+| Series.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad') <br />[https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html  DataFrame.replace](to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')
+|<code>df.replace(0, 5)</code> 将df中0替换为5 <br /><code>df.replace([1, 2, 3], 0)</code>将df中1,2,3替换为0 <br /><code>df.replace([1, 2, 3], [3, 2, 1])</code>将df中1,2,3替换为3,2,1
+|-
+|apply()
+| 在行或列上应用函数，可以使用聚合函数或简单转换函数。参数：<br /><code>func</code> 处理函数，可以是Python函数（自定义函数，lambda函数），或NumPy ufunc函数（如np.mean），或函数名（如'mean'）<br /><code>axis</code> 轴，默认axis=0表示在每一列上应用函数，axis=1表示在每行上应用函数。
+|Series.apply(func, convert_dtype=True, args=(), **kwargs) <br /> DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)
+|<code>df.apply(np.mean)</code>返回df每列的平均值。<br /><code>df.apply(np.mean, axis=1)</code>返回df每行的平均值。<br /><code>df.apply(lambda x: x['价格'] + 100, axis=1)</code>返回一个Series，值为价格列每个值+100 <br /><code>df.apply(lambda x: x + 100)</code>df每个元素值+100。<br /><code>df.apply(myfunc)</code>其中myfunc是自定义函数，按照myfunc函数处理返回结果。<br /><code>df.apply(['mean', 'sum'])</code>返回df每列的平均值和每列总和。
 |-
-! 类名
+|applymap()
-! 创建对象方法
+| 在每个元素上应用函数，因此使用聚合函数没有意义。pandas 2.1起已弃用，改名为<code>DataFrame.map()</code>。
-! 完整参数
+|Series无 <br />DataFrame.applymap(func, na_action=None, **kwargs)
-! 示例
+| <code>df.applymap(lambda x:x+100)</code>df每个元素值+100。
 |-
-| SeriesGroupBy
+| agg() <br />aggregate()
-| [https://pandas.pydata.org/docs/reference/api/pandas.Series.groupby.html#pandas.Series.groupby  Series.groupby()]
+|聚合，在行或列上使用一项或多项操作进行汇总。
-| Series.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=<object object>, observed=False, dropna=True)
+|Series.aggregate(func=None, axis=0, *args, **kwargs) <br />DataFrame.aggregate(func=None, axis=0, *args, **kwargs)
-|
+|<code>df.agg(np.mean)</code>返回df每列的平均值 <br /><code>df.agg([np.mean, np.sum])</code>返回df每列的平均值和每列总和。<br /><code>df.agg({'A' : [np.mean, np.sum], 'B' : ['mean', 'max']}) </code> A列计算平均值和总和，B列计算平均值和最大值。
 |-
-|  DataFrameGroupBy
+| transform()
-| [https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html#pandas.DataFrame.groupby DataFrame.groupby()]
+| 在行或列上使用一项或多项操作。转化前和转化后形状要一样，不能使用聚合函数。
-| DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=<object object>, observed=False, dropna=True)
+|Series.transform(func, axis=0, *args, **kwargs) <br />DataFrame.transform(func, axis=0, *args, **kwargs)
-| <code>df.groupby('code')</code>或<code>df.groupby(by='code')</code>按code列分组，创建一个GroupBy对象
+|<code>df.transform(lambda x: x - x.mean())</code>df每列减去该列平均值，形状不变。
 |-
+| pipe()
+| 将自身（Series，DataFrame）传给函数并返回结果，用于在链中调用函数。如df.pipe(myfunc, a=100)就相当于myfunc(df, a=100)
+| Series.pipe(func, *args, **kwargs) <br />DataFrame.pipe(func, *args, **kwargs)
+| <code>df.agg(['mean', 'sum']).pipe(my_table_style, theme='light')</code>数据汇总后再传入自定义的my_table_style()函数进行处理。
 |}
+{{了解更多
+|[https://pandas.pydata.org/docs/user_guide/basics.html#function-application pandas 用户指南：基础功能 - 函数应用]
+|[https://pandas.pydata.org/docs/reference/frame.html#function-application-groupby-window pandas API：DataFrame - 函数应用、GroupBy和窗口函数]
+}}
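下面的示例（数据为假设）对比<code>apply()</code>、<code>agg()</code>、<code>transform()</code>和<code>pipe()</code>的用法。其中<code>add_total()</code>是为演示<code>pipe()</code>而假设的自定义函数，不是pandas的API。
<syntaxhighlight lang="python">
import pandas as pd

# 假设的示例数据
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

col_means = df.apply("mean")                                  # 每列的平均值
row_sums = df.apply(lambda row: row["A"] + row["B"], axis=1)  # axis=1：逐行处理
summary = df.agg({"A": ["mean", "sum"], "B": ["max"]})        # 不同列使用不同的聚合
centered = df.transform(lambda x: x - x.mean())               # 形状不变：每列减去本列平均值


def add_total(frame, col_name="total"):
    """假设的辅助函数，仅用于演示pipe()：追加一列行合计。"""
    return frame.assign(**{col_name: frame.sum(axis=1)})


piped = df.pipe(add_total, col_name="合计")  # 等价于add_total(df, col_name="合计")
</syntaxhighlight>
<code>transform()</code>的结果与原数据形状相同，而<code>agg()</code>会把每列汇总成少量统计值；<code>pipe()</code>不会修改原df，而是返回新对象。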
-===GroupBy属性与方法===
+===重塑===
-====选取与迭代====
 {| class="wikitable"  style="width: 100%;"
+! 方法或属性
+! 描述
+! 格式
+! 示例
 |-
-!属性/方法
+| T
-!描述
+| 转置，即行列互换。Series转置后不变。
-!示例
+| Series.T <br />DataFrame.T
+| <code>df.T</code>df的行变列，列变行。
 |-
-| GroupBy.__iter__（）
+| stack
-| Groupby迭代器
+| 堆叠，将列索引转为行索引。常用于改变多层列索引的DataFrame的形状；只有一层列索引的DataFrame堆叠后变为Series。<br /> 参数：<code>level</code> 要堆叠的列索引层级，可以是整数、层名或它们组成的列表。默认<code>level=-1</code>表示最后一层（最内层）列索引，<code>level=0</code>表示第一层。
-|
+| Series无 <br />DataFrame.stack(level=-1, dropna=True)
-|-
+| <code>df.stack()</code> 将最后一层列索引堆叠到行索引上 <br /><code>df.stack(0)</code> 将第一层列索引堆叠到行索引上 <br /><code>df.stack([0, 1])</code> 将第一层和第二层列索引堆叠到行索引上
-| GroupBy.groups
-| Dict{组名->组数据}
-| for name, group in grouped:<br \>&nbsp;&nbsp;&nbsp;&nbsp;print(name)<br \>&nbsp;&nbsp;&nbsp;&nbsp;print(group )
 |-
-| GroupBy.indices
+| unstack
-| Dict{组名->组索引}
+| 取消堆叠，是stack的逆操作，将行索引（默认最内层）转为列索引。
-|
+| Series.unstack(level=-1, fill_value=None) <br />DataFrame.unstack(level=-1, fill_value=None)
+| <code>df.unstack()</code> 将最后一层行索引转到列索引上。 <code>df.unstack(0)</code> 将第一层行索引转到列索引上。
 |-
-| GroupBy.get_group(name, obj=None)
+| pivot
-| 通过组名选取一个组，返回DataFrame格式。
+| 透视，用指定列的值作为新的行索引和列标签来重塑数据。index与columns的组合必须唯一，否则抛出ValueError；需要对重复值进行聚合时可使用<code>pivot_table()</code>。
-| grouped.get_group('AAPL')
+| DataFrame.pivot(index=None, columns=None, values=None)
+| <code>df.pivot(index='col_1', columns='col_2', values='col_3') </code> 将col_1作为索引，col_2作为列标签，col_3作为值。
 |-
-| pandas.Grouper(*args, **kwargs)
-| x.describe()
 |
-|-
+|
+|
+|
 |}
-====功能应用====
+{{了解更多
-{| class="wikitable"
+|[https://pandas.pydata.org/docs/user_guide/reshaping.html pandas 用户指南：重塑与数据透视]
+|[https://pandas.pydata.org/docs/reference/series.html#reshaping-sorting pandas API：Series - 重塑和排序]
+|[https://pandas.pydata.org/docs/reference/frame.html#reshaping-sorting-transposing pandas API：DataFrame - 重塑和排序]
+}}
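下面的示例用一组假设的“长格式”数据，演示<code>pivot()</code>、<code>stack()</code>、<code>unstack()</code>和<code>T</code>之间的关系。
<syntaxhighlight lang="python">
import pandas as pd

# 假设的“长格式”示例数据：每行是某日期某品种的价格
long = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "code": ["A", "B", "A", "B"],
    "price": [1, 2, 3, 4],
})

# pivot()：date作为行索引，code的取值作为列标签，price作为值
wide = long.pivot(index="date", columns="code", values="price")

stacked = wide.stack()        # 把列标签堆叠到行索引上，得到两层行索引的Series
restored = stacked.unstack()  # 把最内层行索引转回列标签，恢复为wide的形状
transposed = wide.T           # 行列互换
</syntaxhighlight>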
+===排序===
+{| class="wikitable"  style="width: 100%;"
+! 方法或属性
+! 描述
+! 格式
+! 示例
 |-
-!属性/方法
+|sort_values()
-!描述
+|按值排序。<br \>参数：<br \><code>axis</code>：默认axis=0，按某列的值对行排序；axis=1，按某行的值对列排序 <br \><code>by</code>：排序依据，axis=0时为列标签，axis=1时为行标签，可以是单个标签或列表（Series没有此参数） <br \><code>ascending</code>：是否升序，默认ascending=True表示升序，ascending=False表示降序。
-!Series
+|Series.sort_values(axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None) <br \><br \>DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)
-!DataFrame
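+|<code>s.sort_values()</code> sorts s by its values in ascending order <br \><code>df.sort_values(by='col_1')</code> sorts df by the values in column col_1, ascending <br \> <code>df.sort_values(by=['col_1', 'col_2'], ascending=False)</code> sorts df by col_1 in descending order, breaking ties by col_2, also descending.
-!Examples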
 |-
-|GroupBy.apply()
+|sort_index()
-|Apply: applies the function func to each group and combines the results.
+|Sorts by the row labels or the column labels.
-|GroupBy.apply(func, *args, **kwargs)
+|Series.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None) <br \><br \> DataFrame.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)
-|GroupBy.apply(func, *args, **kwargs)
+|<code>s.sort_index()</code> sorts s by its index in ascending order <br \><code>df.sort_index()</code> sorts the rows of df by their index labels in ascending order
-|grouped['C'].apply(lambda x: x.describe())
 |-
-|GroupBy.agg()
+|nlargest()
-|Aggregate; an alias of aggregate().
+|Returns the n largest elements. Equivalent to df.sort_values(columns, ascending=False).head(n), but more efficient.
-|GroupBy.agg(func, *args, **kwargs)
+|Series.nlargest(n=5, keep='first') <br /><br />DataFrame.nlargest(n, columns, keep='first')
-|GroupBy.agg(func, *args, **kwargs)
+|<code>df.nlargest(5, 'col_1')</code> returns the 5 rows with the largest values in column col_1, in descending order.
-|
 |-
-|aggregate()
+|nsmallest()
-|Aggregate: summarizes the data using one or more operations over the specified axis.
+|Returns the n smallest elements.
-|SeriesGroupBy.aggregate(func=None, *args, engine=None, engine_kwargs=None, **kwargs)
+|Series.nsmallest(n=5, keep='first') <br /><br />DataFrame.nsmallest(n, columns, keep='first')
-|DataFrameGroupBy.aggregate(func=None, *args, engine=None, engine_kwargs=None, **kwargs)
+|<code>df.nsmallest(10, columns='col_2')</code> returns the 10 rows with the smallest values in column col_2, in ascending order.
+|}
+{{了解更多
+|[https://pandas.pydata.org/docs/reference/series.html#reshaping-sorting pandas API: Series - Reshaping, sorting]
+|[https://pandas.pydata.org/docs/reference/frame.html#reshaping-sorting-transposing pandas API: DataFrame - Reshaping, sorting, transposing]
+}}
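A minimal sketch of the sorting methods above. The DataFrame is hypothetical; its col_1 and col_2 columns mirror the placeholder names used in the table, and the repeated value 3 in col_1 shows how the second sort key breaks ties.

```python
import pandas as pd

# Hypothetical data; col_1 contains a tie (3 appears twice).
df = pd.DataFrame(
    {"col_1": [3, 1, 3, 2], "col_2": [10, 40, 30, 20]},
    index=["d", "a", "c", "b"],
)

by_value = df.sort_values(by="col_1")                            # ascending by col_1
by_two = df.sort_values(by=["col_1", "col_2"], ascending=False)  # col_1 desc, ties by col_2 desc
by_label = df.sort_index()                                       # ascending by row label
top2 = df.nlargest(2, "col_2")                                   # 2 rows with the largest col_2
low2 = df.nsmallest(2, columns="col_2")                          # 2 rows with the smallest col_2

print(by_two)
```

nlargest() and nsmallest() return their rows already ordered, so no separate sort is needed afterwards.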
+===Combining===
+{| class="wikitable" style="width: 100%;"
+! Method
+! Description
+! Signature
+! Examples
+|-
+| concat()
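+| Concatenates Series or DataFrame objects along an axis.<br \>Parameters:<br \><code>objs</code>: a list or dict of Series or DataFrame objects.<br \><code>axis</code>: {0, 1, …}; the default axis=0 concatenates along the rows (one object below another), axis=1 along the columns (side by side).<br \><code>join</code>: {'inner', 'outer'}, how to handle the labels on the other axis. The default 'outer' takes their union, 'inner' their intersection.<br \><code>ignore_index</code>: boolean, default False, keeps the original labels along the concatenation axis; True discards them and uses integers counting up from 0.<br \><code>keys</code>: list, default None; builds an extra outer level of labels (a hierarchical index) from the list.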
+| pandas.concat(<br \>&nbsp;&nbsp; objs, <br \>&nbsp;&nbsp; axis=0, <br \>&nbsp;&nbsp; join='outer', <br \>&nbsp;&nbsp; ignore_index=False, <br \>&nbsp;&nbsp; keys=None, <br \>&nbsp;&nbsp; levels=None, <br \>&nbsp;&nbsp; names=None, <br \>&nbsp;&nbsp; verify_integrity=False, <br \>&nbsp;&nbsp; sort=False, <br \>&nbsp;&nbsp; copy=True<br \>)
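+| <code>pd.concat([df1, df2])</code> concatenates along the rows <br \><code>pd.concat([df1, df4], axis=1)</code> concatenates along the columns <br \><code>pd.concat([df1, df2, df3], keys=["x", "y", "z"])</code> concatenates along the rows and adds an outer row level made of x, y, z; calling loc["y"] on the result selects the rows that came from df2<br \><code>pd.concat([df1, df4], axis=1, join="inner")</code> concatenates along the columns, keeping only the row labels present in both <br \><code>pd.concat([s1, s2, s3], axis=1, keys=["time", "code", "price"])</code> combines three Series into a DataFrame with columns time, code, price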
+|-
+| append()
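+| Append: Series.append concatenates Series objects; DataFrame.append appends the rows of other DataFrame objects and returns a new DataFrame. Both were deprecated in pandas 1.4.0 and removed in 2.0; use concat() instead.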
+| Series.append(to_append, ignore_index=False, verify_integrity=False)<br \><br \>DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=False)
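+| <code>s1.append(s2)</code> appends s2 after s1 <br \><code>df1.append(df2)</code> appends df2 after df1 and returns the combined DataFrame.<br \><code>df1.append(df2, ignore_index=True)</code> discards the original row labels; the result is labeled with integers counting up from 0.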
+|-
+| [https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html merge()]
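+| Merges DataFrame or named Series objects, similar to a database join.<br \>Parameters:<br \><code>left</code>: a DataFrame or named Series.<br \><code>right</code>: another DataFrame or named Series.<br /><code>how</code>: the type of join, {'left', 'right', 'outer', 'inner', 'cross'}, default 'inner'. <br \><code>on</code>: the join key, i.e. the column or index level name(s) to join on; the names must exist in both left and right. <br /><code>left_on</code> <code>right_on</code>: join keys given separately for each side, for when the column names differ.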
+| pandas.merge(<br \>&nbsp;&nbsp; left, <br \>&nbsp;&nbsp; right, <br \>&nbsp;&nbsp; how='inner', <br \>&nbsp;&nbsp; on=None, <br \>&nbsp;&nbsp; left_on=None, <br \>&nbsp;&nbsp; right_on=None, <br \>&nbsp;&nbsp; left_index=False, <br \>&nbsp;&nbsp; right_index=False, <br \>&nbsp;&nbsp; sort=False, <br \>&nbsp;&nbsp; suffixes=('_x', '_y'), <br \>&nbsp;&nbsp; copy=True, <br \>&nbsp;&nbsp; indicator=False, <br \>&nbsp;&nbsp; validate=None<br \>&nbsp;&nbsp; )
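+|<code>pd.merge(df1, df2, how='left', on=["year", "month"], suffixes=("_left", "_right"))</code> left join on the year and month columns; overlapping column names get the given suffixes <br \><code>df1.merge(df2, left_on='lkey', right_on='rkey')</code> inner join matching df1's lkey column against df2's rkey column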
+|-
+| join()
+| Joins the columns of another DataFrame, either on the index or on a key column (a left join by default).
+| DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
 |
 |-
-|transform()
+| merge_ordered()
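-|Transform: calls a function on each group and returns an object indexed like the original data, holding the transformed values
+| Merges ordered data, such as time series, with optional filling or interpolation.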
-|[https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.SeriesGroupBy.transform.html#pandas.core.groupby.SeriesGroupBy.transform SeriesGroupBy.transform](func, *args, engine=None, engine_kwargs=None, **kwargs)
+|
-|[https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.transform.html#pandas.core.groupby.DataFrameGroupBy.transform DataFrameGroupBy.transform](func, *args, engine=None, engine_kwargs=None, **kwargs)
 |
 |-
-|GroupBy.pipe()
+| merge_asof()
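-|Applies the function func, together with its arguments, to the GroupBy object and returns the function's result.
+| Merges by key distance: similar to a left join, but each left row is matched with the nearest right key (by default the last key less than or equal to the left key) instead of an equal key.
-|GroupBy.pipe(func, *args, **kwargs)
+|
-|GroupBy.pipe(func, *args, **kwargs)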
 |
 |-
-|}
+| assign()
-====Computation / descriptive statistics====
+| Assign new columns to a DataFrame.
-{| class="wikitable sortable"
+| DataFrame.assign(**kwargs)
+|
+|-
+| update()
+| Modify in place using non-NA values from another DataFrame.
+| Series.update(other) <br \>DataFrame.update(other, join='left', overwrite=True, filter_func=None, errors='ignore')
+|
+|-
+| insert()
+| Inserts a column at the specified position.
+| DataFrame.insert(loc, column, value, allow_duplicates=False)
+|
+|}
+{{了解更多
+|[https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html pandas User Guide: Merge, join, concatenate and compare]
+|[https://pandas.pydata.org/docs/reference/frame.html#combining-comparing-joining-merging pandas API: DataFrame - Combining / comparing / joining / merging]
+|[https://pandas.pydata.org/docs/reference/series.html#combining-comparing-joining-merging pandas API: Series - Combining / comparing / joining / merging]
+}}
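The sketch below contrasts concat(), merge() and join() on small hypothetical frames (df1, df2 and prices are invented for illustration). concat() is used to stack rows because DataFrame.append() was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical inputs.
df1 = pd.DataFrame({"key": ["a", "b"], "v1": [1, 2]})
df2 = pd.DataFrame({"key": ["c"], "v1": [3]})
prices = pd.DataFrame({"key": ["a", "c"], "price": [10.0, 30.0]})

# concat along the rows; keys adds an outer index level naming each source.
stacked = pd.concat([df1, df2], keys=["x", "y"])
# ignore_index=True replaces the original row labels with 0..n-1.
flat = pd.concat([df1, df2], ignore_index=True)

# merge works like a SQL join: how="left" keeps every row of `flat`,
# and keys without a match in `prices` get NaN.
merged = pd.merge(flat, prices, how="left", on="key")

# join aligns on the index (a left join by default).
joined = flat.set_index("key").join(prices.set_index("key"))

print(stacked)
print(merged)
```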
+===Comparison===
+{| class="wikitable"
 |-
 !Attribute/method
 @@ Line 956: / Line 988: @@
 !Examples
 |-
-| GroupBy.all()
+|compare()
-| Return True if all values in the group are truthful, else False.
+|Compares two Series or DataFrames and returns the differences. New in version 1.1.0.
-| GroupBy.all(skipna=True)
+|Series.compare(other, align_axis=1, keep_shape=False, keep_equal=False)
-| DataFrameGroupBy.all(skipna=True)
+|DataFrame.compare(other, align_axis=1, keep_shape=False, keep_equal=False)
+|<code>s1.compare(s2)</code>  <code>df.compare(df2)</code>
+|-
+| isin()
+| Whether each element in the Series/DataFrame is contained in values.
+|Series.isin(values)
+|DataFrame.isin(values)
 |
 |-
-| GroupBy.any()
+|equals()
-| Return True if any value in the group is truthful, else False.
+|Test whether two objects contain the same elements.
-| GroupBy.any(skipna=True)
+|Series.equals(other)
-| DataFrameGroupBy.any(skipna=True)
+|DataFrame.equals(other)
-|
+|<code>df.equals(df2)</code>
+|}
+{{了解更多
+|[https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html pandas User Guide: Merge, join, concatenate and compare]
+|[https://pandas.pydata.org/docs/reference/frame.html#combining-comparing-joining-merging pandas API: DataFrame - Combining / comparing / joining / merging]
+|[https://pandas.pydata.org/docs/reference/series.html#combining-comparing-joining-merging pandas API: Series - Combining / comparing / joining / merging]
+}}
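A minimal sketch of equals(), compare() and isin(), using a hypothetical frame and a copy with one modified cell. compare() needs pandas 1.1.0 or later, and both objects must have identical row and column labels.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
df2 = df.copy()
df2.loc[1, "a"] = 20                 # change a single cell

same = df.equals(df.copy())          # True: same shape, values and dtypes
diff = df.compare(df2)               # only the differing rows/columns (pandas >= 1.1.0)
mask = df["b"].isin(["x", "z"])      # element-wise membership test

print(diff)
```

compare() places the two versions side by side under self and other sub-columns.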
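+==Grouping and aggregation==
+===Grouping with GroupBy===
+Grouping and aggregating with GroupBy generally involves the following steps:
+* Split: divide the data into groups based on some criteria.
+* Apply: apply an aggregation, transformation or filter to each group independently.
+* Combine: combine the results into a new data structure.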
+{{了解更多
+|[https://pandas.pydata.org/docs/user_guide/groupby.html pandas User Guide: Group by: split-apply-combine]
+|[https://pandas.pydata.org/docs/reference/groupby.html pandas API reference: GroupBy]
+}}
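To make the split, apply and combine steps concrete, the sketch below performs them by hand on a hypothetical two-column frame and then gets the same result with a single groupby() call.

```python
import pandas as pd

df = pd.DataFrame({"code": ["A", "B", "A", "B"], "price": [1.0, 2.0, 3.0, 6.0]})

# Split-apply-combine written out by hand.
manual = {}
for code in df["code"].unique():          # split: one subset per code
    part = df[df["code"] == code]
    manual[code] = part["price"].mean()   # apply: aggregate each subset
manual = pd.Series(manual)                # combine: collect results in one object

# The same computation with GroupBy in one expression.
auto = df.groupby("code")["price"].mean()

print(auto)
```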
+====Creating a GroupBy object====
+{| class="wikitable" style="width: 100%;"
+|-
+! Class
+! Created by
+! Signature
+! Examples
 |-
-| GroupBy.backfill()
+| SeriesGroupBy
-| Backward fill the values.
+| [https://pandas.pydata.org/docs/reference/api/pandas.Series.groupby.html#pandas.Series.groupby  Series.groupby()]
-| GroupBy.backfill(limit=None)
+| Series.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=<object object>, observed=False, dropna=True)
-| DataFrameGroupBy.backfill(limit=None)
 |
 |-
-| GroupBy.bfill()
+|  DataFrameGroupBy
-| 同 GroupBy.backfill()
+| [https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html#pandas.DataFrame.groupby DataFrame.groupby()]
-| GroupBy.bfill(limit=None)
+| DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=<object object>, observed=False, dropna=True)
-| DataFrameGroupBy.bfill(limit=None)
+| <code>df.groupby('code')</code> or <code>df.groupby(by='code')</code> groups df by the code column and returns a GroupBy object
-|
+|-
+|}
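A short sketch of the two GroupBy classes in the table, using a hypothetical price table: grouping a DataFrame yields a DataFrameGroupBy, while grouping a single column by another Series yields a SeriesGroupBy. Creating either object computes nothing until a method is called on it.

```python
import pandas as pd

df = pd.DataFrame({
    "code": ["AAPL", "MSFT", "AAPL"],
    "close": [300.0, 160.0, 297.0],
})

grouped = df.groupby("code")                   # DataFrameGroupBy; nothing computed yet
s_grouped = df["close"].groupby(df["code"])    # SeriesGroupBy keyed by another Series

print(grouped.ngroups)        # number of groups
print(s_grouped.last())       # last close of each code
```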
+====Selection and iteration====
+{| class="wikitable" style="width: 100%;"
 |-
-| GroupBy.count()
+!Attribute/method
-| Counts the values in each group, excluding missing values.
+!Description
-| GroupBy.count()
+!Examples
-| DataFrameGroupBy.count()
-| grouped.count()
 |-
-| GroupBy.cumcount()
+| GroupBy.__iter__()
-| Number each item in each group from 0 to the length of that group - 1.
+| GroupBy iterator; yields (group name, group data) pairs.
-| GroupBy.cumcount(ascending=True)
+| for name, group in grouped:<br \>&nbsp;&nbsp;&nbsp;&nbsp;print(name)<br \>&nbsp;&nbsp;&nbsp;&nbsp;print(group)
-| DataFrameGroupBy.cumcount(ascending=True)
+|-
-|
+| GroupBy.groups
+| Dict{group name -> group labels}
+| <code>grouped.groups</code>
 |-
-| GroupBy.cummax()
+| GroupBy.indices
-| Cumulative max for each group.
+| Dict{group name -> integer positions of the group's rows}
-| GroupBy.cummax(axis=0, **kwargs)
-| DataFrameGroupBy.cummax(axis=0, **kwargs)
 |
 |-
-| GroupBy.cummin()
+| GroupBy.get_group(name, obj=None)
-| Cumulative min for each group.
+| Selects a single group by its name; returns a DataFrame.
-| GroupBy.cummin(axis=0, **kwargs)
+| grouped.get_group('AAPL')
-| DataFrameGroupBy.cummin(axis=0, **kwargs)
-|
 |-
-| GroupBy.cumprod()
+| pandas.Grouper(*args, **kwargs)
-| Cumulative product for each group.
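+| Specifies a groupby instruction for an object, e.g. <code>df.groupby(pd.Grouper(key='date', freq='W'))</code> groups by week using the date column.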
-| GroupBy.cumprod(axis=0, *args, **kwargs)
-| DataFrameGroupBy.cumprod(axis=0, *args, **kwargs)
 |
 |-
-| GroupBy.cumsum()
+|}
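A short sketch of the selection and iteration tools above, on a hypothetical price table.

```python
import pandas as pd

df = pd.DataFrame({
    "code": ["AAPL", "MSFT", "AAPL"],
    "close": [300.0, 160.0, 297.0],
})
grouped = df.groupby("code")

names = []
for name, group in grouped:          # yields (group name, sub-DataFrame) pairs
    names.append(name)

labels = grouped.groups              # {group name -> index labels of its rows}
positions = grouped.indices          # {group name -> integer positions (NumPy array)}
aapl = grouped.get_group("AAPL")     # a single group as a DataFrame

print(aapl)
```

groups and indices happen to agree here only because df has the default integer index; with other row labels, groups returns those labels while indices still returns positions.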
-| Cumulative sum for each group.
+====Function application====
-| GroupBy.cumsum(axis=0, *args, **kwargs)
+{| class="wikitable"
-| DataFrameGroupBy.cumsum(axis=0, *args, **kwargs)
-|
 |-
-| GroupBy.ffill()
+!Attribute/method
-| Forward fill the values.
+!Description
-| GroupBy.ffill(limit=None)
+!Series
-| DataFrameGroupBy.ffill(limit=None)
+!DataFrame
-|
+!Examples
 |-
-| GroupBy.first()
+|GroupBy.apply()
-| Compute first of group values.
+|Apply: applies the function func to each group and combines the results.
-| colspan="2" |GroupBy.first(numeric_only=False, min_count=-1)
+|GroupBy.apply(func, *args, **kwargs)
-|
+|GroupBy.apply(func, *args, **kwargs)
+|grouped['C'].apply(lambda x: x.describe())
 |-
-| GroupBy.head()
+|GroupBy.agg()
-| Returns the first n rows of each group (5 by default).
+|Aggregate; an alias of aggregate().
-| colspan="2" | GroupBy.head(n=5)
+|GroupBy.agg(func, *args, **kwargs)
+|GroupBy.agg(func, *args, **kwargs)
 |
 |-
-| GroupBy.last()
+|aggregate()
-| Compute last of group values.
+|Aggregate: summarizes the data using one or more operations over the specified axis.
-| colspan="2" | GroupBy.last(numeric_only=False, min_count=-1)
+|SeriesGroupBy.aggregate(func=None, *args, engine=None, engine_kwargs=None, **kwargs)
+|DataFrameGroupBy.aggregate(func=None, *args, engine=None, engine_kwargs=None, **kwargs)
 |
 |-
-| GroupBy.max()
+|transform()
-| Compute max of group values.
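+|Transform: calls a function on each group and returns an object indexed like the original data, holding the transformed values
-| colspan="2" | GroupBy.max(numeric_only=False, min_count=-1)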
+|[https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.SeriesGroupBy.transform.html#pandas.core.groupby.SeriesGroupBy.transform SeriesGroupBy.transform](func, *args, engine=None, engine_kwargs=None, **kwargs)
+|[https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.transform.html#pandas.core.groupby.DataFrameGroupBy.transform DataFrameGroupBy.transform](func, *args, engine=None, engine_kwargs=None, **kwargs)
 |
 |-
-| GroupBy.mean()
+|GroupBy.pipe()
-| Compute mean of groups, excluding missing values.
+|Applies the function func, together with its arguments, to the GroupBy object and returns the function's result.
-| colspan="2" | GroupBy.mean(numeric_only=True)
+|GroupBy.pipe(func, *args, **kwargs)
+|GroupBy.pipe(func, *args, **kwargs)
 |
 |-
-| GroupBy.median()
+|}
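The sketch below runs apply(), agg(), transform() and pipe() on the same hypothetical grouped column to show the shape each returns: agg(), and apply() with a scalar-returning function, give one value per group, while transform() returns one value per original row.

```python
import pandas as pd

df = pd.DataFrame({"code": ["A", "B", "A", "B"], "price": [1.0, 2.0, 3.0, 6.0]})
g = df.groupby("code")["price"]

agg = g.agg(["min", "max"])                       # one row per group, one column per function
demeaned = g.transform(lambda x: x - x.mean())    # same length as df, aligned on its index
spread = g.apply(lambda x: x.max() - x.min())     # arbitrary function, results combined
piped = g.pipe(lambda gb: gb.sum() / gb.count())  # pipe passes the GroupBy object itself

print(agg)
print(demeaned)
```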
-| Compute median of groups, excluding missing values.
+====Computation / descriptive statistics====
-| colspan="2" | GroupBy.median(numeric_only=True)
+{| class="wikitable sortable"
-|
 |-
-| GroupBy.min([numeric_only, min_count])
+!Attribute/method
-| Compute min of group values.
+!Description
-| colspan="2" | GroupBy.min(numeric_only=False, min_count=-1)
+!Series
-|
+!DataFrame
+!Examples
 |-
-| GroupBy.ngroup([ascending])
+| GroupBy.all()
-| Number each group from 0 to the number of groups - 1.
+| Return True if all values in the group are truthful, else False.
-| colspan="2" |  GroupBy.ngroup(ascending=True)
+| GroupBy.all(skipna=True)
+| DataFrameGroupBy.all(skipna=True)
+|
+|-
+| GroupBy.any()
+| Return True if any value in the group is truthful, else False.
+| GroupBy.any(skipna=True)
+| DataFrameGroupBy.any(skipna=True)
 |
 |-
-| GroupBy.nth(n[, dropna])
+| GroupBy.backfill()
-| If n is an integer, takes the nth row of each group; if n is a list of integers, takes a subset of each group's rows.
+| Backward fill the values.
-| colspan="2" | GroupBy.nth(n, dropna=None)
+| GroupBy.backfill(limit=None)
+| DataFrameGroupBy.backfill(limit=None)
 |
 |-
-| GroupBy.ohlc()
+| GroupBy.bfill()
-| Computes the open, high, low and close values of each group, excluding missing values.
+| Same as GroupBy.backfill().
-| colspan="2" | GroupBy.ohlc()
+| GroupBy.bfill(limit=None)
+| DataFrameGroupBy.bfill(limit=None)
 |
 |-
-| GroupBy.pad()
+| GroupBy.count()
-| Forward fill the values.
+| Counts the values in each group, excluding missing values.
-| GroupBy.pad(limit=None)
+| GroupBy.count()
-|DataFrameGroupBy.pad(limit=None)
+| DataFrameGroupBy.count()
-|
+| grouped.count()
 |-
-| GroupBy.prod([numeric_only, min_count])
+| GroupBy.cumcount()
-| Compute prod of group values.
+| Number each item in each group from 0 to the length of that group - 1.
-| colspan="2" | GroupBy.prod(numeric_only=True, min_count=0)
+| GroupBy.cumcount(ascending=True)
-|
+| DataFrameGroupBy.cumcount(ascending=True)
+|
 |-
-| GroupBy.rank([method, ascending, na_option, …])
+| GroupBy.cummax()
-| Provide the rank of values within each group.
+| Cumulative max for each group.
-| GroupBy.rank(method='average', ascending=True, na_option='keep', pct=False, axis=0)
+| GroupBy.cummax(axis=0, **kwargs)
-| DataFrameGroupBy.rank(method='average', ascending=True, na_option='keep', pct=False, axis=0)
+| DataFrameGroupBy.cummax(axis=0, **kwargs)
-|
-|-
-| GroupBy.pct_change([periods, fill_method, …])
-| Calculate pct_change of each value to previous entry in group.
-| GroupBy.pct_change(periods=1, fill_method='pad', limit=None, freq=None, axis=0)
-| DataFrameGroupBy.pct_change(periods=1, fill_method='pad', limit=None, freq=None, axis=0)
 |
 |-
-| GroupBy.size()
+| GroupBy.cummin()
-| Compute group sizes.
+| Cumulative min for each group.
-| GroupBy.size()
+| GroupBy.cummin(axis=0, **kwargs)
-| DataFrameGroupBy.size()
+| DataFrameGroupBy.cummin(axis=0, **kwargs)
 |
 |-
-| GroupBy.sem()
+| GroupBy.cumprod()
-| Compute standard error of the mean of groups, excluding missing values.
+| Cumulative product for each group.
-| colspan="2" | GroupBy.sem(ddof=1)
+| GroupBy.cumprod(axis=0, *args, **kwargs)
+| DataFrameGroupBy.cumprod(axis=0, *args, **kwargs)
 |
 |-
-| GroupBy.std()
+| GroupBy.cumsum()
-| Compute standard deviation of groups, excluding missing values.
+| Cumulative sum for each group.
-| colspan="2" | GroupBy.std(ddof=1)
+| GroupBy.cumsum(axis=0, *args, **kwargs)
+| DataFrameGroupBy.cumsum(axis=0, *args, **kwargs)
 |
 |-
-| GroupBy.sum([numeric_only, min_count])
+| GroupBy.ffill()
-| Compute sum of group values.
+| Forward fill the values.
-| colspan="2" | GroupBy.sum(numeric_only=True, min_count=0)
+| GroupBy.ffill(limit=None)
+| DataFrameGroupBy.ffill(limit=None)
+|
+|-
+| GroupBy.first()
+| Compute first of group values.
+| colspan="2" |GroupBy.first(numeric_only=False, min_count=-1)
+|
+|-
+| GroupBy.head()
+| Returns the first n rows of each group (5 by default).
+| colspan="2" | GroupBy.head(n=5)
 |
 |-
-| GroupBy.var([ddof])
+| GroupBy.last()
-| Compute variance of groups, excluding missing values.
+| Compute last of group values.
-| colspan="2" | GroupBy.var(ddof=1)
+| colspan="2" | GroupBy.last(numeric_only=False, min_count=-1)
 |
 |-
-| GroupBy.tail()
+| GroupBy.max()
-| Returns the last n rows of each group (5 by default).
+| Compute max of group values.
-| colspan="2" | GroupBy.tail(n=5)
+| colspan="2" | GroupBy.max(numeric_only=False, min_count=-1)
 |
-|}
-{{了解更多
-|[https://pandas.pydata.org/docs/user_guide/groupby.html pandas User Guide: Group by: split-apply-combine]
-|[https://pandas.pydata.org/docs/reference/groupby.html pandas API reference: GroupBy]
-}}
-==Time series==
-===Overview===
-pandas captures four general time-related concepts, represented by the classes below.
-{| class="wikitable"
 |-
-! Concept
+| GroupBy.mean()
-! Description
+| Compute mean of groups, excluding missing values.
-! Scalar class
+| colspan="2" | GroupBy.mean(numeric_only=True)
-! Array class
+|
-! pandas data type
-! Primary creation methods
-! Example
 |-
-| Date times
+| GroupBy.median()
-| A specific date and time with time zone support.<br \>Similar to datetime.datetime in the Python standard library.
+| Compute median of groups, excluding missing values.
-| Timestamp
+| colspan="2" | GroupBy.median(numeric_only=True)
-| DatetimeIndex
-| datetime64[ns] <br \>or datetime64[ns, tz]
-| to_datetime <br \>date_range
-| <code>pd.to_datetime('2020-01-01')</code> returns Timestamp('2020-01-01 00:00:00')
-|-
-| Time deltas
-| A duration, i.e. the difference between two dates or times.<br \>Similar to datetime.timedelta in the Python standard library.
-| Timedelta
-| TimedeltaIndex
-| timedelta64[ns]
-| to_timedelta <br \>timedelta_range
 |
 |-
-| Time spans
+| GroupBy.min([numeric_only, min_count])
-| A span of time defined by a point in time and its associated frequency.
+| Compute min of group values.
-| Period
+| colspan="2" | GroupBy.min(numeric_only=False, min_count=-1)
-| PeriodIndex
-| period[freq]
-| Period <br \>period_range
 |
 |-
-| Date offsets
+| GroupBy.ngroup([ascending])
-| A relative date increment that respects calendar arithmetic.
+| Number each group from 0 to the number of groups - 1.
-| DateOffset
+| colspan="2" |  GroupBy.ngroup(ascending=True)
-| None
-| None
-| DateOffset
 |
-|}
+|-
+| GroupBy.nth(n[, dropna])
-{{了解更多
+| 如果参数n是一个整数，则取每个组的第n行；如果n是一个整数列表，则取每组行的子集。
-|[https://pandas.pydata.org/docs/user_guide/timeseries.html pandas documentation: User Guide - Time series / date functionality]
+| colspan="2" | GroupBy.nth(n, dropna=None)
-}}
+|
+|-
-==Merging==
+| GroupBy.ohlc()
-===concat===
+| 计算组的开始值，最高值，最低值和末尾值，不包括缺失值。
-===append===
+| colspan="2" | GroupBy.ohlc()
-===merge===
+|
-===join===
+|-
+| GroupBy.pad()
-==Plotting==
+| Forward fill the values.
-pandas plotting is built on [[Matplotlib]]. Both DataFrame and Series have a plot method that can quickly produce many kinds of charts.
+| GroupBy.pad(limit=None)
+|DataFrameGroupBy.pad(limit=None)
-{{了解更多
+|
-|[https://pandas.pydata.org/docs/user_guide/visualization.html pandas documentation: User Guide - Chart visualization]
+|-
-}}
+| GroupBy.prod([numeric_only, min_count])
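-===Basic plots===
+| Compute prod of group values.
-====Line plot====
+| colspan="2" | GroupBy.prod(numeric_only=True, min_count=0)
-The plot method draws a line plot by default. For example, if prices is a DataFrame with a close column of closing prices, plot the closing price as a line: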
+|
-<syntaxhighlight lang="python" >
+|-
-s = prices['close']
+| GroupBy.rank([method, ascending, na_option, …])
-s.plot()
+| Provide the rank of values within each group.
+| GroupBy.rank(method='average', ascending=True, na_option='keep', pct=False, axis=0)
-# Set the figure size with the figsize parameter
+| DataFrameGroupBy.rank(method='average', ascending=True, na_option='keep', pct=False, axis=0)
-s.plot(figsize=(20,10))
+|
-</syntaxhighlight>
+|-
+| GroupBy.pct_change([periods, fill_method, …])
-====Bar plot====
+| Calculate pct_change of each value to previous entry in group.
-For data with discrete labels rather than a time series, a bar plot can be drawn in either of two ways:
+| GroupBy.pct_change(periods=1, fill_method='pad', limit=None, freq=None, axis=0)
-*call plot() with the kind parameter set to 'bar' or 'barh';
+| DataFrameGroupBy.pct_change(periods=1, fill_method='pad', limit=None, freq=None, axis=0)
-*call the plot.bar() or plot.barh() method.
+|
+|-
-<syntaxhighlight lang="python" >
+| GroupBy.size()
-df.plot(kind='bar')    # assume df holds daily stock data
+| Compute group sizes.
-df.plot.bar()
+| GroupBy.size()
-df.resample('A-DEC').mean().volume.plot(kind='bar')    # resample to the yearly mean trading volume and draw a bar plot (volume is df's volume column)
+| DataFrameGroupBy.size()
+|
-df.plot.bar(stacked=True)    # stacked=True draws a stacked bar plot
+|-
-df.plot.barh(stacked=True)    # barh draws a horizontal bar plot
+| GroupBy.sem()
-</syntaxhighlight>
+| Compute standard error of the mean of groups, excluding missing values.
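-====Histogram====
+| colspan="2" | GroupBy.sem(ddof=1)
-Histograms are drawn with the plot.hist() method, usually as a frequency histogram: the x-axis is divided into bins and the y-axis shows the frequency. The number of bins is set with the bins parameter, e.g. bins=20 for 20 bins.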
+|
-<syntaxhighlight lang="python" >
+|-
-df.volume.plot.hist()    # frequency histogram of the trading volume column in the stock data df
+| GroupBy.std()
-df.plot.hist(alpha=0.5)    # alpha=0.5 makes the bars 50% transparent
+| Compute standard deviation of groups, excluding missing values.
-df.plot.hist(stacked=True, bins=20)    # stacked=True stacks the histograms; bins=20 uses 20 bins
+| colspan="2" | GroupBy.std(ddof=1)
-df.plot.hist(orientation='horizontal')    # orientation='horizontal' draws a horizontal histogram
+|
-df.plot.hist(cumulative=True)    # cumulative histogram
+|-
+| GroupBy.sum([numeric_only, min_count])
-df['close'].diff().hist()    # apply diff() to the closing price, then draw a histogram
+| Compute sum of group values.
-df.hist(color='k', bins=50)     # DataFrame.hist draws each column in a separate subplot
+| colspan="2" | GroupBy.sum(numeric_only=True, min_count=0)
-</syntaxhighlight>
+|
+|-
-====Box plot====
+| GroupBy.var([ddof])
-Box plots can be drawn with the plot.box() method or with DataFrame.boxplot().
+| Compute variance of groups, excluding missing values.
-Parameters:
+| colspan="2" | GroupBy.var(ddof=1)
-*color: sets the colors by passing a dict, e.g. color={'boxes': 'DarkGreen', 'whiskers': 'DarkOrange', 'medians': 'DarkBlue', 'caps': 'Gray'}
+|
-*sym: sets the style of outlier markers, e.g. sym='r+' draws outliers as red plus signs.
+|-
-<syntaxhighlight lang="python" >
+| GroupBy.tail()
-df.plot.box()
+| 返回每组的最后n行，默认5行
-df[['close','open', 'high']].plot.box()
+| colspan="2" | GroupBy.tail(n=5)
-# Change the box colors by passing a color dict
+|
-color={'boxes': 'DarkGreen', 'whiskers': 'DarkOrange', 'medians': 'DarkBlue', 'caps': 'Gray'}
+|}
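-df.plot.box(color=color, sym='r+')    # sym sets the outlier style; 'r+' means red plus signs
-df.plot.box(positions=[1, 4, 5, 6, 8])    # positions sets where each box is drawn: assuming df has 5 columns, the first is drawn at x=1, the second at x=4, and so on
-df.plot.box(vert=False)    # draw horizontal box plots
-df.boxplot()
-# Grouped box plots: the by keyword splits the data into groups and draws a box plot per group. In the example below, each column gets separate boxes for group A and group B.
+===pivot_table: pivot tables===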
-df = pd.DataFrame(np.random.rand(10, 2), columns=['Col1', 'Col2'])
+pandas also provides the pivot_table() function, which works like a pivot table in [[Excel]].
-df['x'] = pd.Series(['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'])
-df.boxplot(by='x')
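-# A second grouping column can be passed to split each group further, e.g.:
+{{了解更多
-df['y'] = pd.Series(['C', 'D'] * 5)
-df.boxplot(column=['Col1', 'Col2'], by=['x', 'y'])
+|[https://pandas.pydata.org/docs/user_guide/reshaping.html#pivot-tables pandas User Guide: Pivot tables]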
-</syntaxhighlight>
+}}
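A minimal pivot_table() sketch on hypothetical sales data. Unlike pivot(), pivot_table() accepts repeated index/column pairs (region N with product x appears twice here) and combines them with aggfunc; fill_value replaces combinations that have no data.

```python
import pandas as pd

# Hypothetical sales records.
sales = pd.DataFrame({
    "region": ["N", "N", "S", "S", "N"],
    "product": ["x", "y", "x", "x", "x"],
    "amount": [10, 20, 30, 40, 50],
})

table = pd.pivot_table(
    sales,
    values="amount",
    index="region",      # rows of the result
    columns="product",   # columns of the result
    aggfunc="sum",       # how duplicate (region, product) pairs are combined
    fill_value=0,        # used where a combination has no rows (S, y)
)

print(table)
```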
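-====Scatter plot====
-Scatter plots are drawn with the DataFrame.plot.scatter() method. The x and y parameters select the columns for the x-axis and y-axis.
-<syntaxhighlight lang="python" >
-df.plot.scatter(x='close', y='volume')    # assuming df holds daily stock data: closing price against trading volume
-# To draw two groups on the same chart, pass the axes of the first plot through the ax parameter, e.g.
+==Computation and statistics==
-ax = df.plot.scatter(x='close', y='volume', color='DarkBlue', label='Group 1')    # label sets the legend label
+===Computation / descriptive statistics===
-df.plot.scatter(x='open', y='volume', color='DarkGreen', label='Group 2', ax=ax)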
+{| class="wikitable"
+|-
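-# The c parameter colors each point according to the value in the volume column.
+!Attribute/method
-df.plot.scatter(x='close', y='open', c='volume', s=50)    # s sets the marker size
+!Description
-df.plot.scatter(x='close', y='open', s=df['volume']/50000)  # the marker size can also follow the values of a column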
+!Series
-</syntaxhighlight>
+!DataFrame
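+!Examples
-====Pie chart====
+|-
-Pie charts are drawn with DataFrame.plot.pie() or Series.plot.pie(). Missing values in the data are filled with 0 automatically.
+| abs()
+| Returns the absolute value of each element of a Series/DataFrame.
-===Other plotting functions===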
+| Series.abs()
-These plotting functions come from the [https://pandas.pydata.org/pandas-docs/stable/reference/plotting.html pandas.plotting] module.
+| DataFrame.abs()
+| <code>s.abs()</code> <br \> <code>df.abs()</code>
-====Scatter matrix plot====
+|-
-A scatter matrix plot is drawn with the scatter_matrix() function.
+| all()
-<syntaxhighlight lang="python" >
+| Return whether all elements are True, potentially over an axis.
-from pandas.plotting import scatter_matrix     # the function must be imported from the module first
+| Series.all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
-scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal='kde')    # assuming df holds daily stock data, plots every column against every other column
+| DataFrame.all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
-</syntaxhighlight>
+|
+|-
-====Density plot====
+| any()
-Density plots are drawn with Series.plot.kde() and DataFrame.plot.kde().
+| Return whether any element is True, potentially over an axis.
- df.plot.kde()
+| Series.any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
+| DataFrame.any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
-====Andrews curves====
+|
-Andrews curves
+|-
+| clip()
-====Parallel coordinates====
+| Trim values at input threshold(s).
+| Series.clip(lower=None, upper=None, axis=None, inplace=False, *args, **kwargs)
-====Lag plot====
+| DataFrame.clip(lower=None, upper=None, axis=None, inplace=False, *args, **kwargs)
+|
-====Autocorrelation plot====
+|-
-Autocorrelation plot
+| corr()
+| Compute pairwise correlation of columns, excluding NA/null values.
-====Bootstrap plot====
+| Series.corr(other, method='pearson', min_periods=None)
+| DataFrame.corr(method='pearson', min_periods=1)
-===Plot formatting===
+|
-====Preset plot styles====
+|-
-Since matplotlib 1.5, a style can be set before plotting with matplotlib.style.use(my_plot_style); for example, matplotlib.style.use('ggplot') produces ggplot-style plots.
+| corrwith()
-====Style parameters====
+| Compute pairwise correlation.
-Most plotting functions accept a set of parameters for setting colors.
+|
+| DataFrame.corrwith(other, axis=0, drop=False, method='pearson')
-====Legend====
+|
-Set the legend parameter to False to hide the legend, e.g.
+|-
- df.plot(legend=False)
+| count()
+| Counts the non-NA values in each column or row.
-====Scales====
+| Series.count(level=None)
-*logy: use a logarithmic scale on the y-axis
+| DataFrame.count(axis=0, level=None, numeric_only=False)
-*logx: use a logarithmic scale on the x-axis
+|<code>s.count()</code><br \><code>df.count()</code><br \><code>df.count(axis='columns')</code>
-*loglog: use logarithmic scales on both the x-axis and the y-axis
+|-
- ts.plot(logy=True)
+| cov()
+| Compute pairwise covariance of columns, excluding NA/null values.
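-====Secondary y-axis====
+| Series.cov(other, min_periods=None, ddof=1)
-When two series share the x-axis but have different y-value ranges, pass secondary_y=True when plotting the second series to give it its own y-axis.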
+| DataFrame.cov(min_periods=None, ddof=1)
-<syntaxhighlight lang="python" >
+|
-# For example, to show the CCI indicator on the closing price chart:
+|-
-prices['close'].plot()
+| cummax()
-prices['cci'].plot(secondary_y=True)
+| Return cumulative maximum over a DataFrame or Series axis.
+| Series.cummax(axis=None, skipna=True, *args, **kwargs)
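-# To put several columns on the secondary axis, pass their names directly
+| DataFrame.cummax(axis=None, skipna=True, *args, **kwargs)
-ax = df.plot(secondary_y=['cci', 'RSI'], mark_right=False)    # by default "(right)" is appended to the legend labels of secondary-axis columns; mark_right=False turns that off
+|
-ax.set_ylabel('CD scale')     # set the left y-axis label
-ax.right_ax.set_ylabel('AB scale')    # set the right y-axis label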
-</syntaxhighlight>
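-====Subplots====
-Each column of a DataFrame can be drawn on its own axes by setting the subplots parameter, e.g.:
- df.plot(subplots=True, figsize=(6, 6))
-====Subplot layout====
-The subplot layout is set with the layout keyword.
-==Input/output==
-pandas reader functions are top-level functions, such as pandas.read_csv(), and generally return a pandas object. Writer functions are methods of the corresponding objects, such as DataFrame.to_csv(), which writes a DataFrame to a CSV file. The table below lists the available readers and writers.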
-{| class="wikitable"
 |-
-! Data description
+| cummin()
-! Format type
+| Return cumulative minimum over a DataFrame or Series axis.
-! Reader
+| Series.cummin(axis=None, skipna=True, *args, **kwargs)
-! Writer
+| DataFrame.cummin(axis=None, skipna=True, *args, **kwargs)
+|
 |-
-| CSV
+| cumprod()
-| text
+| Return cumulative product over a DataFrame or Series axis.
-| read_csv
+| Series.cumprod(axis=None, skipna=True, *args, **kwargs)
-| to_csv
+| DataFrame.cumprod(axis=None, skipna=True, *args, **kwargs)
+|
 |-
-| Fixed-Width Text File
+| cumsum()
-| text
+| Return cumulative sum over a DataFrame or Series axis.
-| read_fwf
+| Series.cumsum(axis=None, skipna=True, *args, **kwargs)
-|
+| DataFrame.cumsum(axis=None, skipna=True, *args, **kwargs)
+|
 |-
-| JSON
+| describe()
-| text
+| Generate descriptive statistics.
-| read_json
+| Series.describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)
-| to_json
+| DataFrame.describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)
+|
 |-
-| HTML
+| diff()
-| text
+| First discrete difference of element.
-| read_html
+| Series.diff(periods=1)
-| to_html
+| DataFrame.diff(periods=1, axis=0)
+|
 |-
-| Local clipboard
+| eval()
-| text
+| Evaluate a string describing operations on DataFrame columns.
-| read_clipboard
+|
-| to_clipboard
+| DataFrame.eval(expr, inplace=False, **kwargs)
+|
 |-
-| MS Excel
+| kurt()
-|
+| Return unbiased kurtosis over requested axis.
-| read_excel
+| Series.kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
-| to_excel
+| DataFrame.kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
+|
 |-
-| OpenDocument
+| kurtosis()
-| binary
+| Return unbiased kurtosis over requested axis.
-| read_excel
+| Series.kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
-|
+| DataFrame.kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
+|
 |-
-| HDF5 Format
+| mad()
-| binary
+| Return the mean absolute deviation of the values for the requested axis. Deprecated in pandas 1.5.0 and removed in 2.0.
-| read_hdf
+| Series.mad(axis=None, skipna=None, level=None)
-| to_hdf
+| DataFrame.mad(axis=None, skipna=None, level=None)
+|
 |-
-| Feather Format
+| max()
-| binary
+| Return the maximum of the values for the requested axis.
-| read_feather
+| Series.max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
-| to_feather
+| DataFrame.max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
+|
 |-
-| Parquet Format
+| mean()
-| binary
+| Return the mean of the values for the requested axis.
-| read_parquet
+| Series.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
-| to_parquet
+| DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
+|
 |-
-| ORC Format
+| median()
-| binary
+| Return the median of the values for the requested axis.
-| read_orc
+| Series.median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
-|
+| DataFrame.median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
+|
 |-
-| Msgpack
+| min()
-| binary
+| Return the minimum of the values for the requested axis.
-| read_msgpack
+| Series.min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
-| to_msgpack
+| DataFrame.min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
+|
 |-
-| Stata
+| mode()
-| binary
+| Get the mode(s) of each element along the selected axis.
-| read_stata
+| Series.mode(dropna=True)
-| to_stata
+| DataFrame.mode(axis=0, numeric_only=False, dropna=True)
+|
 |-
-| SAS
+| pct_change()
-| binary
+| Percentage change between the current and a prior element.
-| read_sas
+| Series.pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)
-|
+| DataFrame.pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)
+|
+|-
+| prod()
+| Return the product of the values for the requested axis.
+| Series.prod(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
+| DataFrame.prod(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
+|
+|-
+| product()
+| Return the product of the values for the requested axis.
+| Series.product(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
+| DataFrame.product(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
+|
+|-
+| quantile()
+| Return values at the given quantile over requested axis.
+| Series.quantile(q=0.5, interpolation='linear')
+| DataFrame.quantile(q=0.5, axis=0, numeric_only=True, interpolation='linear')
+|
 |-
-| SPSS
+| rank()
-| binary
+| Compute numerical data ranks (1 through n) along axis.
-| read_spss
+| Series.rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)
-|
+| DataFrame.rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)
-|-
+|
-| Python Pickle Format
+|-
-| binary
+| round()
-| read_pickle
+| Round a DataFrame to a variable number of decimal places.
-| to_pickle
+| Series.round(decimals=0, *args, **kwargs)
-|-
+| DataFrame.round(decimals=0, *args, **kwargs)
-| SQL
+|
-| SQL
+|-
-| read_sql
+| sem()
-| to_sql
+| Return unbiased standard error of the mean over requested axis.
-|-
+| Series.sem(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
-| Google BigQuery
+| DataFrame.sem(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
-| SQL
+|
-| read_gbq
+|-
-| to_gbq
+| skew()
-|}
+| Return unbiased skew over requested axis.
+| Series.skew(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
+| DataFrame.skew(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
+|
+|-
+| sum()
+| Return the sum of the values for the requested axis.
+| Series.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
+| DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
+|
+|-
+| std()
+| Return sample standard deviation over requested axis.
+| Series.std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
+| DataFrame.std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
+|
+|-
+| var()
+| Return unbiased variance over requested axis.
+| Series.var(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
+| DataFrame.var(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
+|
+|-
+| nunique()
+| Count distinct observations over requested axis.
+| Series.nunique(dropna=True)
+| DataFrame.nunique(axis=0, dropna=True)
+|
+|-
+| value_counts()
+| Return a Series containing counts of unique values (Series) or unique rows (DataFrame).
+| Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
+| DataFrame.value_counts(subset=None, normalize=False, sort=True, ascending=False)
+|
+|}
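A few of the statistics methods above can be tried on a small Series. This is a minimal sketch with made-up numbers. It mostly uses default arguments: rank() averages tied positions (method='average'), and quantile() returns the linearly interpolated median (q=0.5).

```python
import pandas as pd

# Hypothetical sample values; 1 appears twice to show how ties are ranked.
s = pd.Series([3, 1, 4, 1, 5])

ranks = s.rank()                  # default method='average': tied values share the mean rank
ranks_min = s.rank(method="min")  # ties all receive the lowest rank instead
median = s.quantile()             # q=0.5 with linear interpolation
n_distinct = s.nunique()          # number of distinct values
counts = s.value_counts()         # occurrences of each value

print(ranks.tolist())      # [3.0, 1.5, 4.0, 1.5, 5.0]
print(ranks_min.tolist())  # [3.0, 1.0, 4.0, 1.0, 5.0]
print(median)              # 3.0
print(n_distinct)          # 4
print(counts.loc[1])       # 2
```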
+===Binary operator functions===
+{| class="wikitable"
+|-
+!Attribute/Method
+!Description
+!Series
+!DataFrame
+!Example
+|-
+| add()
+| Get Addition of dataframe and other, element-wise (binary operator add).
+| Series.add(other, level=None, fill_value=None, axis=0)
+| DataFrame.add(other, axis='columns', level=None, fill_value=None)
+|
+|-
+| sub()
+| Get Subtraction of dataframe and other, element-wise (binary operator sub).
+| Series.sub(other, level=None, fill_value=None, axis=0)
+| DataFrame.sub(other, axis='columns', level=None, fill_value=None)
+|
+|-
+| mul()
+| Get Multiplication of dataframe and other, element-wise (binary operator mul).
+| Series.mul(other, level=None, fill_value=None, axis=0)
+| DataFrame.mul(other, axis='columns', level=None, fill_value=None)
+|
+|-
+| div()
+| Get Floating division of dataframe and other, element-wise (binary operator truediv).
+| Series.div(other, level=None, fill_value=None, axis=0)
+| DataFrame.div(other, axis='columns', level=None, fill_value=None)
+|
+|-
+| truediv()
+| Get Floating division of dataframe and other, element-wise (binary operator truediv).
+| Series.truediv(other, level=None, fill_value=None, axis=0)
+| DataFrame.truediv(other, axis='columns', level=None, fill_value=None)
+|
+|-
+| floordiv()
+| Get Integer division of dataframe and other, element-wise (binary operator floordiv).
+| Series.floordiv(other, level=None, fill_value=None, axis=0)
+| DataFrame.floordiv(other, axis='columns', level=None, fill_value=None)
+|
+|-
+| mod()
+| Get Modulo of dataframe and other, element-wise (binary operator mod).
+| Series.mod(other, level=None, fill_value=None, axis=0)
+| DataFrame.mod(other, axis='columns', level=None, fill_value=None)
+|
+|-
+| pow()
+| Get Exponential power of dataframe and other, element-wise (binary operator pow).
+| Series.pow(other, level=None, fill_value=None, axis=0)
+| DataFrame.pow(other, axis='columns', level=None, fill_value=None)
+|
+|-
+| dot()
+| Compute the matrix multiplication between the DataFrame and other.
+| Series.dot(other)
+| DataFrame.dot(other)
+|
+|-
+| radd()
+| Get Addition of dataframe and other, element-wise (binary operator radd).
+| Series.radd(other, level=None, fill_value=None, axis=0)
+| DataFrame.radd(other, axis='columns', level=None, fill_value=None)
+|
+|-
+| rsub()
+| Get Subtraction of dataframe and other, element-wise (binary operator rsub).
+| Series.rsub(other, level=None, fill_value=None, axis=0)
+| DataFrame.rsub(other, axis='columns', level=None, fill_value=None)
+|
+|-
+| rmul()
+| Get Multiplication of dataframe and other, element-wise (binary operator rmul).
+| Series.rmul(other, level=None, fill_value=None, axis=0)
+| DataFrame.rmul(other, axis='columns', level=None, fill_value=None)
+|
+|-
+| rdiv()
+| Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
+| Series.rdiv(other, level=None, fill_value=None, axis=0)
+| DataFrame.rdiv(other, axis='columns', level=None, fill_value=None)
+|
+|-
+| rtruediv()
+| Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
+| Series.rtruediv(other, level=None, fill_value=None, axis=0)
+| DataFrame.rtruediv(other, axis='columns', level=None, fill_value=None)
+|
+|-
+| rfloordiv()
+| Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).
+| Series.rfloordiv(other, level=None, fill_value=None, axis=0)
+| DataFrame.rfloordiv(other, axis='columns', level=None, fill_value=None)
+|
+|-
+| rmod()
+| Get Modulo of dataframe and other, element-wise (binary operator rmod).
+| Series.rmod(other, level=None, fill_value=None, axis=0)
+| DataFrame.rmod(other, axis='columns', level=None, fill_value=None)
+|
+|-
+| rpow()
+| Get Exponential power of dataframe and other, element-wise (binary operator rpow).
+| Series.rpow(other, level=None, fill_value=None, axis=0)
+| DataFrame.rpow(other, axis='columns', level=None, fill_value=None)
+|
+|-
+| lt()
+| Get Less than of dataframe and other, element-wise (binary operator lt).
+| Series.lt(other, level=None, fill_value=None, axis=0)
+| DataFrame.lt(other, axis='columns', level=None)
+|
+|-
+| gt()
+| Get Greater than of dataframe and other, element-wise (binary operator gt).
+| Series.gt(other, level=None, fill_value=None, axis=0)
+| DataFrame.gt(other, axis='columns', level=None)
+|
+|-
+| le()
+| Get Less than or equal to of dataframe and other, element-wise (binary operator le).
+| Series.le(other, level=None, fill_value=None, axis=0)
+| DataFrame.le(other, axis='columns', level=None)
+|
+|-
+| ge()
+| Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).
+| Series.ge(other, level=None, fill_value=None, axis=0)
+| DataFrame.ge(other, axis='columns', level=None)
+|
+|-
+| ne()
+| Get Not equal to of dataframe and other, element-wise (binary operator ne).
+| Series.ne(other, level=None, fill_value=None, axis=0)
+| DataFrame.ne(other, axis='columns', level=None)
+|
+|-
+| eq()
+| Get Equal to of dataframe and other, element-wise (binary operator eq).
+| Series.eq(other, level=None, fill_value=None, axis=0)
+| DataFrame.eq(other, axis='columns', level=None)
+|
+|-
+| combine()
+| Perform column-wise combine with another DataFrame.
+| Series.combine(other, func, fill_value=None)
+| DataFrame.combine(other, func, fill_value=None, overwrite=True)
+|
+|-
+| combine_first()
+| Update null elements with value in the same location in other.
+| Series.combine_first(other)
+| DataFrame.combine_first(other)
+|
+|}
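The sketch below shows how the binary operator methods differ from the plain operators. The two Series and their labels are hypothetical. With the plain + operator, unmatched labels become NaN. With add(..., fill_value=0), the missing side is treated as 0. The r-prefixed methods swap the operands.

```python
import pandas as pd

# Two hypothetical Series whose labels only partly overlap.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["y", "z"])

plain = a + b                     # label 'x' has no partner in b, so the result is NaN
filled = a.add(b, fill_value=0)   # the missing side is treated as 0 before adding
reverse = a.rsub(10)              # rsub swaps the operands: 10 - a
mask = a.gt(1)                    # element-wise a > 1, a boolean Series
merged = b.combine_first(a)       # b's values, with gaps filled from a

print(plain.to_dict())    # {'x': nan, 'y': 12.0, 'z': 23.0}
print(filled.to_dict())   # {'x': 1.0, 'y': 12.0, 'z': 23.0}
print(reverse.tolist())   # [9, 8, 7]
print(mask.tolist())      # [False, True, True]
print(merged.to_dict())   # x -> 1, y -> 10, z -> 20
```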
+==Time series==
+===Overview===
+pandas captures four general time-related concepts, each represented by the scalar and array classes listed below.
+{| class="wikitable"
+|-
+! Concept
+! Description
+! Scalar class
+! Array class
+! pandas data type
+! Primary creation method
+! Example
+|-
+| Date times
+| A specific date and time with time zone support.<br />Similar to datetime.datetime in the Python standard library.
+| Timestamp
+| DatetimeIndex
+| datetime64[ns] <br />or datetime64[ns, tz]
+| [https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html to_datetime()] <br />[https://pandas.pydata.org/docs/reference/api/pandas.date_range.html date_range()]
+| <code>pd.to_datetime('2020-01-01')</code> returns Timestamp('2020-01-01 00:00:00') <br /><code>pd.to_datetime(df['date'], format='%Y%m%d')</code> converts the date column (values such as 20201220) to a datetime64[ns] Series <br /><code>pd.date_range("2018-01-01", periods=5, freq="D")</code> returns a DatetimeIndex from 2018-01-01 to 2018-01-05.
+|-
+| Time deltas
+| An absolute duration, i.e. the difference between two dates or times.<br />Similar to datetime.timedelta in the Python standard library.
+| Timedelta
+| TimedeltaIndex
+| timedelta64[ns]
+| to_timedelta() <br />timedelta_range()
+|
+|-
+| Time spans
+| A span of time defined by a point in time and its associated frequency.
+| Period
+| PeriodIndex
+| period[freq]
+| Period() <br />period_range()
+|
+|-
+| Date offsets
+| A relative duration that respects calendar arithmetic.
+| DateOffset
+| None
+| None
+| DateOffset()
+|
+|}
+{{了解更多
+|[https://pandas.pydata.org/docs/user_guide/timeseries.html  pandas User Guide: Time series]
+}}
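A minimal sketch that creates one object for each of the four concepts in the table. The sample dates and the date column are made up for illustration. The printed values assume the default behavior of recent pandas releases.

```python
import pandas as pd

# Hypothetical column of dates stored as yyyymmdd strings.
df = pd.DataFrame({"date": ["20201220", "20201221"]})
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")   # now a datetime64 column

ts = pd.Timestamp("2020-01-01")                           # date time (scalar)
idx = pd.date_range("2018-01-01", periods=5, freq="D")    # date times (DatetimeIndex)
delta = pd.Timedelta(days=1, hours=6)                     # time delta
p = pd.Period("2020-01", freq="M")                        # time span: the whole of January 2020
offset = pd.DateOffset(months=1)                          # date offset (calendar-aware)

print(df["date"].dtype)
print(idx[-1])               # 2018-01-05 00:00:00
print(ts + delta)            # 2020-01-02 06:00:00
print(p.start_time)          # 2020-01-01 00:00:00
print(ts + offset)           # 2020-02-01 00:00:00
```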
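+===Datetime attributes===
+The following are some attributes and methods of the Timestamp and DatetimeIndex classes. On a Series they are accessed through the <code>.dt</code> accessor; for example, <code>df['date'].dt.month</code> returns a Series holding the month of each value in that column.
+{| class="wikitable"
+|-
+! Attribute
+! Description
+! Example
+|-
+| year
+| Year
+| <code>s.dt.year</code> returns the year of each element of s <br /><code>pd.to_datetime('2020-01-01').year</code> returns 2020
+|-
+| month
+| Month
+| <code>s.dt.month</code> returns the month of each element of s
+|-
+| day
+| Day
+|
+|-
+| hour
+| Hour
+|
+|-
+| minute
+| Minute
+|
+|-
+| second
+| Second
+|
+|-
+| microsecond
+| Microsecond
+|
+|-
+| nanosecond
+| Nanosecond
+|
+|-
+| date
+| Date (without time zone information)
+|
+|-
+| time
+| Time (without time zone information)
+|
+|-
+| timetz()
+| Time with local time zone information
+|
+|-
+| day_of_year / dayofyear
+| Ordinal day of the year
+|
+|-
+| week / weekofyear
+| Week ordinal of the year (deprecated on <code>.dt</code> and DatetimeIndex since pandas 1.1; use <code>isocalendar().week</code>)
+|
+|-
+| day_of_week / dayofweek  / weekday
+| Day of the week, Monday=0 through Sunday=6
+|
+|-
+| quarter
+| Quarter of the date, e.g. January to March = 1, April to June = 2
+|
+|-
+| days_in_month
+| Number of days in the month of the date
+|
+|-
+| is_month_start
+| Whether the date is the first day of the month (as defined by the frequency)
+|
+|-
+| is_month_end
+| Whether the date is the last day of the month (as defined by the frequency)
+|
+|-
+| is_quarter_start
+| Whether the date is the first day of a quarter (as defined by the frequency)
+|
+|-
+| is_quarter_end
+| Whether the date is the last day of a quarter (as defined by the frequency)
+|
+|-
+| is_year_start
+| Whether the date is the first day of the year (as defined by the frequency)
+|
+|-
+| is_year_end
+| Whether the date is the last day of the year (as defined by the frequency)
+|
+|-
+| is_leap_year
+| Whether the date falls in a leap year
+|
+|}
+{{了解更多
+|[https://pandas.pydata.org/docs/user_guide/timeseries.html#time-date-components pandas User Guide: Time series / Time/date components]
+|[https://pandas.pydata.org/docs/user_guide/basics.html#dt-accessor pandas User Guide: Essential basic functionality / .dt accessor]
+}}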
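The .dt accessor can be checked on a small Series of made-up dates. The dates are chosen to include two month ends and a leap day.

```python
import pandas as pd

# Hypothetical dates chosen to include a month end and a leap day.
s = pd.Series(pd.to_datetime(["2020-01-31", "2020-02-29", "2021-03-15"]))

print(s.dt.year.tolist())            # [2020, 2020, 2021]
print(s.dt.month.tolist())           # [1, 2, 3]
print(s.dt.dayofweek.tolist())       # [4, 5, 0]  (Friday, Saturday, Monday)
print(s.dt.quarter.tolist())         # [1, 1, 1]
print(s.dt.days_in_month.tolist())   # [31, 29, 31]
print(s.dt.is_month_end.tolist())    # [True, True, False]
print(s.dt.is_leap_year.tolist())    # [True, True, False]

# A single Timestamp exposes the same attributes without .dt
print(pd.Timestamp("2020-01-01").year)   # 2020
```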
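+===Date offsets===
+DateOffset objects shift dates by calendar-aware increments. The frequency strings below are the aliases accepted wherever pandas expects a freq argument.
+{| class="wikitable"
+|-
+! Date offset
+! Frequency string
+! Description
+! Example
+|-
+| DateOffset
+| None
+| Generic offset class, defaults to 24 hours
+|
+|-
+| Day
+| 'D'
+| One day
+|
+|-
+| Hour
+| 'H'
+| One hour
+|
+|-
+| Minute
+| 'T' or 'min'
+| One minute
+|
+|-
+| Second
+| 'S'
+| One second
+|
+|-
+| Milli
+| 'L' or 'ms'
+| One millisecond
+|
+|-
+| Micro
+| 'U' or 'us'
+| One microsecond
+|
+|-
+| Nano
+| 'N'
+| One nanosecond
+|
+|-
+| BDay or BusinessDay
+| 'B'
+| Business day (weekday)
+|
+|-
+| CDay or CustomBusinessDay
+| 'C'
+| Custom business day
+|
+|-
+| Week
+| 'W'
+| One week, optionally anchored on a day of the week
+|
+|-
+| WeekOfMonth
+| 'WOM'
+| The x-th day of the y-th week of each month
+|
+|-
+| LastWeekOfMonth
+| 'LWOM'
+| The x-th day of the last week of each month
+|
+|-
+| MonthEnd
+| 'M'
+| Calendar month end
+|
+|-
+| MonthBegin
+| 'MS'
+| Calendar month begin
+|
+|-
+| BMonthEnd or BusinessMonthEnd
+| 'BM'
+| Business month end
+|
+|-
+| BMonthBegin or BusinessMonthBegin
+| 'BMS'
+| Business month begin
+|
+|-
+| CBMonthEnd or CustomBusinessMonthEnd
+| 'CBM'
+| Custom business month end
+|
+|-
+| CBMonthBegin or CustomBusinessMonthBegin
+| 'CBMS'
+| Custom business month begin
+|
+|-
+| SemiMonthEnd
+| 'SM'
+| The 15th (or another day of the month) and calendar month end
+|
+|-
+| SemiMonthBegin
+| 'SMS'
+| Calendar month begin and the 15th (or another day of the month)
+|
+|-
+| QuarterEnd
+| 'Q'
+| Calendar quarter end
+|
+|-
+| QuarterBegin
+| 'QS'
+| Calendar quarter begin
+|
+|-
+| BQuarterEnd
+| 'BQ'
+| Business quarter end
+|
+|-
+| BQuarterBegin
+| 'BQS'
+| Business quarter begin
+|
+|-
+| FY5253Quarter
+| 'REQ'
+| Retail (aka 52-53 week) quarter
+|
+|-
+| YearEnd
+| 'A'
+| Calendar year end
+|
+|-
+| YearBegin
+| 'AS' or 'YS'
+| Calendar year begin
+|
+|-
+| BYearEnd
+| 'BA'
+| Business year end
+|
+|-
+| BYearBegin
+| 'BAS'
+| Business year begin
+|
+|-
+| FY5253
+| 'RE'
+| Retail (aka 52-53 week) year
+|
+|-
+| Easter
+| None
+| Easter holiday
+|
+|-
+| BusinessHour
+| 'BH'
+| Business hour
+|
+|-
+| CustomBusinessHour
+| 'CBH'
+| Custom business hour
+|
+|}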
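Offsets can be added directly to a Timestamp, and the frequency strings in the table can be passed as freq arguments. The sketch below uses an arbitrary start date (2020-01-15, a Wednesday) and accesses the offset classes through pd.offsets.

```python
import pandas as pd

ts = pd.Timestamp("2020-01-15")   # an arbitrary Wednesday

month_end = ts + pd.offsets.MonthEnd()      # rolls forward to the end of the month
next_month = ts + pd.offsets.MonthBegin()   # rolls forward to the next month start
three_bdays = ts + pd.offsets.BDay(3)       # 3 business days later, skipping the weekend

# Frequency strings from the table work wherever a freq argument is accepted.
starts = pd.date_range("2020-01-01", periods=3, freq="MS")

print(month_end)     # 2020-01-31 00:00:00
print(next_month)    # 2020-02-01 00:00:00
print(three_bdays)   # 2020-01-20 00:00:00
print(list(starts.strftime("%Y-%m-%d")))   # ['2020-01-01', '2020-02-01', '2020-03-01']
```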
+===Time series methods===
+{| class="wikitable"
+|-
+!Attribute/Method
+!Description
+!Series
+!DataFrame
+!Example
+|-
+| asfreq()
+| Convert TimeSeries to specified frequency.
+| Series.asfreq(freq, method=None, how=None, normalize=False, fill_value=None)
+| DataFrame.asfreq(freq, method=None, how=None, normalize=False, fill_value=None)
+|
+|-
+| asof()
+| Return the last row(s) without any NaNs before where.
+| Series.asof(where, subset=None)
+| DataFrame.asof(where, subset=None)
+|
+|-
+| shift()
+| Shift index by desired number of periods with an optional time freq.
+| Series.shift(periods=1, freq=None, axis=0, fill_value=None)
+| DataFrame.shift(periods=1, freq=None, axis=0, fill_value=None)
+|
+|-
+| slice_shift()
+| (DEPRECATED) Equivalent to shift without copying data.
+| Series.slice_shift(periods=1, axis=0)
+| DataFrame.slice_shift(periods=1, axis=0)
+|
+|-
+| tshift()
+| (DEPRECATED) Shift the time index, using the index’s frequency if available.
+| Series.tshift(periods=1, freq=None, axis=0)
+| DataFrame.tshift(periods=1, freq=None, axis=0)
+|
+|-
+| first_valid_index()
+| Return index for first non-NA/null value.
+| Series.first_valid_index()
+| DataFrame.first_valid_index()
+|
+|-
+| last_valid_index()
+| Return index for last non-NA/null value.
+| Series.last_valid_index()
+| DataFrame.last_valid_index()
+|
+|-
+| resample()
+| Resample time-series data.
+| Series.resample(rule, axis=0, closed=None, label=None, convention='start', kind=None, loffset=None, base=None, on=None, level=None, origin='start_day', offset=None)
+| DataFrame.resample(rule, axis=0, closed=None, label=None, convention='start', kind=None, loffset=None, base=None, on=None, level=None, origin='start_day', offset=None)
+|
+|-
+| to_period()
+| Convert DataFrame from DatetimeIndex to PeriodIndex.
+| Series.to_period(freq=None, copy=True)
+| DataFrame.to_period(freq=None, axis=0, copy=True)
+|
+|-
+| to_timestamp()
+| Cast to DatetimeIndex of timestamps, at beginning of period.
+| Series.to_timestamp(freq=None, how='start', copy=True)
+| DataFrame.to_timestamp(freq=None, how='start', axis=0, copy=True)
+|
+|-
+| tz_convert()
+| Convert tz-aware axis to target time zone.
+| Series.tz_convert(tz, axis=0, level=None, copy=True)
+| DataFrame.tz_convert(tz, axis=0, level=None, copy=True)
+|
+|-
+| tz_localize()
+| Localize tz-naive index of a Series or DataFrame to target time zone.
+| Series.tz_localize(tz, axis=0, level=None, copy=True, ambiguous='raise', nonexistent='raise')
+| DataFrame.tz_localize(tz, axis=0, level=None, copy=True, ambiguous='raise', nonexistent='raise')
+|
+|}
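The sketch below applies a few of these methods to a hypothetical six-day price series. It assumes the time zone name Asia/Shanghai is available from the installed time zone database.

```python
import pandas as pd

# Hypothetical daily closing prices.
idx = pd.date_range("2020-01-01", periods=6, freq="D")
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0], index=idx)

prev = close.shift(1)                    # previous day's value; the first element becomes NaN
ret = close / prev - 1                   # simple day-over-day return
three_day = close.resample("3D").sum()   # downsample into 3-day bins
shanghai = close.tz_localize("UTC").tz_convert("Asia/Shanghai")  # attach UTC, then convert

print(prev.head(3).tolist())   # [nan, 10.0, 11.0]
print(round(ret.iloc[1], 4))   # 0.1
print(three_day.tolist())      # [33.0, 42.0]
print(shanghai.index[0])       # 2020-01-01 08:00:00+08:00
```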
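+==Plotting==
+pandas plotting is built on [[Matplotlib]]. Both DataFrame and Series provide a plot method that can quickly produce many kinds of charts.
+{{了解更多
+|[https://pandas.pydata.org/docs/user_guide/visualization.html pandas User Guide: Chart visualization]
+}}
+===Basic plots===
+====Line plot====
+By default, the plot method draws a line plot. For example, if prices is a DataFrame with a close column of closing prices, plot the closing price as a line: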
+<syntaxhighlight lang="python" >
+s = prices['close']
+s.plot()
+# Set the figure size with the figsize parameter (width, height in inches)
+s.plot(figsize=(20,10))
+</syntaxhighlight>
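+====Bar plot====
+For data with discrete labels rather than a time series, draw a bar plot in either of two ways:
+*call plot() with kind='bar' or kind='barh';
+*call plot.bar() or plot.barh().
+<syntaxhighlight lang="python" >
+df.plot(kind='bar')    # assume df holds daily stock data with a DatetimeIndex
+df.plot.bar()
+df.resample('A-DEC').mean().volume.plot(kind='bar')    # resample to the yearly mean volume, then draw a bar plot (volume is df's volume column)
+df.plot.bar(stacked=True)    # stacked=True draws a stacked bar plot
+df.plot.barh(stacked=True)    # barh draws horizontal bars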
+</syntaxhighlight>
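+====Histogram====
+A histogram is drawn with plot.hist(). It usually shows a frequency distribution: the x-axis is divided into bins and the y-axis shows how many values fall into each bin. The number of bins is set with the bins parameter, e.g. bins=20 for 20 bins.
+<syntaxhighlight lang="python" >
+df.volume.plot.hist()    # frequency histogram of the volume column of the stock data df
+df.plot.hist(alpha=0.5)    # alpha=0.5 makes the bars 50% transparent
+df.plot.hist(stacked=True, bins=20)    # stacked=True stacks the histograms; bins=20 uses 20 bins
+df.plot.hist(orientation='horizontal')    # horizontal histogram
+df.plot.hist(cumulative=True)    # cumulative histogram
+df['close'].diff().hist()    # apply diff() to the closing price, then draw a histogram
+df.hist(color='k', bins=50)     # DataFrame.hist draws each column in a separate subplot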
+</syntaxhighlight>
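+====Box plot====
+Box plots are drawn with plot.box() or DataFrame.boxplot().
+Parameters:
+*color: sets the colors by passing a dict, e.g. color={'boxes': 'DarkGreen', 'whiskers': 'DarkOrange', 'medians': 'DarkBlue', 'caps': 'Gray'}
+*sym: sets the outlier marker style, e.g. sym='r+' draws outliers as red plus signs.
+<syntaxhighlight lang="python" >
+df.plot.box()
+df[['close','open', 'high']].plot.box()
+# Change the box colors by passing a color dict
+color={'boxes': 'DarkGreen', 'whiskers': 'DarkOrange', 'medians': 'DarkBlue', 'caps': 'Gray'}
+df.plot.box(color=color, sym='r+')    # sym sets the outlier style; 'r+' means a red plus sign
+df.plot.box(positions=[1, 4, 5, 6, 8])    # positions sets where each box is drawn: with 5 columns in df, the first is drawn at x=1, the second at x=4, and so on
+df.plot.box(vert=False)    # horizontal box plot
+df.boxplot()
+# Grouped box plots: the by keyword splits the rows into groups and draws one box per group. Here each column is plotted separately for group A and group B.
+import numpy as np
+df = pd.DataFrame(np.random.rand(10, 2), columns=['Col1', 'Col2'])
+df['x'] = pd.Series(['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'])
+df.boxplot(by='x')
+# A second grouping column splits the groups further, e.g.:
+df['Y'] = pd.Series(['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'])
+df.boxplot(column=['Col1', 'Col2'], by=['x', 'Y'])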
+</syntaxhighlight>
+====Scatter plot====
+Scatter plots are drawn with DataFrame.plot.scatter(); the x and y parameters name the columns for the x-axis and y-axis.
+<syntaxhighlight lang="python" >
+df.plot.scatter(x='close', y='volume')    # if df holds daily stock data, this plots closing price against volume
+# To draw two groups on the same chart, pass the first chart's axes via the ax parameter
+ax = df.plot.scatter(x='close', y='volume', color='DarkBlue', label='Group 1')    # label sets the legend entry
+df.plot.scatter(x='open', y='volume', color='DarkGreen', label='Group 2', ax=ax)
+# c colors each point by the value of the volume column
+df.plot.scatter(x='close', y='open', c='volume', s=50)    # s sets the marker size
+df.plot.scatter(x='close', y='open', s=df['volume']/50000)  # marker size can also scale with a column's values
+</syntaxhighlight>
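+====Pie chart====
+Pie charts are drawn with DataFrame.plot.pie() or Series.plot.pie(). For a DataFrame, select the column with y= or pass subplots=True to draw one pie per column. Missing values (NaN) are automatically filled with 0.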
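Below is a minimal pie chart sketch. The weights are invented. The Agg backend is selected only so the code runs without a display; in an interactive session that line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical portfolio weights; the NaN for "D" is filled with 0 by plot.pie().
weights = pd.Series([0.5, 0.3, 0.2, float("nan")], index=["A", "B", "C", "D"], name="weight")
ax = weights.plot.pie(figsize=(4, 4))

# For a DataFrame, pick the column to plot with y=.
df = pd.DataFrame({"weight": [0.5, 0.3, 0.2]}, index=["A", "B", "C"])
ax2 = df.plot.pie(y="weight", legend=False)
```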
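+===Other plotting functions===
+These plotting functions come from the [https://pandas.pydata.org/pandas-docs/stable/reference/plotting.html pandas.plotting] module.
+====Scatter matrix plot====
+A scatter matrix is drawn with the scatter_matrix() function.
+<syntaxhighlight lang="python" >
+from pandas.plotting import scatter_matrix     # import the function from pandas.plotting before use
+scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal='kde')    # if df holds daily stock data, plots every column against every other column
+</syntaxhighlight>
+====Density plot====
+Density (kernel density estimate) plots are drawn with Series.plot.kde() and DataFrame.plot.kde(); they require SciPy.
+ df.plot.kde()
+====Andrews curves====
+Andrews curves
+====Parallel coordinates====
+====Lag plot====
+====Autocorrelation plot====
+Autocorrelation plot
+====Bootstrap plot====
+===Plot formatting===
+====Preset plot styles====
+Since matplotlib 1.5, a style can be set before plotting with matplotlib.style.use(my_plot_style); for example, matplotlib.style.use('ggplot') produces ggplot-style plots.
+====Style parameters====
+Most plotting methods accept a set of keyword arguments that control colors.
+====Legends====
+Set the legend parameter to False to hide the legend, e.g.
+ df.plot(legend=False)
+====Scales====
+*logy=True puts the y-axis on a logarithmic scale
+*logx=True puts the x-axis on a logarithmic scale
+*loglog=True puts both axes on a logarithmic scale
+ ts.plot(logy=True)
+====Secondary y-axis====
+When two series share the x-axis but have different y ranges, pass secondary_y=True when plotting the second series to draw it against a second y-axis.
+<syntaxhighlight lang="python" >
+# e.g. show the CCI indicator on the closing-price chart:
+prices['close'].plot()
+prices['cci'].plot(secondary_y=True)
+# To put several columns on the secondary axis, pass their names
+ax = df.plot(secondary_y=['cci', 'RSI'], mark_right=False)    # by default the legend labels secondary-axis columns with "(right)"; mark_right=False turns this off
+ax.set_ylabel('CD scale')     # set the left y-axis label
+ax.right_ax.set_ylabel('AB scale')    # set the right y-axis label
+</syntaxhighlight>
+====Subplots====
+Each column of a DataFrame can be drawn on its own axes by setting the subplots parameter, e.g.:
+ df.plot(subplots=True, figsize=(6, 6))
+====Subplot layout====
+The arrangement of the subplots is set with the layout keyword, given as a (rows, columns) tuple.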
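Below is a sketch of the layout keyword using a made-up three-column DataFrame. With layout=(2, 2), one grid cell is left unused. pandas hides that cell and returns the axes as a 2 x 2 array.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend for running without a display
import numpy as np
import pandas as pd

# Hypothetical random-walk data with three columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 3)).cumsum(axis=0), columns=["a", "b", "c"])

# One subplot per column on a 2 x 2 grid; sharex=False gives each subplot its own x-axis.
axes = df.plot(subplots=True, layout=(2, 2), figsize=(6, 6), sharex=False)

print(axes.shape)                                     # (2, 2)
print(sum(ax.get_visible() for ax in axes.ravel()))   # 3: the unused fourth cell is hidden
```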
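+==Input/output==
+pandas readers are top-level functions, such as pandas.read_csv(), and generally return a pandas object. Writers are methods of the corresponding object, such as DataFrame.to_csv(), which writes a DataFrame to a CSV file. The table below lists the available readers and writers.
+{| class="wikitable"
+|-
+! Data description
+! Format type
+! Reader
+! Writer
+! Example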
+|-
+| CSV
+| text
+| [https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html read_csv]
+| [https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html to_csv]
+| <code>pd.read_csv('test.csv')</code> reads test.csv. <br /><code>pd.read_csv('test.csv', sep='\t', header=0, dtype={'a': np.float64, 'b': np.int32, 'c': 'Int64'} )</code> reads a tab-separated file, using the first row as the header and setting the column types. <br /><code>df.to_csv('out.csv')</code> saves df to out.csv.
+|-
+| Fixed-Width Text File
+| text
+| read_fwf
+|
+|
+|-
+| JSON
+| text
+| read_json
+| to_json
+|
+|-
+| HTML
+| text
+| read_html
+| to_html
+|
+|-
+| Local clipboard
+| text
+| read_clipboard
+| to_clipboard
+|
+|-
+| MS Excel
+| binary
+| read_excel
+| to_excel
+|  <code>pd.read_excel(r'D:\data\test.xlsx', sheet_name="Sheet1")</code> reads Sheet1 of test.xlsx <br /><code>pd.read_excel('test.xlsx',  converters={'日期':lambda x: pd.to_datetime(x, unit='d',  origin='1899-12-30') })</code> if dates are read as plain numbers (Excel serial day counts), this converter turns the 日期 (date) column back into datetimes.
+|-
+| OpenDocument
+| binary
+| read_excel
+|
+|
+|-
+| HDF5 Format
+| binary
+| read_hdf
+| to_hdf
+|
+|-
+| Feather Format
+| binary
+| read_feather
+| to_feather
+|
+|-
+| Parquet Format
+| binary
+| read_parquet
+| to_parquet
+|
+|-
+| ORC Format
+| binary
+| read_orc
+|
+|
+|-
+| Msgpack (removed in pandas 1.0)
+| binary
+| read_msgpack
+| to_msgpack
+|
+|-
+| Stata
+| binary
+| read_stata
+| to_stata
+|
+|-
+| SAS
+| binary
+| read_sas
+|
+|
+|-
+| SPSS
+| binary
+| read_spss
+|
+|
+|-
+| Python Pickle Format
+| binary
+| read_pickle
+| to_pickle
+|
+|-
+| SQL
+| SQL
+| read_sql
+| to_sql
+|
+|-
+| Google BigQuery
+| SQL
+| read_gbq
+| to_gbq
+|
+|}
+{{了解更多
+|[https://pandas.pydata.org/docs/user_guide/io.html pandas User Guide: IO tools]
+}}
+=== CSV ===
+==== Reading CSV ====
+[https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html read_csv]
+Common parameters:
+{| class="wikitable"
+! Parameter
+! Description
+! Example
+|-
+| sep
+| Delimiter. str, default ','
+| <code>pd.read_csv('test.csv', sep='\t')</code>
+|}
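The read_csv call from the input/output table can be tried without a file by wrapping the text in io.StringIO. The column names and values are hypothetical. 'Int64' (capital I) is pandas' nullable integer type, so the empty cell becomes a missing value instead of forcing the column to float.

```python
import io
import numpy as np
import pandas as pd

# In-memory stand-in for a tab-separated test.csv (hypothetical content).
text = "a\tb\tc\n1.5\t2\t3\n4.0\t5\t\n"

df = pd.read_csv(
    io.StringIO(text),
    sep="\t",   # tab-delimited instead of the default ','
    header=0,   # the first line holds the column names
    dtype={"a": np.float64, "b": np.int32, "c": "Int64"},  # 'Int64' is the nullable integer type
)

print(df.dtypes.astype(str).tolist())   # ['float64', 'int32', 'Int64']
print(df["c"].isna().tolist())          # [False, True]

# Round trip: write back to tab-separated text without the index, then read it again.
out = df.to_csv(sep="\t", index=False)
print(pd.read_csv(io.StringIO(out), sep="\t").shape)   # (2, 3)
```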
+==== Writing CSV ====
+ [https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html to_csv]
+Common problems:
+ UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: invalid continuation byte
+This error means the CSV file passed to pd.read_csv() has an [[编码|encoding]] other than [[UTF-8]]. Either re-save the file as UTF-8, or try other common encodings, such as:
+<syntaxhighlight lang="python" >
+df = pd.read_csv('test.csv', encoding='gbk')
+df = pd.read_csv('test.csv', encoding='gb18030')
+df = pd.read_csv('test.csv', encoding='ISO-8859-1')
+df = pd.read_csv('test.csv', encoding='utf-16')
+</syntaxhighlight>
+A similar error can also mean the file is not actually a CSV: if it is really an Excel file, change its extension to .xls or .xlsx and read it with pd.read_excel().
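The trial-and-error approach above can be wrapped in a small helper. read_csv_any_encoding is a hypothetical name, not a pandas function. It tries each encoding in order and reports which one worked. ISO-8859-1 is left out of the default list on purpose: it decodes any byte sequence and never fails, so it only makes sense as a last resort.

```python
import os
import tempfile

import pandas as pd


def read_csv_any_encoding(path, encodings=("utf-8", "gbk", "gb18030", "utf-16"), **kwargs):
    """Hypothetical helper: try each encoding in order and return (DataFrame, encoding used)."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc, **kwargs), enc
        except UnicodeError as err:  # UnicodeDecodeError is a subclass of UnicodeError
            last_error = err
    if last_error is None:
        raise ValueError("encodings must not be empty")
    raise last_error


# Demo: write a small GBK-encoded CSV (not valid UTF-8) to a temporary file.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as f:
    f.write("名称,数量\n苹果,3\n".encode("gbk"))
    path = f.name

df, used = read_csv_any_encoding(path)
os.remove(path)

print(used)                 # gbk
print(df.columns.tolist())  # ['名称', '数量']
```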
+{{了解更多
+|[https://pandas.pydata.org/docs/user_guide/io.html#csv-text-files pandas User Guide: IO tools / CSV & text files]
+}}
+===Excel===
+{{了解更多
+|[https://pandas.pydata.org/docs/user_guide/io.html#excel-files pandas User Guide: IO tools / Excel files]
+|[https://stackoverflow.com/questions/38454403/convert-excel-style-date-with-pandas Stack Overflow: Convert Excel style date with pandas]
+}}
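When Excel dates arrive as numbers, they are day counts in Excel's 1900 date system. The sketch below shows the conversion from the read_excel example in the input/output table. It is applied to a plain Series, so no Excel file is needed. The serial numbers are illustrative values.

```python
import pandas as pd

# Hypothetical Excel serial numbers: 43831 is 2020-01-01 and the .5 adds half a day.
serials = pd.Series([43831, 43832.5])

# origin='1899-12-30' compensates for Excel counting a non-existent 1900-02-29,
# so the conversion is correct for dates from March 1900 onward.
dates = pd.to_datetime(serials, unit="D", origin="1899-12-30")


def excel_serial_to_date(x):
    """The same conversion as a read_excel converter for a single cell value."""
    return pd.to_datetime(x, unit="D", origin="1899-12-30")


print(dates.dt.strftime("%Y-%m-%d %H:%M").tolist())   # ['2020-01-01 00:00', '2020-01-02 12:00']
print(excel_serial_to_date(43831))                    # 2020-01-01 00:00:00
```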
+==Options and settings==
+pandas provides an options API for customizing behavior such as how a DataFrame is displayed. For example, <code>pd.options.display.max_rows = 300</code> displays at most 300 rows of a DataFrame. Options are managed with the five functions below, available directly in the <code>pandas</code> namespace; they accept a regular expression pattern that must match an unambiguous substring of the option name:
+{| class="wikitable"
+! Name
+! Description
+! Example
+|-
+| [https://pandas.pydata.org/docs/reference/api/pandas.get_option.html get_option()] <br />[https://pandas.pydata.org/docs/reference/api/pandas.set_option.html set_option()]
+| Get/set the value of a single option.
+| <code>pd.set_option("display.max_rows", 5)</code> or <code>pd.options.display.max_rows = 5</code> displays at most 5 rows  <br /><code>pd.set_option("display.max_columns", None)</code> displays all columns
+|-
+| [https://pandas.pydata.org/docs/reference/api/pandas.reset_option.html reset_option()]
+| Reset one or more options to their default values.
+| <code>pd.reset_option("display.max_rows")</code> resets the maximum number of displayed rows
+|-
+| [https://pandas.pydata.org/docs/reference/api/pandas.describe_option.html describe_option()]
+| Print the descriptions of one or more options.
+|
+|-
+| [https://pandas.pydata.org/docs/reference/api/pandas.option_context.html option_context()]
+| Execute a code block with a set of options that revert to their prior settings after execution.
+|
+|}
+{{了解更多
+|[https://pandas.pydata.org/docs/user_guide/options.html pandas User Guide: Options and settings]
+}}
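Below is a short sketch of the options API. It only reads and temporarily changes display.max_rows, whose default is 60.

```python
import pandas as pd

original = pd.get_option("display.max_rows")

# Inside the block at most 5 rows are shown; the previous value comes back afterwards.
with pd.option_context("display.max_rows", 5):
    inside = pd.get_option("display.max_rows")
after = pd.get_option("display.max_rows")

# set_option changes the value until it is reset; reset_option restores the default.
pd.set_option("display.max_rows", 20)
pd.reset_option("display.max_rows")
default = pd.get_option("display.max_rows")

print(inside, after == original)   # 5 True
print(default)                     # 60
```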
+==Resources==
+===Official sites===
+* pandas website: https://pandas.pydata.org/
+* pandas documentation: https://pandas.pydata.org/docs/
+* pandas User Guide - 10 minutes to pandas: https://pandas.pydata.org/docs/user_guide/10min.html
+* pandas User Guide: https://pandas.pydata.org/docs/user_guide/index.html
+* pandas API reference: https://pandas.pydata.org/docs/reference/index.html
+* pandas source code: https://github.com/pandas-dev/pandas
+=== Websites ===
+*[https://zh.wikipedia.org/wiki/Pandas Wikipedia: Pandas (Chinese)]
+*[https://en.wikipedia.org/wiki/Pandas_(software) Wikipedia: Pandas (English)]
-==Resources==
-===Official sites===
-*[https://pandas.pydata.org/ pandas website]
-*[https://pandas.pydata.org/docs/ pandas documentation]
-*[https://pandas.pydata.org/docs/user_guide/10min.html pandas User Guide - 10 minutes to pandas]
-*[https://pandas.pydata.org/docs/user_guide/index.html pandas User Guide]
-*[https://pandas.pydata.org/docs/reference/index.html pandas API reference]
-*[https://github.com/pandas-dev/pandas pandas on GitHub]
-===Related websites===
+===Tutorials===
 *[https://quant.itiger.com/tquant/research/hub/classroom/detail?nid=4 Tiger Quant: Introduction to pandas]
+*[https://www.gairuo.com/p/pandas-tutorial Gairuo: pandas tutorial]
 *[https://www.pypandas.cn/docs/ pypandas.cn: pandas documentation]
 *[https://www.yiibai.com/pandas Yiibai Tutorials: pandas]
@@ Line 1,451: / Line 2,470: @@
 ===Books===
 《利用Python进行数据分析 第2版》 (Python for Data Analysis, 2nd edition) - Wes McKinney
-==References==
-*[https://zh.wikipedia.org/wiki/Pandas Wikipedia: Pandas (Chinese)]
-*[https://en.wikipedia.org/wiki/Pandas_(software) Wikipedia: Pandas (English)]
 [[分类:数据分析]]
-[[分类:数据可视化]]

Attribute/Method	Description	Series	DataFrame	Example
abs()	Return the absolute value of each element of the Series/DataFrame.	Series.abs()	DataFrame.abs()	`s.abs()` `df.abs()`
all()	Return whether all elements are True, potentially over an axis.	Series.all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)	DataFrame.all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
any()	Return whether any element is True, potentially over an axis.	Series.any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)	DataFrame.any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)
clip()	Trim values at input threshold(s).	Series.clip(lower=None, upper=None, axis=None, inplace=False, *args, **kwargs)	DataFrame.clip(lower=None, upper=None, axis=None, inplace=False, *args, **kwargs)
corr()	Compute pairwise correlation of columns, excluding NA/null values.	Series.corr(other, method='pearson', min_periods=None)	DataFrame.corr(method='pearson', min_periods=1)
corrwith()	Compute pairwise correlation.		DataFrame.corrwith(other, axis=0, drop=False, method='pearson')
count()	Count the non-NA values in each column or row.	Series.count(level=None)	DataFrame.count(axis=0, level=None, numeric_only=False)	`s.count()` `df.count()` `df.count(axis='columns')`
cov()	Compute pairwise covariance of columns, excluding NA/null values.	Series.cov(other, min_periods=None, ddof=1)	DataFrame.cov(min_periods=None, ddof=1)
cummax()	Return cumulative maximum over a DataFrame or Series axis.	Series.cummax(axis=None, skipna=True, *args, **kwargs)	DataFrame.cummax(axis=None, skipna=True, *args, **kwargs)
cummin()	Return cumulative minimum over a DataFrame or Series axis.	Series.cummin(axis=None, skipna=True, *args, **kwargs)	DataFrame.cummin(axis=None, skipna=True, *args, **kwargs)
cumprod()	Return cumulative product over a DataFrame or Series axis.	Series.cumprod(axis=None, skipna=True, *args, **kwargs)	DataFrame.cumprod(axis=None, skipna=True, *args, **kwargs)
cumsum()	Return cumulative sum over a DataFrame or Series axis.	Series.cumsum(axis=None, skipna=True, *args, **kwargs)	DataFrame.cumsum(axis=None, skipna=True, *args, **kwargs)
describe()	Generate descriptive statistics.	Series.describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)	DataFrame.describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)
diff()	First discrete difference of element.	Series.diff(periods=1)	DataFrame.diff(periods=1, axis=0)
eval()	Evaluate a string describing operations on DataFrame columns.		DataFrame.eval(expr, inplace=False, **kwargs)
kurt()	Return unbiased kurtosis over requested axis.	Series.kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)	DataFrame.kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
kurtosis()	Return unbiased kurtosis over requested axis.	Series.kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)	DataFrame.kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
mad()	Return the mean absolute deviation of the values for the requested axis.	Series.mad(axis=None, skipna=None, level=None)	DataFrame.mad(axis=None, skipna=None, level=None)
max()	Return the maximum of the values for the requested axis.	Series.max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)	DataFrame.max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
mean()	Return the mean of the values for the requested axis.	Series.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)	DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
median()	Return the median of the values for the requested axis.	Series.median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)	DataFrame.median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
min()	Return the minimum of the values for the requested axis.	Series.min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)	DataFrame.min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
mode()	Get the mode(s) of each element along the selected axis.	Series.mode(dropna=True)	DataFrame.mode(axis=0, numeric_only=False, dropna=True)
pct_change()	Percentage change between the current and a prior element.	Series.pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)	DataFrame.pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)
prod()	Return the product of the values for the requested axis.	Series.prod(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)	DataFrame.prod(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
product()	Return the product of the values for the requested axis.	Series.product(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)	DataFrame.product(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
quantile()	Return values at the given quantile over requested axis.	Series.quantile(q=0.5, interpolation='linear')	DataFrame.quantile(q=0.5, axis=0, numeric_only=True, interpolation='linear')
rank()	Compute numerical data ranks (1 through n) along axis.	Series.rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)	DataFrame.rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)
round()	Round a DataFrame to a variable number of decimal places.	Series.round(decimals=0, *args, **kwargs)	DataFrame.round(decimals=0, *args, **kwargs)
sem()	Return unbiased standard error of the mean over requested axis.	Series.sem(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)	DataFrame.sem(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
skew()	Return unbiased skew over requested axis.	Series.skew(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)	DataFrame.skew(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
sum()	Return the sum of the values for the requested axis.	Series.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)	DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
std()	Return sample standard deviation over requested axis.	Series.std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)	DataFrame.std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
var()	Return unbiased variance over requested axis.	Series.var(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)	DataFrame.var(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
nunique()	Count distinct observations over requested axis.	Series.nunique(dropna=True)	DataFrame.nunique(axis=0, dropna=True)
value_counts()	Return a Series containing counts of unique values (Series) or unique rows (DataFrame).	Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)	DataFrame.value_counts(subset=None, normalize=False, sort=True, ascending=False)
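The cumulative and difference functions in this table combine naturally on price data. The sketch below uses hypothetical closing prices. It computes day-over-day changes, returns, the running maximum, and the drawdown from that maximum.

```python
import pandas as pd

close = pd.Series([10.0, 11.0, 9.9, 12.0])   # hypothetical closing prices

change = close.diff()                # day-over-day change; the first value is NaN
returns = close.pct_change()         # (today / yesterday) - 1
running_max = close.cummax()         # highest close seen so far
drawdown = close / running_max - 1   # fraction below the running maximum

print(returns.round(4).tolist()[1:])   # [0.1, -0.1, 0.2121]
print(running_max.tolist())            # [10.0, 11.0, 11.0, 12.0]
print(round(drawdown.min(), 4))        # -0.1
```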