Pandas: Difference between revisions

{{了解更多|[https://pandas.pydata.org/docs/reference/api/pandas.Series.html#pandas.Series Pandas API:pandas.Series]}}
====Series methods====
 
{| class="wikitable"
|-
! Method
! Description
! Example
! Result
|-
| abs()
| Return a Series/DataFrame with the absolute value of each element.
| s.abs()
|
|-
| add(other[, level, fill_value, axis])
| Return Addition of series and other, element-wise (binary operator add).
|
|
|-
| add_prefix(prefix)
| Prefix labels with string prefix.
|-
| add_suffix(suffix)
| Suffix labels with string suffix.
|-
| agg([func, axis])
| Aggregate using one or more operations over the specified axis.
|-
| aggregate([func, axis])
| Aggregate using one or more operations over the specified axis.
|-
| align(other[, join, axis, level, copy, …])
| Align two objects on their axes with the specified join method.
|-
| all([axis, bool_only, skipna, level])
| Return whether all elements are True, potentially over an axis.
|-
| any([axis, bool_only, skipna, level])
| Return whether any element is True, potentially over an axis.
|-
| append(to_append[, ignore_index, …])
| Concatenate two or more Series.
|-
| apply(func[, convert_dtype, args])
| Invoke function on values of Series.
|-
| argmax([axis, skipna])
| Return int position of the largest value in the Series.
|-
| argmin([axis, skipna])
| Return int position of the smallest value in the Series.
|-
| argsort([axis, kind, order])
| Return the integer indices that would sort the Series values.
|-
| asfreq(freq[, method, how, normalize, …])
| Convert TimeSeries to specified frequency.
|-
| asof(where[, subset])
| Return the last row(s) without any NaNs before where.
|-
| astype(dtype[, copy, errors])
| Cast a pandas object to the specified dtype.
|-
| at_time(time[, asof, axis])
| Select values at particular time of day (e.g., 9:30AM).
|-
| autocorr([lag])
| Compute the lag-N autocorrelation.
|-
| backfill([axis, inplace, limit, downcast])
| Synonym for DataFrame.fillna() with method='bfill'.
|-
| between(left, right[, inclusive])
| Return boolean Series equivalent to left <= series <= right.
|-
| between_time(start_time, end_time[, …])
| Select values between particular times of the day (e.g., 9:00-9:30 AM).
|-
| bfill([axis, inplace, limit, downcast])
| Synonym for DataFrame.fillna() with method='bfill'.
|-
| bool()
| Return the bool of a single element Series or DataFrame.
|-
| cat
| alias of pandas.core.arrays.categorical.CategoricalAccessor
|-
| clip([lower, upper, axis, inplace])
| Trim values at input threshold(s).
|-
| combine(other, func[, fill_value])
| Combine the Series with a Series or scalar according to func.
|-
| combine_first(other)
| Combine Series values, choosing the calling Series’s values first.
|-
| compare(other[, align_axis, keep_shape, …])
| Compare to another Series and show the differences.
|-
| convert_dtypes([infer_objects, …])
| Convert columns to best possible dtypes using dtypes supporting pd.NA.
|-
| copy([deep])
| Make a copy of this object’s indices and data.
|-
| corr(other[, method, min_periods])
| Compute correlation with other Series, excluding missing values.
|-
| count([level])
| Return number of non-NA/null observations in the Series.
|-
| cov(other[, min_periods, ddof])
| Compute covariance with Series, excluding missing values.
|-
| cummax([axis, skipna])
| Return cumulative maximum over a DataFrame or Series axis.
|-
| cummin([axis, skipna])
| Return cumulative minimum over a DataFrame or Series axis.
|-
| cumprod([axis, skipna])
| Return cumulative product over a DataFrame or Series axis.
|-
| cumsum([axis, skipna])
| Return cumulative sum over a DataFrame or Series axis.
|-
| describe([percentiles, include, exclude, …])
| Generate descriptive statistics.
|-
| diff([periods])
| First discrete difference of element.
|-
| div(other[, level, fill_value, axis])
| Return Floating division of series and other, element-wise (binary operator truediv).
|-
| divide(other[, level, fill_value, axis])
| Return Floating division of series and other, element-wise (binary operator truediv).
|-
| divmod(other[, level, fill_value, axis])
| Return Integer division and modulo of series and other, element-wise (binary operator divmod).
|-
| dot(other)
| Compute the dot product between the Series and the columns of other.
|-
| drop([labels, axis, index, columns, level, …])
| Return Series with specified index labels removed.
|-
| drop_duplicates([keep, inplace])
| Return Series with duplicate values removed.
|-
| droplevel(level[, axis])
| Return DataFrame with requested index / column level(s) removed.
|-
| dropna([axis, inplace, how])
| Return a new Series with missing values removed.
|-
| dt
| alias of pandas.core.indexes.accessors.CombinedDatetimelikeProperties
|-
| duplicated([keep])
| Indicate duplicate Series values.
|-
| eq(other[, level, fill_value, axis])
| Return Equal to of series and other, element-wise (binary operator eq).
|-
| equals(other)
| Test whether two objects contain the same elements.
|-
| ewm([com, span, halflife, alpha, …])
| Provide exponential weighted (EW) functions.
|-
| expanding([min_periods, center, axis])
| Provide expanding transformations.
|-
| explode([ignore_index])
| Transform each element of a list-like to a row.
|-
| factorize([sort, na_sentinel])
| Encode the object as an enumerated type or categorical variable.
|-
| ffill([axis, inplace, limit, downcast])
| Synonym for DataFrame.fillna() with method='ffill'.
|-
| fillna([value, method, axis, inplace, …])
| Fill NA/NaN values using the specified method.
|-
| filter([items, like, regex, axis])
| Subset the dataframe rows or columns according to the specified index labels.
|-
| first(offset)
| Select initial periods of time series data based on a date offset.
|-
| first_valid_index()
| Return index for first non-NA/null value.
|-
| floordiv(other[, level, fill_value, axis])
| Return Integer division of series and other, element-wise (binary operator floordiv).
|-
| ge(other[, level, fill_value, axis])
| Return Greater than or equal to of series and other, element-wise (binary operator ge).
|-
| get(key[, default])
| Get item from object for given key (ex: DataFrame column).
|-
| groupby([by, axis, level, as_index, sort, …])
| Group Series using a mapper or by a Series of columns.
|-
| gt(other[, level, fill_value, axis])
| Return Greater than of series and other, element-wise (binary operator gt).
|-
| head([n])
| Return the first n rows.
|-
| hist([by, ax, grid, xlabelsize, xrot, …])
| Draw histogram of the input series using matplotlib.
|-
| idxmax([axis, skipna])
| Return the row label of the maximum value.
|-
| idxmin([axis, skipna])
| Return the row label of the minimum value.
|-
| infer_objects()
| Attempt to infer better dtypes for object columns.
|-
| interpolate([method, axis, limit, inplace, …])
| Fill NaN values using an interpolation method. Only method='linear' is supported for a DataFrame/Series with a MultiIndex.
|-
| isin(values)
| Whether elements in Series are contained in values.
|-
| isna()
| Detect missing values.
|-
| isnull()
| Detect missing values.
|-
| item()
| Return the first element of the underlying data as a python scalar.
|-
| items()
| Lazily iterate over (index, value) tuples.
|-
| iteritems()
| Lazily iterate over (index, value) tuples.
|-
| keys()
| Return alias for index.
|-
| kurt([axis, skipna, level, numeric_only])
| Return unbiased kurtosis over requested axis.
|-
| kurtosis([axis, skipna, level, numeric_only])
| Return unbiased kurtosis over requested axis.
|-
| last(offset)
| Select final periods of time series data based on a date offset.
|-
| last_valid_index()
| Return index for last non-NA/null value.
|-
| le(other[, level, fill_value, axis])
| Return Less than or equal to of series and other, element-wise (binary operator le).
|-
| lt(other[, level, fill_value, axis])
| Return Less than of series and other, element-wise (binary operator lt).
|-
| mad([axis, skipna, level])
| Return the mean absolute deviation of the values for the requested axis.
|-
| map(arg[, na_action])
| Map values of Series according to input correspondence.
|-
| mask(cond[, other, inplace, axis, level, …])
| Replace values where the condition is True.
|-
| max([axis, skipna, level, numeric_only])
| Return the maximum of the values for the requested axis.
|-
| mean([axis, skipna, level, numeric_only])
| Return the mean of the values for the requested axis.
|-
| median([axis, skipna, level, numeric_only])
| Return the median of the values for the requested axis.
|-
| memory_usage([index, deep])
| Return the memory usage of the Series.
|-
| min([axis, skipna, level, numeric_only])
| Return the minimum of the values for the requested axis.
|-
| mod(other[, level, fill_value, axis])
| Return Modulo of series and other, element-wise (binary operator mod).
|-
| mode([dropna])
| Return the mode(s) of the dataset.
|-
| mul(other[, level, fill_value, axis])
| Return Multiplication of series and other, element-wise (binary operator mul).
|-
| multiply(other[, level, fill_value, axis])
| Return Multiplication of series and other, element-wise (binary operator mul).
|-
| ne(other[, level, fill_value, axis])
| Return Not equal to of series and other, element-wise (binary operator ne).
|-
| nlargest([n, keep])
| Return the largest n elements.
|-
| notna()
| Detect existing (non-missing) values.
|-
| notnull()
| Detect existing (non-missing) values.
|-
| nsmallest([n, keep])
| Return the smallest n elements.
|-
| nunique([dropna])
| Return number of unique elements in the object.
|-
| pad([axis, inplace, limit, downcast])
| Synonym for DataFrame.fillna() with method='ffill'.
|-
| pct_change([periods, fill_method, limit, freq])
| Percentage change between the current and a prior element.
|-
| pipe(func, *args, **kwargs)
| Apply func(self, *args, **kwargs).
|-
| plot
| alias of pandas.plotting._core.PlotAccessor
|-
| pop(item)
| Return the item and drop it from the Series.
|-
| pow(other[, level, fill_value, axis])
| Return Exponential power of series and other, element-wise (binary operator pow).
|-
| prod([axis, skipna, level, numeric_only, …])
| Return the product of the values for the requested axis.
|-
| product([axis, skipna, level, numeric_only, …])
| Return the product of the values for the requested axis.
|-
| quantile([q, interpolation])
| Return value at the given quantile.
|-
| radd(other[, level, fill_value, axis])
| Return Addition of series and other, element-wise (binary operator radd).
|-
| rank([axis, method, numeric_only, …])
| Compute numerical data ranks (1 through n) along axis.
|-
| ravel([order])
| Return the flattened underlying data as an ndarray.
|-
| rdiv(other[, level, fill_value, axis])
| Return Floating division of series and other, element-wise (binary operator rtruediv).
|-
| rdivmod(other[, level, fill_value, axis])
| Return Integer division and modulo of series and other, element-wise (binary operator rdivmod).
|-
| reindex([index])
| Conform Series to new index with optional filling logic.
|-
| reindex_like(other[, method, copy, limit, …])
| Return an object with matching indices as other object.
|-
| rename([index, axis, copy, inplace, level, …])
| Alter Series index labels or name.
|-
| rename_axis(**kwargs)
| Set the name of the axis for the index or columns.
|-
| reorder_levels(order)
| Rearrange index levels using input order.
|-
| repeat(repeats[, axis])
| Repeat elements of a Series.
|-
| replace([to_replace, value, inplace, limit, …])
| Replace values given in to_replace with value.
|-
| resample(rule[, axis, closed, label, …])
| Resample time-series data.
|-
| reset_index([level, drop, name, inplace])
| Generate a new DataFrame or Series with the index reset.
|-
| rfloordiv(other[, level, fill_value, axis])
| Return Integer division of series and other, element-wise (binary operator rfloordiv).
|-
| rmod(other[, level, fill_value, axis])
| Return Modulo of series and other, element-wise (binary operator rmod).
|-
| rmul(other[, level, fill_value, axis])
| Return Multiplication of series and other, element-wise (binary operator rmul).
|-
| rolling(window[, min_periods, center, …])
| Provide rolling window calculations.
|-
| round([decimals])
| Round each value in a Series to the given number of decimals.
|-
| rpow(other[, level, fill_value, axis])
| Return Exponential power of series and other, element-wise (binary operator rpow).
|-
| rsub(other[, level, fill_value, axis])
| Return Subtraction of series and other, element-wise (binary operator rsub).
|-
| rtruediv(other[, level, fill_value, axis])
| Return Floating division of series and other, element-wise (binary operator rtruediv).
|-
| sample([n, frac, replace, weights, …])
| Return a random sample of items from an axis of object.
|-
| searchsorted(value[, side, sorter])
| Find indices where elements should be inserted to maintain order.
|-
| sem([axis, skipna, level, ddof, numeric_only])
| Return unbiased standard error of the mean over requested axis.
|-
| set_axis(labels[, axis, inplace])
| Assign desired index to given axis.
|-
| shift([periods, freq, axis, fill_value])
| Shift index by desired number of periods with an optional time freq.
|-
| skew([axis, skipna, level, numeric_only])
| Return unbiased skew over requested axis.
|-
| slice_shift([periods, axis])
| Equivalent to shift without copying data.
|-
| sort_index([axis, level, ascending, …])
| Sort Series by index labels.
|-
| sort_values([axis, ascending, inplace, …])
| Sort by the values.
|-
| sparse
| alias of pandas.core.arrays.sparse.accessor.SparseAccessor
|-
| squeeze([axis])
| Squeeze 1 dimensional axis objects into scalars.
|-
| std([axis, skipna, level, ddof, numeric_only])
| Return sample standard deviation over requested axis.
|-
| str
| alias of pandas.core.strings.StringMethods
|-
| sub(other[, level, fill_value, axis])
| Return Subtraction of series and other, element-wise (binary operator sub).
|-
| subtract(other[, level, fill_value, axis])
| Return Subtraction of series and other, element-wise (binary operator sub).
|-
| sum([axis, skipna, level, numeric_only, …])
| Return the sum of the values for the requested axis.
|-
| swapaxes(axis1, axis2[, copy])
| Interchange axes and swap values axes appropriately.
|-
| swaplevel([i, j, copy])
| Swap levels i and j in a MultiIndex.
|-
| tail([n])
| Return the last n rows.
|-
| take(indices[, axis, is_copy])
| Return the elements in the given positional indices along an axis.
|-
| to_clipboard([excel, sep])
| Copy object to the system clipboard.
|-
| to_csv([path_or_buf, sep, na_rep, …])
| Write object to a comma-separated values (csv) file.
|-
| to_dict([into])
| Convert Series to {label -> value} dict or dict-like object.
|-
| to_excel(excel_writer[, sheet_name, na_rep, …])
| Write object to an Excel sheet.
|-
| to_frame([name])
| Convert Series to DataFrame.
|-
| to_hdf(path_or_buf, key[, mode, complevel, …])
| Write the contained data to an HDF5 file using HDFStore.
|-
| to_json([path_or_buf, orient, date_format, …])
| Convert the object to a JSON string.
|-
| to_latex([buf, columns, col_space, header, …])
| Render object to a LaTeX tabular, longtable, or nested table/tabular.
|-
| to_list()
| Return a list of the values.
|-
| to_markdown([buf, mode, index])
| Print Series in Markdown-friendly format.
|-
| to_numpy([dtype, copy, na_value])
| A NumPy ndarray representing the values in this Series or Index.
|-
| to_period([freq, copy])
| Convert Series from DatetimeIndex to PeriodIndex.
|-
| to_pickle(path[, compression, protocol])
| Pickle (serialize) object to file.
|-
| to_sql(name, con[, schema, if_exists, …])
| Write records stored in a DataFrame to a SQL database.
|-
| to_string([buf, na_rep, float_format, …])
| Render a string representation of the Series.
|-
| to_timestamp([freq, how, copy])
| Cast to DatetimeIndex of Timestamps, at beginning of period.
|-
| to_xarray()
| Return an xarray object from the pandas object.
|-
| tolist()
| Return a list of the values.
|-
| transform(func[, axis])
| Call func on self producing a Series with transformed values.
|-
| transpose(*args, **kwargs)
| Return the transpose, which is by definition self.
|-
| truediv(other[, level, fill_value, axis])
| Return Floating division of series and other, element-wise (binary operator truediv).
|-
| truncate([before, after, axis, copy])
| Truncate a Series or DataFrame before and after some index value.
|-
| tshift([periods, freq, axis])
| (DEPRECATED) Shift the time index, using the index’s frequency if available.
|-
| tz_convert(tz[, axis, level, copy])
| Convert tz-aware axis to target time zone.
|-
| tz_localize(tz[, axis, level, copy, …])
| Localize tz-naive index of a Series or DataFrame to target time zone.
|-
| unique()
| Return unique values of Series object.
|-
| unstack([level, fill_value])
| Unstack, also known as pivot, Series with MultiIndex to produce DataFrame.
|-
| update(other)
| Modify Series in place using values from passed Series.
|-
| value_counts([normalize, sort, ascending, …])
| Return a Series containing counts of unique values.
|-
| var([axis, skipna, level, ddof, numeric_only])
| Return unbiased variance over requested axis.
|-
| view([dtype])
| Create a new view of the Series.
|-
| where(cond[, other, inplace, axis, level, …])
| Replace values where the condition is False.
|-
| xs(key[, axis, level, drop_level])
| Return cross-section from the Series/DataFrame.
|}
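A few of the methods listed above can be tried together. The following is only a minimal sketch: the Series `a` and `b` and all variable names are made up for illustration, and it shows label alignment in arithmetic, the `fill_value` argument of `add()`, simple reductions, `agg()`, and `value_counts()`.

```python
import pandas as pd

# Two illustrative Series whose labels only partly overlap.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["x", "y"])

# The + operator aligns on labels; a label missing on one side gives NaN.
plain_sum = a + b

# add() accepts fill_value to substitute for the missing side instead.
filled_sum = a.add(b, fill_value=0)

# Reductions collapse the Series to a scalar.
total = a.sum()
average = a.mean()

# agg() runs several reductions at once and returns a Series of results.
summary = a.agg(["min", "max"])

# value_counts() tallies how often each distinct value occurs.
counts = pd.Series(["u", "v", "u"]).value_counts()

print(plain_sum)
print(filled_sum)
print(total, average)
print(summary)
print(counts)
```

Methods such as `add()` return a new Series and leave the operands unchanged, which is why each result is bound to its own name here.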
{{了解更多|[https://pandas.pydata.org/docs/reference/api/pandas.Series.html#pandas.Series Pandas API:pandas.Series]}}
===DataFrame===
A DataFrame is a labeled, two-dimensional data structure whose columns may hold different types. It consists of data, row labels, and column labels.
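The three parts of a DataFrame (data, row labels, column labels) can be seen in a small example. This is a sketch only; the column names, row labels, and values are invented for illustration.

```python
import pandas as pd

# Each column may have its own dtype; all columns share the row labels.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid"],
        "age": [31, 25, 47],
        "score": [88.5, 92.0, 79.25],
    },
    index=["r1", "r2", "r3"],
)

# Row labels live in df.index, column labels in df.columns.
row_labels = list(df.index)
column_labels = list(df.columns)

# Selecting one column yields a Series that keeps the row labels.
ages = df["age"]

print(df.dtypes)
print(row_labels, column_labels)
print(ages)
```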

Revision as of 13:37, 3 October 2020 (Saturday)

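Pandas is an open-source Python library for data analysis. It makes it easy to process, compute on, analyze, store, and visualize data.

Introduction

Timeline

  • 2008: Developer Wes McKinney began building pandas at AQR Capital Management to meet the need for a high-performance, flexible tool for quantitative analysis of financial data. Before leaving AQR, he persuaded management to let him open-source the library.
  • 2012: Another AQR employee, Chang She, joined the effort and became the library's second major contributor.
  • 2015: Pandas became a fiscally sponsored project of NumFOCUS, a 501(c)(3) nonprofit charity in the United States.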

Installation and import

Install Pandas with pip:

pip install pandas

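If you use a scientific computing distribution such as Anaconda, the pandas library is already installed.

Import Pandas at the top of your script; the usual form is: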

import pandas as pd

Check the Pandas version:

pd.__version__

Data structures

pandas defines two data types, Series and DataFrame, and most operations are performed on them.

Learn more >> Pandas User Guide: Data structures


Series

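A Series is a one-dimensional array with axis labels (an index) that can hold any data type (integers, strings, floating-point numbers, Python objects, and so on). The axis labels are collectively called the index. In this respect a Series resembles a Python dictionary.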
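The comparison with a dictionary can be made concrete. The snippet below is a minimal sketch with made-up labels and values; it shows label lookup, membership tests, and `get()`, plus one thing a plain dict cannot do.

```python
import pandas as pd

# A Series built from a dict: the dict keys become the index labels.
prices = pd.Series({"apple": 3.5, "banana": 1.2, "cherry": 8.0})

# Label-based lookup works like a dict lookup.
apple_price = prices["apple"]

# Membership tests check the index labels, not the values.
has_banana = "banana" in prices
has_value = 3.5 in prices

# get() returns a default instead of raising KeyError for a missing label.
missing = prices.get("durian", 0.0)

# Unlike a dict, a Series also supports vectorized arithmetic.
doubled = prices * 2

print(apple_price, has_banana, has_value, missing)
print(doubled.to_dict())
```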

Creating a Series

The basic way to create a Series is to instantiate the pandas.Series class, whose signature is:

pd.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

The index is optional. If it is omitted, the axis labels default to an array of integers starting from 0. Some examples:

import pandas as pd

# Default integer index (0, 1, 2)
s = pd.Series(["foo", "bar", "foba"])
print(type(s))   # <class 'pandas.core.series.Series'>

# Explicit string labels
s2 = pd.Series(["foo", "bar", "foba"], index=['b','d','c'])

# Create a date index
date_index = pd.date_range("2020-01-01", periods=3, freq="D")
s3 = pd.Series(["foo", "bar", "foba"], index=date_index)
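To see what each kind of index looks like after construction, the labels can be inspected directly. This is a small sketch that reuses the data from the examples above; the fourth Series (`s4`, with the name "scores") is an extra illustration that is not part of the original examples.

```python
import pandas as pd

# Default index: integer labels starting at 0 (a RangeIndex).
s = pd.Series(["foo", "bar", "foba"])
default_labels = list(s.index)

# Explicit labels are kept in the order given; they are not sorted.
s2 = pd.Series(["foo", "bar", "foba"], index=["b", "d", "c"])
custom_labels = list(s2.index)

# With a daily DatetimeIndex, values can be looked up by Timestamp.
date_index = pd.date_range("2020-01-01", periods=3, freq="D")
s3 = pd.Series(["foo", "bar", "foba"], index=date_index)
second_day = s3.loc[pd.Timestamp("2020-01-02")]

# dtype and name can be set at construction time.
s4 = pd.Series([1, 2, 3], dtype="float64", name="scores")

print(default_labels, custom_labels, second_day)
print(s4.dtype, s4.name)
```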

Series data operations

Series attributes

In the examples in the table below, s is a Series object:

>>> s = pd.Series(['a', 'b', 'c'])
>>> s
0    a
1    b
2    c
dtype: object
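{| class="wikitable"
|-
! Attribute
! Description
! Example
! Result
|-
| T
| Return the transpose, which by definition is the Series itself.
| s.T
| The Series itself
|-
| array
| Return the array backing the Series or Index data; it is a pandas extension array.
| s.array
| <PandasArray>
['a', 'b', 'c']
Length: 3, dtype: object
|-
| at
| Get or set a single value by row and column axis labels.
| s.at[1]
s.at[2]='d'
| 'b'
|-
| attrs
| Dictionary of global attributes of this object.
| s.attrs
| {}
|-
| axes
| Return a list of the row axis labels.
| s.axes
| [RangeIndex(start=0, stop=3, step=1)]
|-
| dtype
| Return the NumPy data type of the data.
| s.dtype
| dtype('O')
|-
| dtypes
| Return the NumPy data type of the data.
| s.dtypes
| dtype('O')
|-
| hasnans
| Return True if there are any missing values (such as Python's None or np.NaN), otherwise False.
| s2 = pd.Series(['a', None, 'c'])
s2.hasnans
| True
|-
| iat
| Get or set a single value by integer position on the row and column axes.
| s.iat[1]
s.iat[2]='d'
| 'b'
|-
| iloc
| Get or set values by integer position along the index (row axis).
| 1. s.iloc[2]
2. s.iloc[:2]
3. s.iloc[[True,False,True]]
4. s.iloc[lambda x: x.index % 2 == 0]
| 1. 'c'
2. Selects the values at positions 0 up to, but not including, 2
3. Selects the values at positions where the mask is True
4. Selects the values whose index label is even
|-
| index
| The index (axis labels) of the Series.
|-
| is_monotonic
| Return boolean if values in the object are monotonic_increasing.
|-
| is_monotonic_decreasing
| Return boolean if values in the object are monotonic_decreasing.
|-
| is_monotonic_increasing
| Alias for is_monotonic.
|-
| is_unique
| Return boolean if values in the object are unique.
|-
| loc
| Access a group of rows and columns by label(s) or a boolean array.
|-
| name
| Return the name of the Series.
|-
| nbytes
| Return the number of bytes in the underlying data.
|-
| ndim
| Number of dimensions of the underlying data, by definition 1.
|-
| shape
| Return a tuple of the shape of the underlying data.
|-
| size
| Return the number of elements in the underlying data.
|-
| values
| Return Series as ndarray or ndarray-like depending on the dtype.
|}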
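The difference between the label-based accessors (at, loc) and the position-based ones (iat, iloc) is easier to see when the labels are not integers. The following is a minimal sketch; the labels "x", "y", "z" and the Series `with_gap` are chosen only for illustration.

```python
import pandas as pd

# String labels make label-based and position-based access distinct.
s = pd.Series(["a", "b", "c"], index=["x", "y", "z"])

# at / loc address elements by label; iat / iloc by integer position.
by_label = s.at["y"]
by_position = s.iat[1]
first_two = s.iloc[:2].tolist()
picked = s.loc[["x", "z"]].tolist()

# shape, size and ndim describe the underlying data.
shape, size, ndim = s.shape, s.size, s.ndim

# hasnans and is_unique are quick data-quality checks.
with_gap = pd.Series(["a", None, "a"])
flags = (s.hasnans, with_gap.hasnans, s.is_unique, with_gap.is_unique)

print(by_label, by_position, first_two, picked)
print(shape, size, ndim)
print(flags)
```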

Learn more >> Pandas API: pandas.Series


Series methods

Method Description Example Result
abs() Return a Series/DataFrame with the absolute value of each element. s.abs()
add(other[, level, fill_value, axis]) Return Addition of series and other, element-wise (binary operator add).
add_prefix(prefix) Prefix labels with string prefix.
add_suffix(suffix) Suffix labels with string suffix.
agg([func, axis]) Aggregate using one or more operations over the specified axis.
aggregate([func, axis]) Aggregate using one or more operations over the specified axis.
align(other[, join, axis, level, copy, …]) Align two objects on their axes with the specified join method.
all([axis, bool_only, skipna, level]) Return whether all elements are True, potentially over an axis.
any([axis, bool_only, skipna, level]) Return whether any element is True, potentially over an axis.
append(to_append[, ignore_index, …]) Concatenate two or more Series.
apply(func[, convert_dtype, args]) Invoke function on values of Series.
argmax([axis, skipna]) Return int position of the largest value in the Series.
argmin([axis, skipna]) Return int position of the smallest value in the Series.
argsort([axis, kind, order]) Return the integer indices that would sort the Series values.
asfreq(freq[, method, how, normalize, …]) Convert TimeSeries to specified frequency.
asof(where[, subset]) Return the last row(s) without any NaNs before where.
astype(dtype[, copy, errors]) Cast a pandas object to the specified dtype.
at_time(time[, asof, axis]) Select values at particular time of day (e.g., 9:30AM).
autocorr([lag]) Compute the lag-N autocorrelation.
backfill([axis, inplace, limit, downcast]) Synonym for DataFrame.fillna() with method='bfill'.
between(left, right[, inclusive]) Return boolean Series equivalent to left <= series <= right.
between_time(start_time, end_time[, …]) Select values between particular times of the day (e.g., 9:00-9:30 AM).
bfill([axis, inplace, limit, downcast]) Synonym for DataFrame.fillna() with method='bfill'.
bool() Return the bool of a single element Series or DataFrame.
cat alias of pandas.core.arrays.categorical.CategoricalAccessor
clip([lower, upper, axis, inplace]) Trim values at input threshold(s).
combine(other, func[, fill_value]) Combine the Series with a Series or scalar according to func.
combine_first(other) Combine Series values, choosing the calling Series’s values first.
compare(other[, align_axis, keep_shape, …]) Compare to another Series and show the differences.
convert_dtypes([infer_objects, …]) Convert columns to best possible dtypes using dtypes supporting pd.NA.
copy([deep]) Make a copy of this object’s indices and data.
corr(other[, method, min_periods]) Compute correlation with other Series, excluding missing values.
count([level]) Return number of non-NA/null observations in the Series.
cov(other[, min_periods, ddof]) Compute covariance with Series, excluding missing values.
cummax([axis, skipna]) Return cumulative maximum over a DataFrame or Series axis.
cummin([axis, skipna]) Return cumulative minimum over a DataFrame or Series axis.
cumprod([axis, skipna]) Return cumulative product over a DataFrame or Series axis.
cumsum([axis, skipna]) Return cumulative sum over a DataFrame or Series axis.
describe([percentiles, include, exclude, …]) Generate descriptive statistics.
diff([periods]) First discrete difference of element.
div(other[, level, fill_value, axis]) Return Floating division of series and other, element-wise (binary operator truediv).
divide(other[, level, fill_value, axis]) Return Floating division of series and other, element-wise (binary operator truediv).
divmod(other[, level, fill_value, axis]) Return Integer division and modulo of series and other, element-wise (binary operator divmod).
dot(other) Compute the dot product between the Series and the columns of other.
drop([labels, axis, index, columns, level, …]) Return Series with specified index labels removed.
drop_duplicates([keep, inplace]) Return Series with duplicate values removed.
droplevel(level[, axis]) Return DataFrame with requested index / column level(s) removed.
dropna([axis, inplace, how]) Return a new Series with missing values removed.
dt alias of pandas.core.indexes.accessors.CombinedDatetimelikeProperties
duplicated([keep]) Indicate duplicate Series values.
eq(other[, level, fill_value, axis]) Return Equal to of series and other, element-wise (binary operator eq).
equals(other) Test whether two objects contain the same elements.
ewm([com, span, halflife, alpha, …]) Provide exponential weighted (EW) functions.
expanding([min_periods, center, axis]) Provide expanding transformations.
explode([ignore_index]) Transform each element of a list-like to a row.
factorize([sort, na_sentinel]) Encode the object as an enumerated type or categorical variable.
ffill([axis, inplace, limit, downcast]) Synonym for DataFrame.fillna() with method='ffill'.
fillna([value, method, axis, inplace, …]) Fill NA/NaN values using the specified method.
filter([items, like, regex, axis]) Subset the dataframe rows or columns according to the specified index labels.
first(offset) Select initial periods of time series data based on a date offset.
first_valid_index() Return index for first non-NA/null value.
floordiv(other[, level, fill_value, axis]) Return Integer division of series and other, element-wise (binary operator floordiv).
ge(other[, level, fill_value, axis]) Return Greater than or equal to of series and other, element-wise (binary operator ge).
get(key[, default]) Get item from object for given key (ex: DataFrame column).
groupby([by, axis, level, as_index, sort, …]) Group Series using a mapper or by a Series of columns.
gt(other[, level, fill_value, axis]) Return Greater than of series and other, element-wise (binary operator gt).
head([n]) Return the first n rows.
hist([by, ax, grid, xlabelsize, xrot, …]) Draw histogram of the input series using matplotlib.
idxmax([axis, skipna]) Return the row label of the maximum value.
idxmin([axis, skipna]) Return the row label of the minimum value.
infer_objects() Attempt to infer better dtypes for object columns.
interpolate([method, axis, limit, inplace, …]) Fill NaN values using an interpolation method. Only method='linear' is supported for a DataFrame/Series with a MultiIndex.
isin(values) Whether elements in Series are contained in values.
isna() Detect missing values.
isnull() Detect missing values.
item() Return the first element of the underlying data as a python scalar.
items() Lazily iterate over (index, value) tuples.
iteritems() Lazily iterate over (index, value) tuples.
keys() Return alias for index.
kurt([axis, skipna, level, numeric_only]) Return unbiased kurtosis over requested axis.
kurtosis([axis, skipna, level, numeric_only]) Return unbiased kurtosis over requested axis.
last(offset) Select final periods of time series data based on a date offset.
last_valid_index() Return index for last non-NA/null value.
le(other[, level, fill_value, axis]) Return Less than or equal to of series and other, element-wise (binary operator le).
lt(other[, level, fill_value, axis]) Return Less than of series and other, element-wise (binary operator lt).
mad([axis, skipna, level]) Return the mean absolute deviation of the values for the requested axis.
map(arg[, na_action]) Map values of Series according to input correspondence.
mask(cond[, other, inplace, axis, level, …]) Replace values where the condition is True.
max([axis, skipna, level, numeric_only]) Return the maximum of the values for the requested axis.
mean([axis, skipna, level, numeric_only]) Return the mean of the values for the requested axis.
median([axis, skipna, level, numeric_only]) Return the median of the values for the requested axis.
memory_usage([index, deep]) Return the memory usage of the Series.
min([axis, skipna, level, numeric_only]) Return the minimum of the values for the requested axis.
mod(other[, level, fill_value, axis]) Return Modulo of series and other, element-wise (binary operator mod).
mode([dropna]) Return the mode(s) of the dataset.
mul(other[, level, fill_value, axis]) Return Multiplication of series and other, element-wise (binary operator mul).
multiply(other[, level, fill_value, axis]) Return Multiplication of series and other, element-wise (binary operator mul).
ne(other[, level, fill_value, axis]) Return Not equal to of series and other, element-wise (binary operator ne).
nlargest([n, keep]) Return the largest n elements.
notna() Detect existing (non-missing) values.
notnull() Detect existing (non-missing) values.
nsmallest([n, keep]) Return the smallest n elements.
nunique([dropna]) Return number of unique elements in the object.
pad([axis, inplace, limit, downcast]) Synonym for DataFrame.fillna() with method='ffill'.
pct_change([periods, fill_method, limit, freq]) Percentage change between the current and a prior element.
pipe(func, *args, **kwargs) Apply func(self, *args, **kwargs).
plot alias of pandas.plotting._core.PlotAccessor
pop(item) Return the item and drop it from the Series.
pow(other[, level, fill_value, axis]) Return Exponential power of series and other, element-wise (binary operator pow).
prod([axis, skipna, level, numeric_only, …]) Return the product of the values for the requested axis.
product([axis, skipna, level, numeric_only, …]) Return the product of the values for the requested axis.
quantile([q, interpolation]) Return value at the given quantile.
radd(other[, level, fill_value, axis]) Return Addition of series and other, element-wise (binary operator radd).
rank([axis, method, numeric_only, …]) Compute numerical data ranks (1 through n) along axis.
ravel([order]) Return the flattened underlying data as an ndarray.
rdiv(other[, level, fill_value, axis]) Return Floating division of series and other, element-wise (binary operator rtruediv).
rdivmod(other[, level, fill_value, axis]) Return Integer division and modulo of series and other, element-wise (binary operator rdivmod).
reindex([index]) Conform Series to new index with optional filling logic.
reindex_like(other[, method, copy, limit, …]) Return an object with matching indices as other object.
rename([index, axis, copy, inplace, level, …]) Alter Series index labels or name.
rename_axis(**kwargs) Set the name of the axis for the index or columns.
reorder_levels(order) Rearrange index levels using input order.
repeat(repeats[, axis]) Repeat elements of a Series.
replace([to_replace, value, inplace, limit, …]) Replace values given in to_replace with value.
resample(rule[, axis, closed, label, …]) Resample time-series data.
reset_index([level, drop, name, inplace]) Generate a new DataFrame or Series with the index reset.
rfloordiv(other[, level, fill_value, axis]) Return Integer division of series and other, element-wise (binary operator rfloordiv).
rmod(other[, level, fill_value, axis]) Return Modulo of series and other, element-wise (binary operator rmod).
rmul(other[, level, fill_value, axis]) Return Multiplication of series and other, element-wise (binary operator rmul).
rolling(window[, min_periods, center, …]) Provide rolling window calculations.
round([decimals]) Round each value in a Series to the given number of decimals.
rpow(other[, level, fill_value, axis]) Return Exponential power of series and other, element-wise (binary operator rpow).
rsub(other[, level, fill_value, axis]) Return Subtraction of series and other, element-wise (binary operator rsub).
rtruediv(other[, level, fill_value, axis]) Return Floating division of series and other, element-wise (binary operator rtruediv).
sample([n, frac, replace, weights, …]) Return a random sample of items from an axis of object.
searchsorted(value[, side, sorter]) Find indices where elements should be inserted to maintain order.
sem([axis, skipna, level, ddof, numeric_only]) Return unbiased standard error of the mean over requested axis.
set_axis(labels[, axis, inplace]) Assign desired index to given axis.
shift([periods, freq, axis, fill_value]) Shift index by desired number of periods with an optional time freq.
skew([axis, skipna, level, numeric_only]) Return unbiased skew over requested axis.
slice_shift([periods, axis]) Equivalent to shift without copying data.
sort_index([axis, level, ascending, …]) Sort Series by index labels.
sort_values([axis, ascending, inplace, …]) Sort by the values.
sparse alias of pandas.core.arrays.sparse.accessor.SparseAccessor
squeeze([axis]) Squeeze 1 dimensional axis objects into scalars.
std([axis, skipna, level, ddof, numeric_only]) Return sample standard deviation over requested axis.
str alias of pandas.core.strings.StringMethods
sub(other[, level, fill_value, axis]) Return Subtraction of series and other, element-wise (binary operator sub).
subtract(other[, level, fill_value, axis]) Return Subtraction of series and other, element-wise (binary operator sub).
sum([axis, skipna, level, numeric_only, …]) Return the sum of the values for the requested axis.
swapaxes(axis1, axis2[, copy]) Interchange axes and swap values axes appropriately.
swaplevel([i, j, copy]) Swap levels i and j in a MultiIndex.
tail([n]) Return the last n rows.
take(indices[, axis, is_copy]) Return the elements in the given positional indices along an axis.
to_clipboard([excel, sep]) Copy object to the system clipboard.
to_csv([path_or_buf, sep, na_rep, …]) Write object to a comma-separated values (csv) file.
to_dict([into]) Convert Series to {label -> value} dict or dict-like object.
to_excel(excel_writer[, sheet_name, na_rep, …]) Write object to an Excel sheet.
to_frame([name]) Convert Series to DataFrame.
to_hdf(path_or_buf, key[, mode, complevel, …]) Write the contained data to an HDF5 file using HDFStore.
to_json([path_or_buf, orient, date_format, …]) Convert the object to a JSON string.
to_latex([buf, columns, col_space, header, …]) Render object to a LaTeX tabular, longtable, or nested table/tabular.
to_list() Return a list of the values.
to_markdown([buf, mode, index]) Print Series in Markdown-friendly format.
to_numpy([dtype, copy, na_value]) A NumPy ndarray representing the values in this Series or Index.
to_period([freq, copy]) Convert Series from DatetimeIndex to PeriodIndex.
to_pickle(path[, compression, protocol]) Pickle (serialize) object to file.
to_sql(name, con[, schema, if_exists, …]) Write records stored in a DataFrame to a SQL database.
to_string([buf, na_rep, float_format, …]) Render a string representation of the Series.
to_timestamp([freq, how, copy]) Cast to DatetimeIndex of Timestamps, at beginning of period.
to_xarray() Return an xarray object from the pandas object.
tolist() Return a list of the values.
transform(func[, axis]) Call func on self producing a Series with transformed values.
transpose(*args, **kwargs) Return the transpose, which is by definition self.
truediv(other[, level, fill_value, axis]) Return Floating division of series and other, element-wise (binary operator truediv).
truncate([before, after, axis, copy]) Truncate a Series or DataFrame before and after some index value.
tshift([periods, freq, axis]) (DEPRECATED) Shift the time index, using the index’s frequency if available.
tz_convert(tz[, axis, level, copy]) Convert tz-aware axis to target time zone.
tz_localize(tz[, axis, level, copy, …]) Localize tz-naive index of a Series or DataFrame to target time zone.
unique() Return unique values of Series object.
unstack([level, fill_value]) Unstack, also known as pivot, Series with MultiIndex to produce DataFrame.
update(other) Modify Series in place using values from passed Series.
value_counts([normalize, sort, ascending, …]) Return a Series containing counts of unique values.
var([axis, skipna, level, ddof, numeric_only]) Return unbiased variance over requested axis.
view([dtype]) Create a new view of the Series.
where(cond[, other, inplace, axis, level, …]) Replace values where the condition is False.
xs(key[, axis, level, drop_level]) Return cross-section from the Series/DataFrame.
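
As a rough illustration of a few of the methods listed above, the sketch below applies them to a small made-up Series. The data and variable names are assumptions chosen for this example only.

```python
import pandas as pd

# Hypothetical sample data, not taken from the article
s = pd.Series([3, 1, 2, 3], index=["a", "b", "c", "d"])

ordered = s.sort_values()          # sort by the values (ascending)
counts = s.value_counts()          # counts of unique values
uniques = s.unique()               # unique values in order of appearance
as_dict = s.to_dict()              # {label -> value} dict
last_two = s.tail(2)               # last n rows
shifted = s.shift(1)               # shift values by one period; first becomes NaN
masked = s.where(s > 1, other=0)   # replace values where the condition is False

print(ordered.tolist())
print(as_dict)
```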

Learn more >> Pandas API: pandas.Series


DataFrame

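A DataFrame is a labeled, two-dimensional data structure whose columns may have different types. It consists of data, row labels, and column labels.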
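
The three parts named above (data, row labels, column labels) can be inspected directly. The following is a minimal sketch; the column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical daily stock data with columns of different types
df = pd.DataFrame(
    {
        "close": [10.5, 10.8, 10.2],   # float column
        "volume": [1200, 1500, 900],   # integer column
        "code": ["A", "A", "A"],       # string column
    },
    index=pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06"]),
)

print(df.index)    # row labels
print(df.columns)  # column labels
print(df.dtypes)   # one dtype per column
```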

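Plotting with Pandas

pandas plotting is built on Matplotlib. Both DataFrame and Series come with a plot method, which makes it quick and convenient to produce many kinds of charts.

Learn more >> pandas documentation: User Guide - Visualization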


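Basic plots

Line plot

The plot method draws a line plot by default. For example, if prices is a DataFrame with a closing-price column named close, the closing price can be plotted as a line:

s = prices['close']
s.plot()

# Set the figure size with the figsize parameter
s.plot(figsize=(20,10))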
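
The snippet above assumes that prices already exists. A self-contained sketch with randomly generated prices might look like the following; the data, the Agg backend choice, and the title are assumptions for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical closing prices: a random walk around 100
rng = np.random.default_rng(0)
dates = pd.date_range("2021-01-01", periods=30, freq="D")
prices = pd.DataFrame({"close": 100 + rng.normal(0, 1, 30).cumsum()}, index=dates)

# Line plot of the closing price with an explicit figure size
ax = prices["close"].plot(figsize=(10, 5), title="Close")
```

plot returns a Matplotlib Axes object, so the figure can be customized further or saved with ax.figure.savefig(...).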

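Bar plot

For data with discrete labels that is not a time series, a bar plot can be drawn in either of two ways:

  • Call plot() with the kind parameter set to 'bar' or 'barh'.
  • Call plot.bar() or plot.barh().
df.plot(kind='bar')    # assume df holds daily stock data
df.plot.bar()
# Resample to the yearly mean of the volume column and draw it as a bar plot
# (pandas 2.2 and later spell the 'A-DEC' alias as 'YE-DEC')
df.resample('A-DEC').mean().volume.plot(kind='bar')

df.plot.bar(stacked=True)    # stacked=True draws a stacked bar plot
df.plot.barh(stacked=True)    # barh draws horizontal bars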
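
To make the difference between grouped, stacked, and horizontal bars concrete, here is a small sketch on made-up data with discrete labels. The column names q1 and q2 are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical data with discrete (non-time-series) labels
df = pd.DataFrame(
    {"q1": [1.0, 2.0, 3.0], "q2": [2.0, 3.0, 1.0]},
    index=["A", "B", "C"],
)

ax_grouped = df.plot(kind="bar")         # one bar per column, side by side
ax_stacked = df.plot.bar(stacked=True)   # q2 bars start where q1 bars end
ax_horiz = df.plot.barh(stacked=True)    # same, drawn horizontally
```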

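Histogram

Histograms are drawn with plot.hist(). The result is usually a frequency histogram: the x-axis is divided into intervals and the y-axis shows the count in each interval. The number of bins is controlled by the bins parameter, for example bins=20 for 20 bins.

df.volume.plot.hist()    # frequency histogram of the volume column in the stock data df
df.plot.hist(alpha=0.5)    # alpha=0.5 sets the bar transparency to 0.5
df.plot.hist(stacked=True, bins=20)    # stacked=True stacks the columns; bins=20 uses 20 bins
df.plot.hist(orientation='horizontal')    # orientation='horizontal' draws a horizontal histogram
df.plot.hist(cumulative=True)    # cumulative histogram

df['close'].diff().hist()    # apply diff to the closing price, then draw a histogram
df.hist(color='k', bins=50)     # DataFrame.hist draws each column in its own subplot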
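
The effect of bins can be checked on the returned axes: each bin becomes one bar, and the bar heights add up to the number of observations. This is a sketch on randomly generated data; the volume values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical trading volumes
rng = np.random.default_rng(0)
volume = pd.Series(rng.normal(loc=1000, scale=100, size=500), name="volume")

ax = volume.plot.hist(bins=20, alpha=0.5)

# One rectangle per bin; heights are the counts in each interval
heights = [patch.get_height() for patch in ax.patches]
print(len(heights), sum(heights))
```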

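Box plot

Box plots can be drawn with plot.box() or with the DataFrame boxplot() method. Parameters:

  • color sets the colors, passed as a dict such as color={'boxes': 'DarkGreen', 'whiskers': 'DarkOrange', 'medians': 'DarkBlue', 'caps': 'Gray'}
  • sym sets the outlier marker style; for example, sym='r+' draws outliers as red plus signs.
df.plot.box()
df[['close','open', 'high']].plot.box()
# Change the box colors by passing a color dict
color={'boxes': 'DarkGreen', 'whiskers': 'DarkOrange', 'medians': 'DarkBlue', 'caps': 'Gray'}
df.plot.box(color=color, sym='r+')    # sym sets the outlier style; 'r+' means a red '+'
df.plot.box(positions=[1, 4, 5, 6, 8])    # positions sets where each box is drawn: df has 5 columns, so the first column is drawn at x=1, the second at x=4, and so on
df.plot.box(vert=False)    # draw a horizontal box plot
df.boxplot()

# Grouped box plots: the by keyword creates groups, and a box is drawn per group.
# In the example below, each column is drawn separately for group A and group B.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 2), columns=['Col1', 'Col2'])
df['x'] = pd.Series(['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'])
df.boxplot(by='x')

# A second grouping column can be passed to split the groups further.
# This assumes df also has categorical columns 'X' and 'Y':
df.boxplot(column=['Col1', 'Col2'], by=['X', 'Y'])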
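
The last call above needs grouping columns X and Y, which the earlier snippet does not create. A self-contained sketch, with made-up group labels, could look like this:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical data: two value columns plus two grouping columns
df = pd.DataFrame(np.random.rand(10, 2), columns=["Col1", "Col2"])
df["X"] = pd.Series(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])
df["Y"] = pd.Series(["A", "B", "A", "B", "A", "B", "A", "B", "A", "B"])

# One subplot per value column; within each, one box per (X, Y) combination
axes = df.boxplot(column=["Col1", "Col2"], by=["X", "Y"])

n_groups = df.groupby(["X", "Y"]).ngroups
print(n_groups)
```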

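Scatter plot

Scatter plots are drawn with DataFrame.plot.scatter(). The x and y parameters name the columns used for the x-axis and the y-axis.

df.plot.scatter(x='close', y='volume')    # if df holds daily stock data, this plots closing price against volume

# To draw two groups of points on the same chart, reuse the axes through the ax parameter:
ax = df.plot.scatter(x='close', y='volume', color='DarkBlue', label='Group 1')    # label sets the legend label
df.plot.scatter(x='open', y='value', color='DarkGreen', label='Group 2', ax=ax)

# c colors each point by the value of the volume column, using a color gradient.
df.plot.scatter(x='close', y='open', c='volume', s=50)    # s sets the marker area
df.plot.scatter(x='close', y='open', s=df['volume']/50000)  # the marker size can also follow the values of a column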
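
Reusing the axes returned by the first call is what puts both groups on one chart. The sketch below shows this on randomly generated data; the column values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical daily stock data
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "open": rng.uniform(9, 11, 50),
        "close": rng.uniform(9, 11, 50),
        "volume": rng.uniform(1e5, 1e6, 50),
    }
)

# The first call creates the axes; the second call draws onto the same axes
ax = df.plot.scatter(x="close", y="volume", color="DarkBlue", label="Group 1")
df.plot.scatter(x="open", y="volume", color="DarkGreen", label="Group 2", ax=ax)
```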

Pie chart

Pie charts are drawn with DataFrame.plot.pie() or Series.plot.pie(). Any missing values in the data are automatically filled with 0.
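
A minimal sketch of a Series pie chart, including one missing value, is shown below. The labels and numbers are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical shares; the NaN for "c" is treated as 0 when plotting
share = pd.Series([3, 1, np.nan, 2], index=["a", "b", "c", "d"], name="share")

# autopct is passed through to Matplotlib to print percentages on the wedges
ax = share.plot.pie(autopct="%.1f%%", figsize=(6, 6))
```

Note that negative values are not allowed in a pie chart and cause pandas to raise a ValueError.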

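Other plotting functions

These plotting functions come from the pandas.plotting module.

Scatter Matrix Plot

A scatter matrix plot is drawn with the scatter_matrix() function.

from pandas.plotting import scatter_matrix     # the function must be imported from the module before use
scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal='kde')    # if df holds daily stock data, a scatter plot is drawn for every column against every other column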
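
scatter_matrix returns a square array of axes, one row and one column per DataFrame column. The sketch below uses diagonal='hist' rather than 'kde' so that it does not depend on SciPy; the data is randomly generated.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical data with three numeric columns
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["open", "close", "volume"])

# Histograms on the diagonal, pairwise scatter plots elsewhere
axes = scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal="hist")
print(axes.shape)
```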

Density Plot

Density plots are drawn with Series.plot.kde() and DataFrame.plot.kde(). Both require SciPy to be installed.

df.plot.kde()

Andrews Curves

Andrews curves
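
The source gives no example for this plot. As a hedged sketch: pandas.plotting provides andrews_curves(frame, class_column), which draws one curve per row and colors the curves by the class column. The data and the label column below are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
from pandas.plotting import andrews_curves

# Hypothetical data: three numeric features and one class column
df = pd.DataFrame(
    {
        "f1": [1.0, 1.2, 3.0, 3.1],
        "f2": [0.5, 0.4, 2.0, 2.2],
        "f3": [2.0, 2.1, 0.3, 0.2],
        "label": ["x", "x", "y", "y"],
    }
)

# One curve per row, grouped by color according to "label"
ax = andrews_curves(df, "label")
```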

Parallel Coordinates

Lag plot

Autocorrelation Plot

Autocorrelation plot

Bootstrap plot

Plot formatting

Preset plot styles

Since matplotlib 1.5, a style can be set in advance by calling matplotlib.style.use(my_plot_style) before plotting. For example, matplotlib.style.use('ggplot') produces ggplot-style plots.
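
A short sketch of selecting a style before plotting follows. It lists the available style names first, since the set of styles depends on the installed matplotlib version; the plotted data is made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.style
import pandas as pd

# Style names shipped with the installed matplotlib
print(matplotlib.style.available)

# Apply the ggplot style to all plots created afterwards
matplotlib.style.use("ggplot")

ax = pd.Series([1, 3, 2, 4]).plot()
```

To apply a style only temporarily, matplotlib.style.context('ggplot') can be used in a with statement instead.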

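Style parameters

Most plotting functions accept a set of parameters for setting colors.

Legend settings

The legend can be hidden by setting the legend parameter to False, for example: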

df.plot(legend=False)

Scales

  • logy puts the y-axis on a logarithmic scale
  • logx puts the x-axis on a logarithmic scale
  • loglog puts both the x-axis and the y-axis on a logarithmic scale
ts.plot(logy=True)
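
The line above assumes a series named ts. A self-contained sketch with an exponentially growing, strictly positive series (made-up data) could be:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical series that grows exponentially; values must be positive for a log scale
ts = pd.Series(
    np.exp(np.arange(1, 101) / 10.0),
    index=pd.date_range("2021-01-01", periods=100, freq="D"),
)

ax = ts.plot(logy=True)
print(ax.get_yscale())
```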

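Secondary y-axis plots

When two series share the x-axis but have different y-axis scales, pass secondary_y=True when plotting the second series to draw it against a second y-axis.

# For example, to show the cci indicator on top of the closing-price plot:
prices['close'].plot()
prices['cci'].plot(secondary_y=True)

# To put several columns on the second axis, pass their column names directly
ax = df.plot(secondary_y=['cci', 'RSI'], mark_right=False)    # by default, legend labels of right-axis columns get a "(right)" suffix; mark_right=False turns this off
ax.set_ylabel('CD scale')     # set the left y-axis label
ax.right_ax.set_ylabel('AB scale')    # set the right y-axis label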
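
A self-contained version of the multi-column case is sketched below. The close, cci, and rsi values are randomly generated and the axis labels are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical price and indicator columns on very different scales
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "close": 100 + rng.normal(0, 1, 60).cumsum(),
        "cci": rng.uniform(-200, 200, 60),
        "rsi": rng.uniform(0, 100, 60),
    },
    index=pd.date_range("2021-01-01", periods=60, freq="D"),
)

# close stays on the left axis; cci and rsi are drawn against the right axis
ax = df.plot(secondary_y=["cci", "rsi"], mark_right=False)
ax.set_ylabel("price scale")
ax.right_ax.set_ylabel("indicator scale")
```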

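Subplots

Each column of a DataFrame can be drawn on its own axes by setting the subplots parameter, for example:

df.plot(subplots=True, figsize=(6, 6))

Subplot layout

The subplot layout is set with the layout keyword, given as a (rows, columns) tuple.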
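
As a sketch of the layout keyword, the following arranges four columns of made-up data in a 2 x 2 grid. With subplots=True, plot returns an array of axes with the same shape as the layout.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical data with four columns
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(50, 4)).cumsum(axis=0),
    columns=["open", "high", "low", "close"],
)

# One subplot per column, arranged in 2 rows and 2 columns
axes = df.plot(subplots=True, layout=(2, 2), figsize=(6, 6))
print(axes.shape)
```

One of the two dimensions can also be given as -1, for example layout=(2, -1), in which case pandas works out that dimension from the number of columns.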

Resources

Official website

Related websites

Books

Python for Data Analysis, 2nd Edition - Wes McKinney

References