
Pandas Notes

Data Structures

  • Indexes: Sequence of labels
    • Immutable (like dictionary keys)
    • Homogeneous in data type (like a NumPy array)
  • Series: 1D array with an Index
  • DataFrames: 2D array with Series as columns
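A minimal sketch of how the three structures relate. The labels, values, and the column names 'x' and 'y' are made up for illustration.

```python
import pandas as pd

# An Index is an immutable, single-dtype sequence of labels.
idx = pd.Index(['a', 'b', 'c'])

# A Series is a 1D array of values paired with an Index.
s = pd.Series([1.0, 2.0, 3.0], index=idx)

# A DataFrame is a 2D table; each column is a Series sharing the row Index.
df = pd.DataFrame({'x': s, 'y': s * 2})

col = df['x']  # selecting a single column gives back a Series

try:
    idx[0] = 'z'  # an Index does not support item assignment
except TypeError as err:
    print('TypeError:', err)
```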

Index

Index Examples

import pandas as pd
prices = [10.70, 10.86, 10.74, 10.71, 10.79]
shares = pd.Series(prices)  # default integer index 0..4
days = ['Mon', 'Tue', 'Wed', 'Thur', 'Fri']
shares = pd.Series(prices, index=days)  # same values, labeled by day
shares.index.name = 'weekday'
# Individual elements in an index are immutable
shares.index[2] = 'Wednesday'  # raises TypeError
# The entire index can be rebuilt
shares.index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']

Index Multiple Values

An index can be built from multiple columns. Each row label is then a tuple, and pandas stores the result as a MultiIndex with one level per column.

df = df.set_index(['col1', 'col2'])
print(df.index.name)
=> None
print(df.index.names)
=> ['col1', 'col2']
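A small sketch of the set_index call above on a toy frame. The data and the extra column col3 are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'a', 'b', 'b'],
    'col2': [1, 2, 1, 2],
    'col3': [10, 20, 30, 40],
})
df = df.set_index(['col1', 'col2'])

print(df.index.name)         # None: a MultiIndex has no single name
print(list(df.index.names))  # one name per level
print(df.index[0])           # each row label is a tuple
```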

df.unstack(level='col1')

Unstacking a level of a multilevel index moves that level into the columns, producing the same kind of hierarchical columns that df.pivot returns.

df.stack(level='col1')

Stacking moves a level of hierarchical columns into the row index, creating a multilevel index.
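A rough round-trip sketch of unstack followed by stack, using made-up data. The names wide and tall are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'a', 'b', 'b'],
    'col2': [1, 2, 1, 2],
    'col3': [10, 20, 30, 40],
}).set_index(['col1', 'col2'])

# col1 leaves the row index and becomes the inner level of the columns.
wide = df.unstack(level='col1')

# col1 leaves the columns and becomes the inner level of the row index.
tall = wide.stack(level='col1')

print(wide)
print(tall)
```

Note that after the round trip the row index is ordered (col2, col1), not (col1, col2) as in the original frame.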

df.swaplevel()

Swaps the inner and outer levels of a multilevel index

df.sort_index()
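A short sketch, on invented data, of why swaplevel is usually followed by sort_index: swapping changes the level order but leaves the rows where they were.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'a', 'b', 'b'],
    'col2': [1, 2, 1, 2],
    'col3': [10, 20, 30, 40],
}).set_index(['col1', 'col2'])

# Levels become (col2, col1); the row order is unchanged.
swapped = df.swaplevel()

# Rows are re-sorted by the new outer level, then by the inner level.
ordered = swapped.sort_index()

print(swapped)
print(ordered)
```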

pd.melt()

pd.melt(df, id_vars=['colN'], value_vars=['colN'])
pd.melt(df, id_vars=['colN'], var_name='col1', value_name='col2')
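As a sketch of what the melt calls above do, here is a toy wide table reshaped to long form. The column names ('id', 'treatment_a', 'treatment_b') and the values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['x', 'y'],
    'treatment_a': [1, 3],
    'treatment_b': [2, 4],
})

# id_vars stay as columns; every other column is folded into two columns:
# one holding the old column name, one holding the value.
long = pd.melt(df, id_vars=['id'], var_name='treatment', value_name='result')

print(long)
```

Passing value_vars as well would restrict which columns get folded; without it, every column not listed in id_vars is used.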

Index Sorting

df = df.sort_index()

df.loc[]

stocks.loc[('CSCO', '2016-10-04')] # returns all columns
stocks.loc[('CSCO', '2016-10-04'), 'col1'] # returns col1
stocks.loc['CSCO'] # returns rows within 'CSCO' index
stocks.loc['CSCO':'MSFT'] # returns rows with outer index from 'CSCO' through 'MSFT' (inclusive)
stocks.loc[(['AAPL', 'MSFT'], '2016-10-05'), :]
stocks.loc[(['AAPL', 'MSFT'], '2016-10-05'), 'Close']
stocks.loc[('CSCO', ['2016-10-05', '2016-10-03']), :]
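A runnable sketch of a few of the lookups above. The stocks frame here is a stand-in: the level names Symbol and Date, the string dates, and the Close prices are all invented for illustration.

```python
import pandas as pd

stocks = pd.DataFrame({
    'Symbol': ['AAPL', 'AAPL', 'CSCO', 'CSCO', 'MSFT', 'MSFT'],
    'Date': ['2016-10-03', '2016-10-04'] * 3,
    'Close': [112.52, 113.00, 31.50, 31.35, 57.42, 57.24],  # made-up prices
}).set_index(['Symbol', 'Date']).sort_index()

# A full tuple selects one row; the result is a Series of its columns.
one_row = stocks.loc[('CSCO', '2016-10-04')]

# Adding a column label narrows that down to a single value.
one_val = stocks.loc[('CSCO', '2016-10-04'), 'Close']

# An outer label alone returns every row under it, indexed by Date.
outer = stocks.loc['CSCO']

# A list in one level of the tuple selects several outer labels at once.
fancy = stocks.loc[(['AAPL', 'MSFT'], '2016-10-04'), :]

print(fancy)
```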

Slicing (both indexes)

stocks.loc[(slice(None), slice('2016-10-03','2016-10-04')), :]

# Look up data for CA and TX in month 2: CA_TX_month2
CA_TX_month2 = sales.loc[(['CA', 'TX'], 2), :]

# Look up data for all states in month 2: all_month2
all_month2 = sales.loc[(slice(None), 2), :]
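A sketch of the slice(None) pattern on a stand-in sales frame. The level names state and month, the eggs column, and the numbers are assumptions for illustration. pd.IndexSlice is shown as an alternative spelling of the same selection.

```python
import pandas as pd

sales = pd.DataFrame({
    'state': ['CA', 'CA', 'NY', 'NY', 'TX', 'TX'],
    'month': [1, 2, 1, 2, 1, 2],
    'eggs': [47, 110, 221, 77, 132, 205],  # made-up values
}).set_index(['state', 'month']).sort_index()

# slice(None) means "everything in this level", like a bare ':'.
all_month2 = sales.loc[(slice(None), 2), :]

# pd.IndexSlice allows the usual ':' syntax inside the brackets.
idx = pd.IndexSlice
same = sales.loc[idx[:, 2], :]

# Slice the outer level and keep every month; label slices include both ends.
ca_to_ny = sales.loc[(slice('CA', 'NY'), slice(None)), :]

print(all_month2)
```

Slicing a MultiIndex by label generally needs the index to be sorted first, which is why sort_index() is called when the frame is built.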

TODO(Wes) - go back to lecture on this

df.iloc[]

TODO(Wes)

List Comprehensions

TODO(Wes)

Rotating / Pivot Data

df.pivot()

df.pivot(index='col1', columns='col2', values='col3')

# all remaining columns used as values
df.pivot(index='col1', columns='col2')
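A small sketch comparing the two pivot calls on invented data. The extra column col4 is added only to show what happens when values is omitted.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'a', 'b', 'b'],
    'col2': ['x', 'y', 'x', 'y'],
    'col3': [1, 2, 3, 4],
    'col4': [10, 20, 30, 40],
})

# One values column: the result has plain columns taken from col2.
one = df.pivot(index='col1', columns='col2', values='col3')

# No values argument: every remaining column is pivoted, and the result
# has hierarchical columns of the form (original column, col2 value).
both = df.pivot(index='col1', columns='col2')

print(one)
print(both)
```

pivot raises a ValueError if any (index, columns) pair appears more than once, so it suits data where each pair is unique.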