
Python | Pandas | Operating on Data

Posted on: February 18, 2019

1. Unary operations: the index is preserved

import numpy as np
import pandas as pd

# Random integers, so the values differ on every run
df = pd.DataFrame(np.random.randint(0,100,(3,4)), columns=['A','B','C','D'])
df
    A   B   C   D
0  25  73  54  82
1  74  23  96  47
2  32  23  87  39
np.sin(df * np.pi / 4)
              A         B             C         D
0  7.071068e-01  0.707107 -1.000000e+00  1.000000
1  1.000000e+00 -0.707107 -2.939152e-15 -0.707107
2 -9.797174e-16 -0.707107 -7.071068e-01 -0.707107
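
The default integer index above makes it hard to see that the labels really survive. The sketch below uses made-up data with string labels (the names `ser`, `frame`, `r1`, `r2` are illustrative, not from the original post) and checks that a NumPy ufunc returns a pandas object carrying the same index and columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data with non-default labels, so preservation is visible.
ser = pd.Series([0.0, 1.0, 2.0], index=['x', 'y', 'z'])
frame = pd.DataFrame([[0, 1], [2, 3]], index=['r1', 'r2'], columns=['A', 'B'])

exp_ser = np.exp(ser)                  # ufunc applied to a Series
sin_frame = np.sin(frame * np.pi / 4)  # same expression as above, on a labeled frame

# The results are still pandas objects with the original labels.
print(type(exp_ser).__name__, list(exp_ser.index))
print(type(sin_frame).__name__, list(sin_frame.index), list(sin_frame.columns))
```

Only the values change; the row and column labels are carried through untouched.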

2. Binary operations: index alignment

# Series
# The two indexes are first merged into their union, both operands are aligned on that union, and only then is the operation applied
area = pd.Series({'Alaska':1723337, 'Texas':695662, 'California':423967}, name = 'area')
population = pd.Series({'Alaska':38332521, 'Texas':26448193, 'New York':19651127}, name = 'population')
population / area
Alaska        22.243195
California          NaN
New York            NaN
Texas         38.018740
dtype: float64
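
To make the "union, align, operate" sequence concrete, here is a minimal sketch that performs the alignment step by hand with `Series.align` and compares the result with the plain `/` operator. It reuses the `area` and `population` values from above; the names `aligned_pop`, `aligned_area`, `manual` and `density` are introduced here for illustration only.

```python
import pandas as pd

area = pd.Series({'Alaska': 1723337, 'Texas': 695662, 'California': 423967}, name='area')
population = pd.Series({'Alaska': 38332521, 'Texas': 26448193, 'New York': 19651127}, name='population')

# Step 1 and 2: outer-join the indexes and reindex both operands onto the union.
aligned_pop, aligned_area = population.align(area, join='outer')

# Step 3: operate element by element; labels missing on either side give NaN.
manual = aligned_pop / aligned_area

# The operator does the same alignment implicitly.
density = population / area

print(list(aligned_pop.index))
print(manual.equals(density))
```

Labels that appear in only one of the two Series ('California' and 'New York' here) are kept in the result but hold NaN, since there is nothing to divide.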
# DataFrame
# Merge the row and column labels, align, then operate
A = pd.DataFrame(np.random.randint(0,10,(2,3)), columns=list('ABD'))
B = pd.DataFrame(np.random.randint(0,10,(3,3)), columns=list('BCA'))
print(A)
print("="*20)
print(B)
print("="*20)
print(A+B)
   A  B  D
0  3  8  3
1  5  7  5
====================
   B  C  A
0  0  6  0
1  2  6  3
2  5  6  7
====================
     A    B   C   D
0  3.0  8.0 NaN NaN
1  8.0  9.0 NaN NaN
2  NaN  NaN NaN NaN
# Use the method form of the operator and set fill_value: the result is NaN only where both operands are missing
A.add(B, fill_value=0)
     A    B    C    D
0  3.0  8.0  6.0  3.0
1  8.0  9.0  6.0  5.0
2  7.0  5.0  6.0  NaN
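
Because `A` and `B` above are random, the printed numbers cannot be reproduced by rerunning the code. The following sketch hard-codes the same values as the printed output (an assumption made here for reproducibility) and checks two things: the `+` operator and `.add()` agree when no `fill_value` is given, and `fill_value` cannot fill a cell that is missing from both frames.

```python
import numpy as np
import pandas as pd

# Same values as the printed A and B above, hard-coded so the run is repeatable.
A = pd.DataFrame([[3, 8, 3], [5, 7, 5]], columns=list('ABD'))
B = pd.DataFrame([[0, 6, 0], [2, 6, 3], [5, 6, 7]], columns=list('BCA'))

plain = A + B                     # NaN wherever either side is missing
filled = A.add(B, fill_value=0)   # a missing side is treated as 0 before adding

# Operator and method give the same result without fill_value.
print(plain.equals(A.add(B)))

# Row 2, column 'D' exists in neither frame, so it stays NaN even with fill_value.
print(int(filled.isna().sum().sum()))
print(filled.loc[2, 'D'])
```

The other arithmetic operators have method forms as well (`sub`, `mul`, `div`, `floordiv`, `mod`, `pow`), and each accepts `fill_value` in the same way.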

3. Mixed operations between Series and DataFrame