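1 Series
A Series is a linear data structure: a one-dimensional labeled array.
By default, pandas uses 0 to n-1 as the index of a Series, but you can also specify the index yourself (think of the index as the keys of a dict).
1.1 Creating a Series
import pandas as pd

s = pd.Series([9, 'zheng', 'beijing', 128])
print(s)
Output: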
0 9
1 zheng
2 beijing
3 128
dtype: object
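Accessing an element
print(s[1:2])
# Output:
# 1    zheng
# dtype: object
Basic operations on a Series:
A Series consists of two parts: index and values
In [14]: a = pd.Series({'a': 1, 'b': 5})
In [15]: a.index
Out[15]: Index(['a', 'b'], dtype='object')
In [16]: a.values  # returns a NumPy ndarray
Out[16]: array([1, 5], dtype=int64)
Series operations resemble those of an ndarray
# The automatic (positional) index and the custom (label) index coexist, but cannot be mixed
In [17]: a[0]  # automatic index (recent pandas versions deprecate this form; prefer a.iloc[0])
Out[17]: 1
# custom index
In [18]: a['a']
Out[18]: 1
# mixing is not allowed (pandas 1.0 and later raise KeyError here instead of returning NaN)
In [20]: a[['a', 1]]
Out[20]:
a    1.0
1    NaN
dtype: float64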
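To keep the two kinds of index apart, the sketch below uses the explicit accessors `.loc` (label) and `.iloc` (position) on the same Series `a`. It is a minimal illustration, assuming pandas 1.0 or later, where a list containing a missing label raises `KeyError`.

```python
import pandas as pd

a = pd.Series({'a': 1, 'b': 5})

# .loc selects by label (the custom index)
by_label = a.loc['a']

# .iloc selects by integer position (the automatic index)
by_position = a.iloc[0]

# Both expressions refer to the same element here
print(by_label, by_position)

# Assumption: pandas >= 1.0, where a missing label in a list raises KeyError
try:
    a.loc[['a', 1]]
except KeyError as exc:
    print('KeyError:', exc)
```

Using `.loc` and `.iloc` everywhere makes the intent of each lookup unambiguous, whatever the index contains.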
Series operations resemble those of a Python dict
# Access elements through the custom index
# The in operator tests the index, not the values
In [21]: 'a' in a
Out[21]: True
In [22]: 1 in a
Out[22]: False
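Because `in` only looks at the index, testing for a value needs a different expression. The following sketch shows two common options, `in a.values` and `Series.isin`; the variable names are illustrative only.

```python
import pandas as pd

a = pd.Series({'a': 1, 'b': 5})

label_present = 'a' in a         # checks the index
value_in_index = 1 in a          # 1 is a value, not a label
value_present = 1 in a.values    # checks the underlying ndarray
value_mask = a.isin([1])         # element-wise membership test

print(label_present, value_in_index, value_present)
print(value_mask)
```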
In arithmetic, a Series automatically aligns data by index label
In [29]: a = pd.Series([1, 3, 5], index=['a', 'b', 'c'])
In [30]: b = pd.Series([2, 4, 5, 6], index=['c,', 'd', 'e', 'b'])
In [31]: a + b
Out[31]:
a     NaN
b     9.0
c     NaN
c,    NaN
d     NaN
e     NaN
dtype: float64
A Series can be modified at any time, and the change takes effect immediately
In [32]: a.index = ['c', 'd', 'e']
In [33]: a
Out[33]:
c    1
d    3
e    5
dtype: int64
In [34]: a + b
Out[34]:
b      NaN
c      NaN
c,     NaN
d      7.0
e     10.0
dtype: float64
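When NaN for unmatched labels is not what you want, one option is `Series.add` with `fill_value`, which treats the missing side as a given number. This is a small sketch with its own example data (the Series below are not the ones from the session above).

```python
import pandas as pd

a = pd.Series([1, 3, 5], index=['a', 'b', 'c'])
b = pd.Series([2, 4, 6], index=['b', 'c', 'd'])

plain = a + b                     # NaN where a label exists on one side only
filled = a.add(b, fill_value=0)   # treat the missing side as 0

print(plain)
print(filled)
```

Whether 0 is a sensible stand-in depends on the data; for sums it usually is, for averages it usually is not.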
1.2 Specifying the index
import pandas as pd

s = pd.Series([9, 'zheng', 'beijing', 128, 'usa', 990],
              index=[1, 2, 3, 'e', 'f', 'g'])
print(s)
Output:
1 9
2 zheng
3 beijing
e 128
f usa
g 990
dtype: object
Looking up a value by its index label
print(s['f'])  # usa
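The dict analogy extends a little further. The sketch below, using the same Series, shows label lookup next to `Series.get`, which returns a default for an absent label; the label `'z'` and the default `'missing'` are made up for illustration.

```python
import pandas as pd

s = pd.Series([9, 'zheng', 'beijing', 128, 'usa', 990],
              index=[1, 2, 3, 'e', 'f', 'g'])

print(s['f'])                  # label lookup, like dict[key]
print(s.get('z', 'missing'))   # like dict.get: returns the default for an absent label
print(list(s.index))           # the labels, comparable to dict.keys()
```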
1.3 Building a Series from a dictionary
import pandas as pd

s = {"ton": 20, "mary": 18, "jack": 19, "car": None}
sa = pd.Series(s, name="age")
print(sa)
Output (recent pandas versions keep the dict's insertion order; older versions sorted the keys):
ton     20.0
mary    18.0
jack    19.0
car      NaN
Name: age, dtype: float64
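The `None` in the dict becomes NaN, which is why the dtype is float64 rather than int64. A short sketch of how to detect and drop such missing entries, using `isna` and `dropna` (the variable name `ages` is illustrative):

```python
import pandas as pd

ages = pd.Series({"ton": 20, "mary": 18, "jack": 19, "car": None}, name="age")

print(ages.isna())     # True only for "car"
print(ages.dropna())   # the three known ages
print(ages.dtype)      # float64, because NaN is a float
```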
Checking the type
print(type(sa))  # <class 'pandas.core.series.Series'>
1.4 Building a Series from a NumPy ndarray
Generate random numbers
import pandas as pd
import numpy as np

num_abc = pd.Series(np.random.randn(5), index=list('abcde'))
num = pd.Series(np.random.randn(5))
print(num)
print(num_abc)
Output (the values differ on every run):
0   -0.102860
1   -1.138242
2    1.408063
3   -0.893559
4    1.378845
dtype: float64
a   -0.658398
b    1.568236
c    0.535451
d    0.103117
e   -1.556231
dtype: float64
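Since the values above change on every run, a seeded generator helps when the result needs to be reproducible. This is a sketch using NumPy's `default_rng`; the seed value 0 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Assumption: seed 0 is arbitrary; any fixed seed gives repeatable output
rng = np.random.default_rng(seed=0)
num_abc = pd.Series(rng.standard_normal(5), index=list('abcde'))

print(num_abc)
```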
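1.5 Selecting data
import pandas as pd

s = pd.Series([9, 'zheng', 'beijing', 128, 'usa', 990],
              index=[1, 2, 3, 'e', 'f', 'g'])
# The index mixes integers and strings, so .iloc is used to select by position explicitly
print(s.iloc[1:3])     # positions 1 and 2 (left-inclusive, right-exclusive): zheng, beijing
print(s.iloc[[1, 3]])  # positions 1 and 3: zheng, 128
print(s.iloc[:-1])     # everything except the last element: 9, zheng, beijing, 128, usa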
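1.6 Operating on data
import pandas as pd

s = pd.Series([9, 'zheng', 'beijing', 128, 'usa', 990],
              index=[1, 2, 3, 'e', 'f', 'g'])
# Slices are taken by position; the addition then aligns the operands by label
sum0 = s.iloc[1:3] + s.iloc[1:3]   # named sum0 to avoid shadowing the built-in sum
sum1 = s.iloc[1:4] + s.iloc[1:4]
sum2 = s.iloc[1:3] + s.iloc[1:4]
sum3 = s.iloc[:3] + s.iloc[1:]
print(sum0)
print(sum1)
print(sum2)
print(sum3)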
Output:
2        zhengzheng
3    beijingbeijing
dtype: object
2        zhengzheng
3    beijingbeijing
e               256
dtype: object
2        zhengzheng
3    beijingbeijing
e               NaN
dtype: object
1               NaN
2        zhengzheng
3    beijingbeijing
e               NaN
f               NaN
g               NaN
dtype: object
1.7 Searching
Checking whether a value exists
'usa' in s.values  # True; a plain 'usa' in s tests the index and returns False
Filtering by condition
import pandas as pd

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}
sa = pd.Series(s, name="age")
print(sa[sa > 19])
Median
import pandas as pd

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}
sa = pd.Series(s, name="age")
print(sa.median())  # 20.0 (NaN is skipped)
Checking whether each value is greater than the median
import pandas as pd

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}
sa = pd.Series(s, name="age")
print(sa > sa.median())
Selecting the values greater than the median
import pandas as pd

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}
sa = pd.Series(s, name="age")
print(sa[sa > sa.median()])
Storing the comparison result in a variable
import pandas as pd

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}
sa = pd.Series(s, name="age")
more_than_median = sa > sa.median()   # boolean Series, one entry per label
print(more_than_median)
print('---------------------')
print(sa[more_than_median])
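Boolean masks like the one above can also be combined. The sketch below, on the same data, builds a range condition with `&`; the bounds 19 and 22 are chosen only for illustration.

```python
import pandas as pd

sa = pd.Series({"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None},
               name="age")

above_median = sa > sa.median()       # comparisons with NaN evaluate to False
in_range = (sa >= 19) & (sa <= 22)    # parentheses are required around each comparison

print(sa[above_median])   # jim and lj
print(sa[in_range])       # ton, jack and jim
```

Use `&`, `|` and `~` rather than `and`, `or` and `not`: the Python keywords do not work element-wise on a Series.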
1.8 Assigning to a Series
import pandas as pd

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}
sa = pd.Series(s, name="age")
print(s)          # print the original dict
print('----------------')
sa['ton'] = 99    # assign by label
print(sa)
1.9 Assigning to all elements that meet a condition
import pandas as pd

s = {"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None}
sa = pd.Series(s, name="age")
print(s)                        # print the original dict
print('---------------------')  # separator
sa[sa > 19] = 88                # set every value greater than 19 to 88
print(sa)                       # print the modified data
print('---------------------')  # separator
print(sa / 2)                   # divide every value by 2
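The assignment above changes `sa` in place. If the original should stay untouched, one alternative is `Series.mask`, which returns a new Series with the replacement applied wherever the condition is True. A minimal sketch on the same data (the name `capped` is illustrative):

```python
import pandas as pd

sa = pd.Series({"ton": 20, "mary": 18, "jack": 19, "jim": 22, "lj": 24, "car": None},
               name="age")

# Replace values greater than 19 with 88 in a copy; sa itself is not modified
capped = sa.mask(sa > 19, 88)

print(capped)
print(sa)   # still holds the original ages
```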