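Chapter 4 Data Processing and Application
4.2.2 Data Processing with pandas (13 slides)

Life is short, I use Python.

Processing data with pandas
Data processing can be done with ready-made software or platforms, or by writing programs. Python's rich standard modules and extension libraries (numpy, scipy, pandas, matplotlib) provide many efficient, flexible functions that make it easier to clean and organize data.

pandas has two core data structures: Series and DataFrame. Import the pandas module in Python with:

    import pandas as pd

1. Series
A Series is a one-dimensional array made up of a sequence of data values and an index associated with those values. By default, the index values are integers counting up from 0.

    import pandas as pd  # import the pandas module

    s1 = pd.Series([3, 4, 5, 6])
    print(s1)

Output (left column: index; right column: values):

    0    3
    1    4
    2    5
    3    6

The index can also be given explicitly (年級 = grade, 年齡 = age, 身高 = height; 高二 = second year of senior high school):

    import pandas as pd  # import the pandas module

    s2 = pd.Series(["高二", 16, 180], index=["年級", "年齡", "身高"])
    print(s2)

Output (left column: index; right column: values):

    年級     高二
    年齡     16
    身高    180

Iterating over the index and over the values:

    for i in s2.index:
        print(i)

    for i in s2.values:
        print(i)

Output of the first loop: 年級, 年齡, 身高 (one per line). Output of the second loop: 高二, 16, 180 (one per line).

A value can be changed by assigning to its index label:

    import pandas as pd  # import the pandas module

    s2 = pd.Series(["高二", 16, 180], index=["年級", "年齡", "身高"])
    # Assign through the index label to change a value in s2
    s2["身高"] = 190
    print(s2)

Output:

    年級     高二
    年齡     16
    身高    190

2. DataFrame
A DataFrame is a two-dimensional table made up of one index column and several data columns. All elements in a column share one data type (dtype), while different columns can have different types. A DataFrame can be created from a dictionary whose values are lists of equal length (男 = male, 女 = female):

    import pandas as pd

    data = {"name": ["王曉明", "李靜", "田海"],
            "sex": ["男", "女", "男"],
            "aged": [20, 19, 21]}
    # columns sets the order of the data columns explicitly
    df = pd.DataFrame(data, columns=["name", "sex", "aged"])
    print(df)

The index is the leftmost column (0, 1, 2 by default). columns holds the column labels and determines the order in which the data columns are output. If columns is not given, the columns follow the order of the dictionary keys (dictionaries keep insertion order since Python 3.7), here name, sex, aged. Passing a different list reorders the columns:

    import pandas as pd

    data = {"name": ["王曉明", "李靜", "田海"],
            "sex": ["男", "女", "男"],
            "aged": [20, 19, 21]}
    # Output the columns in the order aged, sex, name
    df = pd.DataFrame(data, columns=["aged", "sex", "name"])
    print(df)

A DataFrame can also be created by importing a two-dimensional data file, and written back out to one:

    pd.read_csv(filename)    # import data from a CSV file
    pd.read_excel(filename)  # import data from an Excel file
    df.to_csv(filename)      # export data to a CSV file
    df.to_excel(filename)    # export data to an Excel file

For example, to load test.xlsx:

    import pandas as pd

    # Reading .xlsx files requires the openpyxl package to be installed
    df = pd.read_excel("test.xlsx")
    print(df)

Index, columns, values and transpose:

    import pandas as pd

    data = {"name": ["王曉明", "李靜", "田海"],
            "sex": ["男", "女", "男"],
            "aged": [20, 19, 21]}
    df = pd.DataFrame(data, columns=["name", "sex", "aged"])
    print(df)

    for i in df.index:
        print(i)
    for i in df.columns:
        print(i)
    for i in df.values:
        print(i)
    print(df.T)  # transpose: swap rows and columns

Output of the three loops:

    0
    1
    2

    name
    sex
    aged

    ['王曉明' '男' 20]
    ['李靜' '女' 19]
    ['田海' '男' 21]

Selecting columns of a DataFrame: by attribute access or by dictionary-style indexing.

    import pandas as pd

    data = {"name": ["王曉明", "李靜", "田海"],
            "sex": ["男", "女", "男"],
            "aged": [20, 19, 21]}
    df = pd.DataFrame(data, columns=["name", "sex", "aged"])
    print(df)
    # Select a column by attribute access
    print(df.name)
    # Select a column with dictionary-style indexing
    print(df["sex"])
    # Modify the contents of a column (df["aged"] = [20, 20, 22] is equivalent)
    df.aged = [20, 20, 22]
    print(df)

Attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method; dictionary-style indexing always works.

Selecting rows of a DataFrame: slice the rows to view specific ones, use a Boolean condition to select the rows that satisfy it, or use at[] to pinpoint a single value.

    import pandas as pd

    data = {"name": ["王曉明", "李靜", "田海"],
            "sex": ["男", "女", "男"],
            "aged": [20, 19, 21]}
    df = pd.DataFrame(data, columns=["name", "sex", "aged"])
    print(df)
    # Slice rows 0 and 1 (the end position 2 is excluded)
    print(df[0:2])
    # Select the rows where sex is 女
    print(df[df["sex"] == "女"])
    # Use at[] to access one value by row label and column label
    print(df.at[2, "name"])  # 田海

Adding and deleting rows and columns:

    import pandas as pd

    data = {"name": ["王曉明", "李靜", "田海"],
            "sex": ["男", "女", "男"],
            "aged": [20, 19, 21]}
    df = pd.DataFrame(data, columns=["name", "sex", "aged"])
    print(df)
    # Add a row (DataFrame.append() was removed in pandas 2.0, so use pd.concat)
    new_row = pd.DataFrame([{"name": "張亮", "sex": "男", "aged": 17}])
    df_add = pd.concat([df, new_row], ignore_index=True)
    print(df_add)
    # Delete the "sex" column
    df_delc = df.drop("sex", axis=1)
    print(df_delc)
    # Delete the first row (index label 0)
    df_delr = df.drop(0)
    print(df_delr)

concat() and drop() do not change the data in the original object; they return a new one. By contrast, del (for example del df["sex"]) permanently removes a column from the original DataFrame.

To add a column and assign its values:

    df["height"] = [175, 180, 182]

Grouping and sorting. These snippets use a df that has 地區 (region) and 價格 (price) columns, for example one read from a spreadsheet:

    # groupby() groups the rows; mean() computes the average of each group.
    # numeric_only=True skips text columns; in pandas 2.0 and later,
    # mean() raises a TypeError on text columns without it.
    g = df.groupby("地區", as_index=False)
    print(g.mean(numeric_only=True))

    # ... or, equivalently, in one line:
    g = df.groupby("地區", as_index=False).mean(numeric_only=True)

    # Sort by price in descending order
    df_sort = df.sort_values("價格", ascending=False)
    print(df_sort)

1. Sort by index: sort_index()
2. Sort by value: sort_values()
Parameter axis: axis=0 (default) sorts vertically (the rows), axis=1 sorts horizontally (the columns).
Parameter ascending: ascending=True (default) sorts in ascending order, ascending=False in descending order.
The sorted result is returned as a new object; the original is unchanged.

Lesson summary
Importing modules:
    import module_name1 [as alias1], …
    from module_name import member_name1 [as alias1], …
pandas data structures: Series and DataFrame

Source: 二一教育資源庫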
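To see groupby(), mean(), sort_values() and sort_index() working end to end, here is a minimal self-contained sketch. The slides' own 地區/價格 table is not reproduced, so the rows below, including the 商品 (product) column and every value, are hypothetical sample data made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: the lesson's real table is not shown,
# so the 商品 (product) column and all values here are made up.
df = pd.DataFrame({
    "商品": ["A", "B", "C", "D", "E"],
    "地區": ["北京", "上海", "北京", "廣州", "上海"],
    "價格": [30, 50, 20, 40, 70],
})

# Group by region and average the numeric columns only;
# numeric_only=True skips the text column 商品 (needed in pandas >= 2.0).
g = df.groupby("地區", as_index=False).mean(numeric_only=True)
print(g)

# Sort by price, highest first; this returns a new DataFrame.
df_sort = df.sort_values("價格", ascending=False)
print(df_sort)

# Sorting by the index labels restores the original row order.
print(df_sort.sort_index())
```

Because sort_values() returns a new DataFrame, df keeps its original row order; calling sort_index() on df_sort sorts by the row labels 0 to 4 and so restores that order.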