Using converters with read_excel to read an Excel file into a Pandas DataFrame results in a numeric column of object dtype
Problem description
I am reading this Excel file (United Nations Energy Indicators) using the code snippet below:
import pandas as pd

def convert_energy(energy):
    if isinstance(energy, float):
        return energy * 1000000
    else:
        return energy

def energy_df():
    # skipfooter was spelled skip_footer in older pandas versions
    return pd.read_excel("Energy Indicators.xls", skiprows=17, skipfooter=38,
                         usecols=[2, 3, 4, 5], na_values=['...'],
                         names=['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable'],
                         converters={1: convert_energy}).set_index('Country')
This results in the Energy Supply column having the object dtype instead of float. Why is that?
energy = energy_df()
print(energy.dtypes)
Energy Supply object
Energy Supply per Capita float64
% Renewable float64
Accepted answer
Drop the converters parameter for now:
import pandas as pd

c = ['Energy Supply', 'Energy Supply per Capita', '% Renewable']
df = pd.read_excel("Energy Indicators.xls",
                   skiprows=17,
                   skipfooter=38,   # skip_footer in older pandas versions
                   usecols=[2, 3, 4, 5],
                   na_values=['...'],
                   names=c,
                   index_col=[0])
df.index.name = 'Country'
df.head()
Energy Supply Energy Supply per Capita % Renewable
Country
Afghanistan 321.0 10.0 78.669280
Albania 102.0 35.0 100.000000
Algeria 1959.0 51.0 0.551010
American Samoa NaN NaN 0.641026
Andorra 9.0 121.0 88.695650
df.dtypes
Energy Supply float64
Energy Supply per Capita float64
% Renewable float64
dtype: object
Your data loads just fine without a converter. The reason comes down to what a converter replaces.
By default, pandas reads in the column and tries to "interpret" your data. By specifying your own converter, you override pandas' conversion, so that interpretation does not happen.
pandas passes integer and string values to convert_energy, so isinstance(energy, float) never evaluates to True. Instead, the else branch runs and these values are returned as-is, so the resulting column is a mixture of strings and integers. If you put a print(type(energy)) inside your function, this becomes obvious.
Since you have a mixture of types, the resulting dtype is object. However, if you do not use a converter, pandas will attempt to interpret your data and will successfully parse it as numeric.
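If you would rather keep a converter, one option (a sketch of my own, not part of the original answer) is to make it accept ints as well as floats and map anything else, such as '...', to NaN, so every value it returns is a float. The raw values below are hypothetical, and whether read_excel then yields a float64 column for the real file is an assumption; only the Series construction is shown here.

```python
import math
import numbers

import pandas as pd

def convert_energy(energy):
    # Scale any real number, int or float (bool excluded just in case)
    if isinstance(energy, numbers.Real) and not isinstance(energy, bool):
        return float(energy) * 1000000
    # Anything else, such as the '...' placeholder, becomes a missing value
    return math.nan

# Hypothetical raw cell values, as in the question's data
raw_values = [321, 102, 1959, '...', 9]
converted = pd.Series([convert_energy(v) for v in raw_values])
print(converted.dtype)  # float64
```

Returning a single type from the converter is what lets the column come out numeric; the answer's approach below avoids the converter entirely, which is simpler.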
So simply loading without the converter and then running
df['Energy Supply'] *= 1000000
is more than enough.
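Here is a self-contained sketch of that load-then-scale approach. Since the .xls file is not available here (and reading .xls needs the xlrd package), it swaps in pd.read_csv on a small hypothetical CSV built from the df.head() rows shown earlier; na_values=['...'] and the in-place multiplication behave the same way.

```python
import io

import pandas as pd

# Hypothetical stand-in for the sheet, using rows from the df.head() output
csv_text = """Country,Energy Supply,Energy Supply per Capita,% Renewable
Afghanistan,321,10,78.669280
Albania,102,35,100.0
American Samoa,...,...,0.641026
"""

# No converter: '...' becomes NaN and pandas infers numeric dtypes itself
df = pd.read_csv(io.StringIO(csv_text), na_values=['...'], index_col=0)

# Scale after loading, as the answer suggests
df['Energy Supply'] *= 1000000
print(df.dtypes)
```

Because the NaN forces the Energy Supply column to float64 before scaling, the multiplication keeps it numeric.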