
pandas pytable: specifying min_itemsize for the elements of a MultiIndex


Question


I am storing a pandas DataFrame that has a MultiIndex as a PyTables table.


The first level of the MultiIndex is a string corresponding to a userID. Most of the userIDs are 13 characters long, but some are 15 characters long. When I append a record containing a long userID, PyTables raises an error because it expects a 13-character field:

ValueError('Trying to store a string with len [15] in [user] column but\nthis column has a limit of [13]!\nConsider using min_itemsize to preset the sizes on these columns',)


However, I do not know how to set the attribute min_itemsize for the elements of a MultiIndex. I have tried {'index': 15} and it does not work...


I know that I could force all IDs to be 15 characters long from the beginning by appending spaces, but I would prefer to avoid this if possible.

Thanks for your help!

Accepted answer

#1


You need to specify the name of the multi-index level that you want to set a min_itemsize for. Here's an example:

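Create two MultiIndexed frames (the session assumes import numpy as np, import pandas as pd, and from pandas import DataFrame, MultiIndex)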

In [1]: df1 = DataFrame(np.random.randn(4,2),index=MultiIndex.from_product([['abcdefghijklm','foo'],[1,2]],names=['string','number']))

In [2]: df2 = DataFrame(np.random.randn(4,2),index=MultiIndex.from_product([['abcdefghijklmop','foo'],[1,2]],names=['string','number']))

In [3]: df1
Out[3]: 
                             0         1
string        number                    
abcdefghijklm 1       0.737976  0.840718
              2       0.605763  1.797398
foo           1       1.589278  0.104186
              2       0.029387  1.417195

[4 rows x 2 columns]

In [4]: df2
Out[4]: 
                               0         1
string          number                    
abcdefghijklmop 1       0.539507 -1.059085
                2       1.263722 -1.773187
foo             1       1.625073  0.078650
                2      -0.030827 -1.691805

[4 rows x 2 columns]

Create the store

In [9]: store = pd.HDFStore('test.h5',mode='w')

In [10]: store.append('df1',df1)

Here is the computed length: the string column was sized to 13 characters, the longest value in df1

In [12]: store.get_storer('df1').table
Out[12]: 
/df1/table (Table(4,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Float64Col(shape=(2,), dflt=0.0, pos=1),
  "number": Int64Col(shape=(), dflt=0, pos=2),
  "string": StringCol(itemsize=13, shape=(), dflt='', pos=3)}
  byteorder := 'little'
  chunkshape := (1456,)
  autoindex := True
  colindexes := {
    "index": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "number": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "string": Index(6, medium, shuffle, zlib(1)).is_csi=False}

This is the error you are getting now

In [13]: store.append('df1',df2)

ValueError: Trying to store a string with len [15] in [string] column but
this column has a limit of [13]!
Consider using min_itemsize to preset the sizes on these columns
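The error occurs because the string column was created with a fixed width of 13 on the first append, and later appends cannot widen it. As a minimal sketch, assuming you already know the column's limit (13 here), you could list the offending values before calling store.append. The function name overlong_values is hypothetical and not part of pandas.

```python
import pandas as pd


def overlong_values(frame, level, limit):
    """Hypothetical check: return the distinct values of an index level longer than `limit`."""
    values = frame.index.get_level_values(level).unique()
    # Compare each value's length with the fixed column width
    return [v for v in values if len(str(v)) > limit]


df2 = pd.DataFrame(
    [[0.0, 0.0]] * 4,
    index=pd.MultiIndex.from_product(
        [["abcdefghijklmop", "foo"], [1, 2]], names=["string", "number"]
    ),
)
print(overlong_values(df2, "string", 13))  # ['abcdefghijklmop']
```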

Specify min_itemsize using the level name

In [14]: store.append('df',df1,min_itemsize={ 'string' : 15 })

In [15]: store.get_storer('df').table
Out[15]: 
/df/table (Table(4,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Float64Col(shape=(2,), dflt=0.0, pos=1),
  "number": Int64Col(shape=(), dflt=0, pos=2),
  "string": StringCol(itemsize=15, shape=(), dflt='', pos=3)}
  byteorder := 'little'
  chunkshape := (1394,)
  autoindex := True
  colindexes := {
    "index": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "number": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "string": Index(6, medium, shuffle, zlib(1)).is_csi=False}

Append

In [16]: store.append('df',df2)

In [19]: store.df
Out[19]: 
                               0         1
string          number                    
abcdefghijklm   1       0.737976  0.840718
                2       0.605763  1.797398
foo             1       1.589278  0.104186
                2       0.029387  1.417195
abcdefghijklmop 1       0.539507 -1.059085
                2       1.263722 -1.773187
foo             1       1.625073  0.078650
                2      -0.030827 -1.691805

[8 rows x 2 columns]

In [20]: store.close()
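Because min_itemsize is keyed by level name, one way to avoid hard-coding 15 is to compute the width from the data before the first append. The helper below is a hypothetical sketch (level_min_itemsize is not part of pandas). It assumes every string level is named and that you have all the frames you intend to append, and it takes the longest value per string level across them.

```python
import pandas as pd
from pandas.api.types import is_string_dtype


def level_min_itemsize(*frames, pad=0):
    """Hypothetical helper: build a min_itemsize dict keyed by MultiIndex level name.

    For every string-valued index level, record the longest value seen across
    all frames, plus an optional safety margin (pad).
    """
    sizes = {}
    for frame in frames:
        for pos, name in enumerate(frame.index.names):
            values = frame.index.get_level_values(pos)
            if not is_string_dtype(values):
                continue  # numeric levels do not need min_itemsize
            if name is None:
                # min_itemsize refers to levels by name, so unnamed levels cannot be sized
                raise ValueError(f"index level {pos} has no name")
            longest = int(values.str.len().max()) + pad
            sizes[name] = max(sizes.get(name, 0), longest)
    return sizes


df1 = pd.DataFrame(
    [[0.0, 0.0]] * 4,
    index=pd.MultiIndex.from_product(
        [["abcdefghijklm", "foo"], [1, 2]], names=["string", "number"]
    ),
)
df2 = pd.DataFrame(
    [[0.0, 0.0]] * 4,
    index=pd.MultiIndex.from_product(
        [["abcdefghijklmop", "foo"], [1, 2]], names=["string", "number"]
    ),
)
print(level_min_itemsize(df1, df2))  # {'string': 15}
```

In the session above, the result would be passed on the first call, for example store.append('df', df1, min_itemsize=level_min_itemsize(df1, df2)). Since the column width is fixed when the table is created, the pad argument can leave headroom for IDs longer than any seen so far.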
