Large, persistent DataFrame in pandas

Question

I am exploring switching to Python and pandas as a long-time SAS user.

However, when running some tests today, I was surprised that Python ran out of memory when trying to pandas.read_csv() a 128 MB CSV file. It had about 200,000 rows and 200 columns of mostly numeric data.

With SAS, I can import a CSV file into a SAS dataset, and it can be as large as my hard drive.

Is there something analogous in pandas?

I regularly work with large files and do not have access to a distributed computing network.

Accepted answer

#1

In principle it shouldn't run out of memory, but there are currently memory problems with read_csv on large files, caused by some complex Python internal issues (this is vague, but it has been known for a long time: http://github.com/pydata/pandas/issues/407).

At the moment there isn't a perfect solution (here's a tedious one: you could transcribe the file row by row into a pre-allocated NumPy array or memory-mapped file, np.memmap), but it's one I'll be working on in the near future. Another solution is to read the file in smaller pieces (use iterator=True, chunksize=1000), then concatenate them with pd.concat. The problem comes in when you pull the entire text file into memory in one big slurp.
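
The following is a minimal sketch of the chunked-reading approach described above; the file name and chunk size are illustrative placeholders, not taken from the original answer.

import pandas as pd

# Reading with chunksize returns an iterator of DataFrames, each holding
# up to 1000 rows, instead of pulling the whole file into memory at once.
reader = pd.read_csv("large_file.csv", chunksize=1000)

# Concatenate the pieces back into a single DataFrame.
df = pd.concat(reader, ignore_index=True)

print(df.shape)

If the assembled DataFrame still does not fit in memory, each chunk can instead be filtered or aggregated as it is read, so that only the reduced result is kept.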
