Reading only the header row of a CSV file with Python?

Question

I am looking for a way to read just the header row of a large number of large CSV files.

Using Pandas, I have this method available for each CSV file:

>>> df = pd.read_csv(PATH_TO_CSV)
>>> df.columns

I could do this with just the csv module:

>>> reader = csv.DictReader(open(PATH_TO_CSV))
>>> reader.fieldnames

The problem with these is that each CSV file is 500MB or larger, and it seems like a gigantic waste to read in the entire file just to pull the header row.

My end goal is to pull out the unique column names. I can do that once I have a list of the column headers in each of these files.

How can I extract only the header row of a CSV file, quickly?

Answer 1

Accepted answer

I've used iglob as an example to search for the .csv files, but one way is to use a set and then adjust as necessary, e.g.:

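import csv
from glob import iglob

unique_headers = set()
for filename in iglob('*.csv'):
    # On Python 3 the csv module expects a text-mode file opened with newline=''
    # (the 'rb' mode only worked on Python 2).
    with open(filename, newline='') as fin:
        csvin = csv.reader(fin)
        # next() parses only the first row; [] is the fallback for an empty file.
        unique_headers.update(next(csvin, []))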
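
If you also want to know which columns came from which file, the same idea can be split into small helpers. This is only a sketch under a few assumptions: the function names `read_header`, `headers_by_file` and `unique_columns` are made up for illustration, the files are assumed to be UTF-8, and the first row of each file is assumed to be the header. Because `csv.reader` is lazy, calling `next()` once should only pull the start of each file from disk rather than all 500MB.

```python
import csv
from glob import iglob


def read_header(path, encoding="utf-8"):
    """Return the first row of a CSV file as a list of column names."""
    # newline='' lets the csv module handle line endings itself.
    with open(path, newline="", encoding=encoding) as fin:
        # Only the first record is parsed; an empty file yields [].
        return next(csv.reader(fin), [])


def headers_by_file(pattern):
    """Map each filename matching the glob pattern to its header row."""
    return {name: read_header(name) for name in iglob(pattern)}


def unique_columns(headers):
    """Return the union of column names across all header rows."""
    unique = set()
    for columns in headers.values():
        unique.update(columns)
    return unique
```

Something like `unique_columns(headers_by_file('*.csv'))` would then give the set of distinct column names, while the intermediate dict keeps the per-file lists around in case you need them.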
