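Getting one or more columns from a .csv file with Python
1. Merge the feature values from three CSV files into a single file, adding the corresponding label to every row.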

# -*- coding: utf-8 -*-
import csv

header = "feature1,feature2,feature3,feature4,feature5,feature6,feature7,feature8,feature9,feature10,label\n"

# (source file, label) pairs: every row of a source file receives that file's label
sources = [
    ("./dataset/f02.csv", "1"),
    ("./dataset/g03.csv", "2"),
    ("./dataset/normal05.csv", "3"),
]

# "w" starts a fresh output file, so running the script twice does not duplicate the header
with open("./dataset/dataTime2.csv", "w") as rfile:
    rfile.write(header)
    for path, label in sources:
        # Text mode ("r") yields str lines that can be joined with the label
        with open(path, "r") as file:
            for a in file:
                a = a.strip()
                if not a:
                    continue  # skip blank lines
                rfile.write(a + "," + label + "\n")
                # rfile.write(label + "," + a + "\n")  # alternative: label as the first column
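
The merge step can also be sketched with the standard `csv` module, which takes care of quoting and line endings. This is a minimal sketch, not the author's method: the helper name `merge_with_labels`, the temporary files, and the two-feature sample data are all assumptions made for illustration.

```python
import csv
import os
import tempfile


def merge_with_labels(sources, out_path, header):
    """Write one CSV containing every row of each source file plus its label.

    sources: list of (path, label) pairs; header: list of column names.
    (Hypothetical helper, not part of the original article.)
    """
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        for path, label in sources:
            with open(path, "r", newline="") as src:
                for row in csv.reader(src):
                    if row:  # csv.reader yields [] for blank lines
                        writer.writerow(row + [label])


# Demo on two tiny throwaway files (made-up data, two features each).
tmp = tempfile.mkdtemp()
path_a = os.path.join(tmp, "a.csv")
path_b = os.path.join(tmp, "b.csv")
with open(path_a, "w") as f:
    f.write("0.1,0.2\n0.3,0.4\n")
with open(path_b, "w") as f:
    f.write("0.5,0.6\n")

merged_path = os.path.join(tmp, "merged.csv")
merge_with_labels([(path_a, "1"), (path_b, "2")], merged_path,
                  ["feature1", "feature2", "label"])

with open(merged_path, newline="") as f:
    merged_rows = list(csv.reader(f))
print(merged_rows)
```

Passing the sources as a list of pairs keeps the per-file logic in one place, so adding a fourth file only means adding one more pair.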
2. Get a single column from the CSV file. The code below returns every value in the column whose header is label.

import csv

filename = "./dataset/dataTime2.csv"
with open(filename, 'r') as file:
    # DictReader uses the first line as field names, so each row can be indexed by header
    reader = csv.DictReader(file)
    column = [row['label'] for row in reader]
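
`csv.DictReader` returns every value as a string, so a numeric column usually needs a conversion step. The sketch below shows one way to do that; the helper name `read_column`, its `cast` parameter, and the in-memory sample data are assumptions for illustration only.

```python
import csv
import io


def read_column(file_obj, name, cast=str):
    """Return every value in column `name`, converted with `cast`.

    (Hypothetical helper, not part of the original article.)
    """
    reader = csv.DictReader(file_obj)
    return [cast(row[name]) for row in reader]


# Made-up sample in the same layout as the merged file (two features only).
sample = io.StringIO("feature1,feature2,label\n0.1,0.2,1\n0.3,0.4,2\n")
labels = read_column(sample, "label", cast=int)
print(labels)  # [1, 2]
```

With a real file, pass the object returned by `open(filename, 'r', newline='')` in place of the `io.StringIO` sample.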
3. Get several columns from the CSV file. The code below returns all values except those in the column whose header is label.

import pandas as pd

filename = "./dataset/dataTime2.csv"
odata = pd.read_csv(filename)
y = odata['label']
x = odata.drop(['label'], axis=1)  # all feature values, i.e. every column except label
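
If pandas is not available, picking an arbitrary subset of columns by name can be sketched with the standard `csv` module alone. This is only an illustrative alternative to the pandas code; the helper name `select_columns` and the sample data are assumptions.

```python
import csv
import io


def select_columns(file_obj, wanted):
    """Return (header, rows) restricted to the column names in `wanted`.

    (Hypothetical helper, not part of the original article.)
    """
    reader = csv.reader(file_obj)
    header = next(reader)
    # Positions of the wanted columns, in the order they were requested
    idx = [header.index(name) for name in wanted]
    rows = [[row[i] for i in idx] for row in reader if row]
    return [header[i] for i in idx], rows


# Made-up sample with three features and a label.
sample = io.StringIO("feature1,feature2,feature3,label\n1,2,3,0\n4,5,6,1\n")
cols, rows = select_columns(sample, ["feature1", "feature3"])
print(cols)
print(rows)
```

The values come back as strings; `header.index` raises `ValueError` when a requested column name is missing, which makes typos in column names visible early.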
4. The data can also be turned into a list[np.array], that is, a list holding one NumPy array per line.

import numpy as np

filename = "./dataset/dataTime2.csv"
list1 = []
with open(filename, 'r') as file:
    # Each line becomes one array of strings; the first array holds the header names
    for a in file:
        c = np.array(a.strip("\n").split(","))
        list1.append(c)
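
The arrays built this way contain strings and include the header line. When numeric arrays are wanted instead, one option is `np.loadtxt`, sketched below on made-up in-memory data; the sample values and variable names are assumptions, and this is an alternative rather than the author's approach.

```python
import io

import numpy as np

# Made-up sample in the same layout as the merged file (two features only).
sample = io.StringIO("feature1,feature2,label\n0.1,0.2,1\n0.3,0.4,2\n")

# skiprows=1 drops the header line; every remaining field is parsed as a float
data = np.loadtxt(sample, delimiter=",", skiprows=1)

# One 1-D float array per row, i.e. the list[np.array] form
rows = list(data)
print(data.shape)  # (2, 3)
```

With a real file, pass the path (for example `filename`) as the first argument in place of the `io.StringIO` object.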
5. The data can also be turned into a tensor-format dataset.

# -*- coding: utf-8 -*-
import tensorflow as tf

# This uses the queue-based input pipeline of TensorFlow 1.x.
# The first line (the header) has to be skipped when reading.
filename_queue = tf.train.string_input_producer(["./dataset/dataTime.csv"])
reader = tf.TextLineReader(skip_header_lines=1)
key, value = reader.read(filename_queue)

# Ten float feature columns followed by a required int32 label column
record_defaults = [[1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], tf.constant([], dtype=tf.int32)]
col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, col11 = tf.decode_csv(
    value, record_defaults=record_defaults)
features = tf.stack([col1, col2, col3, col4, col5, col6, col7, col8, col9, col10])

with tf.Session() as sess:
    # Start populating the filename queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    trainx = []
    trainy = []
    for i in range(81000):
        # Retrieve a single instance:
        example, label = sess.run([features, col11])
        trainx.append(example)
        trainy.append(label)
    coord.request_stop()
    coord.join(threads)

# trainx and trainy end up with 81000 entries; each entry of trainx holds the 10 features