How Can I Split CSV Files in Python?
Because of a memory error, I have to split my CSV files. I researched it and found a solution from a Stack Overflow user, Aziz Alto. This is his code: csvfile = open('#',
Solution 1:
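import pandas as pd

# read the file lazily, 5,000,000 rows at a time, instead of loading it all at once
chunks = pd.read_csv("csvfile.csv", chunksize=5000000)
for i, chunk in enumerate(chunks):
    # i is the chunk number of each iteration; index=False leaves out the extra row-index column
    chunk.to_csv('out{}.csv'.format(i), index=False)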
chunksize: the number of rows you want in each output file. For reference, an Excel worksheet can hold up to 1,048,576 rows.
With the value above, each output file holds up to 5,000,000 rows and includes the header.
Hope this helps!
Solution 2:
From the second file through the last, you always have to add the first line of your original file (the one containing the header):
# this loads the whole original file into memory
with open('#', 'r') as f:
    csvfile = f.readlines()

linesPerFile = 1000000
filename = 1
# this is better than your former loop: it steps through the lines 1000000 at a time,
# instead of incrementing 1000000 times and only writing on the millionth one
for i in range(0, len(csvfile), linesPerFile):
    with open(str(filename) + '.csv', 'w+') as f:
        if filename > 1:  # this is the second or a later file, so we need to
            f.write(csvfile[0])  # write the header again
        f.writelines(csvfile[i:i+linesPerFile])
    filename += 1
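Since readlines() above pulls the whole file into memory, it may run into the same memory error on a very large input. Below is a minimal sketch of a streaming variant that uses only the standard csv module and keeps at most one batch of rows in memory at a time. The names split_csv, rows_per_file and prefix are made up for this illustration, and it assumes the first row of the file is the header.

```python
import csv
from itertools import islice


def split_csv(source_path, rows_per_file, prefix="part"):
    """Split source_path into files of at most rows_per_file data rows each.

    Hypothetical helper: returns the list of output file names written.
    """
    written = []
    with open(source_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)  # assumption: the first row is the header
        if header is None:
            return written  # empty input, nothing to split
        part = 1
        while True:
            # pull only the next batch of rows into memory
            rows = list(islice(reader, rows_per_file))
            if not rows:
                break
            out_name = "{}{}.csv".format(prefix, part)
            with open(out_name, "w", newline="") as dst:
                writer = csv.writer(dst)
                writer.writerow(header)  # every output file gets the header
                writer.writerows(rows)
            written.append(out_name)
            part += 1
    return written
```

Calling split_csv("csvfile.csv", 1000000) would write part1.csv, part2.csv and so on. Because the rows go through csv.reader rather than being cut by raw lines, a quoted field that contains a line break stays in one piece.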