
How Can I Split CSV Files In Python?

Because of a memory error, I have to split my CSV files. I researched it and found some code from a Stack Overflow user, Aziz Alto. This is his code: csvfile = open('#',

Solution 1:

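import pandas as pd
# read the file lazily, 5000000 rows at a time, instead of loading it all at once
chunks = pd.read_csv("csvfile.csv", chunksize=5000000)
for i, chunk in enumerate(chunks):
    # i is the chunk number of each iteration; index=False keeps pandas from
    # writing its row index as an extra first column
    chunk.to_csv('out{}.csv'.format(i), index=False)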

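With chunksize you specify how many rows you want in each output file. Keep in mind that Excel can show at most 1,048,576 rows per sheet, so use a smaller value if the files need to open there. With the value above, each file holds up to 5,000,000 rows, and every file is saved with the header.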
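
To make the row arithmetic concrete, here is a small sketch for checking a chunksize before running the split. The names plan_split and fits_in_excel are made up for illustration, and it assumes that chunksize counts data rows only, with the header taking one extra row in each output file.

```python
# Excel's per-sheet row limit; the header row counts toward it.
EXCEL_MAX_ROWS = 1_048_576


def plan_split(total_data_rows, chunksize):
    """Return (number_of_files, rows_in_last_file) for a given chunksize."""
    if chunksize < 1:
        raise ValueError("chunksize must be at least 1")
    if total_data_rows == 0:
        return 0, 0
    # Integer ceiling division: a partial final chunk still needs its own file.
    n_files = (total_data_rows + chunksize - 1) // chunksize
    rows_in_last = total_data_rows - (n_files - 1) * chunksize
    return n_files, rows_in_last


def fits_in_excel(chunksize):
    """True if one output file (header + chunksize data rows) fits in a sheet."""
    return chunksize + 1 <= EXCEL_MAX_ROWS
```

For example, plan_split(12_000_000, 5_000_000) gives 3 files with 2,000,000 rows in the last one, and fits_in_excel(5_000_000) is False, so a chunksize of at most 1,048,575 would be needed for Excel.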

Hope this helps!


Solution 2:

From the second file onward, you always have to add the first line of your original file (the one containing the header):

# this loads the whole original file into memory
with open('#', 'r') as f:
    csvfile = f.readlines()

linesPerFile = 1000000
filename = 1
# this is better than your former loop: it steps through the file 1000000 lines at a time,
# instead of incrementing 1000000 times and only writing on the millionth iteration
for i in range(0, len(csvfile), linesPerFile):
    with open(str(filename) + '.csv', 'w+') as f:
        if filename > 1:  # this is the second or a later file, so we need to
            f.write(csvfile[0])  # write the header again
        f.writelines(csvfile[i:i+linesPerFile])
    filename += 1
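
Since readlines() pulls the whole file into memory, the approach above may hit the same memory error on a very large file. As a hedged alternative, here is a minimal sketch that streams the rows with the standard csv module instead. The function name split_csv, its parameters, and the "part" prefix are assumptions for illustration, not part of the original answers. Every output file gets the header and at most rows_per_file data rows.

```python
import csv
from itertools import islice


def split_csv(source_path, rows_per_file, prefix="part"):
    """Split a CSV into numbered files; return the list of paths written."""
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    written = []
    with open(source_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return written  # empty input: nothing to split
        # Each pass of this loop pulls the first row of the next output file.
        # If no rows are left, the loop ends and no empty file is created.
        for first_row in reader:
            out_path = "{}{}.csv".format(prefix, len(written) + 1)
            with open(out_path, "w", newline="") as dst:
                writer = csv.writer(dst)
                writer.writerow(header)
                writer.writerow(first_row)
                # Stream the remaining rows for this file straight from the reader.
                writer.writerows(islice(reader, rows_per_file - 1))
            written.append(out_path)
    return written
```

Calling split_csv("csvfile.csv", 1000000) would write part1.csv, part2.csv and so on. Going through csv.reader rather than raw lines also keeps quoted fields that contain line breaks in one piece.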
