
Why Am I Getting [Errno 7] Argument List Too Long and OSError: [Errno 24] Too Many Open Files When Using mrjob v0.4.4?

It seems like the nature of the MapReduce framework is to work with many files, so when I get errors telling me I'm using too many files, I suspect I'm doing something wrong.

Solution 1:

The "Argument list too long" error doesn't come from the job or from Python; it comes from bash. The asterisk in the command line you use to kick off the job expands to every matching file, producing a command line long enough to exceed bash's argument-length limit.

That error has nothing to do with ulimit. "Too many open files", on the other hand, is a ulimit problem: you would hit that limit if the command actually ran.
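To see where the open-files limit stands, you can inspect and adjust it from the shell. A minimal sketch (the 4096 value is just an example; it only takes effect if the hard limit allows it):

```shell
# Show the current per-process open-file limit (the soft limit)
ulimit -n

# Show the hard limit, which caps how far the soft limit can be raised
ulimit -Hn

# Raise the soft limit for this shell session; 4096 is an arbitrary example
ulimit -n 4096
```

The change only applies to the current shell and its children; a permanent change usually goes in /etc/security/limits.conf or the service manager's configuration.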

If you are interested, you can check the system's argument-length limit like this:

getconf ARG_MAX

To get around the max-arguments problem, you can concatenate all the files into one:

for f in *; do cat "$f" >> ../directory/bigfile.log; done

Then run your mrjob pointed at the big file.
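An alternative way to build the big file without tripping ARG_MAX is to let find enumerate the files instead of the shell glob, and have xargs batch them. A sketch with hypothetical directory names (the setup lines just create demo input):

```shell
# Demo setup: a scratch area with a couple of hypothetical input files
tmp=$(mktemp -d)
mkdir "$tmp/logs" "$tmp/out"
printf 'line1\n' > "$tmp/logs/a.log"
printf 'line2\n' > "$tmp/logs/b.log"

# The technique: find prints each filename NUL-terminated, so no shell glob
# expansion ever happens, and xargs splits the list into chunks that fit
# under ARG_MAX automatically before invoking cat
cd "$tmp/logs"
find . -maxdepth 1 -type f -print0 | xargs -0 cat >> ../out/bigfile.log
```

Unlike the glob, this works no matter how many files are in the directory, because the file names travel over a pipe rather than on a command line.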

If there are a lot of files, you can use multiple threads to concatenate them with GNU parallel, since the command above is single-threaded and slow:

ls | parallel -m -j 8 "cat {} >> ../files/bigfile.log"

(Change 8 to the amount of parallelism you want. Note that with concurrent appends the order of the input files in bigfile.log is not preserved.)
