Why Am I Getting [Errno 7] Argument List Too Long and OSError: [Errno 24] Too Many Open Files When Using mrjob v0.4.4?
Solution 1:
The "Argument list too long" problem is not caused by the job or by Python; it comes from bash. The asterisk in the command line you use to kick off the job is expanded by the shell into every matching file name, which produces a very long command line that exceeds the limit the operating system places on the size of a command's arguments.
That error has nothing to do with ulimit. The "Too many open files" error, on the other hand, is governed by ulimit (the per-process limit on open file descriptors), so you would hit that ulimit once the command actually got to run.
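The open-file limit mentioned above can be inspected and raised from the shell. A minimal sketch for bash, assuming a Linux-like system where the hard limit is reported as a number: it prints the soft and hard limits and raises the soft limit to the hard ceiling for the current session only. This only addresses the Errno 24 error; it does nothing for Errno 7.

```shell
#!/usr/bin/env bash
# "Too many open files" (Errno 24) is governed by the per-process limit on
# open file descriptors, which ulimit -n reports and sets.
soft=$(ulimit -Sn)   # limit currently enforced
hard=$(ulimit -Hn)   # ceiling an unprivileged user may raise the soft limit to
echo "open files: soft=$soft hard=$hard"

# Raise the soft limit as far as allowed. This lasts only for this shell
# session; programs started from it afterwards (such as your mrjob run)
# inherit the new limit. Assumes hard is a number, as it normally is on
# Linux; some systems report "unlimited" here and need an explicit value.
ulimit -Sn "$hard"
echo "soft limit now: $(ulimit -Sn)"
```

Raising the hard limit itself needs root or a change to the system's limits configuration.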
If you are interested, you can check the system's argument-length limit (in bytes) like this:
getconf ARG_MAX
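To see how close a directory's glob comes to that limit, you can measure the expanded file names with a shell builtin, which never goes through the operating system's exec call and so cannot trigger the error itself. A rough sketch, assuming bash and that you run it in the directory whose files you were passing with `*`; it undercounts the real total, since the environment and the other arguments share the same budget.

```shell
#!/usr/bin/env bash
# printf is a bash builtin, so * is expanded inside the shell and never
# handed to execve(); this line cannot fail with "Argument list too long".
# Each name is counted plus its NUL terminator, as execve() counts it.
glob_bytes=$(( $(printf '%s\0' * | wc -c) ))
arg_max=$(getconf ARG_MAX)
echo "expanded * = $glob_bytes bytes, ARG_MAX = $arg_max bytes"

# The real budget is smaller than ARG_MAX: environment variables and the
# command's other arguments count against the same limit.
if (( glob_bytes >= arg_max )); then
  echo "an external command given * here is expected to fail with Argument list too long"
fi
```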
To get around the argument-length limit, you can concatenate all the files into one:
for f in *; do cat "$f" >> ../directory/bigfile.log; done
Then point your mrjob job at the big file.
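The loop above avoids the limit because `for` is part of the shell, so `*` is never passed to a single external command. A swapped-in alternative, not the author's method, uses `find` and `xargs`: the names travel through a pipe, and `xargs` packs as many into each `cat` call as fit under the limit, so `cat` runs a few times instead of once per file. `concat_all` is a hypothetical helper name, and the demo uses throwaway data. Note that `find` lists files in directory order rather than the sorted order `*` gives.

```shell
#!/usr/bin/env bash
# concat_all (hypothetical helper): concatenate every regular file directly
# inside SRC into OUT. find writes the names to a pipe instead of a command
# line, and xargs splits them into batches that each fit under the system
# limit. OUT must not be inside SRC, or it would be read while being written.
concat_all() {
  local src=$1 out=$2
  find "$src" -maxdepth 1 -type f -print0 | xargs -0 cat -- > "$out"
}

# Demo on throwaway data: 5000 one-line files in a temporary directory.
work=$(mktemp -d)
mkdir "$work/in"
for ((i = 1; i <= 5000; i++)); do
  echo "record $i" > "$work/in/part-$i.log"
done
concat_all "$work/in" "$work/bigfile.log"
echo "lines written: $(( $(wc -l < "$work/bigfile.log") ))"
```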
If there are a lot of files, the loop above can be slow because it runs one cat at a time; GNU parallel can run several cat processes at once:
ls | parallel -m -j 8 "cat {} >> ../files/bigfile.log"
*Change 8 to the number of jobs you want to run at once.
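One caveat with the GNU parallel command above: several cat processes append to the same file at the same time, so the files land in no particular order, and because cat writes large files in several chunks, a file bigger than cat's buffer can be split with another file's data in between, cutting records in half. A safer sketch, assuming bash: each batch writes its own part file and the few parts are joined at the end. It swaps in `xargs -P` for GNU parallel so it runs where parallel is not installed; `concat_parallel` is a hypothetical helper name, and the output order is still not guaranteed.

```shell
#!/usr/bin/env bash
# concat_parallel (hypothetical helper): concatenate every regular file in SRC
# into OUT, running up to JOBS cat processes at once. Each batch writes its
# own part file, so no two processes ever write to the same file.
concat_parallel() {
  local src=$1 out=$2 jobs=${3:-8}
  local parts
  # Keep the part files next to OUT (assumed to be outside SRC).
  parts=$(mktemp -d "$(dirname "$out")/parts.XXXXXX")
  # -n 1000: at most 1000 names per cat call; -P: run that many calls at once.
  # sh -c receives the parts directory as $0 and the file names as "$@".
  find "$src" -maxdepth 1 -type f -print0 |
    xargs -0 -n 1000 -P "$jobs" sh -c 'cat -- "$@" > "$(mktemp "$0/part.XXXXXX")"' "$parts"
  # Only a handful of part files exist, so this glob stays far below ARG_MAX.
  cat -- "$parts"/part.* > "$out"
  rm -rf -- "$parts"
}

# Demo on throwaway data: 5000 one-line files, 8 concurrent jobs.
work=$(mktemp -d)
mkdir "$work/in"
for ((i = 1; i <= 5000; i++)); do
  echo "record $i" > "$work/in/part-$i.log"
done
concat_parallel "$work/in" "$work/bigfile.log" 8
```

Whether running cat in parallel is actually faster depends mostly on the disk, since the work is I/O-bound.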