Downloading Files From Google Storage Using Spark (python) And Dataproc
I have an application that parallelizes the execution of Python objects that process data to be downloaded from Google Storage (my project bucket). The cluster is created using Google Dataproc.
Solution 1:
The problem was clearly the Spark context. Replacing the call to "gsutil" with a call to "hadoop fs" solves it:
from subprocess import call
from os.path import join

def copyDataFromBucket(filename, remoteFolder, localFolder):
    # "hadoop fs" understands gs:// paths on Dataproc, which ships
    # with the GCS connector, so it works where gsutil does not.
    call(["hadoop", "fs", "-copyToLocal", join(remoteFolder, filename), localFolder])
I also tested sending data to the bucket; one only needs to replace "-copyToLocal" with "-copyFromLocal".
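As a sketch of that reverse direction, the upload can mirror the download function above. The helper `buildUploadCommand` and the function name `copyDataToBucket` are hypothetical, introduced here only so the assembled command can be inspected separately from running it:

```python
from os.path import join
from subprocess import call

def buildUploadCommand(filename, localFolder, remoteFolder):
    # Hypothetical helper: assembles the "hadoop fs" upload command as a
    # list suitable for subprocess.call, without executing anything.
    return ["hadoop", "fs", "-copyFromLocal",
            join(localFolder, filename), remoteFolder]

def copyDataToBucket(filename, localFolder, remoteFolder):
    # Mirrors copyDataFromBucket, with -copyFromLocal instead of
    # -copyToLocal. Returns the hadoop command's exit code (0 on success).
    return call(buildUploadCommand(filename, localFolder, remoteFolder))
```

Like the download, this only works on machines where the `hadoop` binary and the GCS connector are available, e.g. Dataproc nodes.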