
Downloading Files From Google Storage Using Spark (python) And Dataproc

I have an application that parallelizes the execution of Python objects that process data to be downloaded from Google Storage (my project bucket). The cluster is created using Google Dataproc.

Solution 1:

The problem was clearly the Spark context. Replacing the call to "gsutil" with a call to "hadoop fs" solves it:

from subprocess import call
from os.path import join

def copyDataFromBucket(filename, remoteFolder, localFolder):
  # Copy a single file from the bucket to the local filesystem via the
  # Hadoop GCS connector (available on Dataproc workers, unlike gsutil here).
  call(["hadoop", "fs", "-copyToLocal", join(remoteFolder, filename), localFolder])

I also did a test to send data to the bucket. One only needs to replace "-copyToLocal" with "-copyFromLocal".
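For completeness, a minimal sketch of that upload variant; the function name copyDataToBucket is mine, mirroring the helper above:

def copyDataToBucket(filename, localFolder, remoteFolder):
  # Same approach in the other direction: -copyFromLocal pushes a local
  # file up to the bucket through the Hadoop GCS connector.
  call(["hadoop", "fs", "-copyFromLocal", join(localFolder, filename), remoteFolder])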
