Best Option(s) for Running Python ETL Code in AWS
I am looking for a recommendation on which AWS service (or combination of services) to use to execute ETL code written in Python that transforms text-based files. Description of the code/process: …
Solution 1:
We can configure an S3 event trigger on the landing folder so that, whenever a file is uploaded, a brief Lambda function starts a Glue job. The Glue Python script should contain the logic to convert the input text files into CSV files. This way the job runs every time a file lands in S3.
You are billed only for the duration of each job run. Be aware that Glue costs somewhat more than running the same code yourself, because it is a managed service.
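For the Glue job itself, a minimal sketch might look like the following, assuming a Python shell job that uses pandas; the parameter names s3_input_path/s3_output_path, the pipe delimiter, and the reliance on s3fs for direct S3 access are illustrative assumptions, not details given in the question:
import sys
import pandas as pd
from awsglue.utils import getResolvedOptions
# Resolve job parameters; these placeholder names would be supplied by the
# Lambda trigger via start_job_run(Arguments={'--s3_input_path': ...})
args = getResolvedOptions(sys.argv, ['s3_input_path', 's3_output_path'])
# pandas can read from and write to S3 directly when s3fs is available
# (if it is not preinstalled, it can be added as an additional module)
df = pd.read_csv(args['s3_input_path'], sep='|', header=None)
# Write the transformed data back out as CSV
df.to_csv(args['s3_output_path'], index=False)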
Create the S3 event trigger and have it invoke the Lambda function, which in turn starts the Glue job. Here is a code snippet for the AWS Lambda function:
import boto3

glue = boto3.client('glue')

def lambda_handler(event, context):
    gluejobname = "<< THE GLUE JOB NAME >>"
    # The S3 event carries the bucket and key of the uploaded file
    source_bucket = event['Records'][0]['s3']['bucket']['name']
    source_key = event['Records'][0]['s3']['object']['key']
    try:
        # Start the Glue job; the file location could also be passed to
        # the job via the Arguments parameter of start_job_run
        runId = glue.start_job_run(JobName=gluejobname)
        status = glue.get_job_run(JobName=gluejobname, RunId=runId['JobRunId'])
        print("Job Status : ", status['JobRun']['JobRunState'])
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist '
              'and your bucket is in the same region as this '
              'function.'.format(source_key, source_bucket))
        raise e
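If you prefer to create the event trigger from code rather than the console, a boto3 call along these lines would wire the bucket's landing prefix to the Lambda function. The bucket name, prefix, and function ARN below are placeholders, and the Lambda function must separately grant s3.amazonaws.com permission to invoke it (for example via lambda add_permission):
import boto3

s3 = boto3.client('s3')
# Placeholder bucket, prefix, and function ARN; substitute your own values
s3.put_bucket_notification_configuration(
    Bucket='my-etl-bucket',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:start-glue-job',
            'Events': ['s3:ObjectCreated:*'],
            # Only fire for objects uploaded under the landing/ folder
            'Filter': {'Key': {'FilterRules': [
                {'Name': 'prefix', 'Value': 'landing/'}
            ]}}
        }]
    }
)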