This video shows a complete example of a serverless ETL batch pipeline built with Cloud Storage, Dataflow, BigQuery and Cloud Workflows.
Extract : Cloud Storage
Transform : Transform raw to domain data with Dataflow (Flex Template)
Load : Load the resulting domain data from Dataflow into BigQuery
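The Transform step can be sketched as a simple raw-to-domain mapping, the kind of per-element logic a Dataflow (Beam) job applies. This is a minimal sketch only: the field names (`team_name`, `team_score`, `team_slogan`) are hypothetical placeholders, not the real schema from the linked Dataflow project.

```python
# Sketch of the raw -> domain mapping a Dataflow (Beam) job would apply
# per element. Field names are hypothetical; the real schema lives in
# the linked Dataflow project.

def raw_to_domain(raw: dict) -> dict:
    """Transform a raw record read from Cloud Storage into a domain record for BigQuery."""
    return {
        "teamName": raw["team_name"].strip().upper(),
        "teamScore": int(raw["team_score"]),
        "teamSlogan": raw.get("team_slogan", "").strip(),
    }

raw_record = {"team_name": " psg ", "team_score": "30", "team_slogan": "Ici c'est Paris"}
print(raw_to_domain(raw_record))
# → {'teamName': 'PSG', 'teamScore': 30, 'teamSlogan': "Ici c'est Paris"}
```

In a real Beam pipeline this function would sit inside a `Map` or `DoFn` between the Cloud Storage read and the BigQuery write.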
Backup processed files : Workflows invokes Cloud Build to move processed files from the input bucket to the destination bucket. The advantage is having access to the gcloud CLI, because the Cloud Storage API does not natively support moving multiple files between buckets.
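A Workflows step like this could trigger that Cloud Build job via the Cloud Build connector; this is a hedged sketch, and the bucket names are placeholders, not the ones used in the video:

```yaml
# Hypothetical Workflows step: run a Cloud Build job that uses the gcloud CLI
# to move all processed files between buckets (bucket names are placeholders).
- move_processed_files:
    call: googleapis.cloudbuild.v1.projects.builds.create
    args:
      projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
      body:
        steps:
          - name: gcr.io/google.com/cloudsdktool/cloud-sdk:slim
            entrypoint: gcloud
            args: ["storage", "mv", "gs://input-bucket/*", "gs://backup-bucket/"]
```

Running gcloud inside a Cloud Build step is what gives the workflow the bulk-move capability the Storage API lacks.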
The deployment is done with Cloud Build.
First, the use case is executed with gcloud commands and bash scripts :
Deploy the workflow
Run the workflow with runtime arguments
Create a cron job with Cloud Scheduler to launch the workflow with runtime arguments
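The gcloud steps above could look like the following command sketch; it needs a real GCP project to run, and the project, region, workflow and service-account names are placeholders:

```shell
# 1. Deploy the workflow from its YAML definition.
gcloud workflows deploy etl-batch-workflow \
  --source=workflow.yaml \
  --location=europe-west1

# 2. Run the workflow, passing runtime arguments as JSON.
gcloud workflows run etl-batch-workflow \
  --location=europe-west1 \
  --data='{"inputBucket": "input-bucket", "outputBucket": "backup-bucket"}'

# 3. Schedule it: Cloud Scheduler calls the Workflow Executions API on a cron.
gcloud scheduler jobs create http etl-batch-workflow-trigger \
  --schedule="0 6 * * *" \
  --location=europe-west1 \
  --uri="https://workflowexecutions.googleapis.com/v1/projects/my-project/locations/europe-west1/workflows/etl-batch-workflow/executions" \
  --message-body='{"argument": "{\"inputBucket\": \"input-bucket\"}"}' \
  --oauth-service-account-email=scheduler-sa@my-project.iam.gserviceaccount.com
```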
Then the use case is executed with Terraform :
Deploy the workflow
Create a cron job with Cloud Scheduler to launch the workflow with runtime arguments
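The Terraform steps above can be sketched with the `google_workflows_workflow` and `google_cloud_scheduler_job` resources; names, project and region are placeholder assumptions, not the values from the repo:

```hcl
# Hypothetical sketch: deploy the workflow and its Cloud Scheduler trigger.

resource "google_workflows_workflow" "etl_batch" {
  name            = "etl-batch-workflow"
  region          = "europe-west1"
  source_contents = file("workflow.yaml")
}

resource "google_cloud_scheduler_job" "etl_batch_trigger" {
  name     = "etl-batch-workflow-trigger"
  region   = "europe-west1"
  schedule = "0 6 * * *"

  http_target {
    http_method = "POST"
    uri         = "https://workflowexecutions.googleapis.com/v1/${google_workflows_workflow.etl_batch.id}/executions"
    body        = base64encode(jsonencode({ argument = jsonencode({ inputBucket = "input-bucket" }) }))

    oauth_token {
      service_account_email = "scheduler-sa@my-project.iam.gserviceaccount.com"
    }
  }
}
```

Referencing the workflow's `id` in the scheduler's URI also makes Terraform create the resources in the right order.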
#googlecloud #ETL #Pipeline #Serverless #CloudWorkflows #CloudStorage #Dataflow #Beam #Python #FlexTemplate #Docker #BigQuery #CloudBuild #Terraform #CloudScheduler
▸ Github :
https://github.com/tosun-si/teams-lea...
▸ Github Dataflow Project :
https://github.com/tosun-si/dataflow-...
▸ Slides : https://docs.google.com/presentation/...
Feel free to subscribe to the channel and click on the bell 🔔 to receive notifications for the next videos.
📲 Follow me on social networks :
▸ Articles : / mazlum.tosun
▸ X : / mazlumtosun3
▸ LinkedIn : / mazlum-tosun-900b1812
▸ WhatsApp : https://whatsapp.com/channel/0029VaCj...