Posts

Showing posts from October, 2022

AWS Glue to ingest a REST API into a Relational Database

Intro

AWS Glue provides a handy serverless platform for moving data around at scale, usually for machine learning or other kinds of data analytics. Your data doesn't have to be "Big Data" to qualify; it can be anything, really!

Key features of AWS Glue include Spark ETL Jobs, Python Shell Jobs, Glue Catalog, Glue Workflows, and Interactive Data Visualisation.

In this article, I'll be describing an architecture for ingesting an API into a relational store using a combination of Spark ETL and Python Shell jobs, orchestrated by a Glue Workflow.

Batch Ingestion for Analytics

In an ideal world, all data sets would be available immediately and streamed in for real-time consumption. The reality is that we usually have to compromise on periodic retrieval: real-time feeds from external data providers can be costly to subscribe to, and the internal infrastructure needed to stream internal data can be costly to maintain.

Periodic data retrieval can either come in the form of polling for new ...