Posts

Evolution of Data Engineering over the Last 10 Years

Over the past 10 years, the field of data engineering has evolved significantly. Some of the key ways in which it has changed include the following:

Increased focus on big data and real-time data processing: In recent years, there has been a growing emphasis on technologies and techniques that enable organizations to collect, store, and process large volumes of data in real time. This has led to the widespread adoption of technologies such as Hadoop, Spark, and NoSQL databases, which are designed to handle big data efficiently.

Advancements in machine learning and artificial intelligence: The increasing availability of large datasets and powerful computing resources has led to significant advancements in machine learning and artificial intelligence. This has, in turn, increased the demand for data engineers who can design and implement systems that process and analyze data using these technologies.

Increased emphasis on data governance and privacy: As organizations co...

AWS Glue to ingest a REST API into a Relational Database

Intro

AWS Glue provides a handy serverless platform for moving data around at scale, usually for the purpose of machine learning or other kinds of data analytics. Your data doesn't have to be "Big Data" to qualify - it can be anything really! Key features of AWS Glue include Spark ETL Jobs, Python Shell Jobs, Glue Catalog, Glue Workflows, and Interactive Data Visualisation.

In this article, I'll be describing an architecture for ingesting an API into a relational store using a combination of Spark ETL and Python Shell jobs, orchestrated by a Glue Workflow.

Batch Ingestion for Analytics

In an ideal world, all data sets would be available immediately and streamed in for real-time consumption. The reality is that we usually have to compromise on periodic retrieval, as real-time feeds can be costly to subscribe to from external data providers, or costly to maintain internal infrastructure for when the data is internal.

Periodic data retrieval can either come in the form of polling for new ...
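As a rough illustration of the batch-ingestion idea in this excerpt, the sketch below polls a REST endpoint and upserts the result into a relational table. It is not the article's Glue architecture: it uses only the Python standard library, SQLite stands in for the relational store, and the endpoint shape (a JSON array of objects with an `id` field), the `items` table, and the function names are all assumptions made for the example.

```python
from __future__ import annotations

import json
import sqlite3
from urllib.request import urlopen


def fetch_records(url: str) -> list[dict]:
    """Poll a hypothetical REST endpoint that returns a JSON array of objects."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def load_records(conn: sqlite3.Connection, records: list[dict]) -> int:
    """Upsert records keyed on 'id' so that repeated polls do not duplicate rows.

    Returns the total number of rows in the table after the load.
    """
    # Assumed schema: one row per record, with the raw JSON kept as text.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO items (id, payload) VALUES (?, ?)",
        [(r["id"], json.dumps(r, sort_keys=True)) for r in records],
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

A periodic job would call `load_records(conn, fetch_records(url))` on each run. Because the write is an upsert on `id`, re-polling data that has already been ingested overwrites the existing rows instead of appending duplicates.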

How to Deploy an AWS S3 Hosted React SPA through CloudFront

Introduction

The purpose of this short tutorial is to demonstrate how to deploy an AWS S3 hosted React application using CloudFront. To successfully complete this tutorial you will need: a Linux Bash shell, the React framework, the Serverless framework, a text editor, and an AWS Free Tier account.

Step 1: Creating a React Application

For the purposes of this tutorial, a simple React application will be built using the automatic application generation tools. The following command can be run to create a React application in the current directory:

jafrimo@LP00650:~/tutorials$ npx create-react-app aws-cloudfront-react-app

To validate this has completed successfully, you can change into the newly created application directory aws-cloudfront-react-app and start up the React application locally (we're not in AWS yet!) as follows:

jafrimo@LP00650:~/tutorials/aws-cloudfront-react-app$ npm start

If all goes well, then dependi...