AWS Data Engineer – Associate Certification Breakdown and Thoughts

by Justin Cook

Hi All,

As we know, the new AWS Certified Data Engineer – Associate certification came out this month. Since it was so new, this associate-level certification didn't have a lot of documentation out there, so here is my breakdown of what I studied before the exam and what I would focus on after taking it.

Before I took it, I focused on this:

• Developing data ingestion and transformation techniques, and orchestrating data pipelines using programming concepts.

• Identifying the most effective data store, creating data models, organizing schemas, and managing data lifecycles.

• Maintaining, operating, and monitoring data pipelines; evaluating data quality and analyzing data.

• Implementing proper authentication, authorization, data encryption, and governance, and enabling logging for security purposes.

• Competence in configuring and maintaining extract, transform, and load (ETL) pipelines, covering the entire data journey from ingestion to the destination (see the sketch after this list).

• Application of high-level, language-agnostic programming concepts tailored to the requirements of the data pipeline.

• Proficiency in using Git commands for source code control, ensuring versioning and collaboration within the development process.

• Using data lakes to store data.

• A general understanding of networking, storage, and compute concepts, providing a foundation for designing and implementing robust data engineering solutions.
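To make that ETL bullet concrete, here is a minimal sketch of an ingest-transform-load flow in Python with boto3. The bucket names, keys, and the order-status filter are made-up placeholders; the point is the shape of the pipeline, not any particular service's API beyond plain S3 reads and writes.

```python
# Minimal ETL sketch: extract a raw CSV from S3, filter/normalize it, load it to a curated prefix.
# Bucket names, keys, and the "status" column are hypothetical placeholders.
import csv
import io

import boto3

s3 = boto3.client("s3")


def extract(bucket: str, key: str) -> list:
    """Pull a raw CSV object out of S3 and parse it into dict rows."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(body)))


def transform(rows: list) -> list:
    """Example transformation: keep only completed orders and normalize casing."""
    return [
        {**row, "status": row["status"].lower()}
        for row in rows
        if row.get("status", "").lower() == "completed"
    ]


def load(rows: list, bucket: str, key: str) -> None:
    """Write the cleaned rows back to a curated S3 prefix."""
    if not rows:
        return  # nothing to write
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    s3.put_object(Bucket=bucket, Key=key, Body=out.getvalue().encode("utf-8"))


if __name__ == "__main__":
    raw = extract("my-raw-bucket", "orders/2024/01/orders.csv")
    curated = transform(raw)
    load(curated, "my-curated-bucket", "orders/2024/01/orders_clean.csv")
```

The exam stays language-agnostic, so the goal is recognizing the ingest, transform, and load stages rather than memorizing any one SDK.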

But after taking the exam, I have to say these are the topics that I would focus on instead:

Start with Analytics:

• Amazon Athena
• Amazon EMR
• AWS Glue
• AWS Glue DataBrew
• AWS Lake Formation
• Amazon Kinesis Data Analytics
• Amazon Kinesis Data Firehose
• Amazon Kinesis Data Streams
• Amazon Managed Streaming for Apache Kafka (Amazon MSK)
• Amazon OpenSearch Service
• Amazon QuickSight
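A cheap way to get hands-on with the analytics side is to run a query through Athena's API. Here is a minimal sketch with boto3; the database, table, and results bucket are assumptions, so swap in your own.

```python
# Minimal sketch: run an Athena query with boto3 and poll for the result.
# The database name, table, and results bucket are placeholders.
import time

import boto3

athena = boto3.client("athena")

query_id = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) AS n FROM orders GROUP BY status",
    QueryExecutionContext={"Database": "sales_lake"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)["QueryExecutionId"]

# Athena is asynchronous, so poll until the query reaches a terminal state.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    # First row returned is the header row.
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```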

Then look into databases:

• Amazon DocumentDB (with MongoDB compatibility)
• Amazon DynamoDB
• Amazon Keyspaces (for Apache Cassandra)
• Amazon MemoryDB for Redis
• Amazon Neptune
• Amazon RDS
• Amazon Redshift
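For a quick feel of the database side, here is a minimal DynamoDB sketch with boto3. The table name and key schema are assumptions made up for illustration.

```python
# Minimal sketch of single-item writes and reads against DynamoDB.
# Assumes an existing table named "orders" with a partition key "order_id".
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("orders")

# Write one item.
table.put_item(Item={"order_id": "o-1001", "status": "completed", "total": 42})

# Read it back by its key.
response = table.get_item(Key={"order_id": "o-1001"})
print(response.get("Item"))
```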

You also need a rough understanding of the Developer Tools:

• AWS CLI
• AWS Cloud9
• AWS Cloud Development Kit (AWS CDK)
• AWS CodeBuild
• AWS CodeCommit
• AWS CodeDeploy
• AWS CodePipeline
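Of these, the CDK is the easiest one to try locally. Here is a minimal CDK v2 sketch in Python that defines a versioned, encrypted S3 bucket; it assumes aws-cdk-lib and constructs are installed and the environment is bootstrapped, and the stack and bucket names are placeholders.

```python
# Minimal AWS CDK (v2, Python) sketch: an S3 bucket for a data lake landing zone.
# Assumes aws-cdk-lib and constructs are installed; names are hypothetical.
import aws_cdk as cdk
from aws_cdk import aws_s3 as s3
from constructs import Construct


class LandingZoneStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Versioned, SSE-S3-encrypted bucket for raw ingested data.
        s3.Bucket(
            self,
            "RawDataBucket",
            versioned=True,
            encryption=s3.BucketEncryption.S3_MANAGED,
        )


app = cdk.App()
LandingZoneStack(app, "LandingZoneStack")
app.synth()
```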

A fundamental understanding of monitoring and governance tools is good as well:

• AWS CloudFormation
• AWS CloudTrail
• Amazon CloudWatch
• Amazon CloudWatch Logs
• AWS Config
• Amazon Managed Grafana
• AWS Systems Manager
• AWS Well-Architected Tool
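For the monitoring side, it helps to have pushed a custom metric and searched a log group at least once. A minimal boto3 sketch; the namespace, metric name, and log group name are made up.

```python
# Minimal sketch: publish a custom pipeline metric to CloudWatch and scan recent log events.
# The namespace, metric name, and log group name are hypothetical.
import boto3

cloudwatch = boto3.client("cloudwatch")
logs = boto3.client("logs")

# Emit a custom metric, e.g. rows processed by a pipeline run.
cloudwatch.put_metric_data(
    Namespace="DataPipelines",
    MetricData=[{"MetricName": "RowsProcessed", "Value": 12345, "Unit": "Count"}],
)

# Look for errors in a pipeline's log group.
events = logs.filter_log_events(
    logGroupName="/aws/lambda/orders-transform",
    filterPattern="ERROR",
    limit=10,
)
for event in events["events"]:
    print(event["message"])
```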

There were quite a few questions about migration services, so get to know those:

• AWS Application Discovery Service
• AWS Application Migration Service
• AWS Database Migration Service (AWS DMS)
• AWS DataSync
• AWS Schema Conversion Tool (AWS SCT)
• AWS Snow Family
• AWS Transfer Family
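If you want to poke at DMS without building a full migration, listing existing replication tasks and their status is a harmless start. A minimal boto3 sketch, assuming tasks already exist in the account and region:

```python
# Minimal sketch: check the status of AWS DMS replication tasks with boto3.
# Assumes one or more replication tasks already exist in the account/region.
import boto3

dms = boto3.client("dms")

tasks = dms.describe_replication_tasks()["ReplicationTasks"]
for task in tasks:
    print(task["ReplicationTaskIdentifier"], task["Status"], task["MigrationType"])
```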

Oddly, I had a few questions about backup solutions as well, so get to know these:

• AWS Backup
• Amazon Elastic Block Store (Amazon EBS)
• Amazon Elastic File System (Amazon EFS)
• Amazon S3
• Amazon S3 Glacier
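S3 lifecycle rules are the easiest piece of this group to demo: the sketch below moves objects under a prefix to S3 Glacier after 90 days and expires them after a year. The bucket, prefix, and day counts are placeholders.

```python
# Minimal sketch: add an S3 lifecycle rule that archives old objects to Glacier and later expires them.
# Bucket name, prefix, and day counts are hypothetical.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-curated-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-orders",
                "Status": "Enabled",
                "Filter": {"Prefix": "orders/"},
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```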

Here is the outline: https://aws.amazon.com/certification/certified-data-engineer-associate/

Overall, the exam went fast; if you know Glue and data pipelines well, you will need much less than the allotted 130 minutes for the 65 questions.

Thanks & Contact Us with any questions!