How to Pass AWS Certified Big Data Specialty
I passed the AWS Certified Big Data Specialty exam on July 29, 2019, after five months of preparation! This exam is more difficult than the AWS Certified Solutions Architect Associate (CSAA) exam I had already passed, and many people consider it even harder than the AWS Certified Solutions Architect Professional. I want to share my study strategy and notes to help you pass the exam.
Topics
Download the exam guide to identify the study topics and plan your timeline. Based on the exam guide, you should learn the following topics before taking any mock exam:
- Collection: Kinesis (Kinesis Data Streams, Kinesis Data Firehose), IoT (IoT Core, Greengrass), SQS, Snowball, DMS/SCT, Direct Connect
- Storage: S3, Glacier, DynamoDB/DAX, ElastiCache
- Processing: EMR (Spark, Hive, Pig, Presto, Zeppelin, HUE, etc.), Lambda, Glue (Data Catalog, Crawlers, ETL), Data Pipeline, AI Services, Machine Learning, SageMaker
- Analysis: Kinesis Data Analytics, Redshift, CloudSearch, Elasticsearch, Athena
- Visualization: QuickSight, commercial BI tools (MicroStrategy, Tableau), JavaScript libraries (D3.js, Chart.js, Highcharts)
- Data Security: security features of the services above, KMS, CloudHSM, STS, identity federation, Cognito
- Foundation Services: VPC, CloudWatch, CloudTrail, EC2 instance types, EBS, RDS (distinguish the use cases among Aurora, DynamoDB, Redshift, and EMR)
In the next post, I’ll share a last-minute cheat sheet that you can use for a quick review before the exam.
Learning materials
You can find plenty of learning materials online. I narrowed them down to the following, which really helped me on the exam:
- Online Training Course: A Cloud Guru, Linux Academy, and Udemy all offer very good training courses. Use the 7-day free trial to pick the one you feel most comfortable with. Once you pick a course, make sure to go through it at least twice.
- Whitepapers: Big Data Analytics Options on AWS, Machine Learning Foundations, and Streaming Data Solutions on AWS with Amazon Kinesis. I already had solid knowledge of Redshift and DynamoDB, so I didn’t read any whitepapers on those two topics.
- YouTube: AWS Big Data Analytics Architectural Patterns and Best Practices, Deep Dive and Best Practices for Amazon Redshift, and High Performance Data Streaming with Amazon Kinesis: Best Practices. Also search YouTube for deep-dive or best-practices sessions on any exam topic (if you take the exam next year, search for the same topics from AWS re:Invent 2019).
- Exam Readiness: AWS Certified Big Data – Specialty (Digital), a free 2.5-hour online training from AWS. I reviewed it three times! Its sample questions are very similar to the real ones.
- Labs: You should have hands-on experience before you take the exam. You can set up an AWS Free Tier account and practice with the Getting Started guide in each service’s AWS documentation or with the labs from your online training course.
- Mock exams: the sample questions you can download from the exam page, the sample questions in the AWS Exam Readiness training, the quizzes and mock exams from your online course, and Whizlabs AWS Certified Big Data (none of its questions will appear on the exam, but it is good practice for checking your knowledge).
- Optional:
  - Whitepaper: Building Big Data Storage Solutions (Data Lakes) for Maximum Flexibility
  - FAQs: If you have time, go through the AWS FAQs for any topics where you need extra material. I went through the FAQs for EMR, Data Pipeline, Glue, and Kinesis Data Streams.
  - AWS Big Data Blog, which has lots of good use cases
Timeline
If you have a full-time job like me, you may have only ten to fifteen hours per week to prepare for the exam. Here is my timeline:
- Learning phase (ten weeks): I had a solid background in enterprise application architecture and already held the AWS CSAA certification, so my learning path focused on Kinesis, EMR, IoT, and security. I read the whitepapers on those topics and covered the other topics through the online course. I followed the Getting Started guides in the AWS documentation for those services and reviewed the remaining details in the documentation. To dive deeper into big data, I also studied the Hadoop ecosystem thoroughly.
- Practicing phase (six weeks): In this phase, I went through the online course a second time and followed its hands-on labs to build a real system. I watched all the deep-dive and best-practices videos on YouTube, and I completed the mock exams. You may want to base your final review on your mock exam results.
- Final review phase (four weeks): I reviewed my study notes, took the mock exams one more time, went through the AWS FAQs, and made a last-minute cheat sheet.
- Contingency (two weeks): I reviewed the last-minute cheat sheet and the mock questions I had gotten wrong. I scheduled the exam one week in advance.
Notes List
- How to Pass AWS Certified Big Data Specialty
- AWS Big Data Study Notes – AWS Kinesis
- AWS Big Data Study Notes – EMR and Redshift
- AWS Big Data Study Notes – AWS Machine Learning and IoT
- AWS Big Data Study Notes – AWS QuickSight, Athena, Glue, and ES
- AWS Big Data Study Notes – AWS DynamoDB, S3 and SQS
- AWS Kinesis Data Streams vs. Kinesis Data Firehose
- Streaming Platforms: Apache Kafka vs. AWS Kinesis
- When Should You Use Amazon DynamoDB Accelerator (AWS DAX)?
- Data Storage for Big Data: Aurora, Redshift or Hadoop?
- Apache Hadoop Ecosystem Cheat Sheet
Conclusion
Overall, the questions on my exam were more difficult than the mock exams I took. They were very similar to the AWS Exam Readiness questions, built around real use cases. Most questions have long descriptions, so pay attention to the keywords. The exam is almost three hours long, so you should have enough time to understand and answer each question and to review any questions you flag for later. Some questions are really tricky, so make sure you understand the differences between similar options (e.g., DynamoDB vs. HBase, Redshift vs. EMR, Aurora vs. DynamoDB, Hive vs. Glue) and can choose the best solution for a real environment based on criteria such as cost, performance, latency, and throughput. I hope my experience helps you study more effectively. Good luck on your exam!