AWS Big Data Study Notes – AWS DynamoDB, S3 and SQS
This is the cheat sheet on AWS DynamoDB, S3, and SQS.
AWS DynamoDB
Features
- Fully managed NoSQL database service:
- On-demand capacity mode
- Built-in support for ACID transactions
- On-demand backups and point-in-time recovery
- Encryption at rest
- Supports both key-value and document data models
- Basic concepts
- Tables, items, attributes
- Primary key: partition key (hash attribute) or partition key with sort key (range attribute)
- Read consistency: eventually consistent reads (default) and strongly consistent reads
- Data types: scalar types (number, string, binary, Boolean, and null), document types (list and map), and set types (string set, number set, and binary set)
- Secondary indexes: a global secondary index (GSI) can use a partition key and optional sort key that differ from the table's, and can be added at any time; a local secondary index (LSI) keeps the table's partition key but uses a different sort key, and must be defined when the table is created
- Read/write capacity modes for each table: on-demand and provisioned (auto-scaling)
- Throughput capacity: 1 RCU (read capacity unit) = 1 strongly consistent read or 2 eventually consistent reads per second for an item up to 4 KB; 1 WCU (write capacity unit) = 1 write per second for an item up to 1 KB. Requests that exceed the provisioned RCU or WCU of a table, or of one or more of its global secondary indexes, are throttled with ProvisionedThroughputExceededException
- Manage throughput capacity: burst capacity (handle usage spikes) and adaptive capacity (continue reading/writing to hot partitions)
- Total number of partitions = ceiling(max(capacity, size)), where capacity = total RCU/3,000 + total WCU/1,000 and size = total size/10 GB
- Global table: replicate data automatically across your choice of the AWS regions and automatically scale capacity to accommodate your workloads
- Time to Live (TTL): a background job deletes items once the timestamp stored in a designated attribute (an epoch time value) has passed
- DynamoDB Streams: provides a time-ordered sequence of item-level changes in any DynamoDB table. The changes are de-duplicated and stored for 24 hours, and can trigger an AWS Lambda function
- DynamoDB Accelerator (DAX): in-memory cache reduces the response times of eventually-consistent read workloads by an order of magnitude, from single-digit milliseconds to microseconds
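The capacity arithmetic in the list above can be worked through in a few lines. This is a minimal sketch rather than an AWS API: the function names are made up for illustration, and the per-partition limits (3,000 RCU, 1,000 WCU, 10 GB) are the commonly cited rule-of-thumb values, passed as parameters because AWS does not guarantee the exact partitioning behavior.

```python
import math


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """RCUs for a read workload: each read costs one unit per 4 KB of item
    size (rounded up); eventually consistent reads cost half as much."""
    units_per_read = math.ceil(item_size_bytes / 4096)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        total = total / 2
    return math.ceil(total)


def write_capacity_units(item_size_bytes, writes_per_second):
    """WCUs for a write workload: each write costs one unit per 1 KB of item
    size (rounded up)."""
    return math.ceil(item_size_bytes / 1024) * writes_per_second


def estimate_partitions(rcu, wcu, table_size_gb,
                        rcu_per_partition=3000,   # assumed per-partition limit
                        wcu_per_partition=1000,   # assumed per-partition limit
                        gb_per_partition=10):     # assumed per-partition limit
    """Rule-of-thumb partition count: ceiling(max(capacity, size))."""
    by_capacity = rcu / rcu_per_partition + wcu / wcu_per_partition
    by_size = table_size_gb / gb_per_partition
    return max(1, math.ceil(max(by_capacity, by_size)))


# A 3.5 KB item read 100 times per second and written 10 times per second.
strong_rcu = read_capacity_units(3584, 100)                               # 100
eventual_rcu = read_capacity_units(3584, 100, strongly_consistent=False)  # 50
wcu = write_capacity_units(3584, 10)                                      # 40
partitions = estimate_partitions(rcu=6000, wcu=2000, table_size_gb=25)    # 4
```

In the last line the throughput term (6000/3000 + 2000/1000 = 4) dominates the size term (25/10 = 2.5), so the estimate is 4 partitions.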
Security
- DynamoDB encryption at rest encrypts your data using 256-bit Advanced Encryption Standard (AES-256), which helps secure your data from unauthorized access to the underlying storage.
- Encryption at rest integrates with AWS Key Management Service (AWS KMS) for managing the encryption key that is used to encrypt your tables. When creating a new table, you can choose one of the following customer master keys (CMK) to encrypt your table:
- AWS owned CMK – Default encryption type. The key is owned by DynamoDB (no additional charge).
- AWS managed CMK – The key is stored in your account and is managed by AWS KMS (AWS KMS charges apply). The AWS managed CMK provides these additional features:
- You can view the CMK and its key policy. (You cannot change the key policy.)
- You can audit the encryption and decryption of your DynamoDB table by examining the DynamoDB API calls to AWS KMS using AWS CloudTrail.
- VPC access is provided through a gateway endpoint
- DynamoDB Streams data is encrypted at rest along with the table
- Access to tables/API/DAX using IAM
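The choice between the two key types above shows up as a single parameter when a table is created. The sketch below only builds the request dictionary; it assumes the boto3 `create_table` parameter names (`SSESpecification`, `SSEType`), and the helper name, the `pk` attribute, and the table name are hypothetical.

```python
def table_request(table_name, key_type="aws_owned"):
    """Build keyword arguments for a DynamoDB create_table call.

    key_type "aws_owned" relies on the default encryption (no SSESpecification);
    key_type "aws_managed" asks for the AWS managed KMS key.
    """
    request = {
        "TableName": table_name,
        # Hypothetical single-attribute primary key, for illustration only.
        "AttributeDefinitions": [{"AttributeName": "pk", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "pk", "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST",  # on-demand capacity mode
    }
    if key_type == "aws_managed":
        request["SSESpecification"] = {"Enabled": True, "SSEType": "KMS"}
    elif key_type != "aws_owned":
        raise ValueError(f"unknown key type: {key_type}")
    return request


default_request = table_request("notes")
managed_request = table_request("notes", key_type="aws_managed")
# With credentials configured, the request could be sent with:
#   boto3.client("dynamodb").create_table(**managed_request)
```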
AWS S3
Features
- Storage classes:
- S3 Standard – general-purpose storage for frequently accessed data. Designed for 99.999999999% durability and 99.99% availability across multiple AZs, and to sustain the concurrent loss of data in 2 facilities
- S3 Standard-Infrequent Access (IA) – long-lived but less frequently accessed data. Costs less than S3 Standard, with the same durability across multiple AZs but 99.9% availability
- S3 One Zone-Infrequent Access – supports the same data access as Standard-IA but stores data in a single AZ. Costs less than S3 Standard-IA, with 99.999999999% durability and 99.5% availability
- S3 Intelligent Tiering – optimize storage costs by automatically moving data to the most cost-effective storage access tier
- Glacier – long-term archive. Archives are stored in vaults. A vault can have an access policy (restricts user/account permissions) and a lock policy (immutable; cannot be changed once locked)
- Access Control Lists (ACL): grant read and/or write access to the objects in a bucket and to the ACL permissions themselves
- Versioning: once enabled on a bucket, versioning can only be suspended; it cannot be removed
- Lifecycle management: transition through the tiers e.g. S3 standard to IA to Glacier
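- Consistency model: S3 originally offered read-after-write consistency for PUTs of new objects and eventual consistency for DELETEs and overwrite PUTs; since December 2020 it provides strong read-after-write consistency for all PUTs and DELETEs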
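The lifecycle transition mentioned above (S3 Standard to IA to Glacier) is expressed as a rule with a list of transitions. The following sketch builds such a configuration as a plain dictionary, assuming the structure accepted by boto3's `put_bucket_lifecycle_configuration`; the helper name, rule ID, and day counts are illustrative choices, not values from these notes.

```python
def lifecycle_configuration(prefix="", ia_after_days=30, glacier_after_days=90):
    """Build a lifecycle configuration that moves objects from S3 Standard
    to Standard-IA and later to Glacier."""
    if ia_after_days < 30:
        # S3 rejects transitions to Standard-IA for objects younger than 30 days.
        raise ValueError("transition to STANDARD_IA needs at least 30 days")
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the IA transition")
    return {
        "Rules": [
            {
                "ID": "standard-to-ia-to-glacier",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},       # empty prefix = whole bucket
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = lifecycle_configuration(prefix="logs/")
# With credentials configured, the configuration could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=config)
```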
Security
- Encryption: SSE-S3, SSE-KMS, SSE-C, and client-side encryption
- Access control: IAM user policies, bucket policies, ACLs, and Block Public Access
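The three server-side encryption options differ mainly in which parameters accompany an upload. As a rough sketch, the helper below maps each option to the keyword arguments that boto3's `put_object` accepts (`ServerSideEncryption`, `SSEKMSKeyId`, `SSECustomerAlgorithm`, `SSECustomerKey`); the helper name and the sample key values are hypothetical. Client-side encryption is not shown because the data is encrypted before it reaches S3.

```python
def encryption_args(mode, kms_key_id=None, customer_key=None):
    """Return the extra put_object keyword arguments for an encryption mode."""
    if mode == "SSE-S3":
        # Keys are managed entirely by S3.
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        args = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id is not None:
            args["SSEKMSKeyId"] = kms_key_id  # omit to use the default KMS key
        return args
    if mode == "SSE-C":
        if customer_key is None:
            raise ValueError("SSE-C requires a customer-provided key")
        # The caller supplies the key on every request; S3 does not store it.
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown encryption mode: {mode}")


sse_s3 = encryption_args("SSE-S3")
sse_kms = encryption_args("SSE-KMS", kms_key_id="alias/example-key")
sse_c = encryption_args("SSE-C", customer_key=b"0" * 32)
# With credentials configured, an upload could look like:
#   boto3.client("s3").put_object(Bucket="my-bucket", Key="k", Body=b"data", **sse_kms)
```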
AWS SQS
- Fully managed message queues. Supports both standard and FIFO queues
- Message size limit is 256KB
- Security
- IAM: IAM users/roles are granted permission to use SQS; a queue access policy can add conditions such as the source IP or the time at which requests arrive
- SSE-KMS encrypts the body but not metadata
- HTTPS
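The IP and time controls mentioned in the SQS security notes are written as conditions in the queue's access policy. The sketch below assembles one such policy as a dictionary, assuming the standard IAM condition keys `aws:SourceIp` and `aws:CurrentTime`; the function name, account ID, queue ARN, CIDR range, and cutoff time are placeholder values.

```python
import json


def queue_policy(queue_arn, account_id, allowed_cidr, not_after_utc):
    """Allow one account to send messages, but only from a given IP range
    and only before a cutoff time (ISO 8601, UTC)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "SendFromKnownRangeBeforeCutoff",  # hypothetical name
                "Effect": "Allow",
                "Principal": {"AWS": account_id},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                "Condition": {
                    "IpAddress": {"aws:SourceIp": allowed_cidr},
                    "DateLessThan": {"aws:CurrentTime": not_after_utc},
                },
            }
        ],
    }


policy = queue_policy(
    queue_arn="arn:aws:sqs:us-east-1:111122223333:example-queue",
    account_id="111122223333",
    allowed_cidr="203.0.113.0/24",
    not_after_utc="2030-01-01T00:00:00Z",
)
policy_json = json.dumps(policy)
# With credentials configured, the policy could be attached with:
#   boto3.client("sqs").set_queue_attributes(
#       QueueUrl=queue_url, Attributes={"Policy": policy_json})
```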