Why Use AWS Redshift Spectrum with Data Lake
A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. AWS uses S3 to store data in any format, securely, and at a...
I mentioned in my post “Which is Right Hadoop Solution for You?” that if your Big Data is below 1.6PB, you may want to take a look at the Redshift data warehouse option. When we...
As I mentioned in the Hadoop ecosystem cheat sheet, the Hadoop ecosystem is open-source with plenty of add-on packages; additionally, you can build your own Hadoop system with these free resources. However, it will be...
Apache Hadoop 3.1.1 was released on the eighth of August with major changes to YARN, such as GPU and FPGA scheduling/isolation, Docker containers on YARN, and more expressive placement constraints. Apache...
Nonrelational databases address the challenges of relational databases, such as the rigidly defined data structure, the predefined schema, and vertical scaling. A NoSQL database provides the flexibility to extend the data structure...
I’m so happy to announce that my new course Hands-on AWS DynamoDB is published! The course is designed for anybody who wants to learn DynamoDB, the AWS NoSQL database solution. There are twelve fun hands-on projects in this course. Such...
MongoDB and DynamoDB are two popular NoSQL databases. Both have excellent features to support business needs. MongoDB vs. DynamoDB: How do you choose between them? The right choice really depends on your...
After I built the data warehouse on AWS Redshift and analyzed the visualization on AWS QuickSight, I wondered if there was anything else I could do to analyze the data or predict patterns on the...
A serverless web application is a good example of how to use AWS S3, AWS API Gateway, AWS Lambda, and AWS DynamoDB together. In my online training course Hands-on AWS DynamoDB, I demonstrated a simple product utility...
The AWS Redshift data warehouse solution is based on PostgreSQL but goes beyond it. It is specifically designed for online analytical processing (OLAP) and business intelligence (BI) applications. Redshift is a fast, fully managed petabyte-scale...