Cloud Big Data Analytics Solution on AWS or GCP
Last time we talked about big data, we focused on big data storage (Data Storage for Big Data: Aurora, Redshift or Hadoop?) and on which Hadoop solution you should pick (Which is Right Hadoop Solution for You?). In this article, let’s look at the cloud big data analytics solutions offered by the cloud service providers AWS and GCP.
Cloud Big Data Analytics Solution
A cloud big data analytics solution offers numerous benefits in an agile culture. Its flexibility keeps you up to date with the latest cloud computing technology. Its simpler infrastructure setup lets you focus on business logic and software development. A well-architected framework on the cloud helps you drive innovation. Both AWS and GCP provide well-architected big data analytics solutions, and the big data architecture concepts on the two providers are very similar. So the question is which cloud service provider you should pick.
Architectural principle
Both providers offer fully managed services that remove operational overhead, so you can focus on your big data analytics solution rather than on infrastructure setup. Both handle performance, scalability, availability, security, and compliance needs automatically. A big data analytics solution requires a decoupled system that separates data capture, data storage, data processing, data analysis, and data consumption. With either provider, you can decouple compute from storage, build a data lake, and keep raw data separate from processed data. Both are also built on machine learning and AI-ready foundations.
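To make the decoupling idea concrete, here is a minimal, provider-neutral sketch in Python. It is an illustration only, not how either AWS or GCP implements its services. The `DataLake` class, the stage functions, and the sample order events are all hypothetical names made up for this example. An in-memory dictionary stands in for object storage, so the only point shown is that each stage talks to storage rather than to the other stages, and that raw data is never overwritten by processing.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataLake:
    """Hypothetical stand-in for object storage with separate raw and processed zones."""
    raw: Dict[str, List[dict]] = field(default_factory=dict)
    processed: Dict[str, List[dict]] = field(default_factory=dict)


def capture(lake: DataLake, key: str, events: List[dict]) -> None:
    """Capture stage: land events unchanged in the raw zone."""
    lake.raw[key] = list(events)


def process(lake: DataLake, key: str, transform: Callable[[dict], dict]) -> None:
    """Processing stage: read from the raw zone, write results to the processed zone."""
    lake.processed[key] = [transform(event) for event in lake.raw[key]]


def analyze(lake: DataLake, key: str) -> float:
    """Analysis stage: read only processed data and return the total amount."""
    return sum(row["amount_usd"] for row in lake.processed[key])


# Each stage depends only on the lake, so any stage can be replaced or scaled alone.
lake = DataLake()
capture(lake, "orders", [{"amount_cents": 1250}, {"amount_cents": 750}])
process(lake, "orders", lambda e: {"amount_usd": e["amount_cents"] / 100})
total = analyze(lake, "orders")
print(total)  # 20.0
```

Because the raw zone is left untouched, the processing step can be rerun later with a different transform without capturing the data again, which is the practical benefit of keeping raw and processed data apart.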
AWS Big Data Analytics solution, from AWS re:Invent:
GCP Big Data Analytics solution from the GCP web site:
Products
Here is a map of AWS and GCP Big Data Analytics products:
Conclusion
I didn’t see any significant difference between the AWS and GCP big data analytics solutions in performance, security, availability, or orchestration. Both integrate very well with other cloud-native services (e.g. AI/ML, object storage, security, compute, and more). From the user’s perspective, both are easy to use and seem to take the same amount of effort to configure. Pricing on GCP is lower than on AWS. I do like AWS Glue’s data catalog feature. AWS EMR supports more popular frameworks (e.g. Spark, Presto, Flink, Hue, Phoenix, and more) than GCP Dataproc (only Spark, Hadoop, Pig, and Hive at the time of writing). The choice of cloud service provider therefore depends on what you need and on what the provider offers, not only in its big data analytics solution but across its whole cloud ecosystem. Since AWS entered the market earlier than GCP, you will find more open source and free tools for AWS than for GCP. AWS also has better documentation, such as API references, user guides, and architecture whitepapers. If you only need Spark for big data processing, choose GCP for the lower cost. In conclusion, no matter which cloud service you opt for, you’ll be satisfied with the results.
References
AWS Big Data Analytics Architectural Patterns and Best Practices
GCP Big Data Solution