Streaming Data from Kinesis Firehose to Redshift
In this article, we simulate temperature sensor data from devices and send it to an Amazon Kinesis Data Firehose delivery stream, which publishes the data to Amazon Redshift. Once the data is stored in Redshift, you can visualize it and run predictions with any business intelligence tool.
Data Flow
Here is an overview of the data flow from Kinesis Firehose to Redshift:
- Kinesis Data Generator (KDG) generates the device temperature sensor data and sends it to a Kinesis Firehose delivery stream
- The Kinesis Data Firehose delivery stream publishes the data to Redshift, using S3 as an intermediary step
- A COPY command copies the records from S3 into the Redshift table
Kinesis Firehose
Amazon Kinesis Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Elasticsearch Service (Amazon ES), and Splunk. With Kinesis Data Firehose, you don’t need to write applications or manage resources.
You configure your data producers (in our example, KDG) to send data to Kinesis Data Firehose. Here is the record template used in KDG:
{ "device_id": {{random.number(50)}}, "temperature": {{random.number( { "min":10, "max":150 } )}}, "timestamp": "{{date.now("DD/MMM/YYYY:HH:mm:ss Z")}}" }
The sample data will look like this:
{ "device_id": 39, "temperature": 141, "timestamp": "29/Oct/2018:14:19:53 -04:00"}
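For readers who prefer a script to the KDG web tool, the template above can be approximated in Python. This is a minimal sketch, not the setup used in this article: the function names `make_record` and `send_record` and the stream name passed to `send_record` are hypothetical, the inclusive 0 to 50 range for `device_id` is an assumption about how KDG interprets `random.number(50)`, and the trailing newline added to each payload is a common convention rather than something the article specifies. The `boto3` import is deferred so that the record generator runs without AWS credentials.

```python
import json
import random
from datetime import datetime, timedelta, timezone


def make_record(now=None):
    """Build one record shaped like the KDG template (now must be timezone-aware)."""
    now = now or datetime.now(timezone.utc).astimezone()
    offset = now.strftime("%z")  # e.g. "-0400"
    # The template's "Z" token renders the offset with a colon, e.g. "-04:00".
    stamp = now.strftime("%d/%b/%Y:%H:%M:%S ") + offset[:3] + ":" + offset[3:]
    return {
        "device_id": random.randint(0, 50),      # assumed inclusive range
        "temperature": random.randint(10, 150),  # min 10, max 150 as in the template
        "timestamp": stamp,
    }


def send_record(stream_name, record):
    """Send one record to a Firehose delivery stream (stream_name is hypothetical)."""
    import boto3  # imported here so make_record works without boto3 installed

    payload = (json.dumps(record) + "\n").encode("utf-8")
    client = boto3.client("firehose")
    return client.put_record(DeliveryStreamName=stream_name, Record={"Data": payload})


# Fixed moment matching the sample record above, for a reproducible timestamp.
sample = make_record(
    datetime(2018, 10, 29, 14, 19, 53, tzinfo=timezone(timedelta(hours=-4)))
)
print(json.dumps(sample))
```

The month abbreviation produced by `%b` depends on the process locale, so the output matches the KDG format only under an English or default C locale.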
Kinesis Data Firehose then automatically delivers the data to the destination you specified, in this case Redshift. You can also configure Kinesis Data Firehose to transform your data before delivering it. In our example, we defined a jsonpaths file that maps the JSON fields to the table columns, and referenced it in the Redshift COPY options:
- jsonpaths expression:
{ "jsonpaths": [ "$.device_id", "$.temperature", "$.timestamp" ] }
- COPY option:
json 's3:///jsonpaths.json' region '';
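The jsonpaths file lists, in table column order, which field of each JSON object feeds each column. The sketch below imitates that mapping in Python to make the idea concrete. It is only an illustration: it handles plain `$.field` paths and nothing else, the helper name `row_from_record` is hypothetical, and it is not how Redshift implements COPY.

```python
import json

# Same content as the jsonpaths file shown above.
JSONPATHS = {"jsonpaths": ["$.device_id", "$.temperature", "$.timestamp"]}


def row_from_record(record, jsonpaths):
    """Pick fields out of a record in the order given by the jsonpaths list."""
    row = []
    for path in jsonpaths["jsonpaths"]:
        if not path.startswith("$."):
            raise ValueError("only simple '$.field' paths are handled: " + path)
        # A field missing from the record becomes None in this sketch.
        row.append(record.get(path[2:]))
    return tuple(row)


record = json.loads(
    '{ "device_id": 39, "temperature": 141, "timestamp": "29/Oct/2018:14:19:53 -04:00"}'
)
row = row_from_record(record, JSONPATHS)
print(row)
```

The position of each path, not its name, decides which column receives the value, so the order in the jsonpaths file has to follow the column order of the target table.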
Redshift
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. In our example, we created a Redshift cluster with a demo table to store the simulated device temperature sensor data:
create table demo ( device_id varchar(10) not null, temperature int not null, timestamp varchar(50) );
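Before sending records, it can help to check that they fit the column definitions of the demo table. The following is a rough sketch under stated assumptions: the helper `fits_demo_table` is hypothetical, it treats varchar lengths as byte counts of the UTF-8 text, and it checks only the constraints visible in the CREATE TABLE statement above.

```python
def fits_demo_table(record):
    """Return True if a record satisfies the demo table's column constraints."""
    device_id = record.get("device_id")
    temperature = record.get("temperature")
    timestamp = record.get("timestamp")

    # device_id and temperature are declared NOT NULL.
    if device_id is None or temperature is None:
        return False
    # device_id is varchar(10); a numeric id is stored as its text form.
    if len(str(device_id).encode("utf-8")) > 10:
        return False
    # temperature is int: a whole number within the 32-bit signed range.
    if isinstance(temperature, bool) or not isinstance(temperature, int):
        return False
    if not -2**31 <= temperature <= 2**31 - 1:
        return False
    # timestamp is varchar(50) and may be NULL.
    if timestamp is not None and len(str(timestamp).encode("utf-8")) > 50:
        return False
    return True


good = {"device_id": 39, "temperature": 141, "timestamp": "29/Oct/2018:14:19:53 -04:00"}
bad = {"device_id": 39, "timestamp": "29/Oct/2018:14:19:53 -04:00"}  # no temperature
print(fits_demo_table(good), fits_demo_table(bad))
```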
Conclusion
A Kinesis Data Firehose delivery stream provides an easy way to load streaming data into a data warehouse. You can combine this example with Amazon QuickSight for visualization and Amazon Machine Learning for prediction. Please see the video for details.