A few weeks ago, we launched two significant performance improving features for Amazon Kinesis Data Streams (KDS): enhanced fan-out and an HTTP/2 data retrieval API. Enhanced fan-out allows developers to scale up the number of stream consumers (applications reading data from a stream in real-time) by offering each stream consumer its own read throughput. Meanwhile, the HTTP/2 data retrieval API allows data to be delivered from producers to consumers in 70 milliseconds or better (a 65% improvement) in typical scenarios. These new features enable developers to build faster, more reactive, highly parallel, and latency-sensitive applications on top of Kinesis Data Streams.
Kinesis actually refers to a family of streaming services: Kinesis Video Streams, Kinesis Data Firehose, Kinesis Data Analytics, and the topic of today’s blog post, Kinesis Data Streams (KDS). Kinesis Data Streams allows developers to easily and continuously collect, process, and analyze streaming data in real-time with a fully-managed and massively scalable service. KDS can capture gigabytes of data per second from hundreds of thousands of sources – everything from website clickstreams and social media feeds to financial transactions and location-tracking events.
Kinesis Data Streams are scaled using the concept of a shard. One shard provides an ingest capacity of 1MB/second or 1000 records/second and an output capacity of 2MB/second. It’s not uncommon for customers to have thousands or tens of thousands of shards supporting 10s of GB/sec of ingest and egress. Before the enhanced fan-out capability, that 2MB/second/shard output was shared between all of the applications consuming data from the stream. With enhanced fan-out developers can register stream consumers to use enhanced fan-out and receive their own 2MB/second pipe of read throughput per shard, and this throughput automatically scales with the number of shards in a stream. Prior to the launch of Enhanced Fan-out customers would frequently fan-out their data out to multiple streams to support their desired read throughput for their downstream applications. That sounds like undifferentiated heavy lifting to us, and that’s something we decided our customers shouldn’t need to worry about. Customers pay for enhanced fan-out based on the amount of data retrieved from the stream using enhanced fan-out and the number of consumers registered per-shard. You can find additional info on the pricing page.
Before we jump into a description of the new API, let’s cover a few quick notes about HTTP/2 and how we use that with the new SubscribeToShard API.
HTTP/2 is a major revision to the HTTP network protocol that introduces a new method for framing and transporting data between clients and servers. It’s a binary protocol. It enables many new features focused on decreasing latency and increasing throughput. The first gain is the use of HPACK to compress headers. Another useful feature is connection multiplexing which allows us to use a single TCP connection for multiple parallel non-blocking requests. Additionally, instead of the traditional request-response semantics of HTTP, the communication pipe is bidirectional. A server using HTTP/2 can push multiple responses to a client without waiting for the client to request those resources. Kinesis’s SubscribeToShard API takes advantage of this server push feature to receive new records and makes use of another HTTP/2 feature called flow control. Kinesis pushes data to the consumer and keeps track of the number of bytes that have been unacknowledged. The client acknowledges bytes received by sending
WINDOW_UPDATE frames to the server. If the client can’t handle the rate of data, then Kinesis will pause the flow of data until a new
WINDOW_UPDATE frame is received or until the 5 minute subscription expires.
Now that we have a grasp on SubscribeToShard and HTTP/2 let’s cover how we use this to take advantage of enhanced fan-out!
Using Enhanced Fan-out
The easiest way to make use of enhanced fan-out is to use the updated Kinesis Client Library 2.0 (KCL). KCL will automatically register itself as a consumer of the stream. Then KCL will enumerate the shards and subscribe to them using the new SubscribeToShard API. It will also continuously call SubscribeToShard whenever the underlying connections are terminated. Under the hood, KCL handles checkpointing and state management of a distributed app with a Amazon DynamoDB table it creates in your AWS account. You can see an example of this in the documentation.
The general process for using enhanced fan-out is:
- Call RegisterStreamConsumer and provide the StreamARN and ConsumerName (commonly the application name). Save the ConsumerARN returned by this API call. As soon as the consumer is registered, enhanced fan-out is enabled and billing for consumer-shard-hours begins.
- Enumerate stream shards and call SubscribeToShard on each of them with the ConsumerARN returned by RegisterStreamConsumer. This establishes an HTTP/2 connection, and KDS will push SubscribeToShardEvents to the listening client. These connections are terminated by KDS every 5 minutes, so the client will need to call SubscribeToShard again if you want to continue receiving events. Bytes pushed to the client using enhanced fan-out are billed under enhanced fan-out data retrieval rates.
- Finally, remember to call DeregisterStreamConsumer when you’re no longer using the consumer since it does have an associated cost.
You can see some example code walking through this process in the documentation.
You can view Amazon CloudWatch metrics and manage consumer applications in the console, including deregistering them.
Enhanced fan-out and the new HTTP/2 SubscribeToShard API are both available now in all regions for new streams and existing streams. There’s a lot more information than what I’ve covered in this blog post in the documentation. There is a per-stream limit of 5 consumer applications (e.g., 5 different KCL applications) reading from all shards but this can be increased with a support ticket. I’m excited to see customers take advantage of these new features to reduce the complexity of managing multiple stream consumers and to increase the speed and parallelism of their real-time applications.