Microsoft Open Sources Trill to Deliver Insights on A Trillion Events A Day
Posted on December 17, 2018
In today’s high-speed environment, being able to process massive amounts of data each millisecond is becoming a common business requirement. We are excited to announce that Trill, an internal Microsoft project for processing “a trillion events per day,” is now open source, in response to this growing need.
Here are just a few of the reasons why developers love Trill:
- Because Trill is a single-node engine library, any .NET application, service, or platform can easily embed it and start processing queries.
- Its temporal query language lets users express complex queries over real-time data, offline datasets, or both.
- Trill’s high performance across its intended usage scenarios means users get results with incredible speed and low latency. For example, filters run at memory-bandwidth speeds of up to several billion events per second, while grouped aggregates process 10 to 100 million events per second.
A rich history
Trill started as a research project at Microsoft Research in 2012 and has since been described extensively in research papers, including publications at VLDB and in the IEEE Data Engineering Bulletin. The roots of Trill’s language lie in StreamInsight, a former Microsoft service and a powerful platform that allowed developers to build and deploy complex event processing applications. Both systems are based on a query and data model that extends the relational model with a time component.
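To make "the relational model with a time component" concrete, here is a minimal Python sketch. It is not Trill's API (Trill is a .NET library); it only illustrates the general idea under a simple assumption: each event carries a payload plus a validity interval [start, end), and the answer to a query at time t is the ordinary relational result over the events valid at t. The `Event` type and the `valid_at` and `snapshot_count` helpers are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    """Hypothetical temporal event: a payload that is valid over [start, end)."""
    start: int
    end: int
    payload: str


def valid_at(events: List[Event], t: int) -> List[Event]:
    """Return the 'snapshot' of the stream at time t: events whose interval contains t."""
    return [e for e in events if e.start <= t < e.end]


def snapshot_count(events: List[Event], t: int) -> int:
    """An ordinary relational COUNT(*), evaluated over the snapshot at time t."""
    return len(valid_at(events, t))


events = [
    Event(0, 10, "a"),
    Event(5, 15, "b"),
    Event(12, 20, "c"),
]

for t in (0, 5, 10, 12, 15):
    print(t, snapshot_count(events, t))
# One line per time point: 0 1, 5 2, 10 1, 12 2, 15 1
```

Rescanning every event per time point is only for clarity; a streaming engine would instead emit incremental changes to the result as time advances.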
While earlier systems achieved only subsets of these benefits, Trill provides all of them in one package. Trill was the first streaming engine to incorporate techniques and algorithms that process events in small batches, sized according to the latency the user can tolerate. It was also the first engine to organize those batches in columnar format, enabling queries to execute much more efficiently than before. To users, working with Trill is the same as working with any other .NET library, so there is no need to leave the .NET environment. Users can also embed Trill within a variety of distributed processing infrastructures, such as Orleans and a streaming version of Microsoft’s SCOPE data processing infrastructure.
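The two ideas above, latency-driven batching and columnar batches, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Trill's internal design: the `ColumnarBatch` class, its `absent` mask, and the `batch_events` and `filter_batch` helpers are all hypothetical. A batch stores one list per field instead of one object per event, and a filter scans a single column and marks rows as absent rather than copying the surviving rows.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class ColumnarBatch:
    """Hypothetical columnar batch: one list per field instead of one object per event."""
    timestamps: List[int] = field(default_factory=list)
    keys: List[str] = field(default_factory=list)
    values: List[float] = field(default_factory=list)
    # absent[i] is True once row i has been filtered out; rows are masked, not copied.
    absent: List[bool] = field(default_factory=list)

    def append(self, ts: int, key: str, value: float) -> None:
        self.timestamps.append(ts)
        self.keys.append(key)
        self.values.append(value)
        self.absent.append(False)

    def __len__(self) -> int:
        return len(self.timestamps)


def batch_events(events: Iterable[Tuple[int, str, float]],
                 max_batch_size: int, max_latency: int) -> List[ColumnarBatch]:
    """Group (ts, key, value) events into columnar batches.

    A batch is closed when it holds max_batch_size rows, or when the next event
    arrives more than max_latency time units after the batch's first event.
    Simplification: the latency bound is only checked when a new event arrives;
    a real engine would also flush on a timer.
    """
    batches: List[ColumnarBatch] = []
    current = ColumnarBatch()
    for ts, key, value in events:
        if len(current) > 0 and (
            len(current) >= max_batch_size or ts - current.timestamps[0] > max_latency
        ):
            batches.append(current)
            current = ColumnarBatch()
        current.append(ts, key, value)
    if len(current) > 0:
        batches.append(current)
    return batches


def filter_batch(batch: ColumnarBatch, predicate: Callable[[float], bool]) -> ColumnarBatch:
    """Filter in place by scanning only the 'values' column and updating the mask."""
    for i, v in enumerate(batch.values):
        if not batch.absent[i] and not predicate(v):
            batch.absent[i] = True
    return batch


events = [(0, "a", 1.0), (1, "b", 5.0), (2, "a", 7.0), (10, "b", 2.0), (11, "a", 9.0)]
batches = batch_events(events, max_batch_size=3, max_latency=5)
first = filter_batch(batches[0], lambda v: v > 4.0)
survivors = [v for v, gone in zip(first.values, first.absent) if not gone]
print(len(batches), survivors)  # 2 [5.0, 7.0]
```

The trade-off the batch-size and latency parameters expose is the one described above: larger batches amortize per-batch overhead and favor throughput, while a tighter latency bound closes batches sooner.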
Trill works equally well over real-time and offline datasets, achieving best-of-breed performance across the spectrum. This makes it the engine of choice for users who want a single tool for all their analyses. The expressive power of Trill’s language allows users to perform advanced time-oriented analytics over a rich range of window specifications, as well as to look for complex patterns over streaming datasets.
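As an illustration of what a window specification means, the Python sketch below assigns each timestamped event to every window that contains it and counts events per window and key. It is a generic illustration, not Trill's windowing API; the `hopping_windows` and `windowed_counts` helpers are hypothetical, and we assume windows of length `size` that start at multiples of `hop`. A tumbling window is simply the case `hop == size`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def hopping_windows(ts: int, size: int, hop: int) -> List[int]:
    """Return the start times of all windows [start, start + size) that contain ts.

    Assumes hop > 0 and that windows start at multiples of hop.
    A tumbling window is the special case hop == size.
    """
    starts = []
    start = (ts // hop) * hop  # latest window start at or before ts
    while start > ts - size:   # window [start, start + size) still covers ts
        starts.append(start)
        start -= hop
    return sorted(starts)


def windowed_counts(events: Iterable[Tuple[int, str]],
                    size: int, hop: int) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, key)."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, key in events:
        for start in hopping_windows(ts, size, hop):
            counts[(start, key)] += 1
    return dict(counts)


events = [(1, "x"), (4, "x"), (6, "y"), (12, "x")]

# Tumbling windows of length 5: each event lands in exactly one window.
print(windowed_counts(events, size=5, hop=5))  # {(0, 'x'): 2, (5, 'y'): 1, (10, 'x'): 1}

# Hopping windows of length 10 every 5 units: each event lands in size / hop = 2 windows.
print(windowed_counts(events, size=10, hop=5))
```

Enumerating windows per event keeps the example short; for long windows with small hops, an engine would more likely maintain incremental state per window rather than revisit each event many times.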
After its launch and initial deployment across Microsoft, the Trill project moved from Microsoft Research to the Azure Data product team and became a key component of some of the largest mission-critical streaming pipelines within Microsoft.
Powering mission-critical streaming pipelines
Trill powers internal applications and external services, reaching thousands of developers. Streaming services already powered by Trill include:
“Trill enables Financial Fabric to provide real-time portfolio & risk analytics on streaming investment data, fundamentally changing the way financial analytics on high volume and velocity datasets are delivered to fund managers.” – Paul A. Stirpe, Ph.D., Chief Technology Officer, Financial Fabric
“Trill has enabled us to process large scale data in petabytes, within a few minutes and near real-time compared to traditional processing that would give us results in 24 plus hours. The key capabilities that differentiate Trill in our view are the ability to do complex event processing, clean APIs for tracking and debugging, and the ability to run the stream processing pipeline continuously using temporal semantics. Without Trill, we would have been struggling to get streaming at scale, especially with the additional complex requirements we have for our specific big data processing needs.” – Rajesh Nagpal, Principal Program Manager, Bing
“Trill is the centerpiece of our stream processing system for ads in Bing. We are able to construct and execute complex business scenarios with ease because of its powerful, consistent data model and expressive query language. What’s more is its design for performance, Trill lives up to its namesake of ‘trillions of events per day’ because it can easily process extremely large volumes of data and operate against terabytes of state, even in queries that contain hundreds of operators.” – Daniel Musgrave, Principal Software Engineer, Bing
Azure Stream Analytics
“Azure Stream Analytics went from the first line of code to public preview within 10 months by using Trill as the on-node processing engine. The library form factor conveniently integrates with our distributed processing framework and input/output adaptors. Our SQL compiler simply compiles SQL queries to Trill expressions, which takes care of the intricacies of the temporal semantics. It is a beautiful programming model and high-performance engine to use. In the near future, we are considering exposing Trill’s programming model through our user defined operator model so that all of our customers can take advantage of the expressive power.” – Zhong Chen, Principal Group Engineering Manager, Azure Data.
“Trill has been intrinsic to our data processing pipeline since the day we introduced it into our services back in 2013. Its impact has been felt by any player who has picked up the sticks to play a game of Halo. Their data dense game telemetry flows through our pipelines and into the Trill engine within our services. From finding anomalous and interesting experiences to providing frontline defense against bad behavior, Trill continues to be a stalwart in our data processing pipeline.” – Mike Malyuk, Senior Software Engineer, Halo
There are many other examples of Trill enabling streaming at scale, including Exchange, Azure Networking, and telemetry analysis in Windows.
We believe there is no equivalent to Trill available in the developer community today. In particular, by open-sourcing Trill we want to offer the power of the IStreamable abstraction to all customers, in the same way that IEnumerable and IObservable are available in .NET. We hope that Trill and IStreamable will provide a strong foundation for streaming or temporal processing for current and future open-source offerings.
We also see many opportunities for community involvement in the future development of Trill. First, Trill is extensible: users can write their own custom aggregates. Trill’s built-in aggregates are implemented in the same framework as user-defined ones, so every aggregate uses the same underlying high-performance architecture with no special cases. While Trill already offers a wide variety of aggregates, there are countless others that could be added, especially in verticals such as finance.
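To give a flavor of what a user-defined aggregate involves, here is a Python sketch of a finance-style aggregate, a volume-weighted average price (VWAP) over (price, volume) trades. The `initial_state` / `accumulate` / `deaccumulate` / `compute_result` shape is an assumption chosen to illustrate incremental aggregation in general; it is not Trill's C# interface, so consult the Trill repository for the real extensibility API. The `deaccumulate` step is what lets a sliding window drop an expired trade without rescanning the trades that remain.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple

Trade = Tuple[float, float]  # (price, volume)


@dataclass(frozen=True)
class VwapState:
    """Running totals needed for a volume-weighted average price."""
    notional: float = 0.0  # sum of price * volume
    volume: float = 0.0    # sum of volume


class VwapAggregate:
    """Hypothetical user-defined aggregate over (price, volume) trades.

    The method names are illustrative assumptions, not Trill's actual interface.
    """

    def initial_state(self) -> VwapState:
        return VwapState()

    def accumulate(self, state: VwapState, trade: Trade) -> VwapState:
        price, volume = trade
        return VwapState(state.notional + price * volume, state.volume + volume)

    def deaccumulate(self, state: VwapState, trade: Trade) -> VwapState:
        # Remove a trade that has left the window, without rescanning the others.
        price, volume = trade
        return VwapState(state.notional - price * volume, state.volume - volume)

    def compute_result(self, state: VwapState) -> float:
        return state.notional / state.volume if state.volume else 0.0


# Usage: VWAP over a sliding window of the last 3 trades.
agg = VwapAggregate()
state = agg.initial_state()
window: Deque[Trade] = deque()
results: List[float] = []
for trade in [(10.0, 100), (11.0, 50), (9.0, 50), (12.0, 100)]:
    state = agg.accumulate(state, trade)
    window.append(trade)
    if len(window) > 3:
        state = agg.deaccumulate(state, window.popleft())
    results.append(agg.compute_result(state))

print(results)  # the last value, 11.0, covers only the three most recent trades
```

Subtracting values back out of floating-point running sums can accumulate rounding error over long streams, which is worth keeping in mind when designing deaccumulate steps for production aggregates.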
There are also several research projects built on top of Trill whose code exists but is not yet product-ready. The three projects at the top of our list are:
- Digital signal processing with capabilities and performance on par with, or better than, what is normally seen in R.
- Improved handling of out-of-order data, allowing users to specify multiple levels of latency.
- Allowing operator state to be managed using the recently open-sourced FASTER framework.
Welcome to Trill!
We are incredibly excited to be sharing Trill with all of you! You can look forward to more blog posts about Trill’s API, how Trill is used within Microsoft, and in-depth technical details. In the meantime, you can find Trill sources and documentation in our GitHub repository, take Trill for a spin, and tell us what you think!