Stream Processing on the Edge – Part 1

In recent years, I have been working on the Fluent Bit project, making it a reliable system-level tool that solves most of the logging challenges we face today, with a strong focus on cloud environments. This process has been a joint effort with the community of individuals and companies who are deploying it in production.

The whole point of Logging is to perform Data Analysis, so anything that makes it more reliable, easier, and more flexible is a good addition; as a project maintainer I am always looking for innovation, and the Stream Processing topic has gotten a lot of attention among my colleagues and the community in general.

Stream Processing (aka SP) can be described as the ability to perform data processing while the data is still in motion. Most people familiar with the SP term know about Apache Spark, Apache Flink, and Kafka Streams, among others. Most of this tooling provides a full set of data processing capabilities and helps perform flexible Data Analysis once the data is fully aggregated.

I mentioned above that Stream Processing happens once the data is aggregated: different services send data from multiple local/remote sources to a central place where data processing and analysis can be performed. But what if we could do distributed stream processing on the edge? This would be very beneficial, since we could catch exceptions or trigger alerts based on specific data processing results as soon as they happen.

To implement Stream Processing on the Edge, we need the proper tooling, which must have at least the following features:

  • Ability to collect, parse, filter and deliver data to remote hosts.
  • Lightweight: low memory and CPU footprint.
  • Provide a query language to perform computation on top of streams of data.
  • Be Open Source (of course, right? 🙂 )

Fluent Bit is a good fit: its nature is data collection, processing, and delivery, which makes it a natural candidate to extend with Stream Processing capabilities. That's something we at Arm and Treasure Data have been working on over the last weeks (though the idea was born in 2018).

Our current implementation will be showcased in the upcoming Fluent Bit v1.1.0 release in April 2019. It brings a Stream Processor Engine with SQL support to query records and run aggregation functions with windowing and optional grouping. In addition, it allows the creation of new streams of data from query results, which can be tagged and routed like normal records through the Fluent Bit pipeline.
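To give a flavor of what this looks like, here is a minimal sketch of a stream processor task. The stream name, tag, and the `cpu_p` metric field are illustrative assumptions, and the exact syntax may evolve before the final v1.1.0 release:

```
# stream_processor.conf -- referenced from the main configuration, e.g.:
#   [SERVICE]
#       Streams_File  stream_processor.conf
#
# Compute the average CPU usage over 5-second tumbling windows from
# records tagged 'cpu.*', and emit the results as a new stream tagged
# 'cpu.avg' that can be routed like any other record in the pipeline.
[STREAM_TASK]
    Name  cpu_window_avg
    Exec  CREATE STREAM cpu_avg WITH (tag='cpu.avg') AS SELECT AVG(cpu_p) FROM TAG:'cpu.*' WINDOW TUMBLING (5 SECOND);
```

The interesting part is that the aggregation runs on the edge node itself, before any data reaches a central aggregator, so alerts or derived metrics are available as soon as each window closes.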

In the next part, I will share details on how to get started with this new Stream Processing feature. As usual, we are looking forward to your feedback...