Writing a Distributed Event Streaming Service in Rust

As a passionate software engineer always looking to expand my skillset, I recently took on an ambitious project: CASCADE. My primary motivation was to gain hands-on experience with Rust, gRPC, and Kubernetes, particularly around inter-service communication, such as the connection between the REST API and the producer nodes.

The core of the project was building a robust messaging system: REST endpoints for consumers and producers, fault-tolerant consumer and producer clusters for maximum throughput, and persistent storage for events, topics, and partitions. Storing events within each partition in order, using a queue-based design, was another priority. The project used Rust, Tokio, Tonic, gRPC (Protocol Buffers), AWS EKS/Kubernetes, and Docker to keep performance high.
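To make that data model concrete, here is a rough Rust sketch of how events, topics, and queue-backed partitions could be represented in memory. The type and field names (Event, Partition, Topic, Broker) are illustrative, not CASCADE's actual definitions.

use std::collections::{HashMap, VecDeque};

// Illustrative sketch of the data model described above; the real
// CASCADE types and field names may differ.
struct Event {
    event_number: u64, // position of the event within its partition
    payload: Vec<u8>,  // variable-length event body
}

struct Partition {
    // Events are held in arrival order; a queue preserves FIFO semantics
    // until they are flushed to the on-disk log and index files.
    pending: VecDeque<Event>,
}

struct Topic {
    name: String,
    partitions: Vec<Partition>,
}

struct Broker {
    topics: HashMap<String, Topic>,
}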

As the project lead, I spearheaded the initiative, developed much of the infrastructure, assigned roles, troubleshot bugs, and coordinated the efforts of three talented team members. One of the most significant challenges we faced was devising a way to look up events in the log file efficiently. We wanted events to have variable lengths, which meant we couldn't compute an event's position in the log directly from its event number. Our solution was a separate index file with an implicit mapping from event number to event index.

The index table consists of 4-byte integers, each storing the index (byte offset) at which the corresponding event begins in the log file. Because every entry has the same width, mapping an event number to its slot in the index table is straightforward: to find the index of event #6, we multiply 6 by 4 to get 24 and read the integer stored at offset 24 in the index table.

use std::io::{Read, Seek, SeekFrom};

// Each index entry is 4 bytes, so entry N starts at byte offset N * 4.
let event_num = data_received.event_number;
index_table_file.seek(SeekFrom::Start(event_num as u64 * 4))?;

// Read the 4-byte entry and decode it as a little-endian log offset.
let mut index_buffer = [0u8; 4];
index_table_file.read_exact(&mut index_buffer)?;
let event_index = u32::from_le_bytes(index_buffer) as usize;

In the code snippet above, we read the supplied event number and seek the index table file to this event number multiplied by 4 (since the integers are 4 bytes). We then read the file into a 4-byte long buffer, which provides us with the index of the event in the log file. This approach allows for efficient event lookups while accommodating variable-length events.
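The write path isn't shown above, but for completeness, here is a rough sketch of how appending could work under this scheme: when an event is written to the end of the log, its starting offset is recorded as a 4-byte little-endian entry at the end of the index table. The function and variable names here are illustrative, not the project's actual code.

use std::fs::File;
use std::io::{self, Seek, SeekFrom, Write};

// Append an event to the log and record its starting offset in the
// index table, preserving the implicit event-number-to-offset mapping.
fn append_event(
    log_file: &mut File,
    index_table_file: &mut File,
    payload: &[u8],
) -> io::Result<()> {
    // The event's index is wherever the log currently ends. Note that
    // 4-byte entries cap the addressable log size at 4 GiB.
    let event_index = log_file.seek(SeekFrom::End(0))? as u32;
    log_file.write_all(payload)?;

    // Entry N lives at offset N * 4, so appending one entry per event
    // keeps the table aligned with the event numbers.
    index_table_file.seek(SeekFrom::End(0))?;
    index_table_file.write_all(&event_index.to_le_bytes())?;
    Ok(())
}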

Despite the hurdles, our team made solid progress, and we have our sights set on future improvements. In the next phase of the CASCADE project, we plan to focus on improving throughput, latency, and scalability to make the messaging system even more robust and reliable. This experience has been an invaluable opportunity to grow as a developer and apply modern tools to a challenging systems problem.

You can check out the project at github.com.