To measure is to know: the ELK stack
Author: Ronald Bakker
Category: Java & Web
We order more and more online, which means we also receive more and more notifications about the delivery of our orders. I work for a large retailer on a Kotlin application that keeps customers informed about the expected time of delivery. We use Elasticsearch, Logstash and Kibana (the ELK stack) to gain insight into the effect of the changes we make. Because to measure is to know.
Customers receive push notifications concerning their order status at different times: when the order is scheduled, in transit, almost there or when there is an unexpected change in the arrival time.
My team created an ingenious algorithm for sending push notifications. The algorithm uses updates from the delivery trucks, which arrive by the thousands per minute. It works best when the communicated arrival times are as accurate as possible. We want to minimize unexpected changes, because customers don't like unnecessary messages.
The effect of modifications
A small change can sometimes have a significant effect. Our backlog consists of changes to the algorithm. A major change, for example, is an extra push notification when the order will arrive within 5 minutes. Minor changes could be widening the time slot or sending push notifications earlier or later. The longer we wait before sending a message, the more accurate the expected arrival time will be. On the other hand, customers want to be informed as soon as possible.
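To make the "within 5 minutes" example concrete, here is a minimal Kotlin sketch of such a rule. It is not the real algorithm: the function name `shouldSendAlmostThere`, the `window` parameter and the use of `java.time` are assumptions made for illustration only.

```kotlin
import java.time.Duration
import java.time.Instant

// Hypothetical rule (not the production algorithm): send an extra
// notification once the expected arrival is at most `window` away
// and has not already passed.
fun shouldSendAlmostThere(
    now: Instant,
    expectedArrival: Instant,
    window: Duration = Duration.ofMinutes(5)
): Boolean {
    val remaining = Duration.between(now, expectedArrival)
    return !remaining.isNegative && remaining <= window
}
```

Changing the default `window` is the kind of minor modification whose effect is hard to predict up front, which is why it needs to be measured.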
We use the ELK stack to gain insight into the effect of the modifications we make. This tooling turns the logging of our application into dashboards, so we can see immediately what is happening in the application. For example, we can make a bar chart of the number of messages sent per type, or a histogram of the number of messages sent per customer.
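In practice Kibana computes these charts from the log events stored in Elasticsearch. The Kotlin sketch below only mirrors the arithmetic behind the two example charts on an in-memory list, to show what each one counts. The `SentMessage` class and its fields are hypothetical; the real log format is not shown in this article.

```kotlin
// Hypothetical shape of one logged "message sent" event.
data class SentMessage(val customerId: String, val type: String)

// Bar chart data: number of messages sent per message type.
fun countPerType(events: List<SentMessage>): Map<String, Int> =
    events.groupingBy { it.type }.eachCount()

// Histogram data: for each message count n, how many customers
// received exactly n messages.
fun customersPerMessageCount(events: List<SentMessage>): Map<Int, Int> =
    events.groupingBy { it.customerId }.eachCount() // messages per customer
        .values.groupingBy { it }.eachCount()       // customers per count
```

A histogram that shifts towards higher counts after a change suggests that customers are receiving more messages than before.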
Testing new features in a simulation environment
A simulation environment runs alongside the production environment. The simulation receives the same input (schedules and updates from delivery trucks), but the environment doesn’t send messages to customers. Logging remains the same and is accessed in the same way in the ELK stack as in the production environment. This allows us to test new features in the simulation environment first. We then compare the graphs with the production environment. This way, we measure the effect before we put features into production.
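One way to picture "same logging, no messages" is a sender abstraction with two implementations. The Kotlin sketch below rests on assumptions: `Notification`, `NotificationSender`, the log line format and the `push` callback are all hypothetical names, and the actual setup may differ.

```kotlin
// Hypothetical notification model.
data class Notification(val customerId: String, val type: String)

// Both environments build the log line the same way, so the
// dashboards for both can be compared one to one.
fun logLine(n: Notification): String =
    "notification_sent type=${n.type} customer=${n.customerId}"

interface NotificationSender {
    fun send(notification: Notification)
}

// Production: log the event, then hand it to the real push channel.
class ProductionSender(
    private val log: (String) -> Unit,
    private val push: (Notification) -> Unit
) : NotificationSender {
    override fun send(notification: Notification) {
        log(logLine(notification))
        push(notification)
    }
}

// Simulation: log exactly the same event, but never contact the customer.
class SimulationSender(private val log: (String) -> Unit) : NotificationSender {
    override fun send(notification: Notification) {
        log(logLine(notification))
    }
}
```

Because the algorithm only talks to `NotificationSender`, a new feature can run unchanged in the simulation environment while its log output feeds the same kind of dashboards.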
This allows us to see if a new feature is an improvement for the customer or if we need to go back to the drawing board. A major improvement. It just goes to show: to measure is to know!