In all my recent projects, monitoring applications has been a source of frustration. Establishing a good solution has been far from simple. Sometimes the requirements make it difficult; other times the lack of tooling and infrastructure, or its confusing setup, is the limiting factor. In addition, I think we are often too tied to the "old way" of doing it.
While infrastructure and platforms have been monitored by listening to components directly, applications have typically been monitored through log files. For years this has been the standard. Typically a log aggregation tool, like Splunk, Elastic or Graylog, is placed on top of the logs to make them searchable and to aggregate information across sets of log files. Nice dashboards are then created to visualize the log content.
This approach has its limits. It generates a large volume of log files, and the log content must be transported to the log aggregation tool in some way.
The rise of distributed applications
With the rise of distributed applications, like microservices and serverless/functions, we can really see a shift towards a need to monitor the application itself more directly. The number of applications to monitor grows, the amount of log files grows, and the infrastructure is more dynamic. Also, the use of central log storage doesn't really fit very well into the rest of the infrastructure in my eyes.
Wouldn't it be better if we could listen to the applications the way we listen to the infrastructure? Wouldn't it be better if we could use some of the live data to take corrective actions at the application level before issues turn into problems?
Some time back, both Spring and Java EE started enabling monitoring of application behavior through the JMX API, but in a very limited way. Now this is changing. With Spring Boot Actuator, a lot more information can be exposed through a standardized interface, and the endpoints can even be customized to deliver additional content beyond the standard ones.
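As a minimal sketch of what this looks like in practice (assuming a Spring Boot 2.x application with the spring-boot-starter-actuator dependency on the classpath), exposing a selection of the standard endpoints over HTTP is a matter of configuration:

```properties
# application.properties -- expose selected Actuator endpoints over HTTP
management.endpoints.web.exposure.include=health,info,metrics
# optionally include full details in the health response
management.endpoint.health.show-details=always
```

The endpoints then become available under the `/actuator` base path, e.g. `GET /actuator/health`, so a monitoring tool can poll the application directly instead of scraping its logs.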
In addition, tools like Spring Cloud Sleuth give us a lot of tracing possibilities, but they are still based on log files. Adding tools like Zipkin on top gives you good tracing of the applications, but the log files remain the foundation.
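To make the log-file basis concrete: Sleuth decorates each log line with the application name, trace id, span id and an export flag, so correlation across services still happens by shipping and joining log lines. An illustrative example (service name and ids are made up):

```
2018-05-04 10:15:30.123  INFO [order-service,2485ec27856c56f4,2485ec27856c56f4,true] 4321 --- [nio-8080-exec-1] c.e.OrderController : Order received
```

Zipkin then reconstructs the trace from these correlated entries, which is exactly why the log files are still the basis of the whole approach.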
Now we see tools for monitoring applications more directly becoming more and more visible in the market. This approach is called distributed tracing.
As a base for most tools supporting distributed tracing, OpenTracing formalizes a standard for the trace format and for how traces should be extracted from applications. OpenTracing also "certifies" agents (called tracers) that support the standard. The protocol for transferring the information differs between tools: some use Kafka, while others use RESTful services or other transfer mechanisms.
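For illustration, the OpenTracing Java API lets application code create and finish spans through a vendor-neutral interface, while the concrete tracer (Jaeger, Zipkin bridge, etc.) is registered behind it at startup. A minimal sketch, assuming the `io.opentracing` API is on the classpath; the class, method and operation names are my own, not from any specific tool:

```java
import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.util.GlobalTracer;

public class OrderHandler {

    public void handleOrder(String orderId) {
        // GlobalTracer delegates to whichever tracer implementation
        // (Jaeger, Zipkin bridge, ...) was registered at startup.
        Tracer tracer = GlobalTracer.get();

        // Start a span for this unit of work; "handle-order" is an
        // illustrative operation name.
        Span span = tracer.buildSpan("handle-order").start();
        try {
            span.setTag("order.id", orderId);
            // ... business logic goes here ...
        } finally {
            // Finishing the span hands it to the tracer, which reports it
            // over whatever transport the tool uses (Kafka, HTTP, ...).
            span.finish();
        }
    }
}
```

Because the application only depends on the OpenTracing interfaces, the backing tool can be swapped without touching this code.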
Some of the tools supporting this standard as we speak are Jaeger, DataDog, LightStep, Zipkin, Instana and Elastic. I have seen presentations of most of these tools this week, and I believe this is the future of how to monitor distributed applications.
Next for me is to try some of them out to see how easy they are to configure and use, and maybe replace some current logging with live tracing instead. I recommend others have a look as well, and not get stuck in old patterns.