
How to set up monitoring tools for JVM

This article was originally published on Medium.

Cover image by Towfiqu barbhuiya on Unsplash

In this blog post I’ll show how to set up two monitoring tools that are widely used for Java applications - Java Flight Recorder with JDK Mission Control and Prometheus with Grafana.

When your application reaches a production-ready state, it’s a good time to celebrate. Soon users will start using it, which is great, but it also creates a new kind of problem. How do you monitor how the application behaves under the load of user traffic? How do you verify that everything runs smoothly and users are not experiencing any slowness or errors? The answer to these questions is to use performance monitoring tools.

In this blog post I’ll show how to set up two open-source monitoring tools available for the JVM - Java Flight Recorder with JDK Mission Control and Prometheus with Grafana. It’s the first article out of two. In this one I’ll focus on installing both tools, and in the second I’ll explain how to read the indicators that they provide.

Project preparation #

The entire code base mentioned throughout the article is available on GitHub - monitoring-sandbox. It contains the source code of a Java application and all configuration files for both monitoring tools.

The only thing required is to have Docker installed. With a simple Docker Compose command (docker-compose up -d) we can start all the necessary pieces (the monitored application with JFR enabled, Prometheus and Grafana), so you don’t need to follow all my steps. They’re just there to explain what’s going on.

Setting up Java Flight Recorder with JDK Mission Control #

Let’s now move to the more practical part of the post and have a look at JDK Flight Recorder (JFR) and JDK Mission Control (JMC).

As the name might suggest, the first one has something in common with the black box flight recorder used in airplanes. Similarly to the ones used in aircraft, its primary function is to record and save everything that is happening inside the plane, so that in case of an accident specialists can rewind all the steps to find the cause and prevent it from happening again. In Java, JFR plays a similar role. It records events that occur inside the JVM and the application, collects them and stores them, e.g. in a file. Many events are already built in, which lets us see what’s happening inside our application - for example how the Garbage Collector is working, how threads are used, etc. - but it’s also possible to define custom events inside an application. These can be used to monitor business processes, performance or whatever else we would like.
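To get a feeling for how a custom event could look in code, here is a minimal sketch using the jdk.jfr API (the event name and its fields are made up just for illustration):

import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

public class CustomEventDemo {

    // a hypothetical business event - name and fields are illustrative only
    @Name("com.example.OrderProcessed")
    @Label("Order Processed")
    static class OrderProcessedEvent extends Event {
        @Label("Order Id")
        String orderId;
        @Label("Amount")
        double amount;
    }

    public static void main(String[] args) throws InterruptedException {
        OrderProcessedEvent event = new OrderProcessedEvent();
        event.orderId = "1234";
        event.amount = 99.99;
        event.begin();       // start measuring the event's duration
        Thread.sleep(100);   // stands in for the real business operation
        event.commit();      // write the event to the active recording
    }
}

Such an event then shows up in the recording under the given name, next to the built-in jdk.* events.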

As already mentioned, the output from JFR is a recording, usually a file. Now, how do we open it? One way is the jfr CLI, which is bundled with the JDK since version 11 (in open-source Java distributions; in commercial ones JFR has been available since Java 7). It allows you to list all events by type and print them in the console. But it’s not convenient. Luckily there is the tool already mentioned - JDK Mission Control (JMC).

JMC is a desktop tool that can read JFR recordings and visualize them in a very nice way. Similarly to JFR it used to be bundled with the JDK, but now it’s distributed separately and can be downloaded from the official website.

With this introduction behind us, let’s play around with JFR. For the demonstration I’ve prepared a small Java project that is available on GitHub - monitoring-sandbox. For convenience we will be running the app using Docker, so here is the Dockerfile located in the /app folder:

FROM maven:3.8.2-openjdk-17-slim AS build
RUN mkdir -p /workspace
WORKDIR /workspace
COPY pom.xml /workspace
COPY src /workspace/src
RUN mvn -B -f pom.xml clean package -DskipTests

FROM openjdk:17-jdk
COPY --from=build /workspace/target/*.jar app.jar
RUN mkdir -p /recordings
EXPOSE 8080
ENTRYPOINT ["java","-jar","app.jar"]

It’s a very simple configuration. The application runs on the recently released OpenJDK 17 from Oracle. This is important information, because depending on the JDK distribution we can get different results, as the internals of each distribution might differ slightly.

Another thing worth mentioning is the line RUN mkdir -p /recordings. With it we’re creating a folder inside the Docker container in which we will later store JFR recordings.

We could run the app using the docker run command, but let’s take another approach and use Docker Compose, which will also be used in the next part when we set up Grafana. Here is the docker-compose.yaml from the project:

version: '3.9'

services:

  jvm:
    build: ./app
    container_name: jvm-app
    volumes: 
      - ./recordings:/recordings
    ports:
      - 8080:8080
    entrypoint: ["java","-XX:StartFlightRecording:filename=/recordings/recording.jfr,dumponexit=true","-jar","app.jar"]

The service is called jvm and its Docker image is built using the Dockerfile from the /app directory. With volumes we’re mapping the project’s recordings directory to a directory of the same name inside the Docker container. It’s necessary to add it, because JFR recordings will be saved there, and in order to use them in JDK Mission Control we need to extract them from the running container first. Finally, a port mapping is provided.

Having all of that, we can move on to the main course - starting the application along with Java Flight Recorder.

There are several ways to start a recording. One of them is to enable it with the -XX:StartFlightRecording argument of the java command during application startup. This is why an entrypoint is provided in the docker-compose.yaml file. Below is the command used to bootstrap the application:

java -XX:StartFlightRecording:filename=/recordings/recording.jfr,dumponexit=true -jar app.jar

To tune the JFR recording we can add some flags. In the above example two of them were introduced:

  • filename - specifies the location and name of the recording, i.e. /recordings/recording.jfr inside the container,
  • dumponexit - specifies whether the recording should be dumped (saved) when the application stops or crashes. In our case the value is true, which means it will be persisted.

More of the available options can be found here and here.
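If the JVM flags are not under your control, similar options can also be set programmatically from inside the application with the jdk.jfr.Recording API. A minimal sketch (the destination path mirrors the one used above):

import java.nio.file.Path;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

public class ProgrammaticRecording {

    public static void main(String[] args) throws Exception {
        // use the built-in "default" event configuration shipped with the JDK
        Configuration config = Configuration.getConfiguration("default");

        Recording recording = new Recording(config);
        recording.setDumpOnExit(true);                                   // same effect as dumponexit=true
        recording.setDestination(Path.of("/recordings/recording.jfr")); // same effect as filename=...
        recording.start();

        // ... application work happens here ...
    }
}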

Let’s start the application and see what happens. To do that, just run the following command:

docker-compose up -d jvm

Now there should be a recording.jfr file inside the /recordings folder. If we tried to open it in JDK Mission Control right away, we would get an error. The reason is that the recording first needs to be finished; only then can the file be read. Therefore we need to stop the program:

docker-compose down

After that we are able to use the recording.jfr file in JDK Mission Control.

But is it possible to get access to a recording in another way than by “crashing” the application? Sure it is!

First of all we need to get inside the running container. We can do that by opening an interactive shell for the container named jvm-app:

> docker exec -it jvm-app sh

sh-4.4# 

As a result we’re in the running container. If, for example, we run the ls command we get information about what’s inside the container (from now on all commands will be executed inside the container, which is indicated by the sh-4.4# prompt):

sh-4.4# ls
app.jar  bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  recordings  root  run  sbin  srv  sys  tmp  usr  var

To control JFR we use the jcmd CLI, which is bundled with the JDK. To check what other tools are provided with the JDK, run the command:

sh-4.4# ls $JAVA_HOME/bin
jar        java   javadoc  jcmd      jdb        jdeps  jhsdb   jinfo  jmap  jpackage  jrunscript  jstack  jstatd   rmiregistry
jarsigner  javac  javap    jconsole  jdeprscan  jfr    jimage  jlink  jmod  jps       jshell      jstat   keytool  serialver

But for now let’s focus on jcmd. By typing just jcmd we get a list of Java processes that are currently running inside the container:

sh-4.4# jcmd
1 app.jar
183 jdk.jcmd/sun.tools.jcmd.JCmd

Above we can see that there is an app.jar process with a pid (process id) equal to 1. To check if there is an active recording for app.jar, type the following command:

sh-4.4# jcmd 1 JFR.check
1:
Recording 1: name=1 maxsize=250.0MB (running)

The recording is active, great! Now let’s dump what has been collected so far into a separate file located in the recordings folder:

sh-4.4# jcmd 1 JFR.dump filename=/recordings/short-recording.jfr
1:
Dumped recording, 3.2 MB written to:

/recordings/short-recording.jfr

In the above command we first specify the pid, which is 1 as we already checked, and then the JFR.dump command that tells JFR to write the recording to a file in a specific directory (filename=/recordings/short-recording.jfr).

Let’s now check the recording using the jfr CLI which, like jcmd, is provided with the JDK.

sh-4.4# jfr summary /recordings/short-recording.jfr

 Version: 2.1
 Chunks: 1
 Start: 2021-09-26 17:26:22 (UTC)
 Duration: 1081 s

 Event Type                                   Count  Size (bytes)
==================================================================
 jdk.NativeMethodSample                       53054        635109
 jdk.JavaMonitorWait                          51660       1393022
 jdk.ThreadPark                                2489         96828
 jdk.GCPhaseParallel                           2195         53536
 jdk.CheckPoint                                1184        833801
 jdk.CPULoad                                   1079         21547
 jdk.JavaThreadStatistics                      1079         12915
 jdk.ClassLoadingStatistics                    1079         11836
 jdk.CompilerStatistics                        1079         31258
 jdk.ExceptionStatistics                       1079         15073
 jdk.ModuleExport                               946         10256
 jdk.ObjectAllocationSample                     780         11772
 jdk.BooleanFlag                                533         15762
 jdk.ActiveSetting                              355         10945
 jdk.ThreadCPULoad                              314          5311

We can print all events stored in the recording, but be aware that if your recording was very long you’ll get a huge number of events:

sh-4.4# jfr print /recordings/short-recording.jfr

jdk.ActiveSetting {
  startTime = 17:26:22.702
  id = 1185
  name = "threshold"
  value = "0 ns"
  eventThread = "main" (javaThreadId = 1)
}

jdk.ActiveSetting {
  startTime = 17:26:22.702
  duration = 0.732 ms
  id = 132
  name = "period"
  value = "beginChunk"
  eventThread = "main" (javaThreadId = 1)
}

jdk.ActiveSetting {
  startTime = 17:26:22.702
  duration = 0.732 ms
  id = 132
  name = "enabled"
  value = "true"
  eventThread = "main" (javaThreadId = 1)
}

It might be quite overwhelming, so we can filter events out using the --categories option:

sh-4.4# jfr print --categories 'GC' /recordings/short-recording.jfr

jdk.GCPhasePause {
  startTime = 17:32:37.583
  duration = 3.55 ms
  gcId = 10
  name = "GC Pause"
  eventThread = "VM Thread" (osThreadId = 14)
}

jdk.GarbageCollection {
  startTime = 17:32:37.583
  duration = 3.55 ms
  gcId = 10
  name = "G1New"
  cause = "Metadata GC Threshold"
  sumOfPauses = 3.55 ms
  longestPause = 3.55 ms
}

jdk.YoungGarbageCollection {
  startTime = 17:32:37.583
  duration = 3.55 ms
  gcId = 10
  tenuringThreshold = 15
}

jdk.G1GarbageCollection {
  startTime = 17:32:37.583
  duration = 3.55 ms
  gcId = 10
  type = "Concurrent Start"
}
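If you prefer to post-process a recording from Java code rather than the CLI, the jdk.jfr.consumer API can read the same file. Here is a small sketch that keeps only garbage collection events, analogous to the --categories 'GC' example above:

import java.nio.file.Path;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class RecordingReader {

    public static void main(String[] args) throws Exception {
        // read every event from the dumped recording file
        for (RecordedEvent event : RecordingFile.readAllEvents(Path.of("/recordings/short-recording.jfr"))) {
            // keep only garbage collection events
            if (event.getEventType().getName().equals("jdk.GarbageCollection")) {
                System.out.printf("gcId=%s cause=%s duration=%s%n",
                        event.getValue("gcId"),
                        event.getValue("cause"),
                        event.getDuration());
            }
        }
    }
}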

Even though we can filter these events, and each of them is quite easy to read, they are still pretty hard to analyze in bulk. We need a bigger picture. This is why we need a different tool - JDK Mission Control, which will create some visualizations for us.

First, exit the running container by typing exit. Now let’s open JMC, assuming we have already installed it from the official website. The installation procedure depends on the operating system and is described in the instructions.

Once it has started we need to load the recording file. Select File > Open file… and find a recording file, which can be either the short recording or the full one created when the application shuts down (remember, both of them are located in the recordings folder). After that you should see a similar screen:

(screenshot: JDK Mission Control - overview of the loaded recording)

On the left there are a couple of categories we can dive into. After clicking on one of them we can see more details, e.g. by clicking on Memory we get an insight into the application’s memory allocation and how long garbage collection took:

(screenshot: JDK Mission Control - Memory view)

We can see even more details about Garbage Collection by clicking on the Garbage Collection item:

(screenshot: JDK Mission Control - Garbage Collections view)

Another good example of information that we can get from JMC is total CPU usage (Processes item):

(screenshot: JDK Mission Control - Processes view with total CPU usage)

But there is way more information we can read from it, which I would encourage you to explore on your own (or check my next article on that topic ;)). For now this is everything I wanted to show, so let’s move on to the next section.

Setting up JVM dashboards in Grafana #

The previously discussed JFR is an amazing tool for getting very in-depth information about a working JVM. The tricky part is how to set it up so that the information is collected and stored continuously, without manually dumping data every time we want to make an analysis. The solution to this inconvenience would be to make use of something called JFR Event Streaming, but although it was introduced in Java 14, it’s still in its early stages and not many tools have been built on top of it. A possible reason is that the vast majority of the industry still sticks to the previous LTS version of Java, which is 11. With the recent release of Java 17, also an LTS version, this should change soon as more and more companies migrate to it.
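To give an idea of what JFR Event Streaming looks like, here is a minimal sketch of consuming events in-process with the RecordingStream API available since Java 14 (the event names appear in the recording summary shown earlier):

import java.time.Duration;
import jdk.jfr.consumer.RecordingStream;

public class JfrStreamingDemo {

    public static void main(String[] args) {
        // stream selected JFR events directly from the running JVM (Java 14+)
        try (RecordingStream stream = new RecordingStream()) {
            stream.enable("jdk.CPULoad").withPeriod(Duration.ofSeconds(5));
            stream.enable("jdk.GarbageCollection");

            stream.onEvent("jdk.CPULoad",
                    event -> System.out.println("machine CPU: " + event.getFloat("machineTotal")));
            stream.onEvent("jdk.GarbageCollection",
                    event -> System.out.println("GC pauses: " + event.getDuration("sumOfPauses")));

            stream.start();   // blocks and keeps streaming until the stream is closed
        }
    }
}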

Until then, there are other tools that can help. There are dozens of them, some open-source, some commercial. Today I’ll show you how to use two very popular ones - Prometheus and Grafana.

Here are all the pieces of a project that will be built:

  • Micrometer - exposes metrics from a JVM application,
  • Prometheus - scrapes and stores metrics from applications,
  • Grafana - visualizes stored metrics.

A small remark before we start: if you don’t want to set up everything on your own, you can jump straight to the next part. All code is available in my GitHub repository - https://github.com/wkrzywiec/monitoring-sandbox.

The first step is to expose metrics from the application itself. They will then be periodically collected and stored by Prometheus. Exposing metrics is very easy in a Spring Boot application. The only thing needed is to add the micrometer-registry-prometheus dependency (e.g. to Maven’s pom.xml file):

<dependencies>
		<dependency>
			<groupId>io.micrometer</groupId>
			<artifactId>micrometer-registry-prometheus</artifactId>
			<scope>runtime</scope>
		</dependency>
</dependencies>

Another thing that should be added, although it’s not a must, is to include the application name in the exposed metrics. This can be done by adding the management.metrics.tags.application property to the application.properties or application.yaml file:

spring:
  application:
    name: jvm-app

management:
  metrics:
    tags:
      application: ${spring.application.name}

After running the application and opening its /actuator/prometheus endpoint, something like this should appear:

# HELP jvm_gc_memory_allocated_bytes_total Incremented for an increase in the size of the (young) heap memory pool after one GC to before the next
# TYPE jvm_gc_memory_allocated_bytes_total counter
jvm_gc_memory_allocated_bytes_total{application="jvm-app",} 6.291456E7
# HELP jvm_memory_max_bytes The maximum amount of memory in bytes that can be used for memory management
# TYPE jvm_memory_max_bytes gauge
jvm_memory_max_bytes{application="jvm-app",area="nonheap",id="CodeHeap 'non-nmethods'",} 5840896.0
jvm_memory_max_bytes{application="jvm-app",area="nonheap",id="Metaspace",} -1.0
jvm_memory_max_bytes{application="jvm-app",area="heap",id="G1 Old Gen",} 3.31350016E9
jvm_memory_max_bytes{application="jvm-app",area="nonheap",id="CodeHeap 'profiled nmethods'",} 1.22908672E8
jvm_memory_max_bytes{application="jvm-app",area="heap",id="G1 Survivor Space",} -1.0
jvm_memory_max_bytes{application="jvm-app",area="nonheap",id="Compressed Class Space",} 1.073741824E9
jvm_memory_max_bytes{application="jvm-app",area="heap",id="G1 Eden Space",} -1.0
jvm_memory_max_bytes{application="jvm-app",area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 1.22908672E8
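Besides these built-in JVM metrics, Micrometer also makes it easy to register application-specific meters through the auto-configured MeterRegistry bean. Here is a sketch with a made-up business counter (the class and metric names are illustrative only):

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Service;

@Service
public class OrderService {

    private final Counter processedOrders;

    public OrderService(MeterRegistry registry) {
        // registered as a counter named "orders.processed"
        this.processedOrders = Counter.builder("orders.processed")
                .description("Number of processed orders")
                .register(registry);
    }

    public void processOrder() {
        // ... business logic ...
        processedOrders.increment();   // exposed on /actuator/prometheus as orders_processed_total
    }
}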

Great! Metrics are now exposed by the application, so we can move on to building monitoring infrastructure with Docker.

Because both Prometheus and Grafana are standalone applications, we first need to run them. For convenience I’m using Docker Compose, so in the docker-compose.yaml file I’ve defined 3 services:

version: '3.9'

services:

  jvm:
    build: ./app
    container_name: jvm-app
    ports:
      - 8080:8080

  prometheus:
    image: prom/prometheus:v2.30.0
    container_name: prometheus
    ports:
      - 2000:9090
    volumes:
      - prometheus:/prometheus
      - ./infra/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana:8.1.4
    container_name: grafana
    ports:
      - 3000:3000
    environment:
      - PROMETHEUS_URL=http://prometheus:9090
      - DS_PROMETHEUS=prometheus
    volumes:
      - ./infra/grafana/config.ini:/etc/grafana/config.ini
      - ./infra/grafana/provisioning:/etc/grafana/provisioning
      - ./infra/grafana/dashboards:/var/lib/grafana/dashboards
    depends_on:
      - prometheus

volumes:
  prometheus:

The first service, jvm, is the Spring Boot application that will be monitored, the same application that was investigated before.

The second service, prometheus, will be used to collect and store metrics. It’s based on the official Docker image prom/prometheus, exposes port 9090 and maps it to local port 2000. Because Prometheus persists metrics data over time, we need to create a volume so the data is kept on local disk between container startups.

But that’s not enough. We need to tell Prometheus where to collect data from and how. Luckily, there is an easy way. We can provide a prometheus.yml configuration file and mount it into a specific folder in the running container.

global:
  scrape_interval: 5s

scrape_configs:
  - job_name: 'jvm_micrometer'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['jvm:8080']

For one application, only one scrape job is defined. It contains information about the address of the scraped application and how often Prometheus needs to call it.

The last service, grafana, is used to visualize the collected data. Like Prometheus it’s based on the official Docker image and exposes port 3000. Besides that, there are a couple of things added for configuration purposes. With environment variables we’re providing the address of the Prometheus service, the data source of the metrics. Next, in the volumes section we’re mapping a couple of local folders into the Grafana container.

The first file is config.ini, with some simple configuration (more about it can be found in the official docs):

[paths]
provisioning = /etc/grafana/provisioning

[server]
enable_gzip = true

Next is a provisioning folder with two subfolders. The first one, datasources, contains a datasources.yml file, which adds the Prometheus data source:

apiVersion: 1

datasources:
  - name: prometheus
    type: prometheus
    access: proxy
    url: ${PROMETHEUS_URL}
    is_default: true
    version: 1
    editable: true

In the second folder, dashboards, there is also one YAML file, dashboards.yml, in which we configure where dashboard definitions will be loaded from:

apiVersion: 1

providers:
  - name: 'default'
    org_id: 1
    folder: ''
    type: 'file'
    options:
      folder: '/var/lib/grafana/dashboards'

On the same level as the provisioning folder, create another one called dashboards. For now it will be empty, but soon we will add a JVM dashboard definition here so it will be persisted.

I’ve put all configuration files in a common infra folder, whose structure is as follows (or you can just check it in the GitHub repository):

.
├───grafana
│   ├───dashboards
│   └───provisioning
│       ├───dashboards
│       └───datasources
└───prometheus

Having all of that, we can now run the entire stack using Docker Compose.

> docker compose up -d

[+] Building 33.6s (14/14) FINISHED
...                                                                                         0.0s 

Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
[+] Running 4/4
 - Network "monitoring-sandbox_default"  Created                                                                                                     0.0s 
 - Container prometheus                  Started                                                                                                     1.2s
 - Container jvm-app                     Started                                                                                                     1.2s 
 - Container grafana                     Started                                                                                                     3.3s

Great, all three containers are now running. Let’s go to Grafana’s UI: in a web browser enter the http://localhost:3000 address and the login page will show up. Then provide the username and password, which is admin in both cases. After a successful login the main page should be visible (you will be asked to change the password, but you can go ahead and skip it).

(screenshot: Grafana home page)

In the meantime Prometheus is already collecting metrics from the jvm-app, but we’re not yet able to visualize them in Grafana. To fix that we need a dashboard. We could create our own, but there is already a great one, prepared by the community, available to be imported.

To add it, from the left panel select Create (plus sign) and then Import. A new page will appear where we need to provide the ID of the dashboard we’re going to import. I want to import a very popular one - JVM (Micrometer) with id 4701.

(screenshot: Grafana - Import dashboard page)

After providing it in the input field and clicking the Load button, a summary page will be visible. We can decide on the name of the imported dashboard, its location, its identifier and, most importantly (don’t forget to select it), the data source from which data will be pulled for the visualizations. In our case only one source should be available - prometheus.

(screenshot: Grafana - import summary with data source selection)

After clicking Import button a dashboard will be created.

(screenshot: imported JVM (Micrometer) dashboard)

The last thing we need to do here is export the newly created dashboard to a JSON file and place it in the empty dashboards folder mentioned a couple of lines before. To do that, click on the share icon (left, near the star icon) so a pop-up window appears. Next, go to the Export tab and click Save to file so the file gets downloaded.

(screenshot: dashboard share/export dialog)

Next, copy the file to the empty dashboards folder, stop the containers (using the docker compose down command) and restart them. After that the dashboard will be persisted.

That would be it! The dashboard is set up; you can check it out and see what indicators it presents, like memory usage, CPU usage and others. There is definitely far less information here than we could read from JDK Mission Control, but what is shown covers some of the most important indicators.

Conclusion #

In this article I’ve shown how to set up two popular monitoring tools that can be used in your production system. They offer lots of in-depth information about what’s happening with your application and the JVM. This can be especially helpful when something mysterious starts to happen. Information about memory allocation, garbage collection, CPU usage, etc. can then help you find the root cause. But how do you make sense of all these metrics? That’s a story for another article, which will be published soon.

Until then, see you soon!


References #