Best practice for JVM Tuning with CMS GC
The purpose of this article is to provide best practice advice on JVM tuning with CMS GC; including key symptoms, understanding heap space and GC logs. This best practice advice applies to AM, DS, IDM and IG running on Java® 8. JVM tuning is not an exact science and will vary across individual environments and applications. Consistent load and stress testing of your environment will be the only true way to gauge the effect of changes made.
6 readers recommend this article
JVM tuning considerations
Before you begin JVM tuning, you should consider the following points:
- Costs - depending on your environment, it may be more cost effective to add more hardware to improve performance rather than spending the time to tune.
- Desired outcome - tuning for stability is more important in the long run than tuning for performance, although there is an overlap.
- Ongoing problems - tuning may only delay or mask a problem temporarily if there is an underlying issue you need to resolve.
- Memory leaks - these will always cause Garbage Collection (GC) issues regardless of tuning.
The focus of this article is Java 8. In Java 8, the CMS is the default garbage collector and tends to give the best performance characteristics with the lowest garbage collection pause times. Consider using the G1 garbage collector only if CMS performance characteristics do not fit your deployment, and testing shows G1 performs better. In Java 11, the G1 garbage collector is the default (and the CMS garbage collector is deprecated). See Best practice for JVM Tuning with G1 GC for further information.
What is Java heap and how do I know it needs tuning?
The Java heap is the amount of memory allocated to applications running in the JVM. Objects in heap memory can be shared between threads. Garbage collection (GC) refers to the automatic process of managing the runtime memory. The JVM is subject to GC pauses that vary in frequency and duration; if these pauses become more frequent and/or last longer, you will see issues (such as application pauses) that indicate that you may need to tune your Java heap.
It is recommended that you set the minimum and maximum heap sizes to the same value for best performance. Otherwise, the JVM runs a full Garbage Collector (GC) cycle when increasing its heap, during which time it can pause ongoing operations for up to a few seconds. Generally, a smaller heap will increase the frequency at which the JVM GC executes but reduce the duration; similarly, a larger heap will reduce the frequency and increase the duration. When tuning the JVM heap, the goal is to strike a balance between frequency and duration so as to reduce the impact of the GC on the application.
The heap is initially created when the JVM starts and is split into different spaces or generations, the key ones being Young (Eden or Nursery) and Tenured (Old):
- The Young generation is used for new objects. A GC process (ParNew) automatically runs when it becomes full which removes unused objects and moves all objects that have lived long enough to the Tenured generation, thereby freeing up space in the Young generation for more new objects. Typically objects in the Young generation are short lived and temporary. The Young generation is small but active and GC happens frequently but with limited impact on performance.
- The Tenured generation is used for the longer lived objects. Another GC process (CMS) runs when it becomes full to remove any unused objects. The Tenured generation is larger and less active than the Young generation but GC tends to have more impact on performance.
Key symptoms that indicate you need to tune your Java heap are:
- High CPU usage. When GC threads consume a lot of CPU, you will see spikes in overall CPU usage.
- Application hangs. When your application appears to hang and stops responding, there are gaps in your log files or you experience general slowness when performing routine tasks, poorly tuned GC can be the cause.
The first thing to check if you experience these symptoms is your GC logs; these are easy to check and may help you identify your issue quickly.
There are various different GCs available; ForgeRock recommends the use of CMS (ConcurrentMMarkSweep), which is a throughput collector and tends to give the best performance as it runs concurrently with the application, meaning less pauses. You should specify this collector by setting the following JVM option:-XX:+UseConcMarkSweepGC
You can enable GC logging as described in the following articles:
- How do I enable Garbage Collector (GC) Logging for AM?
- How do I enable Garbage Collector (GC) Logging for DS?
- How do I enable Garbage Collector (GC) Logging for IDM?
- How do I enable Garbage Collector (GC) Logging for IG?
Typically you can leave GC logging on without impacting your system, however, you can just enable it when you are experiencing issues.
Understanding GC Logs
These example log files come from a CMS collector, where ParNew refers to the Young generation and CMS refers to the Tenured generation. These log file examples also show typical GC logs for a system where tuning is appropriate.
An example GC log output for the ParNew section looks like this:24.245: [GC24.245: [ParNew: 545344K->39439K(613440K), 0.0424360 secs] 594671K->88767K(4101632K), 0.0426850 secs] [Times: user=0.17 sys=0.00, real=0.04 secs] 29.747: [GC29.747: [ParNew: 584783K->27488K(613440K), 0.0889130 secs] 634111K->104723K(4101632K), 0.0892180 secs] [Times: user=0.24 sys=0.01, real=0.09 secs]
Each line in this log file represents one GC and shows the following types of information for ParNew:
- Timestamp information - if you did not set the PrintGCTimeStamps and PrintGCDateStamps options, this just shows as the number of seconds from JVM startup: 24.245: [GC24.245: 29.747: [GC29.747: This information is useful as it shows the frequency of GCs. You should be aiming for GCs to occur once every 1 to 5 seconds; if they are occurring more than once a second, you need to tune your JVM and it is likely that you will be seeing high CPU usage.
- Young generation information: [ParNew: 545344K->39439K(613440K), 0.0424360 secs] [ParNew: 584783K->27488K(613440K), 0.0889130 secs] This information shows the initial size of the Young space before doing the GC, the size after doing the GC, the total size available and the time it took to do the GC. If the time taken to do the GC is large, that is more than 0.1 seconds, then it is possible that your Young generation heap size is too big. The total size available will not grow if you set NewSize and MaxNewSize to the same value and this is what you want to see in your logs.
- Overall heap information: 594671K->88767K(4101632K), 0.0426850 secs] 634111K->104723K(4101632K), 0.0892180 secs] This information shows initial size of the overall heap before doing the GC, the size after doing the GC and the total size available. With the second lot of figures (size after doing the GC), you can see how this incrementally grows up to the point when the GC is done and then reduces back down to a baseline figure. If you have a memory leak, you will see the baseline figure growing incrementally as it becomes increasingly full even after doing GC.
- System information: [Times: user=0.17 sys=0.00, real=0.04 secs] [Times: user=0.24 sys=0.01, real=0.09 secs] This information shows how much time was spent in user space, how much time spent in kernel or system space and what was the real impact to the user. If the user time is high, it indicates there isn't enough CPU for the number of user threads. If the system time is high, it indicates your system is swapping memory to disk which means that there isn't enough physical memory for your heap size.
The CMS section of the log file is different to the ParNew section as it shows multiple lines for the same GC. An example GC log output for the CMS section looks like this:40.146: [GC [1 CMS-initial-mark: 26386K(786432K)] 26404K(1048384K), 0.0074495 secs] 40.154: [CMS-concurrent-mark-start] 40.683: [CMS-concurrent-mark: 0.521/0.529 secs] 40.683: [CMS-concurrent-preclean-start] 40.701: [CMS-concurrent-preclean: 0.017/0.018 secs] 40.704: [GC40.704: [Rescan (parallel) , 0.1790103 secs]40.883: [weak refs processing, 0.0100966 secs] [1 CMS-remark: 26386K(786432K)] 52644K(1048384K), 0.1897792 secs] 40.894: [CMS-concurrent-sweep-start] 41.020: [CMS-concurrent-sweep: 0.126/0.126 secs] 41.020: [CMS-concurrent-reset-start] 41.147: [CMS-concurrent-reset: 0.127/0.127 secs]
This log shows the different phases the GC goes through. All the ones marked concurrent are happening while your application is running and therefore have little impact. The ones that don't say concurrent are stopping your application from running temporarily and therefore can have an impact. The CMS-initial-mark and CMS-remark show heap sizing details.
Typically, you will see these sections intermingled, for example, some ParNews then a CMS, some more ParNews, another CMS and so on as the GC processes automatically occur to manage the heap.
Common issues shown in the GC logs and solutions
This section identifies common issues you might find in your GC logs and how you can resolve them. You can also use the Universal GC Log Analyzer to analyze your GC log. This is a third-party website that we suggest can be used for analysis but is not supported by ForgeRock.
Young generation - too small
The following log snippet shows an issue where the Young generation heap size is too small:1.813: [GC1.813: [ParNew: 1152K>128K(1152K), 0.0008100 secs] 16620K->15756K(26936K), 0.0008350 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 1.816: [GC1.816: [ParNew: 1152K>128K(1152K), 0.0006430 secs] 16780K->15913K(26936K), 0.0006640 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 1.819: [GC1.819: [ParNew: 1152K>128K(1152K), 0.0005370 secs] 16937K->16038K(26936K), 0.0005570 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
As you can see, there are multiple GCs occurring within a second. This means you need to increase the size of the Young generation (NewSize and MaxNewSize) and may also need to increase total heap size to compensate for this change (Xms and Xmx). Alternatively you could change the NewRatio option (this is set to 2 by default, which means the Tenured generation heap is twice as big as the Young generation heap or that the Young generation heap is a 1/3 of the size of the Tenured generation heap).
Young generation - too large / Tenured generation - too small
The following log snippet shows an issue where the Young generation heap size is too large and consequently the Tenured generation heap size is too small:276.716: [GC (CMS Initial Mark) [1 CMS-initial-mark: 104176K(135168K)] 2758985K(6741248K), 0.8762860 secs] [Times: user=0.88 sys=0.00, real=0.88 secs] 277.592: [CMS-concurrent-mark-start] 277.657: [CMS-concurrent-mark: 0.063/0.065 secs] [Times: user=0.12 sys=0.00, real=0.06 secs] 277.657: [CMS-concurrent-preclean-start] 277.658: [CMS-concurrent-preclean: 0.001/0.001 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 277.658: [CMS-concurrent-abortable-preclean-start] 277.658: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 277.658: [GC (CMS Final Remark)[YG occupancy: 2657419 K (6606080 K)]277.658: [Rescan (parallel) , 0.9815460 secs]278.639: [weak refs processing, 0.0000320 secs]278.640: [scrub string table, 0.0011700 secs] [1 CMS-remark: 104176K(135168K)] 2761595K(6741248K), 0.9828250 secs] [Times: user=7.18 sys=0.09, real=0.99 secs] 278.641: [CMS-concurrent-sweep-start] 278.668: [CMS-concurrent-sweep: 0.026/0.027 secs] [Times: user=0.03 sys=0.00, real=0.02 secs] 278.668: [CMS-concurrent-reset-start] 278.668: [CMS-concurrent-reset: 0.001/0.001 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 280.673: [GC (CMS Initial Mark) [1 CMS-initial-mark: 104079K(135168K)] 2774091K(6741248K), 0.9033730 secs] [Times: user=0.90 sys=0.00, real=0.90 secs] 281.577: [CMS-concurrent-mark-start] 281.640: [CMS-concurrent-mark: 0.063/0.063 secs] [Times: user=0.13 sys=0.00, real=0.07 secs] 281.640: [CMS-concurrent-preclean-start] 281.641: [CMS-concurrent-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 281.641: [CMS-concurrent-abortable-preclean-start] 281.641: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 281.641: [GC (CMS Final Remark)[YG occupancy: 2670011 K (6606080 K)]281.641: [Rescan (parallel) , 0.9914290 secs]282.633: [weak refs processing, 0.0000110 secs]282.633: [scrub string table, 0.0008100 secs] [1 CMS-remark: 104079K(135168K)] 2774091K(6741248K), 0.9923100 secs] [Times: user=7.14 sys=0.11, real=0.99 secs] 282.634: [CMS-concurrent-sweep-start] 282.659: [CMS-concurrent-sweep: 0.024/0.025 secs] [Times: user=0.06 sys=0.01, real=0.02 secs] 282.659: [CMS-concurrent-reset-start] 282.659: [CMS-concurrent-reset: 0.001/0.001 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
As you can see, the CMS collections are occurring too frequently, without any ParNew collections in between. This is a result of the Young generation heap size being too large so that each time a ParNew collection happens, the Tenured heap size instantly becomes full and a CMS collection runs. This could happen if you increased your Young generation heap size without increasing your total heap size. This log snippet could also happen if you have memory leaks.
The following log snippet shows a CMS failure (concurrent mode failure):(concurrent mode failure): 1551948K->1548050K(1572864K), 15.1154540 secs] 1986510K->1548050K(2044736K), [CMS Perm : 78597K->78588K(524288K)], 15.4406700 secs] [Times: user=15.52 sys=0.00, real=15.44 secs] 703240.983: [GC [1 CMS-initial-mark: 1548050K(1572864K)] 1561545K(2044736K), 0.0059690 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
This error means that the CMS collector cannot keep up with the amount of garbage being generated (the Tenured generation heap space is about 90% full) and GCs are taking longer meaning your application stops for longer periods of time. This example shows the application stopping for over 15 seconds, which will have a significant impact on performance.
You can add the following GC options to try to avoid the CMS failure:
You should set -XX:CMSInitiatingOccupancyFraction to 70 or 80% (instead of default ~90%) which may enable the CMS collector to keep up with the amount of garbage being generated.
The following log snippet shows a Full GC:25.466: [Full GC )25.466: [CMS: 68756K>75361K(1987392K), 0.2548820 secs] 96571K->75361K(2064064K), [CMS Perm : 61069K>60722K(61376K)], 0.2551850 secs] [Times: user=0.25 sys=0.01, real=0.25 secs] 26.185: [Full GC (System.gc())26.185: [CMS: 75361K>73283K(1987392K), 0.2398450 secs] 76136K->73283K(2064064K), [CMS Perm : 60723K>60723K(101204K)], 0.2400120 secs] [Times: user=0.24 sys=0.00, real=0.24 secs] 27.293: [Full GC (System.gc())27.293: [CMS: 73283K>73291K(1987392K), 0.2111960 secs] 74096K>73291K(2064064K), [CMS Perm : 60723K>60723K(101396K)], 0.2112730 secs] [Times: user=0.21 sys=0.00, real=0.22 secs] 68.081: [Full GC (Heap Inspection Initiated GC)68.081: [CMS: 73291K->73187K(1987392K), 0.2172670 secs] 77634K>73187K(2064064K), [CMS Perm : 60742K>60742K(101396K)], 0.2174520 secs] [Times: user=0.22 sys=0.00, real=0.21 secs]
These log snippets refer to Full GCs occurring, meaning that the entire heap is being cleared (Young and Tenured); this often happens due to system processes such as RMI which performs a Full GC every hour on the hour. The PrintGCCause option is very useful to help you find out why the GC is happening (the last three lines in the log above show the additional information output by this option, making it clear which ones are system related). You can also set the DisableExplicitGC option to prevent system related Full GCs from happening.
Best practice advice on tuning
Initially, you should configure your JVM as per the recommended settings for your product (if available):
You can then tune your JVM as and when required.
You should always follow this basic process when tuning:
- Establish a baseline using a valid load testing process. This step is essential as it is the only way you will know if your tuning is having the desired effect.
- Change one thing at a time and document each change. This allows you to determine the impact of each change and revert if necessary.
Heap tuning guidelines
Although we cannot give recommended heap settings because they vary considerably between environments and applications, you should follow these guidelines when tuning:
- You should always use the latest supported software versions.
- You should use the CMS GC as recommended by ForgeRock.
- Your total heap size must not exceed the amount of physical RAM available.
- You must leave some physical RAM for other applications to run on your machine.
- You should set Xms and Xmx to the same value for best performance. These options determine your total heap size.
- You should set NewSize and MaxNewSize to the same value. These options determine how the heap is allocated between generations.
- You should set the DisableExplicitGC option to prevent system related full GCs from running.
- You must not add lots of options at once or add options you don't understand.
- You must not copy and paste options from other applications.
- You should use GC logs to help identify issues with your tuning.
- For AM, refer to FAQ: AM performance and tuning (Q. What is the recommended JVM heap size for AM?) for further advice.
- For DS, refer to FAQ: DS performance and tuning (Q. What is the recommended JVM heap size for DS?) for further advice.
Getting your total heap size and allocation between generations right is key; once you have these set correctly, you likely won't need to do any more tuning.
In summary, the most important GC knobs to get right are:
- The collector, which should be CMS.
- The total heap size, where Xmx = Xms.
- The split between generations, using NewSize or NewRatio, where NewSize = MaxNewSize.
Java Platform, Standard Edition HotSpot Virtual Machine Garbage Collection Tuning Guide
- How do I change the JVM heap size for AM?
- FAQ: AM performance and tuning
- How do I collect JVM data for troubleshooting AM?
- How do I diagnose a hung AM (All versions) server?
- How do I collect data for troubleshooting high CPU utilization on AM (All versions) servers?
- How do I find which thread is consuming CPU in a Java process in AM?
- How do I tune DS process sizes: JVM heap and database cache?
- FAQ: DS performance and tuning
- How do I collect JVM data for troubleshooting DS?
- How do I collect data for troubleshooting high CPU utilization on DS (All versions) servers?
- How do I find which thread is consuming CPU in a Java process in DS?
- How do I change the JVM heap size for IDM?
- FAQ: IDM performance and tuning
- How do I collect JVM data for troubleshooting IDM?
- How do I collect data for troubleshooting high CPU utilization on IDM (All versions) servers?
- How do I find which thread is consuming CPU in a Java process in IDM?
- How do I change the JVM heap size for IG?
- How do I collect JVM data for troubleshooting IG 6.x and 7.x?
- How do I collect data for troubleshooting high CPU utilization or Out of Memory errors on IG (All versions) servers?
- How do I find which thread is consuming CPU in a Java process in IG 6.x and 7.x?