Tuning Java Garbage Collection – SysteMajik Consulting

I recently completed a garbage collection exercise on a variety of applications. In all, twenty WebLogic application clusters were tuned. A dozen of these are large busy application clusters. These provide a mix of Web Applications and Web Services.

Tuning garbage collection is a matter of trade-offs. Large heaps take longer to garbage collect. Small heaps need to be collected more frequently, which uses more CPU time.

The tuning priorities I used were:

Minimize the pause time and frequency of old generation garbage collections.
Minimize garbage collection time.
Avoid excessively large heap sizes.

JVM Parameters

These are the JVM parameters I used. There are others, but these proved sufficient for my requirements.

Basic Settings

-XX:+UseConcMarkSweepGC: Selects the concurrent mark and sweep (CMS) garbage collector. This also selects the parallel new generation collector.
-XX:+CMSScavengeBeforeRemark: The CMS collector considers all objects in the new generation active. Scavenging before remark ensures only live objects are in the new generation at the start of remarking. This provides significant performance improvements if the new generation is large.
-XX:+CMSParallelRemarkEnabled: Allow remarking to be done in parallel. This requires hardware with sufficient threads.
-XX:+CMSClassUnloadingEnabled: Allow the CMS collector to unload unused classes.
-XX:+UseCMSInitiatingOccupancyOnly: This fixes the initiating occupancy to the specified value. Otherwise the configured value is auto tuned after each cycle.
-XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses: Makes explicit GC requests (System.gc()) run as a concurrent collection, and requests that the collection unload unused classes.
-Xss256k: Sets the thread stack size. The default size varies by platform. This needs to be large enough to provide memory for the local variables of the currently running methods. The size will need to be larger if deep recursion is used.

GC Logging Arguments

These arguments are optional, but the contents of the logs can be useful in analyzing garbage collection behavior.

-verbose:gc: Log garbage collection.
-XX:+PrintGCDetails: Print details of the garbage collection.
-XX:+PrintGCDateStamps: Print a date stamp on each entry in the GC log. This is in addition to the JVM uptime value.
-XX:+UseGCLogFileRotation: Rotate the garbage collection log files. The current file’s name will end with .current.
-XX:NumberOfGCLogFiles=4: Specify the number of garbage collection log files to use.
-XX:GCLogFileSize=2M: Size at which to rotate garbage collection log files. Rotations are several days apart on my system with this value. Tune to your requirements.
-Xloggc:/var/log/local/myServer-gc.log: Write the GC log to this file. Choose a name and location appropriate for your server.
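
To confirm that a running server actually picked up the collector flags listed under Basic Settings, the JVM's own input arguments can be inspected. The following is a minimal sketch using the standard java.lang.management API; the class name GcFlagCheck and the choice of which flags to check are illustrative assumptions, not part of the original tuning exercise.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: reports which of the expected CMS flags are absent.
final class GcFlagCheck {
    // Collector flags from the Basic Settings list; adjust to the set you use.
    static final List<String> EXPECTED = Arrays.asList(
        "-XX:+UseConcMarkSweepGC",
        "-XX:+CMSScavengeBeforeRemark",
        "-XX:+CMSParallelRemarkEnabled",
        "-XX:+CMSClassUnloadingEnabled",
        "-XX:+UseCMSInitiatingOccupancyOnly",
        "-XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses");

    // Returns the expected flags that do not appear in the given argument list.
    static List<String> missing(List<String> jvmArgs) {
        List<String> result = new ArrayList<>();
        for (String flag : EXPECTED) {
            if (!jvmArgs.contains(flag)) {
                result.add(flag);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // getInputArguments() returns the options passed to the JVM itself,
        // not the arguments passed to the main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Missing GC flags: " + missing(jvmArgs));
    }
}
```

Run inside the server JVM (or a JVM started with the same options), this prints an empty list when every expected flag is present.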

Tunable values

Once you have set the above values, you are ready to tune the heap sizing. The parameters below are the knobs you will be tuning. The following section discusses my approach to tuning garbage collection.

-XX:CMSInitiatingOccupancyFraction=75: Sets the old generation occupancy percentage at which a CMS collection is triggered. If this value is too high, the old generation may fill up or fragment before the concurrent collection finishes, which triggers a single-threaded full garbage collection. If it is too low, you will be wasting memory.
-XX:ParallelGCThreads=8: Sets the number of threads to use for parallel GC operations. The default is derived from the number of available hardware threads. In my tests, values up to 6 showed significant improvement as threads were added, and GC times didn’t improve with values over 8. These tests were done on a Sparc machine with 32 threads.
-Xmn432m: Sets both minimum and maximum new generation size to the same value.
-XX:SurvivorRatio=16: Sets the size ratio of Eden to each of the two survivor spaces.
-Xms2560m -Xmx4096m: Sets the minimum (-Xms) and maximum (-Xmx) heap size. These are normally set to the same value. The minimum value should be set to the minimum required size. However, if some infrequently used code requires large amounts of memory, it may be prudent to allow the heap to grow. During tuning, allowing the heap to grow mitigates the risk of under-sizing the initial heap.
-XX:PermSize=448m -XX:MaxPermSize=448m: Sets the minimum and maximum permanent generation sizes. These are normally set to the same size. The minimum value must be larger than the expected size. The maximum value may be larger.

Tuning Approach

The servers being tuned are monitored, and sampled values are available for examination in a graphical interface. This allows examining data for a period of a week or more. I left at least a week between tuning cycles. Before tuning, the existing performance was examined and both the -Xms and -XX:MaxPermSize settings were increased to allow for growth if required. For most JVMs, this was not required as excess memory had already been allocated. Tuning was done in several passes:

Eden was tuned so that the young generation GCs occurred at least 15 seconds apart. Eden was shrunk if GCs occurred more than 30 seconds apart under peak loads. As most requests complete in under a second and long-running requests complete in under 10 seconds, most of Eden was expected to be freed by each GC. Sizing was rounded to 2 or 3 times a power of 2 and measured in megabytes (for example, 256 MB or 384 MB).
-XX:SurvivorRatio was adjusted so that maximum survivor space occupancy was between 80% and 100% at peak periods. The required -Xmn value was then calculated from the Eden size and the ratio, and set. For most clusters, the resulting ratios were between 8 and 16.
-Xms was tuned so that 4 to 12 Full GCs occurred per day and at least an hour apart. In most cases, significant portions of the heap were freed.
The Permanent Generation was resized so that maximum occupancy was about 80%.
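
The Eden pass above comes down to simple arithmetic: multiply the allocation rate under peak load by the target interval between young generation GCs, then round to 2 or 3 times a power of 2. The sketch below illustrates this under stated assumptions: the allocation rate of 17 MB/s is a made-up figure, rounding up (not to the nearest value) is my choice, and the class and method names are hypothetical.

```java
// Hypothetical sketch of the Eden sizing arithmetic; all sizes are in megabytes.
final class EdenSizing {
    // Smallest value of the form 2 * 2^k or 3 * 2^k that is >= mb.
    // Candidates are visited in increasing order: 2, 3, 4, 6, 8, 12, 16, 24, ...
    static long roundUpToTwoOrThreeTimesPowerOfTwo(long mb) {
        long power = 1;
        while (true) {
            if (2 * power >= mb) return 2 * power;
            if (3 * power >= mb) return 3 * power;
            power *= 2;
        }
    }

    // Eden size needed so that young GCs occur roughly targetSeconds apart,
    // assuming Eden fills at a steady allocationMbPerSecond.
    static long edenMb(double allocationMbPerSecond, double targetSeconds) {
        long raw = (long) Math.ceil(allocationMbPerSecond * targetSeconds);
        return roundUpToTwoOrThreeTimesPowerOfTwo(raw);
    }

    public static void main(String[] args) {
        // Assumed peak allocation rate of 17 MB/s and a 20 second target interval:
        // 17 * 20 = 340 MB, rounded up to 3 * 128 = 384 MB.
        System.out.println("Eden = " + edenMb(17.0, 20.0) + " MB");
    }
}
```

The allocation rate itself can be estimated from the GC log or from jstat samples by dividing the Eden space freed by each young GC by the time since the previous one.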

Data Gathering Methods

The stacks being tuned were being monitored with Introscope. A garbage collection console was created that displayed data on the available garbage collection metrics. This provided a quick method to monitor garbage collection. It also eased the comparison of tuned performance to untuned performance.

The GC log provides information on long term GC performance. Full GC details should be a small portion of the log, but all the necessary metrics are there. Young generation GCs should be relatively frequent.

Running “jstat -gc” for an hour or so under peak load provides useful details. Sample intervals of 10 to 60 seconds should provide a sufficient number of samples. The most significant values are GC counts, GC times, and survivor space occupancy. The generation sizes are also available. This data can be used to estimate reasonable sizes for the generations.
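
Where running jstat against the server is inconvenient, the GC counts and times can also be sampled from inside the JVM. This is a minimal sketch using the standard GarbageCollectorMXBean interface; it is not what I used for the tuning described here, the class name GcSampler and the sampling interval are assumptions, and unlike jstat it does not report survivor space occupancy.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sampler: prints cumulative GC counts and times per collector.
final class GcSampler {
    // One line per collector: name, cumulative collection count,
    // and cumulative collection time in milliseconds.
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(" count=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        int samples = 6;          // assumed: six samples
        long intervalMs = 10000;  // assumed: 10 seconds apart
        for (int i = 0; i < samples; i++) {
            System.out.print(snapshot());
            Thread.sleep(intervalMs);
        }
    }
}
```

Subtracting consecutive samples gives the number of collections and the GC time spent in each interval.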

Known Issues

There are some known issues with this tuning approach.

Sudden increases in response times can flush objects to the old generation. This can cause frequent Full GCs and decrease performance. Possible resolutions:
- Time out back-end services quickly.
- Set -Xmx to allow the heap to grow if needed.
- Size Eden so that the time between GCs is a small multiple of the timeout.
Infrequent requirements for large in-memory objects. This may be a result of having large or multiple active DOM objects, a large array, etc. Possible resolutions:
- Use SAX instead of DOM. This introduces limitations on how the data can be navigated.
- Replace potentially large arrays with a different data structure.
Fast growth of the permanent generation, with the space recovered by Full GCs. This may be a result of frequently creating temporary class loaders to parse objects using introspection. Possible resolutions:
- If possible, cache the results of the introspection.
Heap utilization after Full GCs continually increasing (Memory Leaks). Possible causes:
- Caching data for excessive periods of time.
- Failure to process and remove items from a list (queue, task list, others).
- Failure to clean up objects when they are no longer useful.
- Failure to close resources after use.
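
The memory leak symptom above, heap utilization after Full GCs that keeps rising, can be checked numerically once the post-GC heap sizes have been extracted from the GC log. The sketch below fits a least-squares line through those values and flags a sustained upward slope. The sample figures and the threshold are invented for illustration, and the class and method names are hypothetical.

```java
// Hypothetical check for a rising trend in heap used after each Full GC.
final class LeakTrend {
    // Least-squares slope of the samples against their index,
    // in megabytes per Full GC. Returns 0 for fewer than two samples.
    static double slope(double[] usedAfterFullGcMb) {
        int n = usedAfterFullGcMb.length;
        if (n < 2) return 0.0;
        double meanX = (n - 1) / 2.0;
        double meanY = 0.0;
        for (double y : usedAfterFullGcMb) meanY += y;
        meanY /= n;
        double num = 0.0;
        double den = 0.0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (usedAfterFullGcMb[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }

    // True when post-GC usage grows faster than the given threshold.
    static boolean looksLikeLeak(double[] usedAfterFullGcMb, double thresholdMbPerGc) {
        return slope(usedAfterFullGcMb) > thresholdMbPerGc;
    }

    public static void main(String[] args) {
        double[] growing = {100, 110, 120, 130}; // invented sample data
        double[] steady = {100, 102, 99, 101};   // invented sample data
        System.out.println(looksLikeLeak(growing, 1.0)); // slope is 10 MB per Full GC
        System.out.println(looksLikeLeak(steady, 1.0));  // slope is 0 MB per Full GC
    }
}
```

A rising trend only says that retained data is growing. A heap dump is still needed to find which of the causes listed above is responsible.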

Formulas

Eden = NewGeneration * SurvivorRatio / (2 + SurvivorRatio)
NewGeneration = Eden * (2 + SurvivorRatio) / SurvivorRatio
Survivor Size = NewGeneration / (2 + SurvivorRatio)
Old Generation = Heap - NewGeneration
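
As a worked example, the formulas can be applied to the sample values used earlier (-Xmn432m, -XX:SurvivorRatio=16, -Xms2560m). This is a sketch only: the class and method names are hypothetical, sizes are whole megabytes, and the sizes the JVM actually chooses may differ slightly because of alignment.

```java
// Hypothetical calculator for the generation sizing formulas; sizes in megabytes.
final class GenerationSizes {
    static long edenMb(long newGenMb, int survivorRatio) {
        return newGenMb * survivorRatio / (2 + survivorRatio);
    }

    static long newGenMb(long edenMb, int survivorRatio) {
        return edenMb * (2 + survivorRatio) / survivorRatio;
    }

    static long survivorMb(long newGenMb, int survivorRatio) {
        return newGenMb / (2 + survivorRatio);
    }

    static long oldGenMb(long heapMb, long newGenMb) {
        return heapMb - newGenMb;
    }

    public static void main(String[] args) {
        long newGen = 432; // -Xmn432m
        int ratio = 16;    // -XX:SurvivorRatio=16
        long heap = 2560;  // -Xms2560m
        System.out.println("Eden     = " + edenMb(newGen, ratio) + " MB");     // 384
        System.out.println("Survivor = " + survivorMb(newGen, ratio) + " MB"); // 24 (each of two)
        System.out.println("Old      = " + oldGenMb(heap, newGen) + " MB");    // 2128
    }
}
```

The results are consistent with each other: Eden plus the two survivor spaces is 384 + 24 + 24 = 432 MB, the new generation size.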