This is a question I was asked during an interview: what does “concurrent mode failure” mean? It is a good question for checking your understanding of the Concurrent Mark Sweep (CMS) collector.

This post starts with concurrent mode failure and then walks through every phase of one CMS cycle.

In a nutshell,

The message “concurrent mode failure” signifies that the concurrent collection of the Old Generation did not finish before the Old Generation became full.

The word “concurrent” in Concurrent Mark Sweep means that CMS does most of its mark-sweep work in the Old Generation while the application threads keep running, instead of stopping them for the whole collection. In practice, Minor Garbage Collections of the Young Generation can occur at any time during a concurrent collection of the Old Generation, and each one may promote surviving objects into the Old Generation. That is the root cause of concurrent mode failure: if objects are promoted from the Young Generation faster than CMS can free space in the background, the Old Generation fills up before the cycle completes and you get a “concurrent mode failure”.
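The race described above can be sketched with some simple arithmetic. This is a toy model and not JVM code: the class name, the method, and all the numbers are assumptions made up for illustration, and it pessimistically ignores any space the sweep gives back while the cycle is still running.

```java
// Toy model, not JVM code: names and numbers are illustrative assumptions.
final class ConcurrentModeFailureModel {

    /**
     * Returns true if promotions would fill the Old Generation before the
     * concurrent cycle finishes, assuming a constant promotion rate.
     */
    static boolean wouldFail(long oldCapacityBytes,
                             long occupiedAtCycleStartBytes,
                             long promotedBytesPerMs,
                             long cycleDurationMs) {
        long freeAtStart = oldCapacityBytes - occupiedAtCycleStartBytes;
        long promotedDuringCycle = promotedBytesPerMs * cycleDurationMs;
        // Simplification: everything promoted during the cycle must fit into
        // the space that was free when the cycle started.
        return promotedDuringCycle > freeAtStart;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 1000 MB Old Generation, cycle starts at 700 MB occupied, cycle takes 2 s.
        System.out.println(wouldFail(1000 * mb, 700 * mb, 100 * 1024, 2000)); // false
        System.out.println(wouldFail(1000 * mb, 700 * mb, 200 * 1024, 2000)); // true
    }
}
```

In this model, starting the cycle at a lower occupancy leaves more headroom for promotions, which is the idea behind tuning when CMS kicks in (for example with `-XX:CMSInitiatingOccupancyFraction`).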

Recall one cycle of CMS if you want to understand it thoroughly.

The following description is copied from Concurrent Mark and Sweep.

Phase 1: Initial Mark. This is one of the two stop-the-world events during CMS. The goal of this phase is to mark all the objects in the Old Generation that are either direct GC roots or are referenced from some live object in the Young Generation. The latter is important since the Old Generation is collected separately.

Phase 2: Concurrent Mark. During this phase the Garbage Collector traverses the Old Generation and marks all live objects, starting from the roots found in the previous phase of “Initial Mark”. The “Concurrent Mark” phase, as its name suggests, runs concurrently with your application and does not stop the application threads. Note that not all the live objects in the Old Generation may be marked, since the application is mutating references during the marking.

For example, a reference pointing away from the object currently being visited can be removed by the application while the marking thread is running.
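To make the first two phases concrete, here is a minimal sketch of marking as a graph traversal. It is single-threaded and every name in it (`MarkSketch`, `Obj`, `initialMark`, `concurrentMark`) is hypothetical; the real collector works on raw heap memory and runs the second step alongside the application.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical object-graph model; not how HotSpot represents objects.
final class MarkSketch {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    /** Initial Mark: mark only the starting points (stop-the-world in CMS). */
    static Deque<Obj> initialMark(List<Obj> roots) {
        Deque<Obj> worklist = new ArrayDeque<>();
        for (Obj root : roots) {
            if (!root.marked) {
                root.marked = true;
                worklist.push(root);
            }
        }
        return worklist;
    }

    /** Concurrent Mark: follow references from the objects marked so far. */
    static void concurrentMark(Deque<Obj> worklist) {
        while (!worklist.isEmpty()) {
            Obj current = worklist.pop();
            for (Obj ref : current.refs) {
                if (!ref.marked) {
                    ref.marked = true;
                    worklist.push(ref);
                }
            }
        }
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), garbage = new Obj("garbage");
        a.refs.add(b);
        b.refs.add(c);
        garbage.refs.add(c); // unreachable objects may still point at live ones
        concurrentMark(initialMark(Arrays.asList(a)));
        System.out.println(a.marked + " " + b.marked + " " + c.marked + " " + garbage.marked);
    }
}
```

If the application changed `refs` while `concurrentMark` was running, the traversal could miss objects, which is why the later phases exist.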

Phase 3: Concurrent Preclean. This is again a concurrent phase, running in parallel with the application threads, not stopping them. While the previous phase was running concurrently with the application, some references were changed. Whenever that happens, the JVM marks the area of the heap (called a “card”) that contains the mutated object as “dirty” (this is known as card marking).

In the pre-cleaning phase, the objects on these dirty cards are accounted for, and the objects reachable from them are also marked. The cards are cleaned when this is done.
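Card marking is easier to picture with a small sketch. The class below is a hypothetical stand-in: the 512-byte card size is an assumption for this example, heap addresses are modelled as plain integer offsets, and `recordReferenceWrite` plays the role that a write barrier plays in a real JVM.

```java
import java.util.function.IntConsumer;

// Hypothetical card table; offsets stand in for heap addresses.
final class CardTableSketch {
    static final int CARD_SHIFT = 9; // 2^9 = 512-byte cards (assumed for this sketch)
    private final boolean[] dirty;

    CardTableSketch(int heapBytes) {
        int cardSize = 1 << CARD_SHIFT;
        dirty = new boolean[(heapBytes + cardSize - 1) >>> CARD_SHIFT];
    }

    /** Write-barrier stand-in: call whenever a reference at this offset is updated. */
    void recordReferenceWrite(int heapOffset) {
        dirty[heapOffset >>> CARD_SHIFT] = true;
    }

    /** Preclean stand-in: rescan every dirty card, then clean it. Returns cards processed. */
    int preclean(IntConsumer rescanCard) {
        int processed = 0;
        for (int card = 0; card < dirty.length; card++) {
            if (dirty[card]) {
                rescanCard.accept(card); // re-mark objects reachable from this card
                dirty[card] = false;
                processed++;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(4096);
        table.recordReferenceWrite(10);
        table.recordReferenceWrite(600);
        System.out.println(table.preclean(card -> System.out.println("rescanning card " + card)));
    }
}
```

The point of the table is that precleaning only has to revisit the cards that were written to, not the whole Old Generation.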

Phase 4: Concurrent Abortable Preclean. Again, a concurrent phase that does not stop the application’s threads. This one attempts to take as much work off the shoulders of the stop-the-world Final Remark as possible. The exact duration of this phase depends on a number of factors, since it keeps repeating the same work until one of the abort conditions (such as the number of iterations, the amount of useful work done, elapsed wall clock time, etc.) is met.
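The “iterate until an abort condition is met” idea can be sketched as a loop. This is only a simplified illustration built from the three conditions named above; the class, the enum, and the parameters are all hypothetical, and the real phase has more conditions and different details.

```java
import java.util.function.IntSupplier;

// Simplified illustration of an abortable loop; not the actual CMS logic.
final class AbortablePrecleanSketch {
    enum StopReason { MAX_ITERATIONS, TOO_LITTLE_WORK, TIME_LIMIT }

    /**
     * Repeats precleanPass (which returns the amount of useful work it did)
     * until one of the three abort conditions is met.
     */
    static StopReason run(IntSupplier precleanPass,
                          int maxIterations,
                          int minWorkPerPass,
                          long maxMillis) {
        long deadline = System.nanoTime() + maxMillis * 1_000_000L;
        for (int i = 0; i < maxIterations; i++) {
            if (System.nanoTime() - deadline >= 0) {
                return StopReason.TIME_LIMIT;
            }
            int work = precleanPass.getAsInt();
            if (work < minWorkPerPass) {
                return StopReason.TOO_LITTLE_WORK;
            }
        }
        return StopReason.MAX_ITERATIONS;
    }

    public static void main(String[] args) {
        int[] remaining = {20};
        // Each pass finds half as much work as the one before it.
        StopReason reason = run(() -> remaining[0] /= 2, 100, 3, 5_000);
        System.out.println(reason);
    }
}
```

Whatever the loop does not get to is simply left for Final Remark, so aborting early costs pause time but never correctness.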

Phase 5: Final Remark. This is the second and last stop-the-world phase during the event. The goal of this stop-the-world phase is to finalize marking all live objects in the Old Generation. Since the previous preclean phases were concurrent, they may have been unable to keep up with the application’s mutating speeds. A stop-the-world pause is required to finish the ordeal.

Usually CMS tries to run the Final Remark phase when the Young Generation is as empty as possible, in order to avoid several stop-the-world phases happening back-to-back.

After the five marking phases, all live objects in the Old Generation are marked, and the garbage collector now reclaims all unused objects by sweeping the Old Generation:

Phase 6: Concurrent Sweep. Performed concurrently with the application, without the need for stop-the-world pauses. The purpose of the phase is to remove unused objects and to reclaim the space occupied by them for future use.

Phase 7: Concurrent Reset. Concurrently executed phase, resetting the internal data structures of the CMS algorithm and preparing them for the next cycle.
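A sweep can be sketched as one pass over the Old Generation that hands every unmarked block to a free list. The model below is an assumption for illustration (the `Block` type and the list-based heap are made up); it reflects the fact that CMS frees space in place rather than moving live objects together.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a sweep; the heap is just a list of blocks here.
final class SweepSketch {
    static final class Block {
        final int sizeBytes;
        final boolean marked; // set by the marking phases
        boolean free;
        Block(int sizeBytes, boolean marked) {
            this.sizeBytes = sizeBytes;
            this.marked = marked;
        }
    }

    /** Puts every unmarked, not-yet-free block on the free list; returns bytes reclaimed. */
    static int sweep(List<Block> oldGen, List<Block> freeList) {
        int reclaimed = 0;
        for (Block block : oldGen) {
            if (!block.marked && !block.free) {
                block.free = true;
                freeList.add(block);
                reclaimed += block.sizeBytes;
            }
        }
        return reclaimed;
    }

    public static void main(String[] args) {
        List<Block> oldGen = new ArrayList<>();
        oldGen.add(new Block(64, true));
        oldGen.add(new Block(128, false));
        oldGen.add(new Block(32, false));
        List<Block> freeList = new ArrayList<>();
        System.out.println(sweep(oldGen, freeList) + " bytes reclaimed into "
                + freeList.size() + " free blocks");
    }
}
```

Because live blocks stay where they are, the reclaimed space ends up scattered between them, so the free space is not necessarily contiguous.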

Reference