It was perhaps too deterministic. What's not mentioned in the blog is that after running for long enough, the cluster would line up its GCs, and every node would do the 2-minute GC at exactly the same time, causing bigger spikes as the entire cluster degraded. I'm guessing all it takes is a few day/night cycles combined with a spike in traffic to make all the nodes reset their forced GC timers to the same time.
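
To make that guess concrete, here is a toy simulation, not the real runtime. It assumes each node forces a GC 120 seconds after its last one, and that a cluster-wide traffic spike triggers a GC on every node, which also resets that node's timer. The function name, the starting offsets, and the spike time are all made up for illustration.

```python
FORCED_GC_INTERVAL = 120  # seconds; the 2-minute forced GC from the comment


def simulate(start_offsets, spike_at, horizon, interval=FORCED_GC_INTERVAL):
    """Return, per node, the list of times (in seconds) at which a GC runs.

    Assumption: any GC, whether forced by the timer or triggered by the
    traffic spike, resets that node's forced-GC timer.
    """
    last_gc = list(start_offsets)
    history = [[] for _ in start_offsets]
    for t in range(horizon + 1):
        for node in range(len(last_gc)):
            spike = t == spike_at
            timer_expired = t - last_gc[node] >= interval
            if spike or timer_expired:
                history[node].append(t)
                last_gc[node] = t
    return history


# Three nodes that start out of phase, then one shared spike at t=300.
history = simulate(start_offsets=[0, 37, 81], spike_at=300, horizon=600)
for node, times in enumerate(history):
    print(f"node {node}: {times}")
```

Before the spike the three nodes collect at different times. After it, every timer counts from the same instant, so all later forced GCs land together. In this model nothing ever pulls them apart again, which is why adding random jitter to the interval is the usual way to break this kind of lockstep.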