Green Tea: Brewing Faster Garbage Collection in Go

10 min read · by Muhammad Fahid Sarker
Tags: Go, garbage collection, Green Tea, memory management, GC performance, Go runtime

Introduction

Garbage collection (GC) is like hiring a cleaning crew for your memory: it keeps things tidy so you don’t have to free every byte yourself. But that convenience can come with surprise performance hiccups—think of your CPU usage skyrocketing every few hours while your production service gasps for breath.

Java developers have told war stories of heap dumps and endless JVM flags. In C or C++, you dodge these nightmares by managing memory manually—but you trade one set of problems for another. Go promised a better balance: safe, automatic GC with reasonable performance. But as core counts climb and RAM latency lags behind, even Go’s GC started feeling the pinch.

Enter Green Tea, an experimental, memory-aware collector that brews up better locality and fewer cache misses. Let’s pour ourselves a cup and see what’s inside.


Why Classic Mark-Scan Hits a Wall

Go’s original GC uses a classic mark-scan:

  1. Walk the entire object graph pointer by pointer.
  2. Mark reachable objects.
  3. Sweep the rest.
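These three steps can be sketched as a simple worklist loop. This is an illustration only, not the runtime's actual code: the `Object` type, its `marked` bit, and its `ptrs` slice are all hypothetical stand-ins for heap metadata.

```go
package main

import "fmt"

// Object is a hypothetical heap object: a mark bit plus its outgoing pointers.
type Object struct {
	marked bool
	ptrs   []*Object
}

// markScan marks every object reachable from the roots (steps 1 and 2).
// Note that each pointer dereference can land anywhere in the heap.
func markScan(roots []*Object) {
	work := append([]*Object(nil), roots...)
	for len(work) > 0 {
		// Pop the next object off the worklist.
		obj := work[len(work)-1]
		work = work[:len(work)-1]
		if obj == nil || obj.marked {
			continue
		}
		obj.marked = true
		// Chase every outgoing pointer.
		work = append(work, obj.ptrs...)
	}
}

func main() {
	// a -> b -> c is reachable; d is garbage.
	c := &Object{}
	b := &Object{ptrs: []*Object{c}}
	a := &Object{ptrs: []*Object{b}}
	d := &Object{}
	markScan([]*Object{a})
	fmt.Println(a.marked, b.marked, c.marked, d.marked) // true true true false
}
```

Step 3 (sweep) then reclaims anything still unmarked, like `d` here.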

Each pointer chase can jump anywhere in the heap—like your cleaning crew teleporting randomly from room to room. On modern multi-core, NUMA hardware, this wreaks havoc:

  • Poor spatial locality: caches don’t get reused.
  • Poor temporal locality: data you’ll need soon isn’t in L1/L2.
  • Memory bandwidth throttles as everyone fights over DRAM.
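You can taste the difference yourself by comparing a streaming loop over a contiguous slice with a pointer-chasing loop over a linked list whose nodes are visited in shuffled order. Both compute the same sum, but the second access pattern defeats the cache and prefetcher. This is a toy sketch, not runtime code; time the two loops with the `testing` package on your own machine to see the gap.

```go
package main

import (
	"fmt"
	"math/rand"
)

// node is a toy linked-list cell used to illustrate pointer chasing.
type node struct {
	val  int
	next *node
}

// buildShuffledList links n nodes in random order, so traversal hops
// around memory instead of streaming through it.
func buildShuffledList(n int) *node {
	nodes := make([]*node, n)
	for i := range nodes {
		nodes[i] = &node{val: i}
	}
	rand.Shuffle(n, func(i, j int) { nodes[i], nodes[j] = nodes[j], nodes[i] })
	for i := 0; i < n-1; i++ {
		nodes[i].next = nodes[i+1]
	}
	return nodes[0]
}

// sumList chases pointers node by node — every hop is a potential cache miss.
func sumList(head *node) int {
	total := 0
	for p := head; p != nil; p = p.next {
		total += p.val
	}
	return total
}

// sumSlice streams through contiguous memory — exactly what prefetchers like.
func sumSlice(vals []int) int {
	total := 0
	for _, v := range vals {
		total += v
	}
	return total
}

func main() {
	const n = 1 << 16
	vals := make([]int, n)
	for i := range vals {
		vals[i] = i
	}
	// Same answer, very different memory behavior.
	fmt.Println(sumSlice(vals) == sumList(buildShuffledList(n))) // true
}
```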

In Go’s internal profiling:

  • ~85% of GC time is in the scan loop.
  • 35%+ of CPU cycles just stall, waiting on memory.

Ouch.


Green Tea: Span-Based Scanning

The idea behind Green Tea is surprisingly straightforward:

  1. Partition memory into spans—8 KB aligned blocks holding fixed-size small objects.
  2. When you see a pointer to an object, mark the entire span for later scanning.
  3. Scan one span at a time, processing every object inside.

Why it helps:

  • You walk a contiguous 8 KB block—much friendlier to caches.
  • Chances are high that multiple live objects sit in the same span.
  • You reduce pointer-chasing randomness and spread memory reads over sequential addresses.
```go
// Pseudo-workflow inside the Green Tea mark queue.
for span := range markQueue {
	for _, obj := range span.objects {
		if obj.marked {
			continue
		}
		obj.marked = true
		for _, field := range obj.fields {
			if isPointer(field) {
				markQueue.enqueue(spanOf(field))
			}
		}
	}
}
```
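A nice side effect of the 8 KB alignment is that mapping a pointer back to its span (the `spanOf` step in the pseudocode) can be as cheap as a single bit mask. This is a sketch of the arithmetic only; the real runtime resolves spans through its heap metadata, and `spanOf` here is a hypothetical helper.

```go
package main

import "fmt"

// spanSize matches the 8 KB span granularity described above.
const spanSize = 8 << 10 // 8192 bytes

// spanOf maps an object address to the base of its 8 KB-aligned span by
// clearing the low bits — no table lookup needed for this toy model.
func spanOf(addr uintptr) uintptr {
	return addr &^ (spanSize - 1)
}

func main() {
	// 0x2100 sits inside the span that starts at 0x2000 (8192).
	fmt.Printf("%#x\n", spanOf(0x2100)) // 0x2000
}
```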

Tasting the Results

Benchmarks on high-core machines show:

  • 35% drop in total GC overhead
  • 50% fewer CPU cache misses
  • Lower tail latency spikes
  • Better overall throughput

It’s like switching from instant coffee to a carefully brewed green tea—lighter, cleaner, less jittery.


Caveats & Trade-Offs

Green Tea isn’t a magic bullet:

  • Focuses on small objects only (large objects still use classic GC).
  • Workloads with poor locality or rapidly mutating, low-fanout trees can see regressions—sometimes because your app, not the GC, becomes the bottleneck!

Keep an eye on your heap profile and access patterns before pouring a second cup.
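If you want to taste-test it yourself, recent toolchains gate Green Tea behind a `GOEXPERIMENT` flag (`greenteagc` as of Go 1.25 — check your release notes, since experiment names can change). The package path below is a hypothetical placeholder for your own service.

```shell
# Build and run with the experimental collector enabled.
GOEXPERIMENT=greenteagc go build ./...

# Compare GC behavior with and without the experiment using gctrace.
GODEBUG=gctrace=1 GOEXPERIMENT=greenteagc go run ./cmd/server
```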


Conclusion

Garbage collection may never be perfectly predictable, but smarter algorithms like Green Tea show we can keep pushing performance forward. If you’re building high-throughput Go services, this is one GC experiment you’ll want to watch.

Have questions or want more Go deep dives? Drop a comment, like, subscribe—and until next time, happy coding (and happy sipping)! 🍵