Garbage collector issue running larger simulations via headless NetLogo on a Linux server

Hello everyone,

I’ve written a bark beetle model in NetLogo and am currently running simulations on a Linux server (128 GB of RAM, 24 cores) using BehaviorSpace in headless NetLogo. A single simulation run can consume up to 15-20 GB of RAM, since in extreme cases the model creates up to 20 million individual beetles.

For about a week now, I’ve been trying to find solutions for the following issue, and I’m hoping someone here might be able to help me out. It took me quite a while to figure out that the garbage collector (GC) either wasn’t functioning at all or was only working to a very limited extent without specific edits to the headless configuration. I’ve since added the following lines to the JVM options, which has brought some improvement:
```
-XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:InitiatingHeapOccupancyPercent=40
```
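For context, here is roughly how such options can be passed when launching a headless BehaviorSpace run directly via the JVM. The jar version, model file, and experiment name below are placeholders; `-Xlog:gc*` is the JDK 9+ unified-logging flag for writing a GC log:

```shell
# Sketch of a headless launch with explicit JVM options (paths/names are placeholders)
java -Xmx20g \
  -XX:+UseG1GC \
  -XX:+ExplicitGCInvokesConcurrent \
  -XX:InitiatingHeapOccupancyPercent=40 \
  -Xlog:gc*:file=gc.log:time,uptime \
  -cp netlogo-6.4.0.jar org.nlogo.headless.Main \
  --model beetles.nlogo \
  --experiment my-experiment \
  --threads 5 \
  --table results.csv
```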
I’ve also analyzed the GC logs, and it seems to be working.

However, the problem persists that RAM usage continues to build up across multiple simulations. Only restarting the JVM (or stopping and restarting the experiment execution) releases the memory. Without doing this, I can end up with something like 120 GB of used RAM even though only the last of, say, five parallel simulations is still running; at that point, usage should really be down to around 15-20 GB.

Has anyone encountered this issue? Do you have any ideas for a fix? My current workaround is to split my experiment into lots of smaller experiments and run them via a shell script so that NetLogo restarts and clears the RAM between experiments. But this is only a partial solution, as I’d like to use the server’s rental time as efficiently as possible.
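The split-and-restart workaround, sketched as a shell loop (experiment names and file paths are placeholders):

```shell
#!/bin/sh
# Run several smaller experiments back to back; the JVM exits after each one,
# so all heap memory is returned to the OS before the next part starts.
for exp in beetles_part1 beetles_part2 beetles_part3; do
  ./netlogo-headless.sh \
    --model beetles.nlogo \
    --experiment "$exp" \
    --table "results_${exp}.csv"
done
```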

Any tips or ideas would be greatly appreciated! :blush:
Best regards, Joe

Edit: Is it somehow possible to trigger garbage collection from within the model code, i.e. to force the JVM to run the GC?
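(For context: what this would amount to at the JVM level is an explicit `System.gc()` call, e.g. from a small custom extension primitive. A minimal standalone sketch of that JVM-level behavior, with illustrative class/method names that are not NetLogo API; note that with `-XX:+ExplicitGCInvokesConcurrent` set, as above, the explicit request starts a concurrent G1 cycle instead of a stop-the-world full collection:)

```java
// Standalone sketch: an explicit GC request reclaims unreachable heap objects.
public class GcNudge {
    // Currently used heap in MB
    static long usedMB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Allocate ~200 MB of soon-to-be garbage
        byte[][] junk = new byte[200][];
        for (int i = 0; i < 200; i++) {
            junk[i] = new byte[1_000_000];
        }
        long before = usedMB();
        junk = null;   // drop all references
        System.gc();   // explicit GC request (concurrent cycle under ExplicitGCInvokesConcurrent)
        long after = usedMB();
        System.out.println("used before: " + before + " MB, after: " + after + " MB");
    }
}
```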

Hi- I do big experiments using regular BehaviorSpace (not headless). According to my desktop memory monitor, NetLogo seems to re-use RAM as it starts new model runs but does not free it until you close NetLogo. For example, if you have 100 model runs and 10 processors, memory use might quickly climb up to 50 GB and stay there until all 100 runs are done. So it looks like garbage collection is working (memory use doesn’t keep climbing) but, as you noticed, the memory is not freed up when it’s down to the last model run. That doesn’t seem to be a real problem unless you want to start up another big job as the first is finishing.

I’m pretty sure the NetLogo developers are aware of this.