PuppetMaster process memory usage constantly increases until kernel complains and starts killing off other processes
Over the last few days, my monitoring software has been throwing errors at me in the early hours of the morning. It appears that my box running Puppet has run out of memory, and the only way to recover is to restart the puppetmaster service. Investigating further, I see that the memory consumption of the process steadily increases throughout the day; after about 12 hours, 70-80% of the machine's memory is held by that one process, and I kill it in the morning.
The changes I have made around the time that this started happening are:
- Installed PuppetDB (PostgreSQL backend) on the same node.
- Installed the PuppetDBQuery module.
- Installed the Puppetboard module.
- Added about 20 PuppetDB queries (via the PuppetDBQuery `query_nodes` function) to the "default" node definition.
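For context, each of those lookups looked roughly like this (the class name and variable are placeholders, not my actual resource names):

```puppet
# One of ~20 such lookups in the "default" node definition.
# query_nodes comes from the puppetdbquery module: the first argument is a
# PuppetDB query, the second is the fact to return for each matching node,
# so this yields an array of IP addresses.
$db_servers = query_nodes('Class[profile::database]', 'ipaddress')
```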
I have noticed that the PuppetDB, Puppetboard and PostgreSQL processes do not increase in memory usage at all.
I'm unsure how to track down the cause of this issue; any help would be greatly appreciated.
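In case it helps, here is the kind of thing I've been doing to watch the growth: sample the process's resident set size periodically from cron. This assumes the master runs under the default WEBrick setup with a command line matching `puppet master`; the pgrep pattern and log path would need adjusting for other setups.

```shell
#!/bin/sh
# Append one timestamped RSS sample (in KB) for the puppet master process.
# Run from cron, e.g.:  */5 * * * * /usr/local/bin/puppetmaster-rss.sh
pid=$(pgrep -f 'puppet master' | head -n 1)
if [ -n "$pid" ]; then
    rss=$(ps -o rss= -p "$pid" | tr -d ' ')
    echo "$(date '+%F %T') $rss" >> /var/log/puppetmaster-rss.log
fi
```

Plotting that log is what showed me the steady climb rather than a plateau.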
I have managed to subdue this problem a bit. As I suspected, the cause was the massive number of queries I had set in the default node definition. I was originally performing ~20 PuppetDB queries on each node to obtain arrays of node IP addresses from the DB. These queries all took a lot of resources, so with ~40 nodes each running and holding all 20 of them, the PuppetMaster process quickly ran out of memory on the machine.

To try to solve it, I split the queries up into the separate node definitions, so each node performs only the 1-4 queries it actually needs rather than all 20. As a result, the average memory usage of the puppetmaster process now increases at a much slower rate: it has taken 3 days to climb from 99MB (after a restart) to 1GB at the time of writing. So the issue hasn't been fixed, but it has been slowed down.

I'm starting to wonder if there is some GC that's not happening, as all of the nodes check in every 30 minutes (the default), so I would expect the memory to increase to that level and then stay stable.
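Concretely, the split looked something like this (node and class names here are illustrative, not my real ones):

```puppet
# Before: the "default" node definition ran all ~20 query_nodes lookups
# for every agent. After: each node definition performs only the lookups
# that node actually consumes.
node 'web01.example.com' {
  $db_ips    = query_nodes('Class[profile::database]', 'ipaddress')
  $cache_ips = query_nodes('Class[profile::memcached]', 'ipaddress')
  include profile::web
}
```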
Here is the additional info asked for, just in case:
- Puppet Version: 3.6.2 (Agent and master)
- Ruby Version: 1.9.3
- PuppetDB Version: 3.0.1
- PostgreSQL Version: 9.1.13
- Postgresql module version: puppetlabs/postgresql - 3.4.0
- Just using the default installation of Open Source Puppet; I believe it's using WEBrick for Puppetboard.