0

Puppet agent lock file exists - skipping (/var/lib/puppet/state/agent_catalog_run.lock exists)

asked 2015-05-11 06:02:47 -0500

shirish shukla

updated 2015-05-11 12:11:55 -0500

JohnsonEarls

Hello Friends,

We found that the agent_catalog_run.lock file is more than 24 hours old and has not been deleted automatically, and the PID it records still exists. Puppet should be advanced enough to track this and automatically delete the file once it is older than a certain age, or there should be a parameter in puppet.conf to control this situation.

# date
Mon May 11 16:24:45 IST 2015

# cat /var/lib/puppet/state/agent_catalog_run.lock
24145

# ll /var/lib/puppet/state/agent_catalog_run.lock
-rw-r--r-- 1 root root 5 May 10 01:43 /var/lib/puppet/state/agent_catalog_run.lock


# puppet agent -t --no-daemonize --onetime
Notice: Run of Puppet configuration client already in progress; skipping  (/var/lib/puppet/state/agent_catalog_run.lock exists)

# stat /var/lib/puppet/state/agent_catalog_run.lock
  File: `/var/lib/puppet/state/agent_catalog_run.lock'
  Size: 5               Blocks: 8          IO Block: 4096   regular file
Device: fd06h/64774d    Inode: 131155      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-05-11 10:51:43.761514774 +0530
Modify: 2015-05-10 01:43:52.205471543 +0530
Change: 2015-05-10 01:43:52.205471543 +0530


# ll /proc/24145/fd/
total 0
lr-x------ 1 root root 64 May 11 15:36 0 -> /dev/null
l-wx------ 1 root root 64 May 11 15:36 1 -> /dev/null
l-wx------ 1 root root 64 May 11 15:36 2 -> /dev/null
lrwx------ 1 root root 64 May 11 15:36 3 -> socket:[6398356]
lr-x------ 1 root root 64 May 11 15:36 4 -> /etc/group
lrwx------ 1 root root 64 May 11 15:36 5 -> socket:[7500737]
[root@qpass-prod-dbmgmt-101 ~]#

We are running with:

  • Master:

    • puppet-server-3.7.5-1.el6.noarch
    • puppet-3.7.5-1.el6.noarch
    • facter-2.4.3-1.el6.x86_64
    • ruby-1.8.7.374-3.el6_6.x86_64
    • mcollective-2.8.0-1.el6.noarch
  • Agent:

    • mcollective-2.8.0-1.el6.noarch
    • puppet-3.7.5-1.el6.noarch
    • facter-2.4.3-1.el6.x86_64
    • ruby-1.8.7.374-3.el6_6.x86_64

Has anyone faced this issue? Please advise how to overcome it. I have tried the following:

  1. Restarting puppet: no luck.
  2. Manually deleting /var/lib/puppet/state/agent_catalog_run.lock: this works, but it can't be a permanent solution.

Please help!


4 Answers

1

answered 2015-05-11 12:14:39 -0500

JohnsonEarls

updated 2015-05-11 12:16:02 -0500

The fact that /proc/24145 exists means the old puppet process is still running. Deleting the lock file just means now you'll have multiple puppet agents running.

Try running the puppet agent by hand (puppet agent --test --debug) and see if it completes. If it doesn't, that should tell you what module and what resource is causing a problem. If it does complete, then check your puppet agent log on the client and the puppet server log on the server, and see if you can figure out from those what's causing it to hang.
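As a rough illustration of the first point, the check "is the PID in the lock file still alive?" can be scripted. This is a minimal sketch, not part of Puppet: the function name `lock_status` and its output words are made up for the example, and it assumes it runs as root (as the agent does), since `kill -0` reports a live process owned by another user as unreachable.

```shell
# Report who, if anyone, holds a Puppet agent lock file.
# Prints one of: "absent", "invalid", "stale <pid>", "held <pid>".
# lock_status is a hypothetical helper name, not a Puppet command.
lock_status() {
    ls_file=${1:-/var/lib/puppet/state/agent_catalog_run.lock}

    [ -f "$ls_file" ] || { echo "absent"; return 0; }

    # The lock file contains only the PID of the agent that created it.
    ls_pid=$(tr -d '[:space:]' < "$ls_file")
    case $ls_pid in
        ''|*[!0-9]*) echo "invalid"; return 0 ;;
    esac

    # Signal 0 delivers nothing; it only tests whether the PID can be signalled.
    # Run as root, otherwise a live root-owned agent looks dead.
    if kill -0 "$ls_pid" 2>/dev/null; then
        echo "held $ls_pid"
    else
        echo "stale $ls_pid"
    fi
}
```

Calling `lock_status` with no argument checks the default lock path from the question. A "held" result means the lock is doing its job and the process behind it is what needs attention, not the file.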

0

answered 2015-05-12 06:56:49 -0500

shirish shukla

updated 2015-05-12 06:59:37 -0500

Running puppet agent --test --debug fails, saying the lock file exists. After deleting the lock file, it completes successfully without any error.

After analysis, I found that this issue generally occurs when the system is interrupted while the agent is applying the catalog. In my case, the likewise service hit a segmentation fault and all communication broke.

But my question is: why isn't the puppet agent smart enough to delete the lock file once it is older than a certain value (there could be an option for this in puppet.conf)?

Also, a temporary solution would be welcome, as this happens occasionally on all of my nodes.

I found that the file below defines some conditions for handling this situation, but I was unable to make use of them. Any help is appreciated.

/usr/lib/ruby/site_ruby/1.8/puppet/util/pidlock.rb

Thanks!


Comments

Like I mentioned above, if /proc/<pid> exists that means the old agent process is still running. This is why `pidlock.rb` failed to remove the lock. Instead of removing the lock file, kill the old agent process. To fix the underlying issue, find out why the agent is hanging and fix that.

JohnsonEarls (2015-05-12 07:25:06 -0500)
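A possible way to script "kill the old agent process" is sketched below. The helper name `stop_lock_holder` and the 30-second default grace period are assumptions for illustration only. The sketch does not verify that the PID still belongs to a Puppet agent, so it is worth confirming that with `ps -fp <pid>` first. It deliberately leaves the lock file alone, on the assumption that the agent clears a lock whose process is gone on its next run.

```shell
# Stop the process named in a lock file: TERM first, KILL as a last resort.
# Usage: stop_lock_holder [lockfile] [grace_seconds]
# stop_lock_holder is a hypothetical helper name, not a Puppet command.
stop_lock_holder() {
    sh_file=${1:-/var/lib/puppet/state/agent_catalog_run.lock}
    sh_grace=${2:-30}

    [ -f "$sh_file" ] || return 0                # no lock, nothing to stop
    sh_pid=$(tr -d '[:space:]' < "$sh_file")
    case $sh_pid in
        ''|*[!0-9]*) echo "unexpected lock contents" >&2; return 2 ;;
    esac

    kill -0 "$sh_pid" 2>/dev/null || return 0    # holder is already gone

    # Ask politely, then poll once per second until the grace period ends.
    kill -TERM "$sh_pid" 2>/dev/null
    while [ "$sh_grace" -gt 0 ] && kill -0 "$sh_pid" 2>/dev/null; do
        sleep 1
        sh_grace=$((sh_grace - 1))
    done

    # Still there: force it, then report whether it is finally gone.
    if kill -0 "$sh_pid" 2>/dev/null; then
        kill -KILL "$sh_pid" 2>/dev/null
        sleep 1
    fi
    ! kill -0 "$sh_pid" 2>/dev/null
}
```

This only treats the symptom. If the agent hangs again on the next run, the debug output and logs mentioned in the answer above are still the place to look for the cause.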
0

answered 2015-11-23 11:01:42 -0500

When facing the same issue, I added the following line to a startup script:

find /var/lib/puppet/state/ -name agent_catalog_run.lock -type f -mmin +60 -delete

In my case I added this line to /etc/gdm/Init/Default, since these are workstations. For a server, I would suggest running the command via cron.
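For a cron job, the same age test can be wrapped in a small function so the script can decide what to do with an old lock rather than deleting it unconditionally. This is only a sketch: `lock_older_than` is a made-up helper name, and it relies on the same `-mmin` option of `find` used in the command above.

```shell
# Succeeds if the lock file exists and was last modified more than N minutes ago.
# Usage: lock_older_than <minutes> [lockfile]
# lock_older_than is a hypothetical helper name, not a Puppet command.
lock_older_than() {
    lo_minutes=$1
    lo_file=${2:-/var/lib/puppet/state/agent_catalog_run.lock}

    [ -f "$lo_file" ] || return 1

    # find prints the path only when the age test matches.
    [ -n "$(find "$lo_file" -type f -mmin +"$lo_minutes" 2>/dev/null)" ]
}

# Example use from a cron script: warn instead of deleting.
# lock_older_than 60 && echo "Puppet agent lock is older than 60 minutes" >&2
```

Whether to warn, kill the process recorded in the lock, or delete the file is then a separate decision. Deleting the file while that process is still alive leaves the old agent running alongside the new one.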

0

answered 2016-08-12 09:36:43 -0500

jaco

I had the same issue, and deleting the lock file also resolved it for me.


Stats

Asked: 2015-05-11 06:01:18 -0500

Seen: 7,435 times

Last updated: May 12 '15