Puppet agent lock file exists - skipping (/var/lib/puppet/state/agent_catalog_run.lock exists)

asked 2015-05-11 06:02:47 -0600

shirish shukla

updated 2015-05-11 12:11:55 -0600

JohnsonEarls

Hello Friends,

We have found that the agent_catalog_run.lock file is more than 24 hours old and is not being deleted automatically, while the PID recorded in it still exists. Puppet should be advanced enough to track this and automatically delete the file once it is older than a certain age, or it should offer parameters in puppet.conf to control this situation.

# date
Mon May 11 16:24:45 IST 2015

# cat /var/lib/puppet/state/agent_catalog_run.lock
24145

# ll /var/lib/puppet/state/agent_catalog_run.lock
-rw-r--r-- 1 root root 5 May 10 01:43 /var/lib/puppet/state/agent_catalog_run.lock


# puppet agent -t --no-daemonize --onetime
Notice: Run of Puppet configuration client already in progress; skipping  (/var/lib/puppet/state/agent_catalog_run.lock exists)

# stat /var/lib/puppet/state/agent_catalog_run.lock
  File: `/var/lib/puppet/state/agent_catalog_run.lock'
  Size: 5               Blocks: 8          IO Block: 4096   regular file
Device: fd06h/64774d    Inode: 131155      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-05-11 10:51:43.761514774 +0530
Modify: 2015-05-10 01:43:52.205471543 +0530
Change: 2015-05-10 01:43:52.205471543 +0530


# ll /proc/24145/fd/
total 0
lr-x------ 1 root root 64 May 11 15:36 0 -> /dev/null
l-wx------ 1 root root 64 May 11 15:36 1 -> /dev/null
l-wx------ 1 root root 64 May 11 15:36 2 -> /dev/null
lrwx------ 1 root root 64 May 11 15:36 3 -> socket:[6398356]
lr-x------ 1 root root 64 May 11 15:36 4 -> /etc/group
lrwx------ 1 root root 64 May 11 15:36 5 -> socket:[7500737]
[root@qpass-prod-dbmgmt-101 ~]#

We are running with:

  • Master:

    • puppet-server-3.7.5-1.el6.noarch
    • puppet-3.7.5-1.el6.noarch
    • facter-2.4.3-1.el6.x86_64
    • ruby-1.8.7.374-3.el6_6.x86_64
    • mcollective-2.8.0-1.el6.noarch
  • Agent:

    • mcollective-2.8.0-1.el6.noarch
    • puppet-3.7.5-1.el6.noarch
    • facter-2.4.3-1.el6.x86_64
    • ruby-1.8.7.374-3.el6_6.x86_64

Has anyone faced this issue? Please advise how to overcome it. I tried the following:

  1. Restarting puppet: no luck.
  2. Manually deleting /var/lib/puppet/state/agent_catalog_run.lock: this works, but it can't be a permanent solution.

Please help!!

5 Answers


answered 2015-05-11 12:14:39 -0600

JohnsonEarls

updated 2015-05-11 12:16:02 -0600

The fact that /proc/24145 exists means the old puppet process is still running. Deleting the lock file just means now you'll have multiple puppet agents running.

Try running the puppet agent by hand (puppet agent --test --debug) and see if it completes. If it doesn't, that should tell you what module and what resource is causing a problem. If it does complete, then check your puppet agent log on the client and the puppet server log on the server, and see if you can figure out from those what's causing it to hang.
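
A quick way to confirm whether the process named in the lock file is still alive could look like the sketch below. It is not from the original answer: `lock_holder_status` is a made-up helper name, and the default path is simply the one from the question.

```shell
#!/bin/sh
# Sketch: report whether the PID stored in a Puppet agent lock file is alive.
# lock_holder_status is a hypothetical helper, not part of Puppet.

lock_holder_status() {
    lhs_lock=${1:-/var/lib/puppet/state/agent_catalog_run.lock}

    if [ ! -f "$lhs_lock" ]; then
        echo "no-lock"
        return 0
    fi

    # The lock file holds just the PID; keep digits only.
    lhs_pid=$(tr -cd '0-9' < "$lhs_lock")

    if [ -z "$lhs_pid" ]; then
        echo "empty-lock"
    elif kill -0 "$lhs_pid" 2>/dev/null || [ -d "/proc/$lhs_pid" ]; then
        # kill -0 sends no signal, it only tests for existence;
        # the /proc check covers processes we are not allowed to signal.
        echo "running $lhs_pid"
    else
        echo "stale $lhs_pid"
    fi
}
```

Called with no argument on the node from the question, this would be expected to print `running 24145` for as long as the old agent is alive, which is the case where deleting the lock file is the wrong fix.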


answered 2015-05-12 06:56:49 -0600

shirish shukla

updated 2015-05-12 06:59:37 -0600

Running puppet agent --test --debug fails, saying the lock file exists. After deleting the lock file, it runs successfully without any errors.

After some analysis, I found that this issue generally occurs when the system is interrupted while the agent is applying the catalog. In my case, the likewise service hit a segmentation fault and all communication broke.

But my question is: why isn't the puppet agent smart enough to delete the lock file if it is older than a certain age? (There could be an option for this in puppet.conf.)

Also, a temporary workaround would be welcome, as this happens occasionally on all my nodes.

I found that the file below defines some conditions to handle this situation, but I was unable to make use of them. Any help is appreciated.

/usr/lib/ruby/site_ruby/1.8/puppet/util/pidlock.rb

Thanks !

Comments

Like I mentioned above, if /proc/<pid> exists that means the old agent process is still running. This is why `pidlock.rb` failed to remove the lock. Instead of removing the lock file, kill the old agent process. To fix the underlying issue, find out why the agent is hanging and fix that.

JohnsonEarls ( 2015-05-12 07:25:06 -0600 )

answered 2015-11-23 11:01:42 -0600

When facing the same issue, I added the following line to a startup script:

find /var/lib/puppet/state/ -name agent_catalog_run.lock -type f -mmin +60 -delete

In my case I added this line to /etc/gdm/Init/Default, as it is on workstations. In case of a server, I would suggest running the command via cron.
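
Note that an age-only `find ... -delete` also removes the lock while a hung agent is still running, which is the situation described in the earlier answer. A more cautious variant might look like this sketch. `remove_stale_lock` is a hypothetical helper, and the default path and the 60-minute threshold are assumptions carried over from the command above.

```shell
#!/bin/sh
# Sketch: delete a Puppet agent lock file only when it is older than a
# threshold AND the PID it records no longer exists.
# remove_stale_lock is a hypothetical helper, not part of Puppet.

remove_stale_lock() {
    rsl_lock=${1:-/var/lib/puppet/state/agent_catalog_run.lock}
    rsl_age=${2:-60}    # minutes (assumed default)

    [ -f "$rsl_lock" ] || return 0

    # find prints the path only if the file is older than the threshold.
    if [ -z "$(find "$rsl_lock" -type f -mmin +"$rsl_age")" ]; then
        return 0
    fi

    rsl_pid=$(tr -cd '0-9' < "$rsl_lock")
    if [ -n "$rsl_pid" ] && { kill -0 "$rsl_pid" 2>/dev/null || [ -d "/proc/$rsl_pid" ]; }; then
        echo "PID $rsl_pid is still running; leaving $rsl_lock in place" >&2
        return 1
    fi

    rm -f -- "$rsl_lock"
}
```

From cron, the function would be called with no arguments after being sourced or pasted into a small script. A non-zero exit status then signals that an agent is still holding the lock and needs a closer look.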


answered 2017-10-31 10:43:38 -0600

7yl4r

updated 2017-10-31 11:03:11 -0600

As others have pointed out, removing the lockfile may not solve this cleanly. Your problem may be that the puppet process has gotten stuck, and you need to kill the hung process. Here are the steps for puppet 4.10.0 (file locations different from original question).

  1. verify lockfile is old (mine is 11 days old):

    [root@userproc ~]# ls -lh /opt/puppetlabs/puppet/cache/state/agent_catalog_run.lock 
    -rw-r--r--. 1 root root 4 Oct 20 16:32 /opt/puppetlabs/puppet/cache/state/agent_catalog_run.lock
    
  2. verify that the puppet process is still running (mine has been running 11 days):

    cat /opt/puppetlabs/puppet/cache/state/agent_catalog_run.lock 
    6046
    
    # let's inspect the process from the lockfile to make sure it looks like we expect it to:
    [root@userproc ~]# ps -aux | grep 6046
    root      2087  0.0  0.0 112648   960 pts/0    R+   15:37   0:00 grep --color=auto 6046
    root      6046  0.0  0.4 402460 67500 ?        Sl   Oct20   0:03 puppet agent: applying configuration
    

3a. Yep, puppet is hung. Kill the process:

    [root@userproc ~]# kill 6046
    # check again w/ `ps`, if puppet isn't dying gracefully use `kill -9 6046`

Puppet will recover once it sees that the pid in the lockfile is not a valid puppet process, so you don't need to remove the lockfile.

3b. if your puppet process is not stuck, then removing the lockfile would be the proper course of action. However, I believe puppet 4 can recover from a stale lockfile.
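
The kill step in 3a could be scripted roughly as follows. This is a sketch rather than part of the original answer: `terminate_with_grace` is a made-up helper name and the 10-second default grace period is an assumption.

```shell
#!/bin/sh
# Sketch: send SIGTERM to a PID, wait up to a grace period, then fall back
# to SIGKILL. terminate_with_grace is a hypothetical helper.

terminate_with_grace() {
    twg_pid=$1
    twg_grace=${2:-10}    # seconds to wait before SIGKILL (assumed default)

    if ! kill -TERM "$twg_pid" 2>/dev/null; then
        echo "cannot signal $twg_pid (already gone or not permitted)" >&2
        return 1
    fi

    while [ "$twg_grace" -gt 0 ]; do
        # kill -0 only tests whether the process still exists.
        kill -0 "$twg_pid" 2>/dev/null || return 0
        sleep 1
        twg_grace=$((twg_grace - 1))
    done

    # Still there after the grace period: force it.
    kill -KILL "$twg_pid" 2>/dev/null
    return 0
}

# Usage sketch (as root on the agent node, Puppet 4 path from step 1):
#   pid=$(cat /opt/puppetlabs/puppet/cache/state/agent_catalog_run.lock)
#   ps -p "$pid" -o args= | grep -q puppet && terminate_with_grace "$pid" 30
```

The `ps ... | grep -q puppet` guard in the usage comment mirrors step 2: it avoids signalling an unrelated process that happens to have reused the PID.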

Comments

This comment was from 2015, but either way, we had a similar problem: our puppet runs didn’t have the correct proxy configuration, so a manual run would ‘work’ thanks to the user’s environment, but a daemon run would hang, because every call needing the proxy (we had a lot) would hit a timeout of several minutes.

DarylW ( 2017-10-31 23:26:04 -0600 )
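
One way to spot this kind of environment mismatch on Linux is to compare the proxy variables of the running daemon with those of an interactive shell. The sketch below is an illustration only: `proxy_vars_in` is a hypothetical helper and the list of variable names is an assumption.

```shell
#!/bin/sh
# Sketch: list proxy-related variables from a /proc/<pid>/environ style file.
# proxy_vars_in is a hypothetical helper; the variable list is an assumption.

proxy_vars_in() {
    # environ files are NUL-separated; turn them into one variable per line.
    tr '\0' '\n' < "$1" | grep -i -E '^(https?|ftp|no|all)_proxy=' || true
}

# Usage sketch (as root; 6046 is the agent PID from the answer above):
#   proxy_vars_in /proc/6046/environ    # what the daemon sees
#   proxy_vars_in /proc/$$/environ      # what this shell was started with
```

If the first call prints nothing while the second shows `http_proxy` or similar, the daemon is likely running without the proxy settings that make manual runs succeed.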

answered 2016-08-12 09:36:43 -0600

jaco

I had the same issue, and deleting the lock file also resolved it for me.
