"comparison of String with :undef failed" - puppetmaster failing between "submit facts" and "submit catalog"
We're managing ~150 AWS instances in Puppet, and recently about four of those have started exhibiting some strange behavior that prevents runs. Here's the end of output from a 'puppet agent -t' on one affected server:
Info: Loading facts in /var/lib/puppet/lib/facter/staging_windir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/mlmongodb.rb
Info: Loading facts in /var/lib/puppet/lib/facter/pe_version.rb
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: comparison of String with :undef failed
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
And from the puppetmaster:
Info: 'replace facts' command for glusterclient0004.mc2.mcloud.local submitted to PuppetDB with UUID 0b4df162-b53a-433b-a11545afb8a7da85
Debug: Using cached facts for glusterclient0004.mc2.mcloud.local
Info: Caching node for glusterclient0004.mc2.mcloud.local
[...]
Notice: Compiled catalog for glusterclient0004.mc2.mcloud.local in environment production in 0.50 seconds
Info: Caching catalog for glusterclient0004.mc2.mcloud.local
Error: comparison of String with :undef failed
The PuppetDB log mirrors this: it records the "replace facts" command for each run, but never a "replace catalog":
puppetdb.log:2015-01-07 15:20:00,090 INFO [c.p.p.command] [2bd42b63-6320-4059-b0ff-2aff148cf901] [replace facts] glusterclient0004.mc2.mcloud.local
puppetdb.log:2015-01-07 15:50:01,060 INFO [c.p.p.command] [7b213e19-3c9c-4903-b8c6-da1b2b6c926c] [replace facts] glusterclient0004.mc2.mcloud.local
puppetdb.log:2015-01-07 16:20:10,856 INFO [c.p.p.command] [b430280c-6ab2-4f57-9646-f231097a0309] [replace facts] glusterclient0004.mc2.mcloud.local
(Our runs are every 30 minutes.)
The way this behavior shows up is extremely strange: it appears suddenly on an affected node and, once it has appeared, seems to persist indefinitely. I've been totally unable to find a workaround. So far it has only shown up on a few node types, but it doesn't appear consistently on nodes of those types. For example, we also have a glusterclient0007, created identically to glusterclient0004 and with identical facts (except for IP, MAC address, and the like); it's not affected by this issue.
Puppet 3.6.2 (agent and master both), PuppetDB 2.2.0, Facter 2.3.0, Amazon Linux (i.e. Red Hat-based).
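For reference, the wording of the error matches the ArgumentError that plain Ruby raises when a String is ordered against the symbol :undef, for example during a sort. The sketch below only reproduces the message; the sample values are made up, and whether a sort is what actually triggers the failure on the master is an assumption, not something the logs above confirm.

```ruby
# Reproduce the error message in plain Ruby, outside of Puppet.
# Assumption: somewhere on the master a String ends up being compared
# with the symbol :undef. The values here are illustrative only.
values = ['nagios_disabled', :undef]

message = nil
begin
  # String#<=> returns nil for a Symbol argument, so sort cannot order the pair.
  values.sort
rescue ArgumentError => e
  message = e.message
end

puts message
```

If that is what is happening, the thing to look for would be a value that is undefined on the affected nodes but set on the healthy ones.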
EDIT Jan 9th:
Okay, here's one more interesting bit of information. We have a custom fact definition that runs ec2-describe-tags and parses the output to create arbitrary facts based on the AWS tags attached to the instance; each fact is named "aws_tag_[tagname]" and its value is the value of the tag. (We use it as a sort of ENC.) If one particular, arbitrary tag - in this case, 'monitoring:nagios_disabled' - is removed, the error (and the fact) disappear. However, when that tag IS applied, facter has no problem parsing it, even though runs fail:
[root@rabbitmq0003 ~]# facter -p aws_tag_monitoring
nagios_disabled
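A minimal sketch of a tag-driven fact of this kind follows; it is not the actual fact definition. It assumes ec2-describe-tags prints tab-separated lines of the form TAG, resource type, resource ID, key, value. The helper name aws_tag_facts and the name-sanitizing rule are hypothetical, and the filtering that would limit the output to the current instance is left out.

```ruby
# Hypothetical helper: turn ec2-describe-tags output into a hash of
# fact name => fact value.
# Assumed input format, one tag per line, tab-separated:
#   TAG <resource-type> <resource-id> <key> <value>
def aws_tag_facts(output)
  facts = {}
  output.each_line do |line|
    type, _resource_type, _resource_id, key, value = line.chomp.split("\t", 5)
    next unless type == 'TAG' && key && !key.empty?

    # Hypothetical naming rule: lowercase the key and replace anything
    # that is not a letter or digit with an underscore.
    name = 'aws_tag_' + key.downcase.gsub(/[^a-z0-9]+/, '_')

    # A tag with no value becomes an empty string rather than nil.
    facts[name] = value.to_s
  end
  facts
end

# Register the facts only when this file is loaded by Facter.
# Restricting the output to this instance's own tags is omitted here.
if defined?(Facter)
  aws_tag_facts(`ec2-describe-tags 2>/dev/null`).each do |name, value|
    Facter.add(name) { setcode { value } }
  end
end
```

Converting a missing value to an empty string is a deliberate choice in this sketch: it keeps every fact value a String, so nothing undefined leaves the fact code.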
This tag is being used, with the same fact definition, on a dozen or so other servers, where it's working perfectly normally. And I can't prove it immediately, but I think we've had this problem appear on servers where ...