Ask Your Question
0

"comparison of String with :undef failed" - puppetmaster failing between "submit facts" and "submit catalog"

asked 2015-01-07 16:19:42 -0500

burrito gravatar image

updated 2015-01-13 15:21:58 -0500

We're managing ~150 AWS instances in Puppet, and recently about four of those have started exhibiting some strange behavior that prevents runs. Here's the end of output from a 'puppet agent -t' on one affected server:

Info: Loading facts in /var/lib/puppet/lib/facter/staging_windir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/mlmongodb.rb
Info: Loading facts in /var/lib/puppet/lib/facter/pe_version.rb
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: comparison of String with :undef failed
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

And from the puppetmaster:

Info: 'replace facts' command for glusterclient0004.mc2.mcloud.local submitted to PuppetDB with UUID 0b4df162-b53a-433b-a11545afb8a7da85
Debug: Using cached facts for glusterclient0004.mc2.mcloud.local
Info: Caching node for glusterclient0004.mc2.mcloud.local
[...]
Notice: Compiled catalog for glusterclient0004.mc2.mcloud.local in environment production in 0.50 seconds
Info: Caching catalog for glusterclient0004.mc2.mcloud.local
Error: comparison of String with :undef failed

The puppetmaster mirrors the "replace facts" and the absence of a "replace catalog" to PuppetDB:

puppetdb.log:2015-01-07 15:20:00,090 INFO  [c.p.p.command] [2bd42b63-6320-4059-b0ff-2aff148cf901] [replace facts] glusterclient0004.mc2.mcloud.local
puppetdb.log:2015-01-07 15:50:01,060 INFO  [c.p.p.command] [7b213e19-3c9c-4903-b8c6-da1b2b6c926c] [replace facts] glusterclient0004.mc2.mcloud.local
puppetdb.log:2015-01-07 16:20:10,856 INFO  [c.p.p.command] [b430280c-6ab2-4f57-9646-f231097a0309] [replace facts] glusterclient0004.mc2.mcloud.local

(Our runs are every 30 minutes.)

The appearance of this behavior is extremely strange; it appears on an affected node suddenly, and it seems to persist forever once it's appeared. I've been totally unable to find a workaround. It's only shown up on a few node types (so far...), but it doesn't consistently appear on nodes of that type. For example, we also have a glusterclient0007, created identically to glusterclient0004 and with identical facts (except for IP, mac address, those sorts of things); it's not affected by this issue.

Puppet 3.6.2 (agent and master both), PuppetDB 2.2.0, Facter 2.3.0, Amazon Linux (i.e. Red Hat).

EDIT Jan 9th:

Okay, here's one more interesting bit of information. We have a custom fact definition that uses ec2-describe-tags and parses the output to create arbitrary facts based on the AWS tags attached to the instance; their name is "awstag[tagname]" and their value is the value of the tag. (We use it as a sort of ENC.) If a particular, arbitrary tag - in this case, 'monitoring:nagios_disabled' - is removed, the error (and the fact) disappear. However, when that tag IS applied, facter has no problem parsing it, even though runs fail:

[root@rabbitmq0003 ~]# facter -p aws_tag_monitoring
nagios_disabled

This tag is being used, with the same fact definition, on a dozen or so other servers, where it's working perfectly normally. And I can't prove it immediately, but I think we've had this problem appear on servers where ... (more)

edit retag flag offensive close merge delete

Comments

Can you run "puppet agent -t --debug --trace" on an affected node and post a link to a pastebin of the output?

GregLarkin gravatar imageGregLarkin ( 2015-01-08 22:22:40 -0500 )edit

Here's a debug trace, which for some reason does not get turned into a link that works: http://pastebin.com/j9Tjb0CS

burrito gravatar imageburrito ( 2015-01-09 17:09:52 -0500 )edit

Hmm, I guess that doesn't help too much. Do you have a sense of when this problem started showing up and what has changed in your codebase since then? Do any of the facts used by your code change value? Has Hiera data (if used) changed?

GregLarkin gravatar imageGregLarkin ( 2015-01-09 17:37:32 -0500 )edit

Actually, yes, but let me just edit it in...

burrito gravatar imageburrito ( 2015-01-09 17:51:54 -0500 )edit

Can you post the code to your custom fact? That's probably a good place to start looking for the cause of the message.

GregLarkin gravatar imageGregLarkin ( 2015-01-12 23:27:50 -0500 )edit

2 Answers

Sort by ยป oldest newest most voted
0

answered 2015-10-16 20:56:13 -0500

danieldreier gravatar image

I believe that https://tickets.puppetlabs.com/browse... will resolve this problem.

edit flag offensive delete link more
0

answered 2015-01-16 19:04:15 -0500

burrito gravatar image

This was miserable to figure out, and I never would have gotten it without some shotgun debugging. Here's what I've figured out.

Every node was exporting a nagios_host definition, with an array of tags that looked like this:

$host_tags  = ['nagios_enabled', $::aws_tag_cluster, $::aws_availability_zone]

The problem was that, although every node was guaranteed to have ::awsavailabilityzone, not every node had a fact named ::awstagcluster. So we would have been getting arrays like this:

$host_tags  = ['nagios_enabled', undef, 'us-east-1a']

As soon as we removed the not-guaranteed facts (which weren't being actively used) from that array, runs behaved normally. Here's my theory about why.

The behavior makes the most sense on our Nagios server, which was only collecting hosts that didn't have the tag "nagios_disabled". (This was our workaround for the issue of not being able to programmatically wipe exported resources as part of a run.)

So, during a Puppet run, the Puppetmaster would compile a catalog (no problem), resolve facts (no problem), and then - not apparent in the logs - start sorting through the exported nagioshost definitions to find which ones needed to be shipped off to the node. It would pull up an affected node, check each string in the array of resource tags to see if it was 'nagiosdisabled', and bomb out when it got to an undef value. But this isn't a perfect theory.

A few things this DOESN'T explain:

  • Why did this also happen on nodes that collect no resources? (Was it bombing out while comparing to see if the exported resource had changed?)
  • Why did it sometimes behave normally, but have 'sticky' behavior against nodes as soon as it happened once?

I suspect it's some sort of bug in PuppetDB, and (time permitting) I'll work towards submitting it properly.

edit flag offensive delete link more

Comments

Sorry for the late follow-up on this, and glad you were able to get somewhere with it. Sounds pretty tricky! Do you think any of the behavior would change if you inserted an empty string into your array instead of allowing the undef value or removing the value altogether?

GregLarkin gravatar imageGregLarkin ( 2015-01-20 16:25:13 -0500 )edit

Regarding your final questions, I do wonder if the sticky behavior was caused when the undef value was stored in PuppetDB and then used in a comparison for all time after that. Not sure about the 1st question you asked, though.

GregLarkin gravatar imageGregLarkin ( 2015-01-20 16:26:33 -0500 )edit

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Question Tools

1 follower

Stats

Asked: 2015-01-07 16:19:42 -0500

Seen: 458 times

Last updated: Jan 16 '15