asked 2015-10-21

Below is my configuration:

  • mCollective client 2.5.2, erlang-R16B03-0.2, rabbitmq server 3.3.4 installed on RHEL 6.4

  • mCollective server 2.5.2 (with stomp rubygem 1.3.2 and ruby 1.8.7) installed on RHEL 5.10 (on multiple hosts which are in a different data center than the host running MC client)

mco ping shows all mCollective servers in the results for a while. But if I run any agent (plugin) across the fleet, some hosts stop responding to mco ping. I see the message below in the mCollective server.log only on the hosts that go missing; the other hosts don't have this message.

  W, [2015-10-15T20:56:39.968701 #29827]  WARN -- : agents.rb:136:in `dispatch' Timeout while handling message for <agent name>

I tried increasing the timeout values in the files below and restarted mCollective, but it didn't help.

  • mcollective/agent/discovery.rb

  • mcollective/discovery/mc.ddl

I also added the plugin.discovery.timeout parameter to server.cfg and client.cfg, but to no avail.

Can anyone please suggest a solution? Thank you

answered 2015-10-23

Thanks for responding. No, increasing the timeout does not help.

Also, different kinds of messages are showing up in rabbit@<rabbitmq server>.log on the broker. Two of them are below:


closing STOMP connection <0.19888.229> (<mcollective server host:port> -> <rabbitmq server host:port>): {badmatch,<<10,45,45,45,32,10,58,114,101,113,117,101,115,116,105,100,58,32, 102,100,51,100,54,97,102,53,56,48,51,54,53,48,98,49,98,98,98,101, 102,49,57,98,101,102,49,53,99,101,53,56,10,58,115,101,110,100,101, 114,105,100,58,32,110,108,116,98,105,114,49,50,55,49,55,46,110, 108,100,99,49,46,111,114,97,99,108,101,99,108,111,117,100,46,99, 111,109,10,58,98,111,100,121,58,32,124,10,32,32,116,106,78,111,54, 68,78,77,103,57,73,53,97,78,89,108,115,55,69,105,97,65,61,61,10, 10,58,115,115,108,107,101,121,58,32,124,10,32,32,98,117,108,48,80, 110,65,108,110,57,106,103,102,109,80,68,109,49,51,107,105,103,43, 109,121,114,48,114,55,68,106,98,66,56,102,68,80,106,86,88,51,65, 102,68,54,119,43,97,120,100,55,106,85,52,88,107,70,119,78,82,10, 32,32,111,112,104,78,43,75,56,106,87,76,65,57,83,101,49,48,100, 100,49,47,122,78,118,65,102,102,66,116,103,47,104,86,49,69,83,43, 83,57,79,104,68,67,116,110,56,106,120,73,119,87,97,111,107,66,75, 49,50,119,100,90,10,32,32,113,109,106,67,105,54,105,50,81,85,76, 111,122,99,104,107,56,118,89,75,110,82,86,103,70,73,83,71,108,115, 117,50,82,65,75,115,52,72,107,76,56,116,77,54,70,78,48,83,80,120, 115,61,10,10,58,109,115,103,116,105,109,101,58,32,49,52,52,51,51, 50,48,52,48,48,10,58,115,101,110,100,101,114,97,103,101,110,116, 58,32,100,105,115,99,111,118,101,114,121,10,0>>}


closing STOMP connection <0.16013.281> (<mcollective server host:port> -> <rabbitmq server host:port>): {badmatch,<<"SEND\nexpiration:70000\ncontent-type:text/plain; charset=UTF-8\ndestination:/reply-queue/amq.gen-oFBxYzc2v-3-T3NuzzUHUw\ncontent-length:370\n">>}
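
The byte list in the first badmatch entry is printable ASCII, so it can be turned back into text to see what the broker choked on. Below is a minimal Python sketch for doing that; the function name decode_erlang_binary is made up for illustration, and the sample is only the first 18 bytes of the payload from the log entry above.

```python
import re

def decode_erlang_binary(text):
    """Turn an Erlang binary literal such as '<<10,45,45>>' into a string."""
    # Pull out every decimal byte value; spaces and line wraps are ignored.
    values = [int(v) for v in re.findall(r"\d+", text)]
    # latin-1 maps each byte value 0-255 to exactly one character.
    return bytes(values).decode("latin-1")

# First 18 bytes of the badmatch payload from the first log entry.
sample = "<<10,45,45,45,32,10,58,114,101,113,117,101,115,116,105,100,58,32>>"
print(repr(decode_erlang_binary(sample)))
```

Pasting the full byte list in place of the sample gives the complete text of the rejected data, which starts with a YAML document marker followed by a :requestid: key.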

answered 2015-10-23

Does increasing the timeout on the command line work (e.g. -t 15)?

