Implementation Advice: Stepwise Execution and Reporting of Puppet Actions

asked 2015-05-12 05:07:18 -0600

far4d

I’m looking for advice on how to implement stepwise execution and reporting for a large deployment workflow.

I’ve had a fair amount of exposure to Puppet and have built setups of about 50 nodes. I’m now faced with an implementation task that seems a bit of an odd fit for Puppet, but I’ll be using Puppet for it nevertheless. We plan to use Puppet Enterprise (PE) to accomplish this task.

Requirements/Workflow

  1. Execute Step 1 (Prepare Deployment Tools) and report the status of the affected nodes back to the PE Console.
  2. Execute Step 2 (Prepare Deployment Files and Dry-Run) if all nodes specified in Step 1 completed successfully. Report back to the PE Console.
  3. Execute Step 3 (Do the disruptive Deployment) if Step 2 was successful. Report back to the PE Console.

Details

Each step should generate a simple report (readable by non-technicians/non-puppeteers) itemizing the status of the nodes affected by the step (e.g. node x: step 1 success, node y: step 1 fail). Each step will be implemented as a class.
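To make the report format concrete, here is a minimal sketch in Python of the kind of output I have in mind. It is not tied to any Puppet or PE API; the function names and the `results` mapping (node name to True/False) are made up for illustration.

```python
def format_report(step_number, results):
    """Render one line per node, e.g. 'node-x: step 1 success'.

    `results` is a hypothetical mapping of node name -> bool (True = success).
    """
    lines = []
    for node in sorted(results):
        status = "success" if results[node] else "fail"
        lines.append(f"{node}: step {step_number} {status}")
    return "\n".join(lines)


def all_succeeded(results):
    """True only if at least one node reported and every node succeeded."""
    return bool(results) and all(results.values())


if __name__ == "__main__":
    step1 = {"node-x": True, "node-y": False}
    print(format_report(1, step1))
    print("proceed to step 2:", all_succeeded(step1))
```

The `all_succeeded` helper reflects the requirement that Step 2 only starts once every node in Step 1 has completed successfully.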

These steps are rather expensive in terms of both time and processing (a single step can take up to an hour to complete) and should be executed only once. The status/outcome of each step should therefore be stored somewhere (PuppetDB? simple “marker” files stored in the filesystem on the node? Hiera?) where it can be queried easily.

It is quite possible that the whole workflow spans several days, since each step requires management approval. For example, Step 1 might be scheduled for Monday, Step 2 for Wednesday, and Step 3 for Saturday.
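As an illustration of the “marker” file option, here is a rough Python sketch of recording and querying a step outcome on the node. The directory layout, file naming (`<step>.json`) and JSON fields are assumptions of mine, not anything Puppet prescribes.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_outcome(marker_dir, step, success):
    """Write a marker file such as <marker_dir>/step1.json (hypothetical layout)."""
    marker_dir = Path(marker_dir)
    marker_dir.mkdir(parents=True, exist_ok=True)
    marker = marker_dir / f"{step}.json"
    payload = {
        "step": step,
        "status": "success" if success else "fail",
        "finished_at": datetime.now(timezone.utc).isoformat(),
    }
    # Write to a temporary file first, then rename, so a reader never
    # sees a half-written marker.
    tmp = marker.with_name(marker.name + ".tmp")
    tmp.write_text(json.dumps(payload))
    tmp.replace(marker)
    return marker


def read_outcome(marker_dir, step):
    """Return 'success', 'fail', or 'unknown' if the marker is missing or unreadable."""
    marker = Path(marker_dir) / f"{step}.json"
    try:
        return json.loads(marker.read_text())["status"]
    except (FileNotFoundError, ValueError, KeyError):
        return "unknown"


if __name__ == "__main__":
    demo_dir = Path(tempfile.mkdtemp())
    print(read_outcome(demo_dir, "step1"))
    record_outcome(demo_dir, "step1", True)
    print(read_outcome(demo_dir, "step1"))
```

A plain file like this is easy to query, but anyone with root on the node can edit it, which is part of why I am unsure whether it is the right place to store the outcome.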

Now I’m pondering how to achieve the desired workflow and have the following questions:

  1. How would you implement the check of the outcome of each step? If Step 1 fails, the node cannot proceed to Step 2…
  2. How can I report the steps in a form consumable by non-technicians, who interact solely with the web interface and drive the workflow from there? Ideally, management simply assigns nodes to classes (representing the steps) once the nodes have passed the previous step. The deprecation of Live Management in PE 3.8 leaves me wondering whether I have to build the GUI myself...
  3. Management should be able to push the changes to the nodes. Execution of the steps through agent polling is not desired, since it interferes with management’s time schedule. I therefore need a lightweight, reliable, and not easily spoofable method for checking a node’s status when the agent periodically runs in daemonized mode. Basically: if a node has class step1 assigned and the outcome of step1 is success, then do nothing; else do step 1 again.
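The decision I want each agent run to make can be sketched in Python as follows. This is only the logic, not a Puppet implementation; the step names, the `assigned` set (classes assigned to the node) and the `outcomes` mapping (stored status per step) are hypothetical. I am also assuming that a step whose class is not assigned is simply skipped.

```python
STEPS = ["step1", "step2", "step3"]  # hypothetical class names, in workflow order


def decide(step, assigned, outcomes):
    """Return 'skip', 'blocked', or 'run' for one step on one node.

    assigned: set of step classes currently assigned to the node
    outcomes: mapping of step -> 'success' / 'fail' (a missing key means not run yet)
    """
    if step not in assigned:
        return "skip"      # assumption: unassigned steps are never executed
    if outcomes.get(step) == "success":
        return "skip"      # already done; an expensive step must not run twice
    earlier = STEPS[:STEPS.index(step)]
    if any(outcomes.get(prev) != "success" for prev in earlier):
        return "blocked"   # a previous step has not succeeded yet
    return "run"           # first attempt, or retry after a failure


if __name__ == "__main__":
    print(decide("step1", {"step1"}, {}))
    print(decide("step1", {"step1"}, {"step1": "success"}))
    print(decide("step2", {"step1", "step2"}, {"step1": "fail"}))
```

The open question is where `outcomes` should come from so that the check stays cheap on every periodic run and cannot easily be faked on the node.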

TIA for your ideas

far4d
