Monday, July 7, 2014

Continuous Deployment in the Cloud Part 2: The Pipeline Engine in 100 lines of code

As I talked about in my previous post in this series, we need to treat our Continuous Delivery process as a distributed system, and as part of that we need to move the pipe out of Jenkins and into a first class citizen of its own. Aside from the fact that a CI tool is a very bad Continuous Delivery/Deploy orchestrator, I find the potential of executing my pipe from anywhere in the cloud very tempting.

If my pipe is a first class citizen of its own and executable on any plain old node, then I can execute it anywhere in the cloud. That means I can execute it locally on my laptop, on a simple minion in my own cloud, or in one of the many managed CI services that have surfaced in the cloud.

To accomplish this we need five very basic and simple things:

  1. a pipeline configuration that defines what tasks to execute for that pipe
  2. a pipeline engine that executes tasks
  3. a library of task implementations
  4. a definition of what pipe to use with my artefact
  5. a way of distributing the pipeline engine to the node where I want to execute my pipeline
Let's have a look.

Define the pipeline

The pipeline is a relatively simple process that executes tasks. For the purpose of this blog series, and for simple small deliveries, sequential execution of tasks can be sufficient, but at my work we run a lot of parallel sub pipes to improve the throughput of the test parts. So our pipe should be able to handle both.

We also want to be able to run the pipe to a certain stage, like from start to regression test, and then step by step launch in QA and launch in Prod. Obviously, if we want to do continuous deployment we don't need to worry too much about that capability, but I include it just to cover a bit more scope.

Defining pipelines is no real rocket science, and in most cases a few archetype pipelines will cover something like 90% of the pipes needed for a large scale delivery. So I like to define a few flavours of pipes, which gives us the ability to distribute a base set of pipes ranging from CI to CD.

Once we have defined the base set of pipe flavours, each team should configure which pipe they want to handle their deliverables.

I define my pipes something like this.

name: Strawberry
pipe: 
  - do:
     - tasks:
       - name: Build
         type: mock.Dummy
  - do:
     - tasks:
       - name: Deploy A
         type: mock.Dummy
       - name: Test A
         type: mock.Dummy
     - tasks:
       - name: Deploy B
         type: mock.Dummy
       - name: Test B
         type: mock.Dummy
    parallel: true     
  - do:
     - tasks:
       - name: Publish
         type: mock.Dummy
  
A pipe named Strawberry which builds our service, then deploys it in two parallel sub pipes where it executes two test suites, and finally publishes the artefacts to our artefact repo. At this stage each task is just executed with a Dummy task implementation.

The pipeline engine

We need a mechanism that understands our yml config and links it to our library of executable tasks.

I use Groovy to build my engine but it can just as easily be built in any language. I've intentionally stripped out some of the logging I do, but this is basically it. In about 80 lines of code we have an engine that loads tasks defined in a yml, executes them serially or in parallel, and has the capability to run all tasks, the tasks up to a certain point, or a single task.

import groovy.util.logging.Log
import java.util.concurrent.atomic.AtomicInteger

@Log
class BalthazarEngine {

    def int start(Map context){
        def status = 0
        def definition = context.get "balthazar-pipe" 
        
        for (def doIt:  definition["pipe"] ){
            status = executePipe(doIt, context)
        }
        return status
    }
    def int executePipe(Map doIt, Map context){
        def status = 0
        if (doIt.parallel == true){
            status = doItParallel(doIt,context)
        } else {
            status = doItSerial(doIt,context)
        }
        return status
    }
    
    def int doItSerial(def doIt, def context){
        def status = 0
        for (def tasks : doIt.do.tasks){
            status = executeTasks(tasks, context)
        }
        return status
    }
    
    def int doItParallel(def doIt, def context){
        def status = new AtomicInteger(BalthazarTask.TASK_SUCCESS)
        def threads = []
        for (def tasks : doIt.do.tasks){
            //each parallel sub pipe works on its own copy of the context and tasks
            def cloneContext = deepcopy(context)
            def cloneTasks = deepcopy(tasks)
            threads << Thread.start {
                def result = executeTasks(cloneTasks, cloneContext)
                //keep the first non success status reported by any sub pipe
                if (result != BalthazarTask.TASK_SUCCESS){
                    status.compareAndSet(BalthazarTask.TASK_SUCCESS, result)
                }
            }
        }
        //wait for all sub pipes, not just the last one that was started
        threads*.join()
        return status.get()
    }   
    
    def int executeTasks(def tasks, def context){
        def status = 0
        for (def task : tasks){
            //execute if the run-task is not specified or if run-task equals this task
            if (!context["run-task"] || context["run-task"] == task.name){
                log.info "execute ${task.name}"
                context["this.task"] = task
                def impl = loadInstanceForTask task;
                status = impl.execute context 
            }
            
            if (status != BalthazarTask.TASK_SUCCESS){
                break
            }
            
            if (context["run-to"] == task.name){
                log.info "Executed ${context["run-to"]} which is the last task, done executing."
                break
            }
        }
        return status
    }
    def loadInstanceForTask(def task){
        //tasks are resolved by convention: balthazar.tasks.<type from the yml>
        def className = "balthazar.tasks.${task.type}"
        def forName = Class.forName className
        return forName.newInstance()
    }
    def deepcopy(orig) {
        def bos = new ByteArrayOutputStream()
        def oos = new ObjectOutputStream(bos)
        oos.writeObject(orig); oos.flush()
        def bin = new ByteArrayInputStream(bos.toByteArray())
        def ois = new ObjectInputStream(bin)
        return ois.readObject()
    }
}
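
If you want to poke at the engine without the client described further down, you can drive it directly. Here is a minimal sketch, assuming the Strawberry definition above is saved as processes/Strawberry.yml, SnakeYAML is available and the task classes are on the classpath:

import org.yaml.snakeyaml.Yaml

//parse the pipe definition and hand it to the engine, much like the client does further down
def definition = new Yaml().load(new File("processes/Strawberry.yml").text)

def context = [:]
context["balthazar-pipe"] = definition   //the parsed yml, including the "pipe" list
context["run-to"] = "Test A"             //optional: stop after this task

def status = new BalthazarEngine().start(context)
println "pipe finished with status ${status}"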

A common question I tend to get is "why not implement it as a lifecycle in Maven or Gradle?". Well, I want a process that can support building in Maven, Gradle or any other tool, for any other language. Also, as soon as we use another tool to do our job (be it a build tool, a CI server or whatever) we need to adapt to its lifecycle definition of how it executes its processes. Maven has its lifecycle stages quite rigidly defined and I find it a pita to redefine them. Jenkins has its pre, build and post stages where it's a pita to share variables. And so on. But most importantly: use build tools for what they do well, use CI tools for what they do well, and none of that is implementing CD pipes.

Task library

We need tasks for our pipe engine to execute. The interface for a task is simple.

public interface BalthazarTask {
    //status constant the engine checks to decide whether to keep executing
    int TASK_SUCCESS = 0;

    int execute(Map<String, Object> context);
}

Then we just implement them. For my purpose I package tasks as "balthazar.tasks.<type>.<task>" and just reference the type and task in my yml.
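
The mock.Dummy implementation itself isn't worth a post of its own, but to give you an idea, a minimal sketch of it could look like the following (the BalthazarTask import is omitted since I haven't shown its package here):

package balthazar.tasks.mock

import groovy.util.logging.Log

@Log
class Dummy implements BalthazarTask {
    @Override
    int execute(Map<String, Object> context) {
        //a placeholder task: log which step it stands in for and report success
        log.info "dummy execution of ${context["this.task"]?.name}"
        return TASK_SUCCESS
    }
}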

Writing tasks in a custom framework rather than as, say, jobs in Jenkins is a joy. You no longer need to work around tooling limitations for simple things such as setting variables during execution.

Anything you want to share you just put it on the context.

Here is an example of how two tasks share data.

  - tasks: 
    - name: Initiate Pipe
      type: init.Cerebro
  - tasks: 
    - name: Build
      type: build.Gradle
      command: gradle clean fatJar

I have two tasks. The first task creates a new version of the artefact we are building in my master data repository, which I call Cerebro (more on Cerebro in the next post). Cerebro is the master of all my build things and hence my version numbers come from there. So the init.Cerebro task takes the version from Cerebro and puts it on the context.

@Log
class Cerebro implements BalthazarTask {
    @Override
    def int execute(Map<String, Object> context){
        def affiliation = context.get("cerebro-affiliation")
        def hero = context.get("cerebro-hero")
        def key = System.env["CEREBRO_KEY"]
        def reincarnation =  CerebroClient.createNewHeroReincarnation(affiliation, key, hero)
        context.put("cerebro-reincarnation",reincarnation)
        return TASK_SUCCESS
    }
}


My build.Gradle task takes the version number from Cerebro (called a reincarnation) and sends it to the build script. As you can see I can use custom commands, and in this case I do, since fat jars are what I build. By default the task runs gradle build. I can also define what log level I want my Gradle script to run at.

@Log
class Gradle implements BalthazarTask {
    @Override
    def int execute(Map<String, Object> context){
        def affiliation = context["cerebro-affiliation"]
        def hero = context["cerebro-hero"]
        def reincarnation = context["cerebro-reincarnation"]
        def command = context["this.task"]["command"] == null ? "gradle build": context["this.task"]["command"]
        def loglevel = context["this.task"]["loglevel"] == null ? "" : "--${context["this.task"]["loglevel"]}"
        
        def gradleCommand = """${command} ${loglevel} -Dcerebro-affiliation=${affiliation} -Dcerebro-hero=${hero} -Dcerebro-reincarnation=${reincarnation}"""

        def proc = gradleCommand.execute()
        //stream the gradle output so a chatty build does not fill the buffer and block
        proc.consumeProcessOutput(System.out, System.err)
        proc.waitFor()
        return proc.exitValue()
    }
}
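
I won't paste my full build script here, but here is a minimal sketch of how a build.gradle could pick those properties up; using the reincarnation as the project version is just one option:

//build.gradle -- a sketch of consuming the -D properties sent by the build.Gradle task
apply plugin: 'java'

//the reincarnation becomes the artefact version, with a fallback for local builds
version = System.getProperty('cerebro-reincarnation', '0.0.0-local')

jar {
    manifest {
        //bake the affiliation and hero into the manifest for traceability
        attributes(
            'Cerebro-Affiliation': System.getProperty('cerebro-affiliation', 'unknown'),
            'Cerebro-Hero':        System.getProperty('cerebro-hero', 'unknown')
        )
    }
}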

This is how hard it is to build tasks (jobs) when it's done with code instead of configuring it in a CI tool. Sure, some tasks, like building Amazon AMIs, take a bit more code. (j/k, they don't). But OK, a launch task that implements a rolling deploy on Amazon using an A/B release pattern does, but I will come back to that specific case.

Configure my repository

So I have a build pipe executor, pre-built build pipes and tasks that execute in them. Now I need to configure my repository.

In my experience 90% of your teams will be able to use prefab pipes without investing too much effort into building tons of prefabs. A few CI pipes, a few simple CD pipes and a few parallelized pipes should cover a lot of the demand, if you are good enough at putting an interface between the deploy tasks and the deploy tools, as well as between the test tasks and the test tools.

So in my repo I have a .balthazar.yml which contains:

balthazar-pipe: Strawberry

Distributing the pipeline engine and the task library

The first thing we need is a balthazar client that starts the engine using the configuration provided inside my repository. A simple Groovy script does the trick.

import groovy.util.logging.Log
import org.yaml.snakeyaml.Yaml

@Log
class BalthazarRunner {
    def int start(Map<String, Object> context){
        Yaml yaml = new Yaml()
        if (!context){
            def projectfile = new File(".balthazar.yml")            
            if (projectfile.exists()){
                context = yaml.load projectfile.text
            } else {
                throw new Exception("No .balthazar.yml in project")
            }
            
        }
        def name = context.get "balthazar-pipe" 
        def definition = yaml.load this.getClass().getResource("/processes/${name}.yml").text 
        BalthazarEngine engine = new BalthazarEngine()
        
        context["balthazar-pipe"] =  definition
        context["run-to"] = System.properties["run-to"]
        context["run-task"] = System.properties["run-task"]
        return engine.start(context)
    }
}
def runner = new BalthazarRunner()
runner.start([:])
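
Since the runner just reads run-to and run-task from the JVM system properties, running the pipe up to a certain stage, as mentioned earlier, is only a matter of setting a property before kicking it off. A small sketch:

//run the configured pipe only up to the "Test A" task,
//equivalent to starting the client JVM with -Drun-to="Test A"
System.properties["run-to"] = "Test A"
def status = new BalthazarRunner().start([:])
println "stopped after Test A with status ${status}"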

Now we need to distribute the client, our engine and our library of tasks to the node where we want to execute the pipeline against our code repository. This can be done in many ways.

We can package balthazar as an installable package and install it using yum or a similar tool. This works quite well on build servers but it does limit us a bit in where we can run it, as we need "that installer" to be available on the target environment. In many cases it really isn't a problem, because if you're a Debian shop then you have your deb everywhere, and if you're a Red Hat shop then you have your yum.

I personally opted for another way of distributing the client, partially because I'm lazy and partially because it works in a lot of environments. When I create my .balthazar.yml I also check out the balthazar client project as a git submodule.

So all my projects have a .balthazar.yml and a balthazar-client folder. In the client folder I have a balthazar.sh and a build.gradle file. I use Gradle to fetch the latest artefacts from my repo and then the shell script does the java -jar part. Not all that pretty, but it works.
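
I won't paste my exact client files either, but the idea boils down to something like this; the repository URL and artefact coordinates below are placeholders:

//balthazar-client/build.gradle -- a sketch, the repo URL and coordinates are made up
repositories {
    maven { url "https://repo.example.com/releases" }
}

configurations {
    balthazar
}

dependencies {
    //always resolve the newest published engine + task library fat jar
    balthazar "balthazar:balthazar-client:+"
}

//put the resolved jar next to balthazar.sh so the script can do its java -jar part
task fetchBalthazar(type: Copy) {
    from configurations.balthazar
    into projectDir
    rename { "balthazar.jar" }
}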

Summary

So now on all my repos I can just do...

> ./balthazar-client/balthazar.sh

... and I run the pipe configured in the .balthazar.yml of that repo. Since all my tasks integrate with my Build Data Repository, I get ONE view of all my pipe executions regardless of where they were executed.

CD made fun! Cheers!