
Sunday, February 3, 2013

Working the trunk

When my colleague Tomas brought up the idea of continuous delivery, the first thing that really caught my attention was "we do all work on the trunk". I've always hated branches. I've worked with many different branching strategies and honestly they have all felt wrong.

My main issue has always been that regardless of branching strategy (release branches or feature branches) there is a lot of double testing, and debugging after a merge is always horrible. It's also hard to have a clear view of a "working system": what is the system, which branch do you refer to? Always having a clean and tested version of the trunk felt very compelling. No double work and a clear notion of "the system"! I'm game!

So we test everything all the time. How hard can it be?

Well, it has proven to be a lot harder than we thought, not to continuously test but to manage everyone's desire to branch. Somehow people just love branches. Developers want their feature branches where they can work in their sandbox. Managers want their branches so that they get nothing other than that one explicit bug fix for their delivery and don't risk impact from anything else.

These are two different core problems: one is about taking responsibility and one is about trust.

Managers don't trust "Jenkins".

Managers don't trust developers but somehow they do trust testers. It's interesting how much more credit a QA manager gets when he/she says "I've tested everything" than a blue light on Jenkins. In fact managers have MORE confidence in a manual regression test that has executed "most of the test cases on the current build" than in an automated process which executes "all the test cases on every build". I think the reasons are twofold: one is that the process is "something that the devs cooked up", and the other is that Jenkins can't look a manager in the eyes. It would be much easier if Jenkins was actually a person who had a formal responsibility in the organisation and could be blamed, shouted at and fired if things went wrong.

It takes a lot of hard work to sell "everything we have promised works as we have promised it". For each new build that we push into user acceptance testing we need to fight the desire to branch the previous release. Each time we have to go through the same discussion:

"I just want my bug fix"
"you get the newest version"
"I don't want the other changes"
"Everything we have promised works as we have promised it"
"How can you guarantee that"
"We run the tests on each check in"
"Doesn't matter I don't want the other changes they can break something, I want you to branch"

I don't know how many times we have had this argument. The interesting thing is that we have yet to break anything in a production deploy as a result of releasing a bug fix from the trunk (and hence including half-done features). We have, however, had a failed deploy as a result of giving in to the urge to branch. We made a bad call and gave in to the branch pressure, but we didn't build a full pipeline for the branch, which resulted in us not picking up an incompatible configuration change.

Developers love sandboxes

It's interesting: developers push for more releases and smaller work packages, yet they love their feature branches. I despise feature branches even more than release branches. The reason is that they make it very hard to refactor an application and the merging process is very error prone. The design and implementation of a feature is based on a state of the "working system" which can be totally different from the system it's merged onto. It also breaks all the intentions to do smaller work packages and test them often; a merge is always bigger than a small commit.

The desire to feature branch comes from "all the repo updates we have to do all the time slow us down so much" and "we can't just check stuff in that doesn't work without breaking stuff". The latter isn't just developers wanting to be irresponsible, it's also a consequence of us running SVN and not GIT. Developers do want to share code in a simple way. Sharing small packages between two teammates without touching the trunk is a valid concern, so the ability to micro branch would be nice. So yes, I do recommend GIT if you can, but it's not a viable option for us. Though I'm quite sure that if we were using GIT we would end up having problems related to micro branches turning into stealth feature branches.

I think the complaint "all the repo updates we have to do all the time slow us down so much" is a much more interesting one. In general I think that developers need to adopt more continuous integration patterns in their daily work, but this is actually a scalability issue. If you have too many developers working in the same part of the repo you are going to get problems. When developers do adopt good continuous integration patterns in their daily work and their productivity still drops, then there is an issue. This is one of the reasons why we have seen feature branches in the past.

Distribute development across the repository

When we started building our delivery platform we based it on an industry standard pattern that clearly defines component responsibility. Early on in the life cycle we saw no issues with developers contending for the same repository space, as we had just one or two working on each component. But as we scaled we started to see more and more of this. We also saw that some of the components were too widely defined in their responsibility, which made them bloated and hard to understand. So we decided to refactor the main culprit into several smaller, more well defined components.

The result of this was very good. The less bloated a component is, the easier it is to understand and to test, which leads to increased stability. By creating subcomponents we also spread the developers out across the repository. So we actually created stable, contextual sandboxes that are easy to understand and manage.

Obviously we shouldn't just create components to spread out our developers, but I think that if developers start stepping on each other it's a symptom of either bad architecture or overstaffing. If a component needs so many developers that they are in each other's way, then the chances are quite good that the component does way too much or that management is trying to meet feature demand by just pouring in more developers.

Backwards compatibility

Another key to working on the trunk has been our interface versioning strategy. Since we mostly provide REST services we were actually forced into branching once or twice where we had no other option, and that was due to not being backwards compatible on our interfaces. We couldn't take the trunk into production because the changes were not backwards compatible and our tests had been changed to map the new reality. This is what led to our new interface strategy where we, among other things, never ever change interfaces or payloads, we just add new ones and deprecate old ones.
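
To make that concrete, here is a minimal sketch of what the "add, never change" rule can look like in a JAX-RS style REST resource. This is an illustration under assumptions, not our actual code: the resource, payload classes and field names are made up.

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;

    // Hypothetical payload types: v2 is a new payload, v1 is never altered.
    class OrderV1 { public String id; }
    class OrderV2 { public String id; public String status; }

    @Path("/orders")
    public class OrderResource {

        // The v1 contract is frozen: marked deprecated but never changed or removed,
        // so clients built against it keep working when we release from the trunk.
        @Deprecated
        @GET
        @Path("/v1/{id}")
        @Produces("application/json")
        public OrderV1 getOrderV1(@PathParam("id") String id) {
            OrderV1 order = new OrderV1();
            order.id = id;
            return order;
        }

        // New behaviour goes into a new endpoint with a new payload instead of
        // changing the existing one.
        @GET
        @Path("/v2/{id}")
        @Produces("application/json")
        public OrderV2 getOrderV2(@PathParam("id") String id) {
            OrderV2 order = new OrderV2();
            order.id = id;
            order.status = "OPEN";
            return order;
        }
    }

A consumer of v1 never sees a changed payload; the deprecation just signals that a newer version exists and that the old one will eventually be retired.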

Everything that interfaces with the outside world needs to be kept backwards compatible, or program management and timing issues will force inevitable branching.

Not what I expected

When we first decided to work solely on the trunk I thought it was going to be all about testing. Testing is important, but I think people management has been a bigger investment (at least measured in mental energy drain) and the importance of good architecture was underrated.

Tuesday, January 1, 2013

Oops, our Continuous Delivery process became mission critical

At some point something changed with our Continuous Delivery process: it became mission critical. When we started working on the process it was basically a side project that another Tomas and I had. We added a consultant early in our project and he ended up doing some of the work on the first version of our deployment scripts, but it wasn't anything organized and not part of any process or tools team.

When we increased the number of developers and started seeing issues with stability and scalability, we also started to realize that our process had become mission critical. In fact our continuous delivery process had become more important to us than our mail system.

Now we had a mission critical hobby project with the following setup:
  • No official Owner.
  • No official Developers.
  • No official Operations professionals involved
    • Operations only supporting the OS of the Jenkins and Nexus instances.
  • One "live" instance of Jenkins on a super small virtual node. 
    • All development done on live instance.
  • One "live" instance of Nexus with a very small disk.
    • All development done on live instance.
  • Small number of test servers, virtual but not cloud nodes.
Having about 30 developers really depending on a process that is set up like this is obviously a no-go.

We started to figure we needed to put more effort into it when we were about to do our first rewrite of our deploy scripts. Still, we didn't think in terms of a mission critical production system. We needed a resource and I kept insisting we needed a CM, more on that in an upcoming post. We had architecture and test working together building the application around the process, but we needed some more hands building the deploy scripts and also someone who could help us with the complexity of our system configuration. As I wrote in the entry on deploy scripts, this didn't work out well at all. Mostly because the CM ended up working alone in a corner of the organization, but also because he didn't share our vision of continuous delivery. Between all the discussions trying to get us to implement branching strategies he was writing deployment scripts without any JBoss or DB competence. Obviously this didn't work out all that well, and it was during this script rewrite that we started to realize that our process was mission critical. The new deploy scripts were very unstable and, as mentioned, our tests had stability issues.

Now we started realizing that we had a mission critical system on our hands and needed to start treating it as such. Still, this was a bit of an unknown entity in our landscape: operations only supports our office IT and our customer deliveries, while development supports tooling. While this for sure falls into the tooling department, the development organization isn't equipped to support a mission critical system. Still we had to do something about it, so this was when we created our tools team. We refer to it as a platform team as it was intended to own certain components such as logging, help desk, etc., but the main focus was to be continuous delivery. Our lacking development environment was another area of responsibility that we moved to this team, more on that as well in another entry.

The team consisted of our CM, our application DBA, a newly added senior Java developer and myself as architect/lead. It was obvious from the outset how effective it is when you have resources (with the full range of competence) that can focus on the process. This made us much more responsive to bugs in the process and faster in implementing changes.

We still, at this date, have not solved all the infrastructure issues, but most of it is being worked on by the tools team and a new resource in our operations department who is responsible for our tooling servers. We still don't have a Jenkins test environment, and the operations responsibility for Jenkins and Nexus isn't really well defined. But we have resources dedicated to the process, and when something isn't working we handle it as bugs.

The biggest lesson is that it's really important to get dedicated resources from dev and ops early. Getting two 50% resources is better than one full time, as one isolated resource is a huge bottleneck and has a hard time prioritizing his work. Also make sure to have a bug/enhancement process in place early. Priorities should be based on user experience, same as with any system in production. And as soon as the process is in use by the developers you need a test environment for Jenkins (or whatever build server you use to drive the process), as it's a production system after all.

I think the reason we got a bit blindsided by the process becoming mission critical is that we haven't had anything similar in our landscape before. There is actually one thing that has grown mission critical at about the same rate, hand in hand with our CD process, and that's our JIRA server. In fact we have an even bigger dependency on our JIRA: if it goes down our developers have no clue what to work on and get stranded very quickly. For us this is a new type of mission critical system. Previously these have only been supporting systems.

Another reason is that the continuous delivery community talks about how easy it is to get started and how we can just take small baby steps from our nightly build CI. That is both true and the way to go. I guess I just wasn't reading the fine print which says "and then it becomes mission critical".

Thursday, November 22, 2012

What drives my interest in Continuous Delivery?

I want to mitigate the risk of my presence in a project.

My weaknesses as an individual, both privately and professionally, are that I'm horrible at following instructions: I can't do it once, and I'm even worse at doing it multiple times. I feel physically sick if I need to do repetitive tasks, I get demotivated and hence I do an even worse job. I also tend to forget to do things because I've got my mind full of other stuff. I'm also very bad at testing. I've worked with test automation for nearly six years now and I still suck at writing good test cases.

What I am good at is creative problem solving, finding solutions to problems and thinking outside the box. I'm also quite good at inspiring people.

This is a horrible combination for any project. This guy who can't follow instructions constantly comes up with new "bright ideas", can't even test them, and even worse he gets people engaged and enthusiastic about his crazy ideas. Do we really want him on our team???

Yes I am a high risk and I need to be mitigated.

In all seriousness, I do love change, I thrive in an ever-changing environment. I love the positive energy of solving problems and not dwelling on them. I love high pace and new challenges. I also know that in order to do what I love to do, the "maintenance" of our work needs to be minimized, as it's a huge time and cost sink for each project.

The "maintenance" part isnt just our production applications. Its our code, our tests, our mechanisms and of course our production deliveries. Every hour not spent maintaining these is spent on bringing in new business value into the organization and solving fun new challenges.

If I can contribute towards lowering the one-time and runtime costs of a delivery then I'm satisfied, because I know my work has really mattered. I'm also satisfied on a personal level because I know that I'm spending my time solving new problems and not maintaining old ones.

I want to come to work and feel satisfied...