
Tuesday, April 22, 2014

New Talk: Continuous Testing

April 29th I will be visiting HiQ here in Gothenburg. I will have the opportunity to talk about the super important subject of Continuous Delivery and Testing. This talk is an intro-level talk that describes Continuous Delivery and how we need to change the way we work with Testing.

If anyone else is interested in this talk then please don't hesitate to contact me. The talk can also be focused on a practitioner audience, digging a bit deeper into the practices.

Here is a brief summary of the talk.

Why do we want to do Continuous Delivery
An introduction to Continuous Delivery and why we want to do it: what we need to do in order to get there, principles and practices, and a look at the pipe. (Part of the intro-level talk)

Test Automation for Continuous Delivery
Starting with a look at how we have done testing in the past and our efforts to automate it. Moving on to how we need to work with Application Architecture in order to Build Quality In, so that we can do fast, scalable and robust test automation.

Test Driven Development and Continuous Delivery
How we need to work in a Test Driven way in order to have our features verified as they are completed. A look at how Test Architecture helps us define who does what.

Exploratory Testing and Continuous Delivery
In our drive to automate everything it's easy to forget that we still NEED to do Exploratory Testing. We need to understand how to do Exploratory Testing without doing Manual Release Testing. The two are vastly different.

Some words on Tooling

A lot of the time, discussions start with tools. This is probably the best way to fail your test automation efforts.

Areas Not Covered
Before we are done we need to take a quick look at Test Environments, Testing Non-Functional Requirements, A/B Testing and the Power of Metrics. (This section is expanded in the practitioner-level talk)

Tuesday, April 1, 2014

Upcoming talks

I've been given the honor of speaking at two fantastic conferences this spring.

The first one is PipelineConf, 8th of April in London, where I will talk about the people side of Continuous Delivery. This is the talk I've given at the Netlight EDGE and JDays conferences, though it's been updated with the experiences from the last 6-8 months of working with Continuous Delivery.

The second one is GeeCon, 14th-16th of May in Krakow, Poland, where I will be speaking about Scaling Continuous Delivery. This is a new talk that focuses on lessons learned from our journey to scale continuous delivery from a team of 5 to an organization of hundreds.

If you are interested in hearing me speak at a conference, seminar or workshop, don't hesitate to contact me.

Monday, March 17, 2014

Portability

I've talked about portability of the CD process before, but it keeps becoming more and more evident to us how important it is. The closer the CD process comes to the developer, the better the understanding of the process. Our increase in portability has gone through stages.

Initially we deployed locally in a way that was totally different from the way we deployed in the continuous delivery process. Our desktop development environments were not part of our CD process at all. Our deploy scripts handle stopping and starting servers, moving artifacts on the server, linking directories and running Liquibase to upgrade/migrate the database. We did all of this manually in our local environments. We ran Liquibase, but we ran it using the Maven plugin (which we don't do in our deploy scripts; there we run it using java -jar). We moved artifacts by hand or with other scripts.
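
As a rough illustration of how the two paths diverged (the commands and file names here are hypothetical, not our actual scripts), the CD deploy scripts ran Liquibase as a plain java -jar while a local developer ran the Maven plugin by hand:

```python
# Hypothetical sketch of the divergence, not our real deploy code.
import subprocess

def migrate_db_cd_style(changelog: str, jdbc_url: str) -> None:
    """How the CD deploy scripts run the migration: plain java -jar."""
    subprocess.run(
        ["java", "-jar", "liquibase.jar",
         f"--changeLogFile={changelog}", f"--url={jdbc_url}", "update"],
        check=True,
    )

def migrate_db_local_style() -> None:
    """How we ran it locally: the Maven plugin, a completely different code path."""
    subprocess.run(["mvn", "liquibase:update"], check=True)
```

Two code paths that are supposed to do the same thing is exactly the kind of drift that made the local environment diverge from the pipeline.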

Then we created a local bootstrap script which executed the CD process deploy scripts on a local environment. We built environment-specific support into the local bootstrap so that we supported Linux and Windows. However, in order to start JBoss and Mule we needed to add support for the local environment in the CD process deploy script as well. We moved closer to portability, but we diluted our code and increased our complexity. Still, this was an improvement, though the process was not yet truly portable.
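
A minimal sketch of what such a bootstrap can look like (script names and flags are invented for illustration): detect the OS, then delegate to the same deploy entry point the pipeline uses:

```python
# Hypothetical local bootstrap: reuse the CD deploy script, only translating
# the environment-specific bits for Windows and Linux workstations.
import platform
import subprocess
from pathlib import Path

CD_DEPLOY_SCRIPT = Path("cd-process/deploy.py")  # assumed entry point

def local_deploy(artifact: Path) -> None:
    env = "local-windows" if platform.system() == "Windows" else "local-linux"
    subprocess.run(
        ["python", str(CD_DEPLOY_SCRIPT),
         "--environment", env, "--artifact", str(artifact)],
        check=True,
    )

if __name__ == "__main__":
    local_deploy(Path("target/app.war"))
```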

Recently we decided to shift our packaging of artifacts from zip files to RPMs. All our prod and test environments are Red Hat, so the dependency on the technology is not really an issue for us here. What this gives us is the ability to manage dependencies between artifacts and infrastructure in a nice way: the war file depends on a JBoss version, which depends on a Java version, and all of them are installed when needed. This also finally gives us a clear separation between install and deploy. The yum installer puts files on the server; our deploy application brings the runtime online, configures it and moves the artifacts into the runtime.
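
A sketch of what that separation can look like in practice (package names, paths and service names are illustrative): yum owns getting the bits and their dependencies onto the box, and the deploy step only activates what is already installed:

```python
# Illustrative only: install is yum's job, deploy just brings the runtime
# online and links the already-installed artifact into it.
import subprocess
from pathlib import Path

def install(rpm_name: str) -> None:
    """Install: yum pulls the artifact RPM plus its JBoss/Java dependencies."""
    subprocess.run(["sudo", "yum", "install", "-y", rpm_name], check=True)

def deploy(installed_war: Path, runtime_deploy_dir: Path) -> None:
    """Deploy: start the runtime, configure it, move the artifact into it."""
    subprocess.run(["sudo", "service", "jboss", "start"], check=True)
    link = runtime_deploy_dir / installed_war.name
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(installed_war)
```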

In order to maintain portability to the development environment, this finally forced us to go all in and decide that "development is done in a Linux environment". We won't be moving to Linux clients, but our local deploy target will be a virtual Linux box. This finally puts everything into place for us, creating a fully portable model. It's important to understand that we still don't have a cloud environment in our company.


This image, created by my colleague Mikael, is a great visualization of how much portability we can build into our environment now and once we get a cloud. By defining a portability level and its interface, we manage to build a mini cloud on each Jenkins slave and on each local dev machine using exactly the same process as we would for a QA or test deploy. The nodes above the portability level can be local on the workstation/Jenkins slave or remote in a prod environment. The process is the same regardless of environment: Provision, Install and Deploy.
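
To make the idea concrete (this is my own toy sketch of the concept, not Mikael's diagram or our actual tooling), the point is that the exact same three steps run against whatever node sits above the portability level, be it a local VM, a Jenkins slave or a prod server:

```python
# Toy sketch: one delivery process, any node above the portability level.
def provision(node: str) -> None:
    print(f"provisioning {node}")

def install(node: str, package: str) -> None:
    print(f"yum installing {package} on {node}")

def deploy(node: str, artifact: str) -> None:
    print(f"bringing the runtime online and deploying {artifact} on {node}")

def deliver(node: str, package: str, artifact: str) -> None:
    # Identical steps whether the node is a local VM, a Jenkins slave or prod.
    provision(node)
    install(node, package)
    deploy(node, artifact)

deliver("local-linux-vm", "app-server.rpm", "app.war")
deliver("prod-node-01", "app-server.rpm", "app.war")
```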


Friday, February 21, 2014

Scaling Continuous Delivery

It's been a while since I posted. The main reason is that we have been very focused on our main deliveries and feature development for the last six months. Whenever the feature train hits central station, it's always work such as build, release and test automation that gets hit first.

There are upsides to not touching your Continuous Delivery process for a few months, though. If you just keep working on your backlog, you don't get time to analyze the impact of the changes you just made. Several times we have realized that the number two and three items in the backlog dropped significantly in priority once we had fixed the most important issue, while other items rose fast in priority.

Now we have had time to analyse a lot of new issues and it's time for us to pick up the pace again.

Scaling the Organization 

The good thing, the awesome thing (!), is that during these six or so months our organization has changed and we have actually been able to create a line organization that owns and takes responsibility for the continuous delivery process.

One of the major bottlenecks we found in our process was our platform/tools team. The team was small and resources in that team were always the first to go when feature pressure increased. The team became just another "IT function" that didn't have time to be proactive due to all the reactive support work it had to do.

There were a few reasons behind this. First, it was the way the team had worked in the past: it built the pipes and processes for all the teams by hand, tailored to the custom needs of each team. On some teams there were individuals who picked up the work and kept configuring the Jenkins jobs to tailor them even more, but on other teams there was no interest whatsoever and their jobs degraded.

The result of this was that no one really knew how the pipes looked or how they should look. Introducing process change was horribly slow, as it was all manual and dependent on the platform/tools team.

One of the first changes we made was to increase the bandwidth of the team and reduce the dependency on it. A great solution for this came up in a chat this summer: instead of the platform/tools team supporting the development teams, the development teams put resources into the platform/tools team. Each team was invited to add a 50% resource on a voluntary basis. This way the real-life issues got much better attention in the platform/tools team, and competence about the Continuous Delivery process spread in a much more organic way.

This did not eliminate the organizational bottleneck, but it gave us the bandwidth to change the way we work and, long term, the ability to scale with the number of teams that use the process.

Scaling the Process

The main reason we were a bottleneck was the way we worked. We preached Automate Everything, Test Everything, If It's Hard Do It More Often, etc., but when it came to the Continuous Delivery process we didn't practice what we were teaching.

We had ONE Jenkins environment, so all the changes happened directly in production. Testing plugins and new configurations on a production environment isn't really the way to deliver stability, reliability and performance.

Manually created Jenkins pipes aren't really a way to create sustainable pace and continuous improvement.

Developing deploy scripts without explicit unit tests isn't really a good way of creating a stable process. We had been priding ourselves on our deployment being tested hundreds of times before a production deploy, which was true but very dumb. Implicit testing means that someone else takes the pain for my mistakes. Deployment scripts are applications and need to be treated as first-class citizens.

This had to change.

The first thing we did was use the extra bandwidth we had gained to build a totally new way of delivering continuous delivery. Automate everything; obvious, huh?

We also decided to deliver a continuous delivery environment per development team instead of having them all in one environment. So we started by automating the provisioning of Jenkins and test environments. We don't have a cloud solution in our company at this time, so we work with a fake cloud: a huge pool of virtual servers. This pool we provision and maintain using Chef.
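
As a rough illustration of the "fake cloud" idea (entirely made up; the real pool is provisioned and maintained with Chef), think of it as a fixed pool of virtual servers that per-team environments are carved out of:

```python
# Illustrative allocator for a "fake cloud": a fixed pool of pre-provisioned VMs.
FREE_POOL = ["vm-101", "vm-102", "vm-103", "vm-104", "vm-105"]
ALLOCATED = {}  # team name -> list of VMs

def allocate_environment(team, size):
    """Hand a team enough servers for a Jenkins master plus test nodes."""
    if len(FREE_POOL) < size:
        raise RuntimeError("fake cloud exhausted - more hardware needed")
    nodes = [FREE_POOL.pop() for _ in range(size)]
    ALLOCATED.setdefault(team, []).extend(nodes)
    return nodes

print(allocate_environment("team-alpha", 3))
```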

The second thing was to automate the build pipe setup. We built a simple little pipe generator with 5-8 predefined pipe templates of different layouts to support the different needs. We actually managed to get the development teams to adopt a stricter Maven project naming convention in order to use the generated pipes, as everyone saw the benefits of this.

The pipes we have are basically typed by what they build (libraries or deployable components) and how they are tested, as we still need to initiate our FitNesse tests a bit differently from our other tests.
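
A minimal sketch of how such a generator can pick a template (the template names and the naming convention are invented here; the real generator is a bit richer):

```python
# Hypothetical template selection driven by the Maven artifactId convention.
def pick_template(artifact_id: str, has_fitnesse_tests: bool) -> str:
    """Map a component to one of a handful of predefined pipe templates."""
    if artifact_id.endswith("-lib"):
        return "lib-pipe"                  # build, unit test, publish
    if has_fitnesse_tests:
        return "deployable-fitnesse-pipe"  # adds the FitNesse test stage
    return "deployable-pipe"               # build, deploy to test, run tests

assert pick_template("billing-lib", False) == "lib-pipe"
assert pick_template("web-shop", True) == "deployable-fitnesse-pipe"
```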

We made it the responsibility of the platform/tools team to develop the pipe templates and the responsibility of the development teams to configure their generator to generate the pipes they needed for their components.

Getting to this stage was a lot of work, and a lot of migration work for all the teams, but the results have been terrific. The support load on the platform/tools team has gone down a lot, and each bug fix is rolled out to all the pipes within minutes.

We have also been able to take on new development teams very easily. Not all teams in our company are ready to do Continuous Delivery, but they are all heading in this direction and we can now provide environments and pipelines that match their maturity.

Summary

We have gone from a process developed as skunkworks to Continuous Delivery as a Service within our organization. We always run into new bottlenecks and challenges; this time the bottleneck was much more us than anything else. I assume the next big bottleneck is going to be hardware and our inability to deliver a cloud solution, since we can now roll out to more and more teams. But who knows, I could be wrong; only time will tell.





Tuesday, June 18, 2013

It's about the people.

Last week I attended QCon New York. Fantastic conference as usual, and it was comforting to see that basically everyone was saying the same thing: "Continuous Delivery is not about the technology, it's about the people", which also happens to be the title of my talk at Netlight's EDGE conference in September.

In his talk, Steve Smith (@agilestevesmith) talked about how 5% is technology and 95% is organization. While I agree with that, I think the non-technical 95% can be divided into organization, changes to role definitions and individual maturity. It's these three that my talk will cover.

Hopefully I will be able to give this talk in Gothenburg as well, as it's been submitted to JDays.

Monday, April 8, 2013

Talk at HiQ 24th of April

Continuous Delivery - Enabling Agile.

The key to agile development is a fast feedback loop. Continuous Delivery strives towards always having tested releases in a deliverable state. Continuous Delivery is not just a technical process but a change to the entire organization and the individuals within it. This presentation describes the principles of Continuous Delivery, gives a brief overview of how it can be implemented, and covers how it changes the organization and how it impacts the individuals.

The target audience for this presentation is Developers, Architects, Testers, Scrum Masters, Project Managers and Product Owners, in no particular order. The presentation is not rich in technical detail and is based on real-life experiences.

Please use this post to provide questions and feedback.

Welcome

Sunday, February 24, 2013

Architect to re-Architect

We spend so much time trying to make the right decisions. It's one of the downsides of working on a next-generation platform: "You better get it right this time!". We have all been there when a current-generation solution just doesn't cut it anymore. Implementing that next requirement is going to be so expensive that we might just as well rewrite the whole thing. The thing is, they also tried to "get it right this time!".

Why does it "always" go wrong? Why do we always run into dead ends with our systems? Sure, not always, but always when an application is exposed to a lot of changes and new requirements.

Select technology, then abstract and isolate it in the architecture.

Historically we have put a lot of thought into the selection of technology when we build something new. It's important not to get it wrong, so we think a lot about getting it right. We also think a lot about patterns so that we can replace tech A with tech B if the decision has to be reversed. Who hasn't written hundreds of DAOs so that we can one day change our database? How often do we change databases? Historically, well, I have never done it. Changing from Oracle to DB2 or whatever other SQL database has never been the reason for a major rewrite. In fact, I've been part of more than one rewrite that has thrown out everything but the data layer.

In the future we will see more database changes due to NoSQL, but if and when we do that, do we really want to keep our DAO interfaces? If we do, then we sure aren't going to accomplish much with our rewrite. If we change, we change because we need to solve a bottleneck problem. In order to solve it we need to make an optimization using a niche product, which means we need to write and query our data differently.
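
To illustrate the point with a made-up interface (not from any real system): a classic DAO hides which SQL vendor sits underneath, but it still bakes the relational shape of the data into its methods, and that shape is exactly what a move to a niche NoSQL product would change:

```python
# A typical DAO abstraction: swapping Oracle for DB2 behind it is easy,
# but a NoSQL rewrite would change the shape of these methods themselves.
from abc import ABC, abstractmethod

class SessionDao(ABC):
    @abstractmethod
    def find_by_user(self, user_id: int) -> list:
        """Assumes rows joined out of normalized user/session tables."""

    @abstractmethod
    def save(self, session: dict) -> None:
        """Assumes one normalized row per session."""

# A document or column store would rather have data written per view/query,
# so keeping this interface would defeat the purpose of the change.
```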

The cause of a major rewrite is either lack of scalability or customer requirements that are too hard, too expensive or too high risk to implement. The latter almost always happens when everything has become so interconnected that the change can no longer be made in a safe and isolated way. We need to refactor so much in order to make the change possible that it's cheaper to rewrite.

A distributed system can still be a monolith.

In standard monolithic design we monolithize everything: not just the components of the system but also the data model and the business logic. By normalizing our data model and constantly striving to decrease code redundancy, we entangle all the services of our application into a huge ball of concrete. It's when we end up with our services entangled in a solid ball of concrete that we need to blow it up, all of it, in order to rewrite it. It doesn't matter how well we modeled our database, how nice our DAOs are or how much inversion of control we use: if we don't treat our services independently we will run into trouble down the road.

Decoupling the monolith into subsystems doesn't necessarily help either. If we still normalize our data and strive to reuse as much code as possible within the components, then all we have done is distribute the monolith. Chances are quite high that you will need to rewrite multiple components when the requirement change arrives.

Let's take an example.


We have a training application aimed at running and cycling. We have users, training sessions and races. Training sessions and races are really the same thing: they both contain a number of users, equipment, time, distance and a route. We provide views of user training sessions, user races and race results by race. We sell the application to race organizers and it's free for users. We have an agreement to keep the race results highly available and to keep all history from previous years.

So we have a simple data model: users and sessions with a many-to-many relationship, and a type defining whether a session is a race or a training session. Simple. Done. Delivered.
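
A sketch of that initial model (names are illustrative):

```python
# Illustrative version of the initial model: one Session type covers both
# races and training sessions, linked to users many-to-many.
from dataclasses import dataclass, field
from enum import Enum

class SessionType(Enum):
    TRAINING = "training"
    RACE = "race"

@dataclass
class User:
    user_id: int
    name: str
    equipment: list = field(default_factory=list)

@dataclass
class Session:
    session_id: int
    session_type: SessionType
    time_seconds: int
    distance_km: float
    route: str
    participants: list = field(default_factory=list)  # many-to-many with User
```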


Now the application becomes really popular as a training application among users, so we start accumulating a lot of data. This data is mostly written, since no one other than the user really cares about it. It does, however, impact our race data, since people tend to look at that more.

Someone realizes that all the training data is interesting, since we also added a heart rate integration. So we build queries on the training data to provide to medical studies. Sweet extra income that the sales dudes came up with. It's no real issue performance-wise, as we run the queries once a year and that's done over Christmas.

Now someone sells our race data, training and fitness trending services to the UCI (the cycling union) as a tool for their fight against doping. We just need to add a query to correlate our sweet training reports with race results; how hard can that be? We develop for a sprint or two and go live. So now we get serious tonnage of data and we run our queries more often. *gag* It doesn't work: we can't scale, and we can't add a new query without totally killing our SLAs with the other race organizers. We need to rewrite.

Components are not the silver bullet.

Components don't really help us
Having our system distributed into a user repository, session storage and an integration component providing REST services to our GUI component wouldn't help us all that much. Sure, we have separated users and their equipment from the sessions, but it's the queries on the sessions that are the problem, and they are killing our SLAs with the other race organizers.




Design by Services 

So what we really need is to move the race result service into a service of its own. We need to isolate it, even though all its data is identical to the race data as seen by the user. Then we need to separate the integration code for the race organizer service into a service of its own so that we can deploy it separately.

Services do help us
Doing this when hitting the wall is hard, costly and risky. Just the database split is a nightmare if the data has grown big.

If we had done this from the get-go, we could just have re-architected the user race and training session service. We could have moved that from our MySQL database to a Bigtable-style database or whatever without affecting our race-by-organizer service. But doing this upfront feels so awkward: we would have had duplicate tables and redundant code.
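
A sketch of what that upfront isolation could look like (invented names again): two services that happen to store nearly identical session data, each owning its own copy so that either one can be re-architected without touching the other:

```python
# Illustrative: the two services duplicate session data on purpose, so the
# race-results-by-organizer service keeps its SLA no matter how we rebuild
# the user-facing training/race service.
class UserSessionService:
    """User-facing training and race history; free users, write-heavy."""
    def __init__(self, store):
        self.store = store  # could move from MySQL to a Bigtable-style DB

    def record_session(self, user_id, session):
        self.store.save(user_id, session)

class RaceResultService:
    """Race results by organizer; paying customers, strict SLA, read-heavy."""
    def __init__(self, store):
        self.store = store  # its own storage, tuned for result queries

    def publish_result(self, race_id, result):
        self.store.save(race_id, result)
```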

Define and isolate services in the architecture.

If we focus on isolating services across our components instead of isolating technology, then we can actually re-architect our bottlenecks. In fact, in our example we could just have added a UCI service that duplicates the other services, and if it ran into performance issues we could have re-architected just that service. But that would have forced us to duplicate more upfront and to increase our initial development costs.
Services can be extremely similar and yet be different services

It's hard to "get it right" when the right can be against everything you have been thought for years. What we must learn to understand better is how we define and isolate services so that we can re-architecture our bottlenecks for the services that experience them and not the entire system.