Continuous Delivery: December 2012

Wednesday, December 26, 2012

Process Scaleability

When we started working on our continuous delivery process, our team was very small: three devs at two sites in different time zones. During the first six months we added two or three developers, so we were quite small for quite some time.

Then we grew very quickly, in about six months, to our current size of about thirty developers and eight or so testers. Obviously this caused huge issues with getting everyone up to speed. It exposed all the flaws in how we set up and handle our dev environment. But not only that: it also exposed issues with the scalability of our continuous delivery process.

With the increased number of developers, the number of code commits increased. Since we test everything on every code commit, our process started stacking test jobs. For each test type we had a dedicated server, so each deploy and the test jobs following it had to be synchronized, resulting in a single-threaded process. This didn't bother us when we were just a few committers, but when we grew it became a huge issue.

Dedicated test server being the bottleneck

The biggest issue we had was that the devs didn't know when to take responsibility. If the process scales, the time it takes for a commit to go through the pipe is the same regardless of how many commits were made simultaneously. Our pipe took about 25-30 min. A bit long, but bearable IF it had been the same for every commit. But since the process didn't scale, the time for a developer's check-in to go through was X*25 min, where X was the number of concurrent commits.
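To make the X*25 arithmetic concrete, here is a small Python sketch. It assumes all commits land at the same moment and every pipe takes exactly 25 minutes, which are simplifications; `feedback_time` is a hypothetical helper, not part of our tooling. It compares one dedicated server per test type with a pool of servers that can each run a whole pipe.

```python
import math

PIPE_MINUTES = 25  # approximate duration of one pipe run, from the post

def feedback_time(position: int, servers: int = 1, pipe_minutes: int = PIPE_MINUTES) -> int:
    """Minutes until the commit at 1-based `position` in the queue gets feedback.

    Assumption: all commits arrive together and each pipe takes exactly
    `pipe_minutes`. With `servers` parallel slots, commits run in "waves".
    """
    if position < 1 or servers < 1:
        raise ValueError("position and servers must be >= 1")
    wave = math.ceil(position / servers)  # which batch this commit runs in
    return wave * pipe_minutes

# Five simultaneous check-ins on a single-threaded pipe: the last one waits 125 min.
# With a pool of five servers every one of them gets feedback after 25 min.
```

The point is the shape of the curve: with a single thread the wait grows linearly with the number of concurrent commits, while with enough pooled servers it stays flat at one pipe duration.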

This was particularly bad in the afternoon, when developers wanted to check in before leaving. Sometimes a check-in could take two or three hours to go through, and obviously devs wouldn't wait it out before leaving. So we almost always started the day with a broken pipe that needed fixing. Worse yet, our colleagues in other time zones always had broken pipes during their day, and they usually lacked the competence to fix them.

Since the hardest thing with continuous delivery is training developers to take responsibility, it's key that taking responsibility is easy. Visibility and feedback are very important factors, but it's also important to know WHEN to take responsibility.

The solution was obviously to start working with non-dedicated test servers, though this was easier said than done. If we had had cloud nodes, this would have been a walk in the park. Spinning up a new node for each assembly, and hence having a dedicated test node per assembly, would scale very well. But our world isn't that easy. We don't use any cloud architecture, and our legacy organization isn't a very fast adopter of new infrastructure. This is quite common in large, old organizations and something we need to work around.

Our solution was to take all the test servers we had, put them into a pool, and assign each one to testing one assembly at a time.

Pipe 1 has to finish before any other thread can use that pooled server instance.

This solves scaling but creates another problem: we need to return servers to the pool. With cloud nodes you just destroy them when done and never reuse them. Since we do reuse them, we need to make sure that once a deploy starts on a pooled server, all the test jobs get to finish before the next deploy starts.
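The pooling rule above can be sketched in a few lines of Python. This is a sketch of the idea, not the Jenkins-based setup we ended up using. `ServerPool`, `run_pipe` and the host names are hypothetical. A server is leased for the whole pipe (deploy plus every test job) and only goes back into the pool when the pipe is done, so no other pipe can deploy onto it in between.

```python
import queue
from contextlib import contextmanager

class ServerPool:
    """Hypothetical pool of reusable test servers (illustration only)."""

    def __init__(self, hosts):
        self._free = queue.Queue()
        for host in hosts:
            self._free.put(host)

    @contextmanager
    def lease(self, timeout=None):
        # Blocks until a server is free; raises queue.Empty if the timeout expires.
        host = self._free.get(timeout=timeout)
        try:
            yield host
        finally:
            # Returned only after the whole pipe has finished, even if a step failed.
            self._free.put(host)

    def available(self):
        return self._free.qsize()

def run_pipe(pool, commit, steps):
    """Run every step of one pipe (deploy, then tests) against the same leased server."""
    with pool.lease() as host:
        return [step(host, commit) for step in steps]

pool = ServerPool(["test-01", "test-02"])
steps = [
    lambda host, commit: f"deploy {commit} to {host}",
    lambda host, commit: f"functional tests on {host}",
]
result = run_pipe(pool, "abc123", steps)
```

Returning the server in a `finally` block matters: a pipe that fails halfway must still give its server back, or the pool slowly drains.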

We were quite uncertain how we wanted to approach the pooling. Did we really want to build some sort of pool manager of our own? We really, really didn't, because we felt there had to be some kind of tool that already does this.

Then it hit us. Could we do this with Jenkins slaves? Could our pool of test servers be Jenkins slaves? Yes, they could! Our deploy jobs would just do a localhost deploy, and our test jobs would target localhost instead of the IP of a test server.

The hard part was figuring out how to keep a pipe on the same slave and not have another pipe hijack that slave between jobs. But we finally found a setup that worked for us, where an entire pipe is executed on the same slave and Jenkins blocks that slave for the duration of the pipe.

As of writing this post, we are just about to start reconfiguring our jobs to set this up. Hopefully, when we have this fully implemented in two weeks or so, we will have a process that scales. For our developers this will be a huge improvement, as they will always get feedback within 25 min of their check-in.

Monday, December 17, 2012

Test stability

The key to a good Continuous Delivery process is a stable regression suite. There are a few different types of instability one can encounter:

• Lack of robustness
• Random test results
• Fluctuation in execution time

Lack of robustness often comes from incorrectly defined touch points. If a test needs to include a lot of functionality in order to verify a small detail, it's bound to lack robustness. Even worse is a test that directly touches the implementation.

Adding extra interfaces to create additional Touch Points.

I'm sure everyone has broken unit tests when refactoring without actually breaking any functionality, for example when just splitting methods to clean up responsibilities. This almost always happens when unit tests evaluate implementation and not functionality.
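A minimal Python illustration of the difference, using a hypothetical `Order` class that is not from our code base. The test pins the public behavior (`total()`), so splitting, renaming or inlining the private helper `_line_total` cannot break it. A test that called `_line_total` directly would break on exactly that kind of refactoring.

```python
class Order:
    """Hypothetical domain object used only for this illustration."""

    def __init__(self, lines):
        self._lines = lines  # list of (unit_price, quantity) tuples

    def total(self):
        # Public behavior: this is what tests should verify.
        return sum(self._line_total(price, qty) for price, qty in self._lines)

    def _line_total(self, price, qty):
        # Implementation detail: free to be split, renamed or inlined.
        return price * qty

def test_total_is_sum_of_lines():
    # Touches only the public interface, never _line_total.
    assert Order([(10, 2), (5, 1)]).total() == 25
```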

This doesn't only happen to unit tests, though; it can happen just as easily, if not more easily, to functional acceptance tests. If they, for instance, touch the database in validation, instrumentation or data setup, you will break your tests every time you refactor your database.

Testers seem to be very keen on verifying against the database. This is understandable, since in an old monolithic system with manual verification you only have two touch points: the GUI and the DB. It's important to work with test architecture and to educate both developers and testers to prevent validation of implementation.

Another culprit is dependencies between implementation and test code, for example Fitnesse fixtures that directly access model objects, DAOs or business logic implementation.

Tests must be robust enough to survive a refactoring and addition of functionality. Change of functionality is obviously another matter.

We actually managed this quite well with our touch points.

Random test failures are a horrible thing, because they result in either unnecessary stoppages of the line or people losing respect for the color red.

This is where we have had most of our issues. At one point our line was failing 8 out of 10 times it was executed, and at least 7 of these were false failures (red builds with no real bug behind them). "Just run 'em again" was the most commonly heard phrase. Obviously the devs totally lost respect for the color red. The result was that when, now and again, a true bug caused the failures, no one cared to fix it, and the bugs started stacking up.

Our Jenkins job history for the functional tests could look like this.

So why did we have these problems? First of all, our application is heavily asynchronous, so timing is a huge issue in our tests. Many of our touch points are asynchronous triggers that fire off some work in the application; we then wait for another trigger from the application back to the test before we validate. We don't have much architectural room here, as it's an industry-standard pattern. In itself this isn't a big deal, except that each asynchronous task schedules other asynchronous tasks, so a lot of things happen at the same time.
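Here is a hedged Python sketch of the "trigger, then wait for the application to call back" touch point. The class and function names are hypothetical, and the application is simulated with a timer thread. The test blocks on an event with an explicit timeout instead of sleeping a fixed time. A fast callback gives fast feedback, and a missing callback fails after a known, bounded time.

```python
import threading

class CallbackListener:
    """Hypothetical test-side endpoint that the application calls back when done."""

    def __init__(self):
        self._event = threading.Event()
        self.payload = None

    def on_callback(self, payload):
        # Called by the application (simulated below on a timer thread).
        self.payload = payload
        self._event.set()

    def await_callback(self, timeout_s):
        # Returns True as soon as the callback arrives, False if it never does.
        return self._event.wait(timeout_s)

def trigger_async_job(listener, delay_s=0.05):
    # Stand-in for the asynchronous trigger that starts work in the application.
    timer = threading.Timer(delay_s, listener.on_callback, args=({"status": "DONE"},))
    timer.start()
    return timer
```

Usage: call `trigger_async_job(listener)`, then `assert listener.await_callback(timeout_s=2.0)` before validating `listener.payload`.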

Since it's desirable to keep execution times short, we reconfigure the system at test time to use much shorter timeouts and delays. This further increases the amount of simultaneous work that happens for each request.

When all these simultaneous threads hit the same record in one table, you get transactional issues. We had solved this through the use of optimistic locking, so we had a lot of rollbacks and retries. But it "worked". Our execution times, however, were very unpredictable, and since our tests were sensitive to timing, they failed randomly.
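To show why optimistic locking on a hot row makes timing unpredictable, here is an in-memory Python sketch. It is not our persistence code: `Row` models a table row with a version column, and `compare_and_set` models an `UPDATE ... WHERE version = ?`. Every conflicting writer has to re-read and retry, so under contention the number of retries, and therefore the latency, varies from run to run even though the final result is correct.

```python
import threading

class VersionConflict(Exception):
    pass

class Row:
    """In-memory stand-in for a DB row with a version column (illustrative only)."""

    def __init__(self, value=0):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()  # models the atomicity of a single UPDATE

    def read(self):
        with self._lock:
            return self.value, self.version

    def compare_and_set(self, new_value, expected_version):
        with self._lock:
            if self.version != expected_version:
                raise VersionConflict()  # someone else wrote since our read
            self.value = new_value
            self.version += 1

def increment_with_retry(row, max_retries=100):
    """Optimistic update: read, compute, write only if nobody wrote meanwhile."""
    retries = 0
    while True:
        value, version = row.read()
        try:
            row.compare_and_set(value + 1, version)
            return retries  # how many times this update had to start over
        except VersionConflict:
            retries += 1  # each retry adds latency the test cannot predict
            if retries > max_retries:
                raise
```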

But really, did we congest our test scenarios so much that this became a problem? How on earth could we hit the optimistic lock so often that 7 out of 10 regression runs failed because of it?

Weren't the tests actually trying to tell us something? Eventually we found that our GET requests were creating transactions, because an incorrectly instantiated empty list marked an object as new. We also changed our pool sizing, since we were exhausting the pool, which caused a lot of waiting.

So we had a lot of bugs that we blamed on the nature of our application.

Listen to your tests!! They speak wise things to you!

Eventually we refactored that table everyone was hitting so that we don't do any updates on it but rather track changes in a detail table. Now we didn't need any locking at all. Sweet free feeling. Almost like dropping in alone on a powder day. ;)
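A rough Python sketch of the append-only idea, assuming the hot row held some kind of running state. The post does not say what that table contained, so `Change`, `ChangeLog` and `entity_id` are hypothetical. Writers only insert change rows and never update a shared row, so there is no version check and no retry. The current state is derived on read.

```python
import threading
from dataclasses import dataclass

@dataclass(frozen=True)
class Change:
    entity_id: str
    delta: int

class ChangeLog:
    """Illustrative append-only 'detail table': writers insert, nobody updates."""

    def __init__(self):
        self._rows = []
        self._lock = threading.Lock()  # models a plain INSERT; no version column

    def record(self, entity_id, delta):
        with self._lock:
            self._rows.append(Change(entity_id, delta))

    def current(self, entity_id):
        # The value that used to live in the contended row is computed on read.
        with self._lock:
            return sum(c.delta for c in self._rows if c.entity_id == entity_id)
```

The trade-off is that reads do more work than a single-row lookup, which is usually acceptable when write contention was the problem.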

Fluctuation in execution time. I find it just as important that a test executes equally fast every time as it is that it always gives the same result.

First of all, a failed test should be faster than a success: give me feedback ASAP! Second, if a test takes 15 sec to run, it should always take 15 sec, not 5 or 25 at times. This is important when building your suites. You want your very fast smoke test suites, your bang-per-minute short suites and your longer-running suites, and being able to easily group tests by execution time is important.
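As a sketch of grouping by time, the Python function below buckets tests into suites by measured duration. The thresholds (5 s for smoke, 60 s for short) and the test names are made-up assumptions. The grouping is only meaningful if each test's duration is stable from run to run, which is the point of the paragraph above.

```python
def group_by_duration(durations, smoke_max_s=5.0, short_max_s=60.0):
    """Bucket tests into suites by measured duration (seconds).

    The thresholds are hypothetical defaults, not values from our pipe.
    """
    suites = {"smoke": [], "short": [], "long": []}
    # Sort so each suite lists its fastest tests first.
    for name, seconds in sorted(durations.items(), key=lambda item: item[1]):
        if seconds <= smoke_max_s:
            suites["smoke"].append(name)
        elif seconds <= short_max_s:
            suites["short"].append(name)
        else:
            suites["long"].append(name)
    return suites
```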

It's also important to be able to do trend analysis on suite execution time. It's a simplistic but effective way to detect performance degradation.
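A simplistic trend check of the kind described can be sketched like this in Python. The window of 10 runs and the 20% tolerance are arbitrary assumptions. The function flags a suite run that is noticeably slower than the average of recent runs.

```python
from statistics import mean

def is_slowdown(history_s, latest_s, window=10, tolerance=0.2):
    """Flag `latest_s` if it exceeds the mean of the last `window` runs by more
    than `tolerance` (20% by default; both values are assumptions)."""
    recent = history_s[-window:]
    if not recent:
        return False  # no baseline yet
    baseline = mean(recent)
    return latest_s > baseline * (1 + tolerance)
```

A check like this only works if execution times are stable to begin with. With noisy suites, the tolerance has to be so wide that real slowdowns slip through.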

We have actually nailed quite a few performance bugs this way.

An asynchronous architecture makes everything harder to test, but don't use it as an excuse. Random behavior is tricky and frustrating in a pipe, but remember: TRUST YOUR TESTS!

Wednesday, December 12, 2012

Deploy scripts, how hard can it be?

Obviously an integral part of a Continuous Delivery process is to get the artifacts deployed. The same deployment procedure should be used for every deploy. Every deploy means the deploy for every component test, functional acceptance test, load test, rollback test, user acceptance test and yes production as well.

For us this presented our first big challenge. We wanted the same deployment mechanism for all deployments, both into environments owned by the development organization and into production, owned by operations. Yes, we are a legacy NoDevOps organization, and there is no changing that anytime soon; I'll cover this more in another post. Basically, each project deals with deploy scripts in its own way: some have more or less the same scripts for all environments, some don't. We wanted to change this.

We wanted two things: the same mechanism for all deploys, and deploys triggered the same way. This meant that we had to agree on how to do it with our operations department. It also basically ruled out any sort of third-party tooling like Chef or whatever; we felt we didn't have the leverage or the mandate to push a tool on them. We were crossing our fingers that they wouldn't shoot down our proposal to trigger deploys from Jenkins.

They actually didn't shoot down using Jenkins, but they forced us to set up a Jenkins instance that developers wouldn't have access to. As in most legacy organizations, devs aren't allowed to touch a production deploy. They also had some really good input on our initial rudimentary deploy scripts.

We had written some rudimentary scripts early on just to get something deploying. These were quite non-generic, hard-coded bash scripts. They handled the transfer of artifacts from Nexus to the target server, running the Liquibase scripts, and restarting the JBoss and Mule servers.

So we had the input from operations on how to make the scripts better (required before we could use them for prod deploys) and our own desire to make the scripts generic. How hard can it be? It's just moving some WARs and other files to a server, putting them in the right place and then restarting some services.
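One way to picture "generic" is a single function that builds the same ordered plan for every environment from a component descriptor. Only the target host varies. This Python sketch mirrors the three steps our scripts performed: transfer from Nexus, Liquibase, restart. The descriptor fields and step wording are hypothetical, and the function only returns step descriptions rather than running any real Nexus, Liquibase or JBoss commands.

```python
from dataclasses import dataclass

@dataclass
class Component:
    """Hypothetical component descriptor; field names are illustrative only."""
    name: str
    version: str
    artifacts: list       # e.g. ["orders.war"]
    runtime: str          # e.g. "jboss" or "mule"
    has_db_changes: bool = False

def deploy_plan(component, target_host):
    """Ordered deploy steps; identical for every environment except the host."""
    steps = []
    for artifact in component.artifacts:
        # 1. Transfer artifacts from the repository manager (Nexus in the post).
        steps.append(f"fetch {artifact} {component.version} from Nexus to {target_host}")
    if component.has_db_changes:
        # 2. Apply database migrations (Liquibase in the post).
        steps.append(f"run Liquibase changelog for {component.name} {component.version}")
    # 3. Restart the runtime so the new artifacts are picked up.
    steps.append(f"restart {component.runtime} on {target_host}")
    return steps
```

Standardized packaging is what makes a descriptor like this possible. Without it, each component needs its own special case, which is how copy-pasted scripts multiply.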

We decided that we first wanted to make the changes operations required, as that would allow us to deploy our first delivery to production. The problem was that while we were doing this, we got more deliveries and more components that needed deploying to component test servers. So we ended up with a huge set of copy-pasted scripts in different stages of development. This bag of copy-pasted scripts required maintenance and slowed our development of the main-line scripts.

This is really a sad story in our development. It took us several months to get the scripts rewritten to match the requirements of operations. By the time this was done, we so desperately needed our generic scripts that we had to throw out the mudball our scripts had turned into and rewrite them again. This time it went much faster, a few weeks, but migrating all the other components and deliveries that used different versions of the old scripts took quite some time as well.

Here are a few reasons we got into this mess (other than not having a DevOps organization):

1. We didn't have a standardized packaging strategy for our components and deliveries, making it hard to generalize scripts.
2. Having a script developer who was good with Linux but had no JBoss, Mule or Liquibase skills, AND not pairing him with a developer who had these skills.
3. Starting too late with the rewrite of the rudimentary scripts. We knew for a few months what we had to do but didn't do it.
4. Last but not least leaving rollback mechanisms and rollback pipe building to the last minute.

Get your packaging sorted out early and get your deployment mechanism in place well in time before first production deployment. Also have your rollback strategy sorted out and in testing as part of your pipe early, well before production. I'll cover rollback in another post.

So no, deploy scripts aren't that hard, if you do things in the right order and don't kill yourself by making mistakes on all levels.

Saturday, December 8, 2012

The impact of Continuous Delivery on the role of the Tester

Continuous Delivery really does change the way we work. It's not just tools and processes. We know that because every talk and every blog says so, but when you really see it happen, it's quite interesting.

The change in the level of responsibility required of each developer has been one of the hardest things to manage. Don't check shit in or you will break the pipe! Don't just commit and run, it's your responsibility that the pipe is green! All that is frustrating and has been the single most time-consuming part of our journey, but it's worth a post of its own.

What I'd like to focus on is the change to the role of the tester. Previously we have worked mostly with manual, GUI-driven testing. Our testers have tons of domain knowledge and know our systems inside out. We have worked with test automation in some projects, and I have some test automation experience from the past. I've been trying to champion test automation for years, but we have had a hard time getting it off the ground. When we have, it hasn't been test automation but rather automated tests that require some kind of tester supervision.

In our current project, as I've written about in previous posts, we decided to go all in. When we started, we were a team of just architects, and I was leading the work on the test automation. We got really good results working a lot with test architecture and testability architecture in our application. But after some time we started to suffer from a lack of tester input in our testing. We ended up, predictably, with a suite very focused on happy-path scenarios.

So we started to bring testers onto our project, but what profiles should we look for? At first we looked for our notion of what a tester was and had always been in our projects. We happily took on testers with experience of testing and test-managing portals, order systems, corporate websites etc. But our delivery at this time didn't have a GUI; all testing was done using Fitnesse. We didn't really get the interaction between dev and test that we were hoping for.

We did find use for our testers in partner integration testing, which was manual (using a REST client). But it didn't really work well, because they didn't know the interfaces or the application, as they hadn't worked with the test automation.

We have been having this same experience for over a year. Our testers don't seem to be able to get involved in our test automation. But we do have a few who are and we are super lucky to have found them because we have not really had much of a clue when recruiting.

Our developers, who are very modern and in a lot of cases super interested in test automation, have constantly been bashing us about our choice of test tools. As I talked about in my previous post, they totally refused SoapUI and aren't all that fond of Fitnesse either, even though they prefer it a lot over SoapUI. But my take has always been "we need testers to feel comfortable with the tool, it's their domain, let them pick".

After we decided to move to REST Assured, I started to realize a problem I had failed to grasp in the six years or so I've been working with test automation. There are two kinds of testers: GUI testers and technical testers. A GUI tester will always struggle with test automation; he/she has little or no experience or education in coding. The technical tester has a background as a developer or started developing out of an interest in automation.

Still, even the technical testers we have who have experience with test automation have had a transition period coming into the continuous world. Testers do tend to accept manual steps that aren't acceptable in a continuous world. It's not OK to just quickly verify this bug manually because it's so simple and changing the test case that had a gap is a lot of work. It's not OK to just add that user into the DB manually. It's not OK to verify against the DB.

In the past our demand for GUI-clicking testers has been high, and our demand for technical testers has been low. In our old non-CD world, at least, the TestPyramid, http://martinfowler.com/bliki/TestPyramid.html, was upside down.

The Continuous Delivery expansion will drive a shift in what we look for in testers. Our tester demographics will move towards matching the pyramid. We will still need the GUI testers, but their work will move more towards requirements gathering and acceptance testing. And I think we will see a new group of testers, with a much stronger developer background, come in and work with the automation.

This new group of technical testers or developers with super high understanding of testing is still hard to find but I hope we will see more of them.

Picking the tool for Component Testing

As I discussed in the previous post, we realized we needed to test our components, not just unit test code and do functional testing. The biggest reason was that we were cluttering our functional tests with validation that didn't belong at that level. That clutter was increasing our test maintenance costs, something we knew from the past we had to avoid.

Since our application is based on REST services exposed by components with very well-defined responsibilities, it's very well suited for component testing.

The responsibility for our components was clearly defined. A component test is a black-box test with the responsibility of validating the functionality and interfaces of the component. This definition was something of a battle, as our testers (in our project and others in our company) somehow seem to insist on validating stuff in the database. I don't like being unreasonable, but this was one thing I felt I had to be unreasonable on: black box means get the f*** out of our database. I still feel a bit bad about overriding our test managers on this, but I still feel strongly that it had to be done. So all tests go through public or administrative interfaces on our components. The reason is obviously that I wanted our tests to support refactoring and not break when we change code.
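Here is a hedged sketch of a black-box component test in Python. The component, its methods and the status codes are hypothetical. A real component test would send HTTP requests to the component's REST endpoints; the in-memory class only keeps the sketch runnable. Setup and verification go through the public interface and an administrative read interface, and the private `_db` storage is never touched by the test.

```python
class UserComponent:
    """Hypothetical stand-in for a component; `_db` is private and off-limits to tests."""

    def __init__(self):
        self._db = {}

    # Public interface (what a REST endpoint would expose).
    def register(self, first, last, email):
        if "@" not in email:
            return 400, {"error": "invalid email"}
        self._db[email] = {"first": first, "last": last}
        return 201, {"email": email}

    # Administrative interface: lets tests (and later admin tools) inspect state.
    def admin_get_user(self, email):
        user = self._db.get(email)
        return (200, dict(user, email=email)) if user else (404, None)

def component_test_registration(component):
    # Black box: everything goes through the component's own interfaces.
    status, body = component.register("Joe", "Doe", "jd@test.mail.xx")
    assert status == 201 and body == {"email": "jd@test.mail.xx"}
    status, user = component.admin_get_user("jd@test.mail.xx")
    assert status == 200 and user["first"] == "Joe"
    status, _ = component.register("Joe", "Doe", "not-an-email")
    assert status == 400
```

Because the test never reads `_db`, the storage can be restructured freely as long as the public and admin interfaces keep their contract.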

The upside on forbidding database access was that we had to make a few administrative interfaces that we could eventually use for building admin tools.

So what tools should we use? We were using Fitnesse for our functional testing, but Fitnesse's biggest weakness is that it doesn't separate test flow from test data. With functional testing this isn't really a big issue, as each test is basically a flow of its own. But component testing is by nature more detailed, and we saw that we would get many more test cases per flow. Another weakness is that Fitnesse doesn't go all that well with validating large XML/JSON documents.
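Separating flow from data means writing the flow once and feeding it a table of cases. The Python sketch below shows the idea. The cases, expected status codes and `fake_register` are hypothetical. In a Java setup, a JUnit parameterized test or similar would play the same role.

```python
# Test data: one row per case; the flow below is written only once.
REGISTRATION_CASES = [
    # (first, last, email, expected_status)
    ("Joe", "Doe", "jd@test.mail.xx", 201),
    ("Ann", "Roe", "missing-at-sign", 400),
    ("", "Doe", "empty.first@test.mail.xx", 400),
]

def registration_flow(register, first, last, email):
    """Reusable flow; `register` is whatever calls the component under test."""
    status, _body = register(first, last, email)
    return status

def run_cases(register, cases):
    """Run the flow for every data row and collect mismatches."""
    failures = []
    for first, last, email, expected in cases:
        actual = registration_flow(register, first, last, email)
        if actual != expected:
            failures.append((email, expected, actual))
    return failures

def fake_register(first, last, email):
    # Hypothetical stand-in for the real REST call, so the sketch runs.
    if not first or "@" not in email:
        return 400, None
    return 201, {"email": email}
```

Adding a new case is one line of data, not a new copy of the flow.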

We do our GUI testing with a Selenium/Fitnesse combo, and that we would continue doing. But for our REST service testing our first choice was SoapUI. We prototyped a bit, decided we could accomplish what we wanted with it, and started building our tests. This was the single biggest mistake we made with our continuous delivery process.

Back in the days when we just did functional tests for our deliveries in Fitnesse, we had a nice process of test-driven development. Our developers actively worked on developing test cases and fixtures, and the tests went green as the code hit done. I really like functional-test-driven development and think it is the best form of TDD. This went totally down the drain when we started using SoapUI for our component tests.

Our developers refused to touch SoapUI and started handing over functionality to testers for testing after they had coded it. This resulted in total chaos, as we got a lot of untested check-ins. Back-loaded testing works extremely badly with a continuous delivery process, especially since we didn't use feature flags.

This put us in an interesting dilemma. Do we choose a test tool that testers feel comfortable working with, or a tool that developers like? I'm personally not very religious when it comes to tools; if it does the job, I don't care so much. But I've always been of the opinion that test tools are for testers and it's up to them; devs need to suck it up and contribute. Testers always seem to like clicky tools, so I wasn't surprised that they wanted to use SoapUI.

We were sitting with a broken test process, and our devs and testers were no longer working together. Fortunately our testers came to the same conclusion and realized this tool was a dead end for us. The biggest killer was how badly suited the tool was for teamwork and versioning, even the enterprise edition. Each and every check-in caused problems, and you basically need to check in all your suites for each reference you change. Horrible.

So after wasting three to four months on SoapUI, growing our test department and killing our dev process, we decided to switch to REST Assured. For some of our components this is a definite improvement, and it's definitely an improvement process-wise, as our developers are happy to get involved. But I do still see some possible issues on the horizon with this choice. The biggest change, though, is for our testers and how we as an organization view the tester role, and that will be the topic of my next post.

One very nice thing, though: our continuous delivery process is Maven-based, so the change of tooling didn't affect it. Each test suite is triggered with mvn verify, and as long as a tool has a Maven plugin, we don't really care what tool teams use for their components.

Tuesday, December 4, 2012

Automated System Testing

As I described earlier, we decided that we wanted to prioritize automation of our functional system testing. The trickiest thing for us was finding the right abstraction level for these tests. In previous projects where we worked with Fitnesse as a test tool, we failed to find a good abstraction level.

We have always ended up with tests that are super technical in nature and cluttered with details that are very implementation specific. The problem this has given us is that very few people understand the test cases. Testers especially tend to have problems understanding them, as they are so technical. If our testers can't work with our tests, then we do have some sort of problem (quite a few, and I will get to them later).

This time around we set out to raise the level of abstraction in the tests, making them aimed more towards testing requirements and less towards testing technology. We were hoping to achieve two things: less maintenance and more readability. The maintenance budget for our automated tests had been high in the past. The technical tests exposed too much implementation, and hence most refactoring resulted in test rewrites. This was shit, because our tests didn't help to secure our refactoring. This was something we had to address as well.

The level we went for was something like this (in pseudo-Fitnesse):

|Register user with first name|Joe|last name|Doe|and email|jd@test.mail.xx|
|verify registration email|

We abstracted out the interfaces from the test cases and just loosely verified that things went right.

This registration test could be testing registration through the web portal or directly on the REST service. This lowered maintenance drastically and gave us room to refactor our application without affecting the test cases. But we were still having trouble getting our testers to write test cases that worked and made sense. The example above is obviously simplified and out of context; our test cases were quite long and complex. The biggest problem was knowing which fixture method to use and how to design new ones at the right abstraction level.
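The channel-neutral abstraction can be sketched in Python as a driver interface that the test talks to, with one implementation per channel. The names are hypothetical, and the "drivers" record emails in memory instead of calling the REST service or driving the portal (for example via Selenium). The test body mirrors the two Fitnesse rows and does not know which channel it runs through.

```python
from abc import ABC, abstractmethod

class RegistrationDriver(ABC):
    """Hypothetical channel-neutral fixture API used by the test."""

    @abstractmethod
    def register_user(self, first, last, email): ...

    @abstractmethod
    def sent_emails(self, email): ...

class FakeRestDriver(RegistrationDriver):
    # A real driver would call the REST service; an in-memory outbox keeps this runnable.
    def __init__(self):
        self._outbox = []

    def register_user(self, first, last, email):
        self._outbox.append((email, "registration"))

    def sent_emails(self, email):
        return [kind for to, kind in self._outbox if to == email]

class FakePortalDriver(FakeRestDriver):
    # A real driver would click through the web portal; same observable behavior here.
    pass

def check_register_user(driver):
    # |Register user with first name|Joe|last name|Doe|and email|jd@test.mail.xx|
    driver.register_user("Joe", "Doe", "jd@test.mail.xx")
    # |verify registration email|
    assert "registration" in driver.sent_emails("jd@test.mail.xx")
```

The same test runs unchanged against either driver, which is what bought us the lower maintenance. It is also why the REST interface itself goes unverified at this level.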

This exposed the need for something we really hadn't thought much about before: test architecture. We were really (and still are) lacking a test architect. Defining abstraction layers, defining test components, reusability of test components, handling of test data, etc. All of these go badly wrong when done by testers alone, and aren't good enough when done by developers alone either.

Another problem we ran into with this abstraction level was that our testers didn't know our interfaces. The test doesn't really care about the interface, so the tester doesn't need to take responsibility for the REST interface. IMHO the responsibility for securing the quality of a public REST interface should lie with the testers. But here we give them tools to secure only the registration functionality. Yes, they are verifying the requirement to be able to register, but the verification of the REST interface is implicit and left to the fixture implementation. This is not good.

It also generated a huge problem for us. We had integration tests with our partners who consume our REST interfaces, and our testers who were in those sessions didn't know our interfaces. When should they have learned them? This was the first time they were exposed to them. They were also required to use tooling they didn't otherwise use for testing: REST clients of whatever flavor.

We had solved one thing but created another problem.

Had we abstracted too much? Was this the wrong level to test on?

Before we analyzed this we started to compensate for these issues by just jamming in more validations.

!define METHOD {POST}
|Register user by sending a |REST_REQUEST|${METHOD}| with first name|Joe|last name|Doe|and email|jd@test.mail.xx|
|RESPONSE_CODE|200|
|verify registration email|

Still a fictitious example, but I'm trying to illustrate how it went downhill. We basically broke our intended abstraction layer and turned our tests into shit. But it got worse. Our original intent was that registration is registration regardless of channel, web portal or REST. But then we had to have different templates for different users, say based on gender. So we took our web portal test case, made it register a female user and verified the template by string matching, and then we used the REST interface for the guys.

Awesome! Now we made sure that we got an HTTP 200 on our REST response, and we made sure that we used pink templates for the guys and green ones for the gals. Awesome, and we made sure to test both our web interface and our REST interface. Sweeet!!! We had covered everything!

Well maybe not.

This is when we started to think again and our conclusion was that the abstraction layer of the initial tests was actually quite ok.

|Register user with first name|Joe|last name|Doe|and email|jd@test.mail.xx|
|verify registration email|

This tests a requirement. It's a functional test. It does have value. Leave it at that! But it can't be our only test case for registration.

Coming back to this picture

Our most prioritized original need was to have regression tests on the functional system, end to end, for our delivery. We had that. We also had unit tests on the important mechanisms of the system: not on everything, and yes, too few, but on the key parts. Still we were lacking something, and that something was component/subsystem end-to-end tests. Deep down we always knew we would have to build these, but we had ignored them for quite some time due to "other priorities". I still think we made the right call back then, when we prioritized, but it has been the most costly call of our entire journey.

So we started two tasks: retrofitting component tests and cleaning up our system tests. More on how that went, and how we decided to abstract these, in the next post.

About Me

I've worked as a Java Developer/Architect for 15 years. I've worked as part of a consulting organization and as part of a line organization.

Over the last 6 years I've had an ever-increasing interest in the quality of the delivery. Initially this interest led me to work with automation of system tests, then more and more towards automation of release and deploy processes. For the last two years I've focused a lot of my work on the full Continuous Delivery process.

This blog will serve as a collection of lessons learned from my work. Mostly just for myself, but I'm happy to share my experiences if anyone is interested.

Follow @TomasRihaSE
