Sunday, February 24, 2013

Architect to re-Architect

We spend so much time trying to make the right decisions. It's one of the downsides of working on a next generation platform. "You better get it right this time!". We have all been there when a current generation solution just doesn't cut it anymore. Implementing that next requirement is going to be so expensive that we might just as well rewrite the whole thing. Thing is they also tried to "get it right this time!".

Why does it "always" go wrong? Why do we always run into dead ends with systems. Sure not always but always when an application is exposed to a lot of changes and new requirements.

Select technology then abstract and isolate it in the architecture.

Historically we have put a lot of thought into selection of technology when we build something new. Its important to not get it wrong so we think a lot about getting it right. We also think a lot about patterns so that we can replace tech A with tech B if the decision has to be reversed. Who hasn't written hundreds of DAOs so that we one day can change our database. How often do we change database? Historically well I have never done it. Change from Oracle to DB2 or what ever other SQL database has never been the reason for a major rewrite. In fact I've been part of more then one rewrite that has thown out everything but the data layer.

In the future we will see more database changes due NoSQL but if and when we do that do we really want to keep our DAO interfaces? If we do then we sure ant going to accomplish much with our rewrite. If we change then we change because we need to solve a bottleneck problem. In order to solve it we need to make an optimization using a niche product. So we need to write and query our data differently.

The cause of a major rewrite is either lack of scale ability or customer requirements that are to hard to expensive or too high risk to implement. The later almost always happens when everything has become so interconnected that the change can no longer be done in a safe and isolated way. We need to refactor so much I order to make the change possible that its cheaper to rewrite.

Distribute system can still be a monolith.

In standard monolithic design we monolithized everything not just the components of the system but also the data model and the business logic. By normalizing our data model and constantly striving towards decreasing code redundancy we entangle all the services of our application into a huge ball of concert. It's when we end up with our services entangled in a solid ball of concert that we need to blow it up, all of it in order to rewrite it. It doesn't matter how well we modeled our database, how nice our DAOs are or how much inversion of control we use. If we don't treat our services independently we will run into trouble down the road.

Decoupling the monolith into subsystems doesn't necessarily help either. If we still normalize our data and strive towards reusing as much code as possible within the components then all we have done is distributed the monolith. Chances are quite high that you will need to rewrite multiple components when the requirement change appears.

Lets take an example.

We have a training application aimed towards running and cycling. We have users, training sessions and races. Training sessions and races are the same thing really they both contain a number of users, equipment, time, distance and a route. We provide views of user training sessions, user races and race results by race. We sell the application to race organizers and its free to users. We have an agreement to keep the race results highly available and to keep all history of previous years.

So we have a simple data model with users and sessions with a many to many relationship and a type defining if its a race or a training session. Simple. Done. Delivered.

Now the application becomes really popular as a training application among users so we start gaining a lot of data. This data is mostly written since no one else then the user really cares about it. Though it does impact on our race data since people tend to look at that more.

Someone realizes that all the training data is interesting since we also added a heart rate integration. So we build queries on the training data to provide to medical studies. Sweet extra income that he sales dudes came up with. It's no real issue performance wise as we run them once a year and that's done over Christmas.

Now someone sells our services of race data, training and fitness trending to UCI (cycling union) as a tool for their fit against doping. We just need to add a query to correlate our sweet training reports with race results, how hard can that be. We add the develop for a sprint or two and go live. So now we get serious tonnage of data and we run our queries more often. *gag* it doesn't work we can't scale and we can't add e new query without totally killing our SLAs with the other races. We need to rewrite.

Components are not the silver bullet.

Components dont really help us
Having our system distributed into a user repository, session storage and a integration component providing rest services to our GUI component wouldn't help us all at much. Sure we have separated users and their equipment from the sessions but its the queries on the sessions that is the problem and that they are killing our SLAs with the other race organizers.

Design by Services 

So what we really need is to move the race result service into a service of its own. We need to isolate it. Even though all the data is identical to the race data by the user. Then we need to separate the integration code for the race organizer service into a service of its own so that we can deploy it separately.

Services do help us
Doing this when hitting then wall is both hard, costly and risky. Just the database split is a nightmare if the data has grown big.

If we would have done this from the get go we could just have re architected the user race and training session service. We could have moved that from our MySQL to a big table database or what ever without affecting our race by organizer service. But doing this upfront feels so awkward we would have had duplicate tables and redundant code.

Define and isolate services in the architecture.

If we focus on isolating services across our components instead of isolating technology then we can actually re-architecture our bottlenecks. In fact in our example we could just added a uci services that duplicates the other services and if it would run into performance issues we could just  re-architectured it. But that would have forced us to duplicate more upfront and to increase our initial development costs.
Services can be extremely similar and
yet be different services

It's hard to "get it right" when the right can be against everything you have been thought for years. What we must learn to understand better is how we define and isolate services so that we can re-architecture our bottlenecks for the services that experience them and not the entire system.

No comments:

Post a Comment