Last week I attended QCon New York. It was a fantastic conference as usual, and it was comforting to see that basically everyone was saying the same thing: "Continuous Delivery is not about the technology, it's about the people". That also happens to be the title of my talk at Netlight's EDGE conference in September.
In his talk Steve Smith (@agilestevesmith) talked about how 5% is technology and 95% is organization. While I agree with that, I think the non-technical 95% can be divided into organization, change of role definitions and individual maturity. It's these three that my talk will cover.
Hopefully I will be able to give this talk in Gothenburg as well, as it's been submitted to JDays.
A blog sharing experience of working with Continuous Delivery, Test Driven Development, Architecture and Agile Methodologies.
Tuesday, June 18, 2013
Monday, April 8, 2013
Talk at HiQ 24th of April
Continuous Delivery - Enabling Agile.
The key to agile development is a fast feedback loop. Continuous Delivery strives towards always having tested releases in a deliverable state. Continuous Delivery is not just a technical process but a change to the entire organization and the individuals within it. This presentation describes the principles of Continuous Delivery, gives a brief overview of how it can be implemented, and covers how it changes the organization and how it impacts the individuals.
The target audience for this presentation is Developers, Architects, Testers, Scrum Masters, Project Managers and Product Owners, in no particular order. The presentation is not rich in technical detail and is based on real-life experiences.
Please use this post to provide questions and feedback.
Welcome
Sunday, February 24, 2013
Architect to re-Architect
We spend so much time trying to make the right decisions. It's one of the downsides of working on a next generation platform. "You better get it right this time!" We have all been there when a current generation solution just doesn't cut it anymore. Implementing that next requirement is going to be so expensive that we might just as well rewrite the whole thing. The thing is, the people who built the current solution also tried to "get it right this time!".
Why does it "always" go wrong? Why do we always run into dead ends with systems? Sure, not always, but always when an application is exposed to a lot of changes and new requirements.
Select technology then abstract and isolate it in the architecture.
Historically we have put a lot of thought into the selection of technology when we build something new. It's important not to get it wrong, so we think a lot about getting it right. We also think a lot about patterns so that we can replace tech A with tech B if the decision has to be reversed. Who hasn't written hundreds of DAOs so that we can one day change our database? How often do we change database? Historically, well, I have never done it. A change from Oracle to DB2 or whatever other SQL database has never been the reason for a major rewrite. In fact I've been part of more than one rewrite that has thrown out everything but the data layer.
In the future we will see more database changes due to NoSQL, but if and when we do that, do we really want to keep our DAO interfaces? If we do, then we sure aren't going to accomplish much with our rewrite. If we change, we change because we need to solve a bottleneck problem. In order to solve it we need to make an optimization using a niche product, so we need to write and query our data differently.
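For readers who haven't written hundreds of them, here is a minimal sketch of the kind of DAO discussed above. Every name in it (`TrainingSession`, `TrainingSessionDao`, `InMemoryTrainingSessionDao`) is made up for illustration, and a map stands in for the SQL database so that the example runs on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical domain object, invented for this illustration.
final class TrainingSession {
    final long id;
    final long userId;
    final double distanceKm;

    TrainingSession(long id, long userId, double distanceKm) {
        this.id = id;
        this.userId = userId;
        this.distanceKm = distanceKm;
    }
}

// The classic DAO: callers see only this interface, never the database.
interface TrainingSessionDao {
    void save(TrainingSession session);

    TrainingSession findById(long id);

    List<TrainingSession> findByUserId(long userId);
}

// Stand-in implementation backed by a map instead of a real database.
final class InMemoryTrainingSessionDao implements TrainingSessionDao {
    private final Map<Long, TrainingSession> rows = new HashMap<>();

    @Override
    public void save(TrainingSession session) {
        rows.put(session.id, session);
    }

    @Override
    public TrainingSession findById(long id) {
        // Returns null when no row has this id.
        return rows.get(id);
    }

    @Override
    public List<TrainingSession> findByUserId(long userId) {
        List<TrainingSession> result = new ArrayList<>();
        for (TrainingSession s : rows.values()) {
            if (s.userId == userId) {
                result.add(s);
            }
        }
        return result;
    }
}
```

The interface is shaped like row lookups by key and by foreign key. Swapping one SQL database for another fits behind it nicely. A move made to remove a bottleneck usually wants different questions asked of the data, and then the interface is the first thing that has to change.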
The cause of a major rewrite is either lack of scalability or customer requirements that are too hard, too expensive or too high risk to implement. The latter almost always happens when everything has become so interconnected that the change can no longer be done in a safe and isolated way. We need to refactor so much in order to make the change possible that it's cheaper to rewrite.
A distributed system can still be a monolith.
In standard monolithic design we monolithized everything: not just the components of the system but also the data model and the business logic. By normalizing our data model and constantly striving towards decreasing code redundancy we entangle all the services of our application into a huge ball of concrete. It's when we end up with our services entangled in a solid ball of concrete that we need to blow it up, all of it, in order to rewrite it. It doesn't matter how well we modeled our database, how nice our DAOs are or how much inversion of control we use. If we don't treat our services independently we will run into trouble down the road.
Decoupling the monolith into subsystems doesn't necessarily help either. If we still normalize our data and strive towards reusing as much code as possible within the components then all we have done is distributed the monolith. Chances are quite high that you will need to rewrite multiple components when the requirement change appears.
Let's take an example.
We have a training application aimed at running and cycling. We have users, training sessions and races. Training sessions and races are really the same thing: they both contain a number of users, equipment, time, distance and a route. We provide views of user training sessions, user races and race results by race. We sell the application to race organizers and it's free to users. We have an agreement to keep the race results highly available and to keep all history of previous years.
So we have a simple data model with users and sessions in a many-to-many relationship and a type defining whether it's a race or a training session. Simple. Done. Delivered.
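As a rough sketch, that first data model could look something like this in plain Java. The class and field names are my own assumptions for the illustration, and equipment and route are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// All names here are hypothetical; the model is only described in prose.
enum SessionType { TRAINING, RACE }

final class AppUser {
    final String name;
    // A user takes part in many sessions ...
    final List<Session> sessions = new ArrayList<>();

    AppUser(String name) {
        this.name = name;
    }
}

final class Session {
    final SessionType type;
    final double distanceKm;
    // ... and a session has many users: the many-to-many relationship.
    final List<AppUser> participants = new ArrayList<>();

    Session(SessionType type, double distanceKm) {
        this.type = type;
        this.distanceKm = distanceKm;
    }

    // Keeps both sides of the relationship in step.
    void addParticipant(AppUser user) {
        participants.add(user);
        user.sessions.add(this);
    }
}

final class SessionViews {
    // One query shape serves both "my training" and "my races",
    // because the only difference between them is the type flag.
    static List<Session> sessionsOf(AppUser user, SessionType type) {
        List<Session> result = new ArrayList<>();
        for (Session s : user.sessions) {
            if (s.type == type) {
                result.add(s);
            }
        }
        return result;
    }
}
```

Note how races and training sessions share one structure and one code path. That is exactly what makes the model feel clean on day one.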
Now the application becomes really popular as a training application among users, so we start gaining a lot of data. This data is mostly written and rarely read, since no one other than the user really cares about it. It does, however, have an impact on our race data, since people tend to look at that more.
Someone realizes that all the training data is interesting, since we also added a heart rate integration. So we build queries on the training data to provide to medical studies. Sweet extra income that the sales dudes came up with. It's no real issue performance-wise, as we run them once a year and that's done over Christmas.
Now someone sells our services of race data, training and fitness trending to UCI (the cycling union) as a tool for their fight against doping. We just need to add a query to correlate our sweet training reports with race results; how hard can that be? We develop for a sprint or two and go live. So now we get a serious tonnage of data and we run our queries more often. *gag* It doesn't work: we can't scale and we can't add a new query without totally killing our SLAs with the other race organizers. We need to rewrite.
Components are not the silver bullet.
Having our system distributed into a user repository, a session storage and an integration component providing REST services to our GUI component wouldn't help us all that much. Sure, we have separated users and their equipment from the sessions, but it's the queries on the sessions that are the problem, and they are killing our SLAs with the other race organizers.
Design by Services
So what we really need is to move the race result service into a service of its own. We need to isolate it, even though all its data is identical to the race data kept per user. Then we need to separate the integration code for the race organizer service into a service of its own so that we can deploy it separately.
Doing this when hitting the wall is hard, costly and risky. Just the database split is a nightmare if the data has grown big.
If we had done this from the get-go we could just have re-architected the user race and training session service. We could have moved that from our MySQL to a big table database or whatever without affecting our race by organizer service. But doing this upfront feels so awkward: we would have had duplicate tables and redundant code.
Define and isolate services in the architecture.
If we focus on isolating services across our components instead of isolating technology, then we can actually re-architect our bottlenecks. In fact, in our example we could just have added a UCI service that duplicates the other services, and if it ran into performance issues we could just have re-architected it. But that would have forced us to duplicate more upfront and to increase our initial development costs.
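To make the trade-off concrete, here is a minimal sketch of two isolated services from the example. All names are hypothetical, and in-memory maps stand in for what would be separate databases and separate deployments. The point it tries to show is that each service owns its own record type and its own storage, even though the two look almost identical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Record type owned by the race result service only (hypothetical name).
final class RaceResult {
    final String raceId;
    final String userName;
    final long timeSeconds;

    RaceResult(String raceId, String userName, long timeSeconds) {
        this.raceId = raceId;
        this.userName = userName;
        this.timeSeconds = timeSeconds;
    }
}

// Serves the race organizers. It has its own storage, so its SLA does not
// depend on what happens on the training side.
final class RaceResultService {
    private final Map<String, List<RaceResult>> resultsByRace = new HashMap<>();

    void record(RaceResult result) {
        resultsByRace.computeIfAbsent(result.raceId, id -> new ArrayList<>()).add(result);
    }

    List<RaceResult> resultsFor(String raceId) {
        List<RaceResult> copy = new ArrayList<>();
        List<RaceResult> stored = resultsByRace.get(raceId);
        if (stored != null) {
            copy.addAll(stored);
        }
        return copy;
    }
}

// Deliberately a near copy of RaceResult: the duplication is the price of isolation.
final class UserSessionEntry {
    final String userName;
    final boolean race;
    final long timeSeconds;

    UserSessionEntry(String userName, boolean race, long timeSeconds) {
        this.userName = userName;
        this.race = race;
        this.timeSeconds = timeSeconds;
    }
}

// Serves the users. Its storage could be moved to another kind of database
// without RaceResultService noticing.
final class UserSessionService {
    private final Map<String, List<UserSessionEntry>> entriesByUser = new HashMap<>();

    void record(UserSessionEntry entry) {
        entriesByUser.computeIfAbsent(entry.userName, name -> new ArrayList<>()).add(entry);
    }

    List<UserSessionEntry> sessionsOf(String userName) {
        List<UserSessionEntry> copy = new ArrayList<>();
        List<UserSessionEntry> stored = entriesByUser.get(userName);
        if (stored != null) {
            copy.addAll(stored);
        }
        return copy;
    }
}
```

A finished race would be written to both services. That feels redundant, but it means a heavy new query on the user side can never slow down the race results the organizers pay for.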
It's hard to "get it right" when the right thing can go against everything you have been taught for years. What we must learn to understand better is how we define and isolate services so that we can re-architect our bottlenecks for the services that experience them and not the entire system.
So it took a year.
When we first started building our continuous delivery pipe I had no idea that the biggest challenges would be non-technical. Well, I did expect that we would run into a lot of dev vs ops related issues and that the rest would be just technical issues. I was so naive.
We seriously underestimated how continuous delivery changes the everyday work of each individual involved in the delivery of a software service. It affects everyone: developers, testers, PMs, CMs, DBAs and operations professionals. Really it shouldn't be a big shocker, since it changes the process of how we deliver software. So yes, everyone gets affected.
The transition for our developers took about a year. Just over a year ago we scaled up our development and added, give or take, 15-20 developers. All these developers have been of very high quality and are very responsible individuals. None of them, though, had worked in a continuous delivery process before and all were more or less new to our business domain.
When introducing them, everyone got the rundown of the continuous delivery process: how it works, why we have it and that they need to make sure to check in quality code. So off you go: make code, check in tested stuff, and if something still breaks you fix it. How hard can it be?
Much, much harder than we thought. As I said, all our developers are very responsible individuals. Still, it was a change for them. What once was considered responsible, like "if it compiles and the unit tests pass, check it in so that it doesn't get lost", leads to broken builds. Doing this before leaving early on Friday becomes a huge issue because others have to fix the build pipe. But it goes for a lot of things, like having to ensure that database scripts work all the time, that everything with the database is versioned, that rollbacks work, etc. So everyone has had to step up their game a notch or two.
Continuous delivery really forces the developer to test much more before he/she checks in the code. Even for the developers that like to work test driven with their JUnit tests this is a step up. For many it's a change of behavior. Changing a behavior that has become second nature doesn't happen overnight.
We had a few highly responsible developers that took on this change seamlessly. These individuals had to carry a huge load during this first year. When responsibility was dropped by one individual, it was these who always ensured that the pipe was green. This has been the biggest source of frustration. I get angry, frustrated and mad when the lack of responsibility by one individual affects another individual. They get angry and frustrated as well, because they don't want to leave it in a bad state and their sense of responsibility prevents them from going home to their families. I'm so happy that we didn't lose any of these individuals during this period.
Now, after about a year, things have actually changed: everyone takes much more responsibility and fixing the build pipe is much more of a shared effort. Which is so nice. But why did it take such a long time? I'd really like to figure out if this transition could have been made smoother and faster.
Key reasons why it took so much time.
A change to behavior.
Developers need to test much more, not just now and then but all the time. No matter how much you talk about "test before check in", "test", "test", "test", the day the feature pressure increases a developer will fall back on second nature behavior and check in what he/she believes is done. We can talk lean, kanban, queues, push and pull all we want, but the fact is that there will always be situations of stress. It's not until a behavior change has become second nature that we do it under pressure.
Immature process.
Visibility, portability and scalability issues have made it hard to take responsibility. Knowing when, where and how to take responsibility is super important. Realizing that lack of responsibility is tied to these issues took us quite some time. If it's hard to debug a test case, it's going to take a lot of time to figure out why things are failing and it's going to require more senior developers to figure it out. It's also hard to be proactive with testing if the portability between development environment and test environment is bad.
A lot of new things at once
When you tell a developer about a new system, a new domain and a new process, I'm quite sure the developer will always listen more to the system and domain specific talks.
The developer has a head full of "this system communicates with that system and it's that type of interface". Then I start going on about "Jira, bla bla bla, test bla, checkin bla bla, Jenkins bla, deploy, bla, FitNesse, test bla, bla" and the developer goes "Yeah yeah yeah, I'll check in and it gets tested, I hear you, sweet!".
I definitely think it's much easier for a developer to make the transition if the process is more mature, has optimized feedback loops, scales and is portable. Honestly I think that's easily going to take 3-6 months off the learning curve. But it's still going to take a lot of time, in the range of months, if we don't become better at understanding behavioral changes.
Today we go straight from intro session (slides or whiteboard) to live scenario in one step. Here is the info, now go and use it. At least now we are becoming better at mentoring. So there is help to get, so that you can be talked through the process, and the new developer is usually not working alone, which they were a year ago. Still, I don't think it's enough.
Continuous Delivery Training Dojos
I think we really need to start thinking about having training dojos where we learn the process from start to finish. I also think this is extremely important when transitioning to acceptance test driven development, but it is worth doing just for the sake of getting a feeling for the process. What is tested where and how, and what happens when I change this and that? How should I test things before committing, and what should be done in which order?
I think if we practiced this and worked on how to break and unbreak the process in a non-live scenario, the transition would go much faster. In fact I don't think these dojos should be just for training new team members; they would also be an extremely effective way of sharing information about, and consequences of, process changes over time.
Labels: Continuous Delivery, Deployment, Portability, Responsibility, Test
Monday, February 11, 2013
Talk at ÅF Consult 2013-01-12
On Tuesday the 12th of January I am giving a talk about Continuous Delivery at ÅF Consult, Gothenburg.
This is the agenda of the day.
- Intro to Continuous Delivery
- Principles of Continuous Delivery
- Look at a Pipe
- Impact on Scrum
- Feature Driven Development
- Impact on Developers and Testers
Participants please use this post for feedback and any questions that you didn't get a chance to ask and would like me to answer.
The slides from the presentation can be found here.
Thursday, February 7, 2013
The world upside down.
Sometimes the world just goes upside down. I talked in a previous post about how continuous delivery and test driven development change the role of the tester. I retract that post. Well, maybe not fully, but let me elaborate.
Our testers are asking "HOW should we automate?" and our developers are asking "WHAT are you trying to do?".
I've thought that our problem has been that we haven't been able to find testers that know HOW to automate. It's not our problem. Our problem is that we are asking our testers to automate instead of asking them WHAT to test. If a tester figures out what to test, then any developer can solve how to automate it with ease.
I still believe that the role of the tester will change over time and that anyone who can answer both the what and the how question will be the most desired team member in the future. Until then we need to have testers thinking about WHAT and developers thinking about HOW.
Frustrating how such a simple fact can stare us in the eyes for such a long time without us noticing it.
Next time a tester asks how and a developer has to question what, it's time to stop the line and get everyone into the same room asap!
Tuesday, February 5, 2013
My dear Java what have you done?!?!
What is this diagram? Well, it's a generic lightweight storage component. In previous posts on this blog I've talked about how we have refactored our main components into smaller specialized components and how fruitful that has been for us. It really has, and it's been one of the best architectural steps we have taken. The components are super clean and very well written. They are very well written based on our requirements and on how we as a community expect Java applications to be built. But they have really made me think: are we actually putting the right requirements on them?
Let's take an example. If our system were a fitness application, then we could have four of these components: one for user profiles, one for assets (bikes, shoes, skis, etc.), one for track routes (run, bike, ski routes) and one for training schedules. Small simple components that store and maintain well defined data sets. Then we have a service component that aggregates these data sets into a consumer service.
So say that our route track component has the responsibility of storing a set of GPS track points and classifying them as run, bike or ski. How hard can it be? Well, look at the diagram.
First of all, what is the value of the component? It's providing a set of track points. There could be some rules on what data is required and how missing or irregular data is handled, since GPS can be out of reach. But it's still very simple: receive, validate, store and retrieve. The value is the data. The request explicitly asks for data related to a user and an activity type. So why does the request go through layer upon layer of frameworks and other crap in order to even get to the data? Shouldn't we strive to keep the data as close as possible? Shouldn't we always strive to keep the value as close to the implementation as possible?
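To show how small the core of such a component is, here is a sketch of receive, validate, store and retrieve with everything else stripped away. The names (`TrackStore`, `TrackPoint`, `Activity`) and the validation rule are my own assumptions, the caller supplies the activity type instead of the component classifying the track, and a map stands in for real storage. It is not a proposal to ship this, only an illustration of where the value sits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; none of this is taken from the real component.
enum Activity { RUN, BIKE, SKI }

final class TrackPoint {
    final double latitude;
    final double longitude;
    final long epochSeconds;

    TrackPoint(double latitude, double longitude, long epochSeconds) {
        this.latitude = latitude;
        this.longitude = longitude;
        this.epochSeconds = epochSeconds;
    }
}

final class TrackStore {
    private final Map<String, List<TrackPoint>> tracks = new HashMap<>();

    // Assumed rule: a point is valid when its coordinates are within range.
    // NaN fails every comparison, so points with missing coordinates are rejected.
    static boolean isValid(TrackPoint p) {
        return p.latitude >= -90.0 && p.latitude <= 90.0
                && p.longitude >= -180.0 && p.longitude <= 180.0;
    }

    // Receive, validate and store. Returns how many points were kept.
    int receive(String userId, Activity activity, List<TrackPoint> points) {
        List<TrackPoint> track =
                tracks.computeIfAbsent(key(userId, activity), k -> new ArrayList<>());
        int kept = 0;
        for (TrackPoint p : points) {
            if (isValid(p)) {
                track.add(p);
                kept++;
            }
        }
        return kept;
    }

    // Retrieve by exactly what the request asks for: a user and an activity type.
    List<TrackPoint> retrieve(String userId, Activity activity) {
        List<TrackPoint> copy = new ArrayList<>();
        List<TrackPoint> stored = tracks.get(key(userId, activity));
        if (stored != null) {
            copy.addAll(stored);
        }
        return copy;
    }

    private static String key(String userId, Activity activity) {
        return userId + "/" + activity.name();
    }
}
```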
I've always been a huge fan of persistence frameworks. I worked back in the good old dot-com era when everyone and his granny wanted into IT to become programmers. I've seen these developers trying to work directly with JDBC and SQL. The mess they created, and the mess much better developers created even after the dot-com era, has made me believe that abstracting away the database through persistence frameworks is a must. The gain on the 90% of vanilla code written by the average Joe heavily outweighs the corner cases where you need a seasoned developer to write the native queries and mappings.
Though I'm starting to believe that JPA and Hibernate have to be the worst thing that has happened to Java since EJB2. Adding a blanket over a problem doesn't make the problem go away. The false notion of not having to care about the database can have catastrophic consequences. Good developers understand this and have learned the mapping between the JPA model and the DB model. They have learned how the framework works and they have learned how HQL maps to SQL. By trying to mitigate the bad code written by the average Joes we have created even more complexity, and developers are now required to master not just databases but also a new framework on top of them.
So for us to get our data from our database to our REST interface we create objects that map to a model of another system. Then we write queries in a query language that is super hard to test and experiment with and that still doesn't give us feedback at compile time. These queries and mapped objects get translated into the query language of a remote system. Note that we transform it in the Java application to the language of the other system, which means that we in our system need to have full knowledge and understanding of the other system. Then we take this translated query and feed it through a straw to that other system. Down in that system the query is executed in the query engine, which we have to understand in our system since what we generate maps directly onto it. Then this query engine executes on the logical data model, which happens to be the core value of our application. The only thing that does make sense is the mapping between logical and physical storage of the data, since this we don't have to care about and it's actually been abstracted out pretty well.
The database engine then feeds us the data back through our straw, where Hibernate maps it back to our Java objects. Then of course we have been good students and listened to the preaching about patterns, so we transform our entity objects into DTOs that we then feed through multiple layers of frameworks so that they can be written to the HTTP response.
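For readers who haven't lived through it, the entity-to-DTO step looks roughly like the sketch below. The class names and fields (`TrackPointEntity`, `TrackPointDto`, `TrackPointMapper`, the `version` column) are made up for illustration and no real framework is involved; in a real application the entity would carry JPA annotations and the DTO would be serialized by the REST layer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical persistence-side object: mirrors the table, including
// columns the client never needs to see.
final class TrackPointEntity {
    final long id;
    final String userId;
    final double latitude;
    final double longitude;
    final int version; // assumed bookkeeping column

    TrackPointEntity(long id, String userId, double latitude, double longitude, int version) {
        this.id = id;
        this.userId = userId;
        this.latitude = latitude;
        this.longitude = longitude;
        this.version = version;
    }
}

// Hypothetical transfer object: the same data again, minus the bookkeeping.
final class TrackPointDto {
    final double latitude;
    final double longitude;

    TrackPointDto(double latitude, double longitude) {
        this.latitude = latitude;
        this.longitude = longitude;
    }
}

// Hand-written copying between two shapes of the same data.
final class TrackPointMapper {
    static TrackPointDto toDto(TrackPointEntity entity) {
        return new TrackPointDto(entity.latitude, entity.longitude);
    }

    static List<TrackPointDto> toDtos(List<TrackPointEntity> entities) {
        List<TrackPointDto> result = new ArrayList<>();
        for (TrackPointEntity entity : entities) {
            result.add(toDto(entity));
        }
        return result;
    }
}
```

None of this code adds value to the component. It only moves the same two numbers from one shape into another.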
To quote a colleague: "I'm so not impressed by the data industry". I totally agree. We have really made a huge mess of things. Not in our delivery: for being what it is, a delivery on the Java platform with an RDBMS, it's a very well written and solid application.
Am I saying that we should just throw out all the frameworks and go back to using JDBC and Servlets straight off? Well, no, obviously (even though I say it when I'm frustrated). The problem needs to be solved at the root cause. JDBC and SQL were equally bad because they were still pushing data through a straw. Queries written in one system and pushed through a straw into another system where they are executed are never ever going to be a good model. Then the fact that the data structure of the database system doesn't match the object structure of the requesting application is another huge issue.
I really think that the query engine and the logical data storage model need to become part of the application, not just the mapping frameworks. Some of the NoSQL databases do this, but most of them still work too much like JDBC, where you create your connection to the database engine and then you send it a query. Instead I'd like to just query my data. I want data, not a connection. The connection has to be there, but it should be between the query engine and the persistent store, not between me and the query engine.
Query q = Query.createQuery(TrackPoint.class);
List<TrackPoint> trackpoints = q.where("userId").equals("triha")
        .and().where("date")
        .between("2012-01-01", "2013-01-30");
This should be it. My object should be queried and logically stored as close to me as possible. This way we would mitigate the problems of JDBC and SQL by removing them, not covering them up with yet another framework. This would give us a development platform easy enough for average Joes and yet not dumbed down to a level that restricts our bright minds.
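As a rough illustration of what "just query my data" could feel like, here is a minimal sketch of an in-process query over a plain Java list. It is not an existing library and not the exact API from the snippet above: `LocalQuery`, `isEqualTo` and the `TrackPoint` fields are all hypothetical, and I've swapped the string field names for typed accessors so that a misspelled field fails at compile time. A real version would sit on top of a persistent store instead of a list.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical domain object; the fields are assumptions for this sketch.
final class TrackPoint {
    private final String userId;
    private final LocalDate date;

    TrackPoint(String userId, LocalDate date) {
        this.userId = userId;
        this.date = date;
    }

    String userId() { return userId; }
    LocalDate date() { return date; }
}

// Hypothetical in-process query: no connection and no query string in sight.
final class LocalQuery<T> {
    private final List<T> source;
    private Predicate<T> filter = item -> true;

    private LocalQuery(List<T> source) { this.source = source; }

    static <T> LocalQuery<T> over(List<T> source) { return new LocalQuery<>(source); }

    // Picks a field through a typed accessor instead of a string name.
    <V extends Comparable<? super V>> Condition<V> where(Function<T, V> field) {
        return new Condition<>(field);
    }

    // Runs the accumulated filter over the source and returns the matches.
    List<T> list() {
        List<T> result = new ArrayList<>();
        for (T item : source) {
            if (filter.test(item)) {
                result.add(item);
            }
        }
        return result;
    }

    // A pending condition on one field; each method adds it to the filter (AND).
    final class Condition<V extends Comparable<? super V>> {
        private final Function<T, V> field;

        private Condition(Function<T, V> field) { this.field = field; }

        LocalQuery<T> isEqualTo(V value) {
            filter = filter.and(item -> field.apply(item).compareTo(value) == 0);
            return LocalQuery.this;
        }

        // Inclusive on both ends.
        LocalQuery<T> between(V low, V high) {
            filter = filter.and(item -> field.apply(item).compareTo(low) >= 0
                    && field.apply(item).compareTo(high) <= 0);
            return LocalQuery.this;
        }
    }
}
```

With that in place the query from above becomes `LocalQuery.over(points).where(TrackPoint::userId).isEqualTo("triha").where(TrackPoint::date).between(LocalDate.of(2012, 1, 1), LocalDate.of(2013, 1, 30)).list()`. The compiler checks the field names and the value types, and there is nothing to translate into the language of another system.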
We could use more of our time actually writing code that adds value rather than managing frameworks that are just a necessity and not a value. Our simple components would look something like the diagram to the right. Actually pretty simple.
How hard can it be? Obviously quite hard. The most amazing thing is that we get all this complexity to work. No wonder it's super expensive to build systems.