We have always had the need to understand our application runtime costs. In legacy enterprises this has often been done at a finance level and someone in the Runtime Operations department has been responsible for that cost. There have been different ways of distributing the costs to the right organization. Either each application has had its own infrastructure and own the full cost of it or applications have shared infrastructure in order to save money, utilize large servers in a better way or just to minimize setup and maintenance work of the servers.
In many legacy enterprises there has been a huge disconnect between the rollout the application evolution and the infrastructure evolution. Application development has changed towards micro services with a lot of small applications with lifecycles of their own. The legacy enterprises runtime operations are largely unable to delivery on these changes due to processes, licenses and hardware leases of huge "enterprise scale servers" for the old monoliths. Requests of several servers per micro services have been waved off as unrealistic.
Several ways around this have been invented in order to cohost multiple applications on the same servers in a somewhat isolated way. Docker draws so much attention partially because legacy enterprises can still have their big old servers and isolate the micro services on them. (Of course not only good thing with Docker).
Regardless of how this was solved the understanding of runtime cost becomes exponentially harder with micro service in legacy enterprise infrastructure.
When transitioning to DevOps and Micro Services each team is responsible for delivering its services end to end in all environments with the right functionality, performance and imho to the right cost. So what is the right cost? Before even answering that question let’s start with what is the cost. The DevOps team needs to know the cost of its services. In the legacy enterprise that cost might at best case arrive as some kind of cost spit formula calculated by a finance guy based on how many micro services shared the servers, a cost split formula of any license costs and a cost split formula of the man hours to maintain the servers. At best a team would get this information once a month.
Part of the reason we want DevOps is so that the team has full competence and ability to improve its services. This needs to include the runtime costs of the services. A performance optimization of a service that cuts resource requirements by 25% should result in lower runtime costs of the service. In the legacy world of cost splits and financial formals to calculate cost this just doesn’t happen.
Thankfully we have cloud providers. Not only do we get the right resources handle our services when we need them but we get billed real money for them.
With auto scaling, elasticity and all other nice features that we get there is also a risk. We get away with writing increasingly bad applications. Bad performance? Doesn’t matter as long as it scales horizontally. We can IO block to ourselves hell and back as long as we scale horizontally and we do get away with it. Well as long as we don’t have to take responsibility for the cost of our service. This is why it’s so important for the DevOps team to take the responsibility of the runtime cost of the service.
For a few years I have had a dream to be able to stamp a cost runtime cost to a load test and correlate the performance measured in the test with the cost of the runtime environment to run the test. We are still not there with our applications since we do run on AWS Auto Scaling Groups and AWS bills us per started hour. This makes the billing data to blunt too give the correlation between performance and cost from a shorter test. With a Micro Service architecture that uses AWS Services such as Lambda, DynamoDB and Kinesis this would actually be achievable today.
With this vision in mind we have been able to integrate the runtime costs of our Micro Services in AWS with our Continuous Delivery as a Service implementation (Delivery Engine) in another way. For us DevOps Teams are the owners of services and Solutions are the drivers of the cost. So everything that we launch in AWS we tag with name of micro service, owner team name and cost driving solution.
So this allows our DevOps Teams to see the cost of their Services.
Here we provide a graph of total costs for the services owned by a DevOps team. The services are listed in a top list below the graph and a long running trend of the cost for each component. This allows the Product Owners and Team Members to act in increasing costs. Links are provided to the Dashboard Page of each service.
This Service Dashboard provides the version history of each version built in delivery engine. Our Delivery Engine builds, black box tests the deployed service, load tests it, bakes AWS AMIs for it and launches it into the right environments. Here on the dashboard we visualize the total cost of the service grouped by environment (delivery engine, exploratory testing, qa and prod environments).
From the same dashboard we visualize service usage. In the below example it’s a visualization on cpu consumption across an auto scaling group.
This way we allow the team to ensure that they have the right scaling rules for its services and that they have the right amount of resources in each environment.
By visualizing the cost of the runtime environment for each service and combining it with Continuous Delivery, Continuous Performance Testing and DevOps we allow our developers to constantly tweak and improve the performance and optimize the cost of our runtime environments. The lead time in the cost reporting still is at the "next day level" as we report the cost per day and per month but I still think this is a reasonable feedback loop when it comes to cost optimizations.