We have always needed to understand the runtime costs of our applications. In
legacy enterprises this has often been handled at a finance level, and someone
in the Runtime Operations department has been responsible for that cost. There
have been different ways of distributing the costs to the right organization.
Either each application has had its own infrastructure and owned its full cost,
or applications have shared infrastructure in order to save money, make better
use of large servers, or simply minimize the setup and maintenance work for the
servers.
In many legacy enterprises there has been a huge disconnect between the
evolution of applications and the evolution of infrastructure. Application
development has moved towards microservices: many small applications, each with
a lifecycle of its own. The runtime operations of legacy enterprises are largely
unable to deliver on these changes because of processes, licenses and hardware
leases tied to the huge "enterprise scale servers" bought for the old monoliths.
Requests for several servers per microservice have been waved off as
unrealistic.
Several ways around this have been invented in order to cohost multiple
applications on the same servers in a somewhat isolated way. Docker draws so
much attention partly because legacy enterprises can keep their big old servers
and still isolate the microservices on them. (Of course, that is not the only
good thing about Docker.)
Regardless of how this was solved, understanding runtime cost becomes much
harder with microservices on legacy enterprise infrastructure, because the cost
of every shared server has to be split across a growing number of services.
When transitioning to DevOps and microservices, each team is responsible for
delivering its services end to end in all environments with the right
functionality, the right performance and, in my opinion, at the right cost. So
what is the right cost? Before even answering that question, let’s start with
what the cost is. The DevOps team needs to know the cost of its services. In the
legacy enterprise that cost might, at best, arrive as some kind of cost split
formula calculated by a finance guy, based on how many microservices shared the
servers, plus a cost split of any license costs and a cost split of the
man-hours spent maintaining the servers. At best, a team would get this
information once a month.
Part of the reason we want DevOps is so that the team has the full competence
and ability to improve its services. This needs to include the runtime costs of
the services. A performance optimization that cuts a service's resource
requirements by 25% should result in lower runtime costs for that service. In
the legacy world of cost splits and financial formulas, this just doesn’t
happen.
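To make the 25% example concrete, here is a minimal sketch of how a cut in
resource requirements turns into a lower bill. It assumes a service running on
identical instances billed at a flat hourly rate; the capacity figures and the
rate are hypothetical, chosen only for illustration.

```python
import math

def instances_needed(required_capacity: float, capacity_per_instance: float) -> int:
    """Smallest whole number of instances that covers the required capacity."""
    return math.ceil(required_capacity / capacity_per_instance)

def monthly_cost(instances: int, hourly_rate: float, hours: int = 730) -> float:
    """Cost of running a fixed number of instances around the clock for a month."""
    return instances * hourly_rate * hours

# Hypothetical figures: the service needs 800 capacity units and one
# instance provides 100 units.
before = instances_needed(800, 100)        # 8 instances
after = instances_needed(800 * 0.75, 100)  # 25% less work -> 6 instances

rate = 0.10  # hypothetical USD per instance-hour
saving = monthly_cost(before, rate) - monthly_cost(after, rate)
```

Note the ceiling: savings come in whole instances, so the same 25% cut on a
700-unit load only takes the service from 7 to 6 instances. The team can only
see that effect if the cost is attributed to its own service.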
Thankfully we have cloud providers. Not only do we get the right resources to
handle our services when we need them, but we also get billed real money for
them.
With auto scaling, elasticity and all the other nice features we get, there is
also a risk: we get away with writing increasingly bad applications. Bad
performance? Doesn’t matter as long as it scales horizontally. We can block on
I/O to hell and back as long as we scale horizontally, and we do get away with
it. Well, as long as we don’t have to take responsibility for the cost of our
service. This is why it’s so important for the DevOps team to take
responsibility for the runtime cost of its service.
For a few years I have had a dream of being able to stamp a runtime cost on a
load test and correlate the performance measured in the test with the cost of
the runtime environment used to run it. We are still not there with our
applications, since we run on AWS Auto Scaling Groups and AWS bills us per
started hour. This makes the billing data too blunt to give the correlation
between performance and cost for a short test. With a microservice architecture
that uses AWS services such as Lambda, DynamoDB and Kinesis, this would actually
be achievable today.
With this vision in mind, we have been able to integrate the runtime costs of
our microservices in AWS with our Continuous Delivery as a Service
implementation (Delivery Engine) in another way. For us, DevOps Teams are the
owners of services and Solutions are the drivers of the cost. So everything that
we launch in AWS we tag with the name of the microservice, the name of the owner
team and the cost-driving solution.
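A minimal sketch of the tagging step, assuming a boto3-style EC2 client. The
EC2 `create_tags` call takes a list of resource IDs and a list of `Key`/`Value`
pairs. The tag keys `Service`, `Team` and `Solution` and the helper names are
hypothetical; the post does not say which keys the Delivery Engine uses. The
client is passed in so the function can be exercised without AWS credentials.

```python
from typing import Dict, List, Protocol

class TaggingClient(Protocol):
    """Anything with an EC2-style create_tags method (e.g. a boto3 EC2 client)."""
    def create_tags(self, *, Resources: List[str], Tags: List[Dict[str, str]]) -> dict: ...

def cost_tags(service: str, team: str, solution: str) -> List[Dict[str, str]]:
    """Build the three cost tags (hypothetical key names)."""
    return [
        {"Key": "Service", "Value": service},    # owning microservice
        {"Key": "Team", "Value": team},          # owning DevOps team
        {"Key": "Solution", "Value": solution},  # cost-driving solution
    ]

def tag_resources(client: TaggingClient, resource_ids: List[str],
                  service: str, team: str, solution: str) -> None:
    """Attach the cost tags to every resource launched for a service."""
    if not resource_ids:
        return  # nothing launched, nothing to tag
    client.create_tags(Resources=resource_ids, Tags=cost_tags(service, team, solution))
```

With a real client this would be called as, for example,
`tag_resources(boto3.client("ec2"), ["i-0abc123"], "order-service", "team-checkout", "webshop")`
(all values hypothetical). For instances started by an Auto Scaling group, the
same tags can instead be set on the group with `PropagateAtLaunch` enabled, and
user-defined tags must be activated as cost allocation tags before they show up
in AWS cost reports.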
This allows our DevOps Teams to see the cost of their services.
Here we provide a graph of the total costs for the services owned by a DevOps
team. Below the graph, the services are listed in a top list together with a
long-running cost trend for each component. This allows the Product Owners and
Team Members to act on increasing costs. Links are provided to the Dashboard
Page of each service.
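The team view boils down to filtering tagged cost records by the owner tag,
summing per service for the top list, and summing per day for the trend. A
minimal sketch, assuming daily cost records that already carry the tags from
the launch step; the `CostRecord` shape and field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

@dataclass(frozen=True)
class CostRecord:
    day: str      # ISO date, e.g. "2016-03-01", so string order is date order
    service: str  # value of the service tag
    team: str     # value of the owner team tag
    cost: float   # cost for that service on that day

def team_top_list(records: Iterable[CostRecord], team: str) -> List[Tuple[str, float]]:
    """Total cost per service owned by `team`, most expensive first."""
    totals: Dict[str, float] = defaultdict(float)
    for r in records:
        if r.team == team:
            totals[r.service] += r.cost
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

def daily_trend(records: Iterable[CostRecord], team: str, service: str) -> List[Tuple[str, float]]:
    """Cost per day for one service, in date order, for the trend line."""
    per_day: Dict[str, float] = defaultdict(float)
    for r in records:
        if r.team == team and r.service == service:
            per_day[r.day] += r.cost
    return sorted(per_day.items())
```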
This Service Dashboard provides the version history of each version built in
the Delivery Engine. Our Delivery Engine builds the service, black-box tests the
deployed service, load tests it, bakes AWS AMIs for it and launches it into the
right environments. Here on the dashboard we visualize the total cost of the
service grouped by environment (Delivery Engine, exploratory testing, QA and
prod environments).
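Grouping one service's cost by environment follows the same pattern. This
sketch assumes each cost row also carries an environment value; the post lists
only the service, team and solution tags, so the environment field, the
environment names and the helper functions are assumptions.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical environment names, in pipeline order.
ENVIRONMENTS = ("delivery-engine", "exploratory", "qa", "prod")

def cost_by_environment(rows: Iterable[Tuple[str, str, float]], service: str) -> Dict[str, float]:
    """Sum (service, environment, cost) rows for one service per environment."""
    totals: Dict[str, float] = {env: 0.0 for env in ENVIRONMENTS}
    for row_service, environment, cost in rows:
        if row_service == service:
            # Unknown environments are kept rather than silently dropped.
            totals[environment] = totals.get(environment, 0.0) + cost
    return totals

def non_prod_share(totals: Dict[str, float]) -> float:
    """Fraction of the service's cost spent outside production."""
    total = sum(totals.values())
    return 0.0 if total == 0 else (total - totals.get("prod", 0.0)) / total
```

A high non-production share is one signal a team might act on, for example by
shrinking test environments between runs.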
From the same dashboard we visualize service usage. In the example below, it’s
a visualization of CPU consumption across an Auto Scaling group.
This way we allow the team to ensure that it has the right scaling rules for its
services and the right amount of resources in each environment.
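A check of that kind can be sketched as follows. It assumes average CPU samples
for the Auto Scaling group have already been fetched (for example from the
CloudWatch `CPUUtilization` metric for the group), and the 30 to 70 percent
target band is a hypothetical rule of thumb, not a recommendation from this
post.

```python
from statistics import mean
from typing import Sequence

def scaling_hint(cpu_samples: Sequence[float], low: float = 30.0, high: float = 70.0) -> str:
    """Compare the average CPU utilization (percent) of a group with a target band.

    The band limits are hypothetical defaults; tune them per service and environment.
    """
    if not cpu_samples:
        raise ValueError("need at least one CPU sample")
    avg = mean(cpu_samples)
    if avg < low:
        return "over-provisioned: consider fewer or smaller instances"
    if avg > high:
        return "under-provisioned: review the scale-out rule"
    return "within target band"
```

An average hides short spikes, so a real check would likely look at peaks and
at the scaling events as well.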
By visualizing the cost of the runtime environment for each service and
combining it with Continuous Delivery, Continuous Performance Testing and
DevOps, we allow our developers to constantly tweak and improve performance and
optimize the cost of our runtime environments. The lead time in cost reporting
is still at the "next day level", as we report cost per day and per month, but I
think this is a reasonable feedback loop when it comes to cost optimizations.