Authors: Ramakrishna Phani – Senior Project Manager @ Cloud Kinetics & Babuji – Project Manager – Business Analyst @ Cloud Kinetics
One of the big reasons, and indeed a widely marketed reason, for adopting cloud technology is scalability: the idea that capacity can be changed on the fly based on varying load. Among the scaling options available from the various public cloud service providers, auto scaling has been a much sought-after feature for application developers and business users alike.
However, in practice I see that the feature is often misunderstood in the community, and this keeps a lot of companies from putting this powerful tool into practice. Used correctly, it can deliver excellent cost control and improve the end-user experience. It also relieves product owners and developers of the pressure of having to guess the load and plan capacity in advance. To top it all, once the entire setup is automated there is no server management overhead; everything is taken care of by the auto scaling service.
So, what is auto scaling? Simply stated, auto scaling is a feature that allows a computational resource to scale horizontally (scale out and in) based on the load coming into the system, according to preset conditions/rules. Great! So how do I use it?
Just state the number of compute resources you need (minimum and maximum values) and the parameter based on which the auto scaling service should decide to launch or decommission resources (as the case may be). Is that it? Not really. To use this feature effectively, we need to take a closer look at what really happens when resources are launched and decommissioned. Auto scaling is automated, which means the deployment process for the compute resources being auto scaled must not have any manual steps. Another important factor to consider is that once the load subsides and the auto scaling service decommissions resources based on the preset rules/conditions, the information on those decommissioned resources is lost.
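The decision loop described above can be sketched in a few lines. This is a minimal illustration, not any provider's actual API: the metric (average CPU), the thresholds, and the function names are all assumptions made for the example.

```python
# Minimal sketch of the decision an auto scaling service makes on each
# evaluation cycle. Thresholds and the CPU metric are illustrative only.

MIN_INSTANCES = 2      # assumed minimum capacity
MAX_INSTANCES = 10     # assumed maximum capacity
SCALE_OUT_CPU = 70.0   # scale out above this average CPU %
SCALE_IN_CPU = 30.0    # scale in below this average CPU %

def desired_capacity(current: int, avg_cpu: float) -> int:
    """Return the instance count the service should converge to."""
    if avg_cpu > SCALE_OUT_CPU:
        return min(current + 1, MAX_INSTANCES)   # launch one more instance
    if avg_cpu < SCALE_IN_CPU:
        return max(current - 1, MIN_INSTANCES)   # decommission one instance
    return current                               # load within band; hold steady

print(desired_capacity(2, 85.0))   # heavy load: scale out to 3
print(desired_capacity(10, 95.0))  # already at the maximum: stay at 10
print(desired_capacity(3, 10.0))   # light load: scale in to 2
```

Note how the minimum and maximum values bound every decision; the service never launches past the maximum or decommissions below the minimum, no matter what the load does.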
Keeping these two key points about auto scaling in mind, we can concretely state the following prerequisites for auto scaling:
1. Automated Deployment
2. Ensure statelessness of the compute resource
Let’s take a closer look at each of these prerequisites:
Automated Deployment:
If there are manual steps in the application configuration to make new servers ready for use, the auto scaling feature becomes useless: the servers that are automatically launched will have incomplete deployments. The deployment of code, the launching of services (including any required launch sequence), and any other configuration necessary for the compute resource to be operational need to be fully automated and thoroughly tested.
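The launch sequence above can be expressed as an ordered pipeline that aborts on the first failure, so a half-configured server never enters service. The step names here are hypothetical stand-ins; in practice each would be a shell command or configuration-management task in the instance's launch script or baked into the machine image.

```python
# Sketch of a fully automated launch sequence. Step names are illustrative.

def bootstrap(run, steps):
    """Run every deployment step in order; any exception aborts the launch
    so an incompletely deployed server never receives traffic."""
    completed = []
    for step in steps:
        run(step)               # raises on failure, halting the sequence
        completed.append(step)
    return completed

LAUNCH_STEPS = [
    "fetch_application_code",    # pull the tested release artifact
    "apply_configuration",       # render config from a central store
    "start_services_in_order",   # honour any service launch sequence
    "register_health_check",     # only then start accepting traffic
]

# No-op runner for illustration; a real runner would execute each step.
done = bootstrap(lambda step: None, LAUNCH_STEPS)
print(done == LAUNCH_STEPS)     # True: all steps completed, in order
```

The point of the health-check registration coming last is that a newly launched instance should only be placed behind the load balancer once every earlier step has succeeded.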
Statelessness of Compute Resource:
By statelessness of a resource I mean that the compute resource should not hold any information, or run any process/service, that is unique to that resource. This commonly includes session data (user or app), log data (app or server), or any other information that is not sent to a database or held in some other form of persistent storage. Such data is irrecoverably lost when the resource is automatically decommissioned. At the risk of stating the obvious, auto scaled compute resources should also not be running any back-end/cron jobs or background tasks that are unique to an individual compute resource.
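The contrast between instance-local and externalized state can be shown with a toy sketch. The plain dict standing in for the shared store is a placeholder for something like Redis or a database session table; the class and method names are assumptions for illustration.

```python
# Toy contrast between stateful and stateless compute resources.

class StatefulServer:
    """Anti-pattern: sessions live on the instance and die with it."""
    def __init__(self):
        self.sessions = {}        # lost when auto scaling decommissions us

    def put(self, sid, data):
        self.sessions[sid] = data

class StatelessServer:
    """Sessions survive scale-in because they live outside the instance."""
    def __init__(self, shared_store):
        self.store = shared_store  # external store shared by all instances

    def put(self, sid, data):
        self.store[sid] = data

shared = {}                        # stand-in for Redis / a session table
a = StatelessServer(shared)
a.put("user-1", {"cart": ["book"]})
del a                              # instance decommissioned by auto scaling
b = StatelessServer(shared)        # freshly launched replacement instance
print("user-1" in b.store)         # True: the session survived scale-in
```

The same reasoning applies to logs: ship them to a central log service as they are produced, rather than leaving them on the instance's disk.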
Additional Points to Consider
Relational databases don’t scale horizontally the way stateless servers do. If we simply launch another server with a replica of an existing database, we end up with two separate databases: the information does not flow from one to the other. And, inevitably, when one of the servers is decommissioned, all the data on its database is lost. That is just how they are designed.
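A toy illustration of why naively cloning a database server splits the data: after the copy, writes to one copy never reach the other, so the two "replicas" silently diverge.

```python
# Two independent copies of the same data diverge as soon as either is written.
import copy

db_a = {"orders": [1, 2]}          # the original database server
db_b = copy.deepcopy(db_a)         # "launch another server with a replica"

db_a["orders"].append(3)           # a new order lands on server A only

print(db_b["orders"])              # [1, 2] -- server B never sees order 3
# Decommissioning server A at this point would lose order 3 entirely.
```

Real replication and managed database scaling exist, of course, but they are separate services with their own mechanics; they are not something a generic compute auto scaling group provides for free.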
Having the web, app and database tiers all in one box is not going to work. In my experience, the only way to make auto scaling work effectively for a typical 3-tier web application is to scale the web, app and database tiers separately, with a caching layer in between.
Installing multiple applications in the same box is not a good technique either. They need to be broken down into individual layers and hosted on separate, smaller instances, because each has a different usage character under load (some are CPU intensive, some RAM intensive, some driven by app-specific custom parameters). It is important to set up appropriate auto scaling rules/conditions with the correct scaling parameters (CPU, memory, network load or some custom parameter) to avoid common problems such as rapid, repeated cycles of scale-out and scale-in.
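One common guard against those rapid scale-out/scale-in cycles is a cooldown: after any scaling action, further actions are suppressed for a fixed window. This is a minimal sketch; the 300-second value and the class name are illustrative assumptions, though most providers' auto scaling services expose a cooldown setting of this general shape.

```python
# Sketch of a cooldown guard that suppresses scaling actions for a fixed
# window after the last one, preventing rapid oscillation ("flapping").

COOLDOWN = 300  # seconds to wait after a scaling action (illustrative)

class CooldownGuard:
    def __init__(self):
        self.last_action_at = None

    def allow(self, now):
        """Permit a scaling action only if the cooldown window has passed."""
        if self.last_action_at is None or now - self.last_action_at >= COOLDOWN:
            self.last_action_at = now   # record the action we just allowed
            return True
        return False

g = CooldownGuard()
print(g.allow(0))      # True: first action proceeds
print(g.allow(100))    # False: still inside the 300 s cooldown
print(g.allow(300))    # True: cooldown has elapsed
```

A cooldown treats the symptom; the deeper fix is choosing a scaling parameter that actually reflects each application's bottleneck, which is only possible once the applications are separated onto their own instances.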