Scalability in the Cloud

Imagine you own a cinema, and on the weekends, there's a huge influx of customers visiting your place. However, on weekdays, only a few customers show up. This creates a challenge in managing your workforce effectively.

One approach is to hire more full-time employees to ensure timely service during the busy weekends. But this means you'll end up paying for excess labor during the slower weekdays. Another option is to hire fewer employees, resulting in long queues and frustrated customers during peak times.

Now, let's relate this situation to digital operations. Traditional on-premises data centers are like having a fixed amount of computing capacity, similar to having a fixed number of full-time workers. If the demand for computing resources is lower than your data center capacity, you'll be paying for unused capacity. On the other hand, if demand exceeds your data center capacity, it can take weeks to scale up your infrastructure to meet the increased demand.

This is where cloud computing comes in. Cloud providers, like Amazon Web Services (AWS), offer scalable computing resources that can adjust based on demand. It's like having access to a flexible pool of part-time workers that you can hire whenever needed.

With cloud computing, you can easily scale up or down your computing resources in real time. During peak periods, such as weekends for the cinema, you can quickly increase your capacity to handle the higher demand. And during slower periods, you can scale down to save costs.

Cloud computing eliminates the need to invest upfront in fixed infrastructure and allows you to pay for what you use. It provides the agility to respond to changing demands without the delays and costs associated with scaling up traditional on-premises data centers.

Scalability involves beginning with only the resources you need and designing your architecture to automatically respond to changing demand by scaling out or in.

EC2 can be scaled either vertically or horizontally.

Vertical Scaling

Vertical Scaling is when an EC2 instance is changed to a more powerful instance. For example, turning the T2.micro instance to an M5.large instance, the latter possessing more vCPUs and memory. You can think of it as replacing your employee with a proficient one.

Vertical Scaling allows you to adjust the resources of your EC2 instances to match your application's needs, providing flexibility and scalability without the need to manage multiple instances or distribute the workload across multiple servers.

However, to perform vertical scaling, you need to stop the current instance your application is running on.

Horizontal Scaling

Horizontal Scaling is when you add more instances. To handle the workload, instead of scaling vertically to process each request faster, you add more instances to process more requests simultaneously. Think of this as hiring more employees to handle the workload, each working at a slower rate than a vertically scaled instance, but at the same rate or faster combined.

When you scale horizontally, you don't need to stop the current instance and your business operations are unscathed from the scaling.

AWS EC2 Auto Scaling

The AWS service that provides this functionality for Amazon EC2 instances is Amazon EC2 Auto Scaling.

Dynamic vs Predictive scaling

Dynamic scaling and predictive scaling are two approaches to scaling resources in Amazon EC2 based on workload requirements, but they differ in their methods of determining when and how to scale.

Dynamic Scaling responds to changing demand. Dynamic scaling typically relies on metrics like CPU utilization, network traffic, or application-specific metrics to determine when to scale up or down. It is effective for applications with unpredictable or fluctuating workloads, as it can quickly adapt to changing demands.

Predictive Scaling automatically schedules the correct number of Amazon EC2 instances based on predicted demand. It leverages historical usage patterns, trends, and machine-learning algorithms to forecast future demand. Predictive Scaling is well-suited for applications with more consistent or predictable workloads, allowing for better cost optimization by scaling in advance.

Elastic Load Balancing (ELB)

Elastic Load Balancing is the AWS service that automatically distributes incoming application traffic across multiple resources, such as Amazon EC2 instances.

The Elastic Load Balancer (ELB) in Amazon Web Services (AWS) operates at a regional level and is separate from individual EC2 instances. By functioning at the regional level, ELB provides a centralized and efficient way to distribute traffic, ensuring that each instance receives an appropriate share of the workload. Its automatic scaling capabilities allow for the seamless integration of additional instances.
It provides automatic scaling capabilities without requiring any rate changes.
ELB serves as a traffic distributor for both external and internal requests. For instance, it collects all incoming requests from the front end and evenly distributes them across multiple instances. Similarly, it gathers requests from the front end and equally distributes them to the back-end instances. This ensures that the workload is balanced across the instances.

Search This Blog