In a previous article, published in September 2020 and titled “Applications don’t run fast in the Cloud without Continuous Performance Testing”, I argued that organizations still need continuous performance testing after deploying their applications in the cloud if they want to get the full benefits of the cloud.
Recently, I read an article published by AWS in October 2022, “Chaos Engineering in the cloud”. It explains that organizations are responsible for resilience in the cloud, which suggests that they also need to conduct continuous resilience testing to get the full resilience advantages of the cloud.
In this blog article, I will talk about why organizations need to conduct continuous resilience testing even if their application is deployed in the cloud.
Why Is Continuous Resilience Testing so Crucial?
Per AWS, resilience is a shared responsibility of both AWS and the organization. While AWS is responsible for the “resilience of the cloud”, organizations are responsible for the “resilience in the cloud”, meaning the organization’s responsibility is determined by the AWS cloud services that it consumes. To make a workload resilient, organizations need to take care of configuration, recovery mechanisms, operational tooling, observability, and failure management. Continuous resilience testing is how they verify that each of these actually works and that their applications are fully resilient.
Organizations can test application resilience before deploying an application to the cloud, or after deployment for applications that are already hosted there. Just as applications don’t run fast in the cloud without continuous performance testing, they will not be fully resilient without continuous resilience testing. Skipping it can lead to more downtime, customer dissatisfaction, and lost customer trust.
One more thing: resilience testing of an application in the cloud always includes the resilience of the cloud itself. For that reason, the tests need to be run several times, in different periods, to get a true picture of resilience and ensure a fully resilient application.
Continuous Resilience Testing for Enhanced End-user Experience
Organizations know that the resilience of the cloud itself improves the chances of a resilient application, but they still need to make sure the application is fully resilient by conducting continuous resilience testing.
For example, periodic disaster recovery exercises provide an important level of confidence in the application’s resilience, but they are not sufficient to detect and mitigate the other types of resilience failures that might be encountered in a real-world scenario.
The organization’s first goal should be to find potential resilience issues by running planned, carefully designed failure scenarios with continuous end-to-end monitoring. Then it should fix those issues permanently and re-execute the scenarios to confirm the fixes. Finding issues through continuous resilience testing requires deep analysis, and so does fixing them. This deep analysis helps create a reliable, fault-tolerant, resilient system that can handle many unexpected real-life events.
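The find, fix, and re-execute loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real chaos engineering tool: `ToyCluster`, `run_experiment`, and `ExperimentResult` are hypothetical names invented for this example, and the injected "failure" only removes a node from an in-memory set.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExperimentResult:
    """Outcome of one failure scenario (hypothetical structure)."""
    name: str
    steady_before: bool
    steady_during: bool
    steady_after: bool

    @property
    def passed(self) -> bool:
        # The system is resilient to this scenario only if it stayed
        # healthy before, during, and after the injected failure.
        return self.steady_before and self.steady_during and self.steady_after


def run_experiment(
    name: str,
    steady_state: Callable[[], bool],
    inject: Callable[[], None],
    rollback: Callable[[], None],
) -> ExperimentResult:
    """Check steady state, inject a failure, observe, then always roll back."""
    before = steady_state()
    if not before:
        # Do not inject failures into a system that is already unhealthy.
        return ExperimentResult(name, False, False, False)
    inject()
    try:
        during = steady_state()
    finally:
        rollback()
    after = steady_state()
    return ExperimentResult(name, before, during, after)


class ToyCluster:
    """In-memory stand-in for a real deployment (assumption for illustration)."""

    def __init__(self, nodes: int) -> None:
        self.healthy = set(range(nodes))

    def serves_traffic(self) -> bool:
        # Steady state: at least one healthy node can answer requests.
        return len(self.healthy) >= 1

    def stop_node(self, node: int) -> None:
        self.healthy.discard(node)

    def start_node(self, node: int) -> None:
        self.healthy.add(node)
```

With `ToyCluster(2)`, stopping node 0 still leaves one healthy node, so the experiment passes. With `ToyCluster(1)`, the same scenario fails during injection, which is the kind of finding to fix (here, by adding redundancy) and then re-run to confirm.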
Again, identifying planned and thoughtful failure scenarios is critical to this process and should be a whole-team effort. Typical examples include resource exhaustion (CPU, memory, or disk), node shutdown, network latency, and DNS failure. Organizations should also integrate resilience testing into their CI/CD pipelines.
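One way to bring a failure scenario into a CI/CD pipeline is a small automated check that injects faults into a dependency and reports how often the application still succeeds. The sketch below is an illustration built on assumptions: `make_flaky`, `call_with_retry`, `lookup_price`, and `run_resilience_check` are hypothetical names, and the injected fault is a simulated exception rather than real resource exhaustion, node shutdown, or network latency.

```python
import random


class InjectedFault(Exception):
    """Raised by the fault injector to simulate a failing dependency."""


def make_flaky(func, failure_rate, rng):
    """Wrap func so that roughly `failure_rate` of calls raise InjectedFault."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault("simulated dependency failure")
        return func(*args, **kwargs)
    return wrapper


def call_with_retry(func, attempts=5):
    """Call func, retrying on InjectedFault up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except InjectedFault as err:
            last_error = err
    raise last_error


def lookup_price():
    """Stand-in for a real downstream call (hypothetical)."""
    return 42


def run_resilience_check(calls=200, failure_rate=0.3, seed=7):
    """Return the fraction of calls that succeed despite injected faults."""
    rng = random.Random(seed)  # fixed seed keeps pipeline runs repeatable
    flaky = make_flaky(lookup_price, failure_rate, rng)
    succeeded = 0
    for _ in range(calls):
        try:
            call_with_retry(flaky)
            succeeded += 1
        except InjectedFault:
            pass  # retries exhausted: count this call as a failure
    return succeeded / calls
```

A pipeline step could call `run_resilience_check()` and fail the build when the returned success rate falls below a threshold the team has agreed on. In a real setup, the simulated exception would be replaced by fault injection against a test environment.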
Organizations must conduct continuous resilience testing even when the application is deployed in the cloud, because resilience is a shared responsibility. Continuous resilience testing helps find and fix resilience issues before they affect end users. In a nutshell, it helps ensure resilient, fault-tolerant applications with increased uptime and customer trust, resulting in an enhanced overall end-user experience.