My comment was “To Save Production Environment......We need a better Test Environment, ideally a mirror image of Production Environment.” I think one of the main reasons for having a testing team is to save the production environment from failure, whether functional or non-functional. There may be many counterpoints to that statement, but let’s agree that organizations spend huge sums of money to be assured that an application will not cause trouble once it has been tested and deployed to end users.
What are the probable causes of Prod incidents?
No application works on its own; there are always upstream and downstream applications. The incidents I have experienced were the result of one or more of the points below.
1) Two interdependent applications are tested separately in different environments (e.g. QA and UAT). Both work fine, but when they are deployed to prod, all hell breaks loose.
2) The QA environment may not have a data size similar to prod, which leads to performance issues, especially when jobs are running and supplying data to other jobs.
3) An urgent requirement change is deployed to prod on the assumption that it will not affect anything else and does not need testing. The change was urgent because the business was being impacted. After deployment, the business is impacted even more severely because of an untested minor code change.
There can be many more reasons for an application to fail in the prod environment. In my experience, the last one is the most frequent.
What did I mean by asking for a better test environment?
An environment where all interdependent applications work the same way (in terms of data flow and data size) as they are supposed to work in prod.
I have faced situations where an application was tested in the test environment and no issues were found in the UAT environment, but when the application was released to prod, there were serious issues. The root cause analysis found that while the application was being tested in QA or UAT, it wasn't receiving data from one of the other applications, so the integration was never tested. In prod, the application started receiving that data and BOOM. The problem, as I see it, was the communication gap between the teams involved.
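A gap like this can be caught before release by comparing the upstream feeds an application receives in prod against the ones actually wired up in the test environment. The following is only a minimal sketch: the function name `missing_feeds` and the feed names are hypothetical, and how the two lists are gathered depends on your own environment inventory.

```python
def missing_feeds(prod_feeds, test_feeds):
    """Return upstream feeds that exist in prod but are absent in the test environment."""
    return sorted(set(prod_feeds) - set(test_feeds))


# Hypothetical feed names, for illustration only.
prod_feeds = {"orders", "billing", "inventory"}
test_feeds = {"orders", "inventory"}

gaps = missing_feeds(prod_feeds, test_feeds)
if gaps:
    print("Integration not covered in test for:", ", ".join(gaps))
```

Any feed reported here is an integration that will first be exercised in prod unless the teams agree to connect it in QA or UAT.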
An environment where the dataset and user base are similar to what is expected in prod.
An application broke in prod because the data files it received in the test environment were only a few KB in size, but in prod it had to receive files that were MBs in size and it couldn’t handle the load. Kedar Kulkarni, a friend whose expertise is in performance testing, would agree with this.
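One rough way to spot this kind of mismatch early is to compare the size of each data file in test with the size of its prod counterpart and flag anything far smaller. This is a sketch under assumptions: the name `undersized_feeds`, the 0.5 threshold and the sample sizes are all made up for illustration, and a real check would read the sizes from your own file stores.

```python
def undersized_feeds(prod_sizes, test_sizes, min_ratio=0.5):
    """Return {feed: ratio} for feeds whose test data is below min_ratio of prod size.

    Both arguments map a feed name to a size in bytes. The 0.5 default
    is an arbitrary illustrative threshold, not a recommendation.
    """
    flagged = {}
    for name, prod_bytes in prod_sizes.items():
        test_bytes = test_sizes.get(name, 0)
        # Treat an empty prod feed as already matched to avoid dividing by zero.
        ratio = test_bytes / prod_bytes if prod_bytes else 1.0
        if ratio < min_ratio:
            flagged[name] = ratio
    return flagged


# Hypothetical sizes: a 50 MB file in prod versus a 20 KB file in test.
prod_sizes = {"daily_feed": 50 * 1024 * 1024}
test_sizes = {"daily_feed": 20 * 1024}

for name, ratio in undersized_feeds(prod_sizes, test_sizes).items():
    print(f"{name}: test data is only {ratio:.4%} of prod size")
```

A flagged feed does not prove there will be a performance problem, but it does show that the load seen in test says little about the load the application will face in prod.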
In general, a test environment is shared by many applications, so making it available to different teams on time becomes an issue, and so does environment management. It is ironic that people who want a healthy prod environment don’t pay enough attention to their test environment, which can be disastrous.
Can a better Test Environment Save the Prod Environment?
My friend Parthiban disagreed with the comment I mentioned at the beginning of this post. His idea is to have a stable test environment and then mirror it to prod. Initially I didn’t like this thought, or probably didn’t understand it. Let’s evaluate it.
If we can stabilize all the interdependent applications in the test environment in terms of data dependency, data flow and data size, then yes, we can mirror it to prod. If we can remove the communication problem, which is generally a result of ego clashes, then yes, we can.
I would request readers to share their views on this so that we can learn from each other’s experiences.