August 7, 2014

Putting testability into practice

As testers, we're always being encouraged to push for testability within our products. Generally, I've found that such requests are well-received by developers; the harder challenge can be working out what to request. (If you're struggling, Adam Knight's excellent blog post Putting Your Testability Socks On can give you a few pointers.)

This week, in my spare time, I've started developing a small application myself. As sole developer and tester, it's my job to decide upon and implement testability elements. This has given me an insight into coding for testability, and made me realise just how much benefit I can gain from being able to easily access the product's internals.

My project is a Python-based cricket simulator. I chose it because the sport is quite well-suited to an object-oriented programming approach: each team has an innings, which consists of a series of overs, and each over is a series of balls.

It only took me about 30 minutes to construct a simple framework which (using a random number generator) would simulate an innings for a team.

![A snippet of the bowl_ball logic](/content/images/2014/Aug/2014-08-06-21_18_58-PyCricket-py---E__Users_Neil_Google-Drive_PyCricket-py.png) A snippet of PyCricket's bowl\_ball() logic. Yes, it's currently very basic!
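For readers who can't see the screenshot, a minimal sketch of that kind of ball-by-ball logic might look like the following. (The outcome weights, names and structure here are illustrative assumptions, not the actual PyCricket code.)

```python
import random

def bowl_ball():
    """Simulate one delivery: return runs scored, or -1 for a wicket.

    The outcome weights below are illustrative guesses, not
    PyCricket's real values.
    """
    roll = random.randint(1, 100)
    if roll <= 7:        # ~7% chance of dismissal
        return -1
    elif roll <= 47:     # dot ball
        return 0
    elif roll <= 77:     # single
        return 1
    elif roll <= 87:     # two runs
        return 2
    elif roll <= 94:     # boundary four
        return 4
    else:                # six
        return 6

def simulate_innings(max_wickets=10, max_balls=300):
    """Bowl until the side is all out or the deliveries run out."""
    score, wickets = 0, 0
    for _ in range(max_balls):
        outcome = bowl_ball()
        if outcome == -1:
            wickets += 1
            if wickets == max_wickets:
                break
        else:
            score += outcome
    return score, wickets
```

Repeated calls to `simulate_innings()` give a different plausible scorecard each time, which is exactly what makes eyeballing a handful of runs insufficient.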

By running my code a few times, I could see that it appeared to be performing as I expected. But one of the challenges with random numbers is that you need to run a serious number of simulations before you can see whether your spread of results is appropriate.

So, 30 minutes into the project, I began to add what James Bach refers to as "intrinsic testability" (Heuristics of Software Testability - PDF link). Specifically, I wanted elements of Observability (extracting stats and logs for analysis) and Control (manipulating the state of the application):

  • A function print_stats_to_file which wrote a single line to a logfile at the end of each match, containing the team's score, number of wickets lost, and the scores of each batsman;
  • A global number_of_simulations variable, so that (when the script is executed) it generates this many match results.

That took less than 5 minutes to write; the hardest part was recalling the optimal Python syntax for rapidly creating/destroying connections to a log file. In no time at all, I had a harness which was capable of running thousands of simulations, producing data which could then be profiled and benchmarked.
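The two elements above can be sketched roughly like this. Only the names `print_stats_to_file` and `number_of_simulations` come from the project itself; the function body, field layout and filename are my assumptions:

```python
number_of_simulations = 5000  # matches generated per execution

def print_stats_to_file(score, wickets, batsman_scores,
                        path="match_stats.csv"):
    """Append one line per match: team score, wickets lost, then each
    batsman's individual score.  (Field layout is an assumption.)"""
    # A 'with' block opens and closes the logfile connection cleanly
    # on every call, so each match's line is flushed immediately.
    with open(path, "a") as log:
        fields = [score, wickets] + list(batsman_scores)
        log.write(",".join(str(f) for f in fields) + "\n")
```

The simulation engine then calls `print_stats_to_file(...)` once at the end of each of the `number_of_simulations` matches, leaving behind a CSV that any spreadsheet or script can profile.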

It's already proven invaluable. I've been modifying my random number generator to increase the realism (and complexity!) of its scores. Being able to make some changes, then instantly compare the results against earlier benchmarks, allowed me to quickly notice when my changes had undesirable effects.
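Comparing a fresh run against a stored benchmark can be as simple as comparing summary statistics from two logfiles. This is a sketch under my own assumptions (the CSV layout from above, and an arbitrary 5% drift tolerance), not the spreadsheet-based comparison I actually used:

```python
def mean_score(path):
    """Mean team score from a stats logfile whose first field is the score."""
    scores = []
    with open(path) as log:
        for line in log:
            scores.append(int(line.split(",")[0]))
    return sum(scores) / len(scores)

def compare_to_benchmark(new_path, baseline_path, tolerance=0.05):
    """Return (ok, drift): ok is False if the mean score drifts more
    than `tolerance` (proportionally) from the stored baseline."""
    new, base = mean_score(new_path), mean_score(baseline_path)
    drift = abs(new - base) / base
    return drift <= tolerance, drift
```

A large, unexplained drift after a "harmless" refactor is exactly the signal described below.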

In this first graph, the blue line indicates my (at that time) most-favoured scoring algorithm. I refactored some code in a way which shouldn't have affected the simulation engine at all, but when I re-ran my simulations, it was clear that something had gone horribly wrong; scores were significantly lower across the board.

![](/content/images/2014/Aug/2014-08-06-21_42_01-Microsoft-Excel---Cricket-Stats-Baseline-xls.png) The purple line shows where I accidentally nerfed the batters' abilities.

And in this second example, I'd made a pretty big change to the logic for deciding whether a batsman had been dismissed. Effectively, it should have changed a 7% chance of dismissal into a 3.5% chance of dismissal. The effect should've been a noticeable increase in scores, but when I re-ran and compared against my benchmarks, the results were so close to the old ones that they suggested I'd screwed up my algorithm change:

![](/content/images/2014/Aug/2014-08-06-21_46_35-Microsoft-Excel---Cricket-Stats-Baseline-xls.png) I expected a dramatic difference, but didn't get one.

This graph prompted me to check my code, and almost immediately I noticed an error in my logic. My new code was having no effect: the chance of dismissal was still 7%! Having testability in my application stopped this from going unnoticed for much longer, by which point it would have been far more problematic to track down and fix.
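To illustrate the class of mistake (this is a hypothetical reconstruction, not my actual code): halving a probability constant is easy to get wrong if the old hard-coded threshold is left in place somewhere.

```python
import random

DISMISSAL_PERCENT = 3.5  # halved from 7 as part of the change

def is_dismissed_buggy():
    # Bug: the old threshold of 7 was left hard-coded, so the new
    # DISMISSAL_PERCENT constant is never consulted at all --
    # dismissal silently stays at 7%.
    return random.randint(1, 100) <= 7

def is_dismissed_fixed():
    # random.random() is uniform on [0, 1), so comparing against the
    # probability directly gives the intended 3.5% chance.
    return random.random() < DISMISSAL_PERCENT / 100
```

The buggy version and the fixed version look almost identical in passing, which is why the benchmark comparison, rather than a code read-through, is what caught it.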

I'm building upon this testability as I go, as I discover more needs. For instance, in the very first version of PyCricket, all batsmen were equally competent; however, earlier batsmen (the so-called "higher-order" batsmen) should normally score higher. Therefore I introduced some weighting so that higher-order batsmen performed more skilfully, and profiled their individual scores (and their averages) to check that this was having the expected effect. Again, I can store this data and revisit it in the future, to see if I've negatively affected the game in any way:

![](/content/images/2014/Aug/2014-08-06-21_58_15-Microsoft-Excel---5000-Sims-v1-xls.png) A scatter graph of thousands of batsmen's performances, with their scores on the Y axis
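One simple way to sketch that kind of order-based weighting is to scale each batsman's scoring chances by their position in the order. Everything here (the decay curve, the bonus size, the toy outcome table) is an illustrative assumption, not PyCricket's real weighting:

```python
import random

def batting_skill(position, top_order_bonus=0.3):
    """Skill multiplier that decays down the batting order:
    position 1 gets the full bonus, position 11 gets none.
    (The linear decay is an illustrative assumption.)"""
    return 1.0 + top_order_bonus * (11 - position) / 10

def weighted_runs(position):
    """Runs from one delivery, nudged upward by batting skill."""
    base = random.choice([0, 0, 0, 1, 1, 2, 4])  # toy outcome table
    # More skilful batsmen convert more deliveries into runs: here we
    # occasionally upgrade a dot ball, in proportion to the skill bonus.
    skill = batting_skill(position)
    if base == 0 and random.random() < (skill - 1.0):
        base = 1
    return base
```

Profiling thousands of calls per position, exactly as in the scatter graph above, is how you confirm the openers really do outscore the tail.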

The lesson? As Michael Bolton says: Ask for testability. It will help your testing: often in ways that you expect, and sometimes in ways that you don't anticipate.

As for the future of PyCricket? Well, I'm going to promote it to prime-time. It began life as a single card on my personal Kanban board, but now it's getting its own Kanban board and a GitHub repository. Both will be made publicly available next week, once I've introduced some interactive elements to the game. It should be a fun project!
