Layers of Scenarios and a Reply on Randomness

 

Gojko Adzic asked for opinions in his blog post “How to Specify Something Should Be Random.” He presented alternative scenarios and asked readers to choose among them.

The situation he described is making a robot’s chat response appear as if it comes from a human being. There could be multiple levels of tests associated with this requirement. Let’s take a look at these levels, and along the way I’ll give my choice(s) for an answer.

Let’s start with a high level scenario for the functionality itself:

Scenario: Robot types reply that seems like it was typed by a human
Given a robot receives a message
When the robot types a reply
Then the reply has characteristics that mimic a human being

There may be some exploration associated with this scenario to determine what those characteristics are. They could include pauses between characters, backspacing, quick word completion (simulating autocompletion), and so forth. This scenario might be tested manually. The bot would type a response and then a human would state whether it looks “robotish” or “humanish”. Suppose it was determined that pauses were a characteristic to be implemented:

Scenario: Robot types reply that has pauses
Given a robot receives a message
When the robot types a reply
Then the reply has pauses that mimic a human being

For some teams, this might be enough. The timing of the pauses would be handled in the implementation. Other teams might include more detail that describes what, specifically, mimics a human being. That detail would record the results of the exploration.

Scenario: Robot types reply that has random pauses
Given a robot receives a message
When the robot types a reply
Then the reply has pauses between characters that randomly vary between 0.2 and 0.5 seconds

Testing this scenario would probably require automation to measure the delays between characters appearing on the screen. The test would just check that the pauses fell within that range and that they were not all the same.
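The check described above could be sketched roughly as follows. This is only an illustration: the names `pauses_between` and `pauses_look_valid` are hypothetical, and it assumes the automation has already captured a timestamp (in seconds) for each character as it appeared. Against a real screen the tolerance would probably need to be much larger than the tiny default used here.

```python
def pauses_between(timestamps):
    """Return the gaps, in seconds, between consecutive character timestamps."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def pauses_look_valid(pauses, low=0.2, high=0.5, tolerance=1e-9):
    """True when every pause lies in [low, high] and the pauses are not all the same.

    The tolerance is an assumption; measured timings would need a larger one.
    """
    in_range = all(low - tolerance <= p <= high + tolerance for p in pauses)
    varied = len({round(p, 6) for p in pauses}) > 1
    return in_range and varied


# Hypothetical timestamps for four characters appearing on the screen.
pauses = pauses_between([0.0, 0.3, 0.55, 1.0])
print(pauses_look_valid(pauses))
```

Note that this only covers the "in range and not all the same" check. It says nothing about whether the pauses are actually random.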

One tricky issue is the meaning of random. Checking randomness involves technical tests. Wikipedia has some details on randomness tests.  These tests check that a sequence is random to some level of confidence.

At the high level, the randomness could be checked by using a large number of replies and testing the set of pauses that were inserted in the replies.

Scenario: Robot types random pauses over a large set of replies
Given a large set of replies
When the robot types them
Then pauses between characters are random with a confidence level of 99%
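One possible way to automate the Then step is a Wald-Wolfowitz runs test, which is one of several standard randomness tests. The sketch below is an assumption about how "a confidence level of 99%" might be interpreted: a two-sided test at the 1% significance level, using the normal approximation (critical value 2.576). The function names are hypothetical, and the test only looks at the ordering of pauses above and below the median, so a team might well combine it with other tests.

```python
import math
import statistics


def runs_test_z(values):
    """Z statistic of the Wald-Wolfowitz runs test around the median."""
    median = statistics.median(values)
    # Classify each value as above (True) or below (False) the median; drop ties.
    signs = [v > median for v in values if v != median]
    n_above = sum(signs)
    n_below = len(signs) - n_above
    if n_above == 0 or n_below == 0:
        raise ValueError("need values on both sides of the median")
    # A run is a maximal stretch of consecutive values on the same side.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n_above + n_below
    expected = 2 * n_above * n_below / n + 1
    variance = (2 * n_above * n_below * (2 * n_above * n_below - n)) / (n ** 2 * (n - 1))
    return (runs - expected) / math.sqrt(variance)


def passes_runs_test(values, z_critical=2.576):
    """True when the sequence is not rejected at the 1% level (two-sided)."""
    return abs(runs_test_z(values)) < z_critical


# A strictly alternating sequence has far too many runs to look random.
print(passes_runs_test([0.2, 0.5] * 50))
```

A sequence that alternates perfectly, or one that steadily increases, is rejected because it has too many or too few runs. Passing the test does not prove randomness; it only means this particular test found no evidence against it.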

Now this scenario and the previous one cover the external behavior for the requirement. In answer to Gojko’s question, I would have both of these scenarios to cover the requirement. They represent behavior that is externally visible.

Randomness

As an internal test, one could run the randomness tests against the method that produced a random sequence, if the creator had not already done so.

Scenario: Pseudo random number generator produces a random sequence
Given a pseudo random number generator that produces values from 0.0 to 1.0
When a long sequence is produced
Then it is random with a confidence level of 99%
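As one illustration of such an internal test, here is a minimal chi-square check that the generator's values are spread evenly across the range 0.0 to 1.0. The function names and the choice of ten bins are assumptions. The constant 21.666 is the chi-square critical value for 9 degrees of freedom at the 1% significance level. Uniformity is only one aspect of randomness, so this would be one test in a battery rather than a complete answer.

```python
CHI2_CRITICAL_DF9_ALPHA_01 = 21.666  # chi-square critical value, 9 degrees of freedom, 1% level


def chi_square_uniform(values, bins=10):
    """Chi-square statistic comparing bin counts against an even spread over [0.0, 1.0]."""
    counts = [0] * bins
    for v in values:
        index = min(int(v * bins), bins - 1)  # a value of exactly 1.0 goes in the last bin
        counts[index] += 1
    expected = len(values) / bins
    return sum((count - expected) ** 2 / expected for count in counts)


def looks_uniform(values):
    """True when uniformity is not rejected at the 1% level (valid for 10 bins only)."""
    return chi_square_uniform(values, bins=10) < CHI2_CRITICAL_DF9_ALPHA_01


# A generator stuck on one value is clearly not uniform.
print(looks_uniform([0.05] * 1000))
```

In practice the values would come from the generator under test, for example by calling it a few thousand times and passing the results to `looks_uniform`.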

As a general rule, values that are random should have a test double (mock) that produces a known output for testing. The values for the test double can be set up in a Given. For example, here’s a test that checks that the conversion to a range is correct:

Scenario Outline: Convert random sequence to pause sequence
Given random sequence <value>
When converted to pause from 0.2 to 0.5
Then pause length is <length>
Examples:
| value | length | Notes   |
| 1.0   | .5 s   | maximum |
| 0.9   | .47 s  |         |
| 0.4   | .32 s  |         |
| 0.6   | .38 s  |         |
| 0.0   | .2 s   | minimum |
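The lengths in the table are consistent with a simple linear mapping, pause = 0.2 + value × (0.5 − 0.2). A minimal sketch of that conversion, using the hypothetical name `to_pause`, might look like this:

```python
def to_pause(value, low=0.2, high=0.5):
    """Map a value in [0.0, 1.0] linearly onto a pause in [low, high] seconds."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must be between 0.0 and 1.0")
    return low + value * (high - low)


# Reproduce the rows of the table, rounded to two decimal places.
for value in (1.0, 0.9, 0.4, 0.6, 0.0):
    print(value, round(to_pause(value), 2))
```

Whether the real implementation uses a linear mapping is an assumption here; the table only pins down these five points.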

Another internal test could check the plumbing between the pause sequence method and the reply functionality.

Scenario: Check that random sequence produces appropriate pauses
Given random sequence is:
| value |
| 1.0   |
| 0.9   |
| 0.4   |
| 0.6   |
| 0.0   |
When bot replies “hello”
Then the pauses between characters are:
| char | pause |
| h    | .5 s  |
| e    | .47 s |
| l    | .32 s |
| l    | .38 s |
| o    | .2 s  |
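To show how a test double could drive this plumbing test, here is a small sketch. Everything in it is hypothetical: `FixedSequence` stands in for the random source and simply replays the values from the Given, and `plan_reply` stands in for whatever part of the bot pairs characters with pauses. The `random()` method name was chosen to mirror Python's `random.Random.random()`, so that a real generator could be passed in outside of tests.

```python
class FixedSequence:
    """Test double for the random source: replays a known list of values."""

    def __init__(self, values):
        self._values = iter(values)

    def random(self):
        return next(self._values)


def plan_reply(text, source, low=0.2, high=0.5):
    """Pair each character of the reply with a pause drawn from the source."""
    return [(char, low + source.random() * (high - low)) for char in text]


# Given the known sequence, when the bot replies "hello"...
source = FixedSequence([1.0, 0.9, 0.4, 0.6, 0.0])
plan = [(char, round(pause, 2)) for char, pause in plan_reply("hello", source)]
print(plan)
```

Because the double's output is fixed, the expected pauses can be written down exactly, which is what makes the table in the Then step possible.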

Summary

I suggest you have at least the two external scenarios: one that shows a single reply and one that checks randomness over a series of replies. Depending on the complexity of the implementation, you might have the other scenarios as internal tests for methods or combinations of methods.