Pragmatic testing (P1: Problems, principles and test-friendly code)

Posted on 2021-02-03 In 01 Binary Life

“Please write a UT for this change.” is one of the most frequent comments I leave in code reviews. And I often see people write the bare minimum of tests, just to have something that gets them through the review.

Well, I understand that writing tests can be annoying, but tests are extremely helpful, and there are things that can make writing them easier. So let’s talk about tests!

1. “Writing tests is so annoying”

“Hey! It compiles! Ship it!”

When talking about writing tests, many people find it annoying for similar reasons:

Extra work. Most of the time, writing tests is very time consuming. Sometimes it takes even longer than the development itself.
The thrill of developing a new feature is usually over once the core logic is done.
Writing tests is tedious work. And you might not find a single bug even after writing tons of tests.
We added a lot of tests for a feature. Then the feature changed and all the tests had to be rewritten, which greatly slowed down development.

All these complaints arise because we don’t understand the real intention of testing, so we fall into the trap of writing tests for the sake of writing tests. And when we are forced to do something, we never feel that it benefits us, even if it really does.

2. So, why testing?

This is the ultimate question indeed. So, what are these tests for?

2.1. Ensures no regression in the future

Many people treat tests as disposable items that are used once during code review to show the code works as expected. On the contrary, few things are more cost-effective than tests: a one-time investment with a lifetime warranty! Once the tests are in place, they keep verifying that the tested behavior still works the way we expect. This is one of the biggest reasons for us to write tests.

Furthermore, if it is done right, the marginal cost will be almost 0!

Don’t underestimate this property. It’s what allows us to change our code with confidence. It is the cornerstone of any code refactoring (see “Refactoring: Improving the Design of Existing Code” Ch. 2), and it is also the prerequisite of any automation (see “Continuous Delivery”). And safe refactoring and automation are what we are really trying to achieve.

2.2. Reveals behavior changes

Of course, this property also works in reverse: when we change the behavior of the code, the tests should reflect all the changes brought to the system. And these changes should show up as clearly as possible, to help us in self-review and code review.

We’ll leave the specific methodology for the next post.

2.3. Tests are enforced docs

“Programs must be written for people to read, and only incidentally for machines to execute.”
– Harold Abelson and Gerald Jay Sussman, from “Structure and Interpretation of Computer Programs”

The biggest problem with documentation is that it’s not enforced, so it starts becoming obsolete the very second it’s written. This is why the code itself should act as its own documentation; that’s how we can be sure the documentation never becomes obsolete. Tests are code too. And compared to other code, they are an even better knowledge base.

First, tests demonstrate how to use the module being tested. If you are familiar with Test Driven Development (TDD), you will know that one of the purposes of writing tests is to help us design and improve interfaces from the user’s perspective. We don’t necessarily have to use TDD, but the idea is the same. Go does an even better job here: its standard testing tooling recognizes and supports such example tests separately.
Second, the assertions in a test guarantee that what the test says is true every time it passes! This gives us a shortcut to understanding the code. We don’t have to struggle to figure out what the code does in every detail; we can just read conclusions that have been verified. How good is that! It is the best documentation you can imagine!

For this reason, whenever I need to understand a module, after a brief reading of the design doc, I will usually spend more time reading the relevant tests.
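To make the “tests are docs” idea concrete, here is a minimal sketch. TrimWhitespace is a hypothetical helper invented for this illustration (it is not from any real library); the point is that the test function reads like a list of verified statements about its behavior.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper used only for illustration: strips leading and
// trailing spaces and tabs from a string.
std::string TrimWhitespace(const std::string& s) {
  const char* kBlank = " \t";
  const auto first = s.find_first_not_of(kBlank);
  if (first == std::string::npos) { return ""; }
  const auto last = s.find_last_not_of(kBlank);
  return s.substr(first, last - first + 1);
}

// The test doubles as documentation: every line is a statement about the
// behavior that gets re-verified on each test run.
void TrimWhitespace_Examples() {
  assert(TrimWhitespace("  hello  ") == "hello");  // both ends are trimmed
  assert(TrimWhitespace("a b") == "a b");          // inner blanks are kept
  assert(TrimWhitespace(" \t ") == "");            // all-blank input becomes empty
  assert(TrimWhitespace("") == "");                // empty input is allowed
}
```

A reader who has never seen the implementation can learn the edge-case behavior from the four assertions alone.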

3. Principles of testing

Now that we understand the purpose of testing, you may have a little interest in writing tests. But reality is always stark and the problems we mentioned above are still in front of us, so the methodology of writing tests becomes especially important. Remember: we should never write tests for the sake of writing tests! Instead, we should test reasonably, in reasonable places. Don’t let testing become a burden or turn us into its slaves. Make it our helper!

To help us test better, we should follow the principles below, whether we are writing features or writing tests:

Shortest distance: The distance between the error check and the action should be the shortest. In other words, an error should be reported immediately if something goes wrong.
Observable: The behavior and state of the program can be easily and clearly observed. For example, logs, error contexts, and other information.
Repeatable: The program’s behavior and errors can be stably reproduced. Unstable tests are really annoying.

We’ll discuss the specific methodology below and explain how to apply these principles.

4. Good code tests itself

When many people hear about writing tests, they think of writing test cases one by one in unit tests. In fact, tests don’t necessarily need to be implemented in this way. We know that the earlier we find a problem, the cheaper it is to fix it, so wouldn’t it be more convenient if the code itself could help us find the error?

So, don’t forget: good code tests itself. This is an application of the shortest distance principle.

4.1. Contract programming (Design by contract)

Although many projects are very complex, they are nothing compared to human society. So what makes it possible for our society to move forward with such complexity? The power of contracts! From a small verbal agreement, to an employment contract, to the ubiquitous rules and regulations, to the laws that guide our lives: they are all contracts. And if we break them, we will be punished in some way.

The same applies to coding. Each function has its own contract: what its preconditions are (e.g. parameters are not null, usually checked by Requires), what its postconditions are (e.g. a file must be created, usually checked by Ensures), and what invariants hold during execution (for example, in binary search, left must never be greater than right, usually checked by Assert or Invariant). Once the contract is broken, the violation needs to be punished (return an error or throw an exception). This is Design by contract.

Essentially, contract programming is a kind of in-code testing. It is so well known not only because it helps us find a problem, but more importantly because it helps us determine where the problem is. It works like contracts in real life: if you’re late, you’re late. If the preconditions of a function are not met and this causes an error, then we don’t need to inspect the function itself at all, because the caller broke the contract; if an invariant is violated, then the problem must be in this function or its related functions (e.g., other functions of the same class), and we don’t need to check its caller. This greatly helps us debug problems!
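As a rough sketch of the three kinds of contract on the binary search example: the REQUIRES, ENSURES and INVARIANT macros below are hypothetical names defined on top of assert just for this illustration, and a real project would more likely take them from a contract library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical contract macros for illustration only.
#define REQUIRES(cond)  assert((cond) && "precondition violated")
#define ENSURES(cond)   assert((cond) && "postcondition violated")
#define INVARIANT(cond) assert((cond) && "invariant violated")

// Returns the index of `target` in `sorted`, or -1 if it is not present.
int BinarySearch(const std::vector<int>& sorted, int target) {
  // Precondition: the caller must pass sorted data. This check is O(n), so it
  // only makes sense in builds where assertions are enabled for debugging.
  REQUIRES(std::is_sorted(sorted.begin(), sorted.end()));

  std::size_t left = 0;
  std::size_t right = sorted.size();  // search the half-open range [left, right)
  int result = -1;
  while (left < right) {
    const std::size_t mid = left + (right - left) / 2;
    if (sorted[mid] == target) {
      result = static_cast<int>(mid);
      break;
    }
    if (sorted[mid] < target) { left = mid + 1; } else { right = mid; }
    // Invariant: left must never be greater than right.
    INVARIANT(left <= right);
  }

  // Postcondition: we either report "not found" or point at the target.
  ENSURES(result == -1 || sorted[static_cast<std::size_t>(result)] == target);
  return result;
}
```

If REQUIRES fires, the bug is in the caller; if INVARIANT or ENSURES fires, the bug is inside BinarySearch. That is the “where is the problem” benefit described above.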

In fact, many programming languages already provide support for contract programming. For example: SAL (a type of code annotation) and GSL (supporting the C++ Core Guidelines) in C/C++, and Code Contracts in C#. And the impact of this idea is so large that if the language doesn’t provide it, or the built-in one is slightly inconvenient, the community provides third-party packages to implement it, such as Dawn.Guard for C# and ozzo-validation for Go.

4.2. Be disciplined, not permissive

“Don’t assume it, prove it.” – David Thomas, from “The Pragmatic Programmer: Your Journey to Mastery”

According to the shortest distance principle, we want to report errors as early as possible. A very intuitive conclusion follows: if there is a more enforceable way to constrain the code, always use it.

The common constraints, in descending order of strength, are:

1. Syntax-level compilation errors
2. Compile-time assertions
3. Static code analysis
4. Run-time errors, assertions or exceptions
5. Run-time checks that return error codes, which are then asserted in tests
6. Run-time checks that return no error codes but leave internal state that can be inspected later and asserted in tests
7. No error handling of any kind

Levels 1-3 are compile-time constraints, while levels 4-6 are run-time constraints, and each kind has its own characteristics. To show how they work, here is an example that I believe we have all seen a lot:

struct Data {
  int FieldA;
  char FieldB[6];
};

bool ProcessData(Data *p) {
  if (p == nullptr) { return false; }
  // Do something here
  return true;
}

4.2.1. Run-time constraints

4.2.1.1. Error codes

Based on the definitions above, we can tell this function uses a level 5 constraint. Since this kind of constraint relies on error codes to determine the type of error, it is better to use more expressive error codes: for example, std::error_code, a strongly typed, extensible, descriptive error code in C++, or integer-based error codes like HRESULT or System Error Codes in Windows programming.

In the example above, using bool is bad, because returning false tells us nothing about what went wrong. To improve it, we can replace bool with HRESULT:

HRESULT ProcessData(Data *p) {
  if (p == nullptr) { return E_INVALIDARG; }
  // Do something here
  return S_OK;
}

As we can see here, the problem with return codes is obvious: if no test case checks them and the caller ignores them, this kind of error will not be found quickly. It will surface only when something goes wrong somewhere else, maybe after days of debugging (missing logs again? :P). So in order to find the problem sooner, we can try raising the constraint level of this code.
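For code that does not target Windows, the std::error_code option mentioned earlier could look roughly like the sketch below. ProcessError, its two values and the category class are hypothetical names made up for this example; only the std::error_code machinery itself comes from the standard library.

```cpp
#include <string>
#include <system_error>

struct Data {
  int FieldA;
  char FieldB[6];
};

// Hypothetical error enum for this example. 0 is reserved for "success".
enum class ProcessError {
  NullInput = 1,
  NegativeField = 2,
};

// Category that turns ProcessError values into readable messages.
class ProcessErrorCategory : public std::error_category {
 public:
  const char* name() const noexcept override { return "process"; }
  std::string message(int value) const override {
    switch (static_cast<ProcessError>(value)) {
      case ProcessError::NullInput: return "input pointer is null";
      case ProcessError::NegativeField: return "FieldA must not be negative";
    }
    return "unknown error";
  }
};

const std::error_category& GetProcessErrorCategory() {
  static const ProcessErrorCategory instance{};
  return instance;
}

// Found by argument-dependent lookup when an enum value is converted.
std::error_code make_error_code(ProcessError e) {
  return {static_cast<int>(e), GetProcessErrorCategory()};
}

// Opt in to implicit conversion from ProcessError to std::error_code.
namespace std {
template <>
struct is_error_code_enum<ProcessError> : true_type {};
}  // namespace std

std::error_code ProcessData(const Data* p) {
  if (p == nullptr) { return ProcessError::NullInput; }
  if (p->FieldA < 0) { return ProcessError::NegativeField; }
  // Do something here
  return {};  // a default-constructed error_code means success
}
```

A caller can then write `if (auto ec = ProcessData(p)) { log(ec.message()); }` (with log being whatever logging function the project has), which already says far more than a bare false.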

4.2.1.2. Assertions and exceptions

We are all familiar with assertions and exceptions. They are very straightforward, whether we use exceptions like ArgumentNullException or a plain Assert. But there is a better way to apply assertions and exceptions: the contract programming we mentioned above.

If we don’t use any third-party libraries, we can define some macros ourselves to start with. For example, the following code implements the Requires keyword in contract programming to help us constrain our code.

#include <cassert>
#define REQUIRES(cond) assert(cond)

void ProcessData(Data *p) {
  REQUIRES(p != nullptr);
  // Do something here
}

A few things to note here are:

If the argument is user input, we must be careful with assertions. User input can be anything and no contract restricts it, so we have to handle it properly. Assertions should only be used for things that should be impossible, e.g. in internal calls. Otherwise, they can lead to unexpected crashes and hurt the user experience. (Classic joke: A QA engineer walks into a bar.)
If this null pointer is the only thing that can go wrong in this function, we don’t need to return an error anymore because it’s already asserted, which is why the return type changed to void.

And as mentioned above, there are already many mature libraries for contract programming. Applying them wisely can make us much more efficient at writing code and debugging issues.
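The split between “user input” and “internal call” from the notes above can be sketched like this. ParsePercentage and ScaleByPercentage are hypothetical functions invented for the illustration, and std::optional assumes C++17.

```cpp
#include <cassert>
#include <optional>
#include <string>

#define REQUIRES(cond) assert(cond)

// Boundary: user input can be anything, so we validate it and report bad
// input gracefully (std::nullopt) instead of asserting.
std::optional<int> ParsePercentage(const std::string& userInput) {
  if (userInput.empty() || userInput.size() > 3) { return std::nullopt; }
  int value = 0;
  for (char c : userInput) {
    if (c < '0' || c > '9') { return std::nullopt; }
    value = value * 10 + (c - '0');
  }
  if (value > 100) { return std::nullopt; }
  return value;
}

// Internal call: the value was already validated at the boundary, so a
// violation here can only be a bug in our own code, and we assert.
int ScaleByPercentage(int amount, int percentage) {
  REQUIRES(percentage >= 0 && percentage <= 100);
  return amount * percentage / 100;
}
```

Only ParsePercentage ever sees raw input; everything behind it can rely on the contract.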

4.2.2. Compile-time constraints

The benefit of run-time constraints is that they can handle any data. Their problem is that they require the code to actually execute. Therefore, we must write and run specific tests to trigger them, and it also takes time to debug the resulting errors. All of this is overhead for us. So, is there a stronger and more convenient constraint? Happily, yes: compile-time constraints.

Compile-time constraints are all evaluated during compilation and generate compile errors if something goes wrong. Their advantages are obvious:

If the compilation fails, we don’t have any software to release. This prevents us from ever releasing the bad version.
Compile errors usually come with very detailed information about the error, such as the file and the line of code. So compared to run-time errors, compile errors require almost no debugging (ahem, except for C++ template deduction errors…).

But the biggest problem with compile-time constraints is that if the state cannot be determined during compilation, we cannot use them (e.g., user input).

Now, let’s see how to apply compile-time constraints.

4.2.2.1. Static code analysis

Static code analysis is a useful tool that analyzes our code and identifies possible problems in it, such as memory that is allocated but never released, variables that are used before being initialized, and so on. It is listed only as level 3 because static code analysis is usually optional and people often forget to turn it on. In addition, code analysis takes time and can make compilation significantly slower, so many people disable it during development and enable it only when merging code. As a result it is not truly mandatory, and problems are found late.

There are many static code analysis tools, such as CppCoreCheck, NetAnalyzer and so on. Many of these tools also require us to annotate the code to tell them what we expect, for example with SAL.

Now, let’s apply the level 3 constraint by adding SAL to the same code:

void ProcessData(_In_ Data *p) {
  // REQUIRES(p != nullptr); // This line can be removed, if p is not a third-party input, e.g. user input.
  // Do something here
}

As we can see here, the SAL _In_ annotation added before the parameter does not just state the expectation for code analysis; it is also a contract that tells the caller this function does not accept null pointers. So, following contract programming, we can assume that the caller abides by the contract and remove the null pointer check. This makes the code simpler, clearer, and also more efficient! (Imagine passing a pointer through a call stack where every function on the stack starts with a null pointer check…)

4.2.2.2. Assert, not just in run-time

Many languages also provide compile-time assertions, such as static_assert in C++. This assertion can be used as long as the condition can be evaluated at compile time.

For example, we can make the following assertion on the above code: (Will both assertions succeed? :D)

struct Data {
  int FieldA;
  char FieldB[6];
};

// Making sure structure and fields are aligned when dealing with binary buffers.
static_assert(sizeof(Data) == 10);
static_assert(FIELD_OFFSET(Data, FieldB) == 4);

In addition to this, there are some other common usage patterns, such as:

// Making sure every enum value has its name.
static_assert(_countof(enum_names) == enum_value_max);

4.2.2.3. Syntax-level constraints and immutability

Finally, the highest level of constraint: syntax errors. By wisely using the keywords provided by the language, we can make the language itself help us avoid bugs.

For example, to fix the code above, we can simply use a reference instead of a pointer to avoid the null pointer problem for good. Then we don’t need any null pointer check at all. Also, since it is an _In_ parameter, we can add a const qualifier to ensure it will not be modified by the function.

void ProcessData(_In_ const Data &p) { /* Do something here */ }

There are many other keywords and syntax features like these that help us with constraints, such as the frequently used final and override keywords in C++, and so on. And my favorite one: const.
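Before turning to const, here is a quick sketch of what override and final buy us. Probe and HttpProbe are hypothetical classes made up for this illustration.

```cpp
#include <string>

class Probe {
 public:
  virtual ~Probe() = default;
  virtual std::string Name() const { return "generic"; }
};

// final: nobody can derive from HttpProbe, so its behavior cannot be
// changed behind our back by a subclass.
class HttpProbe final : public Probe {
 public:
  // override: the compiler verifies that this really overrides a virtual
  // function of the base class.
  std::string Name() const override { return "http"; }

  // The following would fail to compile, because dropping `const` changes
  // the signature and nothing in Probe matches it anymore:
  //   std::string Name() override { return "http"; }
};
```

Without override, the commented-out version would compile silently as a brand new function, and calls through a const Probe& would keep returning "generic". With override, the mistake becomes a level 1 constraint violation.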

In C++, the const qualifier can appear in many places: on variables, function parameters, and member functions. Whenever I see const, I feel especially relieved, because I know whatever it decorates won’t be changed through it (or won’t change anything), which makes it much safer in multi-threaded programming (there are still exceptions, of course, such as mutable members or const_cast, but they’re much less likely). And this is one of the most important cornerstones of functional programming (or of implementing thread-safe programs): Immutability.

Immutability, for an object, means that the state of the object cannot change after its constructor is done, i.e., the constructor is the only place where the state of the object can be set. Such an object is also called an immutable object. The affinity of immutable objects for multi-threaded programming is obvious: nobody can change them, so we can access them in parallel or in any way we want.

Here again, take the same function as an example. If, say, ProcessData is a callback function and the parameter p will be accessed by multiple threads, then we can implement this function in the following way to avoid many errors caused by multi-threaded access, like race conditions.

void ProcessData(_In_ std::shared_ptr<const Data> p) { /* Do something here */ }
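A slightly fuller sketch of the same idea, under the assumption that all state is set in the constructor: Config and SumPortsInParallel are hypothetical names for this example, and the worker threads only ever read through a pointer-to-const, so no lock is needed.

```cpp
#include <memory>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical immutable object: all state is set in the constructor and
// there is no member function that can modify it afterwards.
class Config {
 public:
  Config(int retries, std::vector<int> ports)
      : retries_(retries), ports_(std::move(ports)) {}
  int Retries() const { return retries_; }
  const std::vector<int>& Ports() const { return ports_; }

 private:
  const int retries_;
  const std::vector<int> ports_;
};

// Every worker reads the same Config concurrently. Each worker writes only
// to its own slot of `partial`, so the threads never touch the same data
// for writing.
int SumPortsInParallel(std::shared_ptr<const Config> config, int threadCount) {
  std::vector<int> partial(threadCount, 0);
  std::vector<std::thread> workers;
  for (int i = 0; i < threadCount; ++i) {
    workers.emplace_back([config, &partial, i] {
      const auto& ports = config->Ports();
      partial[i] = std::accumulate(ports.begin(), ports.end(), 0);
    });
  }
  for (auto& worker : workers) { worker.join(); }
  return std::accumulate(partial.begin(), partial.end(), 0);
}
```

Capturing the shared_ptr by value in each lambda also keeps the Config alive for as long as any worker still uses it.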

Nowadays, immutable objects are not exclusive to functional languages anymore. For example, .NET provides immutable collections (System.Collections.Immutable) to help us do better in multi-threaded programming.

4.3. Dead programs tell no lies

4.3.1. Assertions or error codes?

“Defensive programming is a waste of time. Let it crash!” – Joe Armstrong, the inventor of Erlang
“Crash, don’t trash.” – David Thomas, from “The Pragmatic Programmer: Your Journey to Mastery”

After the section above, you may be wondering, is it better to use assertions than error codes? If an assertion fails, the program will crash. Shouldn’t this be avoided?

On the contrary, because crashes are ruthless, we should use them more often. Especially when a major problem is encountered, we should crash as soon as possible instead of remaining silent and continuing execution.

First, crashes never lie! They also generate a crash dump file (i.e. core dump), which provides the call stack at the time of the crash, or even the complete memory contents, to tell us why. This is far more informative than one or two lines of error logs, and it greatly increases the observability of a program when something goes wrong.

Second, when a program is in an uncertain state, it is usually better to crash than to continue execution, because we cannot predict what the program will do next. It may modify user data indiscriminately, execute logic that should never be executed, or call another module or service. And the effects of these errors are usually very difficult to recover from. (Good luck if a user’s files or database get corrupted because a bug made our service send a perfectly valid request…)

Finally, crashes simply can’t be ignored, whether in testing, pre-release, or final release. This gives us two particularly good properties:

Anything that we asserted is guaranteed to be true whenever execution continues past it. This helps us simplify our code and increases our confidence that the code executes as expected. Think of an assertion in your program that never gets triggered while running on millions of machines.
Assertion failures are hard to miss. We want errors to be exposed as early as possible, rather than being found by our end users and bringing us a bunch of incident tickets.

Of course, assertions are good, but don’t forget the scenarios in which we can use them, as mentioned above. In simple words: if something is impossible by construction of the code, it should be asserted; otherwise, as with user input, it should be handled gracefully.

4.3.2. Suicide point

“Let there be light!”

Because crashes have such good properties, we can also use them to better examine the behavior of our program when needed.

A suicide point is different from an assertion. If a program hits one, it dies, but usually suicide points are not enabled at all. We only turn them on when we have to.

I once worked on a service that was really old and buggy. In order to understand what the service was really doing, I added a lot of trace points to the key functions in all modules, for example wherever a core system API call fails (trace points don’t trigger crashes; they only report hit counts). After releasing the trace points, I was surprised to find that the error rate of the core system API calls was so high that all instances combined reached millions of errors per minute (yes, you are reading it right, millions, no kidding). Some errors were hard to debug even with very detailed error logs, because the amount of information was still too limited. So I added suicide support to all the trace points. With this, I could pick any instance at will and trigger a crash whenever a specific trace point was hit. The crash dumps collected from these suicide points gave me really deep insight into these hard issues with very limited impact, for example what a certain memory buffer that we pass into the system call looks like. Then, by slowly gathering information and iterating on the fixes, the number of errors in the core system API dropped to single digits per hour or even per day. Finally, we also added alerts on these trace points to guard against future regressions.
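The service’s real implementation is not shown in this post, so the following is only a guess at the general shape of such a trace point: a hit counter plus a switch that turns the point into a suicide point. TracePoint and CallCoreSystemApi are hypothetical names, and how the switch gets flipped per instance (configuration, remote command, etc.) is left out.

```cpp
#include <atomic>
#include <cstdint>
#include <cstdlib>

// Hypothetical trace point: always counts hits, and aborts the process only
// if crashing has been explicitly enabled for it.
class TracePoint {
 public:
  void EnableCrash(bool enabled) { crashEnabled_.store(enabled); }
  std::uint64_t HitCount() const { return hits_.load(); }

  void Hit() {
    hits_.fetch_add(1);
    if (crashEnabled_.load()) {
      // Dies on the spot, so a crash dump (if the OS is configured to write
      // one) captures the state at exactly this point.
      std::abort();
    }
  }

 private:
  std::atomic<std::uint64_t> hits_{0};
  std::atomic<bool> crashEnabled_{false};
};

// Hypothetical stand-in for a core system API call that may fail.
bool CallCoreSystemApi(bool simulateFailure, TracePoint& onFailure) {
  if (simulateFailure) {
    onFailure.Hit();
    return false;
  }
  return true;
}
```

With the switch off, the point is just a cheap counter that can feed metrics and alerts; with it on for a single instance, the next failure produces a dump.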

5. Good code makes good tests

Before writing tests, we should never ignore the quality of the code itself. If the code itself stinks, the tests will be a mess too.

Please don’t expect testing alone to improve our code quality much. No matter how bad the code is, we can still write some tests for it, but these tests will suck as well!

5.1. Pragmatic testing: code boundary and code refactor

Many companies or projects mandate, without much thought, that every class and function must have tests in order to achieve “absolute 100%” code coverage. Personally, I hate this, because it conflicts with one of the major goals we want to achieve with tests: helping us refactor the code. Writing code is like planning a city. As long as our project is evolving, refactoring is inevitable. And refactoring inevitably involves changing the code structure within certain boundaries. So this kind of brute-force testing wastes a lot of our time and slows down development, while providing almost zero benefit.

Therefore, while we want to make sure every case is tested, we also need to leave the code some room to move. A better way to write tests is to first observe where the code boundaries are, such as submodules or code layers, and then add tests at these boundaries. Most of the refactoring we do stays within these boundaries and rarely crosses them, so tests added this way require very little or even no modification during refactoring!

And this brings a change to the code refactoring process, which is the practice I recommend to everyone in our current project:

1. Before refactoring any code, add tests and submit them to harden the behavior of the code we want to refactor. If any bugs are found, fix the bugs first.
2. Start refactoring the code with small commits (see “Refactoring: Improving the Design of Existing Code” for how to refactor in small steps). The principle is: a refactoring commit should not show any behavior change in existing tests. If a test needs to be added, return to step 1 to add the test first, then continue refactoring in step 2.

The benefits are: first, the current behavior of the system is thoroughly understood before any changes are made; second, when reviewing the change, it is clear at a glance that the code changed but the behavior of the system did not, hence it is a safe commit.

5.2. Increase testable surface: high cohesion, low coupling

“High cohesion, low coupling”: you may have heard this countless times. But I still want to mention it here, because it is really helpful for writing tests.

Highly cohesive, loosely coupled code has some characteristics: good modularity, a single responsibility for each part (the Single Responsibility Principle, SRP), and orthogonality between parts. Such code is very test-friendly. So, if the code you are maintaining is messy, consider refactoring it first to improve cohesion and reduce coupling. This leads to better and more fine-grained module partitioning or code layering, and thus provides more testable surface (refer to the previous item: “Pragmatic testing: code boundary and code refactor”).

Another service I worked on was a typical victim of violating the single responsibility principle. What this module did was actually simple: it was a multi-threaded, multi-protocol probe service that sends requests to other services to see whether they are alive. The problem with this service was that the underlying API usage, multi-protocol implementation, probe result handling and concurrency management were all implemented in the same place (well, with a little class inheritance, which is still bad for testing). This caused 2 major problems in the existing tests:

First, we had no way to synchronize in tests. This resulted in 30-second sleeps everywhere in the tests and made the whole test suite take more than 20 minutes to run.
Second, there was no way to simulate each error case and thoroughly test the probe result handling, especially when the underlying API returned errors in a multi-threaded context.

In the end, people gave up on the unit tests and created a test server to run tricky manual tests every time. And even so, that only covered the simplest case. This made the service almost impossible to maintain and led to a lot of tickets coming to our team.

So, in order to improve this service, I had to be patient, and this is what I did:

Harden the code behavior by adding and improving the tests first, even though some of the results are wrong. This step is critical, because it gives us a solid baseline to make sure no regression is introduced. So, although it was crazily painful because the tests took forever to run, this had to be done first.
Improve the logging and metrics data, which improves observability, especially at scale.
Refactor the code in small steps. Change inheritance to composition to separate the API, the multi-protocol implementation and the concurrency management, which creates new code boundaries. And of course, no behavior change should show up in existing tests. For each new layer, add more tests.
Improve the tests by using these new boundaries, still with no behavior change showing up in the tests.
Fix the issues found during this process. Each fix shows up as several test failures, which helps confirm the fix.

Thus, in two weeks, I found and fixed countless problems: incorrect error handling and multithreading issues all over the place. In the end, besides the code quality improvements and bug fixes, the number of tests increased significantly while the time to run all the tests dropped from 20 minutes to 3 minutes. And the best part is that every intermediate version could be released or rolled back at any time without worrying about regressions.

After this, the number of tickets complaining about this service dropped greatly. Except for one known “bug” that was a behavior change and could not be fixed directly, the service went from a problem-prone area to an almost problem-free one. The new logs and metrics data have also become some of the most important signals for evaluating build quality.

5.3. Avoid unstable code

As mentioned above, one of the principles of writing tests is repeatable: don’t report errors indiscriminately when nothing goes wrong, but be able to reproduce steadily when something goes wrong. If a test fails two or three times out of every ten runs, soon no one will take the failure seriously, because who knows if the error is “expected” or not, and it will pass after maybe two more tries anyway. This situation is a big no-no in testing.

Most instability issues are caused by the code under test itself being unstable:

Time-sensitive logic or tests, e.g. timers, sleeps, multi-threaded contention, etc.
Usage of random APIs, e.g. jitter, random numbers, etc.
Internal program state that can affect the logic, e.g. configuration.

For timer-related or random-related cases, we can first separate out the unstable part, abstract it into a trigger, and then test the trigger and the handling logic separately. This way we can easily use various synchronization mechanisms, such as events or locks, to stabilize our code and tests. For internal state, we can try to convert it into function parameters, or use a test stub (discussed below) to turn it into part of our input and make it stable.
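One possible way to separate out the time-dependent part is to pass the clock in as a parameter, as in the sketch below. Session is a hypothetical class for this illustration; production code would pass a function that returns Clock::now(), while a test passes a fake time it fully controls, so no sleep is needed.

```cpp
#include <chrono>
#include <functional>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical session with a time-to-live. The unstable part (reading the
// current time) is injected instead of being called directly.
class Session {
 public:
  Session(std::chrono::seconds ttl, std::function<Clock::time_point()> now)
      : ttl_(ttl), now_(std::move(now)), lastSeen_(now_()) {}

  // Records activity and restarts the time-to-live window.
  void Touch() { lastSeen_ = now_(); }

  // True once at least `ttl` has passed since the last activity.
  bool IsExpired() const { return now_() - lastSeen_ >= ttl_; }

 private:
  std::chrono::seconds ttl_;
  std::function<Clock::time_point()> now_;
  Clock::time_point lastSeen_;
};
```

In production this would be constructed as `Session s(std::chrono::seconds(30), [] { return Clock::now(); });`. In a test, the lambda returns a variable that the test advances by exactly 29 or 30 seconds, which makes the expiry boundary reproducible on every run.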

5.4. Be friendly to tests: test stub

I believe we have all run into some of the following awkward situations when writing tests:

Our code needs to call a system API, but this API does not work in the test environment.
The state of a member variable is really important, and we want to test it to make sure it works, but we don’t want to expose it to our users.
We need to test the behavior of a class under a certain condition or state, but this state or condition is not easy to achieve.
and so on…

This is where test stubs can help:

Add a thin layer of wrapping around the system API, allowing us to easily simulate success and failure situations.
If it is our own code, check whether we can create more layers in it. Once more layers exist, they can serve as stable abstractions and be used as test stubs to help us mock their state.
The much-maligned friend class in C++ can be a good solution to the problem of checking private member variables: friend class FooTests;
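The first and third points can be combined into one small sketch. FileSystem, FakeFileSystem, ConfigLoader and ConfigLoaderTests are all hypothetical names for this example; a real wrapper would forward ReadFile to the actual system API in its production implementation.

```cpp
#include <string>

// Thin wrapper interface around a system API (hypothetical for this sketch).
class FileSystem {
 public:
  virtual ~FileSystem() = default;
  virtual bool ReadFile(const std::string& path, std::string& content) = 0;
};

// Test stub: lets a test choose success or failure at will.
class FakeFileSystem : public FileSystem {
 public:
  bool shouldFail = false;
  std::string cannedContent;

  bool ReadFile(const std::string& /*path*/, std::string& content) override {
    if (shouldFail) { return false; }
    content = cannedContent;
    return true;
  }
};

class ConfigLoader {
 public:
  explicit ConfigLoader(FileSystem& fs) : fs_(fs) {}

  bool Load(const std::string& path) {
    std::string content;
    if (!fs_.ReadFile(path, content)) {
      ++failureCount_;
      return false;
    }
    content_ = content;
    return true;
  }

 private:
  friend class ConfigLoaderTests;  // tests may inspect private state
  FileSystem& fs_;
  std::string content_;
  int failureCount_ = 0;
};

// Lives with the tests; exposes private state without widening the public API.
class ConfigLoaderTests {
 public:
  static int FailureCount(const ConfigLoader& l) { return l.failureCount_; }
  static const std::string& Content(const ConfigLoader& l) { return l.content_; }
};
```

A test can now force the “API fails” path with `fs.shouldFail = true` and check failureCount_ through ConfigLoaderTests, all without touching a real file.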

6. Take a break

All right! That’s the end of part 1. If you’ve read this far, you’ll notice that we haven’t even started talking about how to write tests! Yes, and to restate my point here: writing tests is not the purpose in itself. Tests are not a silver bullet, and adding a few tests won’t improve the quality of our code much. Good code quality is our actual goal.

Ok! Let’s summarize all the things we discussed so far:

First, we summarized the reasons for testing: to ensure the code does not regress; to help us identify changes to the code; and to act as enforced documentation and a knowledge base.

Then, we discussed the principles of testing: shortest distance, observable, repeatable.

Finally, we discussed part of the methodology. In this post, we focused on the code being tested itself:

Good code tests itself: use contract programming (design by contract); constrain the code behavior as much as possible; use crashes wisely to increase the observability of our program.
Good code makes good tests: use code boundary to make tests better support refactoring; reduce coupling and improve cohesion to create more testable surfaces; avoid unstable code; use test stub to help us test more deeply.

In the next post, we’ll discuss how to write good tests.

Original article. Please credit the source when reposting: Soul Orbit
Link to this article: Pragmatic testing (P1: Problems, principles and test-friendly code)