1 Star2 Stars3 Stars4 Stars5 Stars
Loading ... Loading ...

By Google Testing Bloggers

by Anthony ValloneUnreproducible bugs are the bane of my existence. Far too often, I find a bug, report it, and hear back that it’s not a bug because it can’t be reproduced. Of course, the bug is still there, waiting to prey on its next victim. These types of bugs can be very expensive due to increased investigation time and overall lifetime. They can also have a damaging effect on product perception when users reporting these bugs are effectively ignored. We should be doing more to prevent them. In this article, I’ll go over some obvious, and maybe not so obvious, development/testing guidelines that can reduce the likelihood of these bugs from occurring. Avoid and test for race conditions, deadlocks, timing issues, memory corruption, uninitialized memory access, memory leaks, and resource issues I am lumping together many bug types in this section, but they are all related somewhat by how we test for them and how disproportionately hard they are to reproduce and debug. The root cause and effect can be separated by milliseconds or hours, and stack traces might be nonexistent or misleading. A system may fail in strange ways when exposed to unusual traffic spikes or insufficient resources. Race conditions and deadlocks may only be discovered during unique traffic patterns or resource configurations. Timing issues may only be noticed when many components are integrated and their performance parameters and failure/retry/timeout delays create a chaotic system. Memory corruption or uninitialized memory access may go unnoticed for a large percentage of calls but become fatal for rare states. Memory leaks may be negligible unless the system is exposed to load for an extended period of time. Guidelines for development:
Simplify your synchronization logic. If it’s too hard to understand, it will be difficult to reproduce and debug complex concurrency problems.
Always obtain locks in the same order. This is a tried-and-true guideline to avoid deadlocks, but I still see code that breaks it periodically. Define an order for obtaining multiple locks and never change that order.
Don’t optimize by creating many fine-grained locks, unless you have verified that they are needed. Extra locks increase concurrency complexity.
Avoid shared memory, unless you truly need it. Shared memory access is very easy to get wrong, and the bugs may be quite difficult to reproduce.

Guidelines for testing:

Stress test your system regularly. You don’t want to be surprised by unexpected failures when your system is under heavy load.
Test timeouts. Create tests that mock/fake dependencies to test timeout code. If your timeout code does something bad, it may cause a bug that only occurs under certain system conditions.
Test with debug and optimized builds. You may find that a well behaved debug build works fine, but the system fails in strange ways once optimized.
Test under constrained resources. Try reducing the number of data centers, machines, processes, threads, available disk space, or available memory. Also try simulating reduced network bandwidth.
Test for longevity. Some bugs require a long period of time to reveal themselves. For example, persistent data may become corrupt over time.
Use dynamic analysis tools like memory debuggers, ASan, TSan, and MSan regularly. They can help identify many categories of unreproducible memory/threading issues.

Enforce preconditions I’ve seen many well-meaning functions with a high tolerance for bad input. For example, consider this function: void ScheduleEvent(int timeDurationMilliseconds) {if (timeDurationMilliseconds


Category: Anthony Vallone

Você também pode querer ler

Comments are off for this post.