Running Tests in Series in Rust
Programming · Aug 28th, 2022
If you don't care about motivation and just want a solution, simply jump to the code below or view it on GitHub.
Have fun 😉
Cargo's test runner runs your tests concurrently. In a nutshell, that means cargo test runs multiple tests at the same time, on separate threads. This is a double-edged sword, though. On one hand, running all your tests may be significantly faster, because it literally runs more in less time. On the other hand, your tests need to be 100% thread safe and completely independent of each other. Most of the time, tests you write in Rust should be independent from the get-go, especially since cargo test doesn't provide the setup or teardown methods that testing frameworks in other languages commonly offer. But there are a few cases where, even in Rust, data is shared between tests, most notably mutable statics, files, or database connections.
Here's how shared data creates problems with cargo test. Say two tests share a piece of data: one test sets it to "hoi", while another sets it to "poi". Test one expects it to be "hoi", while test two expects "poi". One possible concurrent execution of the tests looks like this: test one sets the data to "hoi", but oh no, before it can check the data, test two sets it to "poi". Test two checks, sees "poi", and succeeds. Now test one continues, sees "poi" instead of "hoi", and fails. Even though we implemented our behaviour 100% correctly™, the test fails, because it has a race condition with another test.
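The interleaving described above can be replayed deterministically. The sketch below is only an illustration, not code from this post: the SHARED static and the simulate_race function are hypothetical, the data sits behind a std Mutex instead of a static mut (so there is no undefined behaviour, only the logical race), and two std::sync::Barrier values force exactly the order "test one writes, test two writes, test one reads".

```rust
use std::sync::{Barrier, Mutex};
use std::thread;

// Hypothetical shared state standing in for a mutable static.
static SHARED: Mutex<String> = Mutex::new(String::new());

/// Forces the interleaving "one writes, two writes, one reads"
/// and returns the value that test one observes.
fn simulate_race() -> String {
    let after_first_write = Barrier::new(2);
    let after_second_write = Barrier::new(2);

    thread::scope(|s| {
        // "Test one": writes "hoi", then reads only after test two has written.
        let one = s.spawn(|| {
            *SHARED.lock().unwrap() = String::from("hoi");
            after_first_write.wait();
            after_second_write.wait();
            SHARED.lock().unwrap().clone()
        });
        // "Test two": overwrites the data with "poi" in between.
        s.spawn(|| {
            after_first_write.wait();
            *SHARED.lock().unwrap() = String::from("poi");
            after_second_write.wait();
        });
        one.join().unwrap()
    })
}

fn main() {
    let seen = simulate_race();
    // Test one expected "hoi", but it observes test two's value.
    println!("test one observed {seen:?}");
}
```

Each individual access is perfectly thread safe here; the failure comes purely from the order in which two otherwise correct tests touch the same data.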
So how do we fix this? Simple: don't execute tests concurrently. The Book even tells us how to run tests on only one thread: cargo test -- --test-threads=1. But this directly means that our tests won't run as fast. Bummer.
Instead, what we want is to run only specific tests in series, while all unrelated tests still run concurrently. This blog post by Ferdinand de Antoni suggests simply using the serial_test crate. This works, I guess, but I am more of a Terry A. Davis kind of guy, with virtually none of his genius and twice his sanity. So I want to keep third-party dependencies to a minimum. Also, spoiler: the solution is so quick and easy, it doesn't really deserve a separate crate.
Ok, so we want to come up with a solution ourselves. We could write our own test harness. Yes, in Rust you can actually write one yourself. But there are two strong reasons against it:
- First: I only know that you can write your own test harness, not how, so I am really unqualified to give you directions on that. If you seriously want to do this, Jon Gjengset mentions how in his book "Rust for Rustaceans".
- Second: it's absolute overkill. Sure, you can kill a single ant with a nuclear bomb, and it gets the job done, but I think there is a simpler solution.
A simple, working approach would be this: have some piece of code shared between the tests that ensures they never run in parallel. Now we are thinking concurrently! The most obvious data structure would be a Mutex. But a Mutex comes with a hefty drawback: poisoning. Long story short: if a thread panics while holding the lock, the Mutex becomes poisoned, and every later call to lock() returns a PoisonError instead of a guard. A test that simply unwrap()s that result panics too. That is no good: a perfectly fine test fails just because another test using the same Mutex failed. So let's look at something else.
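Here is a minimal, std-only sketch of that poisoning behaviour. The GUARDED mutex and the failing_test function are hypothetical stand-ins for a shared lock and a failing test: the "test" panics while holding the guard, and afterwards the mutex reports itself as poisoned and lock() returns an Err, so any other test calling lock().unwrap() would panic as well.

```rust
use std::sync::Mutex;
use std::thread;

// Hypothetical mutex shared between "tests".
static GUARDED: Mutex<u32> = Mutex::new(0);

/// Simulates a failing test that panics while holding the lock.
fn failing_test() {
    let _guard = GUARDED.lock().unwrap();
    panic!("assertion failed in some unrelated test");
}

fn main() {
    // Run the failing "test" on its own thread; join() reports its panic as Err.
    let result = thread::spawn(failing_test).join();
    assert!(result.is_err());

    // The guard was dropped during unwinding, which poisoned the mutex:
    // from now on, lock() yields Err(PoisonError) for everyone.
    println!("poisoned: {}", GUARDED.is_poisoned());
    println!("lock() is_err: {}", GUARDED.lock().is_err());
}
```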
What about atomics? Ah yes, that would work. We can set a flag at the start of our test, indicating that this test is running. Other tests trying to set that flag will see it as already set, and thus wait until it is free. We just need to reset the flag at the end of each test, so that other tests can run again.
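A minimal sketch of such a flag, still without any cleanup logic. The FLAG static and the acquire and release helpers are hypothetical names for illustration: acquire spins on compare_exchange until it manages to switch the flag from false to true, and release switches it back.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

// Hypothetical flag shared by all tests that touch the same resource.
static FLAG: AtomicBool = AtomicBool::new(false);

/// Waits until the flag is free, then claims it.
fn acquire(flag: &AtomicBool) {
    // Only one thread can swap false -> true; everyone else retries.
    while flag
        .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
        .is_err()
    {
        thread::yield_now();
    }
}

/// Frees the flag so the next waiting test can run.
fn release(flag: &AtomicBool) {
    flag.store(false, Ordering::SeqCst);
}

fn main() {
    acquire(&FLAG);
    // While we hold the flag, another attempt to claim it fails.
    let second_try = FLAG.compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst);
    println!("second claim while held: {:?}", second_try);
    release(&FLAG);
    println!("flag after release: {}", FLAG.load(Ordering::SeqCst));
}
```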
Now this is promising, but it still isn't perfect: what if the test fails? Most of the time, a test fails because something panicked, whether an assert!(), an unwrap(), or a panic you deliberately trigger in your code. This is bad, because a panic skips all the code that comes after it. So a test holding the flag never resets it after it fails. This in turn means that every waiting test will wait forever, because no one is going to reset the flag.
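The failure mode is easy to reproduce. In this sketch (the failing_test and demo functions are hypothetical, and demo uses a fresh local flag on every call), a "test" claims the flag and then fails an assertion before it reaches the line that would reset it. The thread's panic is reported through join(), but the flag stays set, so any later test spinning on it would wait forever.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// A "test" that claims the flag, then fails before resetting it.
fn failing_test(flag: &AtomicBool) {
    while flag
        .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
        .is_err()
    {
        thread::yield_now();
    }
    let data = "poi";
    assert_eq!(data, "hoi"); // panics here...
    flag.store(false, Ordering::SeqCst); // ...so this reset never runs
}

/// Runs the failing test on its own thread with a fresh, hypothetical flag.
/// Returns (did_the_test_panic, is_the_flag_still_set).
fn demo() -> (bool, bool) {
    let flag = AtomicBool::new(false);
    // Joining manually turns the thread's panic into an Err value.
    let panicked = thread::scope(|s| s.spawn(|| failing_test(&flag)).join().is_err());
    (panicked, flag.load(Ordering::SeqCst))
}

fn main() {
    let (panicked, still_set) = demo();
    // Prints "panicked: true, flag still set: true": any test waiting
    // on this flag would now spin forever.
    println!("panicked: {panicked}, flag still set: {still_set}");
}
```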
But, what if we put it into a drop-guard?
...
Genius! Yes! Why didn't I think of that?! Let's put it into a file and ship it! Easy 😎
```rust
use std::{
    sync::atomic::{AtomicBool, Ordering},
    thread,
};

/// Drop-guard: while a TestLock exists, its test owns the shared flag.
pub struct TestLock<'a>(&'a AtomicBool);

impl<'a> TestLock<'a> {
    /// Spins until the flag can be switched from false to true.
    pub fn wait_and_lock(lock: &'a AtomicBool) -> Self {
        while lock
            .compare_exchange_weak(false, true, Ordering::SeqCst, Ordering::SeqCst)
            .is_err()
        {
            thread::yield_now();
        }
        Self(lock)
    }
}

impl<'a> Drop for TestLock<'a> {
    /// Frees the flag; runs on normal scope exit and during panic unwinding.
    fn drop(&mut self) {
        self.0.store(false, Ordering::SeqCst);
    }
}

#[cfg(test)]
mod examples {
    use std::{ptr::addr_of, sync::atomic::AtomicBool, thread, time::Duration};

    use super::TestLock;

    static mut UNSAFE_SHARED_DATA: String = String::new();
    static LOCK: AtomicBool = AtomicBool::new(false);

    #[test]
    fn test_one() {
        let lock = TestLock::wait_and_lock(&LOCK);
        unsafe {
            UNSAFE_SHARED_DATA = String::from("hoi");
            // Give test_two a chance to interfere (it can't: we hold LOCK).
            thread::sleep(Duration::from_millis(1));
            // Read through a raw pointer: a plain reference to a static mut
            // trips the static_mut_refs lint (deny-by-default in edition 2024).
            assert_eq!(*addr_of!(UNSAFE_SHARED_DATA), "hoi");
        }
        drop(lock);
    }

    #[test]
    fn test_two() {
        let lock = TestLock::wait_and_lock(&LOCK);
        unsafe {
            UNSAFE_SHARED_DATA = String::from("poi");
            assert_eq!(*addr_of!(UNSAFE_SHARED_DATA), "poi");
        }
        drop(lock);
    }
}
```
Yes, it uses a spinlock. Yes, it uses Ordering::SeqCst, which is stronger than a lock strictly needs. But if I am honest, at this point I don't care. It is better than running all tests on one thread, and it gets the job done.
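If you do care about the orderings: the conventional choice for a spinlock is Acquire when taking the lock and Release when freeing it, which is enough to make everything one test wrote visible to the next test that takes the lock. The sketch below is my variant of the TestLock above with those orderings (SeqCst, as used above, is also correct, just stronger than necessary). Its main function also checks, via std::panic::catch_unwind, that the drop-guard frees the lock when the guarded code panics; the LOCK static is just for the demonstration.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

pub struct TestLock<'a>(&'a AtomicBool);

impl<'a> TestLock<'a> {
    pub fn wait_and_lock(lock: &'a AtomicBool) -> Self {
        // Acquire on success: everything the previous holder wrote before
        // releasing is visible to us. A failed attempt needs no ordering.
        while lock
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            thread::yield_now();
        }
        Self(lock)
    }
}

impl Drop for TestLock<'_> {
    fn drop(&mut self) {
        // Release: publish our writes to whoever acquires the lock next.
        self.0.store(false, Ordering::Release);
    }
}

static LOCK: AtomicBool = AtomicBool::new(false);

fn main() {
    // A "test" that panics while holding the lock.
    let result = panic::catch_unwind(|| {
        let _lock = TestLock::wait_and_lock(&LOCK);
        panic!("test failed");
    });
    assert!(result.is_err());
    // Unwinding ran Drop, so the lock is free again.
    println!("lock held after panic: {}", LOCK.load(Ordering::Relaxed));
}
```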
Initially, I wanted to explain in detail how and why this code works, for beginners, you know. But that quickly got out of hand, and if I had included it here, this blog post would have been five times as long. Concurrency is a big topic. You can write a book about it, and people have. If you aren't shy of C++, I highly recommend "C++ Concurrency in Action" by Anthony Williams.
If you don't know why this code works and aren't interested in the book I just recommended, here is a simplified explanation:
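- Atomics are variables that can safely be read and modified from multiple threads at once, because each operation on them happens indivisibly.
- The while loop attempts to lock the AtomicBool by switching it from false to true with compare_exchange_weak.
- If another test holds the lock, the switch fails and the loop runs again. This may happen very, very often, but the code never leaves the loop until it succeeds in setting the lock. It effectively waits. We say it spins, and this programming pattern is called a spinlock. (compare_exchange_weak may also fail spuriously while the lock is free; the loop simply tries again.)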
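- Rust drops a value when the variable holding it goes out of scope, that is, at the end of the enclosing block, not right after its last use. So let lock = ... keeps the lock until the end of the test, and the manual drop(lock) at the end of each test simply makes that release point explicit. Beware of let _ = TestLock::wait_and_lock(&LOCK); though: the bare _ pattern doesn't bind the value, so it is dropped, and the lock freed, immediately. A named binding such as _lock is fine.
- drop() is called whether the value falls out of scope normally or because a panic unwinds the stack, which is what happens to a failing test by default. Therefore, it is safe to put cleanup code into it, like freeing our lock. This pattern is called a drop-guard. (It won't run if the process aborts instead of unwinding, or if the value is leaked with std::mem::forget.)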
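To see when Rust actually runs Drop, here is a small self-contained sketch. The Guard type, the EVENTS log, and the record and demo functions are hypothetical, just for the demonstration. It covers three cases: binding to the bare _ pattern, a named binding that lives until the end of its block, and a panic that drops the value during unwinding.

```rust
use std::panic;
use std::sync::Mutex;

// Hypothetical log of what happened, in order.
static EVENTS: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

fn record(event: &'static str) {
    EVENTS.lock().unwrap().push(event);
}

/// Hypothetical drop-guard that records when it is dropped.
struct Guard(&'static str);

impl Drop for Guard {
    fn drop(&mut self) {
        record(self.0);
    }
}

/// Runs the three cases and returns the recorded order of events.
fn demo() -> Vec<&'static str> {
    EVENTS.lock().unwrap().clear();
    {
        let _ = Guard("dropped: let _"); // `_` doesn't bind: dropped right here
        record("after let _");
    }
    {
        let _named = Guard("dropped: let _named"); // lives until the closing brace
        record("end of scope");
    }
    let _ = panic::catch_unwind(|| {
        let _held = Guard("dropped: during unwinding");
        panic!("test failed");
    });
    EVENTS.lock().unwrap().clone()
}

fn main() {
    for event in demo() {
        println!("{event}");
    }
}
```

The recorded order is "dropped: let _", "after let _", "end of scope", "dropped: let _named", "dropped: during unwinding", which is exactly why a named binding is enough for TestLock and why a failing test still frees the lock.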
And that's all there is to it. Using this solution really boils down to copying the code and using it. I hope this is helpful to someone 😉