Embracing or banishing randomness

May 15, 2019 – Nicolas Zermati – 18-minute read

This article was written before Drivy was acquired by Getaround and became Getaround EU. Some references to Drivy may therefore remain in the post.

Writing tests is becoming a big part of our job. If it isn’t yet, I strongly encourage you to push your organization down that path. Why could be the topic of another article.

I think there is tremendous value in having an efficient test suite. By efficient, I mean that it doesn’t create much extra work when refactoring and that it gives accurate information when something is broken. And by accurate, I mean having as few false positives as possible, as many defects caught as possible, and as few tests failing as possible for a single defect.

As important as tests are to me, I don’t give as much attention to tests as I give to production code… In my reviews, I tend to have lower standards when looking at the tests. For instance, I won’t ask for a refactoring of the tests as long as they seem to be testing the behavior that just changed. It leads to heterogeneous practices. And on some topics, we simply disagree!

This article is about a controversial topic: it tries to show the benefits of using randomness in your tests. I will also cover some of the downsides, and if you have more points you would like to add, please ping me on Twitter.

Context

The examples in this article will follow a feature and its testing journey. Here is a description of the feature:

We consider the duration of the rental to be the number of 24-hour chunks between its start time and its end time. When a trip spans across more calendar days than its number of 24-hour chunks, we would like to use the pricing of the car for the most relevant days. For instance: if a trip starts at 2pm and finishes at 8am the next day, we would like to use the pricing of the car for the first day, since most of the trip (from 2pm to midnight) happens on that day.

Here we’ll look at the development of the tests written to test the #date_range method. This method gives the relevant days we should consider in order to price the trip.
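To make the vocabulary of the feature concrete, here is a minimal sketch in plain Ruby (no ActiveSupport, UTC only) applied to the 2pm to 8am example. The helper names number_of_days, trip_span_size and seconds_spent_on are assumptions made for illustration; they are not the production code.

```ruby
require "date"

SECONDS_PER_DAY = 24 * 60 * 60

# Number of 24-hour chunks needed to cover the trip, rounded up.
def number_of_days(starts_at, ends_at)
  ((ends_at - starts_at) / SECONDS_PER_DAY).ceil
end

# Number of calendar days the trip touches.
def trip_span_size(starts_at, ends_at)
  (ends_at.to_date - starts_at.to_date).to_i + 1
end

# Seconds of the trip that fall on the given calendar day (UTC only).
def seconds_spent_on(day, starts_at, ends_at)
  day_start = Time.utc(day.year, day.month, day.day)
  day_end = day_start + SECONDS_PER_DAY
  [[ends_at, day_end].min - [starts_at, day_start].max, 0].max
end

starts_at = Time.utc(2018, 6, 1, 14, 0) # 2pm
ends_at = Time.utc(2018, 6, 2, 8, 0)    # 8am the next day

puts number_of_days(starts_at, ends_at)                         # one 24-hour chunk
puts trip_span_size(starts_at, ends_at)                         # two calendar days
puts seconds_spent_on(Date.new(2018, 6, 1), starts_at, ends_at) # 10 hours, in seconds
puts seconds_spent_on(Date.new(2018, 6, 2), starts_at, ends_at) # 8 hours, in seconds
```

The trip needs one 24-hour chunk but touches two calendar days, and more of it happens on the first day, so the first day is the relevant one.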

Use-case based approach

In this context, in order to clarify things between the product owner and the development team, some examples were created and agreed upon before the code was created. Those examples were translated into the following test-cases by the developer:

subject(:date_range) do
  described_class.date_range(starts_at, ends_at)
end

let(:starts_on) { starts_at.to_date }
let(:ends_on) { ends_at.to_date }

let(:ends_at) { starts_at + duration }

context "when the duration is less than 24 hours" do
  context "when the start and the end time are on the same day" do
    let(:starts_at) { Time.zone.parse("2018-06-01 07:00") }
    let(:duration)  { 13.hours }

    it "returns a range including only the day the trip started" do
      is_expected.to eq starts_on..starts_on
    end
  end

  context "when the trip spans across 2 calendar days" do
    context "when the majority of the trip happens on the first day" do
      let(:starts_at) { Time.zone.parse("2018-06-03 11:00") }
      let(:duration)  { 22.hours }

      it "considers only the first day" do
        is_expected.to eq starts_on..starts_on
      end
    end

    context "when the majority of the trip happens on the second day" do
      let(:starts_at) { Time.zone.parse("2018-06-01 20:00") }
      let(:duration)  { 23.hours }

      it "considers only the second day" do
        is_expected.to eq ends_on..ends_on
      end
    end
  end
end

context "when the duration is between 24 and 48 hours" do
  context "when the trip spans across 2 calendar days" do
    let(:starts_at) { Time.zone.parse("2018-06-01 11:00") }
    let(:duration)  { 36.hours }

    it { is_expected.to eq starts_on..ends_on }
  end

  context "when the trip spans across 3 calendar days" do
    context "when a majority of time is spent on the last day compared to the first day" do
      let(:starts_at) { Time.zone.parse("2018-06-01 18:00") }
      let(:duration)  { 45.hours }

      it "excludes the first day" do
        is_expected.to eq (starts_on + 1)..ends_on
      end
    end

    context "when a majority of the rental's total time is on the first day rather than on the last day" do
      let(:starts_at) { Time.zone.parse("2018-06-01 10:00") }
      let(:duration)  { 48.hours }

      it "excludes the last day" do
        is_expected.to eq starts_on..(ends_on - 1)
      end
    end
  end
end

I rewrote the test names as the ones we had were Example 1, Example 2, and so on. They were extracted from a spreadsheet of use-cases the product team gave us.

What you may see here is that those examples describe some use-cases that we believed would be enough to ensure that the implementation was correct, i.e. to cover all cases. They did cover the given specifications correctly, and the implementation made all the tests green. Unfortunately, the whole team forgot about this one:

context "when the duration is less than 24 hours" do
  context "when the trip spans across 3 calendar days (because of daylight savings)" do
    let(:starts_at) { "2018-03-24 23:30".in_time_zone("Europe/Paris") }
    let(:duration)  { 24.hours } # Produces this time: 2018-03-26 00:30

    it "excludes the first and last days" do
      is_expected.to eq (starts_on + 1)..(ends_on - 1)
    end
  end
end

Because of daylight savings in some time zones, we could have one trip that spans across more than N + 1 calendar days, where N is the number of 24-hour chunks between starts_at and ends_at. The first lesson here is to be really careful about the edge cases.

While in this example it does look like an edge case, it was actually a bit more common. We have an extra rule that allows a trip starting from 10am and finishing at 11am the next day to be considered as a one - rather than two - day trip.
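To see why the daylight saving trip touches three calendar days, here is a minimal sketch in plain Ruby (no ActiveSupport). The UTC offsets +01:00 and +02:00 are hard-coded assumptions that mimic Europe/Paris before and after the 2018 spring transition; real code would rely on a time zone database instead.

```ruby
require "date"

SECONDS_PER_DAY = 24 * 60 * 60

# 23:30 on March 24th, before the clock change (assumed offset: +01:00).
starts_at = Time.new(2018, 3, 24, 23, 30, 0, "+01:00")

# 24 elapsed hours later, read on a wall clock that moved to +02:00 overnight.
ends_at = (starts_at + SECONDS_PER_DAY).getlocal("+02:00")

number_of_days = ((ends_at - starts_at) / SECONDS_PER_DAY).ceil
trip_span_size = (ends_at.to_date - starts_at.to_date).to_i + 1

puts ends_at.strftime("%Y-%m-%d %H:%M") # the end time as read on the wall clock
puts number_of_days                     # 24-hour chunks
puts trip_span_size                     # calendar days touched
```

A single 24-hour chunk ends at 00:30 on March 26th, so the trip touches March 24th, 25th and 26th: N + 2 calendar days.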

Approaching tests from a different angle

The point of the article is to show that without being more clever, we could leverage another strategy to explore the expected behavior and detect that missing use-case from earlier.

subject(:date_range) do
  described_class.date_range(@starts_at, @ends_at)
end

context "when the trip spans over the same number of days as its duration" do
  add_constraint { @trip_span_size == @number_of_days }
  it { is_expected.to eq @starts_on..@ends_on }
end

context "when the trip spans over one more day than its duration" do
  add_constraint { @trip_span_size == @number_of_days + 1 }
  
  context "when the lowest amount of time is spent on the last day" do
    add_constraint { time_spent_on(@starts_on) >= time_spent_on(@ends_on) }
    it { is_expected.to eq @starts_on..(@ends_on - 1) }
  end

  context "when the lowest amount of time is spent on the first day" do
    add_constraint { time_spent_on(@starts_on) < time_spent_on(@ends_on) }
    it { is_expected.to eq (@starts_on + 1)..@ends_on }
  end
end

context "when the trip spans over two more days than its duration" do
  add_constraint { @trip_span_size == @number_of_days + 2 }
  it { is_expected.to eq (@starts_on + 1)..(@ends_on - 1) }
end

# This method is called for each test until the result meets all the constraints.
# If a context doesn't meet any branch of the constraint tree, then it raises an
# error telling you what context you may be missing.
def generate_context
  @starts_at = random_datetime
  @duration = random_trip_duration
  @ends_at = @starts_at + @duration
  @number_of_days = Rational(@duration.to_i, 1.day.to_i).ceil # 24-hour chunks, rounded up
  @starts_on = @starts_at.to_date
  @ends_on = @ends_at.to_date
  @trip_span_size = (@ends_on - @starts_on + 1)
end

def time_spent_on(day)
  day = day.in_time_zone(@starts_at.time_zone)
  from_time = [day.beginning_of_day, @starts_at].max
  to_time = [@ends_at, day.end_of_day].min
  to_time - from_time
end

# Below are some shared helpers that could be reused everywhere.

def random_datetime
  time_zone = ActiveSupport::TimeZone::MAPPING.values.uniq.sample
  datetime = ActiveSupport::TimeZone[time_zone].local(
    rand(2010..(Time.zone.now.year + 2)), # year
    rand(1..12),                          # month
    1,                                    # day
    rand(0..23),                          # hour
    [0, 30].sample,                       # minute
    0,                                    # second
  )
  day_offset = (0...(datetime.end_of_month.day)).to_a.sample # randomize the day
  datetime + day_offset.days
end

def random_trip_duration
  rand(1.second..30.days)
end

Here, add_constraint and generate_context are features that don’t exist yet. If you’re interested in working on implementing them, let me know!
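Since neither helper exists, here is one way the idea could be sketched in plain Ruby: generate a context, and regenerate it until every constraint accepts it. The names random_trip and generate_matching are hypothetical stand-ins, the trips are UTC only, and nothing here integrates with RSpec.

```ruby
require "date"

SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical stand-in for generate_context: builds one random trip (UTC only).
def random_trip(random)
  starts_at = Time.utc(2018, 1, 1) + random.rand(365 * SECONDS_PER_DAY)
  duration = random.rand(1..(30 * SECONDS_PER_DAY))
  ends_at = starts_at + duration
  {
    starts_at: starts_at,
    ends_at: ends_at,
    number_of_days: Rational(duration, SECONDS_PER_DAY).ceil,
    trip_span_size: (ends_at.to_date - starts_at.to_date).to_i + 1,
  }
end

# Hypothetical stand-in for add_constraint: regenerate until all constraints hold,
# and fail loudly when no generated context matches.
def generate_matching(constraints, random: Random.new, max_attempts: 1_000)
  max_attempts.times do
    trip = random_trip(random)
    return trip if constraints.all? { |constraint| constraint.call(trip) }
  end
  raise "No generated context met the constraints after #{max_attempts} attempts"
end

one_more_day = ->(trip) { trip[:trip_span_size] == trip[:number_of_days] + 1 }
trip = generate_matching([one_more_day], random: Random.new(42))
puts trip[:trip_span_size] - trip[:number_of_days]
```

Retrying blindly is the simplest strategy; it becomes slow when a constraint is rarely satisfied, in which case a dedicated generator for that context would be a better fit.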

Using that kind of approach leads to fewer examples, and to ones that are more meaningful. Now, the team needs to find properties that the subject under test should respect given a certain context.

The product owner and the developer must, together, come up with both those contexts and properties. They force us to clarify our thinking. Here it means that we reformulate the relevant days from the original specification. The contexts and properties force us to extract the domain-related concepts of number_of_days, trip_span_size and time_spent_on, which could help to model the problem and maybe lead to a clearer solution.

Random generators can be shared across the application. Custom generators for any value of your domain must be available, very much like factories would be.

If it was that great, everyone would be doing it, right?

Caveats and workarounds

Coding the logic twice

In this approach, we need to use elements from the context (such as @starts_on, @ends_on) to compute the expected results. What prevents me from making a mistake in both the expected value computation and the production code?

The use-cases approach is simpler to set up and less risky to write because it focuses on a single and fixed context. Even when the context isn’t fixed, we could use constraints on it in order to reduce the complexity of the expected result computation.

In the examples, the arithmetic on start and end dates is the same in terms of complexity in both approaches.

Too much generalization

The obvious difference between the two approaches is that the use-cases are really close to reality, while the one using randomness forces us to come up with well-structured rules and a more generalized approach. Driving the implementation from the use-cases may be more natural for TDD practitioners. The use-cases are needed in order to find relevant properties and contexts. Thus, use-cases are still mandatory in the process.

Not bad but… what about determinism

Using randomness is something that many people are afraid of. They may feel that they are losing control, that their test suite is gonna start slowing them down. Here are two remarks that are deep enough to, maybe, make you reconsider:

The tests are random as soon as impure functions, such as Date.current, are used.
The tests are random since they are randomized at programming-time by the developer.

Those remarks imply that there are various classes of randomness. One comes from impure functions, either in the tests or in the production code. Those could lead to flaky tests.

Another one is introduced on purpose: it is here to help us discover failures, to reveal inconsistencies in our thinking, and to detect unexpected behaviour as soon as possible.

Reproducing failures

Your tests will run on CI and will give you failures. Once a spec fails, it isn’t obvious what the generated inputs were. Being able to understand and reproduce a failure is critical.

In the example, the context is lost upon failure. It is simple to get that context and it would give us a good hint as to what’s going on. Here is an example:

def must_equal(value)
  expect(subject).to eq(value), <<~MSG
    Expected #{subject} to eq #{value} while using:
    - Starts at: #{@starts_at}
    - Ends at: #{@ends_at}
  MSG
end

# Replace this:
it { is_expected.to eq (@starts_on + 1)..(@ends_on - 1) }

# With:
it { must_equal (@starts_on + 1)..(@ends_on - 1) }

I’m also experimenting with a custom pseudo-random generator that would use a different seed for each test and, in case of a failure, would display that specific seed to you. This experiment is a bit raw at the moment but lives in the nicoolas25/fuzzier repository on GitHub. It would look like this:

def random_datetime
  time_zone = Fuzzier.sample(ActiveSupport::TimeZone::MAPPING.values.uniq)
  datetime = ActiveSupport::TimeZone[time_zone].local(
    Fuzzier.rand(2010..2020),
    Fuzzier.rand(1..12),
    1,
    Fuzzier.rand(1..23),
    Fuzzier.rand(1..59),
    Fuzzier.rand(1..59),
  )
  day_offset = Fuzzier.rand(0...(datetime.end_of_month.day))
  datetime + day_offset.days
end
  
def random_trip_duration
  Fuzzier.rand(1.second..30.days)
end

When an error occurs, it will output an integer, let’s say 12345, which can be used to reproduce the same randomness:

it "has as many days as the number of days of the trip", fuzzier: 12345 do
  # ...
end

The faker gem provides something similar with Faker::Config.random.rand.
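The underlying mechanism can be illustrated with Ruby’s standard Random class alone. The sketch below is not how fuzzier or faker are implemented; with_seed is a hypothetical helper that gives a block its own seeded generator and appends the seed to the message of any RuntimeError raised inside it.

```ruby
# Hypothetical helper (not part of fuzzier or faker): yields a generator built
# from the seed and reports that seed when the block raises a RuntimeError.
def with_seed(seed = Random.new_seed)
  yield Random.new(seed)
rescue RuntimeError => error
  raise "#{error.message} (reproduce with seed #{seed})"
end

# Draws a year, a month and an hour, the way random_datetime would.
draw = ->(random) { [random.rand(2010..2020), random.rand(1..12), random.rand(0..23)] }

# The same seed always yields the same values, so a failure can be replayed.
first_run = with_seed(12_345, &draw)
second_run = with_seed(12_345, &draw)
puts first_run == second_run

begin
  with_seed(12_345) { |_random| raise "unexpected date range" }
rescue RuntimeError => error
  puts error.message # the original message followed by the seed to reuse
end
```

The important design choice is that every random value goes through the yielded generator: a single call to Kernel#rand inside the block would escape the seed and make the failure impossible to replay.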

Using only one generation

This approach is very similar to property-based testing. The difference is mostly that we don’t try many input sets on those examples; only one. But because tests run quite often, we end up with way more use-cases over time. Solutions like Rantly fully embrace property-based testing and provide more tools, including the ability to run a test against many input generations.

Because I see this approach more like an exploration tool, we could try to run a given test many times to be more confident that nothing could go wrong. It would look like this:

1_000.times do
  it "has as many days as the number of days of the trip" do
    # ...
  end
end

Doing that exploration may show you some use-cases you missed and give you more confidence that the properties you specified truly match the requirements.
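Outside of RSpec, the same exploration can be sketched as a plain loop that checks one property against many generated inputs and collects the inputs that break it. The counterexamples helper and the property itself (a trip touches either N or N + 1 calendar days) are assumptions chosen for illustration, and the generator below only produces UTC times.

```ruby
require "date"

SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical helper: returns the generated trips that break the property
# "the trip touches either N or N + 1 calendar days" (UTC only).
def counterexamples(runs:, seed:)
  random = Random.new(seed)
  Array.new(runs) do
    starts_at = Time.utc(2018, 1, 1) + random.rand(365 * SECONDS_PER_DAY)
    duration = random.rand(1..(30 * SECONDS_PER_DAY))
    ends_at = starts_at + duration
    number_of_days = Rational(duration, SECONDS_PER_DAY).ceil
    trip_span_size = (ends_at.to_date - starts_at.to_date).to_i + 1
    next if [number_of_days, number_of_days + 1].include?(trip_span_size)

    [starts_at, ends_at]
  end.compact
end

puts counterexamples(runs: 1_000, seed: 2019).size
```

Because every day lasts exactly 24 hours in UTC, this generator can never produce the N + 2 case. An exploration is only as good as its generators, which is why random_datetime samples a time zone as well.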

When to use it

I think using this kind of approach has multiple benefits:

Conciseness & expressiveness of the specifications, as we don’t test samples but we specify the expected behavior using the language of the problem.
Adaptive and dynamic examples over the life of the test suite, as the test will run against new domain values as they are introduced in the application over time.
Better maintainability, as we can reason about properties rather than a long list of examples.

I wouldn’t recommend this approach for integration testing, where the goal is to secure well-known paths rather than to explore all the possible cases. UI tests are another place where I wouldn’t like randomness: you may want to compare screenshots of your application, and that would be harder if the content was changing.

But, for components where we need its behavior to be fully described, I would consider this approach. I would consider it in addition to the usual use-cases for some edge cases. It forces me to think more about the problem and to have deeper discussions with the business. It can also point me to cases I didn’t think of.

As I said before, this technique can be a bit controversial and I invite you to talk about this with your team and share your opinion!
