We’ve recently extracted the checker_jobs gem from our codebase. It’s a simple alerting tool with a very specific purpose which this article will explain.
Over time, we update the rules that our data has to comply with. Making sure our data is always what we expect it to be is hard, especially when old constraints change, new constraints come along, new fields are added, backfill isn’t always possible…
Even with a careful team behind it, the system can produce corrupted data for weeks, months, or years before anyone notices. By that time, it could be too late or simply impossible to fix. In comparison, crashes are noticed faster and can be corrected quickly, whereas a data issue can spread and impact many parts of the system, making it far more expensive to fix.
The checker_jobs are here to be sure that when this sneaky data corruption happens, you notice it right away.
Imagine we’ve got a users table with a terms_of_services_accepted_at column. This column could be set for new users but not for old ones. We need the user to accept the ToS before they can book a trip on our platform. Unfortunately, old trips aren’t subject to that rule since the column didn’t exist back then.
We’ll do the best we can to update all our user paths to take that new requirement into account. Even with our nice test suite, we don’t cover all the code paths, especially with all the production data. That data isn’t fresh from a testing factory, but testing on legacy, old, and sparse data is a different topic!
So to get some peace of mind, we would like to be sure that there are no recent trips booked where the driver didn’t accept the ToS. What we could do is write a piece of code verifying that we have no trips with users having the users.terms_of_services_accepted_at unset.
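As a rough illustration of such a verification, here is a minimal plain-Ruby sketch over in-memory records. The User and Trip structs, the booked_at field, the 7-day window, and the trips_missing_tos helper are all assumptions made for the example; in a real application this would more likely be a database query.

```ruby
# Hypothetical in-memory stand-ins for the users and trips tables.
User = Struct.new(:id, :terms_of_services_accepted_at)
Trip = Struct.new(:id, :driver, :booked_at)

SECONDS_PER_DAY = 24 * 60 * 60

# Returns the trips booked within the last `window_days` days whose driver
# has no terms_of_services_accepted_at value.
def trips_missing_tos(trips, now: Time.now, window_days: 7)
  cutoff = now - window_days * SECONDS_PER_DAY
  trips.select do |trip|
    trip.booked_at >= cutoff && trip.driver.terms_of_services_accepted_at.nil?
  end
end

# Arbitrary sample data, only there to exercise the helper.
now       = Time.utc(2020, 1, 31)
legacy    = User.new(1, nil)                   # old account, never accepted the ToS
compliant = User.new(2, Time.utc(2020, 1, 1))  # accepted the ToS

trips = [
  Trip.new(10, legacy, Time.utc(2015, 5, 1)),     # old trip, outside the window
  Trip.new(11, legacy, Time.utc(2020, 1, 30)),    # recent trip, ToS missing
  Trip.new(12, compliant, Time.utc(2020, 1, 29))  # recent trip, ToS accepted
]

offending = trips_missing_tos(trips, now: now)
puts offending.map(&:id).inspect # prints [11]
```

Restricting the check to a recent window is what keeps the legacy trips, which predate the column, from being reported.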
The gem offers a quick way to get alerted when this piece of code finds such a trip. You can get notifications (emails, bug trackers, …) whenever a trip doesn’t honor the ToS rule.
It would look like this:
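The gem’s own DSL is documented in its README. As a gem-free illustration of the idea, the following plain-Ruby sketch shows the general shape of such a checker: a named check returns the offending records, and every configured notifier is called when that list is not empty. The TripChecker class below, its perform method, the notifier lambdas, and the sample data are all hypothetical and are not the checker_jobs API.

```ruby
# Plain-Ruby stand-in, NOT the checker_jobs DSL: every name here is hypothetical.
class TripChecker
  # check:     callable returning the list of offending records
  # notifiers: callables receiving (check_name, offending_records)
  def initialize(name:, check:, notifiers:)
    @name = name
    @check = check
    @notifiers = notifiers
  end

  # Runs the check once and notifies only when something was found.
  # Returns the offending records so callers can inspect them.
  def perform
    offending = @check.call
    @notifiers.each { |notifier| notifier.call(@name, offending) } unless offending.empty?
    offending
  end
end

# Sample data: trip ids mapped to the time their driver accepted the ToS (nil = never).
accepted_at_by_trip = { 11 => nil, 12 => Time.utc(2020, 1, 1) }

sent = [] # collects messages instead of emailing or calling a bug tracker
checker = TripChecker.new(
  name: :trips_without_tos,
  check: -> { accepted_at_by_trip.select { |_id, accepted_at| accepted_at.nil? }.keys },
  notifiers: [->(name, ids) { sent << "#{name}: trips #{ids.join(', ')}" }]
)

checker.perform
puts sent # prints "trips_without_tos: trips 11"
```

Swapping the lambda in notifiers for code that sends an email or reports to a bug tracker would not change the checker itself.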
Then you would have to enqueue that TripChecker as often as you want to do that verification.
In our case, because we use a Ruby tasks scheduler and Sidekiq, it looks like this:
Here is an example of what we see in Bugsnag when one of our checkers is triggered:
There are other solutions to this issue, such as:
We try to use those when it makes sense, and we advise you to do the same.
Still, the checker_jobs are different from all of those solutions:
Of course, they don’t provide the same guarantees as the other solutions, so the comparison isn’t entirely fair.
Give checker_jobs a go: follow the instructions on GitHub and tell us how it went!
In the future, there are many things that we would like to see, such as:
checker_jobs-web, an extra gem that allows you to publish the results of the checks on a dedicated web UI, and of course…
We intend to extract and release more libraries of this kind, and we hope others will find them useful.