Everything I know about end-to-end frontend tests

No Simpler Machines next week: I'll be on vacation. After that I'm not sure. I'm starting a new contract and might want to put this on hiatus. One possibility is that I'll start a brief pop-up newsletter about what I'm learning about front-end development and testing. If you want to read such a thing you can sign up here. I can't 100% commit to it, but a bunch of you expressing interest would definitely push me in that direction.

Also, hi! I'm Nat Bennett, and this is my newsletter. It's mostly but not exclusively about software development. This week, I thought I'd share some notes on end-to-end tests, sometimes also known as "acceptance tests" or "journey tests." These are tests where you have some kind of simulated or headless browser, and your test code drives the application the way a user would, by clicking and typing.

E2E tests are a distributed system made of threads.

The three (or more) threads are running, respectively, your test code, your application code, and the browser. They communicate by passing messages, but they perform operations asynchronously. Everything you know about asynchronous programming applies equally to interactions between test code and application code.

Never use sleeps in test code.

This is the main practical consequence of the first point. A fixed sleep is always either too short (the test flakes) or too long (the suite crawls). Any time you want to use sleep, use the "wait" pattern instead. For example, when you want to access an element but you need to wait for the element to render, check for that element in a loop until you find it or hit a timeout. Ideally your test framework does this for you.
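Here's a minimal sketch of that wait loop, in case your framework doesn't provide one. The name `wait_for` and its parameters are mine, not any particular library's API, and the `page` dictionary is just a stand-in for a DOM that gets an element a little while after the test starts.

```python
import threading
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(find: Callable[[], Optional[T]], timeout: float = 5.0, interval: float = 0.05) -> T:
    """Call find() until it returns something other than None, or raise on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        result = find()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Short pause between polls; the loop still returns as soon as find() succeeds.
        time.sleep(interval)


# Stand-in for the DOM: element id -> text. The "application" renders the
# button 0.2 seconds after the test starts looking for it.
page = {}
threading.Timer(0.2, lambda: page.update({"save-button": "Save"})).start()

button = wait_for(lambda: page.get("save-button"), timeout=2.0)
print(button)  # Save
```

The tiny sleep inside the loop is only a polling interval. The difference from a bare sleep is that the test moves on the moment the element shows up, and fails loudly with a timeout if it never does.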

Specify elements semantically.

"Semantically" here means "with a symbol that has meaning." Ideally, this is an id attribute for that element specifically. If the element doesn't have an id already, add one for the test. Avoid complex location-based or class-based element selection. It'll break at some point and you might not find out right away.
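To make the failure mode concrete, here's a small sketch using only Python's standard `html.parser`. The helper names (`parse`, `by_id`, `nth_button`) and the markup are invented for illustration; in a real suite you'd use your framework's selectors, such as the CSS selector `#checkout`.

```python
from html.parser import HTMLParser


class ElementCollector(HTMLParser):
    """Records (tag, attributes) for every start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        self.elements.append((tag, dict(attrs)))


def parse(html):
    collector = ElementCollector()
    collector.feed(html)
    return collector.elements


def by_id(elements, element_id):
    """Semantic lookup: the element whose id attribute matches."""
    return next(attrs for _, attrs in elements if attrs.get("id") == element_id)


def nth_button(elements, n):
    """Positional lookup: the nth <button> on the page."""
    buttons = [attrs for tag, attrs in elements if tag == "button"]
    return buttons[n]


# The same form before and after someone adds a "Back" button in front of "Pay".
before = parse('<form><button id="checkout">Pay</button></form>')
after = parse('<form><button id="cancel">Back</button><button id="checkout">Pay</button></form>')

print(by_id(before, "checkout") == by_id(after, "checkout"))  # True
print(nth_button(after, 0)["id"])  # cancel
```

The id lookup finds the same button after the layout change. The positional lookup silently starts returning a different element, which is exactly the kind of break you might not notice right away.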

Write mostly happy-path tests.

E2E tests are relatively slow. The total number of tests you could run is infinite. You need to get it down to a bounded number somehow. The number of major basic workflows through the application is probably pretty small, so a good rule of thumb is to use E2E tests for those and cover all your edge cases and error conditions in smaller tests.

Don't write regression tests.

One of the ways these suites get unmanageably huge is that people write tests for bugs they've fixed. I'm not sure this is quite a "never" but it's very close. Regression tests in general are pretty low value, since the total possible number of new bugs is always vastly larger than the particular bugs that you know about already. By writing tests for bugs you already know about, you're inherently throwing darts at a smaller target: bad odds. But it's especially nasty in E2E land, where the tests are so slow.

Run the tests on every check-in.

Since the tests are kind of slow, it's tempting to run them only occasionally, say, before a release or a merge. But you'll save yourself a lot of time and pain if you run them constantly. That way, if you introduce a weird problem by bumping a dependency, you'll know about it immediately.

Use the application yourself a lot. Play around with it.

One way to keep your test load manageable is to use the software a lot, especially with an eye towards "hmm, I wonder what would happen if I..." In some cases, honestly, I don't write this kind of test, or test the frontend automatically, at all. I rely on unit tests and regular by-hand exploration. There's also a lot of subtle look-and-feel stuff that you can only catch by actually using the application.

Avoid writing helper libraries.

I don't know exactly why this is, but helpers for these kinds of tests always seem to get gnarly and cumbersome. This kind of test code often has a lot of duplication in it that's incidental rather than meaningful, and extracting it into helpers solidifies that duplication into abstractions that couple things that shouldn't be coupled, because they don't change together. Rather than eliminating duplication, focus on keeping your test code clear. Each test should tell a recognizable story about a user journey.

Avoid Cucumber & friends.

I love me some Gherkin syntax, but Cucumber test steps are inherently re-usable helpers, structured in a way that causes them to accumulate lots of not-quite-the-same copies of each other. Stick to RSpec or xUnit style test structure and write any helpers you do write as functions, not string matchers.
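As a rough sketch of what "xUnit style with a helper as a function" can look like, here's a journey test written with Python's built-in `unittest`. `FakeBrowser`, `log_in`, and the element ids are all hypothetical stand-ins so the example runs on its own; a real suite would use your browser driver instead.

```python
import unittest


class FakeBrowser:
    """Hypothetical stand-in for a real browser driver."""

    def __init__(self):
        self.fields = {}
        self.page = "login"

    def fill(self, element_id, text):
        self.fields[element_id] = text

    def click(self, element_id):
        # Pretend application logic: the right password leads to the dashboard.
        if element_id == "log-in" and self.fields.get("password") == "hunter2":
            self.page = "dashboard"


def log_in(browser, email, password):
    """A helper as a plain function: its arguments and callers are easy to find."""
    browser.fill("email", email)
    browser.fill("password", password)
    browser.click("log-in")


class LoginJourney(unittest.TestCase):
    def test_user_reaches_dashboard_after_logging_in(self):
        browser = FakeBrowser()
        log_in(browser, "nat@example.com", "hunter2")
        self.assertEqual(browser.page, "dashboard")
```

Because `log_in` is an ordinary function, you can find its callers, change its signature, or delete it with normal tooling. A step defined by matching a sentence is much harder to track down, which is part of how the near-duplicates pile up.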

If anything weird happens, first, check your dependency versions.

Make sure the versions of any test libraries you're using are compatible with each other and with the version of the browser simulator you're using.

At the end of the day, remember that it's just manipulating HTML.

Occasionally I'll run into folks who think that it's really hard to write tests for [framework] or that you need a bunch of special voodoo to write tests for a single-page app. There's some truth to that (the "it's a distributed system" point that we started with), but remember that at the end of the day you're passing input into functions that produce HTML, and then checking that the HTML is what you expect. The inside of the function might be pretty complicated, but it's still basically data goes in -> data comes out.
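Stripped all the way down, that idea looks something like the sketch below. `render_greeting` is a made-up function standing in for whatever your framework does to turn state into markup.

```python
from html import escape


def render_greeting(name, unread_count):
    """Data goes in (a name and a count), HTML comes out."""
    label = "message" if unread_count == 1 else "messages"
    return (
        f'<p id="greeting">Hello, {escape(name)}! '
        f"You have {unread_count} unread {label}.</p>"
    )


html = render_greeting("Ada & Co", 1)
print(html)
# <p id="greeting">Hello, Ada &amp; Co! You have 1 unread message.</p>
```

A real component has a lot more going on inside it, but the test has the same shape: give it input, then check the HTML that comes out.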