How much flakiness do you tolerate in end to end tests?

kersplort@programming.dev · edit-2 1 year ago

How much flakiness do you tolerate in end to end tests?

Pantoffel@feddit.de · edit-2 1 year ago

I think/hope that the wording you used was a mistake.

End to end tests do not introduce flakiness, but uncover it.

Whenever we discover flakiness, we try to fix it immediately. When there is no time for the fix (which is more often than not the case), we create a ticket that vanishes in the backlog.

For a long time the company I currently work at didn’t have end to end tests save unit tests for a lot of their code.

Through a push of newcomers we finally managed to add end to end tests to many more parts of the code. However, these are still not properly documented. Some end to end tests overlap and some only cover a small part of one larger functionality. That is why we often find bugs that were introduced by us, because we had no end to end tests covering those parts.

We used to run end to end tests only every night on the whole product. They usually take an hour or more to complete, which is too long to run before each merge. However, we have them organized well enough that for sub-product A we can run only the sub-product A end to end tests before each merge, on the assumption that we only touched code affecting sub-product A. In case the code changes affected some other parts of the product, the nightly tests help us out. We have been doing this in my team for a long while now, but we only recently started to establish this procedure in the other teams of the company, too.
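A minimal sketch of the per-merge selection described above, assuming a hypothetical repository layout where each sub-product lives in its own top-level directory. The directory and suite names are made up for illustration. Files outside the mapping select nothing here and are left to the nightly full run.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from top-level source directory to its E2E suite.
SUITES_BY_DIR = {
    "product_a": "e2e/product_a",
    "product_b": "e2e/product_b",
}


def suites_for_changes(changed_files):
    """Return the sorted E2E suites to run before merging the given changes."""
    suites = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Only the top-level directory decides which suite is relevant.
        if parts and parts[0] in SUITES_BY_DIR:
            suites.add(SUITES_BY_DIR[parts[0]])
        # Unmapped paths are deliberately ignored; the nightly run covers them.
    return sorted(suites)
```

The weak spot is the same one the comment admits to: a change in shared code selects no suite at all, so a regression there only shows up the next morning.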

kersplort@programming.dev · 1 year ago

My experience with E2E testing is that the tools and methods necessary to test a complex app are flaky. Waits, checks for text or selectors and custom form field navigation all need careful balancing to make the test effective. On top of this, there is frequently a sequentiality to E2E tests that causes these failures to multiply in frequency, as you’re at the mercy of not just the worst test, but the product of every test in sequence.
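To put a rough number on the "product of every test in sequence" point: if each step passes independently with some probability, the whole chain passes with the product of those probabilities. The 99% rate and the 20 steps below are illustrative assumptions, not measurements from any real suite.

```python
import math


def sequence_pass_rate(step_pass_rates):
    """Probability that every step in a sequential run passes,
    assuming the steps fail independently of each other."""
    return math.prod(step_pass_rates)


# Twenty sequential steps that each pass 99% of the time.
rate = sequence_pass_rate([0.99] * 20)
print(f"{rate:.3f}")  # prints 0.818
```

So a chain of individually quite reliable steps still fails roughly one run in five under these assumptions.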

I agree that the tests cause less flakiness in the app itself, but I have found smokes inherently flaky in a way that unit and integration tests are not.

Pantoffel@feddit.de · 1 year ago

Okay I must admit that I do not have much experience with smoke and integration tests. We run end to end tests only and skip running the other two types entirely. They would be covered by the end to end tests anyways.

Perhaps I am lucky in that our software doesn’t require us to use many waits at all. Most things are synchronous, and those that are not mostly have API endpoints where the status of the process can be safely queried, i.e. a wait(1000) and hope for the best is not necessary; instead we do wait(1000) until isFinished().
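The "wait(1000) until isFinished()" idea could look roughly like the sketch below. `wait_until` and its parameters are hypothetical names, and `is_finished` stands for whatever status query the API offers.

```python
import time


def wait_until(is_finished, timeout=30.0, interval=1.0):
    """Poll is_finished() until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_finished():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test would then do something like `assert wait_until(job_is_done, timeout=60)`, where `job_is_done` is a placeholder for the real status check. The test proceeds as soon as the process finishes, and fails with a clear timeout instead of racing a fixed sleep.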

And yes, for us it is also a mess of errors popping up when one step in a pipeline fails, where many tests rely on this single step. I don’t know whether there is a way to approach this issue neatly. This is surely a chance in the market to be taken.

learningduck@programming.dev · 10 months ago

You run E2E tests before each merge. So, you don’t merge very often?

How about running an integration test before each merge instead of a full-fledged E2E run, mocking out external dependencies (other services) during the test, and then doing E2E testing on a schedule, like nightly?

I prefer it this way, because mocking out external dependencies cuts out network instability and bugginess from dependencies, so we can merge faster. I agree that the test scenarios overlap, and if your E2E suite is very stable then it is probably not worth it, but unfortunately it’s not so stable in my environment.
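A minimal sketch of what mocking out another service might look like, using Python's standard `unittest.mock`. `PriceClient`, `fetch_price` and `order_total` are hypothetical stand-ins for a real client and the code under test.

```python
from unittest import mock


class PriceClient:
    """Hypothetical client for a service owned by another team."""

    def fetch_price(self, sku):
        raise RuntimeError("would call the real service over the network")


def order_total(client, sku, quantity):
    """Code under test: depends on the external price service."""
    return client.fetch_price(sku) * quantity


def test_order_total_with_mocked_service():
    # The autospec'd mock keeps the real method signatures but never
    # touches the network, so the test cannot fail due to the dependency.
    client = mock.create_autospec(PriceClient, instance=True)
    client.fetch_price.return_value = 250
    assert order_total(client, "ABC", 3) == 750
    client.fetch_price.assert_called_once_with("ABC")
```

The trade-off is the one mentioned above: the mock only proves the code works against the assumed behaviour of the service, so a scheduled E2E run is still needed to catch drift between the mock and the real thing.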

Pantoffel@feddit.de · edit-2 10 months ago

Luckily, our e2e tests are pretty stable. And unfortunately we are not given the time to write integration tests as you describe. The good thing would be that with these mocks we would then also be able to load test single services instead of the whole product.

We merge multiple times a day and run only those e2e tests we think are relevant. Of course, this is not optimal, and it is not too rare that one of the teams merges a regression, with some teams being more talented at that than others.

You see, we have issues and we realize we have them. Our management just thinks these are not important enough to spend time on writing integration tests. I think money and developer time are two of the reasons, but the lack of feature documentation, the lack of experts for parts of the codebase (some already left for another employer), and the amount of spaghetti code and infrastructure we have are other important reasons.

learningduck@programming.dev · 10 months ago

Reading the 3rd paragraph and I see myself 😄. Glad that you and the team managed to add another layer of testing successfully.