Draft: WIP: manage confidence level on job results
There is a debate about the confidence level we have in the results provided by a job in a pipeline.
This tool was initially meant to collate information from a job or a pipeline, but its features go further when we add the capacity to update expectation files based on the `results.csv` and `failures.csv` files the jobs provide. It tries to do what a developer does manually, but there are different behaviors depending on details of the source pipeline.
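
To illustrate that update step, here is a minimal sketch, assuming deqp-runner's `test-name,status` CSV layout and a `test,Fail` line format in the expectation file; the function name and paths are hypothetical and the real tool may work differently:

```python
import csv
from pathlib import Path


def collect_new_failures(failures_csv: Path, fails_txt: Path) -> list[str]:
    """Return "test,Fail" lines present in failures.csv but missing from *-fails.txt."""
    # Tests already listed in the expectation file (first column of each line).
    known = set()
    if fails_txt.exists():
        known = {
            line.split(",")[0].strip()
            for line in fails_txt.read_text().splitlines()
            if line.strip() and not line.startswith("#")
        }

    new_failures = []
    with failures_csv.open(newline="") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue
            test_name, status = row[0].strip(), row[1].strip()
            if status == "Fail" and test_name not in known:
                new_failures.append(f"{test_name},Fail")
    return new_failures
```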
When one runs a testing pipeline, the failing jobs are automatically retried based on GitLab CI rules, and the tool uses the information from those retries to know whether the results are consistent between runs. If the results aren't consistent, it is easy to conclude it is a `Flake` instead of a `Fail`. But when processing a Nightly run pipeline, those heavy jobs aren't retried, so there is no information to reveal inconsistencies and the confidence level of the results drops. One school of thought says that, with only one `Fail`, we cannot add the test to the `*-fails.txt` file and should add it to `*-flakes.txt` instead. But as long as we don't have a way to remove tests from the flakes list, we can end up adding everything to flakes and defeating one of the purposes of the CI.
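
To make the trade-off concrete, here is a minimal sketch of that decision; the `classify` helper and the `retried` flag are hypothetical, and treating a single Nightly `Fail` as a flake is only one of the options under discussion:

```python
from enum import Enum


class Expectation(Enum):
    FAIL = "Fail"
    FLAKE = "Flake"


def classify(statuses: list[str], retried: bool) -> Expectation:
    """Decide where a failing test should go, given the run history we have."""
    if retried and len(set(statuses)) > 1:
        # Retries disagree with each other: clearly a flake.
        return Expectation.FLAKE
    if not retried:
        # Nightly case: a single data point, so low confidence; the debated
        # option is to file it under *-flakes.txt rather than *-fails.txt.
        return Expectation.FLAKE
    # The same failure reproduced on every retry: confident it is a real fail.
    return Expectation.FAIL
```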
There are two issues open in `deqp-runner` that can affect this development (mesa/deqp-runner#48 and mesa/deqp-runner#49).
Meanwhile, this merge request will experiment with different ways to address the problem.