core/db: add db custom backend
This patch improves performance of matching filters. Most of the additional info is in commit messages, here is a copy-paste info of the most important commit with benchmarks.
Creating IssueFilters involves filtering UnknownFailures in Database.
It's one of the most common thing that real users (non bots) do on
CIBugLog. It should be fast but sometimes it's not, due to multiple
joins that are done on Database side. In case of simple filter
"status_name = 'fail' AND dmesg ~= 'error'" PostgreSQL first filter
results by dmesg, and later joins TextStatus table and filters it.
Anyway, the point is that it's complicated and not that fast.
It's really tricky problem, one solution that I had and tested with
success is using MaterializedViews.
https://www.postgresql.org/docs/current/rules-materializedviews.html
I've created materialized view for UnknownFailures which contains only
post merge failures (non-temporary) and reduces joins as much as
possible and sensible.
Here are benchmarks I've done to vouch for merging that patch.
*Notes:*
1. I've run benchmark on production size database, which is huge, and
contained about 3250 unknown failures.
2. I've copied generated SQL query by CIBugLog, and used it with
PostgreSQL's EXPLAIN ANALYZE to get timings.
Case 1 (simple query):
Query: status_name = 'fail'
Timings:
Before: 1.29s, 1.23s, 1.32s, 1.23s, 1.27s
After: 5ms, 8ms, 7ms, 7ms, 5ms, 4ms
Case 2 (dmesg regex query):
Query: status_name = 'fail' AND dmesg ~= 'error'
Timings:
Before: 3.8s, 3.8s, 3.7s
After: 800ms, 700ms, 720ms
Case 3 (advanced query):
Query: status_name = 'fail' AND dmesg ~= 'error' AND runconfig_tag =
'xe' AND machine_tag IS IN ['BMG', 'DG2', 'ADL'] AND test_name ~=
'igt@kms*'
Timings:
Before: 4s, 3.9s, 3.9s, 4.2s, 3.9s
After: 920ms, 990ms, 875ms, 880ms
Summary:
As we can see materialized view based queries perform much faster, due
to reduced number of joins, especially nested ones. In the case 2 and 3
materialized view performs about 4times faster which would be the case
of typical issue filter. In the best case scenario (case 1) materialized
view queries perform about 240times faster, 1200ms vs 5ms. But the
noticeable difference for real person would be about 1s which is not as
breath-taking.
Cons:
Unfortunatelly there are some cons to that patch:
1. Unknown failures materialized view with it's indexes takes some
space, in my case of 3250 failures it takes about 150MB with Indexes.
For 3250 failures it's really big number but we have to keep in mind
that these failures contain usually pretty big dmesg/stderr log, in
many cases 10k+ lines.
2. Materialized views have to be refreshed when data changes and we want
it to be up-to-date. In this case it's refreshed in two situations:
* new unknown failures are uploaded
* unknown failures are deleted (assigned to known failures)