perf(fp-history): batch false positive history processing by valentijnscholten · Pull Request #14449 · DefectDojo/django-DefectDojo

@valentijnscholten

Replaces the N+1 query pattern in false positive history with a single
product-scoped DB query per batch, and switches per-finding save() calls
to QuerySet.update() to eliminate redundant signal overhead.

Changes:
- Extract _fp_candidates_qs() as the single algorithm-dispatch helper
  shared by both single-finding and batch lookup paths
- Add do_false_positive_history_batch() which fetches all FP candidates
  in one query and marks findings with a single UPDATE
- do_false_positive_history() now delegates to the batch function
- post_process_findings_batch (import/reimport) calls the batch function
  instead of a per-finding loop
- _bulk_update_finding_status_and_severity (bulk edit) groups findings
  by (product, dedup_alg) and calls the batch function once per group;
  retroactive reactivation also batched the same way
- Fix dead-code bug in process_false_positive_history: the condition
  finding.false_p and not finding.false_p was always False because
  form.save(commit=False) mutates the finding in place; fixed by
  capturing old_false_p before the form save
- Replace all per-finding save()/save_no_options() in FP history paths
  with QuerySet.update() (bypasses signals identically to the old calls)
- Move all FP history helpers from dojo/utils.py to
  dojo/finding/deduplication.py alongside the matching dedupe helpers

All update() calls carry a comment explaining the signal-bypass
equivalence with the previous save(skip_validation=True) calls.

Adds 4 unit tests covering: batch single-query behaviour, retroactive
batch FP marking, retroactive reactivation (previously dead code), and
the no-reactivation guard.
Limit _fetch_fp_candidates_for_batch to only the fields actually read
from candidate objects (id, false_p, active, hash_code,
unique_id_from_tool, title, severity), avoiding loading unused columns.

Correct update() comments to clarify that .only() does not constrain
QuerySet.update() — Django generates UPDATE SQL independently — so the
sync requirement is only for fields *read* from candidate objects.
assertNumQueries(7) on both batch tests covers: System_Settings,
4 lazy-load chain (test/engagement/product/test_type from findings[0]),
candidates SELECT with .only(), and the bulk UPDATE — fixed regardless
of batch size or number of retroactively marked findings.
New test creates 5 pre-existing findings and asserts the batch still
uses exactly 7 queries regardless — proving the old O(N) per-finding
save loop is gone and a single bulk UPDATE covers all affected rows.

mtesauro

Maffooch

blakeaowens

tejas0077 pushed a commit to tejas0077/django-DefectDojo that referenced this pull request

Mar 30, 2026
…#14449)

* perf(fp-history): batch false positive history processing

Replaces the N+1 query pattern in false positive history with a single
product-scoped DB query per batch, and switches per-finding save() calls
to QuerySet.update() to eliminate redundant signal overhead.

Changes:
- Extract _fp_candidates_qs() as the single algorithm-dispatch helper
  shared by both single-finding and batch lookup paths
- Add do_false_positive_history_batch() which fetches all FP candidates
  in one query and marks findings with a single UPDATE
- do_false_positive_history() now delegates to the batch function
- post_process_findings_batch (import/reimport) calls the batch function
  instead of a per-finding loop
- _bulk_update_finding_status_and_severity (bulk edit) groups findings
  by (product, dedup_alg) and calls the batch function once per group;
  retroactive reactivation also batched the same way
- Fix dead-code bug in process_false_positive_history: the condition
  finding.false_p and not finding.false_p was always False because
  form.save(commit=False) mutates the finding in place; fixed by
  capturing old_false_p before the form save
- Replace all per-finding save()/save_no_options() in FP history paths
  with QuerySet.update() (bypasses signals identically to the old calls)
- Move all FP history helpers from dojo/utils.py to
  dojo/finding/deduplication.py alongside the matching dedupe helpers

All update() calls carry a comment explaining the signal-bypass
equivalence with the previous save(skip_validation=True) calls.

Adds 4 unit tests covering: batch single-query behaviour, retroactive
batch FP marking, retroactive reactivation (previously dead code), and
the no-reactivation guard.

* perf(fp-history): add .only() to candidate fetch, fix update() comments

Limit _fetch_fp_candidates_for_batch to only the fields actually read
from candidate objects (id, false_p, active, hash_code,
unique_id_from_tool, title, severity), avoiding loading unused columns.

Correct update() comments to clarify that .only() does not constrain
QuerySet.update() — Django generates UPDATE SQL independently — so the
sync requirement is only for fields *read* from candidate objects.

* test(fp-history): assert exact query count in batch tests

assertNumQueries(7) on both batch tests covers: System_Settings,
4 lazy-load chain (test/engagement/product/test_type from findings[0]),
candidates SELECT with .only(), and the bulk UPDATE — fixed regardless
of batch size or number of retroactively marked findings.

* test(fp-history): assert query count stays flat with N affected findings

New test creates 5 pre-existing findings and asserts the batch still
uses exactly 7 queries regardless — proving the old O(N) per-finding
save loop is gone and a single bulk UPDATE covers all affected rows.