Add import history cleanup task and max_import_history setting fixes … by tejas0077 · Pull Request #14521 · DefectDojo/django-DefectDojo
Description
When running frequent reimports, the dojo_test_import_finding_action table
grows without bound, causing significant database bloat (reported cases of 19 GB+).
This PR adds a configurable max_import_history setting, similar to the existing
max_dupes feature, to automatically clean up old import history records.
Changes made:
- Added max_import_history field to System_Settings model
- Added DD_IMPORT_HISTORY_MAX_PER_OBJECT setting to settings.dist.py
- Added async_import_history_cleanup celery task in tasks.py
- Added migration 0262_system_settings_max_import_history.py
Closes #13776
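The count-based trimming the new task performs can be illustrated with a small, self-contained sketch (plain Python, no Django; the record shape and helper name here are illustrative assumptions, not the PR's actual code):

```python
from datetime import datetime, timedelta

def ids_to_trim(history, max_per_object):
    """Given (id, created) records for one object, return the ids of the
    oldest records beyond the configured limit; the newest are kept."""
    if max_per_object is None or max_per_object <= 0:
        return []  # cleanup disabled, mirroring how max_dupes behaves when unset
    ordered = sorted(history, key=lambda rec: rec[1], reverse=True)  # newest first
    return [rec_id for rec_id, _ in ordered[max_per_object:]]

base = datetime(2025, 1, 1)
# ids 0..4, created on consecutive days, so id 0 is the oldest
records = [(i, base + timedelta(days=i)) for i in range(5)]
print(ids_to_trim(records, 3))  # the two oldest beyond the limit: [1, 0]
```

In the real task the equivalent work would be done with a queryset ordered by creation date, deleting everything past the `max_import_history` cutoff per object.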
Test results
Manually verified the new field appears correctly in the System_Settings model.
The cleanup task follows the same pattern as the existing async_dupe_delete task.
Documentation
No documentation changes needed.
Checklist
- Make sure to rebase your PR against the very latest dev.
- Features/Changes should be submitted against the dev branch.
- Model changes include the necessary migrations in dojo/db_migrations folder.
- Added the proper labels to categorize this PR.
🔴 Risk threshold exceeded.
This pull request modifies several sensitive files (dojo/db_migrations/0262_system_settings_max_import_history.py, dojo/models.py, and dojo/tasks.py) that are flagged by the configured codepaths scanner; these edits may require review or changes to .dryrunsecurity.yaml to permit the authors or paths. Please review the changes carefully for security impact and update allowed paths/authors if the edits are intended.
🔴 Configured Codepaths Edit in dojo/db_migrations/0262_system_settings_max_import_history.py (drs_feb15ef0)
| Vulnerability | Configured Codepaths Edit |
|---|---|
| Description | Sensitive edits detected for this file. Sensitive file paths and allowed authors can be configured in .dryrunsecurity.yaml. |
🔴 Configured Codepaths Edit in dojo/models.py (drs_30105223)
| Vulnerability | Configured Codepaths Edit |
|---|---|
| Description | Sensitive edits detected for this file. Sensitive file paths and allowed authors can be configured in .dryrunsecurity.yaml. |
🔴 Configured Codepaths Edit in dojo/tasks.py (drs_fc47c623)
| Vulnerability | Configured Codepaths Edit |
|---|---|
| Description | Sensitive edits detected for this file. Sensitive file paths and allowed authors can be configured in .dryrunsecurity.yaml. |
🔴 Configured Codepaths Edit in dojo/tasks.py (drs_514a6cfc)
| Vulnerability | Configured Codepaths Edit |
|---|---|
| Description | Sensitive edits detected for this file. Sensitive file paths and allowed authors can be configured in .dryrunsecurity.yaml. |
We've notified @mtesauro.
Comment to provide feedback on these findings.
Report false positive: @dryrunsecurity fp [FINDING ID] [FEEDBACK]
Report low-impact: @dryrunsecurity nit [FINDING ID] [FEEDBACK]
Example: @dryrunsecurity fp drs_90eda195 This code is not user-facing
All finding details can be found in the DryRun Security Dashboard.
Comment on lines +223 to +224

    except System_Settings.DoesNotExist:
        return
Please log something or just let the exception bubble up.
Did you test this? There is no trigger configured for this job to run.
Can you add a test case to make sure the right entries are deleted (oldest first, no more than needed, ...)?
Hi @valentijnscholten, thank you for the review!
I have addressed the first point by adding proper logging in the System_Settings.DoesNotExist exception handler.
Regarding the trigger for the job — you are correct that there is no periodic task configured yet. Should I add a Celery beat schedule similar to the existing async_dupe_delete task? Looking at the codebase, async_dupe_delete is triggered via the system settings UI. Should max_import_history follow the same pattern?
Regarding test cases — I will add unit tests to verify:
- Oldest entries are deleted first
- No more than the configured limit are deleted per run
- Entries within the limit are not deleted
Could you point me to an existing test for async_dupe_delete so I can follow the same pattern?
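If a periodic trigger is preferred over the settings-UI hook used by async_dupe_delete, a Celery beat entry could schedule the cleanup. The fragment below is only a sketch: the schedule key and interval are assumptions, and only the task path comes from this PR's tasks.py.

```python
from datetime import timedelta

# Hypothetical beat entry; key name and 24h interval are illustrative choices.
CELERY_BEAT_SCHEDULE = {
    "import-history-cleanup": {
        "task": "dojo.tasks.async_import_history_cleanup",
        "schedule": timedelta(hours=24),  # run the cleanup once a day
    },
}
```

Whether this belongs in settings.dist.py or behind the system-settings toggle would follow whatever pattern the maintainers prefer.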
Import histories serve as an audit log for what occurred during an import/reimport. Rather than dropping records after a certain number of events, it would be better to make this cleanup time-based (i.e. delete records older than one year).
Thanks @Maffooch! That makes sense — treating import history as an audit log with time-based retention is a better approach. I'll rework the cleanup task to delete records older than a configurable number of days (defaulting to 365 days / 1 year) instead of using a count-based limit. Should the setting be called import_history_retention_days or do you have a preferred name for it?
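The time-based selection this rework would need can be sketched in plain Python (the record shape, helper name, and the import_history_retention_days setting name are assumptions pending the rework):

```python
from datetime import datetime, timedelta, timezone

def expired_ids(history, retention_days, now=None):
    """Return ids of (id, created) records older than the retention window.

    retention_days mirrors the proposed import_history_retention_days
    setting (default 365, i.e. one year).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [rec_id for rec_id, created in history if created < cutoff]

now = datetime(2026, 1, 1, tzinfo=timezone.utc)
history = [
    (1, now - timedelta(days=400)),  # older than a year -> deleted
    (2, now - timedelta(days=100)),  # within retention  -> kept
]
print(expired_ids(history, 365, now=now))  # [1]
```

In the actual task this would translate to a single queryset filter on the creation timestamp rather than a per-object count.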