Adjust eBPF (user ring) by thiagoftsm · Pull Request #21676 · netdata/netdata
Summary
Test Plan
Additional Information
This PR was tested on the following distributions and kernels:
| Linux Distributions | Kernel |
|---|---|
| Slackware Linux | 6.12.71 |
| Arch Linux | 6.18.9-arch1-2 |
| Ubuntu 24.04 | 6.8.0-31-generic |
| Ubuntu 22.04 | 5.15.0-91-generic |
| CentOS 9 stream | 5.14.0-686.el9.x86_64 |
For users: How does this change affect me?
Summary by cubic
Refactored the eBPF plugin around the new ebpf_library to unify chart/dimension helpers and module orchestration. Stabilized HardIRQ (latency-only) with legacy load restored, and kept SoftIRQ disabled by default.
Refactors
- Integrated libbpf_api/ebpf_library via CMake; modules now share common chart/dimension and load/unload helpers.
- Standardized chart ordering for FD, Socket, and Sync (incl. cgroup/systemd), unified filesystem temp map size, and normalized TCPv6 connect type and close_fd detection.
- Expanded unit tests with a runner and coverage for the library and socket/module load paths.
Bug Fixes
- IPC/shmem: validated PID index bounds and thread presence before deletion; cgroup mmap now checks MAP_FAILED.
- DCStat: per‑pid and publish counters moved to 64‑bit; refined fast‑lookup optional detection.
- Kernel: version parsing now null‑terminates input and validates major/minor/patch components.
- Unload stability: standardized exit paths and added safe-clean guards across modules (disk, fd, mdflush, softirq, swap); socket IPC stop now uses ebpf_plugin_stop.
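The IPC/shmem fix above (validating the PID index bounds and thread presence before deletion) can be sketched as follows. This is a minimal illustration with hypothetical names and a hypothetical table shape, not netdata's actual data structures:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PID tracking table. */
#define PID_TABLE_SIZE 32768

static bool thread_present[PID_TABLE_SIZE];

/* Delete a PID entry only after both guards pass:
 * 1) the index is within the table bounds,
 * 2) a thread is actually tracked at that slot. */
static bool safe_delete_pid(size_t idx)
{
    if (idx >= PID_TABLE_SIZE)   /* out-of-bounds index: refuse */
        return false;
    if (!thread_present[idx])    /* no tracked thread: nothing to delete */
        return false;
    thread_present[idx] = false; /* perform the deletion */
    return true;
}
```

The same two-step guard generalizes to any index-then-delete path: check the bound first so the presence lookup itself cannot read out of range.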
Written for commit 3b04a9f. Summary will update on new commits.
Pull request overview
Copilot reviewed 45 out of 45 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
src/collectors/ebpf.plugin/ebpf_disk.c:436
ebpf_cleanup_disk_list() now frees the netdata_ebpf_disks_t nodes without freeing the dynamically allocated histogram strings (e.g., w->histogram.name is strdupz()'d). This leaks memory on shutdown / disk removal. Call the new ebpf_disks_cleanup() helper (or explicitly freez() histogram.name/title/ctx) before freeing the node.
static void ebpf_cleanup_disk_list(void)
{
    netdata_ebpf_disks_t *move = disk_list;
    while (move) {
        netdata_ebpf_disks_t *next = move->next;
        freez(move);
        move = next;
    }
}
2 issues found across 4 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/collectors/ebpf.plugin/ebpf_vfs.c">
<violation number="1" location="src/collectors/ebpf.plugin/ebpf_vfs.c:1343">
P2: The cgroup aggregation now takes the max of each per-PID counter instead of summing them, which undercounts cgroup totals whenever more than one PID contributes. Since each `pids->vfs` is a per-PID snapshot, the aggregation should still sum across PIDs (and handle monotonicity separately if needed).</violation>
</file>
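The sum-versus-max distinction the violation describes can be shown in isolation. This is an illustrative sketch with hypothetical names, not the actual ebpf_vfs.c code: each entry is a per-PID snapshot, so the cgroup total must sum across PIDs, and taking the max undercounts whenever more than one PID contributes:

```c
#include <stdint.h>

/* Hypothetical per-PID VFS snapshot. */
typedef struct { uint64_t write_call; } vfs_snapshot_t;

/* Correct cgroup aggregation: SUM the per-PID counters. */
static uint64_t cgroup_total_writes(const vfs_snapshot_t *pids, int npids)
{
    uint64_t total = 0;
    for (int i = 0; i < npids; i++)
        total += pids[i].write_call;  /* sum, not max */
    return total;
}
```

With snapshots of 10, 7, and 3 writes, summing reports 20, while a max-based aggregation would report only 10 and silently drop the other PIDs' contributions.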
<file name="src/collectors/ebpf.plugin/ebpf_disk.c">
<violation number="1" location="src/collectors/ebpf.plugin/ebpf_disk.c:489">
P2: The new `has_resources` guard skips cleanup when tracepoints were enabled by this thread but no other resources were allocated (e.g., `read_local_disks()` fails after enable). Since `was_block_*_enabled` is 0 when this module enables the tracepoints, `has_resources` remains false and `ebpf_disk_disable_tracepoints()` is never called, leaving tracepoints enabled. Track whether this thread enabled tracepoints (or restore the previous `disk_safe_clean` logic) so cleanup always runs in that case.</violation>
</file>
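One way to implement the suggested fix is to record whether this thread enabled the tracepoints, independently of the general resource flag. The sketch below uses hypothetical names, it only illustrates the ownership-tracking idea, not the real ebpf_disk.c control flow:

```c
#include <stdbool.h>

static bool was_block_enabled_before;  /* tracepoint state found at startup */
static bool we_enabled_tracepoints;    /* set only if THIS module enabled them */
static bool tracepoints_on;

static void disk_enable_tracepoints(void)
{
    if (!was_block_enabled_before) {
        tracepoints_on = true;
        we_enabled_tracepoints = true; /* record ownership for cleanup */
    }
}

static void disk_cleanup(bool has_resources)
{
    /* Disable the tracepoints whenever we turned them on, even if no
     * other resources were allocated (e.g. read_local_disks() failed
     * right after the enable). */
    if (we_enabled_tracepoints)
        tracepoints_on = false;

    if (!has_resources)
        return; /* nothing else to free */

    /* ... free maps, destroy mutexes, obsolete charts ... */
}
```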
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
1 issue found across 2 files (changes from recent commits).
<file name="src/collectors/ebpf.plugin/ebpf_disk.c">
<violation number="1" location="src/collectors/ebpf.plugin/ebpf_disk.c:490">
P2: Cleanup is skipped if ebpf_disk_enable_tracepoints() fails, even though mutexes were already initialized. That can leave initialized mutexes/resources around and make later restarts unsafe. Consider enabling cleanup as soon as those resources are allocated (or restoring the resource-based check) so failure paths still destroy them.</violation>
</file>
1 issue found across 2 files (changes from recent commits).
<file name="src/collectors/ebpf.plugin/ebpf_hardirq.c">
<violation number="1" location="src/collectors/ebpf.plugin/ebpf_hardirq.c:216">
P2: The new early return in hardirq_cleanup skips unloading tracepoints/BPF objects when hardirq_safe_clean is false, which can leave probes attached on early failure paths. Keep the cleanup logic running even when the safe-clean flag is false, and only skip chart obsoletion.</violation>
</file>
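The split the reviewer asks for, always releasing kernel-side resources, gating only the chart obsoletion on the safe-clean flag, can be sketched as below. Names are hypothetical stand-ins, not the real ebpf_hardirq.c symbols:

```c
#include <stdbool.h>

static bool probes_attached = true; /* pretend the BPF probes are loaded */
static bool charts_obsoleted;

static void hardirq_cleanup_sketch(bool safe_clean)
{
    /* Only the user-facing chart obsoletion depends on the flag. */
    if (safe_clean)
        charts_obsoleted = true;

    /* Tracepoints/BPF objects are released unconditionally, so an
     * early failure path cannot leave probes attached. */
    probes_attached = false;
}
```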
1 issue found across 1 file (changes from recent commits).
<file name="src/collectors/ebpf.plugin/ebpf_hardirq.c">
<violation number="1" location="src/collectors/ebpf.plugin/ebpf_hardirq.c:231">
P1: The hardirq BPF load/attach logic is fully commented out, so `ebpf_hardirq_load_bpf` always returns success without loading any program. The thread continues as if load succeeded, which leaves the collector running with no attached BPF programs and no data collected.</violation>
</file>
1 issue found across 2 files (changes from recent commits).
<file name="src/collectors/ebpf.plugin/ebpf_hardirq.c">
<violation number="1" location="src/collectors/ebpf.plugin/ebpf_hardirq.c:250">
P2: Don’t skip destroying the libbpf object during cleanup. Leaving the destroy call commented out leaks BPF maps/links and file descriptors, and can keep resources attached until process exit.</violation>
</file>
1 issue found across 1 file (changes from recent commits).
<file name="src/collectors/ebpf.plugin/ebpf_hardirq.c">
<violation number="1" location="src/collectors/ebpf.plugin/ebpf_hardirq.c:222">
P0: The unconditional early return disables the HardIRQ collector thread, making the module non-functional.</violation>
</file>
static void hardirq_exit(void *pptr)
static void hardirq_cleanup(void *pptr)
{
+    return;
P0: The unconditional early return disables the HardIRQ collector thread, making the module non-functional.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/collectors/ebpf.plugin/ebpf_hardirq.c, line 222:
<comment>The unconditional early return disables the HardIRQ collector thread, making the module non-functional.</comment>
<file context>
@@ -219,6 +219,7 @@ static void ebpf_obsolete_hardirq_global(ebpf_module_t *em)
*/
static void hardirq_cleanup(void *pptr)
{
+ return;
ebpf_module_t *em = CLEANUP_FUNCTION_GET_PTR(pptr);
if (!em)
</file context>