ROX-33621: Refactor VM host probing and diagnostics by vikin91 · Pull Request #19458 · stackrox/stackrox
Please address the comments from this code review: ## Overall Comments - In `discoverDnfStatusFlags`, consider reusing `hostprobe.DetectDNFVersion` instead of re-deriving DNF version from history DB paths to keep version-detection logic centralized and avoid future drift. - In `discoverDnfCacheRepoDirPresence`, the DNF5 detection logic relies on `strings.Contains(cachePath, "/libdnf5")`; it would be more robust to infer DNF5 based on the specific cache root being iterated (e.g., the input path) rather than substring matching on the full path. ## Individual Comments ### Comment 1 <location path="compliance/virtualmachines/roxagent/index/index.go" line_range="78-85" /> <code_context> + indexerCfg.HostPath, indexerCfg.Repo2CPEMappingURL, indexerCfg.Timeout) + + // This may slow the indexing process down by 1-2 seconds, but the diagnostics are invaluable for debugging. + logFilesystemDiagnostics(indexerCfg.HostPath) + fetchRepo2CPEMappingForDiagnostics(ctx, indexerCfg.Repo2CPEMappingURL, indexerCfg.Timeout, httpClient) + + log.Info("Running node indexer...") </code_context> <issue_to_address> **suggestion (performance):** Diagnostics and repo2cpe fetch run unconditionally and may add network/latency overhead for every index run `logFilesystemDiagnostics` and `fetchRepo2CPEMappingForDiagnostics` now run on every `runIndexer` call, even when not debugging. The repo2cpe call adds a network round-trip and full-body read, which can materially slow indexing and add external coupling per run. Please gate these behind `cfg.Debug` (or a dedicated diagnostics flag), or at least make the repo2cpe fetch conditional, so normal runs avoid this overhead and only pay the cost when troubleshooting. ```suggestion log.Infof("Indexer config: HostPath=%q, Repo2CPEMappingURL=%q, Timeout=%v", indexerCfg.HostPath, indexerCfg.Repo2CPEMappingURL, indexerCfg.Timeout) if cfg.Debug { // These diagnostics may slow the indexing process down by 1-2 seconds due to filesystem inspection // and an extra repo2cpe network round-trip, so run them only in debug mode. logFilesystemDiagnostics(indexerCfg.HostPath) fetchRepo2CPEMappingForDiagnostics(ctx, indexerCfg.Repo2CPEMappingURL, indexerCfg.Timeout, httpClient) } log.Info("Running node indexer...") ``` </issue_to_address> ### Comment 2 <location path="compliance/virtualmachines/roxagent/vsock/discovery.go" line_range="276" /> <code_context> - return false, fmt.Errorf("unsupported OS detected: missing %s: %w", cachePath, err) - } - return false, fmt.Errorf("reading %s: %w", cachePath, err) +func discoverDnfCacheRepoDirPresence(hostPath string, cacheDirPaths []string) (hasDir, hasRepo bool, err error) { + if len(cacheDirPaths) == 0 { + return false, false, errors.New("missing cache directories") </code_context> <issue_to_address> **suggestion (bug_risk):** DNF5 detection via `strings.Contains(cachePath, "/libdnf5")` is brittle and may misclassify future layouts This logic tightly couples behavior to the literal path string and may fail if cache roots change (different prefixes, symlinks) or if an unrelated path happens to contain `libdnf5`. Since the caller already passes explicit `cacheDirPaths`, consider deciding DNF5 vs non-DNF based on which configured root you’re iterating (e.g., compare against a known constant or index) rather than substring-matching the host-adjusted path, so the behavior remains stable if paths change. Suggested implementation: ```golang const dnf5CacheRoot = "/var/cache/libdnf5" // - Tested against DNF4 defaults; DNF5 uses /var/cache/libdnf5: // https://github.com/rpm-software-management/dnf5/blob/185eaef1e0ad663bdb827a2179ab1df574a27d88/include/libdnf5/conf/const.hpp func discoverDnfCacheRepoDirPresence(hostPath string, cacheDirPaths []string) (hasDir, hasRepo bool, err error) { ``` ```golang for _, cacheDirPath := range cacheDirPaths { // Decide DNF5 vs non-DNF based on the configured cache root, not the host-adjusted path. // This avoids coupling behavior to the literal host path and keeps detection stable if // host path prefixes or layouts change. isDnf5Root := cacheDirPath == dnf5CacheRoot cachePath := hostprobe.HostPathFor(hostPath, cacheDirPath) ``` ```golang if isDnf5Root { ``` </issue_to_address> ### Comment 3 <location path="compliance/virtualmachines/roxagent/index/diagnostics.go" line_range="23" /> <code_context> + zlog.Set(&l) +} + +func logFilesystemDiagnostics(hostPath string) { + switch hostprobe.DetectDNFVersion(hostPath) { + case hostprobe.DNFVersion5: </code_context> <issue_to_address> **issue (complexity):** Consider extracting small helpers for error handling, repeated logging patterns, and HTTP body checks to keep these diagnostics functions linear and less repetitive without changing behavior. You can reduce the added complexity without changing behavior by extracting a few tiny helpers and reusing patterns. ### 1. Factor out Yum repo error classification `logFilesystemDiagnostics` mixes error classification and logging; a small helper keeps the main flow linear: ```go func logYumRepoError(err error) { switch { case os.IsNotExist(err): log.Info("Yum repo dir not found") case errors.Is(err, fs.ErrPermission): log.Warnf("Yum repo dir is not readable: %v", err) default: log.Infof("Yum repo dir is unavailable: %v", err) } } func logFilesystemDiagnostics(hostPath string) { switch hostprobe.DetectDNFVersion(hostPath) { case hostprobe.DNFVersion5: log.Info("DNF history DB (v5) found") case hostprobe.DNFVersion4: log.Info("DNF history DB (v4) found") default: log.Warn("DNF history DB not found!") } hasEntitlement, err := hostprobe.HasEntitlementCertKeyPair(hostPath) if err != nil { log.Warnf("Entitlement certificates not found: %v", err) } else if !hasEntitlement { log.Warn("Entitlement certificates not found") } else { log.Info("Entitlement certificates found") } repoCount, err := hostprobe.YumRepoCount(hostPath) if err != nil { logYumRepoError(err) return } if repoCount == 0 { log.Info("Yum repo dir is present but empty (0 repositories)") return } log.Infof("Yum repo dir contains %d repositories", repoCount) } ``` ### 2. Reduce repetition in `logIndexReportDiagnostics` The repo/dist loops differ only in formatting. You can encapsulate the “limited iteration with truncation” pattern: ```go func logLimited[T any, ID comparable]( name string, items map[ID]T, limit int, debug bool, formatter func(idx, total int, id ID, item T) string, ) { total := len(items) idx := 0 for id, item := range items { idx++ log.Info(formatter(idx, total, id, item)) if !debug && idx == limit { log.Info(" (truncated for brevity)") break } } } func logIndexReportDiagnostics(report *v4.IndexReport, debug bool) { contents := report.GetContents() numPkgs := len(contents.GetPackages()) numRepos := len(contents.GetRepositories()) numDists := len(contents.GetDistributions()) numEnvs := len(contents.GetEnvironments()) log.Infof("Index report summary: packages=%d, repositories=%d, distributions=%d, environments=%d", numPkgs, numRepos, numDists, numEnvs) logLimited("repositories", contents.GetRepositories(), 10, debug, func(idx, total int, id string, repo *v4.Repository) string { return fmt.Sprintf( "Repository (%d of %d) id=%q name=%q key=%q cpe=%q", idx, total, id, repo.GetName(), repo.GetKey(), repo.GetCpe(), ) }) logLimited("distributions", contents.GetDistributions(), 10, debug, func(idx, total int, id string, dist *v4.Distribution) string { return fmt.Sprintf( "Distribution (%d of %d) id=%s name=%q version=%q cpe=%q did=%q", idx, total, id, dist.GetName(), dist.GetVersion(), dist.GetCpe(), dist.GetDid(), ) }) if numRepos == 0 { log.Warn("Index report contains 0 repositories. Packages will be marked as UNSCANNED.") } } ``` If you prefer smaller steps, you can start with a non-generic helper just for the truncation logic and keep two separate formatters. ### 3. Simplify `fetchRepo2CPEMappingForDiagnostics` body handling You only care about “non-empty vs empty/unavailable”, so you don't need `io.ReadAll` and `len(body)`. A single-byte read via `LimitedReader` reduces IO and branches: ```go func fetchRepo2CPEMappingForDiagnostics(ctx context.Context, mappingURL string, timeout time.Duration, client *http.Client) { if mappingURL == "" { log.Warn("Repo2CPE mapping URL is empty") return } log.Info("Attempting to fetch repo2cpe mapping for diagnostics...") reqCtx, cancel := context.WithTimeout(ctx, timeout) defer cancel() req, err := http.NewRequestWithContext(reqCtx, http.MethodGet, mappingURL, nil) if err != nil { log.Warnf("Could not create repo2cpe mapping request for %q: %v", mappingURL, err) return } resp, err := client.Do(req) if err != nil { log.Warnf("Could not download repo2cpe mapping from %q: %v", mappingURL, err) return } defer resp.Body.Close() var lr io.LimitedReader lr.R = resp.Body lr.N = 1 n, _ := io.Copy(io.Discard, &lr) // n is 0 if body is empty or not readable log.Debugf("Repo2CPE mapping fetch: status=%d, hasBody=%t", resp.StatusCode, n > 0) if resp.StatusCode >= http.StatusBadRequest || n == 0 { log.Warn("Repo2CPE mapping is unavailable or empty") } else if resp.StatusCode == http.StatusOK && n > 0 { log.Info("Repo2CPE mapping downloaded successfully") } } ``` This preserves the logging semantics (success only on 200 + non-empty body, warn otherwise) while reducing control flow and memory usage. </issue_to_address>