feat(sidecar): add thread mode as fallback connection for restricted environments by Leiyks · Pull Request #3573 · DataDog/dd-trace-php

bwoebi

bwoebi

bwoebi

bwoebi

@Leiyks

…lter

With newer libdatadog (29e00628b), debugger logs are routed to
/debugger/v2/input instead of /debugger/v1/diagnostics. The filter
in replayDebuggerData() still included /debugger/v1/diagnostics with
body (added in c3be656 for older routing), which caused it to return
a diagnostics response before the actual /debugger/v2/input snapshot,
breaking debugger_span_decoration_probe.phpt.

Remove the diagnostics clause — only /debugger/v1/input and
/debugger/v2/input are snapshot/log endpoints.
…back routing

libdatadog 29e00628b routes logs to /debugger/v2/input but falls back to
/debugger/v1/diagnostics when the agent doesn't support v2 (as is the
case with the request replayer in tests). Update the expected URI in
debugger_span_decoration_probe.phpt accordingly.

Also restore the /debugger/v1/diagnostics filter in replayDebuggerData()
which was incorrectly removed in the previous commit.
…ailable

In ddtrace_sidecar_setup_thread_mode, a forked child detecting is_child_process=true
would try to connect to the parent's thread listener. If the parent used subprocess
mode (no thread listener), the connect would fail and the child returned with no
sidecar and no fallback.

Mirror the existing fallback logic from ddtrace_sidecar_handle_fork: when the
parent's listener is unavailable, reset ddtrace_sidecar_master_pid to the current
process and fall through to start a new master listener in this process.
…ct path

When a parent process initializes the sidecar in thread mode, forks, and
then exits, the child inherits a broken transport (parent's listener thread
is dead). In dd_sidecar_connect(), if ddog_sidecar_connect_worker() fails
and current_pid != master_pid, promote the child to master so it can still
submit traces.

The existing fallback in ddtrace_sidecar_setup_thread_mode covers the
initial-setup path, but the reconnect path (ddtrace_sidecar_connect_callback
-> dd_sidecar_connect) had no equivalent fallback, causing a silent failure
for orphaned children that already had an inherited transport.

Add a .phpt test that verifies the orphaned child can create and submit
spans after the parent exits.
… compatibility

Thread-mode sockets now include the master's effective uid in the filename
(libdd.<ver>@<uid>-<pid>.sock in /tmp/libdatadog/). A worker process that later
drops privileges via setuid() (e.g. www-data under PHP-FPM) still computes the
same socket path as the master listener, and ensure_dir_exists now best-effort
chmods the directory world-writable to allow socket creation by any user.

Also fixes a double-dot bug in socket/lock path construction (Rust >=1.87
no longer strips leading dots from with_extension arguments).

Adds test: sidecar_thread_mode_permissions.phpt verifies the socket is created
with the correct uid-pid encoding.

@Leiyks

Update libdatadog submodule: thread mode sidecar now uses abstract Unix
sockets on Linux (no filesystem permissions needed, any user can connect)
and a single-threaded Tokio runtime (no extra OS threads, fixes LSan
"Running thread was not suspended" ASAN warnings at process exit).

Update sidecar_thread_mode_permissions.phpt to verify abstract socket
usage (no filesystem socket created) instead of checking file permissions.
Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com>
After fork, the child inherits the parent's heap including the Tokio
current_thread runtime allocations and Rust stdlib once-cell inits from
the sidecar thread. Since threads don't survive fork, these are orphaned
allocations that LSan incorrectly reports as leaks. This is the same
reason daemonize() already sets LSAN_OPTIONS=detect_leaks=0 for the
subprocess sidecar.
When the PHP-FPM master process runs as root, the sidecar thread (thread
mode) creates named SHM objects with 0600 by default, making them
inaccessible to worker processes running as www-data.

Call ddog_sidecar_set_shm_open_mode(0644) before starting the master
sidecar listener when geteuid()==0, so workers can open the SHM regions
read-only. The mode is set in both ddtrace_sidecar_setup_thread_mode()
and ddtrace_sidecar_minit() to cover all code paths.

@Leiyks

…-user SHM

Replace the incorrect set_shm_open_mode(0644) workaround with the proper
fchown-based fix implemented in the libdatadog submodule. The SHM ownership
is now transferred to the worker's UID (obtained via SO_PEERCRED on first
connection) rather than relaxing file permissions.

- Remove ddog_sidecar_set_shm_open_mode() calls from thread mode setup
- Remove declaration from components-rs/sidecar.h
- Update libdatadog submodule to pick up the fchown implementation
…er thread

In thread mode with PHP-FPM, dd_activate_once (called via pthread_once on
first RINIT) runs independently in each worker process since the master never
serves requests. When subprocess mode fails and thread mode is attempted, each
worker hit the fallback path in ddtrace_sidecar_setup_thread_mode() that called
ddog_sidecar_connect_master() — starting a new listener thread per worker.

The master listener must only be started in MINIT (via ddtrace_sidecar_minit())
in the master process, so it survives PHP-FPM forking. When a worker cannot
connect to the master's listener, it now logs a warning and runs without the
sidecar instead of spawning its own thread.

Non-child processes (master, CLI) retain the ability to start a new listener
as a fallback, preserving the behavior requested in earlier review feedback.
…er SHM

Adds a test that exercises the fchown() cross-user SHM path: PHP-FPM master
runs as root, workers switch to an unprivileged user (www-data/daemon/nobody),
and thread mode is used. This is the exact scenario that motivated the
SO_PEERCRED + fchown() fix.

Infrastructure changes:
- www.conf: add {{user_group}} placeholder for optional user/group directives
- PhpFpm.php: accept $fpmUser/$fpmGroup constructor params
- WebServer.php: add setPhpFpmUser() and pass it through to PhpFpm
- WebFrameworkTestCase.php: add configureWebServer() hook (called before
  start()) so subclasses can apply extra server config without reimplementing
  the full setUpWebServer() logic

New test (SidecarThreadModeRootTest):
- Skips if not root, not fpm-fcgi SAPI, or no unprivileged user available
- testTracesAreSubmittedWithRootMasterAndUnprivilegedWorker: a single request
  through a root-master/www-data-worker FPM pool must produce traces — failure
  means the worker cannot access the SHM after fchown()
- testMultipleWorkersShareSingleMasterListenerThread: 3 requests with multiple
  workers must all succeed, ensuring the per-worker-thread regression is caught
`void` return type hint was introduced in PHP 7.1; PHP 7.0 CI jobs
were failing with TypeError on every WebFrameworkTestCase subclass.
Remove the DD_TRACE_TEST_SAPI=fpm-fcgi skip condition and instead call
putenv() to force FPM mode before the parent sets up the web server.
This allows the test to run in every test_web_custom matrix entry
(cli-server, cgi-fcgi, apache2handler) rather than only when the CI job
explicitly sets fpm-fcgi SAPI.
Use untilNumberOfTraces(3) so the test agent waits for all 3 traces
before collecting, instead of returning early after the first one.
Also restore the >= 3 assertion which is now reliable.
Add $runAsSudo param to PhpFpm and setPhpFpmSudo() to WebServer so
php-fpm can be started as root even when the test runner is non-root.
The test now skips only if neither root nor passwordless sudo is
available, allowing it to run in CI where circleci has NOPASSWD sudo.
…deRootTest

Replace putenv('DD_TRACE_TEST_SAPI=fpm-fcgi') with a new WebServer::setForceSapi()
method that overrides the SAPI for a single WebServer instance only, without
affecting the global process environment used by all subsequent test classes.
Apache2handler uses a static shared instance on the same port as our
FPM+Nginx setup, causing port conflicts. Skip the test in that case;
it still runs under cli-server and cgi-fcgi CI jobs.

@Leiyks

Signed-off-by: Alexandre Rulleau <alexandre.rulleau@datadoghq.com>

This was referenced

Apr 1, 2026