Fix to only deny-list scheduled queries when watchdog is enabled by Micah-Kolide · Pull Request #8541 · osquery/osquery
We've been seeing some scheduled queries getting unexpectedly deny-listed when osquery is not running with watchdog enabled (example_denied_query_with_disabled_watchdog.txt), or when watchdog itself did not stop osquery. Based on the scheduled query definition of the deny-list field, I would expect that it would be deny-listed only by watchdog.
denylist: a boolean to determine if this query may be denylisted (when stopped by the Watchdog for excessive resource consumption), default true
However, there appears to be a bug in config.cpp as it does not differentiate between a failed_query by the watchdog or osquery dying by another means. Various signals sent to the osquery process could cause an executing_query to become a failed_query, even if it was not because watchdog killed it from high resource consumption.
I've applied a simple fix to verify that the disable_watchdog flag is false before deny-listing a failed_query, and I've also added some logging for when a scheduled query gets skipped per expiry period or osquery initialization, or when it expires from the denylist.
I've tested this fix a decent amount on a MacOS instance. I've included the verbose logs to show the fix in action:
1st init (watchdog disabled):
I0203 10:14:46.972573 107137088 init.cpp:413] osquery initialized [version=5.15.0-4-g745cdba37]
...
I0203 10:15:00.072829 1870835712 scheduler.cpp:120] Executing scheduled query stress_test_pass: select * from time
I0203 10:15:00.073619 1870835712 query.cpp:119] Storing initial results for new scheduled query: stress_test_pass
I0203 10:15:00.073698 1870835712 scheduler.cpp:201] Found results for query: stress_test_pass
...
I0203 10:15:40.208810 1870835712 scheduler.cpp:120] Executing scheduled query stress_test_fail: select * from apps CROSS JOIN users
I0203 10:15:58.768437 1870835712 query.cpp:119] Storing initial results for new scheduled query: stress_test_fail
I0203 10:15:59.021412 1870835712 scheduler.cpp:201] Found results for query: stress_test_fail
2nd init (watchdog enabled):
I0203 10:16:11.884200 107137088 init.cpp:413] osquery initialized [version=5.15.0-4-g745cdba37]
...
I0203 10:16:31.042694 1817571328 scheduler.cpp:120] Executing scheduled query stress_test_pass: select * from time
I0203 10:16:31.077574 1817571328 scheduler.cpp:201] Found results for query: stress_test_pass
...
I0203 10:16:36.055464 1817571328 scheduler.cpp:120] Executing scheduled query stress_test_fail: select * from apps CROSS JOIN users
I0203 10:16:38.922310 1839607808 watcher.cpp:538] pid: 50750, cpu: 1140ms/1440ms, memory: 28.47MB/3MB
W0203 10:16:38.922374 1839607808 watcher.cpp:425] osqueryd worker (50750) stopping: Memory limits exceeded: 29851648 bytes (limit is 3MB)
...
I0203 10:16:43.228420 107137088 init.cpp:410] osquery worker initialized [watcher=50749]
...
W0203 10:16:43.301170 107137088 config.cpp:362] Scheduled query may have failed: stress_test_fail
...
W0203 10:16:43.324247 107137088 config.cpp:508] The caller does not want denied queries, skipping denied scheduled query: stress_test_fail
...
I0203 10:17:00.395078 1872801792 scheduler.cpp:120] Executing scheduled query stress_test_pass: select * from time
I0203 10:17:00.420019 1872801792 scheduler.cpp:201] Found results for query: stress_test_pass
...
I0203 10:18:14.687698 1872801792 config.cpp:494] Scheduled denylisted query has expired: stress_test_fail
...
I0203 10:18:25.728713 1872801792 scheduler.cpp:120] Executing scheduled query stress_test_fail: select * from apps CROSS JOIN users
I0203 10:18:28.367564 1839607808 watcher.cpp:538] pid: 50784, cpu: 1052ms/1440ms, memory: 26.66MB/3MB
W0203 10:18:28.367623 1839607808 watcher.cpp:425] osqueryd worker (50784) stopping: Memory limits exceeded: 27951104 bytes (limit is 3MB)
...
I0203 10:18:32.669792 107137088 init.cpp:410] osquery worker initialized [watcher=50749]
I was testing with a 90 second deny query duration.
stress_test_pass's interval was set to 30.
stress_test_fail's interval was set to 60.