OnManagedConnectionFailed thread/memory leak fix by alexSatov · Pull Request #1710 · StackExchange/StackExchange.Redis
Conversation
Hi. I faced a problem with 'ConnectionMultiplexer': when it loses connection with Redis cluster, 'OnManagedConnectionFailed' event handler creates timer, which tries to 'SwitchMaster'. Timer ticks every second, but there is PLINQ query in 'GetConfiguredMasterForService' method, that leads to constantly threads starting (threads start faster than they exit). And if disconnect time is long enough, I get thread/memory leak in service.
PR contains hotfix of a problem with timer (timer starts next iteration only after previous iteration complete). But maybe you should take a closer look at PLINQ queries (they seem superfluous).
Also, I commented TextWriter.Null creation in 'SwitchMaster', because it seems like useless memory allocations (you handle nullable LogProxy).
@alexSatov, if the timer iterations are overlapping, wouldn't we still have the problem after this change?
Simply denying reentrancy isn't what we need? (like this example with Interlocked)
As I wrote, this change does not allow timer iterations to overlap. Arguments (0, 0) guarantee a single iteration, so I see no point in "resource lock".
You're right about not overlapping - my bad, now I see the original interval was also changed to 0.
But in case SwitchMaster fails now it will always retry without any delay, right? In that case, this timer without interval is the same as a while(!succeeded) without a timer.
But I have no idea if removing the 1 sec delay wouldn't overwhelm the CPU in this scenario, so I'll refrain to comment.
@eduardobr , you can add a delay it doesn't matter. In this change a working timer will start a new iteration only after previous one has been completed.
| { | ||
| if (log == null) log = TextWriter.Null; | ||
| // useless memory allocations? | ||
| // if (log == null) log = TextWriter.Null; |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we revert this section please, scoping the change to only one thing here?
| }, null, TimeSpan.FromSeconds(0), TimeSpan.FromSeconds(1)); | ||
| finally | ||
| { | ||
| connection.sentinelMasterReconnectTimer?.Change(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(0)); |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're disabling period pulling, the second arg should be Timeout.InfiniteTimeSpan here :)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep! -1 is enough. And I've found one more mistake. Alex will send new commit next week.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix here! Confirmed working 👍
NickCraver pushed a commit that referenced this pull request
Jun 27, 2021This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters