feat(gax-grpc): add configurable resize delta and warning for repeated resizing #12838
Code Review
This pull request introduces a configurable maxResizeDelta for the ChannelPool and adds a warning log when the pool resizes consecutively for five cycles. Feedback highlights a logic error in the resizing detection that could lead to false-positive warnings when the pool is at its limits; a suggestion is provided to track actual size changes instead. Additionally, the removal of test exclusions in the pom.xml was flagged as a potential cause for CI failures.
sdk-platform-java/gax-java/gax-grpc/src/main/java/com/google/api/gax/grpc/ChannelPool.java (322-327)
The logic for determining if the pool is 'resizing' is flawed. By checking if the current size is outside the load-based bounds (minChannels and maxChannels), the counter will increment even when the pool is at its hard limits (e.g., at minChannelCount while idle). This will lead to false positive warnings after 5 idle cycles because minChannels will be 0 while the pool is correctly clamped at its minimum size (e.g., 1).
Instead, you should track whether the pool is actually attempting to change its size by comparing dampenedTarget with currentSize. This correctly identifies when the pool is either oscillating or slowly growing/shrinking due to the maxResizeDelta cap.
```java
// Count a cycle only when the pool actually tries to change its size.
boolean resized = (dampenedTarget != currentSize);
if (resized) {
  consecutiveResizes++;
} else {
  consecutiveResizes = 0;
}
```
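To make the false-positive scenario concrete, here is a minimal, hypothetical sketch (not the actual ChannelPool code) that replays five idle cycles for a pool clamped at a minimum size of 1. The class and method names are illustrative assumptions, as is the idea that both load-based bounds are 0 when no RPCs are outstanding; only the five-cycle threshold and the `dampenedTarget != currentSize` comparison come from the review above.

```java
// Hypothetical sketch: names and wiring are illustrative, not the actual ChannelPool code.
final class ResizeDetectionSketch {
  static final int WARNING_THRESHOLD = 5; // consecutive cycles before warning, per the review

  // Flawed: treats any size outside the load-based bounds as "resizing",
  // even when the pool is already clamped at a hard limit and cannot move.
  static boolean flawedIsResizing(int currentSize, int minChannels, int maxChannels) {
    return currentSize < minChannels || currentSize > maxChannels;
  }

  // Suggested: counts a cycle only when the dampened target differs from the current size.
  static boolean fixedIsResizing(int dampenedTarget, int currentSize) {
    return dampenedTarget != currentSize;
  }

  // Returns the consecutive-resize counter after processing each cycle's result in order.
  static int countConsecutive(boolean[] resizedPerCycle) {
    int consecutiveResizes = 0;
    for (boolean resized : resizedPerCycle) {
      consecutiveResizes = resized ? consecutiveResizes + 1 : 0;
    }
    return consecutiveResizes;
  }

  public static void main(String[] args) {
    int currentSize = 1;    // pool clamped at an assumed minChannelCount of 1
    int minChannels = 0;    // load-based bounds assumed to be 0 when idle
    int maxChannels = 0;
    int dampenedTarget = 1; // target already clamped to the minimum, so no change is needed

    boolean[] flawed = new boolean[WARNING_THRESHOLD];
    boolean[] fixed = new boolean[WARNING_THRESHOLD];
    for (int cycle = 0; cycle < WARNING_THRESHOLD; cycle++) {
      flawed[cycle] = flawedIsResizing(currentSize, minChannels, maxChannels);
      fixed[cycle] = fixedIsResizing(dampenedTarget, currentSize);
    }
    System.out.println("flawed count = " + countConsecutive(flawed)); // prints 5: warns while idle
    System.out.println("fixed count = " + countConsecutive(fixed)); // prints 0: no warning
  }
}
```

With the bounds-based check, every idle cycle counts as resizing and the counter reaches the threshold; with the suggested comparison, it stays at 0 because the clamped target never differs from the current size.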
sdk-platform-java/gax-java/gax-grpc/pom.xml (145-149)
The removal of the maven-surefire-plugin configuration seems unintended. This block was used to exclude specific tests that require environment variables to be set. Removing it will likely cause these tests to run and fail in standard CI environments. If this change was intentional, please provide justification; otherwise, it should be reverted.
/gemini review
Code Review
This pull request introduces a configurable maxResizeDelta for the ChannelPool in gax-grpc and implements a warning mechanism for repeated resizing cycles. Feedback focuses on ensuring thread-safety for the resize counter, optimizing redundant size checks, and restoring accidentally deleted metadata and documentation links in the java-iam-policy module.
/gemini review
Code Review
This pull request introduces a configurable maxResizeDelta in ChannelPoolSettings, adds logic to ChannelPool to track consecutive resizes and log warnings, and removes certain metadata and documentation links. Feedback indicates that the removal of the enable-api link in the README appears accidental and should be restored. Additionally, a stale comment in ChannelPool.java incorrectly describing the resize() method as synchronized should be removed.
This reverts commit 91ffb52.
```java
class ChannelPool extends ManagedChannel {
  static final String CHANNEL_POOL_CONSECUTIVE_RESIZING_WARNING =
      "Channel pool is repeatedly resizing. "
          + "Consider adjusting `initialChannelCount` or `maxResizeDelta` to a more reasonable value. "
```
nit: would it make sense to also add max/minRpcsPerChannel in the message? E.g.
"Consider adjusting `initialChannelCount`, `maxResizeDelta`, `minRpcsPerChannel`, or `maxRpcsPerChannel` to a more reasonable value. "
My plan is to change this to reference a public guide that I'm going to write (after metrics). WDYT if I leave this as is for now (for the immediate Datastore ask) and later update it to point to the public guide?
```java
if (Math.abs(delta) > ChannelPoolSettings.MAX_RESIZE_DELTA) {
  dampenedTarget =
      currentSize + (int) Math.copySign(ChannelPoolSettings.MAX_RESIZE_DELTA, delta);
if (Math.abs(delta) > settings.getMaxResizeDelta()) {
```
Do we know why there was a limit in the first place? Were there any technical limitations?
IIUC, it looks to be just a design choice: dampening rate-limits channel growth so that a sudden burst doesn't overwhelm the client.
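A minimal sketch of the dampening idea, assuming the pool computes a desired size each cycle and then moves toward it by at most `maxResizeDelta` channels, mirroring the `Math.copySign` pattern in the excerpt above. `DampenedResizeSketch` and its parameters are hypothetical, not the real ChannelPool API.

```java
// Hypothetical sketch of dampened resizing; not the actual ChannelPool implementation.
final class DampenedResizeSketch {
  // Returns the next pool size: the desired size if it is within maxResizeDelta,
  // otherwise a step of exactly maxResizeDelta channels toward it.
  static int dampenedTarget(int currentSize, int desiredTarget, int maxResizeDelta) {
    int delta = desiredTarget - currentSize;
    if (Math.abs(delta) > maxResizeDelta) {
      // copySign keeps the step size but takes the direction (grow or shrink) from delta.
      return currentSize + (int) Math.copySign(maxResizeDelta, delta);
    }
    return desiredTarget;
  }

  public static void main(String[] args) {
    int size = 2;
    int desired = 10; // e.g. a sudden burst of RPCs
    int cycles = 0;
    while (size != desired) {
      size = dampenedTarget(size, desired, 2);
      cycles++;
    }
    System.out.println("cycles to reach target with delta 2: " + cycles); // prints 4
  }
}
```

With a delta of 2, growing from 2 to 10 channels takes four resize cycles (4, 6, 8, 10). A larger delta reaches the target sooner, at the cost of less smoothing during bursts.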
```java
    .setMaxChannelCount(size)
    // Static pools don't resize so this value doesn't affect operation. However,
    // validation still checks that resize delta doesn't exceed channel pool size.
    .setMaxResizeDelta(Math.min(DEFAULT_MAX_RESIZE_DELTA, size))
```
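The comment in the excerpt implies that validation rejects a resize delta larger than the pool's maximum size, so a static pool of size 1 cannot keep a default delta of 2. Below is a hypothetical sketch of that interaction, assuming a default delta of 2 (the default stated later in this thread) and a validation range of [1, maxChannelCount]; `validate` and `staticPoolResizeDelta` are illustrative names, not the real `ChannelPoolSettings` API.

```java
// Hypothetical sketch; not the real ChannelPoolSettings validation code.
final class StaticPoolDeltaSketch {
  static final int DEFAULT_MAX_RESIZE_DELTA = 2; // default delta mentioned in the discussion

  // Assumed validation rule: the delta must lie in [1, maxChannelCount].
  static void validate(int maxChannelCount, int maxResizeDelta) {
    if (maxResizeDelta < 1 || maxResizeDelta > maxChannelCount) {
      throw new IllegalArgumentException(
          "maxResizeDelta must be in [1, maxChannelCount], got " + maxResizeDelta);
    }
  }

  // Mirrors the excerpt: unused by static pools, but it must still pass validation.
  static int staticPoolResizeDelta(int size) {
    return Math.min(DEFAULT_MAX_RESIZE_DELTA, size);
  }

  public static void main(String[] args) {
    for (int size : new int[] {1, 2, 8}) {
      int delta = staticPoolResizeDelta(size);
      validate(size, delta); // passes for every size because delta <= size
      System.out.println("size=" + size + " delta=" + delta);
    }
  }
}
```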
I think we should still give it a max limit instead of leaving it unbounded. Otherwise customers might expect this to handle a 100x request spike as well.
The delta is capped to the range [1, MAX_CHANNEL_COUNT]. The javadocs already mention that the number of channels can never exceed the configured total channel count (default 200).
But is 200 a realistic number? I think it makes sense to allow customers to configure resizeDelta to be more than 2, but more than 10 (we need to come up with a more accurate number) may introduce other performance issues.
For example:
- resize runs in a synchronized block.
- We use an AtomicInteger to track the number of outstanding RPCs, which could cause issues under high contention; in that case, a LongAdder may be preferable.
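For context on the second point, a hypothetical sketch of the two counter choices. `AtomicInteger` funnels every update through a single memory location, while `LongAdder` spreads updates across internal cells under contention; the trade-off is that `LongAdder.sum()` is not an atomic snapshot while updates are in flight. The class and method names are assumptions, not ChannelPool's actual code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical comparison of two outstanding-RPC counters; not the actual ChannelPool code.
final class OutstandingRpcCounterSketch {
  // Single shared counter: every increment/decrement contends on one memory location.
  final AtomicInteger atomicOutstanding = new AtomicInteger();
  // Striped counter: cheaper updates under contention, but sum() is not an atomic snapshot.
  final LongAdder adderOutstanding = new LongAdder();

  void onRpcStart() {
    atomicOutstanding.incrementAndGet();
    adderOutstanding.increment();
  }

  void onRpcEnd() {
    atomicOutstanding.decrementAndGet();
    adderOutstanding.decrement();
  }

  public static void main(String[] args) throws InterruptedException {
    OutstandingRpcCounterSketch counter = new OutstandingRpcCounterSketch();
    ExecutorService executor = Executors.newFixedThreadPool(8);
    for (int i = 0; i < 10_000; i++) {
      executor.execute(() -> {
        counter.onRpcStart(); // simulated RPC begins
        counter.onRpcEnd();   // simulated RPC completes
      });
    }
    executor.shutdown();
    executor.awaitTermination(10, TimeUnit.SECONDS);
    System.out.println(
        "atomic=" + counter.atomicOutstanding.get() + " adder=" + counter.adderOutstanding.sum());
  }
}
```

Once all simulated RPCs have finished, both counters return to 0; the difference between them only shows up in update throughput under heavy contention, which this sketch does not measure.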
Can we reach out to the gRPC team and get some suggestions?
This PR is not intended to fix the channel pool's issues. It only exposes a setting that lets users choose a value that fits their workload within the bounds of the existing logic. The default resize delta remains 2; users who choose a different value can test it and determine whether it works for their workload.
The existing channel pool logic has limitations that would benefit from broader changes. There is a project proposal (b/503856499) to investigate how to improve it overall.
> But is 200 a realistic number? I think it makes sense to allow customers to configure resizeDelta to be more than 2, but more than 10 (we need to come up with a more accurate number) may introduce other performance issues.
I just aim to provide a safe default and let customers tune it as they see fit. A delta of 10 or 25 may work better for different workloads.
If we want to fix ChannelPooling, then I think other changes would be more beneficial than investigating the resize delta value.
I agree that this PR is not meant to fix ChannelPooling. However, the new setter could make it easier for customers to exploit the current limitations of ChannelPooling, so I think it would still be better to set an upper limit.
> exploit the current limitations of ChannelPooling
I'm not sure what you mean by exploiting here. If they configure a high value where performance degrades and doesn't work for them, they can rollback this change. If a high delta works for their use case, then they can opt to keep it.
Regardless, I think we may have to agree to disagree on setting a hard bound here. Even with the current limitations, I don't think we should set an arbitrary bound on user configuration, barring values that don't fit the logic, such as a negative resize delta.
IMO, if it causes drastic performance problems, it would be beneficial to see the user workload configurations along with the issue reports. They give us signal about users' requirements and help us see what a future ChannelPool overhaul would need (e.g. channel priming, power-of-two, etc.). The channel pool issues would also give us more direct data points to prioritize the project (instead of slightly tangential reports of increased client-side request latency). I'm worried that limiting this "exploit" hides the need to fix the underlying channel pool issues.
Rather than consulting the gRPC team and investigating possible upper bounds, how about a compromise: I set a higher upper bound (e.g. 25?) and add javadocs about the potential performance concerns of setting a higher delta?
Updated to clamp the resize delta to a maximum of 25, with a warning about performance.
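A hypothetical sketch of the bound described above: values below 1 are rejected (echoing the point that a negative delta doesn't fit the logic), and larger values are clamped to at most 25 and to the pool's maximum channel count. Whether the merged code clamps or rejects out-of-range values is not shown in this thread, so treat `boundedResizeDelta` and `MAX_ALLOWED_RESIZE_DELTA` as illustrative names.

```java
// Hypothetical sketch: the bounds come from the discussion; names are illustrative.
final class ResizeDeltaBoundsSketch {
  static final int MAX_ALLOWED_RESIZE_DELTA = 25; // upper bound agreed on in the thread

  static int boundedResizeDelta(int requested, int maxChannelCount) {
    if (requested < 1) {
      // A zero or negative delta doesn't fit the resize logic, so reject it outright.
      throw new IllegalArgumentException("maxResizeDelta must be at least 1, got " + requested);
    }
    // Clamp to 25, and never beyond the pool's maximum size.
    int upperBound = Math.min(MAX_ALLOWED_RESIZE_DELTA, maxChannelCount);
    return Math.min(requested, upperBound);
  }

  public static void main(String[] args) {
    System.out.println(boundedResizeDelta(2, 200));   // prints 2: within bounds
    System.out.println(boundedResizeDelta(100, 200)); // prints 25: clamped to the cap
    System.out.println(boundedResizeDelta(100, 10));  // prints 10: clamped to the pool maximum
  }
}
```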
No description provided.