Update limits correspondence by teoxoy · Pull Request #5604 · gpuweb/gpuweb

@teoxoy

teoxoy

<td>[#693](https://github.com/gpuweb/gpuweb/issues/693)
<td>`maxVertexInputAttributes`
<td>*No limit.* Use `maxVertexBuffers * Maximum number of vertex attributes, per vertex descriptor`
<td>`Maximum number of vertex attributes, per vertex descriptor`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On metal it can't be:
maxVertexBuffers * "Maximum number of vertex attributes, per vertex descriptor"
the limit applies to the location as well, so it's impossible to get more than:
"Maximum number of vertex attributes, per vertex descriptor"
or else you get:

-[MTLVertexAttributeDescriptorArrayInternal objectAtIndexedSubscript:]:394: failed assertion `vertex attribute index (31) must be < 31.'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@github-actions

teoxoy

teoxoy

teoxoy

Comment on lines +267 to +283

<td>`maxComputeWorkGroupInvocations`
<td rowspan=4>`Maximum threads per threadgroup` (but actually `maxTotalThreadsPerThreadgroup` for a given pipeline)
<td>1024 = `D3D12_CS_THREAD_GROUP_MAX_THREADS_PER_GROUP`
<tr>
<th>`maxComputeWorkgroupSizeX`
<td>[#1898](https://github.com/gpuweb/gpuweb/issues/1898)
<td>`min(maxComputeWorkGroupSize[0], maxComputeWorkGroupInvocations)`
<td>`maxComputeWorkGroupSize[0]`
<td>1024 = `D3D12_CS_THREAD_GROUP_MAX_X`
<tr>
<th>`maxComputeWorkgroupSizeY`
<td>[#1898](https://github.com/gpuweb/gpuweb/issues/1898)
<td>`min(maxComputeWorkGroupSize[1], maxComputeWorkGroupInvocations)`
<td>`maxComputeWorkGroupSize[1]`
<td>1024 = `D3D12_CS_THREAD_GROUP_MAX_Y`
<tr>
<th>`maxComputeWorkgroupSizeZ`
<td>[#1898](https://github.com/gpuweb/gpuweb/issues/1898)
<td>`min(maxComputeWorkGroupSize[2], maxComputeWorkGroupInvocations)`
<td>`maxComputeWorkGroupSize[2]`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Vulkan limits are 1:1.

teoxoy

Comment on lines +242 to +244

- `Maximum scalar or vector inputs to a fragment function`
- `(Maximum number of input components to a fragment function) / 4`
<td>31 = `min(D3D12_VS_OUTPUT_REGISTER_COUNT, D3D12_PS_INPUT_REGISTER_COUNT) - 1`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the position builtin doesn't count towards the maxInterStageShaderVariables in the WebGPU spec. The Vulkan and D3D12 limits need to be lowered by 1. On Metal the limit is only for user defined inter-stage variables.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D3D12 change looks right, I guess we missed updating this when we changed these builtins to be counted (when merging the two limits together in #4688).

On Metal, we had this carveout because there was a specific difference in the Metal validation layers between Apple and non-Apple GPUs. See #1962 (comment) - the validation layer used to erroneously say "on macOS" but it meant "on non-Apple GPUs".

When I wrote this I should have made clearer what the subtractions were for, but I think they're position and point_size and should be left as they were - IIUC even though point_size is accounted for by our spec in vertex outputs, we have to reserve it on Metal because Metal counts it toward fragment inputs too?

@teoxoy

kainino0x

<th>`maxTextureDimension2D`
<td>[#1327](https://github.com/gpuweb/gpuweb/issues/1327)
<td>`min(maxImageDimension2D, maxImageDimensionCube, maxFramebufferWidth, maxFramebufferHeight, maxViewportDimensions[0], maxViewportDimensions[1], viewportBoundsRange[0], -viewportBoundsRange[1])`
<td>`min(maxImageDimension2D, maxImageDimensionCube, maxFramebufferWidth, maxFramebufferHeight)`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<td>`min(maxImageDimension2D, maxImageDimensionCube, maxFramebufferWidth, maxFramebufferHeight)`
<td>`min(maxImageDimension2D, maxImageDimensionCube, maxFramebufferWidth, maxFramebufferHeight)`
(Vulkan guarantees that `maxViewportDimensions`/`viewportBoundsRange` are larger than `maxFramebufferWidth`/`maxFramebufferHeight`, so they don't need to be included here.)
<td>[#693](https://github.com/gpuweb/gpuweb/issues/693)
<td>`maxVertexInputAttributes`
<td>*No limit.* Use `maxVertexBuffers * Maximum number of vertex attributes, per vertex descriptor`
<td>`Maximum number of vertex attributes, per vertex descriptor`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<td>`maxVertexInputAttributes`
<td>*No limit.* Use `maxVertexBuffers * Maximum number of vertex attributes, per vertex descriptor`
<td>`Maximum number of vertex attributes, per vertex descriptor`
<td>32 = `D3D12_IA_VERTEX_INPUT_RESOURCE_SLOT_COUNT`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... So this also needs to be corrected.

<td>32 = `D3D12_IA_VERTEX_INPUT_RESOURCE_SLOT_COUNT`
<td>30 = `D3D12_IA_VERTEX_INPUT_RESOURCE_SLOT_COUNT` - 2 for `SV_VertexID` and `SV_InstanceID`
<th>`maxBindingsPerBindGroup`
<td>[#3279](https://github.com/gpuweb/gpuweb/issues/3279),
[#3864](https://github.com/gpuweb/gpuweb/issues/3864)
<td colspan=3>Limit is arbitrary to allow implementations to treat binding space as an array.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was also referring to WebGPU implementations (not just drivers) so should be written in the table in a way that it applies to all backends.

<td>[#409](https://github.com/gpuweb/gpuweb/issues/409)
<td>`maxPerStageDescriptorSamplers`
<td>*Strategy-dependent.* Choose a value &le;
`min(maxPerStageDescriptorSamplers, maxDescriptorSetSamplers / 2)`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure I understand, is 2 here the number of shader stages?

Dawn doesn't look at any of these limits, I've filed https://crbug.com/494554821

<td>`minStorageBufferOffsetAlignment`
<td>*No documented limit.* Use 32, which is the lowest allowed value.
<td>32 = `max(32, D3D12_RAW_UAV_SRV_BYTE_ALIGNMENT (16))`
<td>`D3D12_RAW_UAV_SRV_BYTE_ALIGNMENT`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, it took me ages to figure out why I made this 32 instead of 16. My original draft had 16 and then I changed it to 32 while working on the PR with no explanation.

I finally figured out it explains it right on the previous line - the 32 comes from the spec:

  • minUniformBufferOffsetAlignment and minStorageBufferOffsetAlignment must both be ≥ 32 bytes.
    Note: 32 bytes would be the alignment of vec4<f64>. See WebGPU Shading Language § 14.4.1 Alignment and Size.

Now, the spec may be wrong about that (it was a conservative guess - it seems likely that vec4<f64> would actually have an alignment of 8 or maybe 16) but that's something we would need to be sure about.

<td>`D3D12_RAW_UAV_SRV_BYTE_ALIGNMENT`
<td>32 = `max(WebGPU spec allowed minimum (32), D3D12_RAW_UAV_SRV_BYTE_ALIGNMENT (16))`

Comment on lines +242 to +244

- `Maximum scalar or vector inputs to a fragment function`
- `(Maximum number of input components to a fragment function) / 4`
<td>31 = `min(D3D12_VS_OUTPUT_REGISTER_COUNT, D3D12_PS_INPUT_REGISTER_COUNT) - 1`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D3D12 change looks right, I guess we missed updating this when we changed these builtins to be counted (when merging the two limits together in #4688).

On Metal, we had this carveout because there was a specific difference in the Metal validation layers between Apple and non-Apple GPUs. See #1962 (comment) - the validation layer used to erroneously say "on macOS" but it meant "on non-Apple GPUs".

When I wrote this I should have made clearer what the subtractions were for, but I think they're position and point_size and should be left as they were - IIUC even though point_size is accounted for by our spec in vertex outputs, we have to reserve it on Metal because Metal counts it toward fragment inputs too?