Buffered Convenience
A lot has been made of VK_EXT_descriptor_buffer, also known as “sane descriptor handling”. It’s an extension that revolutionizes how descriptors can be managed, not only in code brevity but also in performance.
That’s why ZINK_DESCRIPTORS=db is now the default wherever it’s supported.
Gains
But what does this gain zink (and other users), other than being completely undebuggable if anything were to break*?
* It won’t, trust me.
One nicety of descriptor buffers is performance. Swapping out descriptor templates for buffers removes a layer of indirection from the descriptor update path, which reduces CPU overhead by a small amount. And because different descriptor sets no longer need to be bound, GPU synchronization can be reduced as well.
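For the unfamiliar, here’s a minimal sketch of what the buffer-based update path looks like: instead of allocating a set and calling vkUpdateDescriptorSets (or going through a template), the descriptor bytes get written straight into a host-mapped buffer with vkGetDescriptorEXT. All names and parameters here are illustrative, not zink’s actual code, and it assumes the extension’s entry points have already been loaded:

```c
#include <vulkan/vulkan.h>

/* Minimal sketch, not zink's actual code: write a UBO descriptor directly
 * into a host-mapped descriptor buffer. Assumes vkGetDescriptorEXT has been
 * loaded (e.g., via vkGetDeviceProcAddr) and that `props` was queried from
 * VkPhysicalDeviceDescriptorBufferPropertiesEXT. */
static void
write_ubo_descriptor(VkDevice dev,
                     const VkPhysicalDeviceDescriptorBufferPropertiesEXT *props,
                     uint8_t *map, VkDeviceSize offset,
                     VkDeviceAddress ubo_addr, VkDeviceSize ubo_size)
{
   VkDescriptorAddressInfoEXT addr_info = {
      .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_ADDRESS_INFO_EXT,
      .address = ubo_addr,
      .range = ubo_size,
   };
   VkDescriptorGetInfoEXT info = {
      .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_GET_INFO_EXT,
      .type = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,
      .data.pUniformBuffer = &addr_info,
   };
   /* no descriptor set object, no vkUpdateDescriptorSets: the descriptor
    * bytes land directly in the mapped buffer */
   vkGetDescriptorEXT(dev, &info, props->uniformBufferDescriptorSize,
                      map + offset);
}
```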
In zink terms, you’ll likely notice a small FPS increase in cases that were extremely CPU-bound (e.g., Tomb Raider).
Memory Too?
Yes, GPU memory utilization is also affected.
Historically, as we all know by now, zink uses six (four when constrained) descriptor sets:
- Uniforms
- UBOs
- Samplers
- SSBOs
- Storage Images
- Bindless
This split optimizes access patterns for each descriptor type as well as update frequency. In descriptor set terms, it means a separate descriptor pool per descriptor layout so that sets can be bucket-allocated to further reduce overhead, roughly as sketched below.
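For illustration only (this is a sketch of the general bucket-allocation idea, not zink’s actual data structures), per-layout pooling amounts to something like:

```c
/* Illustrative sketch of per-layout bucket allocation; names and sizes are
 * hypothetical, not zink's internals. Error handling and pool exhaustion
 * are omitted for brevity. */
#define POOL_BUCKET_SIZE 64

struct layout_pool {
   VkDescriptorSetLayout layout;  /* all sets in this pool share one layout */
   VkDescriptorPool pool;
   VkDescriptorSet sets[POOL_BUCKET_SIZE];
   unsigned next;                 /* next unused set in the current bucket */
};

static VkDescriptorSet
get_descriptor_set(VkDevice dev, struct layout_pool *lp)
{
   if (lp->next == POOL_BUCKET_SIZE) {
      /* bucket exhausted: grab a whole new batch in one call instead of
       * paying vkAllocateDescriptorSets overhead once per set */
      VkDescriptorSetLayout layouts[POOL_BUCKET_SIZE];
      for (unsigned i = 0; i < POOL_BUCKET_SIZE; i++)
         layouts[i] = lp->layout;
      VkDescriptorSetAllocateInfo ai = {
         .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO,
         .descriptorPool = lp->pool,
         .descriptorSetCount = POOL_BUCKET_SIZE,
         .pSetLayouts = layouts,
      };
      vkAllocateDescriptorSets(dev, &ai, lp->sets);
      lp->next = 0;
   }
   return lp->sets[lp->next++];
}
```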
Initially when I implemented descriptor buffer, I kept this same setup. Each descriptor type had its own descriptor buffer, and the buffers were hardcoded to fit N descriptors of the given type, where N was the maximum number of descriptors I could update per cmdbuf using drawoverhead. Each cmdbuf was associated with a set of descriptor buffers, and it worked great.
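Sizing-wise, that hardcoded scheme boils down to something like this sketch, with MAX_SETS_PER_CMDBUF standing in for the drawoverhead-derived N (all names here are hypothetical):

```c
/* Hypothetical sketch of the hardcoded sizing: each per-type buffer holds a
 * fixed N sets of one layout. */
#define MAX_SETS_PER_CMDBUF 4096   /* stand-in for the drawoverhead-derived N */

static inline VkDeviceSize
align64(VkDeviceSize v, VkDeviceSize a)
{
   return (v + a - 1) & ~(a - 1);
}

static VkDeviceSize
db_size_for_layout(VkDevice dev, VkDescriptorSetLayout layout,
                   const VkPhysicalDeviceDescriptorBufferPropertiesEXT *props)
{
   VkDeviceSize set_size;
   vkGetDescriptorSetLayoutSizeEXT(dev, layout, &set_size);
   /* each set must start at a legal descriptor buffer offset */
   set_size = align64(set_size, props->descriptorBufferOffsetAlignment);
   return set_size * MAX_SETS_PER_CMDBUF;
}
```

Multiply that by a buffer per descriptor type and by every cmdbuf in flight, and the cost adds up fast.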
But it also used a lot of VRAM, comparable even to the then-default template mode.
During my latest round of descriptor buffer refactors which added handling for bindless textures, I had a realization. I was doing descriptor buffers all wrong.
Instead of having all these different buffers per cmdbuf, why wouldn’t I just have a single buffer? Plus a static one for bindless, of course.
Imagine pointlessly allocating five times the memory required.
So now I had two descriptor buffers (see the sketch after this list):
- normal descriptors (per cmdbuf)
- bindless descriptors (per GL context)
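Binding-wise, that’s just a single two-element vkCmdBindDescriptorBuffersEXT call per cmdbuf. A minimal sketch, assuming buffer device addresses obtained from vkGetBufferDeviceAddress and with hypothetical names:

```c
/* Minimal sketch: one "normal" descriptor buffer per cmdbuf plus the static
 * bindless buffer. Names and the exact usage flags are illustrative. */
static void
bind_descriptor_buffers(VkCommandBuffer cmdbuf,
                        VkDeviceAddress db_addr,       /* per-cmdbuf buffer */
                        VkDeviceAddress bindless_addr) /* per-context buffer */
{
   VkDescriptorBufferBindingInfoEXT infos[2] = {
      {
         .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_BUFFER_BINDING_INFO_EXT,
         .address = db_addr,
         .usage = VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT |
                  VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT,
      },
      {
         .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_BUFFER_BINDING_INFO_EXT,
         .address = bindless_addr,
         .usage = VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT |
                  VK_BUFFER_USAGE_SAMPLER_DESCRIPTOR_BUFFER_BIT_EXT,
      },
   };
   vkCmdBindDescriptorBuffersEXT(cmdbuf, 2, infos);
}
```

Each descriptor set then just becomes an offset into one of these two buffers, set via vkCmdSetDescriptorBufferOffsetsEXT.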
In Tomb Raider, this ended up being about a 6% savings for peak VRAM utilization (1445MiB -> 1362MiB). Not bad, and the code is now simpler too.
But what if I stopped hardcoding the descriptor buffer size and instead used a sliding scale so that only the (rough) amount of memory needed was allocated? Some minor hacking here and there, and peak VRAM utilization was cut even more (1362MiB -> 1297MiB).
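The sliding scale itself can be as simple as a grow-on-demand heuristic; here’s a hypothetical version (the constants and the grow-at-75%-full threshold are mine, not zink’s):

```c
/* Hypothetical grow-on-demand sizing instead of a hardcoded maximum. */
#define DB_MIN_SIZE (64 * 1024)          /* modest starting allocation */
#define DB_MAX_SIZE (16 * 1024 * 1024)   /* cap near the old hardcoded size */

static VkDeviceSize
next_db_size(VkDeviceSize current, VkDeviceSize used)
{
   if (current < DB_MIN_SIZE)
      return DB_MIN_SIZE;
   /* double only when the buffer is ~75% consumed, so the allocation
    * tracks what the workload actually needs */
   if (used * 4 > current * 3 && current < DB_MAX_SIZE)
      return current * 2;
   return current;
}
```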
Now it’s a little over a 10% reduction in peak VRAM utilization.
There’s still some ways to go with VRAM utilization in zink considering RadeonSI peaks at 1221MiB for the same benchmark, but a 6% gap is much more reasonable than a 16% one.
Future
Blog posts about Vulkan descriptor models aren’t going away.
I wish they were, but they just aren’t.
Stay tuned for more of these posts as well as other exciting developments on the road to Mesa 23.1.