Slow Down
Once Again We Return Home
It’s been a while, but for the first time this year I have to do it. Some of you are shaking your heads, saying you knew it, and you were right. Here we are again.
It’s time to vkoverhead.
The Numbers Must Go Up
I realized while working on some P E R F that there was a lot of perf to be gained in places I wasn’t testing. That makes sense, right? If there’s no coverage, the perf can’t go up.
So I added a new case for the path I was using, and boy howdy did I start to see some weird stuff.
Normally this is where I’d post up some gorgeous flamegraphs, and we would sit back in our expensive leather armchairs debating the finer points of optimization. But you know what? We can’t do that anymore.
Why, you’re asking. The reason is simple: perf
is totally fucking broken and has been for a while. But only on certain machines. Specifically, mine. So no more flamegraphs for you, and none for me.
Despite this massive roadblock, the perf gains must continue. Through the power of guesswork and frustration, I’ve managed some sizable gains:
# | Draw Tests | 1000op/s (before) | % relative to ‘draw’ (before) | 1000op/s (after) | % relative to ‘draw’ (after) |
---|---|---|---|---|---|
0 | draw | 46298 | 100.0% | 46426 | 100.0% |
16 | vbo change | 17741 | 38.3% | 22413 | 48.3% |
17 | vbo change dynamic (new!) | 4544 | 9.8% | 8686 | 18.7% |
18 | 1vattrib change | 3021 | 6.5% | 3316 | 7.1% |
20 | 16vattrib 16vbo change | 5266 | 11.4% | 6398 | 13.8% |
21 | 16vattrib change | 2352 | 5.1% | 2512 | 5.4% |
22 | 16vattrib change dynamic | 3976 | 8.6% | 5003 | 10.8% |
Though I was mainly targeting the case of using dynamic vertex input and binding new vertex buffers for every draw (and managed a nearly 100% improvement there) , I ended up seeing noteworthy gains across the board for binding vertex buffers, even when using fully static state. This should provide some minor gains to general RADV perf.
Future Improvements
Given the still-massive perf gap between using static and dynamic vertex state when only vertex buffers change, it seems likely there’s still some opportunities to reclaim more perf. Only time will tell what can be achieved here, but for now this is what I’ve got.