Samstag, September 09, 2017

radeonsi: out-of-order rasterization on VI+

I've been polishing a patch of Marek to enable out-of-order rasterization on VI+. Assuming it goes through as planned, this will be the first time we're adding driver-specific drirc configuration options that are unfamiliar to the enthusiast community (there's radeonsi_enable_sisched already, but Phoronix has reported on the sisched option often enough). So I thought it makes sense to explain what those options are about.

Background: Out-of-order rasterization

Out-of-order rasterization is an optimization that can be enabled in some cases. Understanding it properly requires some background on how tasks are spread across shader engines (SEs) on Radeon GPUs.

The frontends (vertex processing, including tessellation and geometry shaders) and backends (fragment processing, including rasterization and depth and color buffers) are spread across SEs roughly like this:


(Not shown are the compute units (CUs) in each SE, which is where all shaders are actually executed.)

The input assembler distributes primitives (i.e., triangles) and their vertices across SEs in a mostly round-robin fashion for vertex processing. In the backend, work is distributed across SEs by on-screen location, because that improves cache locality.

This means that once the data of a triangle (vertex position and attributes) is complete, most likely the corresponding rasterization work needs to be distributed to other SEs. This is done by what I'm simplifying as the "crossbar" in the diagram.

OpenGL is very precise about the order in which the fixed-function parts of fragment processing should happen. If one triangle comes after another in a vertex buffer and they overlap, then the fragments of the second triangle better overwrite the corresponding fragments of the first triangle (if they weren't rejected by the depth test, of course). This means that the "crossbar" may have to delay forwarding primitives from a shader engine until all earlier primitives (which were processed in another shader engine) have been forwarded. This only happens rarely, but it's still sad when it does.

There are some cases in which the order of fragments doesn't matter. Depth pre-passes are a typical example: the order in which triangles are written to the depth buffer doesn't matter as long as the "front-most" fragments win in the end. Another example are some operations involved in stencil shadows.

Out-of-order rasterization simply means that the "crossbar" does not delay forwarding triangles. Triangles are instead forwarded immediately, which means that they can be rasterized out-of-order. With the in-progress patches, the driver recognizes cases where this optimization can be enabled safely.

By the way #1: From this explanation, you can immediately deduce that this feature only affects GPUs with multiple SEs. So integrated GPUs are not affected, for example.

By the way #2: Out-of-order rasterization is entirely disabled by setting R600_DEBUG=nooutoforder.


Why the configuration options?

There are some cases where the order of fragments almost doesn't matter. It turns out that the most common and basic type of rendering is one of these cases. This is when you're drawing triangles without blending and with a standard depth function like LEQUAL with depth writes enabled. Basically, this is what you learn to do in every first 3D programming tutorial.

In this case, the order of fragments is mostly irrelevant because of the depth test. However, it might happen that two triangles have the exact same depth value, and then the order matters. This is very unlikely in common scenes though. Setting the option radeonsi_assume_no_z_fights=true makes the driver assume that it indeed never happens, which means out-of-order rasterization can be enabled in the most common rendering mode!

Some other cases occur with blending. Some blending modes (though not the most common ones) are commutative in the sense that from a purely mathematical point of view, the end result of blending two triangles together is the same no matter which order they're blended in. Unfortunately, additive blending (which is one of those modes) involves floating point numbers in a way where changing the order of operations can lead to different rounding, which leads to subtly different results. Using out-of-order rasterization would break some of the guarantees the driver has to give for OpenGL conformance.

The option radeonsi_commutative_blend_add=true tells the driver that you don't care about these subtle errors and will lead to out-of-order rasterization being used in some additional cases (though again, those cases are rarer, and many games probably don't encounter them at all).

tl;dr

Out-of-order rasterization can give a very minor boost on multi-shader engine VI+ GPUs (meaning dGPUs, basically) in many games by default. In most games, you should be able to set radeonsi_assume_no_z_fights=true and radeonsi_commutative_blend_add=true to get an additional very minor boost. Those options aren't enabled by default because they can lead to incorrect results.