> > A lot of our options do change how shaders work though, like forcing a shade...

Cieric · on Feb 10, 2025

In the case that does happen, then we don't apply that setting. Most of the changes applied are extensively tested and toggles like that are more often used for already broken shaders.

chrisjj · on Feb 11, 2025

> In the case that does happen

You meant "in the case we find where that does happen"?

Little consolation for the cases you don't find because e.g. you cannot afford to play a game to the end.

Cieric · on Feb 13, 2025

Fair enough, I can't say anything I've done has ever caused an issue like that (A new ticket would have been made and sent to me.) But I also can't say that it has never happened, so I'm not really in a position to disagree. We do have a good QA team though and we have an "open" beta program that also catches a lot of issues before they become more widely public.

chrisjj · on Feb 13, 2025

I for one would be most unhappy to find my game failing on card X due to drivers perverting my float type.

david-gpu · on Feb 9, 2025

Obviously, which is the reason you don't do something like that without appropriate amounts of testing.

chrisjj · on Feb 10, 2025

Testing on platforms that perverts the precision is outside appropriate, I would say.

Dylan16807 · on Feb 9, 2025

Code depending on specific low precision is very likely to be so fragile it won't make it anywhere near release.

account42 · on Feb 10, 2025

"Fragile shader code won't make it anywhwere near release" is a pretty bold claim.

If it happens to work with the Nvidia driver it's getting shipped.

Dylan16807 · on Feb 10, 2025

But since the premise is that any difference breaks the code, will it work the exact same way across different devices? That's what I'm skeptical of.

Unless every dev and tester that touches this shader is using the same hardware, which seems like an obvious mistake to avoid...

Cieric · on Feb 10, 2025

I will note, half of the customer facing bugs I get are "works on nvidia." Only to find out that it is a problem with the game and not the driver. Nvidia allows you to ignore a lot of the spec and it causes game devs to miss a lot of obvious bugs. A few examples:

1) Nvidia allows you to write to read only textures, game devs will forget to transition them to writable and will appear as corruption on other cards.

2) Nvidia automatically work with diverging texture reads, so devs will forget to mark them as a nonuniform resource index, which shows up as corruption on other cards.

3) Floating point calculations aren't IEEE compliant, one bug I fixed was x/width*width != x, On Nvidia this ends up a little higher and on our cards a little lower. The game this happened on ended up flooring that value and doing a texture read, which as you can guess, showed up as corruption on our cards.

1 and 2 are specifically required by the microsoft directx 12 spec, but most game devs aren't reading that and bugs creep in. 3 is a difference in how the ALU is designed, our cards being a little closer to IEEE compliant. A lot of these issue are related to how the hardware works, so stays pretty consistent between the different gpus of a manufacturer.

Side note: I don't blame the devs for #3, the corruption was super minor and the full calculation was spread across multiple functions (assumed by reading the dxil). The only reason it sticks out in my brain though is because the game devs were legally unable to ever update the game again, so I had to fix it driver side. That game was also Nvidia sponsored, so it's likely our cards weren't tested till very late into the development. (I got the ticket a week before the game was to release.) That is all I'm willing to say on that, I don't want to get myself in trouble.

Cieric · on Feb 11, 2025

> Floating point calculations aren't IEEE compliant

To late to edit, but I want to half retract this statement, they are IEEE compliant, but due to optimizations that can be applied by the driver developers they aren't guaranteed to be. This is assuming that the accuracy of a multiply and divide are specified in the IEEE floating point spec, I'm seeing hints that it is, but I can't find anything concrete.

chrisjj · on Feb 11, 2025

> the game devs were legally unable to ever update the game again, so I had to fix it driver side.

Could they not intercept the calls to inject a fix?

Cieric · on Feb 13, 2025

I'm just going off what I was told there, I was forced to make the fix since the game developers no longer were partnered to the company that owned the license to the content.

chrisjj · on Feb 11, 2025

PS Why is /width*width not eliminated on compilation?

Cieric · on Feb 13, 2025

Good question, I'm assuming it's due to the calculation happen across a memory barrier of some kind or due to all the branches in between so llvm is probably avoiding the optimization. It was quite a while ago so it is something I could re-investigate and actually try and fix. I would have to wait for downtime with all the other tickets I'm getting though. It's also just something that dxc itself should be doing, but I have no control over that.

chrisjj · on Feb 11, 2025

> But since the premise is that any difference breaks the code,

Not any.

> will it work the exact same way across different devices

Yes, where they run IEEE FP format.

Dylan16807 · on Feb 12, 2025

> Not any.

Just about any. It's pretty difficult to write code where changing the rounding of the last couple bits breaks it (as happens if you use wider types during the calculation), but other changes don't break it.

What real code have you seen with that behavior?

chrisjj · on Feb 14, 2025

> but other changes don't break it.

Why should that be a requirement? Obviously the driver can make other changes that can break any code.

Dylan16807 · on Feb 14, 2025

> Why should that be a requirement?

Originally I said "the premise is that any difference breaks the code".

You replied with "Not any."

That is where the requirement comes from, your own words. This is your scenario, and you said not all differences would break the hypothetical code.

This is your choice. Are we talking about code where any change breaks it (like a seeded/reproducible RNG), or are we talking about code where there are minor changes that don't break it but using extra precision breaks it? (I expect this category to be super duper rare)

chrisjj · on Feb 16, 2025

> Originally I said "the premise is that any difference breaks the code".

> You replied with "Not any."

> That is where the requirement comes from, your own words.

I wasn't stating a requirement. I was disputing your report of the premise.

> or are we talking about code where there are minor changes that don't break it but using extra precision breaks it?

Yup.

Dylan16807 · on Feb 19, 2025

> > or are we talking about code where there are minor changes that don't break it but using extra precision breaks it?

> Yup

Then I will ask again, have you ever seen code that falls into this category? I haven't. And an RNG would not fall into this category.

chrisjj · on Feb 10, 2025

Consider an RNG.

Dylan16807 · on Feb 10, 2025

I consider a floating point RNG in a shader that is seeded for reproducibility to be a bad idea.

chrisjj · on Feb 11, 2025

Why, pray?

Dylan16807 · on Feb 12, 2025

GPUs often have weird sizes for optimal behavior and you have to depend on math-based optimizations never being applied to your code.

In my opinion, floating point shaders should be treated as a land of approximations.

You asked in another comment why /width*width isn't optimized out by the compiler. But it's changes just like that that will break an RNG!

chrisjj · on Feb 14, 2025

> In my opinion, floating point shaders should be treated as a land of approximations.

Fine, but that leaves you responsible for the breakage the shader of an author that holds the opposite opinion, as he is entitled to do. Precision =/= accuracy.

Dylan16807 · on Feb 14, 2025

But do you want /width*width to be optimized? Or associative and commutative operations in general? Then you have to reject the opposite opinion.

chrisjj · on Feb 16, 2025

> But do you want /width*width to be optimized?

I do. But I don't see such optimisation as anything to with your changing float type.

Dylan16807 · on Feb 19, 2025

Changes like that will break just as much code as adding extra precision will. Because it will change how things round, and not much else, just like adding extra precision. They're both slightly disruptive, and they tend to disrupt the same kind of thing, unlike removing precision which is very disruptive all over.