Thanks to Jeff Bolz for spotting the write-after-read hazard on the sh_flag accesses. This fixes observed failures on Nvidia Turing and Ampere on DX12.
We're following the policy of committing all translated shaders to the git repo rather than rebuilding them at runtime, so this commit includes the new DXIL ones.