* Fermi+ shader header docs
From: Ilia Mirkin @ 2015-05-02 16:34 UTC (permalink / raw)
  To: gpu-public-documentation
  Cc: nouveau@lists.freedesktop.org

Hi,

As I'm looking to add some support to nouveau for features like atomic
counters and images, I'm running into some confusion about what the
first word of the shader header means. Here is the definition as we
have it today:

https://github.com/envytools/envytools/blob/master/rnndb/graph/gf100_shaders.xml

VS/HS/DS/GS:
<reg32 offset="0" name="0">
  <bitfield high="7" low="0" name="MAGIC">
    <value value="0x61" name="VP_MAGIC"/>
  </bitfield>
  <bitfield high="12" low="10" name="KIND" type="GF100_SHADER_KIND"/>
  <bitfield pos="16" name="GMEM_ENABLE"/>
  <bitfield pos="17" name="UNK17"/><!-- default 1 -->
  <bitfield pos="26" name="LMEM_ENABLE"/>
  <bitfield pos="27" name="FP64_ENABLE"/>
</reg32>

FS:
<reg32 offset="0" name="0">
  <bitfield high="7" low="0" name="MAGIC">
    <value value="0x62" name="FP_MAGIC"/>
  </bitfield>
  <bitfield high="12" low="10" name="KIND" type="GF100_SHADER_KIND"/>
  <bitfield pos="14" name="MULTIPLE_COLOR_OUTPUTS" type="boolean"/>
  <bitfield pos="15" name="USES_KIL" type="boolean"/>
  <bitfield pos="16" name="GMEM_ENABLE"/>
  <bitfield pos="17" name="UNK17"/><!-- default 1 -->
  <bitfield pos="26" name="LMEM_ENABLE"/>
  <bitfield pos="27" name="FP64_ENABLE"/>
</reg32>

However, I know that these are somewhat wrong. I've seen shaders that
use gmem accesses (e.g. mov r0, [r0]) that just have the LMEM enable
bit set (and they use no lmem). And I've seen additional bits set,
especially relating to images, but I haven't spent enough time looking
at all the variations to make sense of it yet. For example, I think
that Fermi and Kepler+ have different meanings for some of the bits.
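
For reference, here is a minimal Python sketch of how the rnndb bitfield declarations above turn into shift-and-mask extraction. The field names and positions are copied from the XML excerpt (which, as said, is partly wrong); `VTG_FIELDS`, `FS_FIELDS`, `bitfield` and `decode_word0` are hypothetical helpers for illustration, not envytools code.

```python
# Field positions copied from the envytools rnndb excerpt above
# (gf100_shaders.xml at the time of this thread). Some are known to be
# wrong, so treat the names as provisional.
VTG_FIELDS = {
    "MAGIC": (7, 0),
    "KIND": (12, 10),
    "GMEM_ENABLE": (16, 16),
    "UNK17": (17, 17),
    "LMEM_ENABLE": (26, 26),
    "FP64_ENABLE": (27, 27),
}

# The FS variant adds two single-bit flags on top of the VTG set.
FS_FIELDS = dict(VTG_FIELDS, MULTIPLE_COLOR_OUTPUTS=(14, 14), USES_KIL=(15, 15))

VP_MAGIC, FP_MAGIC = 0x61, 0x62


def bitfield(word, high, low):
    """Extract bits high..low inclusive (rnndb pos=N means high=low=N)."""
    return (word >> low) & ((1 << (high - low + 1)) - 1)


def decode_word0(word):
    """Choose the VTG or FS field set from the MAGIC byte, then decode."""
    magic = word & 0xFF
    if magic == VP_MAGIC:
        fields = VTG_FIELDS
    elif magic == FP_MAGIC:
        fields = FS_FIELDS
    else:
        raise ValueError("unexpected MAGIC 0x%02x" % magic)
    return {name: bitfield(word, hi, lo) for name, (hi, lo) in fields.items()}


if __name__ == "__main__":
    # Header word 0 of the VP_B shader from the 2015-06-23 trace in this thread.
    print(decode_word0(0x06040461))
```

Run on 0x06040461 (header word 0 of the VP_B shader in the 2015-06-23 trace later in this thread), it reports LMEM_ENABLE = 1 and GMEM_ENABLE = 0, although that shader performs a global load (`ld b32 $r0 cg g[$r0d]`). That is the kind of mismatch described above.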

I was hoping you could just release the docs for the shader headers,
or at least the first word of the shader header.

I know that it can take some time to get it all approved, but if you
think this is *not* information you can release, please let me know
soon so that I don't lose a month waiting.

Thanks,

  -ilia
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

* Re: Fermi+ shader header docs
From: Robert Morell @ 2015-05-21 14:05 UTC (permalink / raw)
  To: Ilia Mirkin
  Cc: nouveau@lists.freedesktop.org,
	gpu-public-documentation

Hi Ilia,

On Sat, May 02, 2015 at 12:34:21PM -0400, Ilia Mirkin wrote:
> Hi,
> 
> As I'm looking to add some support to nouveau for features like atomic
> counters and images, I'm running into some confusion about what the
> first word of the shader header means. Here is the definition as we
> have it today:

[...]

> However I know that these are somewhat wrong. I've seen shaders that
> use gmem accesses (i.e. mov r0, [r0]) that just have the LMEM enable
> bit set (and they use no lmem). And I've seen additional bits set, esp
> relating to images, but I haven't spent enough time looking at all the
> variations to make sense of it yet. For example, I think that Fermi
> and Kepler+ have different meanings for some of the bits.

Those look pretty close :)

> I was hoping you could just release the docs for the shader headers,
> or at least the first word of the shader header.

We've posted the specification for the full Shader Program Header to our
GPU documentation site here:

ftp://download.nvidia.com/open-gpu-doc/Shader-Program-Header/1/Shader-Program-Header.html

I hope it helps clear things up.

- Robert

* Re: Fermi+ shader header docs
From: Ilia Mirkin @ 2015-05-21 15:32 UTC (permalink / raw)
  To: Robert Morell
  Cc: nouveau@lists.freedesktop.org,
	gpu-public-documentation

On Thu, May 21, 2015 at 10:05 AM, Robert Morell <rmorell@nvidia.com> wrote:
> We've posted the specification for the full Shader Program Header to our
> GPU documentation site here:
>
> ftp://download.nvidia.com/open-gpu-doc/Shader-Program-Header/1/Shader-Program-Header.html
>
> I hope it helps clear things up.

Yep, just a few follow-up questions:

- SPH types 1 and 2 appear to be flipped with respect to the tables -- "When
PS is used, field SphType in CommonWord0 must be set to 1; similarly,
when VTG is used, SphType in CommonWord0 must be set to 2." But
"Table 1. SPH Type 1 Definition" is clearly meant for VTG, and Table 2
is clearly meant for PS...
- You skip over SassVersion -- what is that?
- You have a funny note in there -- "Triangles generated by the
geometry shader always have all their edge flags set to TRUE" -- that
is the *only* reference to edge flags in the whole document. Right now
we do some crazy thing to get edge flags right on Fermi+ (and I think
we just get them wrong on Tesla). Is there a way to emit edge flags
from the vertex shader?
- To be clear: DoesLoadOrStore -- *any* load/store? Even LDC? ALD?
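
The first point can be checked against the old rnndb MAGIC values. A minimal sketch, assuming SphType occupies bits 4:0 of CommonWord0 and a version field bits 9:5 (an assumption that agrees with how the later traces in this thread decode word 0 as `SPH = VTG | VERSION = 3`); `split_sph_type_version` is a hypothetical helper:

```python
# Assumption: SphType is bits 4:0 and Version is bits 9:5 of CommonWord0,
# matching how the later traces in this thread decode word 0.
SPH_TYPE_NAMES = {1: "VTG", 2: "PS"}  # numbering as the tables suggest


def split_sph_type_version(word0):
    sph_type = word0 & 0x1F          # bits 4:0
    version = (word0 >> 5) & 0x1F    # bits 9:5
    return sph_type, version


if __name__ == "__main__":
    # The two rnndb MAGIC values quoted at the top of the thread.
    for magic in (0x61, 0x62):
        sph_type, version = split_sph_type_version(magic)
        print("0x%02x: SphType=%d (%s), Version=%d"
              % (magic, sph_type, SPH_TYPE_NAMES[sph_type], version))
```

If that assumption holds, VP_MAGIC 0x61 splits into SphType 1 / Version 3 and FP_MAGIC 0x62 into SphType 2 / Version 3, i.e. the numbering used by the tables (type 1 = VTG, type 2 = PS), not the one in the prose.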

Thanks!

  -ilia

* Re: Fermi+ shader header docs
From: Ilia Mirkin @ 2015-05-23 21:35 UTC (permalink / raw)
  To: Robert Morell
  Cc: nouveau@lists.freedesktop.org,
	gpu-public-documentation

Oh, and one more little correction:

"""
The SPH field OutputTopology sets the primitive topology of the
vertices that are output from the pipe stage. This field is only used
with geometry shaders, where the value must be greater than zero and
has a maximum of 1024. The allowed values are: ... [the correct values
for OutputTopology]
"""

The 1024 limit probably applies to MaxOutputVertexCount in
CommonWord4 instead.
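
If the 1..1024 range does belong to MaxOutputVertexCount, a driver-side check might look like the sketch below. Both the field position (bits 11:0 of CommonWord4, 12 bits wide) and the helper `set_max_output_verts` are assumptions for illustration, not taken from the spec text quoted here.

```python
# Hypothetical helper. Assumptions (not confirmed in this thread): the
# geometry shader's MaxOutputVertexCount lives in bits 11:0 of CommonWord4,
# and the 1..1024 range quoted under OutputTopology really belongs to it.
MAX_OUTPUT_VERTS_MASK = 0xFFF       # assumed 12-bit field at bits 11:0
MAX_OUTPUT_VERTS_LIMIT = 1024


def set_max_output_verts(word4, count):
    """Return word4 with the assumed MaxOutputVertexCount field set to count."""
    if not 0 < count <= MAX_OUTPUT_VERTS_LIMIT:
        raise ValueError("max output vertex count %d not in 1..%d"
                         % (count, MAX_OUTPUT_VERTS_LIMIT))
    # Clear the old field and keep the other bits (e.g. the read-slot fields).
    return (word4 & ~MAX_OUTPUT_VERTS_MASK) | count


if __name__ == "__main__":
    print(hex(set_max_output_verts(0xFF000000, 1024)))
```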

  -ilia

* Re: Fermi+ shader header docs
From: Ilia Mirkin @ 2015-06-23  1:10 UTC (permalink / raw)
  To: Robert Morell
  Cc: nouveau@lists.freedesktop.org,
	gpu-public-documentation

And an additional question: I have a trace here where a reserved bit
in CommonWord0 is set. Is that just a stray value that the driver
doesn't clear, or does it have some significance? Here is the full
shader:

HEADER:
0x06040461   0 = { SPH = VTG | VERSION = 3 | KIND = VP_B | SASS_VERSION = 2 | LDST_ENABLE | SO_MASK = 0 | 0x2000000 }
0x00000000   1 = { LMEM_POS_ALLOC = 0 | PATCH_ATTRIBUTES = 0 }
0x00000000   2 = { LMEM_NEG_ALLOC = 0 | THREADS_PER_PRIM = 0 }
0x00000000   3 = { WARP_CSTACK_SIZE = 0 | OUTPUT_PRIM = 0 }
0x00000000   4 = { MAX_OUTPUT_VERTS = 0 | MIN_OUT_READ_SLOT = 0 | MAX_OUT_READ_SLOT = 0 }
0x00000000   ATTR_EN_0 = 0
0x00000000   ATTR_EN_1 = 0
0x00000000   ATTR_EN_2 = 0
0x00000000   ATTR_EN_3 = 0
0x00000000   ATTR_EN_4 = 0
0x00000000   ATTR_EN_5 = { 0 }
0x00000000   11 = 0
0x00000000   12 = 0
0x0001f000   EXPORT_EN_0 = { HPOS = 0xf | 0x10000 }
0x00000000   EXPORT_EN_1 = 0
0x00000000   EXPORT_EN_2 = 0
0x00000000   EXPORT_EN_3 = 0
0x00000000   EXPORT_EN_4 = 0
0x00000000   EXPORT_EN_5 = { CLIP_DISTANCE = 0 | UNK12 = 0 }
0x00000000   19 = 0
CODE:
00000000: a01088b0 08bcb810     sched 0x2c 0x22 0x4 0x28 0x4 0x2e 0x2f
00000008: 0b1ffc1e 5b601c07     set $p0 0x1 ge u32 0x0 c0[0x3858]
00000010: 1000003c 12000000     $p0 bra 0x38
00000018: 0a1c0002 64c03c07     mov b32 $r0 c0[0x3850]
00000020: 0a9c0006 64c03c07     mov b32 $r1 c0[0x3854]
00000028: 001c0000 cc800000     ld b32 $r0 cg g[$r0d]
00000030: 041c003c 12000000     bra 0x40

00000038: 7f9c0002 e4c03c00  C  mov b32 $r0 0x0

00000040: 9c108010 090c8c10  C  sched 0x4 0x20 0x4 0x27 0x4 0x23 0x43
00000048: 001c2802 e5c00000     cvt rn f32 $r0 u32 $r0
00000050: 341c0006 64c03c00     mov b32 $r1 c0[0x1a0]
00000058: 349c000a 64c03c00     mov b32 $r2 c0[0x1a4]
00000060: 351c000e 64c03c00     mov b32 $r3 c0[0x1a8]
00000068: 359c0012 64c03c00     mov b32 $r4 c0[0x1ac]
00000070: 381ffc06 7f03fc00     st b32 a[0x70] $r1 0x0 0x0
00000078: 3a1ffc0a 7f03fc00     st b32 a[0x74] $r2 0x0 0x0
00000080: 3c110d0c 08000001     sched 0x43 0x43 0x4 0x4f 0x0 0x0 0x0
00000088: 3c1ffc0e 7f03fc00     st b32 a[0x78] $r3 0x0 0x0
00000090: 3e1ffc12 7f03fc00     st b32 a[0x7c] $r4 0x0 0x0
00000098: 401ffc02 7f03fc00     st b32 a[0x80] $r0 0x0 0x0
000000a0: 001c003c 18000000     exit

000000a8: fc1c003c 12007fff  C  bra 0xa8
000000b0: 001c3c02 85800000     nop
000000b8: 001c3c02 85800000     nop
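
To make the question concrete, here is a minimal sketch of decoding this header's CommonWord0 with an assumed field layout. The positions were chosen to agree with what the decoder above printed (SPH, VERSION, KIND, SASS_VERSION = 2, LDST_ENABLE, SO_MASK); field names other than SphType, SassVersion and DoesLoadOrStore are placeholders, and the 5-bit reserved field at bits 25:21 is itself an assumption to check against the posted SPH document.

```python
# Assumed CommonWord0 layout, name: (high, low). Positions are chosen to
# agree with the decoder output above; names other than SphType, SassVersion
# and DoesLoadOrStore are placeholders.
COMMON_WORD0 = {
    "SphType": (4, 0),
    "Version": (9, 5),
    "ShaderType": (13, 10),
    "MrtEnable": (14, 14),
    "KillsPixels": (15, 15),
    "DoesGlobalStore": (16, 16),
    "SassVersion": (20, 17),
    "Reserved": (25, 21),
    "DoesLoadOrStore": (26, 26),
    "DoesFp64": (27, 27),
    "StreamOutMask": (31, 28),
}


def field_mask(high, low):
    """Mask covering bits high..low inclusive, left in place."""
    return ((1 << (high - low + 1)) - 1) << low


def decode(word, layout):
    """Return {name: value} for every field in the layout."""
    return {name: (word & field_mask(hi, lo)) >> lo
            for name, (hi, lo) in layout.items()}


def reserved_bits(word):
    """Raw bits of word 0 that fall inside the assumed reserved field."""
    return word & field_mask(*COMMON_WORD0["Reserved"])


if __name__ == "__main__":
    word0 = 0x06040461  # header word 0 from the trace above
    print(decode(word0, COMMON_WORD0))
    print(hex(reserved_bits(word0)))
```

Under this layout the leftover 0x2000000 is bit 25, the top bit of the assumed reserved field, which is consistent with the decoder printing it as a raw, unnamed value.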

* Re: Fermi+ shader header docs
From: Ilia Mirkin @ 2015-08-14 19:48 UTC (permalink / raw)
  To: Robert Morell
  Cc: nouveau@lists.freedesktop.org,
	gpu-public-documentation

And as I've just started looking at GM107 traces to fix up
tessellation shader attribute address calculations, I noticed the
following unknown bits in CommonWord3 of TCP shaders:

PB: 0x00000021   GM107_3D.SP[0x2].SELECT = { ENABLE | PROGRAM = TCP }
PB: 0x00000830   GM107_3D.SP[0x2].START_ID = 0x830
HEADER:
0x04210861   0 = { SPH = VTG | VERSION = 3 | KIND = TCP | GMEM_STORE | SASS_VERS
0x06000000   1 = { LMEM_POS_ALLOC = 0 | PATCH_ATTRIBUTES = 6 }
0x03000000   2 = { LMEM_NEG_ALLOC = 0 | THREADS_PER_PRIM = 3 }
0x60000000   3 = { WARP_CSTACK_SIZE = 0 | 0x60000000 }
0xff000000   4 = { MIN_OUT_READ_SLOT = 0 | MAX_OUT_READ_SLOT = 0xff }
0xf0000000   ATTR_EN_0 = 0xf0000000
0x00000000   ATTR_EN_1 = 0
0x00000000   ATTR_EN_2 = 0
0x00000000   ATTR_EN_3 = 0
0x00000000   ATTR_EN_4 = 0
0x00000000   ATTR_EN_5 = { 0 }
0x00000000   11 = 0
0x00000000   12 = 0
0x0000f000   EXPORT_EN_0 = { HPOS = 0xf }
0x00000000   EXPORT_EN_1 = 0
0x00000000   EXPORT_EN_2 = 0
0x00000000   EXPORT_EN_3 = 0
0x00000000   EXPORT_EN_4 = 0
0x00000000   EXPORT_EN_5 = { CLIP_DISTANCE = 0 | UNK12 = 0 }
0x00000000   19 = 0

Is there anything we also need to be setting?
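
A minimal sketch that isolates those bits, assuming the layouts below: the 24-bit low fields and the top-byte PATCH_ATTRIBUTES and THREADS_PER_PRIM follow the decoder labels in the dump above, while OUTPUT_PRIM at bits 27:24 of word 3 is an assumption. The `decode` helper is hypothetical.

```python
# Assumed layouts, name: (high, low), for header words 1..3. The low 24-bit
# fields and the top-byte fields of words 1 and 2 follow the decoder labels
# in the dump above; OUTPUT_PRIM at bits 27:24 is an assumption, and
# UNKNOWN is simply whatever that leaves over in bits 31:28.
LAYOUTS = {
    1: {"LMEM_POS_ALLOC": (23, 0), "PATCH_ATTRIBUTES": (31, 24)},
    2: {"LMEM_NEG_ALLOC": (23, 0), "THREADS_PER_PRIM": (31, 24)},
    3: {"WARP_CSTACK_SIZE": (23, 0), "OUTPUT_PRIM": (27, 24), "UNKNOWN": (31, 28)},
}


def decode(index, word):
    """Split header word `index` into the assumed fields."""
    return {name: (word >> lo) & ((1 << (hi - lo + 1)) - 1)
            for name, (hi, lo) in LAYOUTS[index].items()}


if __name__ == "__main__":
    # Words 1..3 of the GM107 TCP header quoted above.
    tcp_header = {1: 0x06000000, 2: 0x03000000, 3: 0x60000000}
    for index, word in tcp_header.items():
        print(index, decode(index, word))
```

With these assumptions, 0x60000000 lands entirely in bits 31:28 of CommonWord3 (value 6, i.e. bits 29 and 30). The sketch only localizes the bits; it says nothing about what they mean.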

  -ilia

Thread overview: 6+ messages
2015-05-02 16:34 Fermi+ shader header docs Ilia Mirkin
2015-05-21 14:05   ` Robert Morell
2015-05-21 15:32       ` Ilia Mirkin
2015-05-23 21:35           ` Ilia Mirkin
2015-06-23  1:10               ` Ilia Mirkin
2015-08-14 19:48                   ` Ilia Mirkin
