* Question about nv40_draw_array
From: Krzysztof Smiechowicz @ 2009-12-17 15:22 UTC (permalink / raw)
  To: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi,

My name is Krzysztof and currently I'm working on porting nouveau 
(gallium3d driver + libdrm + drm) to AROS Research OS 
(http://www.aros.org). I completed a quite successful port of "old" drm 
(one from libdrm git - now removed) and currently I'm working on drm 
port from the nouveau kernel tree git.

Right now I'm faced with a rather peculiar memory allocation/access 
problem with call lists and I would like to ask for help in 
understanding how a certain thing is implemented.

Let's assume I have the following call trace:

#0  0x985a764d in nv40_draw_arrays (pipe=0x988ba0fc, mode=5, start=0, 
count=3)
     at 
/data/deadwood/AROS/AROS/contrib/gfx/libs/mesa/src/mesa/../gallium/drivers/nv40/nv40_vbo.c:192
#1  0x9865e773 in st_draw_vbo (ctx=0x9a163788, arrays=0x9a19384c, 
prims=0x9a1d595c, nr_prims=23, ib=0x0,
     index_bounds_valid=1 '\001', min_index=0, max_index=574)
     at 
/data/deadwood/AROS/AROS/contrib/gfx/libs/mesa/src/mesa/state_tracker/st_draw.c:698
#2  0x9876c5fa in vbo_save_playback_vertex_list (ctx=0x9a163788, 
data=0x9a1d5558)
     at 
/data/deadwood/AROS/AROS/contrib/gfx/libs/mesa/src/mesa/vbo/vbo_save_draw.c:277
#3  0x985ed13a in execute_list (ctx=0x9a163788, list=1)
     at 
/data/deadwood/AROS/AROS/contrib/gfx/libs/mesa/src/mesa/main/dlist.c:6438
#4  0x985f1871 in _mesa_CallList (list=1) at 
/data/deadwood/AROS/AROS/contrib/gfx/libs/mesa/src/mesa/main/dlist.c:7622
#5  0x98657b4b in neutral_CallList (i=1) at 
/data/deadwood/AROS/AROS/contrib/gfx/libs/mesa/src/mesa/main/vtxfmt_tmp.h:298
#6  0x9853e65a in glCallList (list=1) at 
/data/deadwood/AROS/AROS/contrib/gfx/libs/mesa/src/mesa/glapi/glapitemp.h:95


at this point nv40->vtxbuf[0] contains a vertex buffer that was 
previously used to store a compiled call list.

My question is: how is the data from this buffer transferred to, or used 
by, the gfx card?

I went through the software path "nv40_draw_elements_swtnl" and found the 
place in the draw module where the buffer storage address is obtained and 
data from the buffer is used directly by software rendering. I cannot, 
however, find a similar place for the hardware path. I would like to learn 
where the code is that copies this data to the gfx card or, if this is done 
by the card reading from the computer's memory, what code triggers the read, 
how the gfx card knows which address in RAM to copy the data from, and 
what code indicates that the read has finished.

The card in question is GF6200 AGP *but* it runs in PCI mode.

Any help is GREATLY appreciated as I've been stuck on this bug for a long time now.

Best regards,
Krzysztof


* Re: Question about nv40_draw_array
From: Christoph Bumiller @ 2009-12-17 16:51 UTC (permalink / raw)
  To: Krzysztof Smiechowicz; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Krzysztof Smiechowicz wrote:
> Hi,
> 
> My name is Krzysztof and currently I'm working on porting nouveau 
> (gallium3d driver + libdrm + drm) to AROS Research OS 
> (http://www.aros.org). I completed a quite successful port of "old" drm 
> (one from libdrm git - now removed) and currently I'm working on drm 
> port from the nouveau kernel tree git.
> 
> Right now I'm faced with rather peculiar memory allocation/access 
> problem with call lists and I would like to ask for help in 
> understanding how a certain thing is implemented.
> 
> Let's assume I have a following call trace:
> 
> #0  0x985a764d in nv40_draw_arrays (pipe=0x988ba0fc, mode=5, start=0, 
> count=3)
>      at 
> /data/deadwood/AROS/AROS/contrib/gfx/libs/mesa/src/mesa/../gallium/drivers/nv40/nv40_vbo.c:192
> #1  0x9865e773 in st_draw_vbo (ctx=0x9a163788, arrays=0x9a19384c, 
> prims=0x9a1d595c, nr_prims=23, ib=0x0,
>      index_bounds_valid=1 '\001', min_index=0, max_index=574)
>      at 
> /data/deadwood/AROS/AROS/contrib/gfx/libs/mesa/src/mesa/state_tracker/st_draw.c:698
> #2  0x9876c5fa in vbo_save_playback_vertex_list (ctx=0x9a163788, 
> data=0x9a1d5558)
>      at 
> /data/deadwood/AROS/AROS/contrib/gfx/libs/mesa/src/mesa/vbo/vbo_save_draw.c:277
> #3  0x985ed13a in execute_list (ctx=0x9a163788, list=1)
>      at 
> /data/deadwood/AROS/AROS/contrib/gfx/libs/mesa/src/mesa/main/dlist.c:6438
> #4  0x985f1871 in _mesa_CallList (list=1) at 
> /data/deadwood/AROS/AROS/contrib/gfx/libs/mesa/src/mesa/main/dlist.c:7622
> #5  0x98657b4b in neutral_CallList (i=1) at 
> /data/deadwood/AROS/AROS/contrib/gfx/libs/mesa/src/mesa/main/vtxfmt_tmp.h:298
> #6  0x9853e65a in glCallList (list=1) at 
> /data/deadwood/AROS/AROS/contrib/gfx/libs/mesa/src/mesa/glapi/glapitemp.h:95
> 
> 
> at this point nv40->vtxbuf[0] contains a vertex buffer that was 
> previously used to store a compiled call list.
> 
> My question is: how data from this buffer is being transferred to gfx 
> card/used by gfx card.
> 
Hi.
Most probably the state tracker calls pipe_buffer_map on the vertex
buffer which (if it was not created as a user buffer) causes an mmap
of it to the user's address space (so either GART system memory pages or
VRAM pages through the FB aperture get mapped, whatever was selected in
drivers/nouveau/nouveau_screen.c), then just writes the data and
subsequently unmaps again.

This seems to work, at least for buffers that are somewhat persistent.
On nv50 I experienced, well, problems with user buffers (those that are
just used for a few draw calls before being destroyed again, and have
not previously been in GPU accessible RAM).
That's why nv50 uses immediate mode for those.
> I went through the software path "nv40_draw_elements_swtnl" and found a 
> place in draw module where the buffer storage address is obtained and 
> data from buffer is used directly by software rendering. I cannot
> however find a similar place for hardware path. I would like to learn 
> where is the code that copies this data to gfx card or, if this is done 
> by card reading into computer's memory, what code triggers the read, how 
> does the gfx card know from which address in RAM to copy the data and 
> what code indicates that the read finished.
The vertex buffers are set up in nv40_vbo_validate, which records a
state object to be emitted on validation.
The address is set with method NV40TCL_VTXBUF_ADDRESS(i). We output a
relocation, so the kernel side fills in the appropriate address for us,
which might have changed if the buffer was moved (which can happen at
any time, and *will* happen if you reloc to a different type of memory).

The read is triggered by NV40TCL_BEGIN_END + NV40TCL_VB_VERTEX_BATCH,
and it will probably be done reading when the GET pointer of the FIFO
has moved past the command.
We do some kind of fencing in the kernel and probably make sure the
buffer isn't deleted or moved until the card is done with it. I'm not
really familiar with the details and would have to read up on some code.
> The card in question is GF6200 AGP *but* it runs in PCI mode.
> 
> Any help is GREATLY appreciated as I'm stuck on my bug for a long time now.
> 
> Best regards,
> Krzysztof

If that wasn't detailed enough, or was confusing, I hope someone
else will answer. Or come to IRC.

Christoph


* Re: Question about nv40_draw_array
From: Krzysztof Smiechowicz @ 2009-12-17 18:11 UTC (permalink / raw)
  To: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Christoph Bumiller wrote:
> Hi.

Hi, thanks for the quick feedback. :)

> Most probably the state tracker calls pipe_buffer_map on the vertex
> buffer which (if it was not created as a user buffer) causes an mmap
> of it to the user's address space (so either GART system memory pages or
> VRAM pages through the FB aperture get mapped, whatever was selected in
> drivers/nouveau/nouveau_screen.c), then just writes the data and
> subsequently unmaps again.

This is what I found "inside" nv40_draw_elements_swtnl, but I can't 
find this in the hardware path.


>> I went through the software path "nv40_draw_elements_swtnl" and found a 
>> place in draw module where the buffer storage address is obtained and 
>> data from buffer is used directly by software rendering. I cannot
>> however find a similar place for hardware path. I would like to learn 
>> where is the code that copies this data to gfx card or, if this is done 
>> by card reading into computer's memory, what code triggers the read, how 
>> does the gfx card know from which address in RAM to copy the data and 
>> what code indicates that the read finished.
> The vertex buffers are set up in nv40_vbo_validate, which records a
> state object to be emitted on validation.
> The address is set with method NV40TCL_VTXBUF_ADDRESS(i). We output a
> relocation, 

I assume by relocation you mean this code:

nv40_vbo.c, 536

		so_reloc(vtxbuf, nouveau_bo(vb->buffer),
				 vb->buffer_offset + ve->src_offset,
				 vb_flags | NOUVEAU_BO_LOW | NOUVEAU_BO_OR,
				 0, NV40TCL_VTXBUF_ADDRESS_DMA1);


> so the kernel side fills in the appropriate address for us,

Can you tell me where this filling happens? Where does the kernel put 
this address (some buffer, card registers?) - maybe I can read it and 
validate? I assume it should put the actual memory address at which the 
read should start? Or maybe the address of the beginning of the buffer, 
with the offset in another "argument"?

> 
> The read is triggered by NV40TCL_BEGIN_END + NV40TCL_VB_VERTEX_BATCH,
> and it will probably be done reading when the GET pointer of the FIFO
> has moved past the command.

I assume the read will happen after pipe->flush() and not immediately 
after:

			BEGIN_RING(curie, NV40TCL_VB_VERTEX_BATCH, 1);
			OUT_RING  (((nr - 1) << 24) | start);




Let me describe the bug I'm facing - my suspicion is that this is caused 
by a bug in my port of TTM and not a bug in nouveau itself. I have some 
parts of the code still commented out as they are Linux-specific and no 
immediate mapping to AROS structures could be made. Also, I didn't have 
these problems with the "old" drm port.

I see this problem on morph3d demo. What it does is: for each frame 
create a call list and then call it 4 times.

ADDR	VRAM OFFSET
A	X
B	Y
C	X

A, B, C are the memory offsets of the 32 KB buffers created for the vertex 
buffer when call lists are compiled. X, Y are the VRAM offsets 
(bo.mem.mm_node.start).

The first buffer is created (X,A). When it gets full (after around 3 frames), 
a second buffer is created (Y,B). Then the first one is freed. When the second 
buffer is full, a third is created (X,C) - here the problem starts: 
according to my observations, the card seems to read vertices not from 
address C but from address A, as if it somehow remembered the initial 
address binding.
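
The allocation pattern described above (X, then Y, then X again once the 
first node is freed) can be reproduced with a toy model. This is only a 
sketch under assumptions: it uses a lowest-free-slot policy, and the names 
sketch_vram, sketch_alloc and sketch_free are hypothetical. It is not the 
TTM / drm_mm code.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_SLOTS 4

/* Hypothetical toy model of a VRAM manager with fixed-size slots.
 * Slot index stands in for the VRAM offset (X = 0, Y = 1, ...). */
struct sketch_vram {
    bool used[SKETCH_SLOTS];
};

/* Return the lowest free slot, or -1 if all slots are in use. */
static int sketch_alloc(struct sketch_vram *v)
{
    for (int i = 0; i < SKETCH_SLOTS; i++) {
        if (!v->used[i]) {
            v->used[i] = true;
            return i;
        }
    }
    return -1;
}

/* Release a slot so that a later allocation can reuse it. */
static void sketch_free(struct sketch_vram *v, int slot)
{
    assert(slot >= 0 && slot < SKETCH_SLOTS && v->used[slot]);
    v->used[slot] = false;
}
```

Allocating twice, freeing the first slot and allocating again hands out the 
first slot a second time, which matches the X, Y, X sequence in the table. 
Never calling sketch_free corresponds to the workaround where the third 
buffer gets a fresh offset Z.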

Other observations:
- the data during execution of gl commands actually seems to be put into 
location C - when I switch to software path, I could track down that it 
reads data from location C - rendering is done correctly in software path
- when I comment out freeing of memory manager node (bo.mem.mm_node), so 
that the third buffer is Z,C (paired with not yet used offset of VRAM) 
then hardware rendering behaves correctly - but this will make card "run 
out" of memory as no memory manager nodes will be deallocated
- when I switch the calls of glCallList into actual rendering code and 
disable invocation of glNewList/glEndList the hardware rendering also 
behaves correctly

Any help is appreciated.

Best regards,
Krzysztof


* Re: Question about nv40_draw_array
From: Christoph Bumiller @ 2009-12-17 21:26 UTC (permalink / raw)
  To: Krzysztof Smiechowicz; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 17.12.2009 19:11, Krzysztof Smiechowicz wrote:
> Christoph Bumiller wrote:
>   
>> Hi.
>>     
> Hi, thanks for the quick feedback. :)
>
>   
>> Most probably the state tracker calls pipe_buffer_map on the vertex
>> buffer which (if it was not created as a user buffer) causes an mmap
>> of it to the user's address space (so either GART system memory pages or
>> VRAM pages through the FB aperture get mapped, whatever was selected in
>> drivers/nouveau/nouveau_screen.c), then just writes the data and
>> subsequently unmaps again.
>>     
> This is what I found "inside" the nv40_draw_elements_swtnl, but I can't 
> find this in case of hardware path.
>
>   
If the state tracker uses a buffer with PIPE_BUFFER_USAGE_VERTEX,
the data will be in GPU accessible memory already, i.e. there will be
a nouveau_bo_kalloc'd / TTM buffer object.
If it uses a user buffer, a kernel side bo is allocated in bo_emit_buffer
and the data is memcpy'd there.
>>> I went through the software path "nv40_draw_elements_swtnl" and found a 
>>> place in draw module where the buffer storage address is obtained and 
>>> data from buffer is used directly by software rendering. I cannot
>>> however find a similar place for hardware path. I would like to learn 
>>> where is the code that copies this data to gfx card or, if this is done 
>>> by card reading into computer's memory, what code triggers the read, how 
>>> does the gfx card know from which address in RAM to copy the data and 
>>> what code indicates that the read finished.
>>>       
>> The vertex buffers are set up in nv40_vbo_validate, which records a
>> state object to be emitted on validation.
>> The address is set with method NV40TCL_VTXBUF_ADDRESS(i). We output a
>> relocation, 
>>     
> I assume by relocation you mean this code:
>
> nv40_vbo.c, 536
>
> 		so_reloc(vtxbuf, nouveau_bo(vb->buffer),
> 				 vb->buffer_offset + ve->src_offset,
> 				 vb_flags | NOUVEAU_BO_LOW | NOUVEAU_BO_OR,
> 				 0, NV40TCL_VTXBUF_ADDRESS_DMA1);
>
>
>   
> so the kernel side fills in the appropriate address for us,
>
> Can you tell me where this filling happens? Where does the kernel put 
> this address (some buffer, card registers?) - maybe I can read it and 
> validate? I assume it should put the actual memory address at which 
> read should start? Or maybe address of beginning of buffer and offset in 
> another "argument"?
>
>   
The address is put into the FIFO == command buffer, which is
submitted to the kernel for emission on FIRE_RING.
The address is the start address of the buffer + the data
argument (in the above case vb->buffer_offset + ve->src_offset)
for nouveau_pushbuf_emit_reloc.
The drm_nouveau_gem_pushbuf_bo struct contains that
data and the index in the command buffer where the address is to
be placed; this is also handed to the kernel.
Look at nouveau_gem_ioctl_pushbuf_* in the kernel's
nouveau_gem.c; there's nouveau_gem_pushbuf_reloc_apply
and friends, which handle relocation.
I don't know exactly how they proceed.

The reloc above contains the actual start address of the
vertex element, so there's no other method/reg used to
set an additional offset as far as I can see.
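
As a rough illustration of the relocation step described above (final
address = buffer start + data argument, written at an index in the
command buffer), here is a minimal sketch. The struct and function
names are hypothetical and much simpler than the real structures in
nouveau_gem.c; the handling of flags such as NOUVEAU_BO_LOW and
NOUVEAU_BO_OR is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified relocation record. This is NOT the layout
 * of the real drm_nouveau_gem_pushbuf_* structs. */
struct sketch_reloc {
    size_t   push_index; /* dword index in the command buffer to patch */
    uint32_t data;       /* offset inside the buffer object, e.g.
                          * vb->buffer_offset + ve->src_offset */
};

/* Patch one command-buffer dword with the final address: the buffer's
 * current start address plus the offset carried by the relocation. */
static void sketch_reloc_apply(uint32_t *push, size_t push_len,
                               const struct sketch_reloc *r,
                               uint32_t bo_start)
{
    assert(r->push_index < push_len);
    push[r->push_index] = bo_start + r->data;
}
```

If the buffer is moved before submission, only bo_start changes; the
same record then yields a different dword in the command buffer, which
is why the address cannot be written by userspace up front.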
>> The read is triggered by NV40TCL_BEGIN_END + NV40TCL_VB_VERTEX_BATCH,
>> and it will probably be done reading when the GET pointer of the FIFO
>> has moved past the command.
>>     
> I assume the read will happen after pipe->flush() and not immediately 
> after:
>
> 			BEGIN_RING(curie, NV40TCL_VB_VERTEX_BATCH, 1);
> 			OUT_RING  (((nr - 1) << 24) | start);
>
>
>
>   
Right.
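
For reference, the data word in the quoted OUT_RING can be sketched as
plain C. The helper names below are hypothetical; the field widths
(count - 1 in the top 8 bits, start in the low 24 bits) are only what
the quoted expression implies for a 32-bit word, not something taken
from hardware documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a vertex batch the way the quoted OUT_RING expression does:
 * ((nr - 1) << 24) | start. Limits are assumptions derived from the
 * shift amount and a 32-bit word. */
static uint32_t sketch_vertex_batch(uint32_t start, uint32_t nr)
{
    assert(nr >= 1 && nr <= 256);   /* nr - 1 must fit in 8 bits  */
    assert(start < (1u << 24));     /* start must fit in 24 bits  */
    return ((nr - 1) << 24) | start;
}

/* Recover the vertex count from a packed batch word. */
static uint32_t sketch_batch_count(uint32_t word)
{
    return (word >> 24) + 1;
}

/* Recover the first vertex index from a packed batch word. */
static uint32_t sketch_batch_start(uint32_t word)
{
    return word & 0x00ffffffu;
}
```

With start=0 and count=3, as in frame #0 of the backtrace, the packed
word is 0x02000000. Writing this word only queues the command; nothing
is read by the card until the command buffer is actually submitted.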
> Let me describe the bug I'm facing - my suspicion is that this is caused 
> by bug in my porting of TTM and not a bug in nouveau itself. I have some 
> parts of code still commented out as they are linux specific and no 
> immediate mapping to AROS structures could be made. Also I didn't have 
> these problems on "old" drm port.
>
> I see this problem on morph3d demo. What it does is: for each frame 
> create a call list and then call it 4 times.
>
> ADDR	VRAM OFFSET
> A	X
> B	Y
> C	X
>
> A,B,C is the memory offset of 32kb buffer created for vertex buffer when 
> call lists are compiled. X,Y is the VRAM OFFSET (bo.mem.mm_node.start)
>
> First buffer is created (X,A). When it gets full (after around 3 frames) 
> second buffer is created (Y,B). Then first one is freed. When second 
> buffer is full, third is created (X,C) - here the problem starts:
> according to my observations, the card seems to read vertexes not from 
> address C but from address A as if it somehow remembered the initial 
> address binding.
>
>   
So the actual VRAM (or GART) addresses are X + A, Y + B, X + C.
If they're all different and you still get the vertices from X + A,
you have a bug; if X + A == X + C, then either the new data is not
being written, or maybe the GPU has cached that area somehow (even
if the CPU has flushed).
I know that the kernel should take care of all cache flushing, but
I made the same observation on nv50: this somehow doesn't
always seem to work.
But then, maybe I was too quick to blame caches and should
try to find out if it isn't actually some other bug ...

From renouveau dumps it doesn't look like you need to insert
a fence in the command stream when you change the vtxbuf
addresses, on nv50 that is, I don't know about nv40, sorry.
But it *did* look like the blob inserts a fence after *uploading*
new data to a vertex buffer. That's why there's method 0x142c
in nv50_draw_arrays.
> Other observations:
> - the data during execution of gl commands actually seems to be put into 
> location C - when I switch to software path, I could track down that it 
> reads data from location C - rendering is done correctly in software path
> - when I comment out freeing of memory manager node (bo.mem.mm_node), so 
> that the third buffer is Z,C (paired with not yet used offset of VRAM) 
> then hardware rendering behaves correctly - but this will make card "run 
> out" of memory as no memory manager nodes will be deallocated
> - when I switch the calls of glCallList into actual rendering code and 
> disable invocation of glNewList/glEndList the hardware rendering also 
> behaves correctly
>
> Any help is appreciated.
>
> Best regards,
> Krzysztof
