* Synchronization mostly missing?
@ 2009-12-28 3:41 Luca Barbieri
[not found] ` <ff13bc9a0912271941s5d465634te43213ea9df1d14f-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Luca Barbieri @ 2009-12-28 3:41 UTC (permalink / raw)
To: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
It seems that Nouveau is assuming that once the FIFO pointer is past a
command, that command has finished executing, and all the buffers it
used are no longer needed.
However, this seems to be false at least on G71.
In particular, the card may not have even finished reading the input
vertex buffers when the pushbuffer "fence" triggers.
While Mesa does not reuse the buffer object itself, the current
allocator tends to return memory that has just been freed, resulting
in the buffer actually being reused.
Thus Mesa will overwrite the vertices before the GPU has used them.
This results in all kinds of artifacts, such as vertices going to
infinity, and random polygons appearing.
This can be seen in progs/demos/engine, progs/demos/dinoshade,
Blender, Extreme Tux Racer and probably any non-trivial OpenGL
software.
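
As a rough illustration of the reuse pattern described above, the sketch below models an allocator that hands back the most recently freed block first. It is a toy model only: `toy_pool`, `toy_alloc` and `toy_free` are hypothetical names and are not taken from Mesa or Nouveau, whose real allocator may behave differently.

```c
#include <stddef.h>

/* Toy model only: a pool of equally sized blocks with a LIFO free list.
 * None of these names exist in Mesa or Nouveau. */
#define POOL_BLOCKS 4
#define BLOCK_SIZE  64

struct toy_pool {
    unsigned char mem[POOL_BLOCKS][BLOCK_SIZE];
    int free_stack[POOL_BLOCKS]; /* indices of free blocks */
    int free_top;                /* number of entries in free_stack */
};

static void toy_pool_init(struct toy_pool *p)
{
    p->free_top = 0;
    /* Push in reverse so that block 0 is handed out first. */
    for (int i = POOL_BLOCKS - 1; i >= 0; i--)
        p->free_stack[p->free_top++] = i;
}

/* Pop the most recently freed block; returns NULL when the pool is empty. */
static void *toy_alloc(struct toy_pool *p)
{
    if (p->free_top == 0)
        return NULL;
    return p->mem[p->free_stack[--p->free_top]];
}

/* Push the block back; it becomes the next block toy_alloc() returns. */
static void toy_free(struct toy_pool *p, void *ptr)
{
    int idx = (int)(((unsigned char *)ptr - &p->mem[0][0]) / BLOCK_SIZE);
    p->free_stack[p->free_top++] = idx;
}
```

With this policy, freeing a vertex buffer and immediately allocating a new one yields the same address, so new vertex data lands on memory a still-running draw may be reading.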
The problem can be significantly reduced by just adding a waiting loop
at the end of draw_arrays and draw_elements, or by synchronizing
drawing with the function below, called instead of pipe->flush in
nv40_vbo.c.
I think the remaining artifacts may be due to missing 2D engine
synchronization, but I'm not sure how that works.
Note that this causes the CPU to wait for rendering, which is not the
correct solution.
static void nv40_sync(struct nv40_context *nv40)
{
	nouveau_notifier_reset(nv40->screen->sync, 0);

	// BEGIN_RING(curie, 0x1d6c, 1);
	// OUT_RING(0x5c0);

	// static int value = 0x23;
	// BEGIN_RING(curie, 0x1d70, 1);
	// OUT_RING(value++);

	BEGIN_RING(curie, NV40TCL_NOTIFY, 1);
	OUT_RING(0);

	BEGIN_RING(curie, NV40TCL_NOP, 1);
	OUT_RING(0);

	FIRE_RING(NULL);

	nouveau_notifier_wait_status(nv40->screen->sync, 0, 0, 0);
}
It seems that NV40TCL_NOTIFY (which must be followed by a nop for some
reason) triggers a notification of rendering completion.
Furthermore, the card will probably put the value set with 0x1d70
somewhere, while 0x1d6c has an unknown use.
The nVidia driver uses 0x1d70/0x1d6c frequently: 0x1d70 carries a
sequence number and 0x1d6c is always set to 0x5c0, whereas
NV40TCL_NOTIFY seems to be inserted on demand.
On my machine, setting 0x1d6c/0x1d70 like the nVidia driver does
causes a GPU lockup. That is probably because the location where the
GPU is supposed to put the value has not been set up correctly.
So it seems that the current model is wrong, and the current fence
should only be used to determine whether the pushbuffer itself can be
reused.
It seems that, after figuring out where the GPU writes the value and
how to use the mechanism properly, this should be used by the kernel
driver as the bo->sync_obj implementation.
This will delay destruction of the buffers, and thus prevent
reallocation of them, and artifacts, without synchronizing rendering.
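
The idea of delaying buffer destruction until a completion fence has signalled can be sketched roughly as follows. This is only a model under stated assumptions: the GPU is assumed to publish a monotonically increasing 32-bit sequence number, and `deferred_list`, `defer_free` and `reap` are hypothetical names, not the TTM or Nouveau kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the TTM or Nouveau API. */
#define MAX_PENDING 8

struct pending_free {
    int      buffer_id;  /* which buffer is waiting to be released */
    uint32_t fence_seq;  /* sequence the GPU must reach first */
};

struct deferred_list {
    struct pending_free entries[MAX_PENDING];
    int count;
};

/* True once `completed` has reached `seq`, tolerating 32-bit wraparound
 * (relies on the usual two's-complement conversion to int32_t). */
static bool seq_passed(uint32_t completed, uint32_t seq)
{
    return (int32_t)(completed - seq) >= 0;
}

/* Queue a buffer for release after `fence_seq`; returns false if full. */
static bool defer_free(struct deferred_list *l, int buffer_id, uint32_t fence_seq)
{
    if (l->count == MAX_PENDING)
        return false;
    l->entries[l->count].buffer_id = buffer_id;
    l->entries[l->count].fence_seq = fence_seq;
    l->count++;
    return true;
}

/* Release every buffer whose fence has signalled; returns how many. */
static int reap(struct deferred_list *l, uint32_t completed)
{
    int kept = 0, released = 0;
    for (int i = 0; i < l->count; i++) {
        if (seq_passed(completed, l->entries[i].fence_seq))
            released++; /* a real driver would free the memory here */
        else
            l->entries[kept++] = l->entries[i];
    }
    l->count = kept;
    return released;
}
```

Because a buffer stays on the pending list until its sequence number is reached, the allocator cannot hand its memory out again early, and the CPU never has to block on rendering.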
I'm not sure why this hasn't been noticed before though.
Is everyone getting randomly misrendered OpenGL or is my machine
somehow more prone to reusing buffers?
What do you think? Is the analysis correct?
* Re: Synchronization mostly missing?
@ 2009-12-28 4:41 Francisco Jerez
2 siblings, 1 reply; 10+ messages in thread
From: Francisco Jerez @ 2009-12-28 4:41 UTC (permalink / raw)
To: Luca Barbieri; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi,

Luca Barbieri <luca-Ukmtq+NC3rhBHFWNQifrYwC/G2K4zDHf@public.gmane.org> writes:

> It seems that Nouveau is assuming that once the FIFO pointer is past a
> command, that command has finished executing, and all the buffers it
> used are no longer needed.
>
> However, this seems to be false at least on G71.
> In particular, the card may not have even finished reading the input
> vertex buffers when the pushbuffer "fence" triggers.
> While Mesa does not reuse the buffer object itself, the current
> allocator tends to return memory that has just been freed, resulting
> in the buffer actually being reused.
> Thus Mesa will overwrite the vertices before the GPU has used them.
>
> This results in all kinds of artifacts, such as vertices going to
> infinity, and random polygons appearing.
> This can be seen in progs/demos/engine, progs/demos/dinoshade,
> Blender, Extreme Tux Racer and probably any non-trivial OpenGL
> software.
>

Can you reproduce this with your vertex buffers in VRAM instead of GART?
(to rule out that it's a fencing issue).

_______________________________________________
Nouveau mailing list
Nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
http://lists.freedesktop.org/mailman/listinfo/nouveau
* Re: Synchronization mostly missing?
@ 2009-12-28 6:55 Luca Barbieri
0 siblings, 1 reply; 10+ messages in thread
From: Luca Barbieri @ 2009-12-28 6:55 UTC (permalink / raw)
To: Francisco Jerez; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

> Can you reproduce this with your vertex buffers in VRAM instead of GART?
> (to rule out that it's a fencing issue).

Putting the vertex buffers in VRAM makes things almost perfect, but
still with rare artifacts.
In particular, the yellow arrow in dinoshade sometimes becomes a
yellow polygon on the floor, which happens almost every frame if I
move the window around.
It does fix demos/engine and blender, and etracer is almost perfect.

Using my sync patch fixes demos/engine and demos/dinoshade, but still
leaves artifacts in blender when moving the rectangle and artifacts in
etracer.

Putting the vertex buffers in VRAM _AND_ adding my sync patch makes
things perfect on my system.

Using sync + a delay loop before drawing makes things better but still
problematic.

Also note that adding wbinvd in the kernel at the start of push
buffer submission, running with "nopat" and synchronizing with the
current fence in the kernel all had no effect on demos/engine artifacts.

Preventing loading of intel_agp did not seem to have any effect either
(but strangely, it still listed the aperture size, not sure what's up
there).

The last test I tried was, all together:
1. My nv40_sync patch
2. 3 wbinvd followed by spinning 10000 times in the kernel at the
   start of pushbuffer validation
3. Adding
   BEGIN_RING(curie, NV40TCL_VTX_CACHE_INVALIDATE, 1);
   OUT_RING(0);
   before and after draw_elements and draw_arrays
4. Removing intel_agp

The logo on etracer's splash screen still flickered on some frames.
Only putting vertex buffers in VRAM fixed that.

I'm not really sure what is happening there.

It seems that there is the lack of synchronization plus some other problem.

Maybe there is indeed an on-GPU cache for AGP/PCI memory which isn't
getting flushed.
Maybe NV40TCL_VTX_CACHE_INVALIDATE should be used but not in the way I did.
I couldn't find it in renouveau traces. Who reverse engineered it?
What does it do?

Also, what happens when I remove intel_agp? Does it use PCI DMA?

BTW, it seems to me that adding the fencing mechanism I described is
necessary even if the vertices are read before the FIFO continues,
since rendering is not completed and currently I don't see anything
preventing TTM from, for instance, evicting the render buffer while it
is being rendered to.
* Re: Synchronization mostly missing?
@ 2009-12-28 7:15 Younes Manton
0 siblings, 1 reply; 10+ messages in thread
From: Younes Manton @ 2009-12-28 7:15 UTC (permalink / raw)
To: Luca Barbieri; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Mon, Dec 28, 2009 at 1:55 AM, Luca Barbieri <luca-Ukmtq+NC3rhBHFWNQifrYwC/G2K4zDHf@public.gmane.org> wrote:
> BTW, it seems to me that adding the fencing mechanism I described is
> necessary even if the vertices are read before the FIFO continues,
> since rendering is not completed and currently I don't see anything
> preventing TTM from, for instance, evicting the render buffer while it
> is being rendered to.

It's my understanding that once the FIFO get reg is past a certain
point all previous commands are guaranteed to be finished, which is
what our fencing is based on. I think we would all have corruption
issues if this wasn't the case. You can see that the FIFO get ptr
stops advancing after long running draw commands are submitted, and
the video decoder FIFO works similarly as well when the HW is lagging.

Anyhow, another person with a GF7 had the same problem and putting
vertex buffers in VRAM also improved things for him, so it could be a
hardware bug/quirk for some/all GF7s. We don't do it in general
because it's slower, but as a temporary workaround we can do that for
GF7 NV40s I guess. It likely also doesn't happen with immediate mode
vertex submission, which will be implemented sooner or later. I can't
reproduce it on my GF6 and I don't think anyone else has either.
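
The GET-pointer model of fencing described in this message can be sketched as a check on ring positions. This is a toy model under assumptions of my own: the ring is addressed by plain offsets, the GPU still has to fetch everything in [get, put), and `ring_pos_pending` and `get_fence_signalled` are hypothetical helpers rather than Nouveau code. Whether "GET has passed a command" really implies "the command has finished" is exactly what the thread is debating.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy ring model: the GPU still has to fetch offsets in [get, put),
 * taken modulo the ring size. Hypothetical helper, not Nouveau code. */
static bool ring_pos_pending(uint32_t pos, uint32_t get, uint32_t put)
{
    if (get <= put)
        return pos >= get && pos < put;   /* contiguous pending region */
    return pos >= get || pos < put;       /* pending region wraps around */
}

/* Under the GET-pointer model, a command counts as done once GET is past it. */
static bool get_fence_signalled(uint32_t pos, uint32_t get, uint32_t put)
{
    return !ring_pos_pending(pos, get, put);
}
```

Note that this check only says the command has been fetched from the ring; it carries no information about whether the engine has finished reading the buffers that command refers to.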
* Re: Synchronization mostly missing?
@ 2009-12-28 11:53 Christoph Bumiller
3 siblings, 0 replies; 10+ messages in thread
From: Christoph Bumiller @ 2009-12-28 11:53 UTC (permalink / raw)
To: Younes Manton; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Luca Barbieri

On 12/28/2009 08:15 AM, Younes Manton wrote:
> Anyhow, another person with a GF7 had the same problem and putting
> vertex buffers in VRAM also improved things for him, so it could be a
> hardware bug/quirk for some/all GF7s. We don't do it in general

It's probably not a card specific quirk, nv50 also has this kind of
problem. It used to occur heavily when Mesa used user buffers that
were copied / promoted to GART buffers just for one rendering call
and then immediately destroyed.
Sleeping, flushing, WBINVD didn't help, so I unfortunately decided to
go the easy route ...

Since I moved to immediate mode submission (which is also *A LOT*
faster in such cases + blob does it too; putting VBOs to VRAM was
expectedly not helping and also really slow), it works fine. Almost.
Some apps (e.g. tuxracer, ut2004demo) *still* show corrupted vertices
sometimes even if they come from the FIFO, which is kind of odd.
I have to investigate why that happens further though, and it might
have a totally unrelated reason.

Christoph

> because it's slower, but as a temporary workaround we can do that for
> GF7 NV40s I guess. It likely also doesn't happen with immediate mode
> vertex submission, which will be implemented sooner or later. I can't
> reproduce it on my GF6 and I don't think anyone else has either.
* Re: Synchronization mostly missing?
@ 2009-12-28 13:21 Luca Barbieri
3 siblings, 0 replies; 10+ messages in thread
From: Luca Barbieri @ 2009-12-28 13:21 UTC (permalink / raw)
To: Younes Manton; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

It looks like there are two bugs.

One seems related to some kind of GPU cache of GART memory which does
not get flushed, causes significant corruption and is worked around by
putting buffers in VRAM, software TNL or immediate submission.
It may be related to the NV40TCL_VTX_CACHE_INVALIDATE which is in
nouveau_class.h but never used.
Synchronizing and/or waiting after draw_arrays seems to improve things
but does not fully solve them.

However, there is another one, which is still present with buffers in
VRAM but is eliminated if I add syncing with the DMA_FENCE mechanism
at the end of draw_arrays and draw_elements.

This one may be more widely reproducible.
Try running two or more copies of mesa/progs/demos/dinoshade, all visible.
Do you see a flashing yellow region on the floors?
I do. If I add NV40TCL_NOTIFY or DMA_FENCE based synchronization, the
problem disappears.
This also happens if you move around the window, presumably due to the
X server.

It seems that kernel FIFO/M2MF-based fencing does indeed wait for
rendering or at least vertex fetch, but that somehow works only if
there is a single application running.
If there are multiple applications, then the DMA_FENCE-based mechanism
waits more and keeps working while kernel FIFO/M2MF-based fencing
fails.
I'm not sure why this is the case though.
Using nVidia ctxprogs had no effect.
The vram_pushbuf option caused an X lockup upon starting the demo.

Another thing that comes to mind (purely speculative) is that the
FIFO/M2MF synchronization may be due to the fact that the GPU
component that reads from the FIFO is the same one that reads the
vertices or other data and is prioritizing that over reading commands,
but having multiple active channels makes that no longer be the case.
* Re: Synchronization mostly missing?
@ 2009-12-28 13:33 Francisco Jerez
3 siblings, 0 replies; 10+ messages in thread
From: Francisco Jerez @ 2009-12-28 13:33 UTC (permalink / raw)
To: Luca Barbieri; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Younes Manton <younes.m-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> On Mon, Dec 28, 2009 at 1:55 AM, Luca Barbieri <luca-Ukmtq+NC3rhBHFWNQifrYwC/G2K4zDHf@public.gmane.org> wrote:
>> Also note that both adding wbinvd in the kernel at the start of push
>> buffer submission, running with "nopat" and synchronizing with the
>> current fence in the kernel had no effect on demos/engine artifacts.
>>

To stay on the safe side, you should flush both before and after
writing your vertex buffers (e.g. both at CPU_PREP and FINI).

>> Preventing loading of intel_agp did not seem to have any effect either
>> (but strangely, it still listed the aperture size, not sure what's up
>> there).
>>

Some Intel AGP chipsets are known to contain an evil write cache;
adding drm_agp_chipset_flush() calls at random places in the kernel is
something else you could try.

> It's my understanding that once the FIFO get reg is past a certain
> point all previous commands are guaranteed to be finished, which is
> what our fencing is based on. I think we would all have corruption
> issues if this wasn't the case. You can see that the FIFO get ptr
> stops advancing after long running draw commands are submitted, and
> the video decoder FIFO works similarly as well when the HW is lagging.
>

Yeah, really, if PFIFO wasn't waiting for PGRAPH to finish its task
before putting in the next command, your X server wouldn't stand a
single minute alive. A fencing implementation based on notifiers or
DMA_FENCEs is likely to exhibit the same corruption.
* Re: Synchronization mostly missing?
@ 2009-12-28 13:53 Luca Barbieri
3 siblings, 0 replies; 10+ messages in thread
From: Luca Barbieri @ 2009-12-28 13:53 UTC (permalink / raw)
To: Younes Manton; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

It looks like there are two bugs.

One seems related to some kind of cache of GART memory which does not
get flushed, causes significant corruption and is worked around by
putting buffers in VRAM.
For some reason, adding syncing instead of putting buffers in VRAM
does seem to greatly reduce the symptoms of this bug and fully removes
them for some programs, but not for all.

However, there is another one, which is still present with buffers in
VRAM but is eliminated if I add syncing with the DMA_FENCE mechanism
at the end of draw_arrays and draw_elements.

This one may be more widely reproducible.
Try running two or more copies of mesa/progs/demos/dinoshade, all visible.
Do you see a flashing yellow region on the floor?
I do. If I add NV40TCL_NOTIFY or DMA_FENCE based synchronization, the
problem disappears.
This also happens if you move around the window, presumably due to the
X server.

It seems that M2MF/FIFO-based fencing does indeed work for our
purposes, but only if there is a single application running.
If there are multiple applications, then the 3D engine
DMA_FENCE-based mechanism somehow waits more and keeps working while
FIFO/M2MF-based fencing fails.
I'm not sure why this is the case though.
Using nVidia ctxprogs has no effect.

Another thing that comes to mind (purely speculative) is that the
FIFO synchronization may be due to the fact that the GPU component
that reads from the FIFO is the same one that reads the vertices or
other data and it prioritizes that over reading commands, but having
multiple contexts makes that no longer be the case.
* Re: Synchronization mostly missing?
@ 2009-12-28 5:50 Luca Barbieri
2 siblings, 0 replies; 10+ messages in thread
From: Luca Barbieri @ 2009-12-28 5:50 UTC (permalink / raw)
To: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

I figured out the registers.

There is a fence/sync mechanism which apparently triggers after
rendering is finished.
There are two ways to use it, but they trigger at the same time
(spinning in a loop on the CPU checking them, they trigger at the same
iteration or in two successive iterations).

The first is the "sync" notifier, which involves a notifier object set
at NV40TCL_DMA_NOTIFY.
When NV40TCL_NOTIFY, with argument 0, followed by NV40TCL_NOP, with
argument 0, are inserted in the ring, the notifier object will be
notified when rendering is finished.
fbcon uses this to sync rendering.
Currently the Mesa driver sets an object but does not use it.
The renouveau traces use this mechanism only in the
EXT_framebuffer_object tests.
It's not clear what the purpose of the NOP is, but it seems necessary.

The second is the fence mechanism, which involves an object set at
NV40TCL_DMA_FENCE.
When register 0x1d70 is set, the value set there will be written to
the object at the offset programmed in 0x1d6c.
The offset in 0x1d6c must be 16-byte aligned, but the GPU seems to
only write 4 bytes with the sequence number.
Nouveau does not use this currently, and sets NV40TCL_DMA_FENCE to 0.
The nVidia driver uses this often. It allocates a 4KB object and asks
the GPU to put the sequence number always at offset 0x5c0.
Why it does this rather than allocating a 16 byte object and using
offset 0 is unknown.

IMHO the fence mechanism should be implemented in the kernel along
with the current FIFO fencing, and should protect the relocated
buffer object.
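
The layout of the fence object described in this message (a 4KB object, a 16-byte aligned offset, a 4-byte sequence number) can be modelled on the CPU as below. This is a simulation under stated assumptions: `sim_gpu_write_fence` merely stands in for what the GPU is believed to do when 0x1d70 is written, all function names are hypothetical, and a real driver would read the mapped object through a volatile or otherwise uncached access rather than a plain `memcpy`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FENCE_OBJ_SIZE 4096u  /* size the nVidia driver was observed to allocate */
#define FENCE_OFFSET   0x5c0u /* offset it was observed to program into 0x1d6c */

/* Offsets must be 16-byte aligned and leave room for the 4-byte value. */
static bool fence_offset_valid(uint32_t offset)
{
    return (offset % 16u) == 0 && offset <= FENCE_OBJ_SIZE - sizeof(uint32_t);
}

/* Stand-in for the GPU: store the 4-byte sequence number at `offset`. */
static bool sim_gpu_write_fence(uint8_t *obj, uint32_t offset, uint32_t seq)
{
    if (!fence_offset_valid(offset))
        return false;
    memcpy(obj + offset, &seq, sizeof(seq));
    return true;
}

/* CPU side: read back the last sequence number the "GPU" wrote. */
static uint32_t fence_read(const uint8_t *obj, uint32_t offset)
{
    uint32_t seq;
    memcpy(&seq, obj + offset, sizeof(seq));
    return seq;
}
```

A consumer would compare the value returned by `fence_read` against the sequence number it emitted with a given batch to decide whether that batch's buffers may be released.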
* Re: Synchronization mostly missing?
@ 2009-12-28 7:27 Krzysztof Smiechowicz
2 siblings, 0 replies; 10+ messages in thread
From: Krzysztof Smiechowicz @ 2009-12-28 7:27 UTC (permalink / raw)
To: Luca Barbieri; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Luca Barbieri wrote:
> I'm not sure why this hasn't been noticed before though.
> Is everyone getting randomly misrendered OpenGL or is my machine
> somehow more prone to reusing buffers?

I reported a similar problem about 2 weeks ago. It first became
apparent with NV40 but I also confirmed it with NV30 - in both cases
it was visible in the morph3d demo. As long as nothing changes in
memory allocation, everything is fine. If I so much as move a window
(which causes some allocations in the system), vertices become damaged.

Some information from those previous emails:

""
I see this problem on the morph3d demo. What it does is: for each
frame create a call list and then call it 4 times.

ADDR    VRAM OFFSET
A       X
B       Y
C       X

A,B,C is the memory offset of the 32kb buffer created for the vertex
buffer when call lists are compiled.
X,Y is the VRAM OFFSET (bo.mem.mm_node.start)

First buffer is created (X,A). When it gets full (after around 3
frames) second buffer is created (Y,B). Then the first one is freed.
When the second buffer is full, a third is created (X,C) - here the
problem starts: according to my observations, the card seems to read
vertices not from address C but from address A, as if it somehow
remembered the initial address binding.

Other observations:
- the data during execution of gl commands actually seems to be put
  into location C
- when I switch to software path, I could track down that it reads
  data from location C
- rendering is done correctly in software path
- when I comment out freeing of the memory manager node
  (bo.mem.mm_node), so that the third buffer is Z,C (paired with a not
  yet used offset of VRAM), then hardware rendering behaves correctly -
  but this will make the card "run out" of memory as no memory manager
  nodes will be deallocated
- when I switch the calls of glCallList into actual rendering code and
  disable invocation of glNewList/glEndList, the hardware rendering
  also behaves correctly
""

Best regards,
Krzysztof
Thread overview: 10+ messages
2009-12-28  3:41 Synchronization mostly missing? Luca Barbieri
2009-12-28  4:41 ` Francisco Jerez
2009-12-28  6:55   ` Luca Barbieri
2009-12-28  7:15     ` Younes Manton
2009-12-28 11:53       ` Christoph Bumiller
2009-12-28 13:21       ` Luca Barbieri
2009-12-28 13:33       ` Francisco Jerez
2009-12-28 13:53       ` Luca Barbieri
2009-12-28  5:50 ` Luca Barbieri
2009-12-28  7:27 ` Krzysztof Smiechowicz