From: Ben Skeggs <skeggsb-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Maarten Maathuis <madman2003-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
Subject: Re: [PATCH/TESTING(all hw)/DISCUSSION] FIFO (minor) create and (major) destroy instabilities on nv50+
Date: Thu, 07 Jan 2010 08:17:25 +1000
Message-ID: <1262816245.2485.3.camel@nisroch>
In-Reply-To: <6d4bc9fc1001060958q3b7d0d5dka7bcd3843584d6e2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Wed, 2010-01-06 at 18:58 +0100, Maarten Maathuis wrote:
> Patch v5 remains necessary (a simple swap of pfifo and pgraph unload
> isn't enough) even on a current kernel, the change is that it's now
> possible to generate pgraph errors without locking up. Without the
> patch even nop fails in loops, while running under fbcon.
Yes, the commit fixing the ctxprog hang wasn't intended to fix the
entire problem. I actually came across that issue while working on
something else; it just turns out to be one of the issues that affects
channel destruction too.

Adding a simple nouveau_wait_for_idle() after pgraph->fifo_access(dev,
false) is enough now to make it work *almost* all the time. There's still
something else we're not waiting for, though: an mdelay(50) lets me run
bitscan-fail in a loop for as long as I like without issue. I don't
have any ideas yet as to what it could be, but I'd *really*
rather fix it properly instead of hiding the problem away...
Ben.
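
For illustration only, the difference between the two workarounds discussed
above (waiting on a condition versus a fixed mdelay(50)) can be sketched in
plain userspace C. Everything named here (wait_until_idle(), status_read_fn,
fake_engine, fake_status()) is hypothetical and is not nouveau code; the real
nouveau_wait_for_idle() is not reproduced. The sketch uses a poll budget
rather than wall-clock time so that it stays deterministic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an MMIO status read; not the nouveau API. */
typedef uint32_t (*status_read_fn)(void *ctx);

/*
 * Poll until every bit in busy_mask reads back as zero, or give up after
 * max_polls reads. Returns true if the engine went idle within the budget.
 */
static bool wait_until_idle(status_read_fn read_status, void *ctx,
                            uint32_t busy_mask, unsigned int max_polls)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        if ((read_status(ctx) & busy_mask) == 0)
            return true;
    }
    return false;
}

/* Fake engine for demonstration: reports busy for a fixed number of reads. */
struct fake_engine {
    unsigned int busy_reads_left;
};

static uint32_t fake_status(void *ctx)
{
    struct fake_engine *e = ctx;

    if (e->busy_reads_left > 0) {
        e->busy_reads_left--;
        return 0x1; /* busy bit still set */
    }
    return 0x0; /* idle */
}
```

A bounded poll like this returns false when the bit it watches never clears,
so a failure is visible to the caller. A fixed delay gives no such signal: it
only works as long as the unknown condition happens to settle within the delay.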
>
> Maarten.
>
> On Tue, Jan 5, 2010 at 11:55 PM, Maarten Maathuis <madman2003-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > On Tue, Jan 5, 2010 at 10:19 PM, Maarten Maathuis <madman2003-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >> On Tue, Jan 5, 2010 at 9:41 AM, Maarten Maathuis <madman2003-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >>> On Tue, Jan 5, 2010 at 4:20 AM, Ben Skeggs <skeggsb-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >>>> On Mon, 2010-01-04 at 23:54 +0100, Maarten Maathuis wrote:
> >>>>> I forgot to mention that you should run nop from fbcon without X
> >>>>> running for reliable lockups.
> >>>> Yup, that's what I've been doing.
> >>>>
> >>>>>
> >>>>> On Mon, Jan 4, 2010 at 11:39 PM, Ben Skeggs <skeggsb-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >>>>> > On Mon, 2010-01-04 at 20:29 +0100, Maarten Maathuis wrote:
> >>>>> >> I've narrowed it down further, the "pgraph->fifo_access" bit is still
> >>>>> >> cleanup (register 0x400500 represents pgraph fifo access), the rest
> >>>>> >> appears needed for the desired effect. The reordering of pfifo and
> >>>>> >> pgraph destroy is needed. As usual, feedback is appreciated.
> >>>>> > I played a bit yesterday and have the gr/fifoctx unload ordering swap
> >>>>> > queued up already, as well as unconditionally waiting on a fence at
> >>>>> > channel destroy (not really needed, but served as a bit of a cleanup
> >>>>> > anyway).
> >>>>> >
> >>>>> > I'll try and look at the rest of the changes.
> >>>>> >
> >>>> Mmm OK. The gr/fifoctx swap appears to just achieve a little extra
> >>>> delay before we hit the grctx unload, some of the other changes (the
> >>>> PGRAPH stuff in fifo channel disable specifically) work around the
> >>>> changed ordering.
> >>>>
> >>>> For an identical effect, add a nice mdelay(50) right before the
> >>>> pgraph->fifo_access(dev, false) in nouveau_channel_free().. We have a
> >>>> race.
> >>>
> >>> So what do you propose as the preferred solution?
> >>>
> >>>>
> >>>> Ben.
> >>>>> > Ben.
> >>>>> >>
> >>>>> >> Maarten.
> >>>>> >>
> >>>>> >> On Sat, Jan 2, 2010 at 4:36 PM, Maarten Maathuis <madman2003-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >>>>> >> > Many people using nv50+ hardware are aware of gpu lockups when a fifo
> >>>>> >> > closes under certain conditions. Based on an mmio-trace and some trial
> >>>>> >> > and error testing, I've come up with a patch that improves the
> >>>>> >> > situation on my NV96.
> >>>>> >> >
> >>>>> >> > This patch needs testing on NV50+ hardware and regression testing on
> >>>>> >> > older hardware, since i did change some of the common codepaths. This
> >>>>> >> > is very much a work in progress, and if you have anything to
> >>>>> >> > add/correct, please share it.
> >>>>> >> >
> >>>>> >> > I've also attached two test apps. One is bitscan-fail from mwk; use
> >>>>> >> > it like ./bitscan-fail 0x200 to trigger PGRAPH errors. A modified
> >>>>> >> > version only emits NOPs (method 0x100) and represents the no-error
> >>>>> >> > situation.
> >>>>> >> >
> >>>>> >> > For me, i can run the NOP program in loops of 10000 iterations with no
> >>>>> >> > problems (i've done so several times), the bitscan-fail survives 10000
> >>>>> >> > iterations sometimes, but can also fail after a few thousand. In
> >>>>> >> > comparison, a single run of bitscan-fail could cause a gpu lockup for
> >>>>> >> > me in the past.
> >>>>> >> >
> >>>>> >> > Please try the gallium driver, the test apps, suspend to ram. Suspend
> >>>>> >> > to ram isn't 100% reliable yet for me (this was always the case after
> >>>>> >> > strange experiments/hammering/etc), but should not regress. This goes
> >>>>> >> > for older hw as well, whatever worked should still work, but i
> >>>>> >> > wouldn't expect serious improvements there.
> >>>>> >> >
> >>>>> >> > As always, feedback is appreciated, especially since this is a touchy subject.
> >>>>> >> >
> >>>>> >> > Maarten.
> >>>>> >> >
> >>>>> >> _______________________________________________
> >>>>> >> Nouveau mailing list
> >>>>> >> Nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
> >>>>> >> http://lists.freedesktop.org/mailman/listinfo/nouveau
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >> I've isolated a small part of a mmiotrace, which is one of the few
> >> cases where bit28 of 0x40032c is unset. The end is most interesting,
> >> the beginning is just to be sure everything is there. Maybe it helps.
> >>
> >> W 4 543.049438 3 0xc6100c80 0x50001 0x0 0
> >> R 4 543.049496 3 0xc6100c80 0x50000 0x0 0
> >> R 4 543.049548 3 0xc6400500 0x10010001 0x0 0
> >> R 4 543.049596 3 0xc6400500 0x10010001 0x0 0
> >> W 4 543.049644 3 0xc6400500 0x10010000 0x0 0
> >> R 4 543.049693 3 0xc6400700 0x0 0x0 0
> >> R 4 543.049741 3 0xc6400380 0x0 0x0 0
> >> R 4 543.049797 3 0xc6400384 0x0 0x0 0
> >> R 4 543.049845 3 0xc6400388 0x0 0x0 0
> >> W 4 543.049900 3 0xc6100c80 0x1 0x0 0
> >> R 4 543.049958 3 0xc6100c80 0x0 0x0 0
> >> W 4 543.050009 3 0xc6400500 0x10010001 0x0 0
> >> W 4 543.050150 10 0xc41f04c8 0x1 0x0 0
> >> W 4 543.050175 10 0xc41f04cc 0x4 0x0 0
> >> W 4 543.050282 3 0xc6070000 0x1 0x0 0
> >> R 4 543.050358 3 0xc6070000 0x0 0x0 0
> >> R 4 543.050418 3 0xc661002c 0x370 0x0 0
> >> R 4 543.050462 3 0xc661002c 0x370 0x0 0
> >> W 4 543.050588 10 0xc41f0440 0x1 0x0 0
> >> W 4 543.050614 10 0xc41f0444 0x4 0x0 0
> >> W 4 543.050719 3 0xc6070000 0x1 0x0 0
> >> R 4 543.050793 3 0xc6070000 0x0 0x0 0
> >> W 4 543.050896 10 0xc41f03c0 0x1 0x0 0
> >> W 4 543.050922 10 0xc41f03c4 0x4 0x0 0
> >> W 4 543.051028 3 0xc6070000 0x1 0x0 0
> >> R 4 543.051101 3 0xc6070000 0x0 0x0 0
> >> W 4 543.051227 10 0xc41f05e0 0x1 0x0 0
> >> W 4 543.051253 10 0xc41f05e4 0x4 0x0 0
> >> W 4 543.051360 3 0xc6070000 0x1 0x0 0
> >> R 4 543.051434 3 0xc6070000 0x0 0x0 0
> >> W 4 543.051529 10 0xc41f0200 0x1 0x0 0
> >> W 4 543.051554 10 0xc41f0204 0x4 0x0 0
> >> W 4 543.051659 3 0xc6070000 0x1 0x0 0
> >> R 4 543.051732 3 0xc6070000 0x0 0x0 0
> >> W 4 543.051784 10 0xc439e000 0x7e 0x0 0
> >> W 4 543.051807 10 0xc439e004 0x7e 0x0 0
> >> W 4 543.051829 10 0xc439e008 0x1 0x0 0
> >> W 4 543.051851 10 0xc439e00c 0x2 0x0 0
> >> W 4 543.051926 3 0xc6070000 0x1 0x0 0
> >> R 4 543.051999 3 0xc6070000 0x0 0x0 0
> >> W 4 543.052158 3 0xc60032f4 0x1ff64 0x0 0
> >> W 4 543.052228 3 0xc60032ec 0x4 0x0 0
> >> R 4 543.052296 3 0xc60032ec 0x4 0x0 0
> >> R 4 543.052377 3 0xc6002504 0x0 0x0 0
> >> W 4 543.052451 3 0xc6002504 0x1 0x0 0
> >> R 4 543.052745 3 0xc6000100 0x0 0x0 0
> >> R 4 543.052849 3 0xc6002080 0x0 0x0 0
> >> R 4 543.053007 3 0xc6003220 0xd06191 0x0 0
> >> R 4 543.053075 3 0xc6003250 0x90000001 0x0 0
> >> R 4 543.053154 3 0xc6002504 0x11 0x0 0
> >> R 4 543.053226 3 0xc6002508 0x340 0x0 0
> >> R 4 543.053295 3 0xc6003220 0xd06191 0x0 0
> >> R 4 543.053365 3 0xc6003250 0x90000001 0x0 0
> >> R 4 543.053444 3 0xc6000200 0xdff3d113 0x0 0
> >> R 4 543.053516 3 0xc600251c 0x3f 0x0 0
> >> R 4 543.053581 3 0xc640032c 0x8001fd9a 0x0 0
> >> R 4 543.053630 3 0xc640032c 0x8001fd9a 0x0 0
> >> W 4 543.053678 3 0xc640032c 0x1fd9a 0x0 0
> >> R 4 543.053753 3 0xc60032f0 0x3 0x0 0
> >> W 4 543.053843 3 0xc60032f0 0x7f 0x0 0
> >> R 4 543.053921 3 0xc6003220 0xd06191 0x0 0
> >> W 4 543.053990 3 0xc6003220 0xd06191 0x0 0
> >> R 4 543.054054 3 0xc6002504 0x11 0x0 0
> >> W 4 543.054123 3 0xc6002504 0x10 0x0 0
> >> R 4 543.054195 3 0xc600260c 0x801fd99f 0x0 0
> >> W 4 543.054268 3 0xc600260c 0x1ff68 0x0 0
> >> W 4 543.054371 10 0xc43cdd10 0x0 0x0 0
> >> W 4 543.054393 10 0xc43cdd14 0x0 0x0 0
> >> W 4 543.054415 10 0xc43cdd18 0x0 0x0 0
> >> W 4 543.054437 10 0xc43cdd1c 0x0 0x0 0
> >> W 4 543.054460 10 0xc43cdd20 0x0 0x0 0
> >> W 4 543.054482 10 0xc43cdd24 0x0 0x0 0
> >> W 4 543.054504 10 0xc43cdd28 0x0 0x0 0
> >> W 4 543.054526 10 0xc43cdd2c 0x0 0x0 0
> >> W 4 543.054549 10 0xc43cdd30 0x0 0x0 0
> >> W 4 543.054571 10 0xc43cdd34 0x0 0x0 0
> >> W 4 543.054593 10 0xc43cdd38 0x0 0x0 0
> >> W 4 543.054616 10 0xc43cdd3c 0x0 0x0 0
> >> W 4 543.054638 10 0xc43cdd40 0x0 0x0 0
> >> W 4 543.054660 10 0xc43cdd44 0x0 0x0 0
> >> W 4 543.054823 3 0xc6070000 0x1 0x0 0
> >> R 4 543.054921 3 0xc6070000 0x0 0x0 0
> >>
> >
> > This chunk comes after it, very similar to the one before it. But i
> > forgot to add it.
> >
> > W 4 543.055001 3 0xc6100c80 0x50001 0x0 0
> > R 4 543.055059 3 0xc6100c80 0x50000 0x0 0
> > R 4 543.055111 3 0xc6400500 0x10010001 0x0 0
> > R 4 543.055159 3 0xc6400500 0x10010001 0x0 0
> > W 4 543.055207 3 0xc6400500 0x10010000 0x0 0
> > R 4 543.055256 3 0xc6400700 0x0 0x0 0
> > R 4 543.055304 3 0xc6400380 0x0 0x0 0
> > R 4 543.055352 3 0xc6400384 0x0 0x0 0
> > R 4 543.055400 3 0xc6400388 0x0 0x0 0
> > W 4 543.055454 3 0xc6100c80 0x1 0x0 0
> > R 4 543.055511 3 0xc6100c80 0x0 0x0 0
> > W 4 543.055562 3 0xc6400500 0x10010001 0x0 0
> > W 4 543.055657 3 0xc600260c 0x1ff680 0x0 0
> > W 4 543.055745 3 0xc6000140 0x1 0x0 0
> > W 4 543.055954 3 0xc6000140 0x0 0x0 0
> > W 4 543.055996 10 0xc43cdd48 0x0 0x0 0
> > W 4 543.056019 10 0xc43cdd4c 0x0 0x0 0
> > W 4 543.056041 10 0xc43cdd50 0x0 0x0 0
> > W 4 543.056064 10 0xc43cdd54 0x0 0x0 0
> > W 4 543.056167 3 0xc6070000 0x1 0x0 0
> > R 4 543.056246 3 0xc6070000 0x0 0x0 0
> >
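
The trace excerpts quoted above can be checked mechanically. Below is a
minimal sketch in userspace C that parses one trace line and tests a bit in
the logged value. It assumes the usual mmiotrace field order (operation,
width, timestamp, map id, physical address, value, PC, PID) and assumes that
map id 3 starts at 0xc6000000, which is inferred from the excerpt (0xc6400500
lining up with register 0x400500) rather than stated in the thread. The names
parse_mmio_line(), reg_offset() and bit_is_set() are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed from the trace above: map id 3 starts at this physical address. */
#define ASSUMED_MMIO_BASE 0xc6000000u

struct mmio_access {
    char op;             /* 'R' or 'W' */
    unsigned int width;  /* access width in bytes */
    double timestamp;
    unsigned int map_id;
    uint32_t addr;       /* physical address as logged */
    uint32_t value;
};

/* Parse one "R|W width timestamp map_id addr value ..." line,
 * skipping any leading email quote markers ('>' and spaces). */
static bool parse_mmio_line(const char *line, struct mmio_access *out)
{
    unsigned long addr, value;

    while (*line == '>' || *line == ' ')
        line++;
    if (sscanf(line, "%c %u %lf %u %lx %lx", &out->op, &out->width,
               &out->timestamp, &out->map_id, &addr, &value) != 6)
        return false;
    if (out->op != 'R' && out->op != 'W')
        return false;
    out->addr = (uint32_t)addr;
    out->value = (uint32_t)value;
    return true;
}

/* Register offset relative to the assumed base of the register map. */
static uint32_t reg_offset(const struct mmio_access *a)
{
    return a->addr - ASSUMED_MMIO_BASE;
}

static bool bit_is_set(uint32_t value, unsigned int bit)
{
    return (value >> bit) & 1u;
}
```

Feeding it the line that reads 0xc640032c yields offset 0x40032c, and
bit_is_set(value, 28) can then be used to pick out the accesses where bit 28
is clear.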