public inbox for intel-gfx@lists.freedesktop.org
From: Daniel Vetter <daniel@ffwll.ch>
To: "Gore, Tim" <tim.gore@intel.com>
Cc: "intel-gfx@lists.freedesktop.org"
	<intel-gfx@lists.freedesktop.org>,
	"Wood, Thomas" <thomas.wood@intel.com>
Subject: Re: [PATCH i-g-t] lib/igt_gt.c : allow changes to stop_rings mode bits
Date: Tue, 14 Jul 2015 10:08:31 +0200
Message-ID: <20150714080831.GG3736@phenom.ffwll.local>
In-Reply-To: <8FCC70911F3E9548866CA0E51893BCC32F96937D@irsmsx105.ger.corp.intel.com>

On Mon, Jul 13, 2015 at 04:07:14PM +0000, Gore, Tim wrote:
> 
> 
> Tim Gore 
> Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ
> 
> 
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> > Vetter
> > Sent: Monday, July 13, 2015 3:59 PM
> > To: Gore, Tim
> > Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org; Wood, Thomas
> > Subject: Re: [Intel-gfx] [PATCH i-g-t] lib/igt_gt.c : allow changes to stop_rings
> > mode bits
> > 
> > On Mon, Jul 13, 2015 at 09:43:11AM +0000, Gore, Tim wrote:
> > >
> > >
> > > Tim Gore
> > > Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon
> > > SN3 1RJ
> > >
> > > > -----Original Message-----
> > > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> > > > Daniel Vetter
> > > > Sent: Monday, July 13, 2015 10:30 AM
> > > > To: Gore, Tim
> > > > Cc: intel-gfx@lists.freedesktop.org; Wood, Thomas
> > > > Subject: Re: [Intel-gfx] [PATCH i-g-t] lib/igt_gt.c : allow changes
> > > > to stop_rings mode bits
> > > >
> > > > On Fri, Jul 10, 2015 at 02:06:06PM +0100, tim.gore@intel.com wrote:
> > > > > From: Tim Gore <tim.gore@intel.com>
> > > > >
> > > > > In function igt_set_stop_rings, the test
> > > > >   igt_assert_f(flags == 0 || current == 0, ..
> > > > >
> > > > > will fail if we are trying to force a hang but the
> > > > > STOP_RINGS_ALLOW_BAN or STOP_RINGS_ALLOW_ERROR bit is set.
> > > > > With the introduction of per ring resets in the driver (in
> > > > > android) these bits do not get cleared to zero when an individual
> > > > > ring is reset. This causes subsequent attempts to cause a ring hang
> > > > > via this function to fail, leading to several igt tests failing
> > > > > (e.g. gem_reset_stats subtest ban-xxx etc.).
> > > >
> > > > Fix tdr to reset these instead?
> > > > -Daniel
> > > >
> > > I could change tdr, but why? When the TDR handles a ring hang and
> > > resets the ring, why would it modify the flag that defines if the
> > > driver should ban a frequently hanging context? If we get rid of the
> > > stop_rings interface, as Chris Wilson suggested, we would still need
> > > to keep the STOP_RING_ALLOW_BAN/ALLOW_ERRORS bits in debugfs, but
> > you
> > > would not expect to have to re-write these bits each time there is a ring
> > reset.
> > 
> > Then fix the current hang recovery code to not reset this, add some grace period,
> > then push this patch to igt. We don't have full-blown abi guarantees for
> > debugfs/igt stuff, but I want at least a few months (really last released
> > kernel&igt) of backwards/forward compatibility. And inconsistent behaviour
> > isn't great imo.
> > -Daniel
> 
> Sorry Daniel, I didn't really follow that.
> I didn't want a gpu reset to clear the ALLOW_BAN bit, since this will defeat the
> point of this bit, except perhaps in test situations where you can keep setting it
> each time you deliberately cause a hang. It seems like the ALLOW_BAN bit has
> uses in real world situations, although I don't know if anyone uses it.

ALLOW_BAN is only for testing. It's meant to re-enable auto-banning
because we disable that by default for automated tests. This is all meant
to be used together with the stop_rings stuff only.

> Would you be more comfortable with
> igt_assert_f((flags == 0) ||
>      (((current & STOP_RING_ALL) == 0) && (((current ^ flags) & ~STOP_RING_ALL) == 0)))
> 
> So either the new flags must be 0 (currently allowed) or the existing flags must
> indicate that  all hangs are cleared (0 except possibly the mode bits) AND the mode
> bits you are writing are the same as the current values. ??
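
For reference, the condition proposed in the quoted text can be written as a
standalone predicate. This is only a sketch: the function name
stop_rings_change_allowed is hypothetical, and the flag values below are
assumptions for illustration, not taken from this thread (the real definitions
live in the igt library headers and may differ). Note that in C `==` binds
tighter than `&`, so the masked comparison needs its own parentheses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, for illustration only; check the igt headers. */
#define STOP_RING_ALL          0xffu       /* per-ring "stop" bits */
#define STOP_RING_ALLOW_ERRORS (1u << 30)  /* mode bit */
#define STOP_RING_ALLOW_BAN    (1u << 31)  /* mode bit */

/*
 * Hypothetical helper: returns true if writing `flags` over `current`
 * would pass the relaxed check described in the quoted proposal.
 */
static bool stop_rings_change_allowed(uint32_t current, uint32_t flags)
{
	/* Clearing everything is always allowed (existing behaviour). */
	if (flags == 0)
		return true;

	/* No ring may still be marked as stopped ... */
	bool no_pending_stop = (current & STOP_RING_ALL) == 0;

	/* ... and the mode bits being written must match the current ones. */
	bool mode_unchanged = ((current ^ flags) & ~STOP_RING_ALL) == 0;

	return no_pending_stop && mode_unchanged;
}
```

Under these assumptions, a write of STOP_RING_ALL | STOP_RING_ALLOW_BAN is
accepted when only ALLOW_BAN is currently set, while a write that would flip a
mode bit, or one made while a ring stop is still pending, is rejected.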

Iirc gem_reset_stats uses stop_rings only to suppress gpu reset warnings in
dmesg (since they're expected) and not for the actual stop_rings logic. It
sets stop_rings after submitting the hanging batch, but before that one is
detected as hung. We could just add another bit to stop_rings for that
case.

What I still don't understand is why tdr can't just keep on properly
resetting stop_rings. It'll break tons of existing tests, and I don't
understand the upside. stop_rings has become quasi-abi (that's why the
ALLOW bits have such funky semantics, it's for backwards compat), if you
need to change it you need to extend it, not break it.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Thread overview: 7+ messages
2015-07-10 13:06 [PATCH i-g-t] lib/igt_gt.c : allow changes to stop_rings mode bits tim.gore
2015-07-13  9:30 ` Daniel Vetter
2015-07-13  9:43   ` Gore, Tim
2015-07-13 14:59     ` Daniel Vetter
2015-07-13 16:07       ` Gore, Tim
2015-07-14  8:08         ` Daniel Vetter [this message]
2015-07-14  8:55           ` Gore, Tim
