public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
From: Takashi Iwai <tiwai@suse.de>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>, intel-gfx@lists.freedesktop.org
Subject: Re: S4 resume breakage with i915 driver
Date: Thu, 25 Aug 2016 17:54:58 +0200	[thread overview]
Message-ID: <s5h60qozvvx.wl-tiwai@suse.de> (raw)
In-Reply-To: <20160825153241.GD22360@nuc-i3427.alporthouse.com>

On Thu, 25 Aug 2016 17:32:41 +0200,
Chris Wilson wrote:
> 
> On Thu, Aug 25, 2016 at 03:11:35PM +0200, Takashi Iwai wrote:
> > Hi,
> > 
> > while debugging our 4.4.x based SLE12-SP2 kernel, we noticed that S4
> > resume is broken on many machines with i915 gfx, even on the upstream
> > 4.7 kernel.  Originally it was reported by Intel about SKL machines,
> > but later on, we found that it hits on many other older chips (at
> > least HSW), too.
> > 
> > This was hard to identify because there have been other S4 resume bugs
> > until recently.  But even after these fixes, when the system is tested
> > on i915 gfx, the system gets memory corruption or kernel Oops sooner
> > or later after a few (usually < 10) S4 cycles.
> > 
> > As the bug happened between 4.2 and 4.3, I bisected and it pointed to
> > the commit:
> > 
> >   4c436d55b279bbc6b02aac02e7dc683fc09f884e
> >     drm/i915: Enable Resource Streamer state save/restore on MI_SET_CONTEXT
> > 
> > Indeed, reverting this on top of our 4.4.x kernel seems to make S4
> > working stably (at least on a test machine).
> > 
> > Does this make any sense to you guys?
> 
> It could explain a HSW issue, but not SKL and not BDW by default (using
> execlists).

Hrm, IIRC, SKL didn't work without the commit.  But it doesn't matter
if this doesn't has a bad influence on SKL, either.

> All machine big core, no Braswell or other !llc device?

Yeah, tested only on SKL, HSW and IVY.  No small boxes.

> > Since the commit message doesn't give a good explanation about this
> > change, I wonder what's the purpose of this commit.  Was it merely
> > optimization?
> 
> It's a step towards enabling the Resource Streamer hardware block in the
> GPU, kind of a glorified prefetch. Userspace also has to notify the GPU
> to enable it for a batch as well.
> 
> The only doubt in mind my about that patch was whether the hw ignored
> the RS_RESTORE_STATE if MI_RESTORE_INHIBIT as expected:
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 3c97f0e7a003..c618bb86aeb9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -574,10 +574,15 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
>  
>         /* These flags are for resource streamer on HSW+ */
>         if (IS_HASWELL(dev_priv) || INTEL_GEN(dev_priv) >= 8)
> -               flags |= (HSW_MI_RS_SAVE_STATE_EN | HSW_MI_RS_RESTORE_STATE_EN);
> +               flags |= HSW_MI_RS_SAVE_STATE_EN;
>         else if (INTEL_GEN(dev_priv) < 8)
> -               flags |= (MI_SAVE_EXT_STATE_EN | MI_RESTORE_EXT_STATE_EN);
> -
> +               flags |= MI_SAVE_EXT_STATE_EN;
> +       if ((flags & MI_RESTORE_INHIBIT) == 0) {
> +               if (IS_HASWELL(dev_priv) || INTEL_GEN(dev_priv) >= 8)
> +                       flags |= HSW_MI_RS_RESTORE_STATE_EN;
> +               else if (INTEL_GEN(dev_priv) < 8)
> +                       flags |= MI_RESTORE_EXT_STATE_EN;
> +       }
>  
> 
> But this code is only executed for Haswell, and we only inhibit restore
> from uninitialised contexts. The biggest fear is that we are restoring
> garbage from userspace and making the GPU do strange things.
> 
> That still doesn't explain stray writes into system memory, that is more
> likely our GTT being broken.

Yeah, that's my expectation.  But the bisection didn't reach it.
So maybe this casually surfaced the hidden GTT or cache issue.

> Could you confirm that bisect has any
> impact on the other machines, and of course double check the result?

You're asking bisection on all machines from the scratch for such a
bug taking so long time to reproduce, and especially for i915 code
path, that is known to be deadly difficult due to various merge
commits?  I sincerely decline the offer :)

Yes, the result was double-checked.  This has a positive effect on all
our tested machines.  Maybe But it's hard to tell exactly whether this is
the 100% culprit.  As said, there have been multiple S4 bugs, so far.
IVY worked without this patch (after x86 fixes), but obviously this
had no negative effect, either.


thanks,

Takashi
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2016-08-25 15:55 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-25 13:11 S4 resume breakage with i915 driver Takashi Iwai
2016-08-25 15:32 ` Chris Wilson
2016-08-25 15:54   ` Takashi Iwai [this message]
2016-08-25 16:12     ` Chris Wilson
2016-08-25 16:15       ` Takashi Iwai
2016-08-26  9:18         ` Takashi Iwai
2016-08-26 10:25           ` Takashi Iwai
2016-08-26 10:39             ` Chris Wilson
2016-08-26 11:10               ` Imre Deak
2016-08-26 11:42                 ` Imre Deak
2016-08-29 13:32                   ` Daniel Vetter
2016-08-29 14:09                     ` Imre Deak
2016-08-29 14:24                       ` Takashi Iwai
2016-08-29 14:54                         ` Imre Deak
2016-08-29 15:25                           ` Chris Wilson
2016-08-29 16:46                             ` Lukas Wunner
2016-08-30 11:43                               ` Imre Deak
2016-08-30 11:33                             ` Imre Deak
2016-09-02 18:34                     ` Dave Gordon
2016-08-26 12:29   ` David Weinehall
2016-08-26 12:38     ` Chris Wilson
2016-08-26 12:54       ` David Weinehall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=s5h60qozvvx.wl-tiwai@suse.de \
    --to=tiwai@suse.de \
    --cc=chris@chris-wilson.co.uk \
    --cc=daniel.vetter@ffwll.ch \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox