From: Takashi Iwai <tiwai@suse.de>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>, intel-gfx@lists.freedesktop.org
Subject: Re: S4 resume breakage with i915 driver
Date: Thu, 25 Aug 2016 17:54:58 +0200 [thread overview]
Message-ID: <s5h60qozvvx.wl-tiwai@suse.de> (raw)
In-Reply-To: <20160825153241.GD22360@nuc-i3427.alporthouse.com>
On Thu, 25 Aug 2016 17:32:41 +0200,
Chris Wilson wrote:
>
> On Thu, Aug 25, 2016 at 03:11:35PM +0200, Takashi Iwai wrote:
> > Hi,
> >
> > while debugging our 4.4.x based SLE12-SP2 kernel, we noticed that S4
> > resume is broken on many machines with i915 gfx, even on the upstream
> > 4.7 kernel. Originally it was reported by Intel about SKL machines,
> > but later on, we found that it hits on many other older chips (at
> > least HSW), too.
> >
> > This was hard to identify because there have been other S4 resume bugs
> > until recently. But even after these fixes, when the system is tested
> > on i915 gfx, the system gets memory corruption or kernel Oops sooner
> > or later after a few (usually < 10) S4 cycles.
> >
> > As the bug happened between 4.2 and 4.3, I bisected and it pointed to
> > the commit:
> >
> > 4c436d55b279bbc6b02aac02e7dc683fc09f884e
> > drm/i915: Enable Resource Streamer state save/restore on MI_SET_CONTEXT
> >
> > Indeed, reverting this on top of our 4.4.x kernel seems to make S4
> > working stably (at least on a test machine).
> >
> > Does this make any sense to you guys?
>
> It could explain a HSW issue, but not SKL and not BDW by default (using
> execlists).
Hrm, IIRC, SKL didn't work without the commit. But it doesn't matter
if this doesn't has a bad influence on SKL, either.
> All machine big core, no Braswell or other !llc device?
Yeah, tested only on SKL, HSW and IVY. No small boxes.
> > Since the commit message doesn't give a good explanation about this
> > change, I wonder what's the purpose of this commit. Was it merely
> > optimization?
>
> It's a step towards enabling the Resource Streamer hardware block in the
> GPU, kind of a glorified prefetch. Userspace also has to notify the GPU
> to enable it for a batch as well.
>
> The only doubt in mind my about that patch was whether the hw ignored
> the RS_RESTORE_STATE if MI_RESTORE_INHIBIT as expected:
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 3c97f0e7a003..c618bb86aeb9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -574,10 +574,15 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
>
> /* These flags are for resource streamer on HSW+ */
> if (IS_HASWELL(dev_priv) || INTEL_GEN(dev_priv) >= 8)
> - flags |= (HSW_MI_RS_SAVE_STATE_EN | HSW_MI_RS_RESTORE_STATE_EN);
> + flags |= HSW_MI_RS_SAVE_STATE_EN;
> else if (INTEL_GEN(dev_priv) < 8)
> - flags |= (MI_SAVE_EXT_STATE_EN | MI_RESTORE_EXT_STATE_EN);
> -
> + flags |= MI_SAVE_EXT_STATE_EN;
> + if ((flags & MI_RESTORE_INHIBIT) == 0) {
> + if (IS_HASWELL(dev_priv) || INTEL_GEN(dev_priv) >= 8)
> + flags |= HSW_MI_RS_RESTORE_STATE_EN;
> + else if (INTEL_GEN(dev_priv) < 8)
> + flags |= MI_RESTORE_EXT_STATE_EN;
> + }
>
>
> But this code is only executed for Haswell, and we only inhibit restore
> from uninitialised contexts. The biggest fear is that we are restoring
> garbage from userspace and making the GPU do strange things.
>
> That still doesn't explain stray writes into system memory, that is more
> likely our GTT being broken.
Yeah, that's my expectation. But the bisection didn't reach it.
So maybe this casually surfaced the hidden GTT or cache issue.
> Could you confirm that bisect has any
> impact on the other machines, and of course double check the result?
You're asking bisection on all machines from the scratch for such a
bug taking so long time to reproduce, and especially for i915 code
path, that is known to be deadly difficult due to various merge
commits? I sincerely decline the offer :)
Yes, the result was double-checked. This has a positive effect on all
our tested machines. Maybe But it's hard to tell exactly whether this is
the 100% culprit. As said, there have been multiple S4 bugs, so far.
IVY worked without this patch (after x86 fixes), but obviously this
had no negative effect, either.
thanks,
Takashi
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2016-08-25 15:55 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-08-25 13:11 S4 resume breakage with i915 driver Takashi Iwai
2016-08-25 15:32 ` Chris Wilson
2016-08-25 15:54 ` Takashi Iwai [this message]
2016-08-25 16:12 ` Chris Wilson
2016-08-25 16:15 ` Takashi Iwai
2016-08-26 9:18 ` Takashi Iwai
2016-08-26 10:25 ` Takashi Iwai
2016-08-26 10:39 ` Chris Wilson
2016-08-26 11:10 ` Imre Deak
2016-08-26 11:42 ` Imre Deak
2016-08-29 13:32 ` Daniel Vetter
2016-08-29 14:09 ` Imre Deak
2016-08-29 14:24 ` Takashi Iwai
2016-08-29 14:54 ` Imre Deak
2016-08-29 15:25 ` Chris Wilson
2016-08-29 16:46 ` Lukas Wunner
2016-08-30 11:43 ` Imre Deak
2016-08-30 11:33 ` Imre Deak
2016-09-02 18:34 ` Dave Gordon
2016-08-26 12:29 ` David Weinehall
2016-08-26 12:38 ` Chris Wilson
2016-08-26 12:54 ` David Weinehall
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=s5h60qozvvx.wl-tiwai@suse.de \
--to=tiwai@suse.de \
--cc=chris@chris-wilson.co.uk \
--cc=daniel.vetter@ffwll.ch \
--cc=intel-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox