From: Francisco Jerez <currojerez@riseup.net>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org, Ben Widawsky <ben@bwidawsk.net>,
Daniel Vetter <daniel.vetter@ffwll.ch>
Subject: Re: [PATCH v2] drm/i915: Restore inhibiting the load of the default context
Date: Thu, 10 Dec 2015 15:57:30 +0200 [thread overview]
Message-ID: <87r3iuqud1.fsf@riseup.net> (raw)
In-Reply-To: <20151210133756.GN29974@nuc-i3427.alporthouse.com>
[-- Attachment #1.1.1: Type: text/plain, Size: 6128 bytes --]
Chris Wilson <chris@chris-wilson.co.uk> writes:
> On Thu, Dec 10, 2015 at 03:24:52PM +0200, Francisco Jerez wrote:
>> Mika Kuoppala <mika.kuoppala@linux.intel.com> writes:
>>
>> > Chris Wilson <chris@chris-wilson.co.uk> writes:
>> >
>> >> Following a GPU reset, we may leave the context in a poorly defined
>> >> state, and reloading from that context will leave the GPU flummoxed. For
>> >> secondary contexts, this will lead to that context being banned - but
>> >> currently it is also causing the default context to become banned,
>> >> leading to turmoil in the shared state.
>> >>
>> >> This is a regression from
>> >>
>> >> commit 6702cf16e0ba8b0129f5aa1b6609d4e9c70bc13b [v4.1]
>> >> Author: Ben Widawsky <benjamin.widawsky@intel.com>
>> >> Date: Mon Mar 16 16:00:58 2015 +0000
>> >>
>> >> drm/i915: Initialize all contexts
>> >>
>> >> which quietly introduced the removal of the MI_RESTORE_INHIBIT on the
>> >> default context.
>> >>
>>
>> AFAICT the removal of MI_RESTORE_INHIBIT in that commit seemed
>> justified. Ben explained that it was needed to fix a pagefault in the
>> default context under certain conditions. I don't know the details of
>> the original change (Ben CC'ed), but it seems like this would be trading
>> one bug for another?
>
> A bug in a feature that never worked and isn't enabled?
>
>> Other than that this opens the door again to subtle state leaks between
>> processes, as I learned recently while implementing L3 partitioning in
>> Mesa. Mesa now changes the L3 configuration in ways that will break
>> assumptions from processes that use the default context (the DDX). The
>> DDX assumes, for instance, that the URB size is set according to the
>> hardware defaults, but it doesn't actually program the URB size itself,
>> so in a way the DDX relies on MI_RESTORE_INHIBIT *not* to be set for the
>> default context for correct operation. This commit breaks that
>> assumption and the kernel ABI.
>
> Wrong the ABI is the other way around and has been for 10 years. Note
> the kernel isn't also ensuring that the default context has the default
> hardware values either. The "golden renderstate" also doesn't set all
> defaults and is also a recent introduction.
>
> It changes the ABI back to what it was....
Sounds like a change fully unrelated to fixing the bug on GPU reset.
The ABI change done in 6702cf16e0ba8b0129f5aa1b6609d4e9c70bc13b seemed
legitimate because it made the context state on switch more well-defined
than it used to be, IOW it was a backwards-compatible ABI change because
it couldn't possibly break userspace assumptions. This change OTOH
breaks an assumption about the kernel ABI done by the DDX now.
>
>> Mesa has a workaround in place to reduce the likelihood of such leaks,
>> but the solution is far from ideal because it relies on userspace
>> cooperation and had a measurable impact in performance (because it
>> requires userspace to assume the worst-case scenario that the following
>> batch is going to be from a different context with MI_RESTORE_INHIBIT
>> set, so we have to restore the hardware default L3 configuration at the
>> end of every batch even if that's actually not the case), for that
>> reason we would like to drop the userspace workaround in the future at
>> least on v4.1 kernels and newer.
>
> Mesa has workarounds for leaks between hardware contexts, i.e. state
> that is not stored in the context itself. Mesa also has to assume a
> clean slate when setting up the context for the first time, and doesn't
> use the default context itself.
>
No, the workaround involves restoring state which *is* part of the
hardware context, it just won't get saved and restored with this commit
because of the MI_RESTORE_INHIBIT flag when switching to the DDX'
default context.
>> One more question below.
>>
>> >> v2: Mark the global default context as uninitialized on GPU reset so
>> >> that the context-local workarounds are reloaded upon re-enabling.
>> >>
>> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> >
>> > Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
>> >
>> >> Cc: Michel Thierry <michel.thierry@intel.com>
>> >> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>> >> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> >> ---
>> >> drivers/gpu/drm/i915/i915_gem_context.c | 6 +++++-
>> >> 1 file changed, 5 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>> >> index 43761c5bcaca..f024d5d2c746 100644
>> >> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>> >> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>> >> @@ -340,6 +340,10 @@ void i915_gem_context_reset(struct drm_device *dev)
>> >> i915_gem_context_unreference(lctx);
>> >> ring->last_context = NULL;
>> >> }
>> >> +
>> >> + /* Force the GPU state to be reinitialised on enabling */
>> >> + if (ring->default_context)
>> >> + ring->default_context->legacy_hw_ctx.initialized = false;
>> >> }
>> >> }
>> >>
>> >> @@ -708,7 +712,7 @@ static int do_switch(struct drm_i915_gem_request *req)
>> >> if (ret)
>> >> goto unpin_out;
>> >>
>> >> - if (!to->legacy_hw_ctx.initialized) {
>> >> + if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to)) {
>>
>> This hunk causes MI_RESTORE_INHIBIT to be set again for the default
>> context regardless of whether a reset happened or not, so it seems
>> unrelated to the rest of your change. Maybe I'm understanding this
>> wrong but doesn't the !initialized check together with the hunk above
>> already guarantee that MI_RESTORE_INHIBIT will be set after GPU reset,
>> which is what you wanted to achieve?
>
> This is the change to *restore* the kernel ABI. The flag in reset is to
> force the WA LRI to be re-emitted (not for gen6 ofc) as they will not be
> preserved.
Again you don't justify why changing the kernel ABI is required to fix
a GPU reset bug.
> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 212 bytes --]
[-- Attachment #2: Type: text/plain, Size: 159 bytes --]
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2015-12-10 13:57 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-11-27 10:07 [PATCH] drm/i915: Restore inhibiting the load of the default context Chris Wilson
2015-11-27 11:32 ` Mika Kuoppala
2015-11-27 13:14 ` Chris Wilson
2015-11-27 13:28 ` [PATCH v2] " Chris Wilson
2015-12-10 10:19 ` Mika Kuoppala
2015-12-10 10:45 ` Daniel Vetter
2016-01-06 10:07 ` Daniel Vetter
2015-12-10 13:24 ` Francisco Jerez
2015-12-10 13:37 ` Chris Wilson
2015-12-10 13:57 ` Francisco Jerez [this message]
2015-12-10 14:03 ` Daniel Vetter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87r3iuqud1.fsf@riseup.net \
--to=currojerez@riseup.net \
--cc=ben@bwidawsk.net \
--cc=chris@chris-wilson.co.uk \
--cc=daniel.vetter@ffwll.ch \
--cc=intel-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox