public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: "Lister, Ian" <ian.lister@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
	Intel Graphics Development <intel-gfx@lists.freedesktop.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>,
	"Widawsky, Benjamin" <benjamin.widawsky@intel.com>
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Fix use-after-free in do_switch
Date: Fri, 6 Dec 2013 13:09:46 +0100	[thread overview]
Message-ID: <20131206120946.GK27344@phenom.ffwll.local> (raw)
In-Reply-To: <4EDB5F6A29D7384D88373AF45D7F3DF00A606ACC@IRSMSX106.ger.corp.intel.com>

On Fri, Dec 06, 2013 at 10:05:51AM +0000, Lister, Ian wrote:
> 
> 
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch]
> > Sent: Thursday, December 05, 2013 2:43 PM
> > To: Intel Graphics Development
> > Cc: Daniel Vetter; Lister, Ian; stable@vger.kernel.org; Widawsky, Benjamin;
> > Stéphane Marchesin; Bloomfield, Jon
> > Subject: [PATCH] drm/i915: Fix use-after-free in do_switch
> > 
> > So apparently under ridiculous amounts of memory pressure we can get into
> > trouble in do_switch when we try to move the old hw context backing
> > storage object onto the active lists.
> > 
> > With list debugging enabled that usually results in us chasing a poisoned
> > pointer - which means we've hit upon a vma that has been removed from all
> > lrus with list_del (and then deallocated, so it's a real use-after free).
> > 
> > Ian Lister has done some great callchain chasing and noticed that we can
> > reenter do_switch:
> > 
> > i915_gem_do_execbuffer()
> > 
> > i915_switch_context()
> > 
> > do_switch()
> >    from = ring->last_context;
> >    i915_gem_object_pin()
> > 
> >       i915_gem_object_bind_to_gtt()
> >          ret = drm_mm_insert_node_in_range_generic();
> >          // If the above call fails then it will try i915_gem_evict_something()
> >          // If that fails it will call i915_gem_evict_everything() ...
> > 	 i915_gem_evict_everything()
> > 	    i915_gpu_idle()
> > 	       i915_switch_context(DEFAULT_CONTEXT)
> > 
> > Like with everything else where the shrinker or eviction code can invalidate
> > pointers we need to reload relevant state.
> > 
> > Note that there's no need to recheck whether a context switch is still
> > required because:
> > 
> > - Doing a switch to the same context is harmless (besides wasting a
> >   bit of energy).
> > 
> > - This can only happen with the default context. But since that one's
> >   pinned we'll never call down into evict_everything under normal
> >   circumstances. Note that there's a little driver bringup fun
> >   involved namely that we could recourse into do_switch for the
> >   initial switch. Atm we're fine since we assign the context pointer
> >   only after the call to do_switch at driver load or resume time. And
> >   in the gpu reset case we skip the entire setup sequence (which might
> >   be a bug on its own, but definitely not this one here).
> > 
> > Cc'ing stable since apparently ChromeOS guys are seeing this in the wild (and
> > not just on artificial stress tests), see the reference.
> > 
> > Note that in upstream code doesn't calle evict_everything directly from
> > evict_something, that's an extension in this product branch. But we can still
> > hit upon this bug (and apparently we do, see the linked backtraces). I've
> > noticed this while trying to construct a testcase for this bug and utterly failed
> > to provoke it. It looks like we need to driver the system squarly into the
> > lowmem wall and provoke the shrinker to evict the context object by doing
> > the last-ditch evict_everything call.
> > 
> > Aside: There's currently no means to get a badly-fragmenting hw context
> > object away from a bad spot in the upstream code. We should fix this by at
> > least adding some code to evict_something to handle hw contexts.
> > 
> > References: https://code.google.com/p/chromium/issues/detail?id=248191
> > Reported-by: Ian Lister <ian.lister@intel.com>
> > Cc: Ian Lister <ian.lister@intel.com>
> > Cc: stable@vger.kernel.org
> > Cc: Ben Widawsky <benjamin.widawsky@intel.com>
> > Cc: Stéphane Marchesin <marcheu@chromium.org>
> > Cc: Bloomfield, Jon <jon.bloomfield@intel.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Reviewed-by: Ian Lister <ian.lister@intel.com>

Picked up for -fixes, thanks for the review.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

      reply	other threads:[~2013-12-06 12:09 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-05 14:42 [PATCH] drm/i915: Fix use-after-free in do_switch Daniel Vetter
2013-12-05 16:28 ` [Intel-gfx] " Barbalho, Rafael
2013-12-06 10:05 ` Lister, Ian
2013-12-06 12:09   ` Daniel Vetter [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131206120946.GK27344@phenom.ffwll.local \
    --to=daniel@ffwll.ch \
    --cc=benjamin.widawsky@intel.com \
    --cc=daniel.vetter@ffwll.ch \
    --cc=ian.lister@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox