From mboxrd@z Thu Jan 1 00:00:00 1970 From: Daniel Vetter Subject: Re: [Intel-gfx] [PATCH] drm/i915: Fix use-after-free in do_switch Date: Fri, 6 Dec 2013 13:09:46 +0100 Message-ID: <20131206120946.GK27344@phenom.ffwll.local> References: <1386254554-26888-1-git-send-email-daniel.vetter@ffwll.ch> <4EDB5F6A29D7384D88373AF45D7F3DF00A606ACC@IRSMSX106.ger.corp.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE Return-path: Content-Disposition: inline In-Reply-To: <4EDB5F6A29D7384D88373AF45D7F3DF00A606ACC@IRSMSX106.ger.corp.intel.com> Sender: stable-owner@vger.kernel.org To: "Lister, Ian" Cc: Daniel Vetter , Intel Graphics Development , "stable@vger.kernel.org" , "Widawsky, Benjamin" List-Id: intel-gfx@lists.freedesktop.org On Fri, Dec 06, 2013 at 10:05:51AM +0000, Lister, Ian wrote: >=20 >=20 > > -----Original Message----- > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] > > Sent: Thursday, December 05, 2013 2:43 PM > > To: Intel Graphics Development > > Cc: Daniel Vetter; Lister, Ian; stable@vger.kernel.org; Widawsky, B= enjamin; > > St=E9phane Marchesin; Bloomfield, Jon > > Subject: [PATCH] drm/i915: Fix use-after-free in do_switch > >=20 > > So apparently under ridiculous amounts of memory pressure we can ge= t into > > trouble in do_switch when we try to move the old hw context backing > > storage object onto the active lists. > >=20 > > With list debugging enabled that usually results in us chasing a po= isoned > > pointer - which means we've hit upon a vma that has been removed fr= om all > > lrus with list_del (and then deallocated, so it's a real use-after = free). > >=20 > > Ian Lister has done some great callchain chasing and noticed that w= e can > > reenter do_switch: > >=20 > > i915_gem_do_execbuffer() > >=20 > > i915_switch_context() > >=20 > > do_switch() > > from =3D ring->last_context; > > i915_gem_object_pin() > >=20 > > i915_gem_object_bind_to_gtt() > > ret =3D drm_mm_insert_node_in_range_generic(); > > // If the above call fails then it will try i915_gem_evict= _something() > > // If that fails it will call i915_gem_evict_everything() = =2E.. > > i915_gem_evict_everything() > > i915_gpu_idle() > > i915_switch_context(DEFAULT_CONTEXT) > >=20 > > Like with everything else where the shrinker or eviction code can i= nvalidate > > pointers we need to reload relevant state. > >=20 > > Note that there's no need to recheck whether a context switch is st= ill > > required because: > >=20 > > - Doing a switch to the same context is harmless (besides wasting a > > bit of energy). > >=20 > > - This can only happen with the default context. But since that one= 's > > pinned we'll never call down into evict_everything under normal > > circumstances. Note that there's a little driver bringup fun > > involved namely that we could recourse into do_switch for the > > initial switch. Atm we're fine since we assign the context pointe= r > > only after the call to do_switch at driver load or resume time. A= nd > > in the gpu reset case we skip the entire setup sequence (which mi= ght > > be a bug on its own, but definitely not this one here). > >=20 > > Cc'ing stable since apparently ChromeOS guys are seeing this in the= wild (and > > not just on artificial stress tests), see the reference. > >=20 > > Note that in upstream code doesn't calle evict_everything directly = from > > evict_something, that's an extension in this product branch. But we= can still > > hit upon this bug (and apparently we do, see the linked backtraces)= =2E I've > > noticed this while trying to construct a testcase for this bug and = utterly failed > > to provoke it. It looks like we need to driver the system squarly i= nto the > > lowmem wall and provoke the shrinker to evict the context object by= doing > > the last-ditch evict_everything call. > >=20 > > Aside: There's currently no means to get a badly-fragmenting hw con= text > > object away from a bad spot in the upstream code. We should fix thi= s by at > > least adding some code to evict_something to handle hw contexts. > >=20 > > References: https://code.google.com/p/chromium/issues/detail?id=3D2= 48191 > > Reported-by: Ian Lister > > Cc: Ian Lister > > Cc: stable@vger.kernel.org > > Cc: Ben Widawsky > > Cc: St=E9phane Marchesin > > Cc: Bloomfield, Jon > > Signed-off-by: Daniel Vetter >=20 > Reviewed-by: Ian Lister Picked up for -fixes, thanks for the review. -Daniel --=20 Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch