* [PATCH] drm/i915: Fix use-after-free in do_switch
@ 2013-12-05 14:42 Daniel Vetter
2013-12-05 16:28 ` [Intel-gfx] " Barbalho, Rafael
2013-12-06 10:05 ` Lister, Ian
0 siblings, 2 replies; 4+ messages in thread
From: Daniel Vetter @ 2013-12-05 14:42 UTC (permalink / raw)
To: Intel Graphics Development
Cc: Daniel Vetter, Ian Lister, stable, Ben Widawsky,
Stéphane Marchesin, Bloomfield, Jon
So apparently under ridiculous amounts of memory pressure we can get
into trouble in do_switch when we try to move the old hw context
backing storage object onto the active lists.
With list debugging enabled that usually results in us chasing a
poisoned pointer - which means we've hit upon a vma that has been
removed from all lrus with list_del (and then deallocated, so it's a
real use-after free).
Ian Lister has done some great callchain chasing and noticed that we
can reenter do_switch:
i915_gem_do_execbuffer()
i915_switch_context()
do_switch()
from = ring->last_context;
i915_gem_object_pin()
i915_gem_object_bind_to_gtt()
ret = drm_mm_insert_node_in_range_generic();
// If the above call fails then it will try i915_gem_evict_something()
// If that fails it will call i915_gem_evict_everything() ...
i915_gem_evict_everything()
i915_gpu_idle()
i915_switch_context(DEFAULT_CONTEXT)
Like with everything else where the shrinker or eviction code can
invalidate pointers we need to reload relevant state.
Note that there's no need to recheck whether a context switch is still
required because:
- Doing a switch to the same context is harmless (besides wasting a
bit of energy).
- This can only happen with the default context. But since that one's
pinned we'll never call down into evict_everything under normal
circumstances. Note that there's a little driver bringup fun
involved namely that we could recourse into do_switch for the
initial switch. Atm we're fine since we assign the context pointer
only after the call to do_switch at driver load or resume time. And
in the gpu reset case we skip the entire setup sequence (which might
be a bug on its own, but definitely not this one here).
Cc'ing stable since apparently ChromeOS guys are seeing this in the
wild (and not just on artificial stress tests), see the reference.
Note that in upstream code doesn't calle evict_everything directly
from evict_something, that's an extension in this product branch. But
we can still hit upon this bug (and apparently we do, see the linked
backtraces). I've noticed this while trying to construct a testcase
for this bug and utterly failed to provoke it. It looks like we need
to driver the system squarly into the lowmem wall and provoke the
shrinker to evict the context object by doing the last-ditch
evict_everything call.
Aside: There's currently no means to get a badly-fragmenting hw
context object away from a bad spot in the upstream code. We should
fix this by at least adding some code to evict_something to handle hw
contexts.
References: https://code.google.com/p/chromium/issues/detail?id=248191
Reported-by: Ian Lister <ian.lister@intel.com>
Cc: Ian Lister <ian.lister@intel.com>
Cc: stable@vger.kernel.org
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Stéphane Marchesin <marcheu@chromium.org>
Cc: Bloomfield, Jon <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
drivers/gpu/drm/i915/i915_gem_context.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 41877045a1a0..2d2877493f61 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -421,11 +421,21 @@ static int do_switch(struct i915_hw_context *to)
if (ret)
return ret;
- /* Clear this page out of any CPU caches for coherent swap-in/out. Note
+ /*
+ * Pin can switch back to the default context if we end up calling into
+ * evict_everything - as a last ditch gtt defrag effort that also
+ * switches to the default context. Hence we need to reload from here.
+ */
+ from = ring->last_context;
+
+ /*
+ * Clear this page out of any CPU caches for coherent swap-in/out. Note
* that thanks to write = false in this call and us not setting any gpu
* write domains when putting a context object onto the active list
* (when switching away from it), this won't block.
- * XXX: We need a real interface to do this instead of trickery. */
+ *
+ * XXX: We need a real interface to do this instead of trickery.
+ */
ret = i915_gem_object_set_to_gtt_domain(to->obj, false);
if (ret) {
i915_gem_object_unpin(to->obj);
--
1.8.4.3
^ permalink raw reply related [flat|nested] 4+ messages in thread* RE: [Intel-gfx] [PATCH] drm/i915: Fix use-after-free in do_switch
2013-12-05 14:42 [PATCH] drm/i915: Fix use-after-free in do_switch Daniel Vetter
@ 2013-12-05 16:28 ` Barbalho, Rafael
2013-12-06 10:05 ` Lister, Ian
1 sibling, 0 replies; 4+ messages in thread
From: Barbalho, Rafael @ 2013-12-05 16:28 UTC (permalink / raw)
To: Daniel Vetter, Intel Graphics Development
Cc: Lister, Ian, stable@vger.kernel.org, Widawsky, Benjamin
> -----Original Message-----
> From: intel-gfx-bounces@lists.freedesktop.org [mailto:intel-gfx-
> bounces@lists.freedesktop.org] On Behalf Of Daniel Vetter
> Sent: Thursday, December 05, 2013 2:43 PM
> To: Intel Graphics Development
> Cc: Lister, Ian; Daniel Vetter; stable@vger.kernel.org; Widawsky, Benjamin
> Subject: [Intel-gfx] [PATCH] drm/i915: Fix use-after-free in do_switch
>
> So apparently under ridiculous amounts of memory pressure we can get into
> trouble in do_switch when we try to move the old hw context backing
> storage object onto the active lists.
>
[SNIP]
>
> Aside: There's currently no means to get a badly-fragmenting hw context
> object away from a bad spot in the upstream code. We should fix this by at
> least adding some code to evict_something to handle hw contexts.
>
> References: https://code.google.com/p/chromium/issues/detail?id=248191
> Reported-by: Ian Lister <ian.lister@intel.com>
> Cc: Ian Lister <ian.lister@intel.com>
> Cc: stable@vger.kernel.org
> Cc: Ben Widawsky <benjamin.widawsky@intel.com>
> Cc: Stéphane Marchesin <marcheu@chromium.org>
> Cc: Bloomfield, Jon <jon.bloomfield@intel.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Rafael Barbalho <rafael.barbalho@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_context.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
> b/drivers/gpu/drm/i915/i915_gem_context.c
> index 41877045a1a0..2d2877493f61 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -421,11 +421,21 @@ static int do_switch(struct i915_hw_context *to)
> if (ret)
> return ret;
>
> - /* Clear this page out of any CPU caches for coherent swap-in/out.
> Note
> + /*
> + * Pin can switch back to the default context if we end up calling into
> + * evict_everything - as a last ditch gtt defrag effort that also
> + * switches to the default context. Hence we need to reload from
> here.
> + */
> + from = ring->last_context;
> +
> + /*
> + * Clear this page out of any CPU caches for coherent swap-in/out.
> +Note
> * that thanks to write = false in this call and us not setting any gpu
> * write domains when putting a context object onto the active list
> * (when switching away from it), this won't block.
> - * XXX: We need a real interface to do this instead of trickery. */
> + *
> + * XXX: We need a real interface to do this instead of trickery.
> + */
> ret = i915_gem_object_set_to_gtt_domain(to->obj, false);
> if (ret) {
> i915_gem_object_unpin(to->obj);
> --
> 1.8.4.3
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH] drm/i915: Fix use-after-free in do_switch
2013-12-05 14:42 [PATCH] drm/i915: Fix use-after-free in do_switch Daniel Vetter
2013-12-05 16:28 ` [Intel-gfx] " Barbalho, Rafael
@ 2013-12-06 10:05 ` Lister, Ian
2013-12-06 12:09 ` [Intel-gfx] " Daniel Vetter
1 sibling, 1 reply; 4+ messages in thread
From: Lister, Ian @ 2013-12-06 10:05 UTC (permalink / raw)
To: Daniel Vetter, Intel Graphics Development
Cc: stable@vger.kernel.org, Widawsky, Benjamin
> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch]
> Sent: Thursday, December 05, 2013 2:43 PM
> To: Intel Graphics Development
> Cc: Daniel Vetter; Lister, Ian; stable@vger.kernel.org; Widawsky, Benjamin;
> Stéphane Marchesin; Bloomfield, Jon
> Subject: [PATCH] drm/i915: Fix use-after-free in do_switch
>
> So apparently under ridiculous amounts of memory pressure we can get into
> trouble in do_switch when we try to move the old hw context backing
> storage object onto the active lists.
>
> With list debugging enabled that usually results in us chasing a poisoned
> pointer - which means we've hit upon a vma that has been removed from all
> lrus with list_del (and then deallocated, so it's a real use-after free).
>
> Ian Lister has done some great callchain chasing and noticed that we can
> reenter do_switch:
>
> i915_gem_do_execbuffer()
>
> i915_switch_context()
>
> do_switch()
> from = ring->last_context;
> i915_gem_object_pin()
>
> i915_gem_object_bind_to_gtt()
> ret = drm_mm_insert_node_in_range_generic();
> // If the above call fails then it will try i915_gem_evict_something()
> // If that fails it will call i915_gem_evict_everything() ...
> i915_gem_evict_everything()
> i915_gpu_idle()
> i915_switch_context(DEFAULT_CONTEXT)
>
> Like with everything else where the shrinker or eviction code can invalidate
> pointers we need to reload relevant state.
>
> Note that there's no need to recheck whether a context switch is still
> required because:
>
> - Doing a switch to the same context is harmless (besides wasting a
> bit of energy).
>
> - This can only happen with the default context. But since that one's
> pinned we'll never call down into evict_everything under normal
> circumstances. Note that there's a little driver bringup fun
> involved namely that we could recourse into do_switch for the
> initial switch. Atm we're fine since we assign the context pointer
> only after the call to do_switch at driver load or resume time. And
> in the gpu reset case we skip the entire setup sequence (which might
> be a bug on its own, but definitely not this one here).
>
> Cc'ing stable since apparently ChromeOS guys are seeing this in the wild (and
> not just on artificial stress tests), see the reference.
>
> Note that in upstream code doesn't calle evict_everything directly from
> evict_something, that's an extension in this product branch. But we can still
> hit upon this bug (and apparently we do, see the linked backtraces). I've
> noticed this while trying to construct a testcase for this bug and utterly failed
> to provoke it. It looks like we need to driver the system squarly into the
> lowmem wall and provoke the shrinker to evict the context object by doing
> the last-ditch evict_everything call.
>
> Aside: There's currently no means to get a badly-fragmenting hw context
> object away from a bad spot in the upstream code. We should fix this by at
> least adding some code to evict_something to handle hw contexts.
>
> References: https://code.google.com/p/chromium/issues/detail?id=248191
> Reported-by: Ian Lister <ian.lister@intel.com>
> Cc: Ian Lister <ian.lister@intel.com>
> Cc: stable@vger.kernel.org
> Cc: Ben Widawsky <benjamin.widawsky@intel.com>
> Cc: Stéphane Marchesin <marcheu@chromium.org>
> Cc: Bloomfield, Jon <jon.bloomfield@intel.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Ian Lister <ian.lister@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_context.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
> b/drivers/gpu/drm/i915/i915_gem_context.c
> index 41877045a1a0..2d2877493f61 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -421,11 +421,21 @@ static int do_switch(struct i915_hw_context *to)
> if (ret)
> return ret;
>
> - /* Clear this page out of any CPU caches for coherent swap-in/out.
> Note
> + /*
> + * Pin can switch back to the default context if we end up calling into
> + * evict_everything - as a last ditch gtt defrag effort that also
> + * switches to the default context. Hence we need to reload from
> here.
> + */
> + from = ring->last_context;
> +
> + /*
> + * Clear this page out of any CPU caches for coherent swap-in/out.
> +Note
> * that thanks to write = false in this call and us not setting any gpu
> * write domains when putting a context object onto the active list
> * (when switching away from it), this won't block.
> - * XXX: We need a real interface to do this instead of trickery. */
> + *
> + * XXX: We need a real interface to do this instead of trickery.
> + */
> ret = i915_gem_object_set_to_gtt_domain(to->obj, false);
> if (ret) {
> i915_gem_object_unpin(to->obj);
> --
> 1.8.4.3
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [Intel-gfx] [PATCH] drm/i915: Fix use-after-free in do_switch
2013-12-06 10:05 ` Lister, Ian
@ 2013-12-06 12:09 ` Daniel Vetter
0 siblings, 0 replies; 4+ messages in thread
From: Daniel Vetter @ 2013-12-06 12:09 UTC (permalink / raw)
To: Lister, Ian
Cc: Daniel Vetter, Intel Graphics Development, stable@vger.kernel.org,
Widawsky, Benjamin
On Fri, Dec 06, 2013 at 10:05:51AM +0000, Lister, Ian wrote:
>
>
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch]
> > Sent: Thursday, December 05, 2013 2:43 PM
> > To: Intel Graphics Development
> > Cc: Daniel Vetter; Lister, Ian; stable@vger.kernel.org; Widawsky, Benjamin;
> > Stéphane Marchesin; Bloomfield, Jon
> > Subject: [PATCH] drm/i915: Fix use-after-free in do_switch
> >
> > So apparently under ridiculous amounts of memory pressure we can get into
> > trouble in do_switch when we try to move the old hw context backing
> > storage object onto the active lists.
> >
> > With list debugging enabled that usually results in us chasing a poisoned
> > pointer - which means we've hit upon a vma that has been removed from all
> > lrus with list_del (and then deallocated, so it's a real use-after free).
> >
> > Ian Lister has done some great callchain chasing and noticed that we can
> > reenter do_switch:
> >
> > i915_gem_do_execbuffer()
> >
> > i915_switch_context()
> >
> > do_switch()
> > from = ring->last_context;
> > i915_gem_object_pin()
> >
> > i915_gem_object_bind_to_gtt()
> > ret = drm_mm_insert_node_in_range_generic();
> > // If the above call fails then it will try i915_gem_evict_something()
> > // If that fails it will call i915_gem_evict_everything() ...
> > i915_gem_evict_everything()
> > i915_gpu_idle()
> > i915_switch_context(DEFAULT_CONTEXT)
> >
> > Like with everything else where the shrinker or eviction code can invalidate
> > pointers we need to reload relevant state.
> >
> > Note that there's no need to recheck whether a context switch is still
> > required because:
> >
> > - Doing a switch to the same context is harmless (besides wasting a
> > bit of energy).
> >
> > - This can only happen with the default context. But since that one's
> > pinned we'll never call down into evict_everything under normal
> > circumstances. Note that there's a little driver bringup fun
> > involved namely that we could recourse into do_switch for the
> > initial switch. Atm we're fine since we assign the context pointer
> > only after the call to do_switch at driver load or resume time. And
> > in the gpu reset case we skip the entire setup sequence (which might
> > be a bug on its own, but definitely not this one here).
> >
> > Cc'ing stable since apparently ChromeOS guys are seeing this in the wild (and
> > not just on artificial stress tests), see the reference.
> >
> > Note that in upstream code doesn't calle evict_everything directly from
> > evict_something, that's an extension in this product branch. But we can still
> > hit upon this bug (and apparently we do, see the linked backtraces). I've
> > noticed this while trying to construct a testcase for this bug and utterly failed
> > to provoke it. It looks like we need to driver the system squarly into the
> > lowmem wall and provoke the shrinker to evict the context object by doing
> > the last-ditch evict_everything call.
> >
> > Aside: There's currently no means to get a badly-fragmenting hw context
> > object away from a bad spot in the upstream code. We should fix this by at
> > least adding some code to evict_something to handle hw contexts.
> >
> > References: https://code.google.com/p/chromium/issues/detail?id=248191
> > Reported-by: Ian Lister <ian.lister@intel.com>
> > Cc: Ian Lister <ian.lister@intel.com>
> > Cc: stable@vger.kernel.org
> > Cc: Ben Widawsky <benjamin.widawsky@intel.com>
> > Cc: Stéphane Marchesin <marcheu@chromium.org>
> > Cc: Bloomfield, Jon <jon.bloomfield@intel.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> Reviewed-by: Ian Lister <ian.lister@intel.com>
Picked up for -fixes, thanks for the review.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2013-12-06 12:09 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-12-05 14:42 [PATCH] drm/i915: Fix use-after-free in do_switch Daniel Vetter
2013-12-05 16:28 ` [Intel-gfx] " Barbalho, Rafael
2013-12-06 10:05 ` Lister, Ian
2013-12-06 12:09 ` [Intel-gfx] " Daniel Vetter
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox