From: Jani Nikula <jani.nikula@linux.intel.com>
To: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>,
intel-gfx@lists.freedesktop.org
Cc: Andrzej Hajda <andrzej.hajda@intel.com>,
dri-devel@lists.freedesktop.org,
Rodrigo Vivi <rodrigo.vivi@intel.com>,
Chris Wilson <chris.p.wilson@linux.intel.com>,
Nirmoy Das <nirmoy.das@intel.com>
Subject: Re: [Intel-gfx] [PATCH] drm/i915/vma: Fix potential UAF on multi-tile platforms
Date: Mon, 06 Nov 2023 11:53:11 +0200 [thread overview]
Message-ID: <87bkc71a48.fsf@intel.com> (raw)
In-Reply-To: <20231106091603.231100-2-janusz.krzysztofik@linux.intel.com>
On Mon, 06 Nov 2023, Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> wrote:
> Object debugging tools were sporadically reporting illegal attempts to
> free a still active i915 VMA object from when parking a GPU tile believed
> to be idle.
>
> [161.359441] ODEBUG: free active (active state 0) object: ffff88811643b958 object type: i915_active hint: __i915_vma_active+0x0/0x50 [i915]
> [161.360082] WARNING: CPU: 5 PID: 276 at lib/debugobjects.c:514 debug_print_object+0x80/0xb0
> ...
> [161.360304] CPU: 5 PID: 276 Comm: kworker/5:2 Not tainted 6.5.0-rc1-CI_DRM_13375-g003f860e5577+ #1
> [161.360314] Hardware name: Intel Corporation Rocket Lake Client Platform/RocketLake S UDIMM 6L RVP, BIOS RKLSFWI1.R00.3173.A03.2204210138 04/21/2022
> [161.360322] Workqueue: i915-unordered __intel_wakeref_put_work [i915]
> [161.360592] RIP: 0010:debug_print_object+0x80/0xb0
> ...
> [161.361347] debug_object_free+0xeb/0x110
> [161.361362] i915_active_fini+0x14/0x130 [i915]
> [161.361866] release_references+0xfe/0x1f0 [i915]
> [161.362543] i915_vma_parked+0x1db/0x380 [i915]
> [161.363129] __gt_park+0x121/0x230 [i915]
> [161.363515] ____intel_wakeref_put_last+0x1f/0x70 [i915]
>
> That has been tracked down to be happening when another thread was
> deactivating the VMA inside __active_retire() helper, after the VMA's
> active counter was already decremented to 0, but before deactivation of
> the VMA's object was reported to the object debugging tools. Root cause
> has been identified as premature release of last wakeref for the GPU tile
> to which the active VMA belonged.
>
> In case of single tile platforms, an engine associated with a request that
> uses the VMA is keeping the tile's wakeref long enough for that VMA to be
> deactivated on time, before it is going to be freed. However, on multi-
> tile platforms, a request may use a VMA from a tile other than the one
> that hosts the request's engine, then, not protected with the engine's
> wakeref.
>
> Get an extra wakeref for the VMA's tile when activating it, and put that
> wakeref only after the VMA is deactivated. However, exclude GGTT from
> that processing path, otherwise the GPU never goes idle. Since
> __i915_vma_retire() may be called from atomic contexts, use async variant
> of wakeref put.
>
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/8875
> Fixes: Fixes: 213c43676beb ("drm/i915/mtl: Remove the 'force_probe' requirement for Meteor Lake")
I get the motivation, but this is hardly true. Also, double Fixes.
BR,
Jani.
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_vma.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index d09aad34ba37f..70c68f614c6db 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -34,6 +34,7 @@
> #include "gt/intel_engine.h"
> #include "gt/intel_engine_heartbeat.h"
> #include "gt/intel_gt.h"
> +#include "gt/intel_gt_pm.h"
> #include "gt/intel_gt_requests.h"
> #include "gt/intel_tlb.h"
>
> @@ -103,12 +104,25 @@ static inline struct i915_vma *active_to_vma(struct i915_active *ref)
>
> static int __i915_vma_active(struct i915_active *ref)
> {
> - return i915_vma_tryget(active_to_vma(ref)) ? 0 : -ENOENT;
> + struct i915_vma *vma = active_to_vma(ref);
> +
> + if (!i915_vma_tryget(vma))
> + return -ENOENT;
> +
> + if (!i915_vma_is_ggtt(vma))
> + intel_gt_pm_get(vma->vm->gt);
> +
> + return 0;
> }
>
> static void __i915_vma_retire(struct i915_active *ref)
> {
> - i915_vma_put(active_to_vma(ref));
> + struct i915_vma *vma = active_to_vma(ref);
> +
> + if (!i915_vma_is_ggtt(vma))
> + intel_gt_pm_put_async(vma->vm->gt);
> +
> + i915_vma_put(vma);
> }
>
> static struct i915_vma *
--
Jani Nikula, Intel
WARNING: multiple messages have this Message-ID (diff)
From: Jani Nikula <jani.nikula@linux.intel.com>
To: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>,
intel-gfx@lists.freedesktop.org
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
Andrzej Hajda <andrzej.hajda@intel.com>,
dri-devel@lists.freedesktop.org,
Andi Shyti <andi.shyti@linux.intel.com>,
Rodrigo Vivi <rodrigo.vivi@intel.com>,
Chris Wilson <chris.p.wilson@linux.intel.com>,
Nirmoy Das <nirmoy.das@intel.com>
Subject: Re: [PATCH] drm/i915/vma: Fix potential UAF on multi-tile platforms
Date: Mon, 06 Nov 2023 11:53:11 +0200 [thread overview]
Message-ID: <87bkc71a48.fsf@intel.com> (raw)
In-Reply-To: <20231106091603.231100-2-janusz.krzysztofik@linux.intel.com>
On Mon, 06 Nov 2023, Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> wrote:
> Object debugging tools were sporadically reporting illegal attempts to
> free a still active i915 VMA object from when parking a GPU tile believed
> to be idle.
>
> [161.359441] ODEBUG: free active (active state 0) object: ffff88811643b958 object type: i915_active hint: __i915_vma_active+0x0/0x50 [i915]
> [161.360082] WARNING: CPU: 5 PID: 276 at lib/debugobjects.c:514 debug_print_object+0x80/0xb0
> ...
> [161.360304] CPU: 5 PID: 276 Comm: kworker/5:2 Not tainted 6.5.0-rc1-CI_DRM_13375-g003f860e5577+ #1
> [161.360314] Hardware name: Intel Corporation Rocket Lake Client Platform/RocketLake S UDIMM 6L RVP, BIOS RKLSFWI1.R00.3173.A03.2204210138 04/21/2022
> [161.360322] Workqueue: i915-unordered __intel_wakeref_put_work [i915]
> [161.360592] RIP: 0010:debug_print_object+0x80/0xb0
> ...
> [161.361347] debug_object_free+0xeb/0x110
> [161.361362] i915_active_fini+0x14/0x130 [i915]
> [161.361866] release_references+0xfe/0x1f0 [i915]
> [161.362543] i915_vma_parked+0x1db/0x380 [i915]
> [161.363129] __gt_park+0x121/0x230 [i915]
> [161.363515] ____intel_wakeref_put_last+0x1f/0x70 [i915]
>
> That has been tracked down to be happening when another thread was
> deactivating the VMA inside __active_retire() helper, after the VMA's
> active counter was already decremented to 0, but before deactivation of
> the VMA's object was reported to the object debugging tools. Root cause
> has been identified as premature release of last wakeref for the GPU tile
> to which the active VMA belonged.
>
> In case of single tile platforms, an engine associated with a request that
> uses the VMA is keeping the tile's wakeref long enough for that VMA to be
> deactivated on time, before it is going to be freed. However, on multi-
> tile platforms, a request may use a VMA from a tile other than the one
> that hosts the request's engine, then, not protected with the engine's
> wakeref.
>
> Get an extra wakeref for the VMA's tile when activating it, and put that
> wakeref only after the VMA is deactivated. However, exclude GGTT from
> that processing path, otherwise the GPU never goes idle. Since
> __i915_vma_retire() may be called from atomic contexts, use async variant
> of wakeref put.
>
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/8875
> Fixes: Fixes: 213c43676beb ("drm/i915/mtl: Remove the 'force_probe' requirement for Meteor Lake")
I get the motivation, but this is hardly true. Also, double Fixes.
BR,
Jani.
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_vma.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index d09aad34ba37f..70c68f614c6db 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -34,6 +34,7 @@
> #include "gt/intel_engine.h"
> #include "gt/intel_engine_heartbeat.h"
> #include "gt/intel_gt.h"
> +#include "gt/intel_gt_pm.h"
> #include "gt/intel_gt_requests.h"
> #include "gt/intel_tlb.h"
>
> @@ -103,12 +104,25 @@ static inline struct i915_vma *active_to_vma(struct i915_active *ref)
>
> static int __i915_vma_active(struct i915_active *ref)
> {
> - return i915_vma_tryget(active_to_vma(ref)) ? 0 : -ENOENT;
> + struct i915_vma *vma = active_to_vma(ref);
> +
> + if (!i915_vma_tryget(vma))
> + return -ENOENT;
> +
> + if (!i915_vma_is_ggtt(vma))
> + intel_gt_pm_get(vma->vm->gt);
> +
> + return 0;
> }
>
> static void __i915_vma_retire(struct i915_active *ref)
> {
> - i915_vma_put(active_to_vma(ref));
> + struct i915_vma *vma = active_to_vma(ref);
> +
> + if (!i915_vma_is_ggtt(vma))
> + intel_gt_pm_put_async(vma->vm->gt);
> +
> + i915_vma_put(vma);
> }
>
> static struct i915_vma *
--
Jani Nikula, Intel
next prev parent reply other threads:[~2023-11-06 9:53 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-06 9:16 [Intel-gfx] [PATCH] drm/i915/vma: Fix potential UAF on multi-tile platforms Janusz Krzysztofik
2023-11-06 9:16 ` Janusz Krzysztofik
2023-11-06 9:53 ` Jani Nikula [this message]
2023-11-06 9:53 ` Jani Nikula
2023-11-06 10:22 ` [Intel-gfx] " Janusz Krzysztofik
2023-11-06 10:22 ` Janusz Krzysztofik
2023-11-06 12:41 ` [Intel-gfx] " Jani Nikula
2023-11-06 12:41 ` Jani Nikula
2023-11-06 19:16 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for " Patchwork
2023-11-07 9:56 ` Janusz Krzysztofik
2023-11-07 17:24 ` Illipilli, TejasreeX
2023-11-07 16:59 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2023-11-08 10:03 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87bkc71a48.fsf@intel.com \
--to=jani.nikula@linux.intel.com \
--cc=andrzej.hajda@intel.com \
--cc=chris.p.wilson@linux.intel.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=janusz.krzysztofik@linux.intel.com \
--cc=nirmoy.das@intel.com \
--cc=rodrigo.vivi@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.