From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: <intel-xe@lists.freedesktop.org>,
Lucas De Marchi <lucas.demarchi@intel.com>
Subject: Re: [PATCH 1/2] drm/xe: Process deferred GGTT node removals on device unwind
Date: Mon, 23 Jun 2025 17:22:40 -0400
Message-ID: <aFnFoO-E_m3UJmBL@intel.com>
In-Reply-To: <20250612220937.857-2-michal.wajdeczko@intel.com>
On Fri, Jun 13, 2025 at 12:09:36AM +0200, Michal Wajdeczko wrote:
> While we are indirectly draining our dedicated workqueue ggtt->wq
> that we use to complete asynchronous removal of some GGTT nodes,
> this happens as part of the managed-drm unwinding (ggtt_fini_early),
> which could be later than the managed-device unwinding, where we could
> already unmap our MMIO/GSM mapping (mmio_fini).
>
> This was recently observed during unsuccessful VF initialization:
>
> [ ] xe 0000:00:02.1: probe with driver xe failed with error -62
> [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747340 __xe_bo_unpin_map_no_vm (16 bytes)
> [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747540 __xe_bo_unpin_map_no_vm (16 bytes)
> [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747240 __xe_bo_unpin_map_no_vm (16 bytes)
> [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747040 tiles_fini (16 bytes)
> [ ] xe 0000:00:02.1: DEVRES REL ffff88811e746840 mmio_fini (16 bytes)
> [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747f40 xe_bo_pinned_fini (16 bytes)
> [ ] xe 0000:00:02.1: DEVRES REL ffff88811e746b40 devm_drm_dev_init_release (16 bytes)
> [ ] xe 0000:00:02.1: [drm:drm_managed_release] drmres release begin
> [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef81640 __fini_relay (8 bytes)
> [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80d40 guc_ct_fini (8 bytes)
> [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80040 __drmm_mutex_release (8 bytes)
> [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80140 ggtt_fini_early (8 bytes)
>
> and this was leading to:
>
> [ ] BUG: unable to handle page fault for address: ffffc900058162a0
> [ ] #PF: supervisor write access in kernel mode
> [ ] #PF: error_code(0x0002) - not-present page
> [ ] Oops: Oops: 0002 [#1] SMP NOPTI
> [ ] Tainted: [W]=WARN
> [ ] Workqueue: xe-ggtt-wq ggtt_node_remove_work_func [xe]
> [ ] RIP: 0010:xe_ggtt_set_pte+0x6d/0x350 [xe]
> [ ] Call Trace:
> [ ] <TASK>
> [ ] xe_ggtt_clear+0xb0/0x270 [xe]
> [ ] ggtt_node_remove+0xbb/0x120 [xe]
> [ ] ggtt_node_remove_work_func+0x30/0x50 [xe]
> [ ] process_one_work+0x22b/0x6f0
> [ ] worker_thread+0x1e8/0x3d
>
> Add a managed-device action that will explicitly drain the workqueue
> with all pending node removals prior to releasing the MMIO/GSM mapping.
>
> Fixes: 919bb54e989c ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node")
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
> drivers/gpu/drm/xe/xe_ggtt.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> index 7b11fa1356f0..a8830cdb185f 100644
> --- a/drivers/gpu/drm/xe/xe_ggtt.c
> +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> @@ -238,6 +238,13 @@ int xe_ggtt_init_kunit(struct xe_ggtt *ggtt, u32 reserved, u32 size)
> }
> EXPORT_SYMBOL_IF_KUNIT(xe_ggtt_init_kunit);
>
> +static void dev_fini_ggtt(void *arg)
> +{
> + struct xe_ggtt *ggtt = arg;
> +
> + drain_workqueue(ggtt->wq);
> +}
> +
> /**
> * xe_ggtt_init_early - Early GGTT initialization
> * @ggtt: the &xe_ggtt to be initialized
> @@ -290,6 +297,10 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
> if (err)
> return err;
>
> + err = devm_add_action_or_reset(xe->drm.dev, dev_fini_ggtt, ggtt);
> + if (err)
> + return err;
> +
> if (IS_SRIOV_VF(xe)) {
> err = xe_tile_sriov_vf_prepare_ggtt(ggtt->tile);
> if (err)
> --
> 2.47.1
>
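For anyone following the ordering argument, here is a minimal user-space
sketch of why the extra managed-device action helps. It is not kernel
code: struct action_stack, struct model, simulate_unwind() and the
model_* helpers are all hypothetical names made up for illustration. It
assumes only what the log above shows (managed-device actions are
released before the managed-drm ones) plus the usual devres behaviour of
running actions in reverse registration order, and it assumes the MMIO
action is registered before xe_ggtt_init_early() runs.

```c
#include <assert.h>

/* Hypothetical model of a managed-action list; NOT the kernel devres API. */
#define MAX_ACTIONS 8

typedef void (*action_fn)(void *arg);

struct action_stack {
	action_fn fn[MAX_ACTIONS];
	void *arg[MAX_ACTIONS];
	int count;
};

/* Hypothetical device state: one mapping flag and a queue depth. */
struct model {
	int mmio_mapped;      /* 1 while the MMIO/GSM mapping is alive */
	int pending_removals; /* deferred GGTT node removals still queued */
	int faults;           /* removals that ran after the unmap */
};

static int stack_add(struct action_stack *s, action_fn fn, void *arg)
{
	if (s->count >= MAX_ACTIONS)
		return -1;
	s->fn[s->count] = fn;
	s->arg[s->count] = arg;
	s->count++;
	return 0;
}

/* Release in reverse registration order, like devres does. */
static void stack_release(struct action_stack *s)
{
	while (s->count > 0) {
		s->count--;
		s->fn[s->count](s->arg[s->count]);
	}
}

/* Stand-in for draining ggtt->wq: each removal touches the mapping. */
static void model_drain(void *arg)
{
	struct model *m = arg;

	assert(m->pending_removals >= 0);
	while (m->pending_removals > 0) {
		if (!m->mmio_mapped)
			m->faults++; /* would be the page fault in xe_ggtt_set_pte */
		m->pending_removals--;
	}
}

/* Stand-in for mmio_fini: drops the mapping. */
static void model_mmio_fini(void *arg)
{
	struct model *m = arg;

	m->mmio_mapped = 0;
}

/* Returns the number of removals that ran without a mapping, or -1. */
static int simulate_unwind(int with_fix)
{
	struct model m = { .mmio_mapped = 1, .pending_removals = 3, .faults = 0 };
	struct action_stack dev = { .count = 0 }, drm = { .count = 0 };

	/* Assumed probe order: MMIO first, then the early GGTT init. */
	if (stack_add(&dev, model_mmio_fini, &m))
		return -1;
	if (stack_add(&drm, model_drain, &m)) /* like ggtt_fini_early */
		return -1;
	if (with_fix && stack_add(&dev, model_drain, &m)) /* like dev_fini_ggtt */
		return -1;

	/* Failed probe: managed-device actions go first, then managed-drm. */
	stack_release(&dev);
	stack_release(&drm);
	return m.faults;
}
```

Calling simulate_unwind(0) models the current behaviour, where the only
drain happens on the managed-drm side after the mapping is gone, while
simulate_unwind(1) models the patched ordering, where the drain is
registered after the MMIO action and therefore runs before it on unwind.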
Thread overview: 13+ messages
2025-06-12 22:09 [PATCH 0/2] Improve handling of aborted probe Michal Wajdeczko
2025-06-12 22:09 ` [PATCH 1/2] drm/xe: Process deferred GGTT node removals on device unwind Michal Wajdeczko
2025-06-23 21:22 ` Rodrigo Vivi [this message]
2025-06-25 14:20 ` Maarten Lankhorst
2025-06-12 22:09 ` [PATCH 2/2] drm/xe/guc: Explicitly exit CT safe mode on unwind Michal Wajdeczko
2025-06-23 21:29 ` Rodrigo Vivi
2025-06-24 15:32 ` Matthew Brost
2025-06-24 21:30 ` Rodrigo Vivi
2025-06-25 7:51 ` Michal Wajdeczko
2025-06-13 5:23 ` ✗ CI.checkpatch: warning for Improve handling of aborted probe Patchwork
2025-06-13 5:24 ` ✓ CI.KUnit: success " Patchwork
2025-06-13 6:51 ` ✓ Xe.CI.BAT: " Patchwork
2025-06-14 10:47 ` ✓ Xe.CI.Full: " Patchwork