From: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
To: "Gote, Nitin R" <nitin.r.gote@intel.com>,
Kamil Konieczny <kamil.konieczny@linux.intel.com>
Cc: "igt-dev@lists.freedesktop.org" <igt-dev@lists.freedesktop.org>,
"Auld, Matthew" <matthew.auld@intel.com>
Subject: Re: [PATCH] tests/core_hotunplug: exercise fd-holding subtests with Xe workload
Date: Fri, 08 May 2026 11:04:14 +0200 [thread overview]
Message-ID: <a4fb5c49367356423c3c52e0dbd0da7f010723ea.camel@linux.intel.com> (raw)
In-Reply-To: <SA3PR11MB81181CA7FFDE5B350CDEBD51D03F2@SA3PR11MB8118.namprd11.prod.outlook.com>
Hi Nitin,
On Wed, 2026-05-06 at 14:37 +0000, Gote, Nitin R wrote:
> Hi Kamil,
>
> > -----Original Message-----
> > From: Kamil Konieczny <kamil.konieczny@linux.intel.com>
> > Sent: Wednesday, May 6, 2026 7:48 PM
> > To: Gote, Nitin R <nitin.r.gote@intel.com>
> > Cc: igt-dev@lists.freedesktop.org; Auld, Matthew <matthew.auld@intel.com>;
> > Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > Subject: Re: [PATCH] tests/core_hotunplug: exercise fd-holding subtests with Xe
> > workload
> >
> > Hi Nitin,
> > On 2026-05-06 at 19:27:15 +0530, Nitin Gote wrote:
> > > The six fd-holding subtests (hotunbind-rebind, hotunplug-rescan,
> > > hotrebind, hotreplug, hotrebind-lateclose, hotreplug-lateclose) keep a
> > > DRM fd open across unbind/unplug but leave the GPU idle, so on Xe the
> > > unbind/unplug path is never exercised against an active VM, exec queue
> > > and BOs.
> > >
> > > Add xe_workload_start()/xe_workload_stop() helpers, gated on
> > > is_xe_device(), that run a spinner on the GPU across the unbind/unplug
> > > sequence, so the device is removed while a workload is using it. i915
> > > already has equivalent coverage via local_i915_healthcheck().
> > >
> > > Validated on BMG with a KASAN-enabled kernel: all subtests pass with
> > > no KASAN reports.
> > >
> > > Cc: Matthew Auld <matthew.auld@intel.com>
> > > Assisted-by: Copilot:claude-opus-4.7
> > > Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
> > > ---
> > > tests/core_hotunplug.c | 74
> > > ++++++++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 74 insertions(+)
> > >
> > > diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c index
> > > 7c9dae1bf..88a2133af 100644
> > > --- a/tests/core_hotunplug.c
> > > +++ b/tests/core_hotunplug.c
> > > @@ -565,6 +565,44 @@ static void set_filter_from_device(int fd)
> > > igt_assert_eq(igt_device_filter_add(filter), 1); }
> > >
> > > +/* Xe GPU workload helpers */
> > > +
> > > +struct xe_workload {
> > > + igt_spin_t *xe_spin;
> > > + uint64_t xe_ahnd;
> > > +};
> > > +
> > > +static void xe_workload_start(struct hotunplug *priv, int fd,
> > > + struct xe_workload *w)
> > > +{
> > > + if (!is_xe_device(fd))
> > > + return;
> > > +
> > > + local_debug("%s\n", "starting Xe GPU spinner");
> > > + priv->failure = "Xe workload start failure!";
> > > + w->xe_ahnd = intel_allocator_open(fd, 0, INTEL_ALLOCATOR_RELOC);
> > > + w->xe_spin = igt_spin_new(fd, .ahnd = w->xe_ahnd);
> > > + priv->failure = NULL;
> > > +}
> > > +
> > > +static void xe_workload_stop(struct xe_workload *w) {
> > > + if (!w->xe_spin)
> > > + return;
> > > +
> > > + /*
> > > + * The device may be gone (unbind/unplug), so avoid igt_spin_free():
> > > + * on Xe it asserts in xe_spin_sync_wait() because the syncobj will
> > > + * never signal. Stop the spinner via a userspace mmap write and let
> > > + * the DRM fd close reclaim the kernel-side resources.
> > > + */
> > > + local_debug("%s\n", "stopping Xe GPU spinner");
> > > + igt_spin_end(w->xe_spin);
> > > + put_ahnd(w->xe_ahnd);
> > > + w->xe_spin = NULL;
> > > + w->xe_ahnd = 0;
> > > +}
> > > +
> > > /* Subtests */
> > >
> > > static void unbind_rebind(struct hotunplug *priv) @@ -591,12 +629,18
> > > @@ static void unplug_rescan(struct hotunplug *priv)
> > >
> > > static void hotunbind_rebind(struct hotunplug *priv) {
> > > + struct xe_workload w = { 0 };
> > > +
> > > pre_check(priv);
> > >
> > > priv->fd.drm = local_drm_open_driver(false, "", " for hot unbind");
> > >
> > > + xe_workload_start(priv, priv->fd.drm, &w);
> > > +
> > > driver_unbind(priv, "hot ", 0);
> > >
> > > + xe_workload_stop(&w);
> > > +
> >
> > This makes these hot* tests Xe driver only which is not a way to go.
> > +Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> >
>
> Thank you for the review, but it is not the case.
> xe_workload_start() checks is_xe_device(fd) and returns immediately on non-Xe drivers, leaving w->xe_spin NULL;
> and xe_workload_stop() then returns early at the !w->xe_spin check.
> So, on i915 the hot* tests work exactly as before.
1. Please don't change scope of existing subtests. If you need
to test hot unplug with a background workload then please add a new
dedicated subtest.
2. Please keep the test hardware agnostic. That also means, don't call
unconditionally functions which names suggest applicability to a
specific platform. That's misleading. You may e.g. follow a code
pattern now used when a platform specific healtcheck is called.
Thanks,
Janusz
>
> - Nitin
>
> > Regards,
> > Kamil
> >
> > > priv->fd.drm = close_device(priv->fd.drm, "late ", "unbound ");
> > > igt_assert_eq(priv->fd.drm, -1);
> > >
> > > @@ -607,12 +651,18 @@ static void hotunbind_rebind(struct hotunplug
> > > *priv)
> > >
> > > static void hotunplug_rescan(struct hotunplug *priv) {
> > > + struct xe_workload w = { 0 };
> > > +
> > > pre_check(priv);
> > >
> > > priv->fd.drm = local_drm_open_driver(false, "", " for hot unplug");
> > >
> > > + xe_workload_start(priv, priv->fd.drm, &w);
> > > +
> > > device_unplug(priv, "hot ", 0);
> > >
> > > + xe_workload_stop(&w);
> > > +
> > > priv->fd.drm = close_device(priv->fd.drm, "late ", "removed ");
> > > igt_assert_eq(priv->fd.drm, -1);
> > >
> > > @@ -623,40 +673,58 @@ static void hotunplug_rescan(struct hotunplug
> > > *priv)
> > >
> > > static void hotrebind(struct hotunplug *priv) {
> > > + struct xe_workload w = { 0 };
> > > +
> > > pre_check(priv);
> > >
> > > priv->fd.drm = local_drm_open_driver(false, "", " for hot rebind");
> > >
> > > + xe_workload_start(priv, priv->fd.drm, &w);
> > > +
> > > driver_unbind(priv, "hot ", 60);
> > >
> > > driver_bind(priv, 0);
> > >
> > > + xe_workload_stop(&w);
> > > +
> > > igt_assert_f(healthcheck(priv, false), "%s\n", priv->failure); }
> > >
> > > static void hotreplug(struct hotunplug *priv) {
> > > + struct xe_workload w = { 0 };
> > > +
> > > pre_check(priv);
> > >
> > > priv->fd.drm = local_drm_open_driver(false, "", " for hot replug");
> > >
> > > + xe_workload_start(priv, priv->fd.drm, &w);
> > > +
> > > device_unplug(priv, "hot ", 60);
> > >
> > > bus_rescan(priv, 0);
> > >
> > > + xe_workload_stop(&w);
> > > +
> > > igt_assert_f(healthcheck(priv, false), "%s\n", priv->failure); }
> > >
> > > static void hotrebind_lateclose(struct hotunplug *priv) {
> > > + struct xe_workload w = { 0 };
> > > +
> > > pre_check(priv);
> > >
> > > priv->fd.drm = local_drm_open_driver(false, "", " for hot rebind");
> > >
> > > + xe_workload_start(priv, priv->fd.drm, &w);
> > > +
> > > driver_unbind(priv, "hot ", 60);
> > >
> > > driver_bind(priv, 0);
> > >
> > > + xe_workload_stop(&w);
> > > +
> > > priv->fd.drm = close_device(priv->fd.drm, "late ", "unbound ");
> > > igt_assert_eq(priv->fd.drm, -1);
> > >
> > > @@ -665,14 +733,20 @@ static void hotrebind_lateclose(struct hotunplug
> > > *priv)
> > >
> > > static void hotreplug_lateclose(struct hotunplug *priv) {
> > > + struct xe_workload w = { 0 };
> > > +
> > > pre_check(priv);
> > >
> > > priv->fd.drm = local_drm_open_driver(false, "", " for hot replug");
> > >
> > > + xe_workload_start(priv, priv->fd.drm, &w);
> > > +
> > > device_unplug(priv, "hot ", 60);
> > >
> > > bus_rescan(priv, 0);
> > >
> > > + xe_workload_stop(&w);
> > > +
> > > priv->fd.drm = close_device(priv->fd.drm, "late ", "removed ");
> > > igt_assert_eq(priv->fd.drm, -1);
> > >
> > > --
> > > 2.50.1
> > >
next prev parent reply other threads:[~2026-05-08 9:04 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-06 13:57 [PATCH] tests/core_hotunplug: exercise fd-holding subtests with Xe workload Nitin Gote
2026-05-06 14:18 ` Kamil Konieczny
2026-05-06 14:37 ` Gote, Nitin R
2026-05-08 9:04 ` Janusz Krzysztofik [this message]
2026-05-12 11:02 ` Gote, Nitin R
2026-05-06 17:08 ` ✓ Xe.CI.BAT: success for " Patchwork
2026-05-06 17:15 ` ✓ i915.CI.BAT: " Patchwork
2026-05-06 18:15 ` ✗ Xe.CI.FULL: failure " Patchwork
2026-05-06 23:19 ` ✗ i915.CI.Full: " Patchwork
2026-05-07 18:49 ` ✓ i915.CI.BAT: success for tests/core_hotunplug: exercise fd-holding subtests with Xe workload (rev2) Patchwork
2026-05-07 19:12 ` ✓ Xe.CI.BAT: " Patchwork
2026-05-08 6:07 ` ✗ Xe.CI.FULL: failure " Patchwork
2026-05-08 8:24 ` ✓ i915.CI.Full: success " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a4fb5c49367356423c3c52e0dbd0da7f010723ea.camel@linux.intel.com \
--to=janusz.krzysztofik@linux.intel.com \
--cc=igt-dev@lists.freedesktop.org \
--cc=kamil.konieczny@linux.intel.com \
--cc=matthew.auld@intel.com \
--cc=nitin.r.gote@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox