Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Raag Jadav <raag.jadav@intel.com>
To: Xaver Hugl <xaver.hugl@kde.org>
Cc: airlied@gmail.com, simona@ffwll.ch, lucas.demarchi@intel.com,
	rodrigo.vivi@intel.com, jani.nikula@linux.intel.com,
	andriy.shevchenko@linux.intel.com, lina@asahilina.net,
	michal.wajdeczko@intel.com, christian.koenig@amd.com,
	intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, himal.prasad.ghimiray@intel.com,
	aravind.iddamsetty@linux.intel.com, anshuman.gupta@intel.com,
	alexander.deucher@amd.com, andrealmeid@igalia.com,
	amd-gfx@lists.freedesktop.org, kernel-dev@igalia.com
Subject: Re: [PATCH v10 2/4] drm/doc: Document device wedged event
Date: Wed, 22 Jan 2025 07:22:25 +0200	[thread overview]
Message-ID: <Z5CAkXSOpNiiFbv4@black.fi.intel.com> (raw)
In-Reply-To: <CAFZQkGy3R0TMY5CARPQZF70fdKGLX8GQjV_YQAHzEXsiuNHj+w@mail.gmail.com>

On Tue, Jan 21, 2025 at 02:14:56AM +0100, Xaver Hugl wrote:
> > +It is the responsibility of the consumer to make sure that the device or
> > +its resources are not in use by any process before attempting recovery.
> I'm not convinced this is actually doable in practice, outside of
> killing all apps that aren't the one trying to recover the GPU.
> Is this just about not crashing those processes if they don't handle
> GPU hotunplugs well, about leaks, or something else?

Correct, all of it. And since the compositor is in charge of device resources,
this way it atleast has the opportunity to recover the device and recreate
context without all the userspace violence.

I'm not entirely aware of its feasibility though, perhaps something for the
consumers to experiment.

> > +With IOCTLs blocked and device already 'wedged', all device memory should
> > +be unmapped and file descriptors should be closed to prevent leaks.
> Afaiu from a userspace POV, a rebind is just like a GPU hotunplug +
> hotplug with matching "remove" and "add" udev events. As long as the
> application cleans up resources related to the device when it receives
> the event, there should be no leaks with a normal hotunplug... Is this
> different enough that we can't have the same expectations?

The thing about "remove" event is that it is generated *after* we opt for an
unbind, and at that point it might be already too late if userspace doesn't
get enough time to clean things up while the device is removed with a live
client resulting in unknown consequences.

The idea here is to clean things up *before* we opt for an unbind leaving
no room for side effects.

Raag

  reply	other threads:[~2025-01-22  5:22 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-28 15:37 [PATCH v10 0/4] Introduce DRM device wedged event Raag Jadav
2024-11-28 15:37 ` [PATCH v10 1/4] drm: Introduce " Raag Jadav
2024-11-29 13:40   ` André Almeida
2024-12-02  8:07     ` Raag Jadav
2024-12-03  5:00       ` Raag Jadav
2024-12-03 10:18         ` Christian König
2024-12-04 11:17           ` Raag Jadav
2024-12-11 17:14             ` Maxime Ripard
2024-12-12 10:37               ` Raag Jadav
2024-12-16 16:07                 ` Maxime Ripard
2024-12-12 18:31   ` André Almeida
2024-12-13 13:51     ` Raag Jadav
2024-11-28 15:37 ` [PATCH v10 2/4] drm/doc: Document " Raag Jadav
2024-12-12 18:50   ` André Almeida
2024-12-17  8:42     ` Raag Jadav
2024-12-17 11:57       ` André Almeida
2025-01-21  1:14   ` Xaver Hugl
2025-01-22  5:22     ` Raag Jadav [this message]
2025-01-27 10:23       ` Pekka Paalanen
2025-01-28  9:37         ` Raag Jadav
2025-01-28 11:38           ` Pekka Paalanen
2025-01-29  7:04             ` Raag Jadav
2024-11-28 15:37 ` [PATCH v10 3/4] drm/xe: Use " Raag Jadav
2024-11-28 15:37 ` [PATCH v10 4/4] drm/i915: " Raag Jadav
2024-11-28 15:52 ` ✓ CI.Patch_applied: success for Introduce DRM device wedged event (rev8) Patchwork
2024-11-28 15:52 ` ✗ CI.checkpatch: warning " Patchwork
2024-11-28 15:53 ` ✓ CI.KUnit: success " Patchwork
2024-11-28 16:11 ` ✓ CI.Build: " Patchwork
2024-11-28 16:13 ` ✓ CI.Hooks: " Patchwork
2024-11-28 16:15 ` ✗ CI.checksparse: warning " Patchwork
2024-11-28 16:33 ` ✓ Xe.CI.BAT: success " Patchwork
2024-11-29  0:07 ` ✗ Xe.CI.Full: failure " Patchwork
2025-01-21  0:59 ` [PATCH v10 0/4] Introduce DRM device wedged event Xaver Hugl
2025-01-22  4:48   ` Raag Jadav

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z5CAkXSOpNiiFbv4@black.fi.intel.com \
    --to=raag.jadav@intel.com \
    --cc=airlied@gmail.com \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=andrealmeid@igalia.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=anshuman.gupta@intel.com \
    --cc=aravind.iddamsetty@linux.intel.com \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=himal.prasad.ghimiray@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=jani.nikula@linux.intel.com \
    --cc=kernel-dev@igalia.com \
    --cc=lina@asahilina.net \
    --cc=lucas.demarchi@intel.com \
    --cc=michal.wajdeczko@intel.com \
    --cc=rodrigo.vivi@intel.com \
    --cc=simona@ffwll.ch \
    --cc=xaver.hugl@kde.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox