public inbox for linux-kernel@vger.kernel.org
From: Pekka Paalanen <ppaalanen@gmail.com>
To: "André Almeida" <andrealmeid@igalia.com>
Cc: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, kernel-dev@igalia.com,
	alexander.deucher@amd.com, contactshashanksharma@gmail.com,
	amaranath.somalapuram@amd.com, christian.koenig@amd.com,
	pierre-eric.pelloux-prayer@amd.com,
	"Simon Ser" <contact@emersion.fr>,
	"Rob Clark" <robdclark@gmail.com>,
	"Andrey Grodzovsky" <andrey.grodzovsky@amd.com>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Daniel Stone" <daniel@fooishbar.org>,
	"'Marek Olšák'" <maraeo@gmail.com>,
	"Dave Airlie" <airlied@gmail.com>,
	"Pierre-Loup A . Griffais" <pgriffais@valvesoftware.com>
Subject: Re: [PATCH v3 0/2] drm: Add GPU reset sysfs
Date: Mon, 28 Nov 2022 11:25:28 +0200
Message-ID: <20221128112528.1206b1f5@eldfell>
In-Reply-To: <20221125175203.52481-1-andrealmeid@igalia.com>


On Fri, 25 Nov 2022 14:52:01 -0300
André Almeida <andrealmeid@igalia.com> wrote:

> This patchset adds a udev event for DRM device's resets.

Hi,

this seems like a good idea to me.

> Userspace apps can trigger GPU resets by misuse of graphical APIs or driver
> bugs. Either way, the GPU reset might lead the system to a broken state[1] that
> might be recovered if the user has access to a tty or a remote shell. Arguably, this
> recovery could happen automatically by the system itself, thus this is the goal
> of this patchset.
> 
> For debugging and report purposes, device coredump support was already added
> for amdgpu[2], but it's not suitable for programmatic usage like this one given
> the uAPI not being stable and the need for parsing.
> 
> GL/VK is out of scope for this use, given that we are dealing with device
> resets regardless of API.

I see that the reported PID is intended to be the culprit, the process
that caused the GPU to crash or hang, if identified. Hence, killing
that process perhaps makes sense, even if it could recover on its own
through the GL/VK "device lost" mechanism.

"VRAM lost" is interesting. Innocent processes essentially lost the GPU
in that case, I suppose, but that's no reason to kill them and restart
the whole graphics stack outright. Those that actually handle GL/VK
device lost should theoretically be fine, right?

Display servers can make more enlightened decisions on whether they
need to restart or not, if they are implemented to handle that.

The example gpu-resetd [3] behaviour in that case seems sub-optimal.
Could it do better? How would it know, or avoid knowing, which
processes handled the GPU reset fine and which need external restarting?

Maybe gpu-resetd should kill the culprit only if it causes resets
repeatedly? But if the culprit does not handle device lost and also
does not die... how do you know you need to kill it?
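
A minimal sketch of the "kill only on repeated resets" idea, to make the
trade-off concrete. It is not taken from gpu-resetd. The class name, the
threshold, and the time window are all hypothetical, and it assumes the
daemon has already extracted the culprit PID (or None) from the reset
event by some other means.

```python
import time
from collections import defaultdict, deque


class RepeatOffenderPolicy:
    """Hypothetical policy: kill a PID only after repeated GPU resets.

    A process is considered a repeat offender when it has been blamed
    for at least `max_resets` resets within the last `window_s` seconds.
    """

    def __init__(self, max_resets=3, window_s=60.0):
        self.max_resets = max_resets
        self.window_s = window_s
        # pid -> timestamps of resets attributed to that pid
        self._history = defaultdict(deque)

    def should_kill(self, pid, now=None):
        """Record one reset blamed on `pid`; return True if it should be killed."""
        if pid is None:
            # No culprit was identified, so there is nothing to kill.
            return False
        now = time.monotonic() if now is None else now
        stamps = self._history[pid]
        stamps.append(now)
        # Forget resets that are older than the window.
        while stamps and now - stamps[0] > self.window_s:
            stamps.popleft()
        return len(stamps) >= self.max_resets
```

A process that recovers through device lost and never triggers another
reset is left alone under this scheme. It does not help with a culprit
that hangs without causing further resets, and it ignores PID reuse.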


Thanks,
pq

> 
> A basic userspace daemon is provided at [3] showing how the interface is used
> to recover from resets.
> 
> [1] A search for "reset" in DRM/AMD issue tracker shows reports of resets
> making the system unusable:
> https://gitlab.freedesktop.org/drm/amd/-/issues/?search=reset
> 
> [2] https://lore.kernel.org/amd-gfx/20220602081538.1652842-2-Amaranath.Somalapuram@amd.com/
> 
> [3] https://gitlab.freedesktop.org/andrealmeid/gpu-resetd
> 
> v2: https://lore.kernel.org/dri-devel/20220308180403.75566-1-contactshashanksharma@gmail.com/
> 
> André Almeida (1):
>   drm/amdgpu: Add work function for GPU reset event
> 
> Shashank Sharma (1):
>   drm: Add GPU reset sysfs event
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  4 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 30 ++++++++++++++++++++++
>  drivers/gpu/drm/drm_sysfs.c                | 26 +++++++++++++++++++
>  include/drm/drm_sysfs.h                    | 13 ++++++++++
>  4 files changed, 73 insertions(+)
> 



Thread overview: 12+ messages
2022-11-25 17:52 [PATCH v3 0/2] drm: Add GPU reset sysfs André Almeida
2022-11-25 17:52 ` [PATCH v3 1/2] drm: Add GPU reset sysfs event André Almeida
2022-11-28  9:27   ` Pekka Paalanen
2022-11-29 14:07   ` Alex Deucher
2022-11-25 17:52 ` [PATCH v3 2/2] drm/amdgpu: Add work function for GPU reset event André Almeida
2022-11-28  9:25 ` Pekka Paalanen [this message]
2022-11-28  9:30   ` [PATCH v3 0/2] drm: Add GPU reset sysfs Simon Ser
2022-11-30 15:23     ` André Almeida
2022-11-30 15:34       ` Simon Ser
2022-11-30 11:11 ` Daniel Vetter
2022-12-08  4:53   ` Alex Deucher
2023-01-05 12:25     ` Daniel Vetter
