From: Jani Nikula <jani.nikula@linux.intel.com>
To: Raag Jadav <raag.jadav@intel.com>,
	airlied@gmail.com, simona@ffwll.ch, lucas.demarchi@intel.com,
	thomas.hellstrom@linux.intel.com, rodrigo.vivi@intel.com,
	joonas.lahtinen@linux.intel.com, tursulin@ursulin.net,
	lina@asahilina.net
Cc: intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, himal.prasad.ghimiray@intel.com,
	francois.dugast@intel.com, aravind.iddamsetty@linux.intel.com,
	anshuman.gupta@intel.com, andi.shyti@linux.intel.com,
	andriy.shevchenko@linux.intel.com, matthew.d.roper@intel.com,
	Raag Jadav <raag.jadav@intel.com>
Subject: Re: [PATCH v5 1/4] drm: Introduce device wedged event
Date: Thu, 19 Sep 2024 10:43:23 +0300
Message-ID: <87o74k9jhg.fsf@intel.com>
In-Reply-To: <20240917040235.197019-2-raag.jadav@intel.com>

On Tue, 17 Sep 2024, Raag Jadav <raag.jadav@intel.com> wrote:
> Introduce device wedged event, which will notify userspace of wedged
> (hanged/unusable) state of the DRM device through a uevent. This is
> useful especially in cases where the device is no longer operating as
> expected and has become unrecoverable from driver context.
>
> Purpose of this implementation is to provide drivers a way to recover
> through userspace intervention. Different drivers may have different
> ideas of a "wedged device" depending on their hardware implementation,
> and hence the vendor agnostic nature of the event. It is up to the drivers
> to decide when they see the need for recovery and how they want to recover
> from the available methods.
>
> Current implementation defines three recovery methods, out of which,
> drivers can choose to support any one or multiple of them. Preferred
> recovery method will be sent in the uevent environment as WEDGED=<method>.
> Userspace consumers (sysadmin) can define udev rules to parse this event
> and take respective action to recover the device.
>
> Method     | Consumer expectations
> -----------|-----------------------------------
> rebind     | unbind + rebind driver
> bus-reset  | unbind + reset bus device + rebind
> reboot     | reboot system
>
> v4: s/drm_dev_wedged/drm_dev_wedged_event
>     Use drm_info() (Jani)
>     Kernel doc adjustment (Aravind)
> v5: Send recovery method with uevent (Lina)
>
> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> ---
> drivers/gpu/drm/drm_drv.c | 37 +++++++++++++++++++++++++++++++++++++
> include/drm/drm_device.h | 24 ++++++++++++++++++++++++
> include/drm/drm_drv.h | 1 +
> 3 files changed, 62 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> index ac30b0ec9d93..1e850a9f608d 100644
> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -497,6 +497,43 @@ void drm_dev_unplug(struct drm_device *dev)
> }
> EXPORT_SYMBOL(drm_dev_unplug);
>
> +const char *const wedge_recovery_opts[] = {
> +	[DRM_WEDGE_RECOVERY_REBIND] = "rebind",
> +	[DRM_WEDGE_RECOVERY_BUS_RESET] = "bus-reset",
> +	[DRM_WEDGE_RECOVERY_REBOOT] = "reboot",
> +};
> +
> +/**
> + * drm_dev_wedged_event - generate a device wedged uevent
> + * @dev: DRM device
> + * @method: method to be used for recovery
> + *
> + * This generates a device wedged uevent for the DRM device specified by @dev.
> + * Recovery @method from wedge_recovery_opts[] (if supported by the device) is
> + * sent in the uevent environment as WEDGED=<method>, on the basis of which,
> + * userspace may take respective action to recover the device.
> + *
> + * Returns: 0 on success, or negative error code otherwise.
> + */
> +int drm_dev_wedged_event(struct drm_device *dev, enum wedge_recovery_method method)
> +{
> +	char event_string[32] = "WEDGED=";
> +	char *envp[] = { event_string, NULL };
> +	bool supported;
> +
> +	supported = test_bit(method, &dev->wedge_recovery);
> +	if (unlikely(!supported)) {

The unlikely() is unnecessary.

> +		drm_err(dev, "device wedged, recovery method not supported\n");
> +		return -EOPNOTSUPP;
> +	}
> +
> +	strcat(event_string, wedge_recovery_opts[method]);

To emphasize this here too: wedge_recovery_opts needs bounds checking
before it is indexed by method. Also avoid strcat(); it is hardly ever
the right choice, because you would need bounds checking on
event_string as well.
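
Something along these lines would cover both points. This is only a rough
userspace sketch, not kernel code: the enum and table mirror the patch, but
wedged_event_string() and OPTS_COUNT are hypothetical names made up for
illustration. In the kernel you would presumably use ARRAY_SIZE() and the
kernel's own string helpers instead.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

enum wedge_recovery_method {
	DRM_WEDGE_RECOVERY_REBIND,
	DRM_WEDGE_RECOVERY_BUS_RESET,
	DRM_WEDGE_RECOVERY_REBOOT,
	DRM_WEDGE_RECOVERY_MAX
};

static const char *const wedge_recovery_opts[] = {
	[DRM_WEDGE_RECOVERY_REBIND] = "rebind",
	[DRM_WEDGE_RECOVERY_BUS_RESET] = "bus-reset",
	[DRM_WEDGE_RECOVERY_REBOOT] = "reboot",
};

/* Hypothetical stand-in for the kernel's ARRAY_SIZE(). */
#define OPTS_COUNT (sizeof(wedge_recovery_opts) / sizeof(wedge_recovery_opts[0]))

/* Hypothetical helper: write "WEDGED=<method>" into buf, which holds len bytes. */
static int wedged_event_string(char *buf, size_t len, unsigned int method)
{
	int n;

	/* Bounds check the index (and reject holes) before reading the table. */
	if (method >= OPTS_COUNT || !wedge_recovery_opts[method])
		return -EINVAL;

	/* snprintf() never writes more than len bytes; report truncation. */
	n = snprintf(buf, len, "WEDGED=%s", wedge_recovery_opts[method]);
	if (n < 0 || (size_t)n >= len)
		return -ENOSPC;

	return 0;
}
```

Building the whole string with one bounded formatted write removes the
separate strcat() step, and the truncation check catches a future method
name that no longer fits the buffer.
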
> +
> +	drm_info(dev, "device wedged, generating uevent\n");
> +	return kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> +}
> +EXPORT_SYMBOL(drm_dev_wedged_event);
> +
> /*
> * DRM internal mount
> * We want to be able to allocate our own "struct address_space" to control
> diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
> index c91f87b5242d..e4f32967b5ae 100644
> --- a/include/drm/drm_device.h
> +++ b/include/drm/drm_device.h
> @@ -40,6 +40,27 @@ enum switch_power_state {
> DRM_SWITCH_POWER_DYNAMIC_OFF = 3,
> };
>
> +/**
> + * enum wedge_recovery_method - Recovery method for wedged device in order
> + * of severity. To be set as bit fields in drm_device.wedge_recovery variable.
> + * Drivers can choose to support any one or multiple of them depending on their
> + * needs.
> + */
> +
> +enum wedge_recovery_method {
> +	/** @DRM_WEDGE_RECOVERY_REBIND: unbind + rebind driver */
> +	DRM_WEDGE_RECOVERY_REBIND = 0,

I don't see a need to initialize the enumerators; the numbering is
automatic.
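
For illustration, a minimal sketch of the same enum without initializers.
In C the first enumerator defaults to 0 and each following one is the
previous value plus one, so the values match the explicit ones in the
patch. The _Static_assert() lines are only there to demonstrate that and
assume a C11 compiler.

```c
enum wedge_recovery_method {
	DRM_WEDGE_RECOVERY_REBIND,	/* 0: first enumerator defaults to 0 */
	DRM_WEDGE_RECOVERY_BUS_RESET,	/* 1 */
	DRM_WEDGE_RECOVERY_REBOOT,	/* 2 */
	DRM_WEDGE_RECOVERY_MAX		/* 3: equals the number of methods above */
};

/* Compile-time checks that the implicit values match the explicit ones. */
_Static_assert(DRM_WEDGE_RECOVERY_REBIND == 0, "first enumerator is 0");
_Static_assert(DRM_WEDGE_RECOVERY_MAX == 3, "MAX equals the method count");
```
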
> +
> +	/** @DRM_WEDGE_RECOVERY_BUS_RESET: unbind + reset bus device + rebind */
> +	DRM_WEDGE_RECOVERY_BUS_RESET = 1,
> +
> +	/** @DRM_WEDGE_RECOVERY_REBOOT: reboot system */
> +	DRM_WEDGE_RECOVERY_REBOOT = 2,
> +
> +	/** @DRM_WEDGE_RECOVERY_MAX: for bounds checking, do not use */
> +	DRM_WEDGE_RECOVERY_MAX = 3,
> +};
> +
> /**
> * struct drm_device - DRM device structure
> *
> @@ -317,6 +338,9 @@ struct drm_device {
> * Root directory for debugfs files.
> */
> struct dentry *debugfs_root;
> +
> +	/** @wedge_recovery: Supported recovery methods for wedged device */
> +	unsigned long wedge_recovery;
> };
>
> #endif
> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> index 02ea4e3248fd..6e02187f1f6c 100644
> --- a/include/drm/drm_drv.h
> +++ b/include/drm/drm_drv.h
> @@ -461,6 +461,7 @@ void drm_put_dev(struct drm_device *dev);
> bool drm_dev_enter(struct drm_device *dev, int *idx);
> void drm_dev_exit(int idx);
> void drm_dev_unplug(struct drm_device *dev);
> +int drm_dev_wedged_event(struct drm_device *dev, enum wedge_recovery_method method);
>
> /**
> * drm_dev_is_unplugged - is a DRM device unplugged
--
Jani Nikula, Intel