From: "Summers, Stuart" <stuart.summers@intel.com>
To: "Roper, Matthew D" <matthew.d.roper@intel.com>
Cc: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
"Lin, Shuicheng" <shuicheng.lin@intel.com>,
"Vivi, Rodrigo" <rodrigo.vivi@intel.com>,
"Sousa, Gustavo" <gustavo.sousa@intel.com>,
"Brost, Matthew" <matthew.brost@intel.com>,
"Wajdeczko, Michal" <michal.wajdeczko@intel.com>,
"Ceraolo Spurio, Daniele" <daniele.ceraolospurio@intel.com>,
"Nerlige Ramappa, Umesh" <umesh.nerlige.ramappa@intel.com>
Subject: Re: [PATCH 4/9] drm/xe: Add a new debug focused configfs group
Date: Fri, 8 May 2026 17:55:34 +0000 [thread overview]
Message-ID: <c93880aa61de77c61d5d4d8f93af3efc67ff2acc.camel@intel.com> (raw)
In-Reply-To: <20260508172928.GS2131374@mdroper-desk1.amr.corp.intel.com>
On Fri, 2026-05-08 at 10:29 -0700, Matt Roper wrote:
> On Thu, May 07, 2026 at 04:02:21PM -0700, Summers, Stuart wrote:
> > On Thu, 2026-05-07 at 15:58 -0700, Matt Roper wrote:
> > > On Mon, May 04, 2026 at 04:04:11PM -0300, Gustavo Sousa wrote:
> > > > "Summers, Stuart" <stuart.summers@intel.com> writes:
> > > >
> > > > > On Mon, 2026-05-04 at 14:28 -0300, Gustavo Sousa wrote:
> > > > > > "Summers, Stuart" <stuart.summers@intel.com> writes:
> > > > > >
> > > > > > > On Mon, 2026-05-04 at 12:42 -0300, Gustavo Sousa wrote:
> > > > > > > > Stuart Summers <stuart.summers@intel.com> writes:
> > > > > > > >
> > > > > > > > > Add the skeleton code for a new debug specific
> > > > > > > > > configfs
> > > > > > > > > group.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Stuart Summers
> > > > > > > > > <stuart.summers@intel.com>
> > > > > > > > > Assisted-by: Copilot:claude-opus-4.7
> > > > > > > > > ---
> > > > > > > > > drivers/gpu/drm/xe/Makefile | 2 ++
> > > > > > > > > drivers/gpu/drm/xe/xe_configfs.c | 4 ++++
> > > > > > > > > drivers/gpu/drm/xe/xe_configfs_debug.c | 14
> > > > > > > > > ++++++++++++++
> > > > > > > > > drivers/gpu/drm/xe/xe_configfs_debug.h | 8 ++++++++
> > > > > > > > > drivers/gpu/drm/xe/xe_configfs_types.h | 7 +++++++
> > > > > > > > > 5 files changed, 35 insertions(+)
> > > > > > > > > create mode 100644
> > > > > > > > > drivers/gpu/drm/xe/xe_configfs_debug.c
> > > > > > > > > create mode 100644
> > > > > > > > > drivers/gpu/drm/xe/xe_configfs_debug.h
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/gpu/drm/xe/Makefile
> > > > > > > > > b/drivers/gpu/drm/xe/Makefile
> > > > > > > > > index 09661f079d03..b58667b0b18e 100644
> > > > > > > > > --- a/drivers/gpu/drm/xe/Makefile
> > > > > > > > > +++ b/drivers/gpu/drm/xe/Makefile
> > > > > > > > > @@ -161,6 +161,8 @@ xe-$(CONFIG_HWMON) += xe_hwmon.o
> > > > > > > > >
> > > > > > > > > xe-$(CONFIG_PERF_EVENTS) += xe_pmu.o
> > > > > > > > > xe-$(CONFIG_CONFIGFS_FS) += xe_configfs.o
> > > > > > > > > +xe_debug_configfs_obj-$(CONFIG_DRM_XE_DEBUG) :=
> > > > > > > > > xe_configfs_debug.o
> > > > > > > > > +xe-$(CONFIG_CONFIGFS_FS) += $(xe_debug_configfs_obj-
> > > > > > > > > y)
> > > > > > > >
> > > > > > > > What are the downsides of keeping this available for
> > > > > > > > non-
> > > > > > > > debug
> > > > > > > > builds?
> > > > > > > >
> > > > > > > > I imagine that the debug configfs could be useful for
> > > > > > > > someone not
> > > > > > > > involved in kernel development working together with
> > > > > > > > the
> > > > > > > > developers
> > > > > > > > to
> > > > > > > > debug issues without needing to rebuild their kernel.
> > > > > > >
> > > > > > > So the focal parameter we had in i915 was enable_rc6.
> > > > > > > Something
> > > > > > > like
> > > > > > > this is what we explicitly don't want to expose to a
> > > > > > > general
> > > > > > > user
> > > > > > > as it
> > > > > > > can have unforeseen consequences (like having very high
> > > > > > > power
> > > > > > > usage).
> > > > > > > But, having the ability to disable rc6 for debug purposes
> > > > > > > as
> > > > > > > a
> > > > > > > developer is extremely useful at narrowing down certain
> > > > > > > types
> > > > > > > of
> > > > > > > bugs
> > > > > > > at the hardware/software interface.
> > > > > >
> > > > > > We could probably keep the debug stuff available but cause
> > > > > > a
> > > > > > taint if
> > > > > > they ever get used?
> > > > >
> > > > > So the tainting is really just for our CI I think. This won't
> > > > > necessarily impact end users - at least on the client side -
> > > > > who
> > > > > might
> > > > > just ignore it (or not even notice it).
> > > >
> > > > I *think* tainting is not only for CI (someone could correct me
> > > > here).
> > > > I see that our driver adds a taint when the user is applying
> > > > some
> > > > parameters (module parameters or configfs) that alters the
> > > > "officially
> > > > supported" behavior of the driver.
> > > >
> > > > Some examples I could find are usages of
> > > > module_param_named_unsafe() in
> > > > the module param front, and
> > > > setup_configfs_post_ctx_restore_bb(),
> > > > setup_configfs_mid_ctx_restore_bb() in configfs.
> > >
> > > Yeah, there are various reasons to taint the kernel (hardware,
> > > software,
> > > or behavioral/usage) and they indicate different things; some are
> > > more
> > > serious than others. See Documentation/admin-guide/tainted-
> > > kernels.rst
> > > for details. The general idea though is that you've entered a
> > > dangerous/risky situation and we can no longer trust the system
> > > to
> > > behave properly going forward. I believe the kernel community
> > > will
> > > often ignore or de-prioritize bug reports that arise from tainted
> > > kernels because there isn't really any guarantee of proper
> > > behavior
> > > or
> > > support from the community at that point.
> > >
> > > Some of our configfs entries in Xe are "official" and end-user
> > > focused
> > > (e.g., SR-IOV configuration settings and such) so those shouldn't
> > > cause
> > > any taint. But others are low-level debugging knobs that are
> > > effectively handing the user a loaded gun to shoot themselves in
> > > the
> > > foot with. E.g., the "workaround batchbuffer" configfs stuff is
> > > very
> > > useful for hardware debugging sessions by expert driver
> > > developers,
> > > but
> > > if used improperly makes it it very easy to mess up both hardware
> > > and
> > > software internal state in ways that make the system unstable.
> > > Tainting
> > > on these developer-focused debug facilities is important because
> > > we
> > > can't trust any bug reports that show up after someone has used
> > > those
> > > settings. With i915 module parameters we got into situations
> > > where a
> > > bunch of websites and wikis posted horribly inaccurate
> > > suggestions
> > > and
> > > users would blindly copy those settings without realizing that
> > > they
> > > were
> > > deliberately breaking their systems, and some of the debug knobs
> > > we're
> > > making available in configfs are even more risky. Honestly, it
> > > might
> > > be
> > > best to even stick them behind CONFIG_EXPERT and/or other build
> > > settings.
> >
> > So this was the reason I used CONFIG_DRM_XE_DEBUG... Should we have
> > these behind multiple? Or maybe we should force build CONFIG_EXPORT
> > if
> > CONFIG_DRM_XE_DEBUG is set?
>
> Probably best to check with the maintainers on exactly what config
> options we should hide some of these behind.
>
> >
> > I don't really think tainting the kernel adds much here IMO. Yeah
> > it
> > does tell the user they're doing something that could cause issues,
> > but
> > they're already running in a debug mode if they got to this point,
> > so
> > IMO they already know what they're doing (or should).
>
> The taint isn't for the user running the driver in debug mode. It's
> for
> the people receiving bug reports and managing the bug tracking
> databases
> so that they can adjust their expectations, possibly disregarding or
> deprioritizing bug reports from users mucking around with
> hardware-internal or driver-internal state in a way that isn't
> expected
> to be safe. And it's not just us that needs this notification (e.g.,
> for gitlab issues), but also distribution maintainers and such who
> don't
> really know much about our driver and may not otherwise realize that
> that the user broke their system with something only intended for
> low-level debugging.
>
> For example, if someone uses 'ctx_restore_post_bb' to have the driver
> poke arbitrary chicken bits in the hardware on every context switch
> that
> really aren't supposed to be touched, that might be harmless for a
> while
> and then suddenly break when something else in the driver changes.
> We
> might get bug reports incorrectly indicating that a perfectly good
> patch
> is causing a driver regression, and then spend tons of time
> trying to reproduce and debug if we don't have the taint notification
> to
> let us know that the user was doing something unsupported and unsafe.
Yeah it's all a good point. And maybe a good reason to add some kind of
auto taint like we do with the unsafe modparams... ok let me take a
stab here on the next rev.
Thanks,
Stuart
>
>
> Matt
>
> >
> > Thanks,
> > Stuart
> >
> > >
> > >
> > > Matt
> > >
> > > >
> > > > >
> > > > > >
> > > > > > The scenario I had in mind is someone who is not familiar
> > > > > > with
> > > > > > building
> > > > > > their own kernel working together with the development
> > > > > > community on a
> > > > > > hard-to-reproduce issue and some of the debug configfs
> > > > > > attributes
> > > > > > being
> > > > > > useful to gather more relevant data.
> > > > >
> > > > > Yeah but IMO we really should be able to instruct such a user
> > > > > how
> > > > > to do
> > > > > that - the steps there aren't too difficult if they're
> > > > > already
> > > > > installing kernels etc.
> > > > >
> > > > > A lot of our debug hooks are already held under various
> > > > > Kconfig
> > > > > options
> > > > > anyway like DRM_XE_DEBUG, DRM_XE_DEBUG_GUC, DRM_XE_DEBUG_VM,
> > > > > etc.
> > > > > And a
> > > > > lot of the information under those Kconfigs are also critical
> > > > > for
> > > > > different debugs.
> > > >
> > > > Are those due to performance or resource usage implications if
> > > > they
> > > > are
> > > > kept enabled? If so, is this the same case here?
> > > >
> > > > >
> > > > > I'd really prefer to keep these under DRM_XE_DEBUG (or some
> > > > > other
> > > > > Kconfig) if possible so we can maximize the ease of adding
> > > > > new
> > > > > parameters as specific debugs come up.
> > > >
> > > > But why would keeping them under a build flag make it easier to
> > > > add
> > > > new
> > > > parameters as opposed to always keeping the knobs available for
> > > > use?
> > > >
> > > > --
> > > > Gustavo Sousa
> > > >
> > > > >
> > > > > Thanks,
> > > > > Stuart
> > > > >
> > > > > >
> > > > > > --
> > > > > > Gustavo Sousa
> > > > > >
> > > > > > >
> > > > > > > That's just one example, but there are a bunch of things
> > > > > > > that
> > > > > > > could
> > > > > > > fit
> > > > > > > this category and I expect we'll have even more use cases
> > > > > > > over
> > > > > > > time.
> > > > > > >
> > > > > > > Right now we have to hold these types of changes in a
> > > > > > > side
> > > > > > > branch
> > > > > > > not
> > > > > > > exposed externally and I really want to converge our
> > > > > > > processes so
> > > > > > > we
> > > > > > > can more easily make these kinds of changes and let us
> > > > > > > debug
> > > > > > > directly
> > > > > > > out of drm-tip.
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > # graphics virtualization (SR-IOV) support
> > > > > > > > > xe-y += \
> > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_configfs.c
> > > > > > > > > b/drivers/gpu/drm/xe/xe_configfs.c
> > > > > > > > > index 12b7fe65446d..85df8ce5cf2a 100644
> > > > > > > > > --- a/drivers/gpu/drm/xe/xe_configfs.c
> > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_configfs.c
> > > > > > > > > @@ -1006,6 +1006,10 @@ static struct config_group
> > > > > > > > > *xe_config_make_device_group(struct config_group *gro
> > > > > > > > > config_group_init_type_name(&dev-
> > > > > > > > > >sriov,
> > > > > > > > > "sriov",
> > > > > > > > > &xe_config_sriov_type);
> > > > > > > > > configfs_add_default_group(&dev-
> > > > > > > > > >sriov,
> > > > > > > > > &dev-
> > > > > > > > > > group);
> > > > > > > > > }
> > > > > > > > > +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
> > > > > > > > > + config_group_init_type_name(&dev->debug,
> > > > > > > > > "debug",
> > > > > > > > > &xe_configfs_debug_type);
> > > > > > > > > + configfs_add_default_group(&dev->debug, &dev-
> > > > > > > > > > group);
> > > > > > > > > +#endif
> > > > > > > >
> > > > > > > > I guess there isn't much we can do about it today, but,
> > > > > > > > based on
> > > > > > > > what
> > > > > > > > I
> > > > > > > > read in configfs documentation a while ago, I suspect
> > > > > > > > it
> > > > > > > > wasn't
> > > > > > > > really
> > > > > > > > designed with arbitrary hierarchies of config items in
> > > > > > > > mind. It
> > > > > > > > would
> > > > > > > > be nice if there was an API for that though.
> > > > > > > >
> > > > > > > > Basically, a config group is expected to be a directory
> > > > > > > > to
> > > > > > > > contain a
> > > > > > > > set
> > > > > > > > of config items of the same type, which is not the case
> > > > > > > > on
> > > > > > > > our
> > > > > > > > current
> > > > > > > > usage (i.e. the "sriov" and now the "debug" group).
> > > > > > > >
> > > > > > > > I guess we are able to get our desired behavior
> > > > > > > > (basically
> > > > > > > > just a
> > > > > > > > subdirectory behaving as a config item) by not
> > > > > > > > implementing
> > > > > > > > the
> > > > > > > > "make_item" hook. Not sure if that's an expected use
> > > > > > > > of
> > > > > > > > the API
> > > > > > > > or
> > > > > > > > if
> > > > > > > > that's more of a hack on our side and could come back
> > > > > > > > to
> > > > > > > > bite us
> > > > > > > > in
> > > > > > > > the
> > > > > > > > future.
> > > > > > >
> > > > > > > Hm.. ok that's good feedback. I do like bucketing these
> > > > > > > in
> > > > > > > groups
> > > > > > > and
> > > > > > > subdirectories. It also makes it easy to show or not show
> > > > > > > based on
> > > > > > > the
> > > > > > > Kconfig. But let me go through and make sure there isn't
> > > > > > > something
> > > > > > > more
> > > > > > > suited here. Happy to hear any other feedback as well
> > > > > > > here.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Stuart
> > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Gustavo Sousa
> > > > > > > >
> > > > > > > > >
> > > > > > > > > mutex_init(&dev->lock);
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_configfs_debug.c
> > > > > > > > > b/drivers/gpu/drm/xe/xe_configfs_debug.c
> > > > > > > > > new file mode 100644
> > > > > > > > > index 000000000000..45617282cec5
> > > > > > > > > --- /dev/null
> > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_configfs_debug.c
> > > > > > > > > @@ -0,0 +1,14 @@
> > > > > > > > > +// SPDX-License-Identifier: MIT
> > > > > > > > > +/*
> > > > > > > > > + * Copyright © 2026 Intel Corporation
> > > > > > > > > + */
> > > > > > > > > +
> > > > > > > > > +#include <linux/configfs.h>
> > > > > > > > > +#include <linux/module.h>
> > > > > > > > > +
> > > > > > > > > +#include "xe_configfs_debug.h"
> > > > > > > > > +#include "xe_configfs_types.h"
> > > > > > > > > +
> > > > > > > > > +const struct config_item_type xe_configfs_debug_type
> > > > > > > > > = {
> > > > > > > > > + .ct_owner = THIS_MODULE,
> > > > > > > > > +};
> > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_configfs_debug.h
> > > > > > > > > b/drivers/gpu/drm/xe/xe_configfs_debug.h
> > > > > > > > > new file mode 100644
> > > > > > > > > index 000000000000..01170dc2f97e
> > > > > > > > > --- /dev/null
> > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_configfs_debug.h
> > > > > > > > > @@ -0,0 +1,8 @@
> > > > > > > > > +/* SPDX-License-Identifier: MIT */
> > > > > > > > > +/*
> > > > > > > > > + * Copyright © 2026 Intel Corporation
> > > > > > > > > + */
> > > > > > > > > +#ifndef _XE_CONFIGFS_DEBUG_H_
> > > > > > > > > +#define _XE_CONFIGFS_DEBUG_H_
> > > > > > > > > +
> > > > > > > > > +#endif /* _XE_CONFIGFS_DEBUG_H_ */
> > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_configfs_types.h
> > > > > > > > > b/drivers/gpu/drm/xe/xe_configfs_types.h
> > > > > > > > > index 935097aafa96..c9d94a3c26a7 100644
> > > > > > > > > --- a/drivers/gpu/drm/xe/xe_configfs_types.h
> > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_configfs_types.h
> > > > > > > > > @@ -24,6 +24,9 @@ struct wa_bb {
> > > > > > > > > struct xe_config_group_device {
> > > > > > > > > struct config_group group;
> > > > > > > > > struct config_group sriov;
> > > > > > > > > +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
> > > > > > > > > + struct config_group debug;
> > > > > > > > > +#endif
> > > > > > > > >
> > > > > > > > > struct xe_config_device {
> > > > > > > > > struct wa_bb
> > > > > > > > > ctx_restore_mid_bb[XE_ENGINE_CLASS_MAX];
> > > > > > > > > @@ -56,4 +59,8 @@ static inline struct
> > > > > > > > > xe_config_device
> > > > > > > > > *xe_configfs_to_device(struct config_item
> > > > > > > > > return &xe_configfs_to_group_device(item)-
> > > > > > > > > > config;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
> > > > > > > > > +extern const struct config_item_type
> > > > > > > > > xe_configfs_debug_type;
> > > > > > > > > +#endif
> > > > > > > > > +
> > > > > > > > > #endif /* _XE_CONFIGFS_TYPES_H_ */
> > > > > > > > > --
> > > > > > > > > 2.43.0
> > >
> >
>
next prev parent reply other threads:[~2026-05-08 17:55 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-04 4:43 [PATCH 0/9] Add new debug infrastructure for configfs Stuart Summers
2026-05-04 4:43 ` [PATCH 1/9] drm/xe: Rename survivability_mode to enable_survivability_mode Stuart Summers
2026-05-04 13:29 ` Gustavo Sousa
2026-05-04 14:32 ` Summers, Stuart
2026-05-04 14:38 ` Summers, Stuart
2026-05-04 14:40 ` Summers, Stuart
2026-05-04 18:31 ` Rodrigo Vivi
2026-05-04 18:38 ` Summers, Stuart
2026-05-04 4:43 ` [PATCH 2/9] drm/xe: Sort xe_config_device fields and defaults alphabetically Stuart Summers
2026-05-04 13:58 ` Gustavo Sousa
2026-05-04 14:38 ` Summers, Stuart
2026-05-04 15:47 ` Lin, Shuicheng
2026-05-04 15:54 ` Summers, Stuart
2026-05-04 4:43 ` [PATCH 3/9] drm/xe: Split out configfs data structures Stuart Summers
2026-05-04 4:52 ` Summers, Stuart
2026-05-04 8:47 ` Jani Nikula
2026-05-04 14:24 ` Summers, Stuart
2026-05-04 21:48 ` Matthew Brost
2026-05-04 21:51 ` Summers, Stuart
2026-05-04 4:43 ` [PATCH 4/9] drm/xe: Add a new debug focused configfs group Stuart Summers
2026-05-04 15:42 ` Gustavo Sousa
2026-05-04 15:50 ` Summers, Stuart
2026-05-04 17:28 ` Gustavo Sousa
2026-05-04 17:44 ` Summers, Stuart
2026-05-04 19:04 ` Gustavo Sousa
2026-05-07 22:58 ` Matt Roper
2026-05-07 23:02 ` Summers, Stuart
2026-05-08 17:29 ` Matt Roper
2026-05-08 17:55 ` Summers, Stuart [this message]
2026-05-05 21:45 ` Summers, Stuart
2026-05-04 4:43 ` [PATCH 5/9] drm/xe: Move debug configfs entries to xe_configfs_debug.c Stuart Summers
2026-05-04 4:43 ` [PATCH 6/9] drm/xe/guc: Add configfs support for guc_log_level Stuart Summers
2026-05-05 23:54 ` Daniele Ceraolo Spurio
2026-05-04 4:43 ` [PATCH 7/9] drm/xe/guc: Add support for NPK as a GuC log target Stuart Summers
2026-05-04 4:43 ` [PATCH 8/9] drm/xe: Add infrastructure for debug configfs parameters Stuart Summers
2026-05-04 4:43 ` [PATCH 9/9] drm/xe: Migrate existing debug configfs entries to params infrastructure Stuart Summers
2026-05-04 4:54 ` [PATCH 0/9] Add new debug infrastructure for configfs Summers, Stuart
2026-05-04 5:30 ` ✗ CI.checkpatch: warning for " Patchwork
2026-05-04 5:32 ` ✓ CI.KUnit: success " Patchwork
2026-05-04 6:44 ` ✓ Xe.CI.BAT: " Patchwork
2026-05-04 8:42 ` ✗ Xe.CI.FULL: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c93880aa61de77c61d5d4d8f93af3efc67ff2acc.camel@intel.com \
--to=stuart.summers@intel.com \
--cc=daniele.ceraolospurio@intel.com \
--cc=gustavo.sousa@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=matthew.brost@intel.com \
--cc=matthew.d.roper@intel.com \
--cc=michal.wajdeczko@intel.com \
--cc=rodrigo.vivi@intel.com \
--cc=shuicheng.lin@intel.com \
--cc=umesh.nerlige.ramappa@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox