From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, Raag Jadav <raag.jadav@intel.com>
Subject: Re: [PATCH v2 8/8] drm/xe/gt_throttle: Avoid TOCTOU when monitoring reasons
Date: Wed, 29 Oct 2025 16:24:26 -0400
Message-ID: <aQJ3-mltmVNmgEAP@intel.com>
In-Reply-To: <q23qstnnvuocvqru7xmev4s7ke23qflgdekfkb2jx6kltuu2l7@b6xg7yt7cx3a>

On Tue, Oct 28, 2025 at 11:04:56AM -0500, Lucas De Marchi wrote:
> On Tue, Oct 28, 2025 at 10:02:45AM -0400, Rodrigo Vivi wrote:
> > On Sun, Oct 26, 2025 at 10:57:20PM -0700, Lucas De Marchi wrote:
> > > It's currently not possible to safely monitor whether throttling is
> > > happening and for what reasons. Reading the status and then reading
> > > the reasons is not reliable: by the time the sysadmin reads the
> > > reasons, the throttling may no longer be happening.
> > >
> > > A previous attempt to fix that[1] broke the ABI and potentially
> > > sysadmins' scripts. This takes a different approach: adding and
> > > documenting an additional attribute. It's still valuable, though
> > > redundant, to provide the simpler 0/1 interface.
> > >
> > > To avoid requiring userspace to know the meaning of the bitmask, and
> > > to keep the kernel side in sync with possible future changes, just
> > > walk the attribute group and check which masks match the value read.
> > >
> > > [1] https://lore.kernel.org/intel-xe/20241025092238.167042-1-raag.jadav@intel.com/
> > >
> > > Cc: Raag Jadav <raag.jadav@intel.com>
> > > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> > > ---
> > > drivers/gpu/drm/xe/xe_gt_throttle.c | 46 +++++++++++++++++++++++++++++++++++--
> > > 1 file changed, 44 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_gt_throttle.c b/drivers/gpu/drm/xe/xe_gt_throttle.c
> > > index fa7068aac3344..fd2988dacbbb6 100644
> > > --- a/drivers/gpu/drm/xe/xe_gt_throttle.c
> > > +++ b/drivers/gpu/drm/xe/xe_gt_throttle.c
> > > @@ -22,9 +22,15 @@
> > > * Their availability depend on the platform and some may not be visible if that
> > > * reason is not available.
> > > *
> > > + * The ``status_reasons`` attribute can be used by sysadmin monitoring all
> > > + * possible reasons for throttling and reporting them. It's preferred over
> > > + * monitoring ``status`` and then reading the reason both for simplicity and to
> > > + * avoid TOCTOU.
> >
> > Perhaps add something like: TOCTOU (time-of-check to time-of-use).
> >
>
> ok... just to be clear, this is not about the kind of security issue
> that the TOCTOU abbreviation is sometimes associated with; it's just a
> "normal bug/race" in the interface.
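Agreed, it's a plain read-twice race. As a rough userspace illustration
of why a single snapshot helps: the masks, the fake register and all the
function names below are made up for this sketch, none of them come from
the driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Made-up masks, for illustration only */
#define FAKE_PL1_MASK	(1u << 0)
#define FAKE_PL2_MASK	(1u << 1)

/* Fake "register" that returns the next sample on every read */
struct fake_reg {
	const unsigned int *samples;
	size_t count;
	size_t next;
};

static unsigned int fake_read(struct fake_reg *reg)
{
	unsigned int val = reg->samples[reg->next];

	/* Stay on the last sample once the list is exhausted */
	if (reg->next + 1 < reg->count)
		reg->next++;
	return val;
}

/* Racy: one read for the status, a second read for the reasons */
static bool racy_throttled_without_reason(struct fake_reg *reg)
{
	bool status = fake_read(reg) != 0;
	unsigned int reasons = fake_read(reg);

	return status && !(reasons & (FAKE_PL1_MASK | FAKE_PL2_MASK));
}

/* Single snapshot: status and reasons come from the same value */
static bool snapshot_throttled_without_reason(struct fake_reg *reg)
{
	unsigned int reasons = fake_read(reg);
	bool status = reasons != 0;

	return status && !(reasons & (FAKE_PL1_MASK | FAKE_PL2_MASK));
}
```

With samples { FAKE_PL1_MASK, 0 }, the racy variant reports "throttled,
but no reason", while the snapshot variant cannot end up in that state
for the bits it knows about.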
>
> > > + *
> > > * The following attributes are available on Crescent Island platform:
> > > *
> > > - * - ``status``: Overall throttle status
> > > + * - ``status``: Overall throttle status (0: no throttling, 1: throttling)
> > > + * - ``status_reasons``: All reasons causing throttling separated by newline.
> > > * - ``reason_pl1``: package PL1
> > > * - ``reason_pl2``: package PL2
> > > * - ``reason_pl4``: package PL4
> > > @@ -43,7 +49,8 @@
> > > *
> > > * Other platforms support the following reasons:
> > > *
> > > - * - ``status``: Overall status
> > > + * - ``status``: Overall throttle status (0: no throttling, 1: throttling)
> > > + * - ``status_reasons``: All reasons causing throttling separated by newline.
> > > * - ``reason_pl1``: package PL1
> > > * - ``reason_pl2``: package PL2
> > > * - ``reason_pl4``: package PL4, Iccmax etc.
> > > @@ -111,12 +118,45 @@ static ssize_t reason_show(struct kobject *kobj,
> > > return sysfs_emit(buff, "%u\n", is_throttled_by(gt, ta->mask));
> > > }
> > >
> > > +static const struct attribute_group *get_platform_throttle_group(struct xe_device *xe);
> > > +
> > > +static ssize_t status_reasons_show(struct kobject *kobj,
> > > +				   struct kobj_attribute *attr, char *buff)
> > > +{
> > > +	struct xe_gt *gt = throttle_to_gt(kobj);
> > > +	struct xe_device *xe = gt_to_xe(gt);
> > > +	const struct attribute_group *group;
> > > +	struct attribute **pother;
> > > +	ssize_t ret = 0;
> > > +	u32 reasons;
> > > +
> > > +	reasons = xe_gt_throttle_get_limit_reasons(gt);
> > > +	group = get_platform_throttle_group(xe);
> > > +
> > > +	for (pother = group->attrs; *pother; pother++) {
> > > +		struct kobj_attribute *kattr = container_of(*pother, struct kobj_attribute, attr);
> > > +		struct throttle_attribute *other_ta = kobj_attribute_to_throttle(kattr);
> > > +
> > > +		if (other_ta->mask != U32_MAX && reasons & other_ta->mask)
> > > +			ret += sysfs_emit_at(buff, ret, "%s\n", (*pother)->name);
> >
> > perhaps a space instead of the \n to keep only a single line?
>
> I'm lazy and I don't like the additional space that the lazy approach
> would leave after the last entry :-/. There doesn't seem to be a
> uniform approach for printing arrays to sysfs: some attributes use \n
> and some use a space.
>
> do we care about the extra space? if not, then it would be:
>
> ret += sysfs_emit_at(buff, ret, "%s ", (*pother)->name);
For simplicity (laziness?!) I believe we should go with this one, and
then emit an extra \n at the end, as done in
show_scaling_available_governors() in drivers/cpufreq/cpufreq.c.
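
Something along these lines, as a userspace sketch only: emit_at() is a
hypothetical stand-in for sysfs_emit_at() built on vsnprintf(), and the
reason table with its masks is made up. In the patch the names and masks
would still come from walking the attribute group.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for sysfs_emit_at(): append formatted
 * text at offset 'at' and return the number of bytes actually added.
 */
static int emit_at(char *buf, size_t size, int at, const char *fmt, ...)
{
	va_list args;
	int len;

	if (at < 0 || (size_t)at >= size)
		return 0;

	va_start(args, fmt);
	len = vsnprintf(buf + at, size - at, fmt, args);
	va_end(args);

	if (len < 0)
		return 0;
	/* vsnprintf() reports the untruncated length; clamp to what fit */
	if ((size_t)len >= size - at)
		len = (int)(size - at - 1);
	return len;
}

/* Made-up reason table, for illustration only */
struct reason {
	const char *name;
	unsigned int mask;
};

static const struct reason reasons_table[] = {
	{ "reason_pl1", 1u << 0 },
	{ "reason_pl2", 1u << 1 },
	{ "reason_pl4", 1u << 2 },
};

/* Emit each matching name followed by a space, then a single newline */
static int show_reasons(char *buf, size_t size, unsigned int reasons)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < sizeof(reasons_table) / sizeof(reasons_table[0]); i++) {
		if (reasons & reasons_table[i].mask)
			ret += emit_at(buf, size, ret, "%s ", reasons_table[i].name);
	}
	ret += emit_at(buf, size, ret, "\n");

	return ret;
}
```

The output keeps the trailing space before the newline, and reading the
attribute when nothing is throttling gives just "\n".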
>
> if we do, then:
>
> ret += sysfs_emit_at(buff, ret, "%s%s", ret ? " " : "", (*pother)->name);
> or
> ret += sysfs_emit_at(buff, ret, "%.*s%s", !!ret, " ", (*pother)->name);
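
Not my pick, but for the record these two should give the same output
with no trailing space. A quick userspace check, with snprintf() as a
stand-in for sysfs_emit_at() and hypothetical helper names:

```c
#include <stddef.h>
#include <stdio.h>

/* Join names with single spaces, picking the separator with a ternary */
static int join_ternary(char *buf, size_t size,
			const char *const *names, size_t n)
{
	int ret = 0;
	size_t i;

	if (!size)
		return 0;
	buf[0] = '\0';

	/* Stop as soon as the buffer is full; snprintf() may over-report */
	for (i = 0; i < n && (size_t)ret < size; i++)
		ret += snprintf(buf + ret, size - ret, "%s%s",
				ret ? " " : "", names[i]);
	return ret;
}

/* Same result: a precision of 0 or 1 decides how much of " " is printed */
static int join_precision(char *buf, size_t size,
			  const char *const *names, size_t n)
{
	int ret = 0;
	size_t i;

	if (!size)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < n && (size_t)ret < size; i++)
		ret += snprintf(buf + ret, size - ret, "%.*s%s",
				!!ret, " ", names[i]);
	return ret;
}
```

Both rely on ret being 0 only for the first entry, so the separator is
skipped exactly once.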
>
> thanks
> Lucas De Marchi
>
> >
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > #define THROTTLE_ATTR_RO(name, _mask) \
> > > struct throttle_attribute attr_##name = { \
> > > .attr = __ATTR(name, 0444, reason_show, NULL), \
> > > .mask = _mask, \
> > > }
> > >
> > > +#define THROTTLE_ATTR_RO_FUNC(name, _mask, _show) \
> > > + struct throttle_attribute attr_##name = { \
> > > + .attr = __ATTR(name, 0444, _show, NULL), \
> > > + .mask = _mask, \
> > > + }
> > > +
> > > +static THROTTLE_ATTR_RO_FUNC(status_reasons, 0, status_reasons_show);
> > > static THROTTLE_ATTR_RO(status, U32_MAX);
> > > static THROTTLE_ATTR_RO(reason_pl1, POWER_LIMIT_1_MASK);
> > > static THROTTLE_ATTR_RO(reason_pl2, POWER_LIMIT_2_MASK);
> > > @@ -128,6 +168,7 @@ static THROTTLE_ATTR_RO(reason_vr_thermalert, VR_THERMALERT_MASK);
> > > static THROTTLE_ATTR_RO(reason_vr_tdc, VR_TDC_MASK);
> > >
> > > static struct attribute *throttle_attrs[] = {
> > > + &attr_status_reasons.attr.attr,
> > > &attr_status.attr.attr,
> > > &attr_reason_pl1.attr.attr,
> > > &attr_reason_pl2.attr.attr,
> > > @@ -153,6 +194,7 @@ static THROTTLE_ATTR_RO(reason_psys_crit, PSYS_CRIT_MASK);
> > >
> > > static struct attribute *cri_throttle_attrs[] = {
> > > /* Common */
> > > + &attr_status_reasons.attr.attr,
> > > &attr_status.attr.attr,
> > > &attr_reason_pl1.attr.attr,
> > > &attr_reason_pl2.attr.attr,
> > >
> > > --
> > > 2.51.0
> > >