From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, Raag Jadav <raag.jadav@intel.com>
Subject: Re: [PATCH v3 8/8] drm/xe/gt_throttle: Avoid TOCTOU when monitoring reasons
Date: Thu, 30 Oct 2025 11:42:56 -0400
Message-ID: <aQOHgB5xHcHeBS9G@intel.com>
In-Reply-To: <20251029-gt-throttle-cri-v3-8-d1f5abbb8114@intel.com>

On Wed, Oct 29, 2025 at 04:45:10PM -0700, Lucas De Marchi wrote:
> It's currently not possible to safely monitor whether throttling is
> happening and what the reasons are. Reading the status and then
> reading the reasons is not reliable: by the time the sysadmin reads
> the reasons, the throttling may no longer be happening.
>
> A previous attempt to fix that[1] broke the ABI and, potentially,
> sysadmins' scripts. This patch takes a different approach: it adds and
> documents an additional attribute. It's still valuable, though
> redundant, to provide the simpler 0/1 interface.
>
> To avoid requiring userspace to know the meaning of the bitmask, and to
> keep the kernel side in sync with possible future changes, just walk
> the attribute group and check which masks match the value read.
>
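The walk described above can be modeled outside the kernel. What follows
is a minimal userspace sketch, not the driver code: the table and its bit
positions are hypothetical (the real masks live in the driver), and
format_reasons() is an illustrative name. It snapshots the value once,
skips the aggregate entry, emits every name whose mask matches, and
replaces the trailing space with a newline.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the attribute group; bit positions are made up. */
struct reason {
	const char *name;
	uint32_t mask;
};

static const struct reason reasons_table[] = {
	{ "status_reasons", 0 },          /* the list itself: mask 0 never matches */
	{ "status",         UINT32_MAX }, /* aggregate attribute: skipped explicitly */
	{ "reason_pl1",     1u << 0 },
	{ "reason_pl2",     1u << 1 },
	{ "reason_pl4",     1u << 2 },
};

/*
 * Format all reasons set in @value as "name name ...\n" into @buf.
 * @value is a single snapshot, so the list is consistent with itself.
 * Returns the number of bytes written, not counting the terminating NUL.
 */
size_t format_reasons(uint32_t value, char *buf, size_t len)
{
	size_t pos = 0;
	size_t i;

	if (len < 2)
		return 0;

	for (i = 0; i < sizeof(reasons_table) / sizeof(reasons_table[0]); i++) {
		const struct reason *r = &reasons_table[i];
		int n;

		if (r->mask == UINT32_MAX || !(value & r->mask))
			continue;

		n = snprintf(buf + pos, len - pos, "%s ", r->name);
		if (n < 0 || (size_t)n >= len - pos)
			break;	/* out of room: keep what fit so far */
		pos += (size_t)n;
	}

	/* Drop the extra space left by the last name, if any */
	if (pos)
		pos--;

	buf[pos++] = '\n';
	buf[pos] = '\0';

	return pos;
}
```

With this table, a value with bits 0 and 2 set produces
"reason_pl1 reason_pl4\n", and a value of 0 produces just "\n".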
> [1] https://lore.kernel.org/intel-xe/20241025092238.167042-1-raag.jadav@intel.com/
>
> Cc: Raag Jadav <raag.jadav@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
> - v2: Use space as separator (Rodrigo)
> ---
> drivers/gpu/drm/xe/xe_gt_throttle.c | 52 +++++++++++++++++++++++++++++++++++--
> 1 file changed, 50 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_gt_throttle.c b/drivers/gpu/drm/xe/xe_gt_throttle.c
> index fa7068aac3344..ca45aea8c17a6 100644
> --- a/drivers/gpu/drm/xe/xe_gt_throttle.c
> +++ b/drivers/gpu/drm/xe/xe_gt_throttle.c
> @@ -22,9 +22,15 @@
> * Their availability depend on the platform and some may not be visible if that
> * reason is not available.
> *
> + * The ``status_reasons`` attribute can be used by sysadmin monitoring all
> + * possible reasons for throttling and reporting them. It's preferred over
> + * monitoring ``status`` and then reading the reason both for simplicity and to
> + * avoid TOCTOU (time-of-check to time-of-use).
> + *
> * The following attributes are available on Crescent Island platform:
> *
> - * - ``status``: Overall throttle status
> + * - ``status``: Overall throttle status (0: no throttling, 1: throttling)
> + * - ``status_reasons``: All reasons causing throttling separated by newline.
s/newline/space/ (the separator has been a space since v2)
> * - ``reason_pl1``: package PL1
> * - ``reason_pl2``: package PL2
> * - ``reason_pl4``: package PL4
> @@ -43,7 +49,8 @@
> *
> * Other platforms support the following reasons:
> *
> - * - ``status``: Overall status
> + * - ``status``: Overall throttle status (0: no throttling, 1: throttling)
> + * - ``status_reasons``: All reasons causing throttling separated by newline.
Same here: s/newline/space/
> * - ``reason_pl1``: package PL1
> * - ``reason_pl2``: package PL2
> * - ``reason_pl4``: package PL4, Iccmax etc.
> @@ -111,12 +118,51 @@ static ssize_t reason_show(struct kobject *kobj,
> return sysfs_emit(buff, "%u\n", is_throttled_by(gt, ta->mask));
> }
>
> +static const struct attribute_group *get_platform_throttle_group(struct xe_device *xe);
> +
> +static ssize_t status_reasons_show(struct kobject *kobj,
> + struct kobj_attribute *attr, char *buff)
> +{
> + struct xe_gt *gt = throttle_to_gt(kobj);
> + struct xe_device *xe = gt_to_xe(gt);
> + const struct attribute_group *group;
> + struct attribute **pother;
> + ssize_t ret = 0;
> + u32 reasons;
> +
> + reasons = xe_gt_throttle_get_limit_reasons(gt);
> + group = get_platform_throttle_group(xe);
> +
> + for (pother = group->attrs; *pother; pother++) {
> + struct kobj_attribute *kattr = container_of(*pother, struct kobj_attribute, attr);
> + struct throttle_attribute *other_ta = kobj_attribute_to_throttle(kattr);
> +
> + if (other_ta->mask != U32_MAX && reasons & other_ta->mask)
> + ret += sysfs_emit_at(buff, ret, "%s ", (*pother)->name);
> + }
> +
> + /* Drop extra space from last iteration above */
> + if (ret)
> + ret--;
> +
> + ret += sysfs_emit_at(buff, ret, "\n");
> +
> + return ret;
> +}
> +
> #define THROTTLE_ATTR_RO(name, _mask) \
> struct throttle_attribute attr_##name = { \
> .attr = __ATTR(name, 0444, reason_show, NULL), \
> .mask = _mask, \
> }
>
> +#define THROTTLE_ATTR_RO_FUNC(name, _mask, _show) \
> + struct throttle_attribute attr_##name = { \
> + .attr = __ATTR(name, 0444, _show, NULL), \
> + .mask = _mask, \
> + }
> +
> +static THROTTLE_ATTR_RO_FUNC(status_reasons, 0, status_reasons_show);
> static THROTTLE_ATTR_RO(status, U32_MAX);
> static THROTTLE_ATTR_RO(reason_pl1, POWER_LIMIT_1_MASK);
> static THROTTLE_ATTR_RO(reason_pl2, POWER_LIMIT_2_MASK);
> @@ -128,6 +174,7 @@ static THROTTLE_ATTR_RO(reason_vr_thermalert, VR_THERMALERT_MASK);
> static THROTTLE_ATTR_RO(reason_vr_tdc, VR_TDC_MASK);
>
> static struct attribute *throttle_attrs[] = {
> + &attr_status_reasons.attr.attr,
> &attr_status.attr.attr,
> &attr_reason_pl1.attr.attr,
> &attr_reason_pl2.attr.attr,
> @@ -153,6 +200,7 @@ static THROTTLE_ATTR_RO(reason_psys_crit, PSYS_CRIT_MASK);
>
> static struct attribute *cri_throttle_attrs[] = {
> /* Common */
> + &attr_status_reasons.attr.attr,
> &attr_status.attr.attr,
> &attr_reason_pl1.attr.attr,
> &attr_reason_pl2.attr.attr,
>
> --
> 2.51.0
>
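On the consuming side, a monitoring tool would read the attribute once
and split the line on spaces. A rough sketch, assuming the caller
supplies the path to status_reasons (the sysfs location is not part of
this patch); read_reasons_line() and split_reasons() are hypothetical
helper names, not an existing API.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the single line exposed by the attribute at @path into @buf.
 * One read gives one snapshot, so there is no window between checking
 * "is it throttling?" and asking "why?".
 * Returns 0 on success, -1 on error.
 */
int read_reasons_line(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");
	int ret = 0;

	if (!f)
		return -1;

	if (!fgets(buf, (int)len, f))
		ret = -1;

	fclose(f);
	return ret;
}

/*
 * Split @line in place on spaces and newlines. Pointers to up to @max
 * names are stored in @names. Returns the number of names found; 0
 * means no throttling was reported in this snapshot.
 */
size_t split_reasons(char *line, const char **names, size_t max)
{
	size_t count = 0;
	char *p = line;

	while (count < max) {
		p += strspn(p, " \n");		/* skip separators */
		if (*p == '\0')
			break;

		names[count++] = p;

		p += strcspn(p, " \n");		/* move to the end of the name */
		if (*p == '\0')
			break;
		*p++ = '\0';			/* terminate this name */
	}

	return count;
}
```

A script that only needs the 0/1 answer can keep using status; a tool
that wants the causes gets them from the same read that tells it
whether anything is throttling at all.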