Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4] drm/xe/gt_throttle: Avoid TOCTOU when monitoring reasons
@ 2025-10-31 16:47 Lucas De Marchi
  2025-10-31 16:52 ` ✗ CI.checkpatch: warning for " Patchwork
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Lucas De Marchi @ 2025-10-31 16:47 UTC (permalink / raw)
  To: intel-xe; +Cc: Lucas De Marchi, Raag Jadav, Rodrigo Vivi

It's currently not possible to safely monitor if there's throttling
happening and what are the reasons. The approach of reading the status
and then reading the reasons is not reliable as by the time sysadmin
reads the reason, the throttling could not be happening anymore.

Previous tentative to fix that[1] was breaking the ABI and potentially
sysadmin's scripts. This takes a different approach of adding and
documenting the additional attribute. It's still valuable, though
redundant, to provide the simpler 0/1 interface.

In order to avoid userspace knowledge on the bitmask meaning and to be
able to maintain the kernel side in sync with possible changes in
future, just walk the attribute group and check what are the masks that
match the value read.

[1] https://lore.kernel.org/intel-xe/20241025092238.167042-1-raag.jadav@intel.com/

Cc: Raag Jadav <raag.jadav@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
All refactors and CRI support merged - refer to the previous versions
for those. Here there's only the new `reasons` attribute.

- v2: Use space as separator (Rodrigo)
- v3:
  - Improve doc (Rodrigo, Raag)
  - Emit "none" if there's no throttling (Rodrigo, Raag)
  - s/status_reasons/reasons/ (Raag)

---
Overall series:

Changes in v4:
- Drop all merged patches, keeping just the last one with new `reasons`
  attribute
- Link to v3: https://patch.msgid.link/20251029-gt-throttle-cri-v3-0-d1f5abbb8114@intel.com

Changes in v3:
- Use space as separator for the reasons (Rodrigo)
- Link to v2: https://patch.msgid.link/20251026-gt-throttle-cri-v2-0-41f8288a71a7@intel.com

Changes in v2:
- Handle comments on v1, noted on individual patches
- Improve documentation, adding it to the html output
- Add patch to introduce status_reasons attribute to overcome the TOCTOU
  of the current API
- Link to v1: https://patch.msgid.link/20251023-gt-throttle-cri-v1-0-9f36f3c8861c@intel.com
---
 drivers/gpu/drm/xe/xe_gt_throttle.c | 54 +++++++++++++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_throttle.c b/drivers/gpu/drm/xe/xe_gt_throttle.c
index fa7068aac3344..5d95ddafc05ee 100644
--- a/drivers/gpu/drm/xe/xe_gt_throttle.c
+++ b/drivers/gpu/drm/xe/xe_gt_throttle.c
@@ -22,9 +22,15 @@
  * Their availability depend on the platform and some may not be visible if that
  * reason is not available.
  *
+ * The ``reasons`` attribute can be used by sysadmin to monitor all possible
+ * reasons for throttling and report them. It's preferred over monitoring
+ * ``status`` and then reading the reason from individual attributes since that
+ * is racy. If there's no throttling happening, "none" is returned.
+ *
  * The following attributes are available on Crescent Island platform:
  *
- * - ``status``: Overall throttle status
+ * - ``status``: Overall throttle status (0: no throttling, 1: throttling)
+ * - ``reasons``: Array of reasons causing throttling separated by space
  * - ``reason_pl1``: package PL1
  * - ``reason_pl2``: package PL2
  * - ``reason_pl4``: package PL4
@@ -43,7 +49,8 @@
  *
  * Other platforms support the following reasons:
  *
- * - ``status``: Overall status
+ * - ``status``: Overall throttle status (0: no throttling, 1: throttling)
+ * - ``reasons``: Array of reasons causing throttling separated by space
  * - ``reason_pl1``: package PL1
  * - ``reason_pl2``: package PL2
  * - ``reason_pl4``: package PL4, Iccmax etc.
@@ -111,12 +118,53 @@ static ssize_t reason_show(struct kobject *kobj,
 	return sysfs_emit(buff, "%u\n", is_throttled_by(gt, ta->mask));
 }
 
+static const struct attribute_group *get_platform_throttle_group(struct xe_device *xe);
+
+static ssize_t reasons_show(struct kobject *kobj,
+			    struct kobj_attribute *attr, char *buff)
+{
+	struct xe_gt *gt = throttle_to_gt(kobj);
+	struct xe_device *xe = gt_to_xe(gt);
+	const struct attribute_group *group;
+	struct attribute **pother;
+	ssize_t ret = 0;
+	u32 reasons;
+
+	reasons = xe_gt_throttle_get_limit_reasons(gt);
+	group = get_platform_throttle_group(xe);
+
+	for (pother = group->attrs; *pother; pother++) {
+		struct kobj_attribute *kattr = container_of(*pother, struct kobj_attribute, attr);
+		struct throttle_attribute *other_ta = kobj_attribute_to_throttle(kattr);
+
+		if (other_ta->mask != U32_MAX && reasons & other_ta->mask)
+			ret += sysfs_emit_at(buff, ret, "%s ", (*pother)->name);
+	}
+
+	if (ret)
+		/* Drop extra space from last iteration above */
+		ret--;
+	else
+		ret += sysfs_emit_at(buff, ret, "none");
+
+	ret += sysfs_emit_at(buff, ret, "\n");
+
+	return ret;
+}
+
 #define THROTTLE_ATTR_RO(name, _mask)				\
 	struct throttle_attribute attr_##name =	{		\
 		.attr = __ATTR(name, 0444, reason_show, NULL),	\
 		.mask = _mask,					\
 	}
 
+#define THROTTLE_ATTR_RO_FUNC(name, _mask, _show)		\
+	struct throttle_attribute attr_##name =	{		\
+		.attr = __ATTR(name, 0444, _show, NULL),	\
+		.mask = _mask,					\
+	}
+
+static THROTTLE_ATTR_RO_FUNC(reasons, 0, reasons_show);
 static THROTTLE_ATTR_RO(status, U32_MAX);
 static THROTTLE_ATTR_RO(reason_pl1, POWER_LIMIT_1_MASK);
 static THROTTLE_ATTR_RO(reason_pl2, POWER_LIMIT_2_MASK);
@@ -128,6 +176,7 @@ static THROTTLE_ATTR_RO(reason_vr_thermalert, VR_THERMALERT_MASK);
 static THROTTLE_ATTR_RO(reason_vr_tdc, VR_TDC_MASK);
 
 static struct attribute *throttle_attrs[] = {
+	&attr_reasons.attr.attr,
 	&attr_status.attr.attr,
 	&attr_reason_pl1.attr.attr,
 	&attr_reason_pl2.attr.attr,
@@ -153,6 +202,7 @@ static THROTTLE_ATTR_RO(reason_psys_crit, PSYS_CRIT_MASK);
 
 static struct attribute *cri_throttle_attrs[] = {
 	/* Common */
+	&attr_reasons.attr.attr,
 	&attr_status.attr.attr,
 	&attr_reason_pl1.attr.attr,
 	&attr_reason_pl2.attr.attr,




^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-11-04 22:19 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-31 16:47 [PATCH v4] drm/xe/gt_throttle: Avoid TOCTOU when monitoring reasons Lucas De Marchi
2025-10-31 16:52 ` ✗ CI.checkpatch: warning for " Patchwork
2025-10-31 16:54 ` ✓ CI.KUnit: success " Patchwork
2025-10-31 17:55 ` ✓ Xe.CI.BAT: " Patchwork
2025-11-01  6:08 ` ✓ Xe.CI.Full: " Patchwork
2025-11-01  6:38 ` [PATCH v4] " Raag Jadav
2025-11-03 15:20   ` Lucas De Marchi
2025-11-03 16:22     ` Raag Jadav
2025-11-04 22:19       ` Lucas De Marchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox