On 07-08-2025 16:39, Poosa, Karthik wrote:

On 07-08-2025 16:03, Nilawar, Badal wrote:

On 06-08-2025 22:56, Karthik Poosa wrote:
Clamp writes to power limits powerX_crit/currX_crit, powerX_cap,
powerX_max, to the maximum supported by the pcode mailbox
when sysfs-provided values exceed this limit.
Although the pcode already performs clamping, values beyond the pcode
mailbox's supported range get truncated, leading to incorrect
critical power settings.
This patch ensures proper clamping to prevent such truncation.

v2:
  - Address below review comments. (Riana)
  - Split comments into multiple sentences.
  - Use local variables for readability.
  - Add a debug log.
  - Use u64 instead of unsigned long.

Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Fixes: 92d44a422d0d ("drm/xe/hwmon: Expose card reactive critical power")
Fixes: fb1b70607f73 ("drm/xe/hwmon: Expose power attributes")
---
  drivers/gpu/drm/xe/xe_hwmon.c | 29 +++++++++++++++++++++++++++++
  1 file changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_hwmon.c b/drivers/gpu/drm/xe/xe_hwmon.c
index f08fc4377d25..768a942ab0e7 100644
--- a/drivers/gpu/drm/xe/xe_hwmon.c
+++ b/drivers/gpu/drm/xe/xe_hwmon.c
@@ -332,6 +332,7 @@ static int xe_hwmon_power_max_write(struct xe_hwmon *hwmon, u32 attr, int channe
      int ret = 0;
      u32 reg_val, max;
      struct xe_reg rapl_limit;
+    u64 max_mbx_power_limit = 0;
        mutex_lock(&hwmon->hwmon_lock);
  @@ -356,6 +357,20 @@ static int xe_hwmon_power_max_write(struct xe_hwmon *hwmon, u32 attr, int channe
          goto unlock;
      }
  +    /*
+     * If the sysfs value exceeds the pcode mailbox cmd WRITE_PSYSGPU/PACKAGE_POWER_LIMIT
+     * max supported value, clamp it to the command's max (U12.3 format).
+     * This is to avoid truncation during reg_val calculation below and ensure the valid
+     * power limit is sent for pcode which would clamp it to card-supported value.
+     */
+    max_mbx_power_limit = ((PWR_LIM_VAL) >> hwmon->scl_shift_power) * SF_POWER;
+    if (value > max_mbx_power_limit) {
+        value = max_mbx_power_limit;
+        drm_dbg(&hwmon->xe->drm,
+            "Sysfs value for ch %d %s exceeds limit; clamped to supported maximum\n",
+            channel, PWR_ATTR_TO_STR(attr));
Is this debug message still needed?

Having this debug message helps to identify clamping from driver due to oversized sysfs input. I think we can keep it. 

If the intention is to surface clamping behavior to users or developers, then this message seems more appropriate as a drm_info rather than drm_dbg.

Regards,
Badal


+    }
+
      /* Computation in 64-bits to avoid overflow. Round to nearest. */
      reg_val = DIV_ROUND_CLOSEST_ULL((u64)value << hwmon->scl_shift_power, SF_POWER);
  @@ -739,9 +754,23 @@ static int xe_hwmon_power_curr_crit_write(struct xe_hwmon *hwmon, int channel,
  {
      int ret;
      u32 uval;
+    u64 max_crit_power_curr = 0;
        mutex_lock(&hwmon->hwmon_lock);
  +    /*
+     * If the sysfs value exceeds the pcode mailbox cmd POWER_SETUP_SUBCOMMAND_WRITE_I1
+     * max supported value, clamp it to the command's max (U10.6 format).
+     * This is to avoid truncation during uval calculation below and ensure the valid power
+     * limit is sent for pcode which would clamp it to card-supported value.
+     */
+    max_crit_power_curr = (POWER_SETUP_I1_DATA_MASK >> POWER_SETUP_I1_SHIFT) * scale_factor;
+    if (value > max_crit_power_curr) {
+        value = max_crit_power_curr;
+        drm_dbg(&hwmon->xe->drm,
+            "Sysfs value for ch %d exceeds limit; clamped to supported maximum\n",
+            channel);

Same question here?
same reply as above

Regards,
Badal

+    }
      uval = DIV_ROUND_CLOSEST_ULL(value << POWER_SETUP_I1_SHIFT, scale_factor);
      ret = xe_hwmon_pcode_write_i1(hwmon, uval);