From: "Poosa, Karthik" <karthik.poosa@intel.com>
To: Raag Jadav <raag.jadav@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <anshuman.gupta@intel.com>,
<badal.nilawar@intel.com>, <rodrigo.vivi@intel.com>
Subject: Re: [PATCH v4 1/4] drm/xe/hwmon: Expose temperature limits
Date: Fri, 9 Jan 2026 19:12:21 +0530 [thread overview]
Message-ID: <0476ce78-0609-4324-82e5-35d6d60988e4@intel.com> (raw)
In-Reply-To: <aWDKXG14vg-7Ajza@black.igk.intel.com>
[-- Attachment #1: Type: text/plain, Size: 13232 bytes --]
On 09-01-2026 14:59, Raag Jadav wrote:
> On Thu, Jan 08, 2026 at 06:33:20PM +0530, Karthik Poosa wrote:
>> Read temperature limits using pcode mailbox and
>> expose shutdown temperature limit as tempX_emergency,
>> critical temperature limit of as tempX_crit and
>> GPU average temperature limit as tempX_max.
> Looks like this can be less lines, please utilize the remaining space.
>
>> Update Xe hwmon documentation for these entries.
>>
>> v2:
>> - Resolve a documentation warning.
>> - Address below review comments from Raag.
>> - Update date and kernel version in Xe hwmon documentation.
>> - Remove explicit disable of has_mbx_thermal_info for unsupported
>> platforms.
>> - Remove unnecessary default case in switches.
>> - Remove obvious comments.
>> - Use TEMP_LIMIT_MAX to compute number of dwords needed in
>> xe_hwmon_thermal_info.
>> - Remove THERMAL_LIMITS_DWORDS macro.
>> - Use has_mbx_thermal_info for checking thermal mailbox support.
>>
>> Signed-off-by: Karthik Poosa<karthik.poosa@intel.com>
>> ---
>> .../ABI/testing/sysfs-driver-intel-xe-hwmon | 40 +++++++
>> drivers/gpu/drm/xe/xe_device_types.h | 2 +
>> drivers/gpu/drm/xe/xe_hwmon.c | 107 +++++++++++++++++-
>> drivers/gpu/drm/xe/xe_pci.c | 3 +
>> drivers/gpu/drm/xe/xe_pci_types.h | 1 +
>> drivers/gpu/drm/xe/xe_pcode_api.h | 3 +
>> 6 files changed, 152 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon b/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon
>> index d9e2b17c6872..c3b367c42741 100644
>> --- a/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon
>> +++ b/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon
>> @@ -196,3 +196,43 @@ Description: RW. Package burst power limit interval (Tau in PL2/Tau) in
>> milliseconds over which sustained power is averaged.
>>
>> Only supported for particular Intel Xe graphics platforms.
>> +
>> +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/temp2_emergency
>> +Date: January 2026
>> +KernelVersion: 7.0
>> +Contact: intel-xe@lists.freedesktop.org
>> +Description: RO. Package shutdown temperature in millidegree Celsius.
>> +
>> + Only supported for particular Intel Xe graphics platforms.
>> +
>> +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/temp2_crit
>> +Date: January 2026
>> +KernelVersion: 7.0
>> +Contact: intel-xe@lists.freedesktop.org
>> +Description: RO. Package critical temperature in millidegree Celsius.
>> +
>> + Only supported for particular Intel Xe graphics platforms.
>> +
>> +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/temp2_max
>> +Date: January 2026
>> +KernelVersion: 7.0
>> +Contact: intel-xe@lists.freedesktop.org
>> +Description: RO. Package average temperature limit in millidegree Celsius.
>> +
>> + Only supported for particular Intel Xe graphics platforms.
>> +
>> +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/temp3_emergency
>> +Date: January 2026
>> +KernelVersion: 7.0
>> +Contact: intel-xe@lists.freedesktop.org
>> +Description: RO. VRAM shutdown temperature in millidegree Celsius.
>> +
>> + Only supported for particular Intel Xe graphics platforms.
>> +
>> +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/temp3_crit
>> +Date: January 2026
>> +KernelVersion: 7.0
>> +Contact: intel-xe@lists.freedesktop.org
>> +Description: RO. VRAM critical temperature in millidegree Celsius.
>> +
>> + Only supported for particular Intel Xe graphics platforms.
> Group these with existing temperature entries as per channel index in
> alphabetic order.
>
>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>> index 7d46d5ecda91..e359889b4378 100644
>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> @@ -322,6 +322,8 @@ struct xe_device {
>> * pcode mailbox commands.
>> */
>> u8 has_mbx_power_limits:1;
>> + /** @info.has_mbx_thermal_info: Device has support for thermal mailbox commands */
> Nit: "supports thermal ..."
>
>> + u8 has_mbx_thermal_info:1;
>> /** @info.has_mem_copy_instr: Device supports MEM_COPY instruction */
>> u8 has_mem_copy_instr:1;
>> /** @info.has_mert: Device has standalone MERT */
>> diff --git a/drivers/gpu/drm/xe/xe_hwmon.c b/drivers/gpu/drm/xe/xe_hwmon.c
>> index ff2aea52ef75..db00728c89e4 100644
>> --- a/drivers/gpu/drm/xe/xe_hwmon.c
>> +++ b/drivers/gpu/drm/xe/xe_hwmon.c
>> @@ -53,6 +53,15 @@ enum xe_fan_channel {
>> FAN_MAX,
>> };
>>
>> +enum xe_temp_limit {
>> + TEMP_LIMIT_PKG_SHUTDOWN,
>> + TEMP_LIMIT_PKG_TJMAX,
>> + TEMP_LIMIT_MEM_SHUTDOWN,
>> + TEMP_LIMIT_PKG_CRIT,
>> + TEMP_LIMIT_MEM_TJMAX,
>> + TEMP_LIMIT_MAX,
> This is guaranteed last member so redundant comma.
>
>> +};
>> +
>> /* Attribute index for powerX_xxx_interval sysfs entries */
>> enum sensor_attr_power {
>> SENSOR_INDEX_PSYS_PL1,
>> @@ -111,6 +120,18 @@ struct xe_hwmon_fan_info {
>> u64 time_prev;
>> };
>>
>> +/**
>> + * struct xe_hwmon_thermal_info - to store temperature data
>> + */
>> +struct xe_hwmon_thermal_info {
>> + union {
>> + /** @limit: temperatures limits */
>> + u8 limit[TEMP_LIMIT_MAX];
>> + /** @data: temperature limits in dwords */
>> + u32 data[(TEMP_LIMIT_MAX / sizeof(u32)) + ((TEMP_LIMIT_MAX % sizeof(u32)) ? 1 : 0)];
> I think this can be (((TEMP_LIMIT_MAX - 1) / sizeof(u32)) + 1) but please
> double check.
>
>> + };
>> +};
>> +
>> /**
>> * struct xe_hwmon - xe hwmon data structure
>> */
>> @@ -137,7 +158,8 @@ struct xe_hwmon {
>> u32 pl1_on_boot[CHANNEL_MAX];
>> /** @pl2_on_boot: power limit PL2 on boot */
>> u32 pl2_on_boot[CHANNEL_MAX];
>> -
>> + /** @temp: Temperature info */
>> + struct xe_hwmon_thermal_info temp;
>> };
>>
>> static int xe_hwmon_pcode_read_power_limit(const struct xe_hwmon *hwmon, u32 attr, int channel,
>> @@ -677,8 +699,11 @@ static const struct attribute_group *hwmon_groups[] = {
>> };
>>
>> static const struct hwmon_channel_info * const hwmon_info[] = {
>> - HWMON_CHANNEL_INFO(temp, HWMON_T_LABEL, HWMON_T_INPUT | HWMON_T_LABEL,
>> - HWMON_T_INPUT | HWMON_T_LABEL),
>> + HWMON_CHANNEL_INFO(temp,
>> + HWMON_T_LABEL,
>> + HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_EMERGENCY | HWMON_T_CRIT |
> Let's make this alphabetic order.
>
>> + HWMON_T_MAX,
>> + HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_EMERGENCY | HWMON_T_CRIT),
> Ditto.
>
>> HWMON_CHANNEL_INFO(power, HWMON_P_MAX | HWMON_P_RATED_MAX | HWMON_P_LABEL | HWMON_P_CRIT |
>> HWMON_P_CAP,
>> HWMON_P_MAX | HWMON_P_RATED_MAX | HWMON_P_LABEL | HWMON_P_CAP),
>> @@ -689,6 +714,19 @@ static const struct hwmon_channel_info * const hwmon_info[] = {
>> NULL
>> };
>>
>> +static int xe_hwmon_pcode_read_thermal_info(struct xe_hwmon *hwmon)
>> +{
>> + struct xe_tile *root_tile = xe_device_get_root_tile(hwmon->xe);
>> + int ret = 0;
> Redundant initialization.
>
>> + ret = xe_pcode_read(root_tile, PCODE_MBOX(PCODE_THERMAL_INFO, READ_THERMAL_LIMITS, 0),
>> + &hwmon->temp.data[0], &hwmon->temp.data[1]);
>> + drm_dbg(&hwmon->xe->drm, "thermal info read val 0x%x val1 0x%x\n",
>> + hwmon->temp.data[0], hwmon->temp.data[1]);
>> +
>> + return ret;
>> +}
>> +
>> /* I1 is exposed as power_crit or as curr_crit depending on bit 31 */
>> static int xe_hwmon_pcode_read_i1(const struct xe_hwmon *hwmon, u32 *uval)
>> {
>> @@ -787,6 +825,34 @@ static umode_t
>> xe_hwmon_temp_is_visible(struct xe_hwmon *hwmon, u32 attr, int channel)
>> {
>> switch (attr) {
>> + case hwmon_temp_emergency:
>> + switch (channel) {
>> + case CHANNEL_PKG:
>> + return hwmon->temp.limit[TEMP_LIMIT_PKG_SHUTDOWN] ? 0444 : 0;
>> + case CHANNEL_VRAM:
>> + return hwmon->temp.limit[TEMP_LIMIT_MEM_SHUTDOWN] ? 0444 : 0;
>> + default:
>> + return 0;
>> + }
>> + break;
> I don't believe we'll ever reach here, so please drop the dead code.
>
>> + case hwmon_temp_crit:
>> + switch (channel) {
>> + case CHANNEL_PKG:
>> + return hwmon->temp.limit[TEMP_LIMIT_PKG_TJMAX] ? 0444 : 0;
>> + case CHANNEL_VRAM:
>> + return hwmon->temp.limit[TEMP_LIMIT_MEM_TJMAX] ? 0444 : 0;
>> + default:
>> + return 0;
>> + }
>> + break;
> Ditto.
>
>> + case hwmon_temp_max:
>> + switch (channel) {
>> + case CHANNEL_PKG:
>> + return hwmon->temp.limit[TEMP_LIMIT_PKG_CRIT] ? 0444 : 0;
>> + default:
>> + return 0;
>> + }
>> + break;
> Ditto.
>
>> case hwmon_temp_input:
>> case hwmon_temp_label:
>> return xe_reg_is_valid(xe_hwmon_get_reg(hwmon, REG_TEMP, channel)) ? 0444 : 0;
>> @@ -807,10 +873,38 @@ xe_hwmon_temp_read(struct xe_hwmon *hwmon, u32 attr, int channel, long *val)
>>
>> /* HW register value is in degrees Celsius, convert to millidegrees. */
>> *val = REG_FIELD_GET(TEMP_MASK, reg_val) * MILLIDEGREE_PER_DEGREE;
>> - return 0;
>> + break;
> and we don't have a reason to continue here, so let's try to return where
> possible.
>
>> + case hwmon_temp_emergency:
>> + switch (channel) {
>> + case CHANNEL_PKG:
>> + *val = hwmon->temp.limit[TEMP_LIMIT_PKG_SHUTDOWN] * MILLIDEGREE_PER_DEGREE;
>> + break;
> Ditto.
>
>> + case CHANNEL_VRAM:
>> + *val = hwmon->temp.limit[TEMP_LIMIT_MEM_SHUTDOWN] * MILLIDEGREE_PER_DEGREE;
>> + break;
> Ditto.
>
>> + }
>> + break;
> Ditto.
>
>> + case hwmon_temp_crit:
>> + switch (channel) {
>> + case CHANNEL_PKG:
>> + *val = hwmon->temp.limit[TEMP_LIMIT_PKG_TJMAX] * MILLIDEGREE_PER_DEGREE;
>> + break;
> Ditto.
>
>> + case CHANNEL_VRAM:
>> + *val = hwmon->temp.limit[TEMP_LIMIT_MEM_TJMAX] * MILLIDEGREE_PER_DEGREE;
>> + break;
> Ditto.
>
>> + }
>> + break;
> Ditto.
>
>> + case hwmon_temp_max:
>> + switch (channel) {
>> + case CHANNEL_PKG:
>> + *val = hwmon->temp.limit[TEMP_LIMIT_PKG_CRIT] * MILLIDEGREE_PER_DEGREE;
>> + break;
> Ditto.
>
>> + }
>> + break;
> Ditto.
>
>> default:
>> return -EOPNOTSUPP;
>> }
>> + return 0;
> With above in place, this won't be needed.
>
>> }
>>
>> static umode_t
>> @@ -1263,6 +1357,11 @@ xe_hwmon_get_preregistration_info(struct xe_hwmon *hwmon)
>> for (channel = 0; channel < FAN_MAX; channel++)
>> if (xe_hwmon_is_visible(hwmon, hwmon_fan, hwmon_fan_input, channel))
>> xe_hwmon_fan_input_read(hwmon, channel, &fan_speed);
>> +
>> + if (hwmon->xe->info.has_mbx_thermal_info)
>> + if (xe_hwmon_pcode_read_thermal_info(hwmon))
> These can be a single if condition with && operation.
I kept it this way to avoid showing unsupported logs on platforms where
support is not probed at all.
>
>> + drm_dbg(&hwmon->xe->drm,
>> + "Thermal mailbox support not present in firmware\n");
> Nit: "not supported by firmware"
this can be "Thermal mailbox not supported by card firmware"
>
>> }
>>
>> int xe_hwmon_register(struct xe_device *xe)
>> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
>> index a1fdca451ce0..776ed4bd538b 100644
>> --- a/drivers/gpu/drm/xe/xe_pci.c
>> +++ b/drivers/gpu/drm/xe/xe_pci.c
>> @@ -366,6 +366,7 @@ static const struct xe_device_desc bmg_desc = {
>> .has_fan_control = true,
>> .has_flat_ccs = 1,
>> .has_mbx_power_limits = true,
>> + .has_mbx_thermal_info = true,
>> .has_gsc_nvm = 1,
>> .has_heci_cscfi = 1,
>> .has_i2c = true,
>> @@ -421,6 +422,7 @@ static const struct xe_device_desc cri_desc = {
>> .has_gsc_nvm = 1,
>> .has_i2c = true,
>> .has_mbx_power_limits = true,
>> + .has_mbx_thermal_info = true,
>> .has_mert = true,
>> .has_pre_prod_wa = 1,
>> .has_soc_remapper_sysctrl = true,
>> @@ -686,6 +688,7 @@ static int xe_info_init_early(struct xe_device *xe,
>> /* runtime fusing may force flat_ccs to disabled later */
>> xe->info.has_flat_ccs = desc->has_flat_ccs;
>> xe->info.has_mbx_power_limits = desc->has_mbx_power_limits;
>> + xe->info.has_mbx_thermal_info = desc->has_mbx_thermal_info;
>> xe->info.has_gsc_nvm = desc->has_gsc_nvm;
>> xe->info.has_heci_gscfi = desc->has_heci_gscfi;
>> xe->info.has_heci_cscfi = desc->has_heci_cscfi;
>> diff --git a/drivers/gpu/drm/xe/xe_pci_types.h b/drivers/gpu/drm/xe/xe_pci_types.h
>> index 5f20f56571d1..20acc5349ee6 100644
>> --- a/drivers/gpu/drm/xe/xe_pci_types.h
>> +++ b/drivers/gpu/drm/xe/xe_pci_types.h
>> @@ -48,6 +48,7 @@ struct xe_device_desc {
>> u8 has_late_bind:1;
>> u8 has_llc:1;
>> u8 has_mbx_power_limits:1;
>> + u8 has_mbx_thermal_info:1;
>> u8 has_mem_copy_instr:1;
>> u8 has_mert:1;
>> u8 has_pre_prod_wa:1;
>> diff --git a/drivers/gpu/drm/xe/xe_pcode_api.h b/drivers/gpu/drm/xe/xe_pcode_api.h
>> index 975892d6b230..6c84c1780fe9 100644
>> --- a/drivers/gpu/drm/xe/xe_pcode_api.h
>> +++ b/drivers/gpu/drm/xe/xe_pcode_api.h
>> @@ -65,6 +65,9 @@
>> #define FAN_TABLE 1
>> #define VR_CONFIG 2
>>
>> +#define PCODE_THERMAL_INFO 0x25
>> +#define READ_THERMAL_LIMITS 0x0
> This file is full of randomly ordered commands so at some point
> we might want to order them correctly. For now just add this above
> PCODE_LATE_BINDING.
>
> Raag
>
>> #define PCODE_FREQUENCY_CONFIG 0x6e
>> /* Frequency Config Sub Commands (param1) */
>> #define PCODE_MBOX_FC_SC_READ_FUSED_P0 0x0
>> --
>> 2.25.1
>>
[-- Attachment #2: Type: text/html, Size: 17272 bytes --]
next prev parent reply other threads:[~2026-01-09 13:42 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-08 13:03 [PATCH v4 0/4] drm/xe/hwmon: Expose new temperature attributes Karthik Poosa
2026-01-08 13:03 ` [PATCH v4 1/4] drm/xe/hwmon: Expose temperature limits Karthik Poosa
2026-01-09 9:29 ` Raag Jadav
2026-01-09 13:42 ` Poosa, Karthik [this message]
2026-01-09 14:24 ` Poosa, Karthik
2026-01-08 13:03 ` [PATCH v4 2/4] drm/xe/hwmon: Expose memory controller temperature Karthik Poosa
2026-01-09 10:56 ` Raag Jadav
2026-01-08 13:03 ` [PATCH v4 3/4] drm/xe/hwmon: Expose GPU pcie temperature Karthik Poosa
2026-01-09 13:26 ` Raag Jadav
2026-01-08 13:03 ` [PATCH v4 4/4] drm/xe/hwmon: Expose individual vram channel temperature Karthik Poosa
2026-01-08 13:20 ` ✓ CI.KUnit: success for drm/xe/hwmon: Expose new temperature attributes (rev5) Patchwork
2026-01-08 13:58 ` ✓ Xe.CI.BAT: " Patchwork
2026-01-08 19:02 ` ✓ Xe.CI.Full: " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0476ce78-0609-4324-82e5-35d6d60988e4@intel.com \
--to=karthik.poosa@intel.com \
--cc=anshuman.gupta@intel.com \
--cc=badal.nilawar@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=raag.jadav@intel.com \
--cc=rodrigo.vivi@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox