Intel-XE Archive on lore.kernel.org
From: "Poosa, Karthik" <karthik.poosa@intel.com>
To: Raag Jadav <raag.jadav@intel.com>,
	"Nilawar, Badal" <badal.nilawar@intel.com>
Cc: Riana Tauro <riana.tauro@intel.com>, <lucas.demarchi@intel.com>,
	<rodrigo.vivi@intel.com>, <matthew.d.roper@intel.com>,
	<andi.shyti@linux.intel.com>, <intel-xe@lists.freedesktop.org>,
	<anshuman.gupta@intel.com>, <anvesh.bakwad@intel.com>,
	<saikishore.konda@intel.com>
Subject: Re: [PATCH v1] drm/xe/hwmon: expose package and vram temperature
Date: Fri, 31 Jan 2025 09:55:01 +0530
Message-ID: <08bba1cc-3f1f-409d-bbb7-187ee0c63a79@intel.com>
In-Reply-To: <Z5SPWwiB15ptK4hR@black.fi.intel.com>

On 25-01-2025 12:44, Raag Jadav wrote:
> On Fri, Jan 24, 2025 at 09:37:55PM +0530, Nilawar, Badal wrote:
>> On 24-01-2025 20:50, Raag Jadav wrote:
>>> On Fri, Jan 24, 2025 at 08:27:14PM +0530, Nilawar, Badal wrote:
>>>> On 24-01-2025 18:03, Raag Jadav wrote:
>>>>> On Fri, Jan 24, 2025 at 05:29:16PM +0530, Nilawar, Badal wrote:
>>>>>> On 24-01-2025 11:46, Riana Tauro wrote:
>>>>>>> Hi Raag
>>>>>>>
>>>>>>> On 1/23/2025 8:21 AM, Raag Jadav wrote:
>>>>>>>> On Tue, Jan 21, 2025 at 01:56:05PM +0530, Riana Tauro wrote:
>>>>>>>>> Hi Raag
>>>>>>>>>
>>>>>>>>> On 1/8/2025 2:54 PM, Raag Jadav wrote:
>>>>>>>>>> Add hwmon support for temp1_input and temp2_input
>>>>>>>>>> attributes, which will
>>>>>>>>>> expose package and vram temperature in millidegree Celsius.
>>>>>>>>>> With this in
>>>>>>>>>> place we can monitor temperature using lm-sensors tool.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
>>>>>>>>>> ---
>>>>>>>>>>       .../ABI/testing/sysfs-driver-intel-xe-hwmon   | 16 +++++
>>>>>>>>>>       drivers/gpu/drm/xe/regs/xe_mchbar_regs.h      |  3 +
>>>>>>>>>>       drivers/gpu/drm/xe/regs/xe_pcode_regs.h       |  2 +
>>>>>>>>>>       drivers/gpu/drm/xe/xe_hwmon.c                 | 63
>>>>>>>>>> +++++++++++++++++++
>>>>>>>>>>       4 files changed, 84 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git
>>>>>>>>>> a/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon
>>>>>>>>>> b/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon
>>>>>>>>>> index d792a56f59ac..998cfb0ee1a6 100644
>>>>>>>>>> --- a/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon
>>>>>>>>>> +++ b/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon
>>>>>>>>>> @@ -108,3 +108,19 @@ Contact:intel-xe@lists.freedesktop.org
>>>>>>>>>>       Description:    RO. Package current voltage in millivolt.
>>>>>>>>>>               Only supported for particular Intel Xe graphics platforms.
>>>>>>>>>> +
>>>>>>>>>> +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/temp1_input
>>>>>>>>>> +Date:        April 2025
>>>>>>>>>> +KernelVersion:    6.15
>>>>>>>>>> +Contact:intel-xe@lists.freedesktop.org
>>>>>>>>>> +Description:    RO. Package temperature in millidegree Celsius.
>>>>>>>>>> +
>>>>>>>>>> +        Only supported for particular Intel Xe graphics platforms.
>>>>>>>>>> +
>>>>>>>>>> +What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/temp2_input
>>>>>>>>>> +Date:        April 2025
>>>>>>>>>> +KernelVersion:    6.15
>>>>>>>>>> +Contact:intel-xe@lists.freedesktop.org
>>>>>>>>>> +Description:    RO. VRAM temperature in millidegree Celsius.
>>>>>>>>>> +
>>>>>>>>>> +        Only supported for particular Intel Xe graphics platforms.
>>>>>>>>>> diff --git a/drivers/gpu/drm/xe/regs/xe_mchbar_regs.h
>>>>>>>>>> b/drivers/gpu/drm/xe/regs/xe_mchbar_regs.h
>>>>>>>>>> index 519dd1067a19..f5e5234857c1 100644
>>>>>>>>>> --- a/drivers/gpu/drm/xe/regs/xe_mchbar_regs.h
>>>>>>>>>> +++ b/drivers/gpu/drm/xe/regs/xe_mchbar_regs.h
>>>>>>>>>> @@ -34,6 +34,9 @@
>>>>>>>>>>       #define PCU_CR_PACKAGE_ENERGY_STATUS
>>>>>>>>>> XE_REG(MCHBAR_MIRROR_BASE_SNB + 0x593c)
>>>>>>>>>> +#define PCU_CR_PACKAGE_TEMPERATURE
>>>>>>>>>> XE_REG(MCHBAR_MIRROR_BASE_SNB + 0x5978)
>>>>>>>>>> +#define   TEMP_MASK                REG_GENMASK(7, 0)
>>>>>>>>>> +
>>>>>>>>>>       #define PCU_CR_PACKAGE_RAPL_LIMIT
>>>>>>>>>> XE_REG(MCHBAR_MIRROR_BASE_SNB + 0x59a0)
>>>>>>>>>>       #define   PKG_PWR_LIM_1                REG_GENMASK(14, 0)
>>>>>>>>>>       #define   PKG_PWR_LIM_1_EN            REG_BIT(15)
>>>>>>>>>> diff --git a/drivers/gpu/drm/xe/regs/xe_pcode_regs.h
>>>>>>>>>> b/drivers/gpu/drm/xe/regs/xe_pcode_regs.h
>>>>>>>>>> index 0b0b49d850ae..8846eb9ce2a4 100644
>>>>>>>>>> --- a/drivers/gpu/drm/xe/regs/xe_pcode_regs.h
>>>>>>>>>> +++ b/drivers/gpu/drm/xe/regs/xe_pcode_regs.h
>>>>>>>>>> @@ -21,6 +21,8 @@
>>>>>>>>>>       #define BMG_PACKAGE_POWER_SKU            XE_REG(0x138098)
>>>>>>>>>>       #define BMG_PACKAGE_POWER_SKU_UNIT XE_REG(0x1380dc)
>>>>>>>>>>       #define BMG_PACKAGE_ENERGY_STATUS        XE_REG(0x138120)
>>>>>>>>>> +#define BMG_VRAM_TEMPERATURE            XE_REG(0x1382c0)
>>>>>>>>>> +#define BMG_PACKAGE_TEMPERATURE            XE_REG(0x138434)
>>>>>>>>> indentation.
>>>>>>>> It's a git quirk, you won't see it in file.
>>>>>>>>
>>>>>>>>> Also you are using the same for DG2. Should have a common name
>>>>>>>> Just following the conventions.
>>>>>>> Did not find this convention in the file.
>>>>>>> BMG_VRAM_TEMPERATURE is used in both dg2 and bmg and has a bmg prefix.
>>>>>>> Doesn't seem right
>>>>>>>>>>       #define BMG_PACKAGE_RAPL_LIMIT            XE_REG(0x138440)
>>>>>>>>>>       #define BMG_PLATFORM_ENERGY_STATUS XE_REG(0x138458)
>>>>>>>>>>       #define BMG_PLATFORM_POWER_LIMIT        XE_REG(0x138460)
>>>>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_hwmon.c
>>>>>>>>>> b/drivers/gpu/drm/xe/xe_hwmon.c
>>>>>>>>>> index fde56dad3ab7..5b5c844adf4a 100644
>>>>>>>>>> --- a/drivers/gpu/drm/xe/xe_hwmon.c
>>>>>>>>>> +++ b/drivers/gpu/drm/xe/xe_hwmon.c
>>>>>>>>>> @@ -6,6 +6,7 @@
>>>>>>>>>>       #include <linux/hwmon-sysfs.h>
>>>>>>>>>>       #include <linux/hwmon.h>
>>>>>>>>>>       #include <linux/types.h>
>>>>>>>>>> +#include <linux/units.h>
>>>>>>>>>>       #include <drm/drm_managed.h>
>>>>>>>>>>       #include "regs/xe_gt_regs.h"
>>>>>>>>>> @@ -20,6 +21,7 @@
>>>>>>>>>>       #include "xe_pm.h"
>>>>>>>>>>       enum xe_hwmon_reg {
>>>>>>>>>> +    REG_TEMP,
>>>>>>>>> add to the end
>>>>>>>>>>           REG_PKG_RAPL_LIMIT,
>>>>>>>>>>           REG_PKG_POWER_SKU,
>>>>>>>>>>           REG_PKG_POWER_SKU_UNIT,
>>>>>>>>>> @@ -39,6 +41,11 @@ enum xe_hwmon_channel {
>>>>>>>>>>           CHANNEL_MAX,
>>>>>>>>>>       };
>>>>>>>>>> +enum xe_hwmon_temp {
>>>>>>>>>> +    TEMP_PKG,
>>>>>>>>>> +    TEMP_VRAM,
>>>>>>>>>> +};
>>>>>>>>> Can't the existing channel enum be used here?
>>>>>>>> Nope, that'd break the indexes.
>>>>>>> @badal/@karthik Are multiple indexes for the same channel okay?
>>>>>>>
>>>>>>> In the current code, for dg2 only channel 1 is exposed for power and
>>>>>>> channel 0 skipped. Something like that needs to be done here too?
>>>>>> Thanks for looping me in this. Yes, Channel 0 represent card specific
>>>>>> attributes and Channel 1 represent package specific attributes. That's how
>>>>>> it should be followed.
>>>>>> With that BMG_PACKAGE_TEMPERATURE should go under CHANNEL_PKG. For
>>>>>> BMG_VRAM_TEMPERATURE new channel (channel 3) should be added in enum
>>>>>> xe_hwmon_channel.
>>>>> And how does that work with hwmon_channel_info?
>>>> Check curr_crit implementation.
>>>> HWMON_CHANNEL_INFO(curr, HWMON_C_LABEL, HWMON_C_CRIT | HWMON_C_LABEL)
>>> Exactly, and hence the separate enums for temp channels.
>>
>> No, we want temp2_input for package (channel 2) temperature and temp3_input
>> (Channel 3) for VRAM temperature.
>> Just for information, in i915 we create separate hwmon node for each layer
>> i.e. package, gt0 and gt1. During review of xe hwmon implementation upstream
>> architect recommended that there should be single node for a device. So we
>> are using channel based approach. So lets stick to that approach.
> And what about fan channels? Are they to be indexed from 4 and on?
> I'm not sure if we'd want to abuse hwmon_channel_info in such a way.
>
> Raag

Hi Raag,

We have adopted a hierarchical approach for HWMON channels and their 
entries. Channel 1 is designated for card-related information, while 
Channel 2 is used for package-related information.
If new HWMON data does not fit within these channels, we can create a 
new channel and assign an appropriate label to it.
For instance, you can map package_temperature under temp2_xxx. Since 
there is no temperature data for the card, temp1_xxx can be made invisible.
For VRAM temperature, add CHANNEL_VRAM, which will appear as temp3_xxx
with temp3_label set to "vram".
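
To make the mapping concrete, here is a rough userspace sketch of what I mean. It is not driver code: the helper names are made up for illustration, CHANNEL_VRAM is the proposed addition, and the "pkg" label string is an assumption. It only models how a shared channel enum would translate to temp<N>_xxx names, given that hwmon numbers temperature attributes from 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Model of the shared channel enum; CHANNEL_VRAM is the proposed new entry. */
enum xe_hwmon_channel {
	CHANNEL_CARD,	/* temp1_xxx: card level, no temperature data */
	CHANNEL_PKG,	/* temp2_xxx: package temperature */
	CHANNEL_VRAM,	/* temp3_xxx: VRAM temperature */
	CHANNEL_MAX,
};

/* hwmon temperature attributes are 1-based, so channel N maps to temp<N+1>. */
static int temp_sysfs_index(enum xe_hwmon_channel channel)
{
	assert(channel < CHANNEL_MAX);
	return (int)channel + 1;
}

/* Hypothetical visibility rule: the card channel has no temperature sensor. */
static bool temp_channel_visible(enum xe_hwmon_channel channel)
{
	return channel == CHANNEL_PKG || channel == CHANNEL_VRAM;
}

/* Hypothetical labels; "vram" is from this thread, "pkg" is assumed. */
static const char *temp_channel_label(enum xe_hwmon_channel channel)
{
	switch (channel) {
	case CHANNEL_PKG:
		return "pkg";
	case CHANNEL_VRAM:
		return "vram";
	default:
		return NULL;
	}
}

/* Builds e.g. "temp2_input"; returns -1 for channels that stay hidden. */
static int temp_attr_name(enum xe_hwmon_channel channel, const char *suffix,
			  char *buf, size_t len)
{
	if (!temp_channel_visible(channel))
		return -1;
	return snprintf(buf, len, "temp%d_%s", temp_sysfs_index(channel), suffix);
}
```

With this, CHANNEL_PKG yields temp2_input, CHANNEL_VRAM yields temp3_input, and CHANNEL_CARD produces no attribute at all, which is the "temp1_xxx invisible" case described above.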

Regarding fan channels, based on our offline discussion, it appears that 
fan data does not fall within the above mappings and pertains to the 
entire card.

For fans we can have a separate enum, for example:

    enum xe_hwmon_fan_channel {
        FAN_1,
        FAN_2,
        FAN_3,
    };

with the label "card" for all of them.
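
As a rough illustration of the fan side (again a userspace sketch, not driver code), the fan enum could be paired with a label helper along these lines. FAN_MAX and both helper names are hypothetical additions for the example; the only things taken from the discussion are the three fan entries and the shared "card" label.

```c
#include <assert.h>
#include <stddef.h>

/* Separate enum for fans; FAN_MAX is a hypothetical bound added for the sketch. */
enum xe_hwmon_fan_channel {
	FAN_1,
	FAN_2,
	FAN_3,
	FAN_MAX,
};

/* hwmon fan attributes are 1-based: FAN_1 maps to fan1_xxx, and so on. */
static int fan_sysfs_index(enum xe_hwmon_fan_channel fan)
{
	assert(fan < FAN_MAX);
	return (int)fan + 1;
}

/* Every fan belongs to the whole card, so all fans share one label. */
static const char *fan_channel_label(enum xe_hwmon_fan_channel fan)
{
	assert(fan < FAN_MAX);
	return "card";
}
```

Keeping the fans in their own enum means fan1_xxx to fan3_xxx stay contiguous and do not depend on the card/package/VRAM channel numbering.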

