All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v1] platform/x86: thinkpad_acpi: Add parameter to suppress invalid thermal sensors
@ 2025-08-18 20:39 Marc Burkhardt
  2025-08-19  8:05 ` Ilpo Järvinen
  2025-08-20  0:03 ` Mark Pearson
  0 siblings, 2 replies; 7+ messages in thread
From: Marc Burkhardt @ 2025-08-18 20:39 UTC (permalink / raw)
  To: platform-driver-x86, ibm-acpi-devel
  Cc: Henrique de Moraes Holschuh, Mark Pearson, Derek J. Clark,
	Hans de Goede, Ilpo Järvinen, Marc Burkhardt

While moving an existing Icinga installation to a Lenovo P15 20SU I came
across broken JSON output from a "sensors -Aj" command consumed by the
Icinga check_lm_sensors plugin. After fiddling around trying to build a
fix in either lm_sensors or Icinga I found out the error was rooted in
some sysfs file that was created but threw errors while being read. On my
Lenovo ThinkPad the default fallback to 8 temperature sensors creates
sysfs entries like in my case "temp8_input" that fail when read, causing
the issue in user-space.

This patch adds a module parameter (suppress_sensor) using
module_param_array() to allow users to specify a comma-separated list of
zero-based sensor indices to suppress sysfs file creation (e.g.
suppress_sensor=3,7). Instead of a model-specific quirk, this provides
flexible configuration for any ThinkPad with similar issues and is working
out-of-the-box without additional models being marked for the quirk. The
parameter uses a fixed-size array based on TPACPI_MAX_THERMAL_SENSORS (16)
consistent with the driver’s thermal sensor handling (ie.
ibm_thermal_sensors_struct or sensor_dev_attr_thermal_temp_input).

Logging via pr_info() reports the number of suppressed sensors at module
initialization, and pr_info() logs each suppressed sensor during sysfs
attribute creation. Invalid sensor indices are logged once via pr_warn()
to avoid repetitive warnings. Tested on a ThinkPad P15 with
suppress_sensor=3,7, confirming suppression of temp4_input and temp8_input
with no sysfs errors. Bounds checking for uncommon values is in place or
will be logged.

The patch applies to the current
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git although
it was initially written for a 6.16.0 kernel.

I look forward to any feedback on the patch and/or handling of submission.
Please CC: for now as I am not (yet) subscribed. Thank you.

Signed-off-by: Marc Burkhardt <marc.burkhardt@protoco.consulting>
---
Notes:
I haven't posted on LKML or send a patch for over a decade now so
please forgive any possible mistakes I made regarding current coding
conventions or more generally in submitting this patch. The patch was
running for some time here with faulty sensors removed from sysfs and no
problems otherwise detected and was surely run through checkpatch.pl before
submission. get_maintainer.pl was helpful to find the hopefully right
people for CC:ing but I am otherweise totally unaware of any current
procedures or best-practices when it comes to submitting a patch.

drivers/platform/x86/lenovo/thinkpad_acpi.c | 35 +++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/drivers/platform/x86/lenovo/thinkpad_acpi.c b/drivers/platform/x86/lenovo/thinkpad_acpi.c
index cc19fe520ea9..30ff01f87403 100644
--- a/drivers/platform/x86/lenovo/thinkpad_acpi.c
+++ b/drivers/platform/x86/lenovo/thinkpad_acpi.c
@@ -6019,6 +6019,30 @@ struct ibm_thermal_sensors_struct {
 	s32 temp[TPACPI_MAX_THERMAL_SENSORS];
 };
 
+static int suppress_sensor[TPACPI_MAX_THERMAL_SENSORS];
+static unsigned int suppress_sensor_count;
+
+static bool is_sensor_suppressed(int index)
+{
+	unsigned int i;
+	bool logged = false;
+
+	for (i = 0; i < suppress_sensor_count; i++) {
+		if (suppress_sensor[i] == index)
+			return true;
+
+		if (!logged &&
+			(suppress_sensor[i] < 0
+				|| suppress_sensor[i] >= TPACPI_MAX_THERMAL_SENSORS)) {
+			pr_warn("Invalid sensor index %d in suppress_sensor\n",
+				suppress_sensor[i]);
+			logged = true;
+		}
+	}
+
+	return false;
+}
+
 static const struct tpacpi_quirk thermal_quirk_table[] __initconst = {
 	/* Non-standard address for thermal registers on some ThinkPads */
 	TPACPI_Q_LNV3('R', '1', 'F', true),	/* L13 Yoga Gen 2 */
@@ -6313,6 +6337,11 @@ static umode_t thermal_attr_is_visible(struct kobject *kobj,
 
 	int idx = sensor_attr->index;
 
+	if (is_sensor_suppressed(idx)) {
+		pr_info("Sensor %d suppressed\n", idx);
+		return 0;
+	}
+
 	switch (thermal_read_mode) {
 	case TPACPI_THERMAL_NONE:
 		return 0;
@@ -11653,6 +11682,9 @@ static void __init thinkpad_acpi_init_banner(void)
 			thinkpad_id.model_str,
 			(thinkpad_id.nummodel_str) ?
 				thinkpad_id.nummodel_str : "unknown");
+
+	pr_info("Suppressing %d user-supplied sensor(s) via parameter suppress_sensor\n",
+		suppress_sensor_count);
 }
 
 /* Module init, exit, parameters */
@@ -11785,6 +11817,9 @@ MODULE_PARM_DESC(experimental,
 module_param_named(debug, dbg_level, uint, 0);
 MODULE_PARM_DESC(debug, "Sets debug level bit-mask");
 
+module_param_array(suppress_sensor, int, &suppress_sensor_count, 0444);
+MODULE_PARM_DESC(suppress_sensor, "Comma-separated sensor indices to suppress (e.g., 3,7)");
+
 module_param(force_load, bool, 0444);
 MODULE_PARM_DESC(force_load,
 		 "Attempts to load the driver even on a mis-identified ThinkPad when true");

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH v1] platform/x86: thinkpad_acpi: Add parameter to suppress invalid thermal sensors
  2025-08-18 20:39 [RFC PATCH v1] platform/x86: thinkpad_acpi: Add parameter to suppress invalid thermal sensors Marc Burkhardt
@ 2025-08-19  8:05 ` Ilpo Järvinen
  2025-08-21 17:47   ` Marc Burkhardt
  2025-08-20  0:03 ` Mark Pearson
  1 sibling, 1 reply; 7+ messages in thread
From: Ilpo Järvinen @ 2025-08-19  8:05 UTC (permalink / raw)
  To: Marc Burkhardt, Mark Pearson
  Cc: platform-driver-x86, ibm-acpi-devel, Henrique de Moraes Holschuh,
	Derek J. Clark, Hans de Goede

[-- Attachment #1: Type: text/plain, Size: 6452 bytes --]

On Mon, 18 Aug 2025, Marc Burkhardt wrote:

> While moving an existing Icinga installation to a Lenovo P15 20SU I came
> across broken JSON output from a "sensors -Aj" command consumed by the

The commit message is not meant to be a history lesson about how the patch
came to be but to describe the problem seen and how patch is fixing it. 
Please try to state imperatively what is the problem, not "I did x" kind 
of sentences. E.g.,

"sensors -Aj" JSON output on Lenovo P15 20SU is broken because ...

Also try to avoid using "This patch" to start a sentence.

> Icinga check_lm_sensors plugin. After fiddling around trying to build a
> fix in either lm_sensors or Icinga I found out the error was rooted in
> some sysfs file that was created but threw errors while being read. On my
> Lenovo ThinkPad the default fallback to 8 temperature sensors creates
> sysfs entries like in my case "temp8_input" that fail when read, causing
> the issue in user-space.
>
> This patch adds a module parameter (suppress_sensor) using
> module_param_array() to allow users to specify a comma-separated list of
> zero-based sensor indices to suppress sysfs file creation (e.g.
> suppress_sensor=3,7). Instead of a model-specific quirk, this provides
> flexible configuration for any ThinkPad with similar issues and is working
> out-of-the-box without additional models being marked for the quirk.

Can't we determine this at probe time automatically?

We generally try to avoid module parameters whenever possible. Sometimes,
if not automated way exists, quirk + a parameter intended for temporary 
use might be acceptable compromise. I don't understand why you say 
"additional models being marked for the quirk" as that seems never 
necessary even if a quirk exists.

> The
> parameter uses a fixed-size array based on TPACPI_MAX_THERMAL_SENSORS (16)
> consistent with the driver’s thermal sensor handling (ie.
> ibm_thermal_sensors_struct or sensor_dev_attr_thermal_temp_input).
> 
> Logging via pr_info() reports the number of suppressed sensors at module
> initialization, and pr_info() logs each suppressed sensor during sysfs
> attribute creation. Invalid sensor indices are logged once via pr_warn()
> to avoid repetitive warnings.

This logging likely needs work too (to print much less), but lets first 
agree with the way forward.

> Tested on a ThinkPad P15 with
> suppress_sensor=3,7, confirming suppression of temp4_input and temp8_input
> with no sysfs errors. Bounds checking for uncommon values is in place or
> will be logged.
> 
> The patch applies to the current
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git although
> it was initially written for a 6.16.0 kernel.
>
> I look forward to any feedback on the patch and/or handling of submission.
> Please CC: for now as I am not (yet) subscribed. Thank you.

Please don't include details like these 2 last paragraphs into the commit 
message itself, it's fine to state things like this below --- line as 
they're automatically removed when the patch is applied.

Usually Linus' tree is fine as base, but sometimes pdx86 for-next has 
diverged.

> 
> Signed-off-by: Marc Burkhardt <marc.burkhardt@protoco.consulting>
> ---
> Notes:
> I haven't posted on LKML or send a patch for over a decade now so
> please forgive any possible mistakes I made regarding current coding
> conventions or more generally in submitting this patch. The patch was
> running for some time here with faulty sensors removed from sysfs and no
> problems otherwise detected and was surely run through checkpatch.pl before
> submission. get_maintainer.pl was helpful to find the hopefully right
> people for CC:ing but I am otherweise totally unaware of any current
> procedures or best-practices when it comes to submitting a patch.

Welcome back. You did quite well regardless. :-)

> drivers/platform/x86/lenovo/thinkpad_acpi.c | 35 +++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
> 
> diff --git a/drivers/platform/x86/lenovo/thinkpad_acpi.c b/drivers/platform/x86/lenovo/thinkpad_acpi.c
> index cc19fe520ea9..30ff01f87403 100644
> --- a/drivers/platform/x86/lenovo/thinkpad_acpi.c
> +++ b/drivers/platform/x86/lenovo/thinkpad_acpi.c
> @@ -6019,6 +6019,30 @@ struct ibm_thermal_sensors_struct {
>  	s32 temp[TPACPI_MAX_THERMAL_SENSORS];
>  };
>  
> +static int suppress_sensor[TPACPI_MAX_THERMAL_SENSORS];
> +static unsigned int suppress_sensor_count;
> +
> +static bool is_sensor_suppressed(int index)
> +{
> +	unsigned int i;
> +	bool logged = false;
> +
> +	for (i = 0; i < suppress_sensor_count; i++) {
> +		if (suppress_sensor[i] == index)
> +			return true;
> +
> +		if (!logged &&
> +			(suppress_sensor[i] < 0
> +				|| suppress_sensor[i] >= TPACPI_MAX_THERMAL_SENSORS)) {
> +			pr_warn("Invalid sensor index %d in suppress_sensor\n",
> +				suppress_sensor[i]);
> +			logged = true;
> +		}
> +	}
> +
> +	return false;
> +}
> +
>  static const struct tpacpi_quirk thermal_quirk_table[] __initconst = {
>  	/* Non-standard address for thermal registers on some ThinkPads */
>  	TPACPI_Q_LNV3('R', '1', 'F', true),	/* L13 Yoga Gen 2 */
> @@ -6313,6 +6337,11 @@ static umode_t thermal_attr_is_visible(struct kobject *kobj,
>  
>  	int idx = sensor_attr->index;
>  
> +	if (is_sensor_suppressed(idx)) {
> +		pr_info("Sensor %d suppressed\n", idx);
> +		return 0;
> +	}
> +
>  	switch (thermal_read_mode) {
>  	case TPACPI_THERMAL_NONE:
>  		return 0;
> @@ -11653,6 +11682,9 @@ static void __init thinkpad_acpi_init_banner(void)
>  			thinkpad_id.model_str,
>  			(thinkpad_id.nummodel_str) ?
>  				thinkpad_id.nummodel_str : "unknown");
> +
> +	pr_info("Suppressing %d user-supplied sensor(s) via parameter suppress_sensor\n",
> +		suppress_sensor_count);
>  }
>  
>  /* Module init, exit, parameters */
> @@ -11785,6 +11817,9 @@ MODULE_PARM_DESC(experimental,
>  module_param_named(debug, dbg_level, uint, 0);
>  MODULE_PARM_DESC(debug, "Sets debug level bit-mask");
>  
> +module_param_array(suppress_sensor, int, &suppress_sensor_count, 0444);
> +MODULE_PARM_DESC(suppress_sensor, "Comma-separated sensor indices to suppress (e.g., 3,7)");
> +
>  module_param(force_load, bool, 0444);
>  MODULE_PARM_DESC(force_load,
>  		 "Attempts to load the driver even on a mis-identified ThinkPad when true");
> 

-- 
 i.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH v1] platform/x86: thinkpad_acpi: Add parameter to suppress invalid thermal sensors
  2025-08-18 20:39 [RFC PATCH v1] platform/x86: thinkpad_acpi: Add parameter to suppress invalid thermal sensors Marc Burkhardt
  2025-08-19  8:05 ` Ilpo Järvinen
@ 2025-08-20  0:03 ` Mark Pearson
  2025-08-21 17:32   ` Marc Burkhardt
  1 sibling, 1 reply; 7+ messages in thread
From: Mark Pearson @ 2025-08-20  0:03 UTC (permalink / raw)
  To: Marc Burkhardt, platform-driver-x86@vger.kernel.org,
	ibm-acpi-devel
  Cc: Henrique de Moraes Holschuh, Derek J . Clark, Hans de Goede,
	Ilpo Järvinen

Hi Marc,

On Mon, Aug 18, 2025, at 4:39 PM, Marc Burkhardt wrote:
> While moving an existing Icinga installation to a Lenovo P15 20SU I came
> across broken JSON output from a "sensors -Aj" command consumed by the
> Icinga check_lm_sensors plugin. After fiddling around trying to build a
> fix in either lm_sensors or Icinga I found out the error was rooted in
> some sysfs file that was created but threw errors while being read. On my
> Lenovo ThinkPad the default fallback to 8 temperature sensors creates
> sysfs entries like in my case "temp8_input" that fail when read, causing
> the issue in user-space.
>
> This patch adds a module parameter (suppress_sensor) using
> module_param_array() to allow users to specify a comma-separated list of
> zero-based sensor indices to suppress sysfs file creation (e.g.
> suppress_sensor=3,7). Instead of a model-specific quirk, this provides
> flexible configuration for any ThinkPad with similar issues and is working
> out-of-the-box without additional models being marked for the quirk. The
> parameter uses a fixed-size array based on TPACPI_MAX_THERMAL_SENSORS (16)
> consistent with the driver’s thermal sensor handling (ie.
> ibm_thermal_sensors_struct or sensor_dev_attr_thermal_temp_input).
>
> Logging via pr_info() reports the number of suppressed sensors at module
> initialization, and pr_info() logs each suppressed sensor during sysfs
> attribute creation. Invalid sensor indices are logged once via pr_warn()
> to avoid repetitive warnings. Tested on a ThinkPad P15 with
> suppress_sensor=3,7, confirming suppression of temp4_input and temp8_input
> with no sysfs errors. Bounds checking for uncommon values is in place or
> will be logged.
>
> The patch applies to the current
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git although
> it was initially written for a 6.16.0 kernel.
>
> I look forward to any feedback on the patch and/or handling of submission.
> Please CC: for now as I am not (yet) subscribed. Thank you.
>
> Signed-off-by: Marc Burkhardt <marc.burkhardt@protoco.consulting>
> ---
> Notes:
> I haven't posted on LKML or send a patch for over a decade now so
> please forgive any possible mistakes I made regarding current coding
> conventions or more generally in submitting this patch. The patch was
> running for some time here with faulty sensors removed from sysfs and no
> problems otherwise detected and was surely run through checkpatch.pl before
> submission. get_maintainer.pl was helpful to find the hopefully right
> people for CC:ing but I am otherweise totally unaware of any current
> procedures or best-practices when it comes to submitting a patch.
>
> drivers/platform/x86/lenovo/thinkpad_acpi.c | 35 +++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
>
> diff --git a/drivers/platform/x86/lenovo/thinkpad_acpi.c 
> b/drivers/platform/x86/lenovo/thinkpad_acpi.c
> index cc19fe520ea9..30ff01f87403 100644
> --- a/drivers/platform/x86/lenovo/thinkpad_acpi.c
> +++ b/drivers/platform/x86/lenovo/thinkpad_acpi.c
> @@ -6019,6 +6019,30 @@ struct ibm_thermal_sensors_struct {
>  	s32 temp[TPACPI_MAX_THERMAL_SENSORS];
>  };
> 
> +static int suppress_sensor[TPACPI_MAX_THERMAL_SENSORS];
> +static unsigned int suppress_sensor_count;
> +
> +static bool is_sensor_suppressed(int index)
> +{
> +	unsigned int i;
> +	bool logged = false;
> +
> +	for (i = 0; i < suppress_sensor_count; i++) {
> +		if (suppress_sensor[i] == index)
> +			return true;
> +
> +		if (!logged &&
> +			(suppress_sensor[i] < 0
> +				|| suppress_sensor[i] >= TPACPI_MAX_THERMAL_SENSORS)) {
> +			pr_warn("Invalid sensor index %d in suppress_sensor\n",
> +				suppress_sensor[i]);
> +			logged = true;
> +		}
> +	}
> +
> +	return false;
> +}
> +
>  static const struct tpacpi_quirk thermal_quirk_table[] __initconst = {
>  	/* Non-standard address for thermal registers on some ThinkPads */
>  	TPACPI_Q_LNV3('R', '1', 'F', true),	/* L13 Yoga Gen 2 */
> @@ -6313,6 +6337,11 @@ static umode_t thermal_attr_is_visible(struct 
> kobject *kobj,
> 
>  	int idx = sensor_attr->index;
> 
> +	if (is_sensor_suppressed(idx)) {
> +		pr_info("Sensor %d suppressed\n", idx);
> +		return 0;
> +	}
> +
>  	switch (thermal_read_mode) {
>  	case TPACPI_THERMAL_NONE:
>  		return 0;
> @@ -11653,6 +11682,9 @@ static void __init 
> thinkpad_acpi_init_banner(void)
>  			thinkpad_id.model_str,
>  			(thinkpad_id.nummodel_str) ?
>  				thinkpad_id.nummodel_str : "unknown");
> +
> +	pr_info("Suppressing %d user-supplied sensor(s) via parameter 
> suppress_sensor\n",
> +		suppress_sensor_count);
>  }
> 
>  /* Module init, exit, parameters */
> @@ -11785,6 +11817,9 @@ MODULE_PARM_DESC(experimental,
>  module_param_named(debug, dbg_level, uint, 0);
>  MODULE_PARM_DESC(debug, "Sets debug level bit-mask");
> 
> +module_param_array(suppress_sensor, int, &suppress_sensor_count, 0444);
> +MODULE_PARM_DESC(suppress_sensor, "Comma-separated sensor indices to 
> suppress (e.g., 3,7)");
> +
>  module_param(force_load, bool, 0444);
>  MODULE_PARM_DESC(force_load,
>  		 "Attempts to load the driver even on a mis-identified ThinkPad when 
> true");

The P15 is one of the Linux certified platforms...though it's a bit older now.

I'd be more interested in figuring out which sensors are returning an error and figuring out how we address that. I have access to the FW and platform team for questions (though this platform is a bit older now, so if we need FW fixes that will be trickier). My gut feeling is we shouldn't be creating sysfs entries if the sensors don't exist or aren't accessible.

I do have a P15 so can check it out (I'm going to have to blow some dust off it). If you've got the details on which sensors need suppressing that would be useful. I have seen previously where it's trying to access a GPU sensor on a UMA model.

Mark

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH v1] platform/x86: thinkpad_acpi: Add parameter to suppress invalid thermal sensors
  2025-08-20  0:03 ` Mark Pearson
@ 2025-08-21 17:32   ` Marc Burkhardt
  2025-08-22 11:54     ` Armin Wolf
  0 siblings, 1 reply; 7+ messages in thread
From: Marc Burkhardt @ 2025-08-21 17:32 UTC (permalink / raw)
  To: Mark Pearson
  Cc: platform-driver-x86, ibm-acpi-devel, Henrique de Moraes Holschuh,
	Derek J . Clark, Hans de Goede, Ilpo Järvinen

Am 2025-08-20 00:03, schrieb Mark Pearson:

Hi Mark,

thanks for replying.

> Hi Marc,
> 
> On Mon, Aug 18, 2025, at 4:39 PM, Marc Burkhardt wrote:
>> While moving an existing Icinga installation to a Lenovo P15 20SU I 
>> came
>> across broken JSON output from a "sensors -Aj" command consumed by the
>> Icinga check_lm_sensors plugin. After fiddling around trying to build 
>> a
>> fix in either lm_sensors or Icinga I found out the error was rooted in
>> some sysfs file that was created but threw errors while being read. On 
>> my
>> Lenovo ThinkPad the default fallback to 8 temperature sensors creates
>> sysfs entries like in my case "temp8_input" that fail when read, 
>> causing
>> the issue in user-space.
>> 
>> This patch adds a module parameter (suppress_sensor) using
>> module_param_array() to allow users to specify a comma-separated list 
>> of
>> zero-based sensor indices to suppress sysfs file creation (e.g.
>> suppress_sensor=3,7). Instead of a model-specific quirk, this provides
>> flexible configuration for any ThinkPad with similar issues and is 
>> working
>> out-of-the-box without additional models being marked for the quirk. 
>> The
>> parameter uses a fixed-size array based on TPACPI_MAX_THERMAL_SENSORS 
>> (16)
>> consistent with the driver’s thermal sensor handling (ie.
>> ibm_thermal_sensors_struct or sensor_dev_attr_thermal_temp_input).
>> 
>> Logging via pr_info() reports the number of suppressed sensors at 
>> module
>> initialization, and pr_info() logs each suppressed sensor during sysfs
>> attribute creation. Invalid sensor indices are logged once via 
>> pr_warn()
>> to avoid repetitive warnings. Tested on a ThinkPad P15 with
>> suppress_sensor=3,7, confirming suppression of temp4_input and 
>> temp8_input
>> with no sysfs errors. Bounds checking for uncommon values is in place 
>> or
>> will be logged.
>> 
>> The patch applies to the current
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
>> although
>> it was initially written for a 6.16.0 kernel.
>> 
>> I look forward to any feedback on the patch and/or handling of 
>> submission.
>> Please CC: for now as I am not (yet) subscribed. Thank you.
>> 
>> Signed-off-by: Marc Burkhardt <marc.burkhardt@protoco.consulting>
>> ---
>> Notes:
>> I haven't posted on LKML or send a patch for over a decade now so
>> please forgive any possible mistakes I made regarding current coding
>> conventions or more generally in submitting this patch. The patch was
>> running for some time here with faulty sensors removed from sysfs and 
>> no
>> problems otherwise detected and was surely run through checkpatch.pl 
>> before
>> submission. get_maintainer.pl was helpful to find the hopefully right
>> people for CC:ing but I am otherweise totally unaware of any current
>> procedures or best-practices when it comes to submitting a patch.
>> 
>> drivers/platform/x86/lenovo/thinkpad_acpi.c | 35 
>> +++++++++++++++++++++++++++++
>>  1 file changed, 35 insertions(+)
>> 
>> diff --git a/drivers/platform/x86/lenovo/thinkpad_acpi.c
>> b/drivers/platform/x86/lenovo/thinkpad_acpi.c
>> index cc19fe520ea9..30ff01f87403 100644
>> --- a/drivers/platform/x86/lenovo/thinkpad_acpi.c
>> +++ b/drivers/platform/x86/lenovo/thinkpad_acpi.c
>> @@ -6019,6 +6019,30 @@ struct ibm_thermal_sensors_struct {
>>  	s32 temp[TPACPI_MAX_THERMAL_SENSORS];
>>  };
>> 
>> +static int suppress_sensor[TPACPI_MAX_THERMAL_SENSORS];
>> +static unsigned int suppress_sensor_count;
>> +
>> +static bool is_sensor_suppressed(int index)
>> +{
>> +	unsigned int i;
>> +	bool logged = false;
>> +
>> +	for (i = 0; i < suppress_sensor_count; i++) {
>> +		if (suppress_sensor[i] == index)
>> +			return true;
>> +
>> +		if (!logged &&
>> +			(suppress_sensor[i] < 0
>> +				|| suppress_sensor[i] >= TPACPI_MAX_THERMAL_SENSORS)) {
>> +			pr_warn("Invalid sensor index %d in suppress_sensor\n",
>> +				suppress_sensor[i]);
>> +			logged = true;
>> +		}
>> +	}
>> +
>> +	return false;
>> +}
>> +
>>  static const struct tpacpi_quirk thermal_quirk_table[] __initconst = 
>> {
>>  	/* Non-standard address for thermal registers on some ThinkPads */
>>  	TPACPI_Q_LNV3('R', '1', 'F', true),	/* L13 Yoga Gen 2 */
>> @@ -6313,6 +6337,11 @@ static umode_t thermal_attr_is_visible(struct
>> kobject *kobj,
>> 
>>  	int idx = sensor_attr->index;
>> 
>> +	if (is_sensor_suppressed(idx)) {
>> +		pr_info("Sensor %d suppressed\n", idx);
>> +		return 0;
>> +	}
>> +
>>  	switch (thermal_read_mode) {
>>  	case TPACPI_THERMAL_NONE:
>>  		return 0;
>> @@ -11653,6 +11682,9 @@ static void __init
>> thinkpad_acpi_init_banner(void)
>>  			thinkpad_id.model_str,
>>  			(thinkpad_id.nummodel_str) ?
>>  				thinkpad_id.nummodel_str : "unknown");
>> +
>> +	pr_info("Suppressing %d user-supplied sensor(s) via parameter
>> suppress_sensor\n",
>> +		suppress_sensor_count);
>>  }
>> 
>>  /* Module init, exit, parameters */
>> @@ -11785,6 +11817,9 @@ MODULE_PARM_DESC(experimental,
>>  module_param_named(debug, dbg_level, uint, 0);
>>  MODULE_PARM_DESC(debug, "Sets debug level bit-mask");
>> 
>> +module_param_array(suppress_sensor, int, &suppress_sensor_count, 
>> 0444);
>> +MODULE_PARM_DESC(suppress_sensor, "Comma-separated sensor indices to
>> suppress (e.g., 3,7)");
>> +
>>  module_param(force_load, bool, 0444);
>>  MODULE_PARM_DESC(force_load,
>>  		 "Attempts to load the driver even on a mis-identified ThinkPad 
>> when
>> true");
> 
> The P15 is one of the Linux certified platforms...though it's a bit 
> older now.
> 
> I'd be more interested in figuring out which sensors are returning an 
> error and figuring out how we address that. I have access to the FW and 
> platform team for questions (though this platform is a bit older now, 
> so if we need FW fixes that will be trickier). My gut feeling is we 
> shouldn't be creating sysfs entries if the sensors don't exist or 
> aren't accessible.

That is what my patch does - it prevents creating the sysfs entries but 
not based on a check for validity of the sensor in code (as probably 
desired by Ilpo when I understand a previous mail correctly) but rather 
on a user-provided configuration via the new parameter. I reply to the 
other mail as well soon.

> 
> I do have a P15 so can check it out (I'm going to have to blow some 
> dust off it). If you've got the details on which sensors need 
> suppressing that would be useful. I have seen previously where it's 
> trying to access a GPU sensor on a UMA model.

On my hardware it's sensor temp8_input which is unreadable at all und 
sensor temp4_input that has a constant value of 0, no matter how hot, 
cold or loud the machine is running. I am, however, able to monitor GPU 
temps via nvidia _and_ thinkpad ACPI. The values are mostly equal, 
differ a bit due to internal timing sometimes.

> 
> Mark

Marc

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH v1] platform/x86: thinkpad_acpi: Add parameter to suppress invalid thermal sensors
  2025-08-19  8:05 ` Ilpo Järvinen
@ 2025-08-21 17:47   ` Marc Burkhardt
  0 siblings, 0 replies; 7+ messages in thread
From: Marc Burkhardt @ 2025-08-21 17:47 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Mark Pearson, platform-driver-x86, ibm-acpi-devel,
	Henrique de Moraes Holschuh, Derek J. Clark, Hans de Goede

Hi Ilpo,

below find my probably (a bit) lengthy feedback to your mail and excuse 
the delay in replying.

Am 2025-08-19 08:05, schrieb Ilpo Järvinen:
> On Mon, 18 Aug 2025, Marc Burkhardt wrote:
> 
>> While moving an existing Icinga installation to a Lenovo P15 20SU I 
>> came
>> across broken JSON output from a "sensors -Aj" command consumed by the
> 
> The commit message is not meant to be a history lesson about how the 
> patch
> came to be but to describe the problem seen and how patch is fixing it.
> Please try to state imperatively what is the problem, not "I did x" 
> kind
> of sentences. E.g.,
> 
> "sensors -Aj" JSON output on Lenovo P15 20SU is broken because ...
> 
> Also try to avoid using "This patch" to start a sentence.
> 
>> Icinga check_lm_sensors plugin. After fiddling around trying to build 
>> a
>> fix in either lm_sensors or Icinga I found out the error was rooted in
>> some sysfs file that was created but threw errors while being read. On 
>> my
>> Lenovo ThinkPad the default fallback to 8 temperature sensors creates
>> sysfs entries like in my case "temp8_input" that fail when read, 
>> causing
>> the issue in user-space.
>> 
>> This patch adds a module parameter (suppress_sensor) using
>> module_param_array() to allow users to specify a comma-separated list 
>> of
>> zero-based sensor indices to suppress sysfs file creation (e.g.
>> suppress_sensor=3,7). Instead of a model-specific quirk, this provides
>> flexible configuration for any ThinkPad with similar issues and is 
>> working
>> out-of-the-box without additional models being marked for the quirk.
> 
> Can't we determine this at probe time automatically?
> 
> We generally try to avoid module parameters whenever possible. 
> Sometimes,
> if not automated way exists, quirk + a parameter intended for temporary
> use might be acceptable compromise. I don't understand why you say
> "additional models being marked for the quirk" as that seems never
> necessary even if a quirk exists.
> 
>> The
>> parameter uses a fixed-size array based on TPACPI_MAX_THERMAL_SENSORS 
>> (16)
>> consistent with the driver’s thermal sensor handling (ie.
>> ibm_thermal_sensors_struct or sensor_dev_attr_thermal_temp_input).
>> 
>> Logging via pr_info() reports the number of suppressed sensors at 
>> module
>> initialization, and pr_info() logs each suppressed sensor during sysfs
>> attribute creation. Invalid sensor indices are logged once via 
>> pr_warn()
>> to avoid repetitive warnings.
> 
> This logging likely needs work too (to print much less), but lets first
> agree with the way forward.

I can reduce the logging going from pr_info to pr_debug or just remove 
it
as a whole for sure. But as I thought that this is an end-user thing 
(given it
is a ThinkPad) I put it there on purpose to let the user know what was 
going
on. The message about the de-facto suppressed sensors just logs once per
provided sensor index on sysfs creation, not the sensor's usage. The
thinkpad_acpi module, moreover, logs excessively in general already. 
Just saw
this from a user's perspective seeing myself google forum posts for 
missing
sensors without a trace. I can surely make the logging conditional in
thinkpad_acpi_init_banner().

That user-thinking was also part my second thought: as I didn't find any 
such
problem for a P15 on Google or the like I suspected this to be a 
not-so-usual case. Thus,
I have already re-worked the patch to its current form using a parameter
instead of using a quirk-table (analog to the ones already in place) 
that I had
written initially (using BIOS revision N30) as this would mean
suppressing the sensor on any such device without knowing if the problem
_really_ exists on the device in general in the real world - I haven't 
found any such
report on the Net while searching for it so I _assume_ that this is a 
weird
edge-case we're talking about here that the sensor is non-functional or 
my
hardware is borked in some way for whatever reason. The rather static 
creation
of let's say 8 or 16 sensors in general _could_ probably be the problem. 
The code
seems to make 16 iterations on my device as I saw when I had debug log 
in place,
probably due to the static size of TPACPI_MAX_THERMAL_SENSORS;

Also I imagine there will be other ThinkPad-devices/models having such
problem with a different sensor index (which I surely don't
know of yet) so these would not be listed in the new quirk-table and 
needed a new
patch to include them valid for this quirk. It might be another model 
needs a
quirk for sensor 14, 15 and 16 which the current code would just create 
if the
device falls through a specific match (not validating the actual 
sensors) as the
sensors creation currently is rather "static". Using the parameter a 
user would be
able to just add the index of the broken sensor and would be good to go 
no
matter what model being used and what sensors indices it would affect - 
it's just
my approach to be generic and future proof. The quirk-table approach 
also used
more code than the parameter approach with even reduced flexibility.

As I understand your mail you would also eventually like testing the 
sensor
being created for let's say "technical validity". On my machine 
temp4_input is
technically valid (sysfs created and working) but spills a static 0. So 
this
sensor is actually unusable from a user-point of view making me exclude 
it via
the new parameter as well. What would (in your opinion) the handling be 
for
this specific sensor temp4_input be when using the quirk-table approach?

Please advice on how to re-write the functionality so I can give it 
another spin.
Sorry for the lengthy explanation and thanks for your time reading this.

> 
>> Tested on a ThinkPad P15 with
>> suppress_sensor=3,7, confirming suppression of temp4_input and 
>> temp8_input
>> with no sysfs errors. Bounds checking for uncommon values is in place 
>> or
>> will be logged.
>> 
>> The patch applies to the current
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
>> although
>> it was initially written for a 6.16.0 kernel.
>> 
>> I look forward to any feedback on the patch and/or handling of 
>> submission.
>> Please CC: for now as I am not (yet) subscribed. Thank you.
> 
> Please don't include details like these 2 last paragraphs into the 
> commit
> message itself, it's fine to state things like this below --- line as
> they're automatically removed when the patch is applied.
> 
> Usually Linus' tree is fine as base, but sometimes pdx86 for-next has
> diverged.
> 
>> 
>> Signed-off-by: Marc Burkhardt <marc.burkhardt@protoco.consulting>
>> ---
>> Notes:
>> I haven't posted on LKML or send a patch for over a decade now so
>> please forgive any possible mistakes I made regarding current coding
>> conventions or more generally in submitting this patch. The patch was
>> running for some time here with faulty sensors removed from sysfs and 
>> no
>> problems otherwise detected and was surely run through checkpatch.pl 
>> before
>> submission. get_maintainer.pl was helpful to find the hopefully right
>> people for CC:ing but I am otherweise totally unaware of any current
>> procedures or best-practices when it comes to submitting a patch.
> 
> Welcome back. You did quite well regardless. :-)
> 
>> drivers/platform/x86/lenovo/thinkpad_acpi.c | 35 
>> +++++++++++++++++++++++++++++
>>  1 file changed, 35 insertions(+)
>> 
>> diff --git a/drivers/platform/x86/lenovo/thinkpad_acpi.c 
>> b/drivers/platform/x86/lenovo/thinkpad_acpi.c
>> index cc19fe520ea9..30ff01f87403 100644
>> --- a/drivers/platform/x86/lenovo/thinkpad_acpi.c
>> +++ b/drivers/platform/x86/lenovo/thinkpad_acpi.c
>> @@ -6019,6 +6019,30 @@ struct ibm_thermal_sensors_struct {
>>  	s32 temp[TPACPI_MAX_THERMAL_SENSORS];
>>  };
>> 
>> +static int suppress_sensor[TPACPI_MAX_THERMAL_SENSORS];
>> +static unsigned int suppress_sensor_count;
>> +
>> +static bool is_sensor_suppressed(int index)
>> +{
>> +	unsigned int i;
>> +	bool logged = false;
>> +
>> +	for (i = 0; i < suppress_sensor_count; i++) {
>> +		if (suppress_sensor[i] == index)
>> +			return true;
>> +
>> +		if (!logged &&
>> +			(suppress_sensor[i] < 0
>> +				|| suppress_sensor[i] >= TPACPI_MAX_THERMAL_SENSORS)) {
>> +			pr_warn("Invalid sensor index %d in suppress_sensor\n",
>> +				suppress_sensor[i]);
>> +			logged = true;
>> +		}
>> +	}
>> +
>> +	return false;
>> +}
>> +
>>  static const struct tpacpi_quirk thermal_quirk_table[] __initconst = 
>> {
>>  	/* Non-standard address for thermal registers on some ThinkPads */
>>  	TPACPI_Q_LNV3('R', '1', 'F', true),	/* L13 Yoga Gen 2 */
>> @@ -6313,6 +6337,11 @@ static umode_t thermal_attr_is_visible(struct 
>> kobject *kobj,
>> 
>>  	int idx = sensor_attr->index;
>> 
>> +	if (is_sensor_suppressed(idx)) {
>> +		pr_info("Sensor %d suppressed\n", idx);
>> +		return 0;
>> +	}
>> +
>>  	switch (thermal_read_mode) {
>>  	case TPACPI_THERMAL_NONE:
>>  		return 0;
>> @@ -11653,6 +11682,9 @@ static void __init 
>> thinkpad_acpi_init_banner(void)
>>  			thinkpad_id.model_str,
>>  			(thinkpad_id.nummodel_str) ?
>>  				thinkpad_id.nummodel_str : "unknown");
>> +
>> +	pr_info("Suppressing %d user-supplied sensor(s) via parameter 
>> suppress_sensor\n",
>> +		suppress_sensor_count);
>>  }
>> 
>>  /* Module init, exit, parameters */
>> @@ -11785,6 +11817,9 @@ MODULE_PARM_DESC(experimental,
>>  module_param_named(debug, dbg_level, uint, 0);
>>  MODULE_PARM_DESC(debug, "Sets debug level bit-mask");
>> 
>> +module_param_array(suppress_sensor, int, &suppress_sensor_count, 
>> 0444);
>> +MODULE_PARM_DESC(suppress_sensor, "Comma-separated sensor indices to 
>> suppress (e.g., 3,7)");
>> +
>>  module_param(force_load, bool, 0444);
>>  MODULE_PARM_DESC(force_load,
>>  		 "Attempts to load the driver even on a mis-identified ThinkPad 
>> when true");
>> 

Marc

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH v1] platform/x86: thinkpad_acpi: Add parameter to suppress invalid thermal sensors
  2025-08-21 17:32   ` Marc Burkhardt
@ 2025-08-22 11:54     ` Armin Wolf
  2025-08-26 18:43       ` Mark Pearson
  0 siblings, 1 reply; 7+ messages in thread
From: Armin Wolf @ 2025-08-22 11:54 UTC (permalink / raw)
  To: marc.burkhardt, Mark Pearson
  Cc: platform-driver-x86, ibm-acpi-devel, Henrique de Moraes Holschuh,
	Derek J . Clark, Hans de Goede, Ilpo Järvinen,
	linux-hwmon@vger.kernel.org

Am 21.08.25 um 19:32 schrieb Marc Burkhardt:

> Am 2025-08-20 00:03, schrieb Mark Pearson:
>
> Hi Mark,
>
> thanks for replying.
>
>> Hi Marc,
>>
>> On Mon, Aug 18, 2025, at 4:39 PM, Marc Burkhardt wrote:
>>> While moving an existing Icinga installation to a Lenovo P15 20SU I 
>>> came
>>> across broken JSON output from a "sensors -Aj" command consumed by the
>>> Icinga check_lm_sensors plugin. After fiddling around trying to build a
>>> fix in either lm_sensors or Icinga I found out the error was rooted in
>>> some sysfs file that was created but threw errors while being read. 
>>> On my
>>> Lenovo ThinkPad the default fallback to 8 temperature sensors creates
>>> sysfs entries like in my case "temp8_input" that fail when read, 
>>> causing
>>> the issue in user-space.
>>>
>>> This patch adds a module parameter (suppress_sensor) using
>>> module_param_array() to allow users to specify a comma-separated 
>>> list of
>>> zero-based sensor indices to suppress sysfs file creation (e.g.
>>> suppress_sensor=3,7). Instead of a model-specific quirk, this provides
>>> flexible configuration for any ThinkPad with similar issues and is 
>>> working
>>> out-of-the-box without additional models being marked for the quirk. 
>>> The
>>> parameter uses a fixed-size array based on 
>>> TPACPI_MAX_THERMAL_SENSORS (16)
>>> consistent with the driver’s thermal sensor handling (ie.
>>> ibm_thermal_sensors_struct or sensor_dev_attr_thermal_temp_input).
>>>
>>> Logging via pr_info() reports the number of suppressed sensors at 
>>> module
>>> initialization, and pr_info() logs each suppressed sensor during sysfs
>>> attribute creation. Invalid sensor indices are logged once via 
>>> pr_warn()
>>> to avoid repetitive warnings. Tested on a ThinkPad P15 with
>>> suppress_sensor=3,7, confirming suppression of temp4_input and 
>>> temp8_input
>>> with no sysfs errors. Bounds checking for uncommon values is in 
>>> place or
>>> will be logged.
>>>
>>> The patch applies to the current
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
>>> although
>>> it was initially written for a 6.16.0 kernel.
>>>
>>> I look forward to any feedback on the patch and/or handling of 
>>> submission.
>>> Please CC: for now as I am not (yet) subscribed. Thank you.
>>>
>>> Signed-off-by: Marc Burkhardt <marc.burkhardt@protoco.consulting>
>>> ---
>>> Notes:
>>> I haven't posted on LKML or send a patch for over a decade now so
>>> please forgive any possible mistakes I made regarding current coding
>>> conventions or more generally in submitting this patch. The patch was
>>> running for some time here with faulty sensors removed from sysfs 
>>> and no
>>> problems otherwise detected and was surely run through checkpatch.pl 
>>> before
>>> submission. get_maintainer.pl was helpful to find the hopefully right
>>> people for CC:ing but I am otherweise totally unaware of any current
>>> procedures or best-practices when it comes to submitting a patch.
>>>
>>> drivers/platform/x86/lenovo/thinkpad_acpi.c | 35 
>>> +++++++++++++++++++++++++++++
>>>  1 file changed, 35 insertions(+)
>>>
>>> diff --git a/drivers/platform/x86/lenovo/thinkpad_acpi.c
>>> b/drivers/platform/x86/lenovo/thinkpad_acpi.c
>>> index cc19fe520ea9..30ff01f87403 100644
>>> --- a/drivers/platform/x86/lenovo/thinkpad_acpi.c
>>> +++ b/drivers/platform/x86/lenovo/thinkpad_acpi.c
>>> @@ -6019,6 +6019,30 @@ struct ibm_thermal_sensors_struct {
>>>      s32 temp[TPACPI_MAX_THERMAL_SENSORS];
>>>  };
>>>
>>> +static int suppress_sensor[TPACPI_MAX_THERMAL_SENSORS];
>>> +static unsigned int suppress_sensor_count;
>>> +
>>> +static bool is_sensor_suppressed(int index)
>>> +{
>>> +    unsigned int i;
>>> +    bool logged = false;
>>> +
>>> +    for (i = 0; i < suppress_sensor_count; i++) {
>>> +        if (suppress_sensor[i] == index)
>>> +            return true;
>>> +
>>> +        if (!logged &&
>>> +            (suppress_sensor[i] < 0
>>> +                || suppress_sensor[i] >= 
>>> TPACPI_MAX_THERMAL_SENSORS)) {
>>> +            pr_warn("Invalid sensor index %d in suppress_sensor\n",
>>> +                suppress_sensor[i]);
>>> +            logged = true;
>>> +        }
>>> +    }
>>> +
>>> +    return false;
>>> +}
>>> +
>>>  static const struct tpacpi_quirk thermal_quirk_table[] __initconst = {
>>>      /* Non-standard address for thermal registers on some ThinkPads */
>>>      TPACPI_Q_LNV3('R', '1', 'F', true),    /* L13 Yoga Gen 2 */
>>> @@ -6313,6 +6337,11 @@ static umode_t thermal_attr_is_visible(struct
>>> kobject *kobj,
>>>
>>>      int idx = sensor_attr->index;
>>>
>>> +    if (is_sensor_suppressed(idx)) {
>>> +        pr_info("Sensor %d suppressed\n", idx);
>>> +        return 0;
>>> +    }
>>> +
>>>      switch (thermal_read_mode) {
>>>      case TPACPI_THERMAL_NONE:
>>>          return 0;
>>> @@ -11653,6 +11682,9 @@ static void __init
>>> thinkpad_acpi_init_banner(void)
>>>              thinkpad_id.model_str,
>>>              (thinkpad_id.nummodel_str) ?
>>>                  thinkpad_id.nummodel_str : "unknown");
>>> +
>>> +    pr_info("Suppressing %d user-supplied sensor(s) via parameter
>>> suppress_sensor\n",
>>> +        suppress_sensor_count);
>>>  }
>>>
>>>  /* Module init, exit, parameters */
>>> @@ -11785,6 +11817,9 @@ MODULE_PARM_DESC(experimental,
>>>  module_param_named(debug, dbg_level, uint, 0);
>>>  MODULE_PARM_DESC(debug, "Sets debug level bit-mask");
>>>
>>> +module_param_array(suppress_sensor, int, &suppress_sensor_count, 
>>> 0444);
>>> +MODULE_PARM_DESC(suppress_sensor, "Comma-separated sensor indices to
>>> suppress (e.g., 3,7)");
>>> +
>>>  module_param(force_load, bool, 0444);
>>>  MODULE_PARM_DESC(force_load,
>>>           "Attempts to load the driver even on a mis-identified 
>>> ThinkPad when
>>> true");
>>
>> The P15 is one of the Linux certified platforms...though it's a bit 
>> older now.
>>
>> I'd be more interested in figuring out which sensors are returning an 
>> error and figuring out how we address that. I have access to the FW 
>> and platform team for questions (though this platform is a bit older 
>> now, so if we need FW fixes that will be trickier). My gut feeling is 
>> we shouldn't be creating sysfs entries if the sensors don't exist or 
>> aren't accessible.
>
> That is what my patch does - it prevents creating the sysfs entries 
> but not based on a check for validity of the sensor in code (as 
> probably desired by Ilpo when I understand a previous mail correctly) 
> but rather on a user-provided configuration via the new parameter. I 
> reply to the other mail as well soon.
>
Such sensors are meant to be ignored using /etc/sensors3.conf (provided by libsensors) unless the driver itself can
automatically determine this by asking the platform firmware. I suggest that you use this mechanism instead of adding
additional module parameters.

Thanks,
Armin Wolf

(I also CCed the hwmon mailing list as libsensors originally came from there)

>>
>> I do have a P15 so can check it out (I'm going to have to blow some 
>> dust off it). If you've got the details on which sensors need 
>> suppressing that would be useful. I have seen previously where it's 
>> trying to access a GPU sensor on a UMA model.
>
> On my hardware it's sensor temp8_input which is unreadable at all und 
> sensor temp4_input that has a constant value of 0, no matter how hot, 
> cold or loud the machine is running. I am, however, able to monitor 
> GPU temps via nvidia _and_ thinkpad ACPI. The values are mostly equal, 
> differ a bit due to internal timing sometimes.
>
>>
>> Mark
>
> Marc
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH v1] platform/x86: thinkpad_acpi: Add parameter to suppress invalid thermal sensors
  2025-08-22 11:54     ` Armin Wolf
@ 2025-08-26 18:43       ` Mark Pearson
  0 siblings, 0 replies; 7+ messages in thread
From: Mark Pearson @ 2025-08-26 18:43 UTC (permalink / raw)
  To: Armin Wolf, Marc Burkhardt
  Cc: platform-driver-x86@vger.kernel.org, ibm-acpi-devel,
	Henrique de Moraes Holschuh, Derek J . Clark, Hans de Goede,
	Ilpo Järvinen, linux-hwmon@vger.kernel.org

Hi,

On Fri, Aug 22, 2025, at 7:54 AM, Armin Wolf wrote:
> Am 21.08.25 um 19:32 schrieb Marc Burkhardt:
>
>> Am 2025-08-20 00:03, schrieb Mark Pearson:
>>
>> Hi Mark,
>>
>> thanks for replying.
>>
>>> Hi Marc,
>>>
>>> On Mon, Aug 18, 2025, at 4:39 PM, Marc Burkhardt wrote:
>>>> While moving an existing Icinga installation to a Lenovo P15 20SU I 
>>>> came
>>>> across broken JSON output from a "sensors -Aj" command consumed by the
>>>> Icinga check_lm_sensors plugin. After fiddling around trying to build a
>>>> fix in either lm_sensors or Icinga I found out the error was rooted in
>>>> some sysfs file that was created but threw errors while being read. 
>>>> On my
>>>> Lenovo ThinkPad the default fallback to 8 temperature sensors creates
>>>> sysfs entries like in my case "temp8_input" that fail when read, 
>>>> causing
>>>> the issue in user-space.
>>>>
>>>> This patch adds a module parameter (suppress_sensor) using
>>>> module_param_array() to allow users to specify a comma-separated 
>>>> list of
>>>> zero-based sensor indices to suppress sysfs file creation (e.g.
>>>> suppress_sensor=3,7). Instead of a model-specific quirk, this provides
>>>> flexible configuration for any ThinkPad with similar issues and is 
>>>> working
>>>> out-of-the-box without additional models being marked for the quirk. 
>>>> The
>>>> parameter uses a fixed-size array based on 
>>>> TPACPI_MAX_THERMAL_SENSORS (16)
>>>> consistent with the driver’s thermal sensor handling (ie.
>>>> ibm_thermal_sensors_struct or sensor_dev_attr_thermal_temp_input).
>>>>
>>>> Logging via pr_info() reports the number of suppressed sensors at 
>>>> module
>>>> initialization, and pr_info() logs each suppressed sensor during sysfs
>>>> attribute creation. Invalid sensor indices are logged once via 
>>>> pr_warn()
>>>> to avoid repetitive warnings. Tested on a ThinkPad P15 with
>>>> suppress_sensor=3,7, confirming suppression of temp4_input and 
>>>> temp8_input
>>>> with no sysfs errors. Bounds checking for uncommon values is in 
>>>> place or
>>>> will be logged.
>>>>
>>>> The patch applies to the current
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
>>>> although
>>>> it was initially written for a 6.16.0 kernel.
>>>>
>>>> I look forward to any feedback on the patch and/or handling of 
>>>> submission.
>>>> Please CC: for now as I am not (yet) subscribed. Thank you.
>>>>
>>>> Signed-off-by: Marc Burkhardt <marc.burkhardt@protoco.consulting>
>>>> ---
>>>> Notes:
>>>> I haven't posted on LKML or send a patch for over a decade now so
>>>> please forgive any possible mistakes I made regarding current coding
>>>> conventions or more generally in submitting this patch. The patch was
>>>> running for some time here with faulty sensors removed from sysfs 
>>>> and no
>>>> problems otherwise detected and was surely run through checkpatch.pl 
>>>> before
>>>> submission. get_maintainer.pl was helpful to find the hopefully right
>>>> people for CC:ing but I am otherweise totally unaware of any current
>>>> procedures or best-practices when it comes to submitting a patch.
>>>>
>>>> drivers/platform/x86/lenovo/thinkpad_acpi.c | 35 
>>>> +++++++++++++++++++++++++++++
>>>>  1 file changed, 35 insertions(+)
>>>>
>>>> diff --git a/drivers/platform/x86/lenovo/thinkpad_acpi.c
>>>> b/drivers/platform/x86/lenovo/thinkpad_acpi.c
>>>> index cc19fe520ea9..30ff01f87403 100644
>>>> --- a/drivers/platform/x86/lenovo/thinkpad_acpi.c
>>>> +++ b/drivers/platform/x86/lenovo/thinkpad_acpi.c
>>>> @@ -6019,6 +6019,30 @@ struct ibm_thermal_sensors_struct {
>>>>      s32 temp[TPACPI_MAX_THERMAL_SENSORS];
>>>>  };
>>>>
>>>> +static int suppress_sensor[TPACPI_MAX_THERMAL_SENSORS];
>>>> +static unsigned int suppress_sensor_count;
>>>> +
>>>> +static bool is_sensor_suppressed(int index)
>>>> +{
>>>> +    unsigned int i;
>>>> +    bool logged = false;
>>>> +
>>>> +    for (i = 0; i < suppress_sensor_count; i++) {
>>>> +        if (suppress_sensor[i] == index)
>>>> +            return true;
>>>> +
>>>> +        if (!logged &&
>>>> +            (suppress_sensor[i] < 0
>>>> +                || suppress_sensor[i] >= 
>>>> TPACPI_MAX_THERMAL_SENSORS)) {
>>>> +            pr_warn("Invalid sensor index %d in suppress_sensor\n",
>>>> +                suppress_sensor[i]);
>>>> +            logged = true;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    return false;
>>>> +}
>>>> +
>>>>  static const struct tpacpi_quirk thermal_quirk_table[] __initconst = {
>>>>      /* Non-standard address for thermal registers on some ThinkPads */
>>>>      TPACPI_Q_LNV3('R', '1', 'F', true),    /* L13 Yoga Gen 2 */
>>>> @@ -6313,6 +6337,11 @@ static umode_t thermal_attr_is_visible(struct
>>>> kobject *kobj,
>>>>
>>>>      int idx = sensor_attr->index;
>>>>
>>>> +    if (is_sensor_suppressed(idx)) {
>>>> +        pr_info("Sensor %d suppressed\n", idx);
>>>> +        return 0;
>>>> +    }
>>>> +
>>>>      switch (thermal_read_mode) {
>>>>      case TPACPI_THERMAL_NONE:
>>>>          return 0;
>>>> @@ -11653,6 +11682,9 @@ static void __init
>>>> thinkpad_acpi_init_banner(void)
>>>>              thinkpad_id.model_str,
>>>>              (thinkpad_id.nummodel_str) ?
>>>>                  thinkpad_id.nummodel_str : "unknown");
>>>> +
>>>> +    pr_info("Suppressing %d user-supplied sensor(s) via parameter
>>>> suppress_sensor\n",
>>>> +        suppress_sensor_count);
>>>>  }
>>>>
>>>>  /* Module init, exit, parameters */
>>>> @@ -11785,6 +11817,9 @@ MODULE_PARM_DESC(experimental,
>>>>  module_param_named(debug, dbg_level, uint, 0);
>>>>  MODULE_PARM_DESC(debug, "Sets debug level bit-mask");
>>>>
>>>> +module_param_array(suppress_sensor, int, &suppress_sensor_count, 
>>>> 0444);
>>>> +MODULE_PARM_DESC(suppress_sensor, "Comma-separated sensor indices to
>>>> suppress (e.g., 3,7)");
>>>> +
>>>>  module_param(force_load, bool, 0444);
>>>>  MODULE_PARM_DESC(force_load,
>>>>           "Attempts to load the driver even on a mis-identified 
>>>> ThinkPad when
>>>> true");
>>>
>>> The P15 is one of the Linux certified platforms...though it's a bit 
>>> older now.
>>>
>>> I'd be more interested in figuring out which sensors are returning an 
>>> error and figuring out how we address that. I have access to the FW 
>>> and platform team for questions (though this platform is a bit older 
>>> now, so if we need FW fixes that will be trickier). My gut feeling is 
>>> we shouldn't be creating sysfs entries if the sensors don't exist or 
>>> aren't accessible.
>>
>> That is what my patch does - it prevents creating the sysfs entries 
>> but not based on a check for validity of the sensor in code (as 
>> probably desired by Ilpo when I understand a previous mail correctly) 
>> but rather on a user-provided configuration via the new parameter. I 
>> reply to the other mail as well soon.
>>
> Such sensors are meant to be ignored using /etc/sensors3.conf (provided 
> by libsensors) unless the driver itself can
> automatically determine this by asking the platform firmware. I suggest 
> that you use this mechanism instead of adding
> additional module parameters.
>
> Thanks,
> Armin Wolf
>
> (I also CCed the hwmon mailing list as libsensors originally came from there)
>
>>>
>>> I do have a P15 so can check it out (I'm going to have to blow some 
>>> dust off it). If you've got the details on which sensors need 
>>> suppressing that would be useful. I have seen previously where it's 
>>> trying to access a GPU sensor on a UMA model.
>>
>> On my hardware it's sensor temp8_input which is unreadable at all und 
>> sensor temp4_input that has a constant value of 0, no matter how hot, 
>> cold or loud the machine is running. I am, however, able to monitor 
>> GPU temps via nvidia _and_ thinkpad ACPI. The values are mostly equal, 
>> differ a bit due to internal timing sometimes.
>>
>>>

I tried this on my P15, and I do get an error when the GPU sensor is accessed as it's not available (no Nvidia card on mine).

A suggestion (based a bit on Armin's suggestions): If the is_visible function is changed so if the sensor returns an error (or not available) then the sysfs entry isn't displayed. 
I think that would prevent errors/access issues from user space - at least it worked on my system.

Something like the below (I can send this up as a proper patch if it makes sense)

diff --git a/drivers/platform/x86/lenovo/thinkpad_acpi.c b/drivers/platform/x86/lenovo/thinkpad_acpi.c
index cc19fe520ea9..075d15df183c 100644
--- a/drivers/platform/x86/lenovo/thinkpad_acpi.c
+++ b/drivers/platform/x86/lenovo/thinkpad_acpi.c
@@ -6312,6 +6312,8 @@ static umode_t thermal_attr_is_visible(struct kobject *kobj,
                                        to_sensor_dev_attr(dev_attr);
 
        int idx = sensor_attr->index;
+       s32 value;
+       int res;
 
        switch (thermal_read_mode) {
        case TPACPI_THERMAL_NONE:
@@ -6334,6 +6336,11 @@ static umode_t thermal_attr_is_visible(struct kobject *kobj,
 
        }
 
+       /* Check if sensor is available */
+       res = thermal_get_sensor(idx, &value);
+       if ((res) || (value == TPACPI_THERMAL_SENSOR_NA))
+               return 0;
+
        return attr->mode;
 }

I think this would generally be useful for removing unwanted sensors without having to do extra steps?

Mark

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-08-26 18:43 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-18 20:39 [RFC PATCH v1] platform/x86: thinkpad_acpi: Add parameter to suppress invalid thermal sensors Marc Burkhardt
2025-08-19  8:05 ` Ilpo Järvinen
2025-08-21 17:47   ` Marc Burkhardt
2025-08-20  0:03 ` Mark Pearson
2025-08-21 17:32   ` Marc Burkhardt
2025-08-22 11:54     ` Armin Wolf
2025-08-26 18:43       ` Mark Pearson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.