All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Burkhardt <marc.burkhardt@protoco.consulting>
To: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
Cc: Mark Pearson <mpearson-lenovo@squebb.ca>,
	platform-driver-x86@vger.kernel.org,
	ibm-acpi-devel@lists.sourceforge.net,
	Henrique de Moraes Holschuh <hmh@hmh.eng.br>,
	"Derek J. Clark" <derekjohn.clark@gmail.com>,
	Hans de Goede <hansg@kernel.org>
Subject: Re: [RFC PATCH v1] platform/x86: thinkpad_acpi: Add parameter to suppress invalid thermal sensors
Date: Thu, 21 Aug 2025 17:47:43 +0000	[thread overview]
Message-ID: <864df0dfcb93d3b8985d9a4b18cf8d71@protoco.consulting> (raw)
In-Reply-To: <5a10e50a-ba06-e326-0643-73135709c8a3@linux.intel.com>

Hi Ilpo,

below find my probably (a bit) lengthy feedback to your mail and excuse 
the delay in replying.

Am 2025-08-19 08:05, schrieb Ilpo Järvinen:
> On Mon, 18 Aug 2025, Marc Burkhardt wrote:
> 
>> While moving an existing Icinga installation to a Lenovo P15 20SU I 
>> came
>> across broken JSON output from a "sensors -Aj" command consumed by the
> 
> The commit message is not meant to be a history lesson about how the 
> patch
> came to be but to describe the problem seen and how patch is fixing it.
> Please try to state imperatively what is the problem, not "I did x" 
> kind
> of sentences. E.g.,
> 
> "sensors -Aj" JSON output on Lenovo P15 20SU is broken because ...
> 
> Also try to avoid using "This patch" to start a sentence.
> 
>> Icinga check_lm_sensors plugin. After fiddling around trying to build 
>> a
>> fix in either lm_sensors or Icinga I found out the error was rooted in
>> some sysfs file that was created but threw errors while being read. On 
>> my
>> Lenovo ThinkPad the default fallback to 8 temperature sensors creates
>> sysfs entries like in my case "temp8_input" that fail when read, 
>> causing
>> the issue in user-space.
>> 
>> This patch adds a module parameter (suppress_sensor) using
>> module_param_array() to allow users to specify a comma-separated list 
>> of
>> zero-based sensor indices to suppress sysfs file creation (e.g.
>> suppress_sensor=3,7). Instead of a model-specific quirk, this provides
>> flexible configuration for any ThinkPad with similar issues and is 
>> working
>> out-of-the-box without additional models being marked for the quirk.
> 
> Can't we determine this at probe time automatically?
> 
> We generally try to avoid module parameters whenever possible. 
> Sometimes,
> if not automated way exists, quirk + a parameter intended for temporary
> use might be acceptable compromise. I don't understand why you say
> "additional models being marked for the quirk" as that seems never
> necessary even if a quirk exists.
> 
>> The
>> parameter uses a fixed-size array based on TPACPI_MAX_THERMAL_SENSORS 
>> (16)
>> consistent with the driver’s thermal sensor handling (ie.
>> ibm_thermal_sensors_struct or sensor_dev_attr_thermal_temp_input).
>> 
>> Logging via pr_info() reports the number of suppressed sensors at 
>> module
>> initialization, and pr_info() logs each suppressed sensor during sysfs
>> attribute creation. Invalid sensor indices are logged once via 
>> pr_warn()
>> to avoid repetitive warnings.
> 
> This logging likely needs work too (to print much less), but lets first
> agree with the way forward.

I can reduce the logging going from pr_info to pr_debug or just remove 
it
as a whole for sure. But as I thought that this is an end-user thing 
(given it
is a ThinkPad) I put it there on purpose to let the user know what was 
going
on. The message about the de-facto suppressed sensors just logs once per
provided sensor index on sysfs creation, not the sensor's usage. The
thinkpad_acpi module, moreover, logs excessively in general already. 
Just saw
this from a user's perspective seeing myself google forum posts for 
missing
sensors without a trace. I can surely make the logging conditional in
thinkpad_acpi_init_banner().

That user-thinking was also part my second thought: as I didn't find any 
such
problem for a P15 on Google or the like I suspected this to be a 
not-so-usual case. Thus,
I have already re-worked the patch to its current form using a parameter
instead of using a quirk-table (analog to the ones already in place) 
that I had
written initially (using BIOS revision N30) as this would mean
suppressing the sensor on any such device without knowing if the problem
_really_ exists on the device in general in the real world - I haven't 
found any such
report on the Net while searching for it so I _assume_ that this is a 
weird
edge-case we're talking about here that the sensor is non-functional or 
my
hardware is borked in some way for whatever reason. The rather static 
creation
of let's say 8 or 16 sensors in general _could_ probably be the problem. 
The code
seems to make 16 iterations on my device as I saw when I had debug log 
in place,
probably due to the static size of TPACPI_MAX_THERMAL_SENSORS;

Also I imagine there will be other ThinkPad-devices/models having such
problem with a different sensor index (which I surely don't
know of yet) so these would not be listed in the new quirk-table and 
needed a new
patch to include them valid for this quirk. It might be another model 
needs a
quirk for sensor 14, 15 and 16 which the current code would just create 
if the
device falls through a specific match (not validating the actual 
sensors) as the
sensors creation currently is rather "static". Using the parameter a 
user would be
able to just add the index of the broken sensor and would be good to go 
no
matter what model being used and what sensors indices it would affect - 
it's just
my approach to be generic and future proof. The quirk-table approach 
also used
more code than the parameter approach with even reduced flexibility.

As I understand your mail you would also eventually like testing the 
sensor
being created for let's say "technical validity". On my machine 
temp4_input is
technically valid (sysfs created and working) but spills a static 0. So 
this
sensor is actually unusable from a user-point of view making me exclude 
it via
the new parameter as well. What would (in your opinion) the handling be 
for
this specific sensor temp4_input be when using the quirk-table approach?

Please advice on how to re-write the functionality so I can give it 
another spin.
Sorry for the lengthy explanation and thanks for your time reading this.

> 
>> Tested on a ThinkPad P15 with
>> suppress_sensor=3,7, confirming suppression of temp4_input and 
>> temp8_input
>> with no sysfs errors. Bounds checking for uncommon values is in place 
>> or
>> will be logged.
>> 
>> The patch applies to the current
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
>> although
>> it was initially written for a 6.16.0 kernel.
>> 
>> I look forward to any feedback on the patch and/or handling of 
>> submission.
>> Please CC: for now as I am not (yet) subscribed. Thank you.
> 
> Please don't include details like these 2 last paragraphs into the 
> commit
> message itself, it's fine to state things like this below --- line as
> they're automatically removed when the patch is applied.
> 
> Usually Linus' tree is fine as base, but sometimes pdx86 for-next has
> diverged.
> 
>> 
>> Signed-off-by: Marc Burkhardt <marc.burkhardt@protoco.consulting>
>> ---
>> Notes:
>> I haven't posted on LKML or send a patch for over a decade now so
>> please forgive any possible mistakes I made regarding current coding
>> conventions or more generally in submitting this patch. The patch was
>> running for some time here with faulty sensors removed from sysfs and 
>> no
>> problems otherwise detected and was surely run through checkpatch.pl 
>> before
>> submission. get_maintainer.pl was helpful to find the hopefully right
>> people for CC:ing but I am otherweise totally unaware of any current
>> procedures or best-practices when it comes to submitting a patch.
> 
> Welcome back. You did quite well regardless. :-)
> 
>> drivers/platform/x86/lenovo/thinkpad_acpi.c | 35 
>> +++++++++++++++++++++++++++++
>>  1 file changed, 35 insertions(+)
>> 
>> diff --git a/drivers/platform/x86/lenovo/thinkpad_acpi.c 
>> b/drivers/platform/x86/lenovo/thinkpad_acpi.c
>> index cc19fe520ea9..30ff01f87403 100644
>> --- a/drivers/platform/x86/lenovo/thinkpad_acpi.c
>> +++ b/drivers/platform/x86/lenovo/thinkpad_acpi.c
>> @@ -6019,6 +6019,30 @@ struct ibm_thermal_sensors_struct {
>>  	s32 temp[TPACPI_MAX_THERMAL_SENSORS];
>>  };
>> 
>> +static int suppress_sensor[TPACPI_MAX_THERMAL_SENSORS];
>> +static unsigned int suppress_sensor_count;
>> +
>> +static bool is_sensor_suppressed(int index)
>> +{
>> +	unsigned int i;
>> +	bool logged = false;
>> +
>> +	for (i = 0; i < suppress_sensor_count; i++) {
>> +		if (suppress_sensor[i] == index)
>> +			return true;
>> +
>> +		if (!logged &&
>> +			(suppress_sensor[i] < 0
>> +				|| suppress_sensor[i] >= TPACPI_MAX_THERMAL_SENSORS)) {
>> +			pr_warn("Invalid sensor index %d in suppress_sensor\n",
>> +				suppress_sensor[i]);
>> +			logged = true;
>> +		}
>> +	}
>> +
>> +	return false;
>> +}
>> +
>>  static const struct tpacpi_quirk thermal_quirk_table[] __initconst = 
>> {
>>  	/* Non-standard address for thermal registers on some ThinkPads */
>>  	TPACPI_Q_LNV3('R', '1', 'F', true),	/* L13 Yoga Gen 2 */
>> @@ -6313,6 +6337,11 @@ static umode_t thermal_attr_is_visible(struct 
>> kobject *kobj,
>> 
>>  	int idx = sensor_attr->index;
>> 
>> +	if (is_sensor_suppressed(idx)) {
>> +		pr_info("Sensor %d suppressed\n", idx);
>> +		return 0;
>> +	}
>> +
>>  	switch (thermal_read_mode) {
>>  	case TPACPI_THERMAL_NONE:
>>  		return 0;
>> @@ -11653,6 +11682,9 @@ static void __init 
>> thinkpad_acpi_init_banner(void)
>>  			thinkpad_id.model_str,
>>  			(thinkpad_id.nummodel_str) ?
>>  				thinkpad_id.nummodel_str : "unknown");
>> +
>> +	pr_info("Suppressing %d user-supplied sensor(s) via parameter 
>> suppress_sensor\n",
>> +		suppress_sensor_count);
>>  }
>> 
>>  /* Module init, exit, parameters */
>> @@ -11785,6 +11817,9 @@ MODULE_PARM_DESC(experimental,
>>  module_param_named(debug, dbg_level, uint, 0);
>>  MODULE_PARM_DESC(debug, "Sets debug level bit-mask");
>> 
>> +module_param_array(suppress_sensor, int, &suppress_sensor_count, 
>> 0444);
>> +MODULE_PARM_DESC(suppress_sensor, "Comma-separated sensor indices to 
>> suppress (e.g., 3,7)");
>> +
>>  module_param(force_load, bool, 0444);
>>  MODULE_PARM_DESC(force_load,
>>  		 "Attempts to load the driver even on a mis-identified ThinkPad 
>> when true");
>> 

Marc

  reply	other threads:[~2025-08-21 17:47 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-18 20:39 [RFC PATCH v1] platform/x86: thinkpad_acpi: Add parameter to suppress invalid thermal sensors Marc Burkhardt
2025-08-19  8:05 ` Ilpo Järvinen
2025-08-21 17:47   ` Marc Burkhardt [this message]
2025-08-20  0:03 ` Mark Pearson
2025-08-21 17:32   ` Marc Burkhardt
2025-08-22 11:54     ` Armin Wolf
2025-08-26 18:43       ` Mark Pearson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=864df0dfcb93d3b8985d9a4b18cf8d71@protoco.consulting \
    --to=marc.burkhardt@protoco.consulting \
    --cc=derekjohn.clark@gmail.com \
    --cc=hansg@kernel.org \
    --cc=hmh@hmh.eng.br \
    --cc=ibm-acpi-devel@lists.sourceforge.net \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=mpearson-lenovo@squebb.ca \
    --cc=platform-driver-x86@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.