From: "Pali Rohár" <pali.rohar@gmail.com>
To: Guenter Roeck <linux@roeck-us.net>
Cc: Arnd Bergmann <arnd@arndb.de>,
"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
linux-kernel@vger.kernel.org,
Steven Honeyman <stevenhoneyman@gmail.com>
Subject: Re: [PATCH] i8k: Ignore temperature sensors which report invalid values
Date: Wed, 22 Oct 2014 18:35:53 +0200 [thread overview]
Message-ID: <201410221835.53526@pali> (raw)
In-Reply-To: <20141022161947.GA31045@roeck-us.net>
[-- Attachment #1: Type: Text/Plain, Size: 5236 bytes --]
On Wednesday 22 October 2014 18:19:47 Guenter Roeck wrote:
> On Wed, Oct 22, 2014 at 02:29:06PM +0200, Pali Rohár wrote:
> > On Tuesday 21 October 2014 06:27:23 Guenter Roeck wrote:
> > > On 10/20/2014 09:46 AM, Pali Rohár wrote:
> > > > Ok, I will describe my problem. Guenter, maybe you can
> > > > find another solution/fix for it.
> > > >
> > > > Calling i8k_get_temp(3) on my laptop without
> > > > I8K_TEMPERATURE_BUG always returns value 193 (which is
> > > > above I8K_MAX_TEMP).
> > > >
> > > > When I8K_TEMPERATURE_BUG is enabled (by default) then
> > > > i8k_get_temp(3) returns value from prev[3] and store new
> > > > value I8K_TEMPERATURE_BUG to prev[3]. Value in prev[3]
> > > > is initialized to 0.
> > > >
> > > > What I want to achieve is: when i8k_get_temp() for
> > > > particular sensor id always returns invalid value (>
> > > > I8K_MAX_TEMP) then we should totally ignore sensor with
> > > > that id and do not export it via hwmon.
> > > >
> > > > My solution is: initialize prev[id] to I8K_MAX_TEMP, so
> > > > on invalid data first call to i8k_get_temp(id) returns
> > > > I8K_MAX_TEMP. Then in i8k_init_hwmon check if value is
> > > > < I8K_MAX_TEMP and if not ignore sensor id.
> > > >
> > > > Guenter, it is clear now? Are you ok that we should
> > > > ignore sensor if always report value above
> > > > I8K_MAX_TEMP? If you do not like my solution/patch for
> > > > it, can you specify how other can it be fixed?
> > >
> > > I still don't see the point in initializing prev[].
> >
> > Now prev[] is initialized to 0. It means that first call
> > i8k_get_temp() (with sensor id which return value >
> > I8K_MAX_TEMP) returns 0. Second and other calls returns
> > I8K_MAX_TEMP.
> >
> > So point is to return same value for first and other calls.
>
> Yes, I realized that after I sent my previous mail.
>
> > > Yes, I am ok with ignoring sensor values if the reported
> > > temperature is above I8K_MAX_TEMP. I am just not sure if
> > > we should check against I8K_MAX_TEMP or against, say,
> > > 192. Reason is that we do know that the sensor can
> > > erroneously return 0x99 on some systems once in a while.
> > > We would not want to ignore those sensors just because
> > > they happen to report 0x99 during initialization.
> > >
> > > So maybe make it
> > >
> > > if (err >= 0 && err < 192)
> > >
> > > and add a note before the first if(), explaining that
> > > higher values suggest that there is no sensor attached.
> > >
> > > Thanks,
> > > Guenter
> >
> > Right, now we need to decide which magic constant to use...
> >
> > And now I found another problem :-)
> >
> > On my laptop i8k_get_temp(3) not always return value 193. It
> > is only when AMD graphics card is turned off. When card is
> > on i8k_get_temp(3) returns same value as temperature hwmon
> > part from radeon DRM driver.
>
> Can you turn the GPU on or off during runtime ?
> That would make it really tricky to handle the situation.
>
Yes. New laptops with Nvidia Optimus or AMD PowerXpress or Enduro
technology are designed to automatically turn off secondary GPU
when is not in use. And nouveau/radeon drivers together with
vga_switcheroo implements this dynamic power on/off.
> > So it looks like that on my laptop i8k sensor with id 3
> > reports GPU temperature.
> >
> > When card is turned off radeon driver reports -EINVAL for
> > temperature hwmon sysnode.
> >
> > So now I think i8k could not ignore sensor totally as it can
> > be mapped to some HW which can be dynamically turned on/off
> > (like my graphics card).
> >
> > So what do you think about reporting -EINVAL instead
> > I8K_MAX_TEMP when dell SMM returns value above
> > I8K_MAX_TEMP?
>
> -EINVAL is supposed to mean "Invalid Argument", so it really
> has ia different semantics. We could use -ENXIO, "No such
> device or address", which seems more appropriate.
>
I prefer to use -EINVAL because other driver (radeon) is using it
and userspace "sensors" programs handle EINVAL and show "N/A" in
output instead reporting some error about reading value. If
nothing else consistency (with other drivers) is my argument.
> Overall, I think the entire error handling is broken and
> should be replaced. One option would be to explicitly check
> for 0x99 and, if detected, go to sleep for, say, 100ms and
> try again. If it still fails, and for all other bad values,
> return -ENXIO. Then the calling code can either return the
> error to user space in the show function, or not install the
> sensor in the probe function.
>
> Does that make sense ?
>
Yes, replacing error handling with retry call (after some sleep)
is better then current code (which returns previous value or
returns I8K_MAX_TEMP).
But your solution not install the sensor if probe fails on bad
value does not solve problem that i8k.ko is loading at time when
GPU card is turned off.
I think current check for installing sensor (err < 0) is enough
and when invalid value (> I8K_MAX_TEMP) is returned just do not
show this value for userspace in hwmon sysnode.
> Thanks,
> Guenter
--
Pali Rohár
pali.rohar@gmail.com
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
next prev parent reply other threads:[~2014-10-22 16:35 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-10-19 14:46 [PATCH] i8k: Ignore temperature sensors which report invalid values Pali Rohár
2014-10-19 15:13 ` Guenter Roeck
2014-10-20 16:46 ` Pali Rohár
2014-10-21 4:27 ` Guenter Roeck
2014-10-22 12:29 ` Pali Rohár
2014-10-22 16:19 ` Guenter Roeck
2014-10-22 16:35 ` Pali Rohár [this message]
2014-10-22 17:10 ` Guenter Roeck
2014-10-23 10:37 ` Pali Rohár
2014-10-23 16:45 ` Guenter Roeck
2014-11-17 8:35 ` Pali Rohár
2014-11-18 5:56 ` Guenter Roeck
2014-11-18 14:46 ` Pali Rohár
2014-11-18 14:56 ` [PATCH] i8k: Fix temperature bug handling in i8k_get_temp() Pali Rohár
2014-11-30 0:12 ` Guenter Roeck
2014-11-30 14:44 ` Pali Rohár
2014-11-30 9:00 ` Guenter Roeck
2014-11-30 14:48 ` Pali Rohár
2014-11-30 15:56 ` Guenter Roeck
2014-11-30 16:00 ` Pali Rohár
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=201410221835.53526@pali \
--to=pali.rohar@gmail.com \
--cc=arnd@arndb.de \
--cc=gregkh@linuxfoundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux@roeck-us.net \
--cc=stevenhoneyman@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.