All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Pali Rohár" <pali.rohar@gmail.com>
To: Guenter Roeck <linux@roeck-us.net>
Cc: Arnd Bergmann <arnd@arndb.de>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org,
	Steven Honeyman <stevenhoneyman@gmail.com>
Subject: Re: [PATCH] i8k: Ignore temperature sensors which report invalid values
Date: Tue, 18 Nov 2014 15:46:51 +0100	[thread overview]
Message-ID: <201411181546.51284@pali> (raw)
In-Reply-To: <546ADF7B.5020201@roeck-us.net>

[-- Attachment #1: Type: Text/Plain, Size: 7473 bytes --]

On Tuesday 18 November 2014 06:56:11 Guenter Roeck wrote:
> On 11/17/2014 12:35 AM, Pali Rohár wrote:
> > On Thursday 23 October 2014 18:45:07 Guenter Roeck wrote:
> >> On Thu, Oct 23, 2014 at 12:37:34PM +0200, Pali Rohár wrote:
> >>> On Wednesday 22 October 2014 19:10:05 Guenter Roeck wrote:
> >>>> On Wed, Oct 22, 2014 at 06:35:53PM +0200, Pali Rohár 
wrote:
> >>>>> On Wednesday 22 October 2014 18:19:47 Guenter Roeck
> > 
> > wrote:
> >>>>>> On Wed, Oct 22, 2014 at 02:29:06PM +0200, Pali Rohár
> > 
> > wrote:
> >>>>>>> On Tuesday 21 October 2014 06:27:23 Guenter Roeck
> > 
> > wrote:
> >>>>>>>> On 10/20/2014 09:46 AM, Pali Rohár wrote:
> >>>>>>>>> Ok, I will describe my problem. Guenter, maybe
> >>>>>>>>> you can find another solution/fix for it.
> >>>>>>>>> 
> >>>>>>>>> Calling i8k_get_temp(3) on my laptop without
> >>>>>>>>> I8K_TEMPERATURE_BUG always returns value 193
> >>>>>>>>> (which is above I8K_MAX_TEMP).
> >>>>>>>>> 
> >>>>>>>>> When I8K_TEMPERATURE_BUG is enabled (by default)
> >>>>>>>>> then i8k_get_temp(3) returns value from prev[3]
> >>>>>>>>> and store new value I8K_TEMPERATURE_BUG to
> >>>>>>>>> prev[3]. Value in prev[3] is initialized to 0.
> >>>>>>>>> 
> >>>>>>>>> What I want to achieve is: when i8k_get_temp()
> >>>>>>>>> for particular sensor id always returns invalid
> >>>>>>>>> value (> I8K_MAX_TEMP) then we should totally
> >>>>>>>>> ignore sensor with that id and do not export it
> >>>>>>>>> via hwmon.
> >>>>>>>>> 
> >>>>>>>>> My solution is: initialize prev[id] to
> >>>>>>>>> I8K_MAX_TEMP, so on invalid data first call to
> >>>>>>>>> i8k_get_temp(id) returns I8K_MAX_TEMP. Then in
> >>>>>>>>> i8k_init_hwmon check if value is < I8K_MAX_TEMP
> >>>>>>>>> and if not ignore sensor id.
> >>>>>>>>> 
> >>>>>>>>> Guenter, it is clear now? Are you ok that we
> >>>>>>>>> should ignore sensor if always report value
> >>>>>>>>> above I8K_MAX_TEMP? If you do not like my
> >>>>>>>>> solution/patch for it, can you specify how
> >>>>>>>>> other can it be fixed?
> >>>>>>>> 
> >>>>>>>> I still don't see the point in initializing
> >>>>>>>> prev[].
> >>>>>>> 
> >>>>>>> Now prev[] is initialized to 0. It means that first
> >>>>>>> call i8k_get_temp() (with sensor id which return
> >>>>>>> value > I8K_MAX_TEMP) returns 0. Second and other
> >>>>>>> calls returns I8K_MAX_TEMP.
> >>>>>>> 
> >>>>>>> So point is to return same value for first and other
> >>>>>>> calls.
> >>>>>> 
> >>>>>> Yes, I realized that after I sent my previous mail.
> >>>>>> 
> >>>>>>>> Yes, I am ok with ignoring sensor values if the
> >>>>>>>> reported temperature is above I8K_MAX_TEMP. I am
> >>>>>>>> just not sure if we should check against
> >>>>>>>> I8K_MAX_TEMP or against, say, 192. Reason is that
> >>>>>>>> we do know that the sensor can erroneously return
> >>>>>>>> 0x99 on some systems once in a while. We would
> >>>>>>>> not want to ignore those sensors just because
> >>>>>>>> they happen to report 0x99 during initialization.
> >>>>>>>> 
> >>>>>>>> So maybe make it
> >>>>>>>> 
> >>>>>>>> 	if (err >= 0 && err < 192)
> >>>>>>>> 
> >>>>>>>> and add a note before the first if(), explaining
> >>>>>>>> that higher values suggest that there is no
> >>>>>>>> sensor attached.
> >>>>>>>> 
> >>>>>>>> Thanks,
> >>>>>>>> Guenter
> >>>>>>> 
> >>>>>>> Right, now we need to decide which magic constant to
> >>>>>>> use...
> >>>>>>> 
> >>>>>>> And now I found another problem :-)
> >>>>>>> 
> >>>>>>> On my laptop i8k_get_temp(3) not always return value
> >>>>>>> 193. It is only when AMD graphics card is turned
> >>>>>>> off. When card is on i8k_get_temp(3) returns same
> >>>>>>> value as temperature hwmon part from radeon DRM
> >>>>>>> driver.
> >>>>>> 
> >>>>>> Can you turn the GPU on or off during runtime ?
> >>>>>> That would make it really tricky to handle the
> >>>>>> situation.
> >>>>> 
> >>>>> Yes. New laptops with Nvidia Optimus or AMD PowerXpress
> >>>>> or Enduro technology are designed to automatically turn
> >>>>> off secondary GPU when is not in use. And
> >>>>> nouveau/radeon drivers together with vga_switcheroo
> >>>>> implements this dynamic power on/off.
> >>>>> 
> >>>>>>> So it looks like that on my laptop i8k sensor with
> >>>>>>> id 3 reports GPU temperature.
> >>>>>>> 
> >>>>>>> When card is turned off radeon driver reports
> >>>>>>> -EINVAL for temperature hwmon sysnode.
> >>>>>>> 
> >>>>>>> So now I think i8k could not ignore sensor totally
> >>>>>>> as it can be mapped to some HW which can be
> >>>>>>> dynamically turned on/off (like my graphics card).
> >>>>>>> 
> >>>>>>> So what do you think about reporting -EINVAL instead
> >>>>>>> I8K_MAX_TEMP when dell SMM returns value above
> >>>>>>> I8K_MAX_TEMP?
> >>>>>> 
> >>>>>> -EINVAL is supposed to mean "Invalid Argument", so it
> >>>>>> really has ia different semantics. We could use
> >>>>>> -ENXIO, "No such device or address", which seems more
> >>>>>> appropriate.
> >>>>> 
> >>>>> I prefer to use -EINVAL because other driver (radeon) is
> >>>>> using it and userspace "sensors" programs handle EINVAL
> >>>>> and show "N/A" in output instead reporting some error
> >>>>> about reading value. If nothing else consistency (with
> >>>>> other drivers) is my argument.
> >>>> 
> >>>> Ok, if sensors implements it that way then let's do it.
> >>>> 
> >>>>>> Overall, I think the entire error handling is broken
> >>>>>> and should be replaced. One option would be to
> >>>>>> explicitly check for 0x99 and, if detected, go to
> >>>>>> sleep for, say, 100ms and try again. If it still
> >>>>>> fails, and for all other bad values, return -ENXIO.
> >>>>>> Then the calling code can either return the error to
> >>>>>> user space in the show function, or not install the
> >>>>>> sensor in the probe function.
> >>>>>> 
> >>>>>> Does that make sense ?
> >>>>> 
> >>>>> Yes, replacing error handling with retry call (after
> >>>>> some sleep) is better then current code (which returns
> >>>>> previous value or returns I8K_MAX_TEMP).
> >>>>> 
> >>>>> But your solution not install the sensor if probe fails
> >>>>> on bad value does not solve problem that i8k.ko is
> >>>>> loading at time when GPU card is turned off.
> >>>> 
> >>>> Yes, the dynamics in that situation makes it a bit
> >>>> difficult to handle the situation.
> >>>> 
> >>>>> I think current check for installing sensor (err < 0) is
> >>>>> enough and when invalid value (> I8K_MAX_TEMP) is
> >>>>> returned just do not show this value for userspace in
> >>>>> hwmon sysnode.
> >>>> 
> >>>> Ok with me, and agreed.
> >>>> 
> >>>> Thanks,
> >>>> Guenter
> >>> 
> >>> Ok, are you going to fix i8k_get_temp() function (with
> >>> sleeping)?
> >> 
> >> I had hoped you would find the time to do it ;-).
> >> 
> >> Sure, I can do it, but I am kind of busy right now, so it
> >> will take a while.
> >> 
> >> Thanks,
> >> Guenter
> > 
> > Ok. Will you do that for 3.19 kernel? Meanwhile I can
> > prepare patch for temperature labels. I looked into
> > NBSVC.MDM and there is something related for type of
> > temperature sensors...
> 
> Highly unlikely. I don't have the time right now.
> 
> Guenter

Ok. I will try to at least fix patch from first post in this 
thread, so i8k_get_temp() does not return totally invalid value 
to userspace (or at first call).

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

  reply	other threads:[~2014-11-18 14:47 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-19 14:46 [PATCH] i8k: Ignore temperature sensors which report invalid values Pali Rohár
2014-10-19 15:13 ` Guenter Roeck
2014-10-20 16:46   ` Pali Rohár
2014-10-21  4:27     ` Guenter Roeck
2014-10-22 12:29       ` Pali Rohár
2014-10-22 16:19         ` Guenter Roeck
2014-10-22 16:35           ` Pali Rohár
2014-10-22 17:10             ` Guenter Roeck
2014-10-23 10:37               ` Pali Rohár
2014-10-23 16:45                 ` Guenter Roeck
2014-11-17  8:35                   ` Pali Rohár
2014-11-18  5:56                     ` Guenter Roeck
2014-11-18 14:46                       ` Pali Rohár [this message]
2014-11-18 14:56                         ` [PATCH] i8k: Fix temperature bug handling in i8k_get_temp() Pali Rohár
2014-11-30  0:12                           ` Guenter Roeck
2014-11-30 14:44                             ` Pali Rohár
2014-11-30  9:00                           ` Guenter Roeck
2014-11-30 14:48                             ` Pali Rohár
2014-11-30 15:56                               ` Guenter Roeck
2014-11-30 16:00                                 ` Pali Rohár

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201411181546.51284@pali \
    --to=pali.rohar@gmail.com \
    --cc=arnd@arndb.de \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=stevenhoneyman@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.