All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jean Delvare <khali@linux-fr.org>
To: lm-sensors@vger.kernel.org
Subject: Re: [lm-sensors] pwmconfig doesn't detect correlations properly
Date: Thu, 19 Aug 2010 13:16:30 +0000	[thread overview]
Message-ID: <20100819151630.1ee5b124@hyperion.delvare> (raw)
In-Reply-To: <AANLkTinN=fLZZyMNfHL5CShkRDOjMTUQeV+GZChUKdWy@mail.gmail.com>

Hi Charles,

On Thu, 19 Aug 2010 11:06:41 +0200, Jean Delvare wrote:
> On Mon, 16 Aug 2010 08:51:22 -0700, Guenter Roeck wrote:
> > On Mon, 2010-08-16 at 00:07 -0400, Charles Pillar wrote:
> > > Hi all,
> > >    I'm new here, apologies in.advance if I do something wrong. I
> > > believe that I have found a bug in pwmconfig. I first observed this
> > > behavior many many months ago and couldn't find anyone else with the
> > > problem so I just assumed it was just me. I've since stumbled on it
> > > again so I decided to look into it myself. I don't know if anyone is
> > > aware of the behavior I am seeing, but here it is...
> > > 
> > > Take for example a board with two or more PWM controllable fans both
> > > which of which the speed can be measured. Thus I have:
> > > 
> > > /sys/class/hwmon/hwmon0/pwm1
> > > /sys/class/hwmon/hwmon0/pwm1_enable
> > > /sys/class/hwmon/hwmon0/fan1_input
> > > /sys/class/hwmon/hwmon0/pwm2
> > > /sys/class/hwmon/hwmon0/pwm2_enable
> > > /sys/class/hwmon/hwmon0/fan2_input
> > > (etc...)
> > > 
> > > pwm1 & pwm1_enable = fan1_input
> > > pwm2 & pwm2_enable = fan2_input
> > > (etc...)
> > > 
> > > I think this would be a fairly common set up? Indeed I have three
> > > machines that are setup this way (1 board has 2 fans, the other 2 have
> > > 3 fans each)
> > > 
> > > From what I can see, pwmconfig does this:
> > > 
> > > pwm1_enable=0
> > > pwm2_enable=0
> > > wait...
> > > for each PWM:
> > >    this pwm_enable=1
> > >    this pwm=0
> > >    for each fan
> > >       compare this fan before / after
> > >       this pwm_enable=0
> > >       check fan returns to normal
> > >    next fan
> > > next pwm
> > > 
> > > The problem with this logic is that for each PWM, the pwm_enable is
> > > set to 1, then the first fan is tested, after the first fan is tested,
> > > the pwm is disabled and never re-enabled (until the next pwm)...
> > > This means the pwm1ún1 correlation is detected, but pwm2ún2 is not
> > > - but only because the pwm_enable is still set to 0 when the second
> > > and subsequent fans are tested...
> 
> Thanks a lot for reporting. I have a hard time believing that this has
> been broken for years and you're the first one to report, but this is
> the case... I'm even more surprised that _I_ did not notice the bug,
> while I have been using pwmconfig a lot and worked a lot on it over the
> past few years.

I tested the old code again to try and understand how I could have
missed the bug so far. The explanation is that the current code is not
only broken, it's also racy. And the race works around the bug, at
least in my case.

The reason why the problem wasn't noticed earlier is because we do not
have an unconditional delay after the faulty pwmdisable $i. We only
wait for the fan to settle if a correlation was found, because we want
to ensure that the fan is back to full speed again. If no correlation
is found, we move on quickly to the next fan input. With some luck,
said fan didn't yet have the time to go back to full speed. Or if it
did, the caching we do in most hwmon drivers will cause the driver to
not return an up-to-date fan speed value. Either way, the correlation
with the second fan is found in my case. If I add a sleep after
pwmdisable $i, then it is no longer found (i.e. the bug can be reliably
triggered.)

Now I am surprised that you managed to repeatedly hit the bug without
the extra sleep. Maybe you are using a driver which doesn't cache fan
speeds?

As a side note, this all suggests that cached fan speeds should be
invalidated each time PWM settings change. I can't remember any hwmon
driver doing this, but they should.

We still want to fix pwmconfig of course, but at least I now understand
why the bug wasn't found earlier.

-- 
Jean Delvare

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

  parent reply	other threads:[~2010-08-19 13:16 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-28  6:26 [lm-sensors] pwmconfig doesn't detect correlations properly Charles Pillar
2010-08-16  4:19 ` Charles Pillar
2010-08-16 15:51 ` Guenter Roeck
2010-08-19  9:06 ` Jean Delvare
2010-08-19  9:37 ` Jean Delvare
2010-08-19 11:23 ` Charles Pillar
2010-08-19 13:16 ` Jean Delvare [this message]
2010-08-19 13:37 ` Jean Delvare
2010-08-19 14:31 ` Guenter Roeck
2010-08-22 12:11 ` Jean Delvare
2010-08-22 12:58 ` Charles Pillar
2010-08-22 16:26 ` Jean Delvare
2010-08-24  5:52 ` Charles Pillar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100819151630.1ee5b124@hyperion.delvare \
    --to=khali@linux-fr.org \
    --cc=lm-sensors@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.