* [lm-sensors] pwmconfig doesn't detect correlations properly
@ 2010-07-28 6:26 Charles Pillar
2010-08-16 4:19 ` Charles Pillar
` (11 more replies)
0 siblings, 12 replies; 13+ messages in thread
From: Charles Pillar @ 2010-07-28 6:26 UTC (permalink / raw)
To: lm-sensors
[-- Attachment #1.1: Type: text/plain, Size: 2414 bytes --]
Hi all,
I'm new here. I believe that I have found a bug in pwmconfig. I first
observed this behavior many many months ago and couldn't find anyone else
with the problem so I just assumed it was just me. I've since stumbled on it
again so I decided to look into it myself. I don't know if anyone is aware
of the behavior I am seeing, but here it is...
Take for example a board with two or more PWM controllable fans both which
of which the speed can be measured. Thus I have:
/sys/class/hwmon/hwmon0/pwm1
/sys/class/hwmon/hwmon0/pwm1_enable
/sys/class/hwmon/hwmon0/fan1_input
/sys/class/hwmon/hwmon0/pwm2
/sys/class/hwmon/hwmon0/pwm2_enable
/sys/class/hwmon/hwmon0/fan2_input
(etc...)
pwm1 & pwm1_enable = fan1_input
pwm2 & pwm2_enable = fan2_input
(etc...)
I think this would be a fairly common set up? Indeed I have three machines
that are setup this way (1 has 2 fans, the other 2 have 3 fans)
From what I can see, pwmconfig does this:
pwm1_enable=0
pwm2_enable=0
wait...
for each PWM:
this pwm_enable=1
this pwm=0
for each fan
compare this fan before / after
this pwm_enable=0
check fan returns to normal
next fan
next pwm
The problem with this logic is that for each PWM, the pwm_enable is set to
1, then the first fan is tested, after the first fan is tested, the pwm is
disabled and never re-enabled (until the next pwm)...
This means the pwm1=fan1 correlation is detected, but pwm2=fan2 is not - but
only because the pwm_enable is still set to 0 when the second and subsequent
fans are tested...
There are several ways to fix this, but after some pondering, I think that
this logic might be most appropriate:
pwm1_enable=0
pwm2_enable=0
wait...
for each PWM:
for each fan
this pwm_enable=1
this pwm=0
compare this fan before / after
this pwm_enable=0
check fan returns to normal
next fan
next pwm
This causes the given pwm/fan to stop/start once for each fan input, which
might seem bad, but then each fan is individually checked to make sure that
it has begun spinning again after pwm_enable is set to 0... (as compared to
stopping and starting once per pwm...)
I have a patch which I can supply. It fixes the problem for me, but I'm not
100% sure if it introduces any other problems. Let me know if I should email
it or submit somewhere etc...
I would value hearing what others think about this problem...
Thanks
Charles
[-- Attachment #1.2: Type: text/html, Size: 3170 bytes --]
[-- Attachment #2: Type: text/plain, Size: 153 bytes --]
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 13+ messages in thread
* [lm-sensors] pwmconfig doesn't detect correlations properly
2010-07-28 6:26 [lm-sensors] pwmconfig doesn't detect correlations properly Charles Pillar
@ 2010-08-16 4:19 ` Charles Pillar
2010-08-16 15:51 ` Guenter Roeck
` (10 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Charles Pillar @ 2010-08-16 4:19 UTC (permalink / raw)
To: lm-sensors
Hi all,
I'm new here, apologies in.advance if I do something wrong. I
believe that I have found a bug in pwmconfig. I first observed this
behavior many many months ago and couldn't find anyone else with the
problem so I just assumed it was just me. I've since stumbled on it
again so I decided to look into it myself. I don't know if anyone is
aware of the behavior I am seeing, but here it is...
Take for example a board with two or more PWM controllable fans both
which of which the speed can be measured. Thus I have:
/sys/class/hwmon/hwmon0/pwm1
/sys/class/hwmon/hwmon0/pwm1_enable
/sys/class/hwmon/hwmon0/fan1_input
/sys/class/hwmon/hwmon0/pwm2
/sys/class/hwmon/hwmon0/pwm2_enable
/sys/class/hwmon/hwmon0/fan2_input
(etc...)
pwm1 & pwm1_enable = fan1_input
pwm2 & pwm2_enable = fan2_input
(etc...)
I think this would be a fairly common set up? Indeed I have three
machines that are setup this way (1 board has 2 fans, the other 2 have
3 fans each)
From what I can see, pwmconfig does this:
pwm1_enable=0
pwm2_enable=0
wait...
for each PWM:
this pwm_enable=1
this pwm=0
for each fan
compare this fan before / after
this pwm_enable=0
check fan returns to normal
next fan
next pwm
The problem with this logic is that for each PWM, the pwm_enable is
set to 1, then the first fan is tested, after the first fan is tested,
the pwm is disabled and never re-enabled (until the next pwm)...
This means the pwm1ún1 correlation is detected, but pwm2ún2 is not
- but only because the pwm_enable is still set to 0 when the second
and subsequent fans are tested...
There are several ways to fix this, but after some pondering, I think
that this logic might be most appropriate:
pwm1_enable=0
pwm2_enable=0
wait...
for each PWM:
for each fan
this pwm_enable=1
this pwm=0
compare this fan before / after
this pwm_enable=0
check fan returns to normal
next fan
next pwm
This causes the given pwm/fan to stop/start once for each fan input,
which might seem bad, but then each fan is individually checked to
make sure that it has begun spinning again after pwm_enable is set to
0... (as compared to stopping and starting once per pwm...)
I have a patch which I can supply. It fixes the problem for me, but
I'm not 100% sure if it introduces any other problems. Let me know if
I should email it or submit somewhere etc...
I would value hearing what others think about this problem...
Thanks
Charles
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [lm-sensors] pwmconfig doesn't detect correlations properly
2010-07-28 6:26 [lm-sensors] pwmconfig doesn't detect correlations properly Charles Pillar
2010-08-16 4:19 ` Charles Pillar
@ 2010-08-16 15:51 ` Guenter Roeck
2010-08-19 9:06 ` Jean Delvare
` (9 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Guenter Roeck @ 2010-08-16 15:51 UTC (permalink / raw)
To: lm-sensors
On Mon, 2010-08-16 at 00:07 -0400, Charles Pillar wrote:
> Hi all,
> I'm new here, apologies in.advance if I do something wrong. I
> believe that I have found a bug in pwmconfig. I first observed this
> behavior many many months ago and couldn't find anyone else with the
> problem so I just assumed it was just me. I've since stumbled on it
> again so I decided to look into it myself. I don't know if anyone is
> aware of the behavior I am seeing, but here it is...
>
> Take for example a board with two or more PWM controllable fans both
> which of which the speed can be measured. Thus I have:
>
> /sys/class/hwmon/hwmon0/pwm1
> /sys/class/hwmon/hwmon0/pwm1_enable
> /sys/class/hwmon/hwmon0/fan1_input
> /sys/class/hwmon/hwmon0/pwm2
> /sys/class/hwmon/hwmon0/pwm2_enable
> /sys/class/hwmon/hwmon0/fan2_input
> (etc...)
>
> pwm1 & pwm1_enable = fan1_input
> pwm2 & pwm2_enable = fan2_input
> (etc...)
>
> I think this would be a fairly common set up? Indeed I have three
> machines that are setup this way (1 board has 2 fans, the other 2 have
> 3 fans each)
>
> From what I can see, pwmconfig does this:
>
> pwm1_enable=0
> pwm2_enable=0
> wait...
> for each PWM:
> this pwm_enable=1
> this pwm=0
> for each fan
> compare this fan before / after
> this pwm_enable=0
> check fan returns to normal
> next fan
> next pwm
>
> The problem with this logic is that for each PWM, the pwm_enable is
> set to 1, then the first fan is tested, after the first fan is tested,
> the pwm is disabled and never re-enabled (until the next pwm)...
> This means the pwm1ún1 correlation is detected, but pwm2ún2 is not
> - but only because the pwm_enable is still set to 0 when the second
> and subsequent fans are tested...
>
> There are several ways to fix this, but after some pondering, I think
> that this logic might be most appropriate:
>
> pwm1_enable=0
> pwm2_enable=0
> wait...
> for each PWM:
> for each fan
> this pwm_enable=1
> this pwm=0
> compare this fan before / after
> this pwm_enable=0
> check fan returns to normal
> next fan
> next pwm
>
I would suggest something like
pwm1_enable=0
pwm2_enable=0
wait...
for each PWM:
this pwm_enable=1
this pwm=0
for each fan
check fan speed
if speed lower than threshold:
mark fan as controlled by this pwm
this pwm_enable=0
verify fan speed returned to normal
this pwm_enable=1
this pwm=0
next fan
this pwm_enable=0
next pwm
This would also solve your problem while reducing the number of times
pwm needs to be turned on and off.
Guenter
> This causes the given pwm/fan to stop/start once for each fan input,
> which might seem bad, but then each fan is individually checked to
> make sure that it has begun spinning again after pwm_enable is set to
> 0... (as compared to stopping and starting once per pwm...)
> I have a patch which I can supply. It fixes the problem for me, but
> I'm not 100% sure if it introduces any other problems. Let me know if
> I should email it or submit somewhere etc...
>
> I would value hearing what others think about this problem...
>
> Thanks
> Charles
>
> _______________________________________________
> lm-sensors mailing list
> lm-sensors@lm-sensors.org
> http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [lm-sensors] pwmconfig doesn't detect correlations properly
2010-07-28 6:26 [lm-sensors] pwmconfig doesn't detect correlations properly Charles Pillar
2010-08-16 4:19 ` Charles Pillar
2010-08-16 15:51 ` Guenter Roeck
@ 2010-08-19 9:06 ` Jean Delvare
2010-08-19 9:37 ` Jean Delvare
` (8 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Jean Delvare @ 2010-08-19 9:06 UTC (permalink / raw)
To: lm-sensors
Hi Charles, Guenter,
On Mon, 16 Aug 2010 08:51:22 -0700, Guenter Roeck wrote:
> On Mon, 2010-08-16 at 00:07 -0400, Charles Pillar wrote:
> > Hi all,
> > I'm new here, apologies in.advance if I do something wrong. I
> > believe that I have found a bug in pwmconfig. I first observed this
> > behavior many many months ago and couldn't find anyone else with the
> > problem so I just assumed it was just me. I've since stumbled on it
> > again so I decided to look into it myself. I don't know if anyone is
> > aware of the behavior I am seeing, but here it is...
> >
> > Take for example a board with two or more PWM controllable fans both
> > which of which the speed can be measured. Thus I have:
> >
> > /sys/class/hwmon/hwmon0/pwm1
> > /sys/class/hwmon/hwmon0/pwm1_enable
> > /sys/class/hwmon/hwmon0/fan1_input
> > /sys/class/hwmon/hwmon0/pwm2
> > /sys/class/hwmon/hwmon0/pwm2_enable
> > /sys/class/hwmon/hwmon0/fan2_input
> > (etc...)
> >
> > pwm1 & pwm1_enable = fan1_input
> > pwm2 & pwm2_enable = fan2_input
> > (etc...)
> >
> > I think this would be a fairly common set up? Indeed I have three
> > machines that are setup this way (1 board has 2 fans, the other 2 have
> > 3 fans each)
> >
> > From what I can see, pwmconfig does this:
> >
> > pwm1_enable=0
> > pwm2_enable=0
> > wait...
> > for each PWM:
> > this pwm_enable=1
> > this pwm=0
> > for each fan
> > compare this fan before / after
> > this pwm_enable=0
> > check fan returns to normal
> > next fan
> > next pwm
> >
> > The problem with this logic is that for each PWM, the pwm_enable is
> > set to 1, then the first fan is tested, after the first fan is tested,
> > the pwm is disabled and never re-enabled (until the next pwm)...
> > This means the pwm1ún1 correlation is detected, but pwm2ún2 is not
> > - but only because the pwm_enable is still set to 0 when the second
> > and subsequent fans are tested...
Thanks a lot for reporting. I have a hard time believing that this has
been broken for years and you're the first one to report, but this is
the case... I'm even more surprised that _I_ did not notice the bug,
while I have been using pwmconfig a lot and worked a lot on it over the
past few years.
> >
> > There are several ways to fix this, but after some pondering, I think
> > that this logic might be most appropriate:
> >
> > pwm1_enable=0
> > pwm2_enable=0
> > wait...
> > for each PWM:
> > for each fan
> > this pwm_enable=1
> > this pwm=0
> > compare this fan before / after
> > this pwm_enable=0
> > check fan returns to normal
> > next fan
> > next pwm
> >
>
> I would suggest something like
>
> pwm1_enable=0
> pwm2_enable=0
> wait...
> for each PWM:
> this pwm_enable=1
> this pwm=0
> for each fan
> check fan speed
> if speed lower than threshold:
> mark fan as controlled by this pwm
> this pwm_enable=0
> verify fan speed returned to normal
> this pwm_enable=1
> this pwm=0
> next fan
> this pwm_enable=0
> next pwm
>
> This would also solve your problem while reducing the number of times
> pwm needs to be turned on and off.
>
> > This causes the given pwm/fan to stop/start once for each fan input,
> > which might seem bad, but then each fan is individually checked to
> > make sure that it has begun spinning again after pwm_enable is set to
> > 0... (as compared to stopping and starting once per pwm...)
> > I have a patch which I can supply. It fixes the problem for me, but
> > I'm not 100% sure if it introduces any other problems. Let me know if
> > I should email it or submit somewhere etc...
> >
> > I would value hearing what others think about this problem...
I think I would prefer a radically different approach:
pwm1_enable=0
pwm2_enable=0
wait...
for each PWM:
this pwm_enable=1
this pwm=0
for each fan
check and store fan speed
next fan
this pwm_enable=0
for each fan
verify fan speed returned to normal
next fan
for each fan
check stored fan speed
if speed lower than threshold:
mark fan as controlled by this pwm
next fan
next pwm
This idea is to split the inner for loop into individual actions. This
requires more code, but when dealing with hardware, doing the right
thing in a timely manner is more important than saving a couple lines
of code and CPU cycles. My approach has three merits:
* It is more obviously correct (or, if the code is broken, more
obviously broken.) Each inner for loop will do only one thing, so
hopefully it will do it well.
* It limits the number of PWM on/off to the minimum, as well as the
zero duty cycle time (fan potentially off).
* It warns about stuck fans faster than other approaches do.
Opinions?
--
Jean Delvare
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [lm-sensors] pwmconfig doesn't detect correlations properly
2010-07-28 6:26 [lm-sensors] pwmconfig doesn't detect correlations properly Charles Pillar
` (2 preceding siblings ...)
2010-08-19 9:06 ` Jean Delvare
@ 2010-08-19 9:37 ` Jean Delvare
2010-08-19 11:23 ` Charles Pillar
` (7 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Jean Delvare @ 2010-08-19 9:37 UTC (permalink / raw)
To: lm-sensors
On Thu, 19 Aug 2010 11:06:41 +0200, Jean Delvare wrote:
> Thanks a lot for reporting. I have a hard time believing that this has
> been broken for years and you're the first one to report, but this is
> the case... I'm even more surprised that _I_ did not notice the bug,
> while I have been using pwmconfig a lot and worked a lot on it over the
> past few years.
I've created ticket #2380 to track this issue, so that it doesn't get
lost:
http://www.lm-sensors.org/ticket/2380
--
Jean Delvare
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [lm-sensors] pwmconfig doesn't detect correlations properly
2010-07-28 6:26 [lm-sensors] pwmconfig doesn't detect correlations properly Charles Pillar
` (3 preceding siblings ...)
2010-08-19 9:37 ` Jean Delvare
@ 2010-08-19 11:23 ` Charles Pillar
2010-08-19 13:16 ` Jean Delvare
` (6 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Charles Pillar @ 2010-08-19 11:23 UTC (permalink / raw)
To: lm-sensors
Hi Jean, Guenter,
Thanks for your feedback. Jean that suggestion looks great to me, I
guess I was going for the minimal approach, just in case I was stark
raving mad :P but I agree, what you proposed does look like a better
way to go.
Thanks
Charles
On Thu, Aug 19, 2010 at 6:36 PM, Jean Delvare <khali@linux-fr.org> wrote:
> Hi Charles, Guenter,
>
> On Mon, 16 Aug 2010 08:51:22 -0700, Guenter Roeck wrote:
>> On Mon, 2010-08-16 at 00:07 -0400, Charles Pillar wrote:
>> > Hi all,
>> > I'm new here, apologies in.advance if I do something wrong. I
>> > believe that I have found a bug in pwmconfig. I first observed this
>> > behavior many many months ago and couldn't find anyone else with the
>> > problem so I just assumed it was just me. I've since stumbled on it
>> > again so I decided to look into it myself. I don't know if anyone is
>> > aware of the behavior I am seeing, but here it is...
>> >
>> > Take for example a board with two or more PWM controllable fans both
>> > which of which the speed can be measured. Thus I have:
>> >
>> > /sys/class/hwmon/hwmon0/pwm1
>> > /sys/class/hwmon/hwmon0/pwm1_enable
>> > /sys/class/hwmon/hwmon0/fan1_input
>> > /sys/class/hwmon/hwmon0/pwm2
>> > /sys/class/hwmon/hwmon0/pwm2_enable
>> > /sys/class/hwmon/hwmon0/fan2_input
>> > (etc...)
>> >
>> > pwm1 & pwm1_enable = fan1_input
>> > pwm2 & pwm2_enable = fan2_input
>> > (etc...)
>> >
>> > I think this would be a fairly common set up? Indeed I have three
>> > machines that are setup this way (1 board has 2 fans, the other 2 have
>> > 3 fans each)
>> >
>> > From what I can see, pwmconfig does this:
>> >
>> > pwm1_enable=0
>> > pwm2_enable=0
>> > wait...
>> > for each PWM:
>> > this pwm_enable=1
>> > this pwm=0
>> > for each fan
>> > compare this fan before / after
>> > this pwm_enable=0
>> > check fan returns to normal
>> > next fan
>> > next pwm
>> >
>> > The problem with this logic is that for each PWM, the pwm_enable is
>> > set to 1, then the first fan is tested, after the first fan is tested,
>> > the pwm is disabled and never re-enabled (until the next pwm)...
>> > This means the pwm1ún1 correlation is detected, but pwm2ún2 is not
>> > - but only because the pwm_enable is still set to 0 when the second
>> > and subsequent fans are tested...
>
> Thanks a lot for reporting. I have a hard time believing that this has
> been broken for years and you're the first one to report, but this is
> the case... I'm even more surprised that _I_ did not notice the bug,
> while I have been using pwmconfig a lot and worked a lot on it over the
> past few years.
>
>> >
>> > There are several ways to fix this, but after some pondering, I think
>> > that this logic might be most appropriate:
>> >
>> > pwm1_enable=0
>> > pwm2_enable=0
>> > wait...
>> > for each PWM:
>> > for each fan
>> > this pwm_enable=1
>> > this pwm=0
>> > compare this fan before / after
>> > this pwm_enable=0
>> > check fan returns to normal
>> > next fan
>> > next pwm
>> >
>>
>> I would suggest something like
>>
>> pwm1_enable=0
>> pwm2_enable=0
>> wait...
>> for each PWM:
>> this pwm_enable=1
>> this pwm=0
>> for each fan
>> check fan speed
>> if speed lower than threshold:
>> mark fan as controlled by this pwm
>> this pwm_enable=0
>> verify fan speed returned to normal
>> this pwm_enable=1
>> this pwm=0
>> next fan
>> this pwm_enable=0
>> next pwm
>>
>> This would also solve your problem while reducing the number of times
>> pwm needs to be turned on and off.
>>
>> > This causes the given pwm/fan to stop/start once for each fan input,
>> > which might seem bad, but then each fan is individually checked to
>> > make sure that it has begun spinning again after pwm_enable is set to
>> > 0... (as compared to stopping and starting once per pwm...)
>> > I have a patch which I can supply. It fixes the problem for me, but
>> > I'm not 100% sure if it introduces any other problems. Let me know if
>> > I should email it or submit somewhere etc...
>> >
>> > I would value hearing what others think about this problem...
>
> I think I would prefer a radically different approach:
>
> pwm1_enable=0
> pwm2_enable=0
> wait...
> for each PWM:
> this pwm_enable=1
> this pwm=0
> for each fan
> check and store fan speed
> next fan
> this pwm_enable=0
>
> for each fan
> verify fan speed returned to normal
> next fan
>
> for each fan
> check stored fan speed
> if speed lower than threshold:
> mark fan as controlled by this pwm
> next fan
> next pwm
>
> This idea is to split the inner for loop into individual actions. This
> requires more code, but when dealing with hardware, doing the right
> thing in a timely manner is more important than saving a couple lines
> of code and CPU cycles. My approach has three merits:
> * It is more obviously correct (or, if the code is broken, more
> obviously broken.) Each inner for loop will do only one thing, so
> hopefully it will do it well.
> * It limits the number of PWM on/off to the minimum, as well as the
> zero duty cycle time (fan potentially off).
> * It warns about stuck fans faster than other approaches do.
>
> Opinions?
>
> --
> Jean Delvare
>
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [lm-sensors] pwmconfig doesn't detect correlations properly
2010-07-28 6:26 [lm-sensors] pwmconfig doesn't detect correlations properly Charles Pillar
` (4 preceding siblings ...)
2010-08-19 11:23 ` Charles Pillar
@ 2010-08-19 13:16 ` Jean Delvare
2010-08-19 13:37 ` Jean Delvare
` (5 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Jean Delvare @ 2010-08-19 13:16 UTC (permalink / raw)
To: lm-sensors
Hi Charles,
On Thu, 19 Aug 2010 11:06:41 +0200, Jean Delvare wrote:
> On Mon, 16 Aug 2010 08:51:22 -0700, Guenter Roeck wrote:
> > On Mon, 2010-08-16 at 00:07 -0400, Charles Pillar wrote:
> > > Hi all,
> > > I'm new here, apologies in.advance if I do something wrong. I
> > > believe that I have found a bug in pwmconfig. I first observed this
> > > behavior many many months ago and couldn't find anyone else with the
> > > problem so I just assumed it was just me. I've since stumbled on it
> > > again so I decided to look into it myself. I don't know if anyone is
> > > aware of the behavior I am seeing, but here it is...
> > >
> > > Take for example a board with two or more PWM controllable fans both
> > > which of which the speed can be measured. Thus I have:
> > >
> > > /sys/class/hwmon/hwmon0/pwm1
> > > /sys/class/hwmon/hwmon0/pwm1_enable
> > > /sys/class/hwmon/hwmon0/fan1_input
> > > /sys/class/hwmon/hwmon0/pwm2
> > > /sys/class/hwmon/hwmon0/pwm2_enable
> > > /sys/class/hwmon/hwmon0/fan2_input
> > > (etc...)
> > >
> > > pwm1 & pwm1_enable = fan1_input
> > > pwm2 & pwm2_enable = fan2_input
> > > (etc...)
> > >
> > > I think this would be a fairly common set up? Indeed I have three
> > > machines that are setup this way (1 board has 2 fans, the other 2 have
> > > 3 fans each)
> > >
> > > From what I can see, pwmconfig does this:
> > >
> > > pwm1_enable=0
> > > pwm2_enable=0
> > > wait...
> > > for each PWM:
> > > this pwm_enable=1
> > > this pwm=0
> > > for each fan
> > > compare this fan before / after
> > > this pwm_enable=0
> > > check fan returns to normal
> > > next fan
> > > next pwm
> > >
> > > The problem with this logic is that for each PWM, the pwm_enable is
> > > set to 1, then the first fan is tested, after the first fan is tested,
> > > the pwm is disabled and never re-enabled (until the next pwm)...
> > > This means the pwm1ún1 correlation is detected, but pwm2ún2 is not
> > > - but only because the pwm_enable is still set to 0 when the second
> > > and subsequent fans are tested...
>
> Thanks a lot for reporting. I have a hard time believing that this has
> been broken for years and you're the first one to report, but this is
> the case... I'm even more surprised that _I_ did not notice the bug,
> while I have been using pwmconfig a lot and worked a lot on it over the
> past few years.
I tested the old code again to try and understand how I could have
missed the bug so far. The explanation is that the current code is not
only broken, it's also racy. And the race works around the bug, at
least in my case.
The reason why the problem wasn't noticed earlier is because we do not
have an unconditional delay after the faulty pwmdisable $i. We only
wait for the fan to settle if a correlation was found, because we want
to ensure that the fan is back to full speed again. If no correlation
is found, we move on quickly to the next fan input. With some luck,
said fan didn't yet have the time to go back to full speed. Or if it
did, the caching we do in most hwmon drivers will cause the driver to
not return an up-to-date fan speed value. Either way, the correlation
with the second fan is found in my case. If I add a sleep after
pwmdisable $i, then it is no longer found (i.e. the bug can be reliably
triggered.)
Now I am surprised that you managed to repeatedly hit the bug without
the extra sleep. Maybe you are using a driver which doesn't cache fan
speeds?
As a side note, this all suggests that cached fan speeds should be
invalidated each time PWM settings change. I can't remember any hwmon
driver doing this, but they should.
We still want to fix pwmconfig of course, but at least I now understand
why the bug wasn't found earlier.
--
Jean Delvare
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [lm-sensors] pwmconfig doesn't detect correlations properly
2010-07-28 6:26 [lm-sensors] pwmconfig doesn't detect correlations properly Charles Pillar
` (5 preceding siblings ...)
2010-08-19 13:16 ` Jean Delvare
@ 2010-08-19 13:37 ` Jean Delvare
2010-08-19 14:31 ` Guenter Roeck
` (4 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Jean Delvare @ 2010-08-19 13:37 UTC (permalink / raw)
To: lm-sensors
Hi Charles,
> On Thu, Aug 19, 2010 at 6:36 PM, Jean Delvare <khali@linux-fr.org> wrote:
> > I think I would prefer a radically different approach:
> >
> > pwm1_enable=0
> > pwm2_enable=0
> > wait...
> > for each PWM:
> > this pwm_enable=1
> > this pwm=0
> > for each fan
> > check and store fan speed
> > next fan
> > this pwm_enable=0
> >
> > for each fan
> > verify fan speed returned to normal
> > next fan
> >
> > for each fan
> > check stored fan speed
> > if speed lower than threshold:
> > mark fan as controlled by this pwm
> > next fan
> > next pwm
> >
> > This idea is to split the inner for loop into individual actions. This
> > requires more code, but when dealing with hardware, doing the right
> > thing in a timely manner is more important than saving a couple lines
> > of code and CPU cycles. My approach has three merits:
> > * It is more obviously correct (or, if the code is broken, more
> > obviously broken.) Each inner for loop will do only one thing, so
> > hopefully it will do it well.
> > * It limits the number of PWM on/off to the minimum, as well as the
> > zero duty cycle time (fan potentially off).
> > * It warns about stuck fans faster than other approaches do.
> >
> > Opinions?
On Thu, 19 Aug 2010 20:41:37 +0930, Charles Pillar wrote:
> Thanks for your feedback. Jean that suggestion looks great to me, I
> guess I was going for the minimal approach, just in case I was stark
> raving mad :P but I agree, what you proposed does look like a better
> way to go.
Hmm, in the end I came up with the following fix, which is quite
different from my proposed implementation, but also way less intrusive.
The two parts could even be committed separately, as they fix different
bugs. The upper part of the patch fixes the problem you actually
reported. The lower part of the patch fixes redundant sleeps in case a
given PWM output controls more than one fan.
Can you please try this patch and confirm that it solves your problem?
Index: prog/pwm/pwmconfig
=================================--- prog/pwm/pwmconfig (révision 5853)
+++ prog/pwm/pwmconfig (copie de travail)
@@ -437,15 +437,19 @@
echo '^C received, restoring PWM and aborting...'
exit 1
fi
+
+ # Sample all current fan speeds at once, so that we can quickly
+ # disable PWM.
+ CURRENT_SPEEDS="`cat $GOODFAN`"
+ pwmdisable $i
+
let pwmactivecount=0
let count=1
for j in $GOODFAN
do
OS=`echo $SPEEDS | cut -d' ' -f$count`
- # this will return the first field if there's only one (sysfs)
- S=`cat $j | cut -d' ' -f2`
+ S=`echo $CURRENT_SPEEDS | cut -d' ' -f$count`
echo " $j ... speed was $OS now $S"
- pwmdisable $i
let threshold=2*$OS/3
if [ $S -lt $threshold ]
then
@@ -460,16 +464,18 @@
pwmactive="$i ${pwmactive}"
fanactive="$j ${fanactive}"
fanactive_min="$S ${fanactive_min}"
+
+ # Give the fans time to return to full speed
+ sleep $DELAY
+ if [ $? -ne 0 ]
+ then
+ echo '^C received, aborting...'
+ exit 1
+ fi
else
fanactive="$j+${fanactive}" #not supported yet by fancontrol
fanactive_min="$S+${fanactive_min}"
fi
- sleep $DELAY
- if [ $? -ne 0 ]
- then
- echo '^C received, aborting...'
- exit 1
- fi
# this will return the first field if there's only one (sysfs)
S=`cat $j | cut -d' ' -f2`
if [ $S -lt $threshold ]
--
Jean Delvare
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [lm-sensors] pwmconfig doesn't detect correlations properly
2010-07-28 6:26 [lm-sensors] pwmconfig doesn't detect correlations properly Charles Pillar
` (6 preceding siblings ...)
2010-08-19 13:37 ` Jean Delvare
@ 2010-08-19 14:31 ` Guenter Roeck
2010-08-22 12:11 ` Jean Delvare
` (3 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Guenter Roeck @ 2010-08-19 14:31 UTC (permalink / raw)
To: lm-sensors
On Thu, Aug 19, 2010 at 09:37:23AM -0400, Jean Delvare wrote:
> Hi Charles,
>
> > On Thu, Aug 19, 2010 at 6:36 PM, Jean Delvare <khali@linux-fr.org> wrote:
> > > I think I would prefer a radically different approach:
> > >
> > > pwm1_enable=0
> > > pwm2_enable=0
> > > wait...
> > > for each PWM:
> > > this pwm_enable=1
> > > this pwm=0
> > > for each fan
> > > check and store fan speed
> > > next fan
> > > this pwm_enable=0
> > >
> > > for each fan
> > > verify fan speed returned to normal
> > > next fan
> > >
> > > for each fan
> > > check stored fan speed
> > > if speed lower than threshold:
> > > mark fan as controlled by this pwm
> > > next fan
> > > next pwm
> > >
> > > This idea is to split the inner for loop into individual actions. This
> > > requires more code, but when dealing with hardware, doing the right
> > > thing in a timely manner is more important than saving a couple lines
> > > of code and CPU cycles. My approach has three merits:
> > > * It is more obviously correct (or, if the code is broken, more
> > > obviously broken.) Each inner for loop will do only one thing, so
> > > hopefully it will do it well.
> > > * It limits the number of PWM on/off to the minimum, as well as the
> > > zero duty cycle time (fan potentially off).
> > > * It warns about stuck fans faster than other approaches do.
> > >
> > > Opinions?
>
> On Thu, 19 Aug 2010 20:41:37 +0930, Charles Pillar wrote:
> > Thanks for your feedback. Jean that suggestion looks great to me, I
> > guess I was going for the minimal approach, just in case I was stark
> > raving mad :P but I agree, what you proposed does look like a better
> > way to go.
>
> Hmm, in the end I came up with the following fix, which is quite
> different from my proposed implementation, but also way less intrusive.
> The two parts could even be committed separately, as they fix different
> bugs. The upper part of the patch fixes the problem you actually
> reported. The lower part of the patch fixes redundant sleeps in case a
> given PWM output controls more than one fan.
>
> Can you please try this patch and confirm that it solves your problem?
>
> Index: prog/pwm/pwmconfig
> =================================> --- prog/pwm/pwmconfig (révision 5853)
> +++ prog/pwm/pwmconfig (copie de travail)
> @@ -437,15 +437,19 @@
> echo '^C received, restoring PWM and aborting...'
> exit 1
> fi
> +
> + # Sample all current fan speeds at once, so that we can quickly
> + # disable PWM.
> + CURRENT_SPEEDS="`cat $GOODFAN`"
> + pwmdisable $i
> +
> let pwmactivecount=0
> let count=1
> for j in $GOODFAN
> do
> OS=`echo $SPEEDS | cut -d' ' -f$count`
> - # this will return the first field if there's only one (sysfs)
> - S=`cat $j | cut -d' ' -f2`
> + S=`echo $CURRENT_SPEEDS | cut -d' ' -f$count`
> echo " $j ... speed was $OS now $S"
> - pwmdisable $i
> let threshold=2*$OS/3
> if [ $S -lt $threshold ]
> then
> @@ -460,16 +464,18 @@
> pwmactive="$i ${pwmactive}"
> fanactive="$j ${fanactive}"
> fanactive_min="$S ${fanactive_min}"
> +
> + # Give the fans time to return to full speed
> + sleep $DELAY
> + if [ $? -ne 0 ]
> + then
> + echo '^C received, aborting...'
> + exit 1
> + fi
> else
> fanactive="$j+${fanactive}" #not supported yet by fancontrol
> fanactive_min="$S+${fanactive_min}"
> fi
> - sleep $DELAY
> - if [ $? -ne 0 ]
> - then
> - echo '^C received, aborting...'
> - exit 1
> - fi
> # this will return the first field if there's only one (sysfs)
> S=`cat $j | cut -d' ' -f2`
> if [ $S -lt $threshold ]
>
Looks good to me. My intend was to fix the problem while not being too intrusive,
for which you found a much better solution than mine.
Guenter
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [lm-sensors] pwmconfig doesn't detect correlations properly
2010-07-28 6:26 [lm-sensors] pwmconfig doesn't detect correlations properly Charles Pillar
` (7 preceding siblings ...)
2010-08-19 14:31 ` Guenter Roeck
@ 2010-08-22 12:11 ` Jean Delvare
2010-08-22 12:58 ` Charles Pillar
` (2 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Jean Delvare @ 2010-08-22 12:11 UTC (permalink / raw)
To: lm-sensors
On Thu, 19 Aug 2010 15:37:23 +0200, Jean Delvare wrote:
> Can you please try this patch and confirm that it solves your problem?
Charles, did you find the time to test my patch? I'd like to commit it
as soon as possible, and then release a new version of lm-sensors.
Thanks,
--
Jean Delvare
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [lm-sensors] pwmconfig doesn't detect correlations properly
2010-07-28 6:26 [lm-sensors] pwmconfig doesn't detect correlations properly Charles Pillar
` (8 preceding siblings ...)
2010-08-22 12:11 ` Jean Delvare
@ 2010-08-22 12:58 ` Charles Pillar
2010-08-22 16:26 ` Jean Delvare
2010-08-24 5:52 ` Charles Pillar
11 siblings, 0 replies; 13+ messages in thread
From: Charles Pillar @ 2010-08-22 12:58 UTC (permalink / raw)
To: lm-sensors
Hi Jean,
Sorry, was planning to test it yesterday, but then things got busy
:( . I've just tried it now and it worked great, I can see no reason
not to go ahead and commit.
Thanks
Charles
On Sun, Aug 22, 2010 at 9:41 PM, Jean Delvare <khali@linux-fr.org> wrote:
> On Thu, 19 Aug 2010 15:37:23 +0200, Jean Delvare wrote:
>> Can you please try this patch and confirm that it solves your problem?
>
> Charles, did you find the time to test my patch? I'd like to commit it
> as soon as possible, and then release a new version of lm-sensors.
>
> Thanks,
> --
> Jean Delvare
>
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [lm-sensors] pwmconfig doesn't detect correlations properly
2010-07-28 6:26 [lm-sensors] pwmconfig doesn't detect correlations properly Charles Pillar
` (9 preceding siblings ...)
2010-08-22 12:58 ` Charles Pillar
@ 2010-08-22 16:26 ` Jean Delvare
2010-08-24 5:52 ` Charles Pillar
11 siblings, 0 replies; 13+ messages in thread
From: Jean Delvare @ 2010-08-22 16:26 UTC (permalink / raw)
To: lm-sensors
On Sun, 22 Aug 2010 22:26:04 +0930, Charles Pillar wrote:
> Hi Jean,
> Sorry, was planning to test it yesterday, but then things got busy
> :( . I've just tried it now and it worked great, I can see no reason
> not to go ahead and commit.
Thanks, it is committed now.
--
Jean Delvare
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [lm-sensors] pwmconfig doesn't detect correlations properly
2010-07-28 6:26 [lm-sensors] pwmconfig doesn't detect correlations properly Charles Pillar
` (10 preceding siblings ...)
2010-08-22 16:26 ` Jean Delvare
@ 2010-08-24 5:52 ` Charles Pillar
11 siblings, 0 replies; 13+ messages in thread
From: Charles Pillar @ 2010-08-24 5:52 UTC (permalink / raw)
To: lm-sensors
Excellent!,
Thank you to both of you for your help.
Regards
Charles
On Mon, Aug 23, 2010 at 1:56 AM, Jean Delvare <khali@linux-fr.org> wrote:
> On Sun, 22 Aug 2010 22:26:04 +0930, Charles Pillar wrote:
>> Hi Jean,
>> Sorry, was planning to test it yesterday, but then things got busy
>> :( . I've just tried it now and it worked great, I can see no reason
>> not to go ahead and commit.
>
> Thanks, it is committed now.
>
> --
> Jean Delvare
>
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2010-08-24 5:52 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-07-28 6:26 [lm-sensors] pwmconfig doesn't detect correlations properly Charles Pillar
2010-08-16 4:19 ` Charles Pillar
2010-08-16 15:51 ` Guenter Roeck
2010-08-19 9:06 ` Jean Delvare
2010-08-19 9:37 ` Jean Delvare
2010-08-19 11:23 ` Charles Pillar
2010-08-19 13:16 ` Jean Delvare
2010-08-19 13:37 ` Jean Delvare
2010-08-19 14:31 ` Guenter Roeck
2010-08-22 12:11 ` Jean Delvare
2010-08-22 12:58 ` Charles Pillar
2010-08-22 16:26 ` Jean Delvare
2010-08-24 5:52 ` Charles Pillar
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.