* [lm-sensors] BIOS Corruption (was : new abituguru driver in mm
@ 2006-08-06 5:22 Sunil Kumar
2006-08-07 10:19 ` Jean Delvare
` (11 more replies)
0 siblings, 12 replies; 13+ messages in thread
From: Sunil Kumar @ 2006-08-06 5:22 UTC (permalink / raw)
To: lm-sensors
Hans,
I found an issue with the driver. It seems to have put something in the
uguru part of BIOS, which makes my BIOS hang when I enter Abit uGuru
temperature monitor screen in the BIOS. No keys work and only way for me at
that point is to give it the three finger salute. The peculiar thing I
noticed was that somehow it modified the shutdown and beep CPU temperatures
to 245C and 235C while I had them set at 65C and 75C.
As soon as the temp monitor displays 245C in CPU row, it just hangs. The
next values to be displayed are the enable bit for these temps, I think.
I never used the driver to write anything to BIOS, so how did it end up
updating those values?
Thanks,
-Sunil
On 7/31/06, Hans de Goede <j.w.r.degoede at hhs.nl> wrote:
>
> Sunil Kumar wrote:
> > but how do you account for the HZ.
>
> I don't, the system you've been running on has a HZ of 1000 I assume?
> That means that they delays we have found are them minimal ones needed
> to mkae things work, sleeping longer with lower HZ is unfortunate but
> not harmfull. Lower HZ has been taken into account in that we try not to
> sleep much, because otherwise delays for the calling up would become
> unacceptable. But besides that I do not take HZ into account. Remember
> we are dealing with an error / exception path here. It doesn't have to
> be beautifull or very efficient it just has to work and not suck.
>
> > I will give the 1:3 a run.
> >
> Thanks, In combination with a TIMEOUT of 100 I assume?
>
> Regards,
>
> Hans
>
>
>
>
> With msleep(1), one system will sleep
> > for 20ms while other will sleep only 2ms for the last try. We need to
> > either 1. make it msleep(20), so all systems sleep for 20ms per read OR
> > 2. have a conditional based on HZ to do msleep(1) for 100 and do
> > multiple msleep(1) for HZ 1000 OR 3. have it as a configured parameter.
> > 1st option means that 1000HZ folks will suffer delays which they could
> > have avoided because they can sleep finer, but without option 2, they
> > will just sleep for 2ms which may not be sufficient. 2nd and 3rd options
> > work but 3rd works better because its simple and makes a lot of sense
> > for the variety of hardware and BIOSes that you could be dealing with.
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.lm-sensors.org/pipermail/lm-sensors/attachments/20060805/b2c7951f/attachment.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* [lm-sensors] BIOS Corruption (was : new abituguru driver in mm
2006-08-06 5:22 [lm-sensors] BIOS Corruption (was : new abituguru driver in mm Sunil Kumar
@ 2006-08-07 10:19 ` Jean Delvare
2006-08-07 16:10 ` Sunil Kumar
` (10 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Jean Delvare @ 2006-08-07 10:19 UTC (permalink / raw)
To: lm-sensors
Hi Sunil, Hans,
> I found an issue with the driver. It seems to have put something in the
> uguru part of BIOS, which makes my BIOS hang when I enter Abit uGuru
> temperature monitor screen in the BIOS. No keys work and only way for me at
> that point is to give it the three finger salute. The peculiar thing I
> noticed was that somehow it modified the shutdown and beep CPU temperatures
> to 245C and 235C while I had them set at 65C and 75C.
>
> As soon as the temp monitor displays 245C in CPU row, it just hangs. The
> next values to be displayed are the enable bit for these temps, I think.
>
> I never used the driver to write anything to BIOS, so how did it end up
> updating those values?
Err, this is bad :(
Hans, see why I was reluctant to having your driver in the kernel tree?
Nothing personal, but that kind of problem is to be expected when
writing a driver without a datasheet :(
What should we do now? Mark the driver broken in the kernel tree?
2.6.18 is just around the corner, and we sure don't want people to
corrupt their BIOS.
Sunil, did you try a complete power-off (with PSU switched off or
unplugged) to see if the default values are back?
--
Jean Delvare
^ permalink raw reply [flat|nested] 13+ messages in thread
* [lm-sensors] BIOS Corruption (was : new abituguru driver in mm
2006-08-06 5:22 [lm-sensors] BIOS Corruption (was : new abituguru driver in mm Sunil Kumar
2006-08-07 10:19 ` Jean Delvare
@ 2006-08-07 16:10 ` Sunil Kumar
2006-08-07 19:34 ` Sergey Vlasov
` (9 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Sunil Kumar @ 2006-08-07 16:10 UTC (permalink / raw)
To: lm-sensors
Yeah, I tried that. Didn't help. I think BIOS reset might help, but I have
to note down some settings before I do that (bigger exercise than just cmos
jumper moves, which in itself involves opening the box....<yawn>). I am able
to read/write all settings except for uguru temp screen. My system functions
without any problems (none that I have noted), except it will not beep or
shutdown if CPU were to become hotter than my tcase of 65C, which is
possible if my cpu fan dies. With the fan working, at full load it stays
around 45-46C.
Following is the actual output (245/235 I wrote from own memory) from
sensors and my BIOS screen gets stuck when it prints "250C". The values I
had set for these were either 60-65 or 65-75, I don't remember exactly (was
more than 9 months ago).
$ sensors
abituguru-isa-00e0
Adapter: ISA adapter
CPU Core Voltage: +1.38 V (min +1.00 V, max +1.60 V)
DDR Voltage: +2.60 V (min +2.10 V, max +3.10 V)
DDR VTT Voltage: +1.31 V (min +1.05 V, max +1.55 V)
nForce4 Standby Voltage:
+1.51 V (min +1.25 V, max +1.85 V)
CPU VDDA 2.5V Voltage: +2.56 V (min +2.00 V, max +3.00 V)
HyperTransport Voltage: +1.21 V (min +0.94 V, max +1.45 V)
nForce4 Voltage: +1.59 V (min +1.25 V, max +1.85 V)
ATX +5V: +5.02 V (min +3.97 V, max +5.98 V)
ATX +3.3V: +3.33 V (min +2.63 V, max +3.93 V)
ATX 5VSB Voltage: +5.00 V (min +3.97 V, max +5.98 V)
ATX +12V: +11.95 V (min +9.56 V, max +14.34 V)
CPU Temperature: +33 C (high = +245 C, crit = +250 C)
SYS Temperature: +30 C (high = +55 C, crit = +65 C)
PWM Temperature: +36 C (high = +80 C, crit = +90 C)
CPU FAN Speed: 1740 RPM (min 900 RPM)
NF4 FAN Speed: 0 RPM (min 1200 RPM)
SYS FAN Speed: 1080 RPM (min 720 RPM)
OTES1 FAN Speed: 0 RPM (min 1200 RPM)
OTES2 FAN Speed: 0 RPM (min 720 RPM)
AUX FAN Speed: 0 RPM (min 1200 RPM)
-Sunil
On 8/7/06, Jean Delvare <khali at linux-fr.org> wrote:
>
> Hi Sunil, Hans,
>
> > I found an issue with the driver. It seems to have put something in the
> > uguru part of BIOS, which makes my BIOS hang when I enter Abit uGuru
> > temperature monitor screen in the BIOS. No keys work and only way for me
> at
> > that point is to give it the three finger salute. The peculiar thing I
> > noticed was that somehow it modified the shutdown and beep CPU
> temperatures
> > to 245C and 235C while I had them set at 65C and 75C.
> >
> > As soon as the temp monitor displays 245C in CPU row, it just hangs. The
> > next values to be displayed are the enable bit for these temps, I think.
> >
> > I never used the driver to write anything to BIOS, so how did it end up
> > updating those values?
>
> Err, this is bad :(
>
> Hans, see why I was reluctant to having your driver in the kernel tree?
> Nothing personal, but that kind of problem is to be expected when
> writing a driver without a datasheet :(
>
> What should we do now? Mark the driver broken in the kernel tree?
> 2.6.18 is just around the corner, and we sure don't want people to
> corrupt their BIOS.
>
> Sunil, did you try a complete power-off (with PSU switched off or
> unplugged) to see if the default values are back?
>
> --
> Jean Delvare
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.lm-sensors.org/pipermail/lm-sensors/attachments/20060807/39b01fd0/attachment.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* [lm-sensors] BIOS Corruption (was : new abituguru driver in mm
2006-08-06 5:22 [lm-sensors] BIOS Corruption (was : new abituguru driver in mm Sunil Kumar
2006-08-07 10:19 ` Jean Delvare
2006-08-07 16:10 ` Sunil Kumar
@ 2006-08-07 19:34 ` Sergey Vlasov
2006-08-07 20:53 ` Sunil Kumar
` (8 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Sergey Vlasov @ 2006-08-07 19:34 UTC (permalink / raw)
To: lm-sensors
On Mon, 7 Aug 2006 09:09:45 -0701 Sunil Kumar wrote:
> Yeah, I tried that. Didn't help. I think BIOS reset might help, but I have
> to note down some settings before I do that (bigger exercise than just cmos
> jumper moves, which in itself involves opening the box....<yawn>). I am able
> to read/write all settings except for uguru temp screen. My system functions
> without any problems (none that I have noted), except it will not beep or
> shutdown if CPU were to become hotter than my tcase of 65C, which is
> possible if my cpu fan dies. With the fan working, at full load it stays
> around 45-46C.
>
> Following is the actual output (245/235 I wrote from own memory) from
> sensors and my BIOS screen gets stuck when it prints "250C". The values I
> had set for these were either 60-65 or 65-75, I don't remember exactly (was
> more than 9 months ago).
>
> $ sensors
> abituguru-isa-00e0
> Adapter: ISA adapter
> CPU Core Voltage: +1.38 V (min +1.00 V, max +1.60 V)
> DDR Voltage: +2.60 V (min +2.10 V, max +3.10 V)
> DDR VTT Voltage: +1.31 V (min +1.05 V, max +1.55 V)
> nForce4 Standby Voltage:
> +1.51 V (min +1.25 V, max +1.85 V)
> CPU VDDA 2.5V Voltage: +2.56 V (min +2.00 V, max +3.00 V)
> HyperTransport Voltage: +1.21 V (min +0.94 V, max +1.45 V)
> nForce4 Voltage: +1.59 V (min +1.25 V, max +1.85 V)
> ATX +5V: +5.02 V (min +3.97 V, max +5.98 V)
> ATX +3.3V: +3.33 V (min +2.63 V, max +3.93 V)
> ATX 5VSB Voltage: +5.00 V (min +3.97 V, max +5.98 V)
> ATX +12V: +11.95 V (min +9.56 V, max +14.34 V)
> CPU Temperature: +33 C (high = +245 C, crit = +250 C)
Hm, then this code in abituguru_detect_bank1_sensor_type() might be the
culprit:
/* Volt sensor test, enable volt low alarm, set min value ridicously
high. If its a volt sensor this should always give us an alarm. */
buf[0] = ABIT_UGURU_VOLT_LOW_ALARM_ENABLE;
buf[1] = 245;
buf[2] = 250;
if (abituguru_write(data, ABIT_UGURU_SENSOR_BANK1 + 2, sensor_addr,
buf, 3) != 3)
return -ENODEV;
If this call succeeds, but then one of the two abituguru_read() calls
below fails for some reason, abituguru_detect_bank1_sensor_type()
returns without restoring the original sensor settings.
This method of probing looks dangerous - is there really no better way
to do it?
Now, the limits could be restored by writing the appropriate values to
sysfs, but there is no way to restore the corrupted flags (except for
patching the driver) - there is no sysfs attribute to control
ABIT_UGURU_VOLT_LOW_ALARM_ENABLE for a temperature sensor.
> SYS Temperature: +30 C (high = +55 C, crit = +65 C)
> PWM Temperature: +36 C (high = +80 C, crit = +90 C)
> CPU FAN Speed: 1740 RPM (min 900 RPM)
> NF4 FAN Speed: 0 RPM (min 1200 RPM)
> SYS FAN Speed: 1080 RPM (min 720 RPM)
> OTES1 FAN Speed: 0 RPM (min 1200 RPM)
> OTES2 FAN Speed: 0 RPM (min 720 RPM)
> AUX FAN Speed: 0 RPM (min 1200 RPM)
>
> -Sunil
>
>
> On 8/7/06, Jean Delvare <khali at linux-fr.org> wrote:
> >
> > Hi Sunil, Hans,
> >
> > > I found an issue with the driver. It seems to have put something in the
> > > uguru part of BIOS, which makes my BIOS hang when I enter Abit uGuru
> > > temperature monitor screen in the BIOS. No keys work and only way for me
> > at
> > > that point is to give it the three finger salute. The peculiar thing I
> > > noticed was that somehow it modified the shutdown and beep CPU
> > temperatures
> > > to 245C and 235C while I had them set at 65C and 75C.
> > >
> > > As soon as the temp monitor displays 245C in CPU row, it just hangs. The
> > > next values to be displayed are the enable bit for these temps, I think.
> > >
> > > I never used the driver to write anything to BIOS, so how did it end up
> > > updating those values?
> >
> > Err, this is bad :(
> >
> > Hans, see why I was reluctant to having your driver in the kernel tree?
> > Nothing personal, but that kind of problem is to be expected when
> > writing a driver without a datasheet :(
> >
> > What should we do now? Mark the driver broken in the kernel tree?
> > 2.6.18 is just around the corner, and we sure don't want people to
> > corrupt their BIOS.
> >
> > Sunil, did you try a complete power-off (with PSU switched off or
> > unplugged) to see if the default values are back?
> >
> > --
> > Jean Delvare
> >
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 190 bytes
Desc: not available
Url : http://lists.lm-sensors.org/pipermail/lm-sensors/attachments/20060807/9d9eae6d/attachment.bin
^ permalink raw reply [flat|nested] 13+ messages in thread
* [lm-sensors] BIOS Corruption (was : new abituguru driver in mm
2006-08-06 5:22 [lm-sensors] BIOS Corruption (was : new abituguru driver in mm Sunil Kumar
` (2 preceding siblings ...)
2006-08-07 19:34 ` Sergey Vlasov
@ 2006-08-07 20:53 ` Sunil Kumar
2006-08-08 15:25 ` Sergey Vlasov
` (7 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Sunil Kumar @ 2006-08-07 20:53 UTC (permalink / raw)
To: lm-sensors
abituguru_read() calls have timedout on me quite a few times. In fact, that
was the discussion point in this thread so far. So, that is highly likely
the reason for this corruption.
what is the course of action for me here?
On 8/7/06, Sergey Vlasov <vsu at altlinux.ru> wrote:
>
> On Mon, 7 Aug 2006 09:09:45 -0701 Sunil Kumar wrote:
>
> > Yeah, I tried that. Didn't help. I think BIOS reset might help, but I
> have
> > to note down some settings before I do that (bigger exercise than just
> cmos
> > jumper moves, which in itself involves opening the box....<yawn>). I am
> able
> > to read/write all settings except for uguru temp screen. My system
> functions
> > without any problems (none that I have noted), except it will not beep
> or
> > shutdown if CPU were to become hotter than my tcase of 65C, which is
> > possible if my cpu fan dies. With the fan working, at full load it stays
> > around 45-46C.
> >
> > Following is the actual output (245/235 I wrote from own memory) from
> > sensors and my BIOS screen gets stuck when it prints "250C". The values
> I
> > had set for these were either 60-65 or 65-75, I don't remember exactly
> (was
> > more than 9 months ago).
> >
> > $ sensors
> > abituguru-isa-00e0
> > Adapter: ISA adapter
> > CPU Core Voltage: +1.38 V (min +1.00 V, max +1.60 V)
> > DDR Voltage: +2.60 V (min +2.10 V, max +3.10 V)
> > DDR VTT Voltage: +1.31 V (min +1.05 V, max +1.55 V)
> > nForce4 Standby Voltage:
> > +1.51 V (min +1.25 V, max +1.85 V)
> > CPU VDDA 2.5V Voltage: +2.56 V (min +2.00 V, max +3.00 V)
> > HyperTransport Voltage: +1.21 V (min +0.94 V, max +1.45 V)
> > nForce4 Voltage: +1.59 V (min +1.25 V, max +1.85 V)
> > ATX +5V: +5.02 V (min +3.97 V, max +5.98 V)
> > ATX +3.3V: +3.33 V (min +2.63 V, max +3.93 V)
> > ATX 5VSB Voltage: +5.00 V (min +3.97 V, max +5.98 V)
> > ATX +12V: +11.95 V (min +9.56 V, max +14.34 V)
> > CPU Temperature: +33 C (high = +245 C, crit = +250 C)
>
> Hm, then this code in abituguru_detect_bank1_sensor_type() might be the
> culprit:
>
> /* Volt sensor test, enable volt low alarm, set min value
> ridicously
> high. If its a volt sensor this should always give us an alarm.
> */
> buf[0] = ABIT_UGURU_VOLT_LOW_ALARM_ENABLE;
> buf[1] = 245;
> buf[2] = 250;
> if (abituguru_write(data, ABIT_UGURU_SENSOR_BANK1 + 2,
> sensor_addr,
> buf, 3) != 3)
> return -ENODEV;
>
> If this call succeeds, but then one of the two abituguru_read() calls
> below fails for some reason, abituguru_detect_bank1_sensor_type()
> returns without restoring the original sensor settings.
>
> This method of probing looks dangerous - is there really no better way
> to do it?
>
> Now, the limits could be restored by writing the appropriate values to
> sysfs, but there is no way to restore the corrupted flags (except for
> patching the driver) - there is no sysfs attribute to control
> ABIT_UGURU_VOLT_LOW_ALARM_ENABLE for a temperature sensor.
>
> > SYS Temperature: +30 C (high = +55 C, crit = +65 C)
> > PWM Temperature: +36 C (high = +80 C, crit = +90 C)
> > CPU FAN Speed: 1740 RPM (min 900 RPM)
> > NF4 FAN Speed: 0 RPM (min 1200 RPM)
> > SYS FAN Speed: 1080 RPM (min 720 RPM)
> > OTES1 FAN Speed: 0 RPM (min 1200 RPM)
> > OTES2 FAN Speed: 0 RPM (min 720 RPM)
> > AUX FAN Speed: 0 RPM (min 1200 RPM)
> >
> > -Sunil
> >
> >
> > On 8/7/06, Jean Delvare <khali at linux-fr.org> wrote:
> > >
> > > Hi Sunil, Hans,
> > >
> > > > I found an issue with the driver. It seems to have put something in
> the
> > > > uguru part of BIOS, which makes my BIOS hang when I enter Abit uGuru
> > > > temperature monitor screen in the BIOS. No keys work and only way
> for me
> > > at
> > > > that point is to give it the three finger salute. The peculiar thing
> I
> > > > noticed was that somehow it modified the shutdown and beep CPU
> > > temperatures
> > > > to 245C and 235C while I had them set at 65C and 75C.
> > > >
> > > > As soon as the temp monitor displays 245C in CPU row, it just hangs.
> The
> > > > next values to be displayed are the enable bit for these temps, I
> think.
> > > >
> > > > I never used the driver to write anything to BIOS, so how did it end
> up
> > > > updating those values?
> > >
> > > Err, this is bad :(
> > >
> > > Hans, see why I was reluctant to having your driver in the kernel
> tree?
> > > Nothing personal, but that kind of problem is to be expected when
> > > writing a driver without a datasheet :(
> > >
> > > What should we do now? Mark the driver broken in the kernel tree?
> > > 2.6.18 is just around the corner, and we sure don't want people to
> > > corrupt their BIOS.
> > >
> > > Sunil, did you try a complete power-off (with PSU switched off or
> > > unplugged) to see if the default values are back?
> > >
> > > --
> > > Jean Delvare
> > >
> >
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.lm-sensors.org/pipermail/lm-sensors/attachments/20060807/bac09b4e/attachment-0001.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* [lm-sensors] BIOS Corruption (was : new abituguru driver in mm
2006-08-06 5:22 [lm-sensors] BIOS Corruption (was : new abituguru driver in mm Sunil Kumar
` (3 preceding siblings ...)
2006-08-07 20:53 ` Sunil Kumar
@ 2006-08-08 15:25 ` Sergey Vlasov
2006-08-09 18:35 ` Hans de Goede
` (6 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Sergey Vlasov @ 2006-08-08 15:25 UTC (permalink / raw)
To: lm-sensors
On Mon, Aug 07, 2006 at 01:53:39PM -0700, Sunil Kumar wrote:
> abituguru_read() calls have timedout on me quite a few times. In fact, that
> was the discussion point in this thread so far. So, that is highly likely
> the reason for this corruption.
>
> what is the course of action for me here?
Did you try to reset BIOS settings to defaults (first in the BIOS
setup, if that does not help, try the jumper)?
> On 8/7/06, Sergey Vlasov <vsu at altlinux.ru> wrote:
> >
> >On Mon, 7 Aug 2006 09:09:45 -0701 Sunil Kumar wrote:
> >
> >> Yeah, I tried that. Didn't help. I think BIOS reset might help, but I
> >have
> >> to note down some settings before I do that (bigger exercise than just
> >cmos
> >> jumper moves, which in itself involves opening the box....<yawn>). I am
> >able
> >> to read/write all settings except for uguru temp screen. My system
> >functions
> >> without any problems (none that I have noted), except it will not beep
> >or
> >> shutdown if CPU were to become hotter than my tcase of 65C, which is
> >> possible if my cpu fan dies. With the fan working, at full load it stays
> >> around 45-46C.
[skip]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 191 bytes
Desc: not available
Url : http://lists.lm-sensors.org/pipermail/lm-sensors/attachments/20060808/68334ab2/attachment.bin
^ permalink raw reply [flat|nested] 13+ messages in thread
* [lm-sensors] BIOS Corruption (was : new abituguru driver in mm
2006-08-06 5:22 [lm-sensors] BIOS Corruption (was : new abituguru driver in mm Sunil Kumar
` (4 preceding siblings ...)
2006-08-08 15:25 ` Sergey Vlasov
@ 2006-08-09 18:35 ` Hans de Goede
2006-08-09 20:48 ` Sunil Kumar
` (5 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Hans de Goede @ 2006-08-09 18:35 UTC (permalink / raw)
To: lm-sensors
Hi All,
I was on a long weekend / short vacation hence the slow response.
Jean Delvare wrote:
> Hi Sunil, Hans,
>
>> I found an issue with the driver. It seems to have put something in the
>> uguru part of BIOS, which makes my BIOS hang when I enter Abit uGuru
>> temperature monitor screen in the BIOS. No keys work and only way for me at
>> that point is to give it the three finger salute. The peculiar thing I
>> noticed was that somehow it modified the shutdown and beep CPU temperatures
>> to 245C and 235C while I had them set at 65C and 75C.
>>
>> As soon as the temp monitor displays 245C in CPU row, it just hangs. The
>> next values to be displayed are the enable bit for these temps, I think.
>>
>> I never used the driver to write anything to BIOS, so how did it end up
>> updating those values?
>
> Err, this is bad :(
>
> Hans, see why I was reluctant to having your driver in the kernel tree?
> Nothing personal, but that kind of problem is to be expected when
> writing a driver without a datasheet :(
>
This would / could have happened with a datasheet too, its a combination
of a buggy chip (the uguru) which is a pain to work with (it would be
even with a datasheet) and a bug in my driver.
> What should we do now? Mark the driver broken in the kernel tree?
> 2.6.18 is just around the corner, and we sure don't want people to
> corrupt their BIOS.
>
Things are not as bad as they look, the driver did not corrupt the BIOS,
nor the CMOS ram. For some strange reason (design probably) the Abit
uGuru motherboards BIOS saves the uguru settings to CMOS on successfull
(ACPI) poweroff. So all the driver did was write the uGuru register it
was designed to write, its not writing over random memory.
Also notice that the BIOS is also being buggy in this case, because it
hangs because of a single invalid byte in the uguru's settings, whereas
it should just be checking the relevant bits and ignoring the irrelevant
ones.
The problem has been very correctly analysed by Sergey Vlasov
<vsu at altlinux.ru>, there is indeed a bug in the driver where it fails to
restore the original settings when any read fails during the sensor type
detect code. Attached is a version which fixes this bug and which will
always restore the settings (if changed) no matter how that function is
left. To be really sure it tries the restore up to 3 times in case of a
writing error.
Sergey also correctly points out that this code is writing the uguru
todo the probing and that this is not ideal, unfortunatly, I cannot
think of another way.
I do not believe this bug will reoccur, not with sunil's new and
improved timeout handling, which make the chance of actually getting a
timeout error pretty small, combine this with the always restoring the
settings on an error, instead of the current buggy behaviour and
retrying this restore in case of a write failure (due to those same
timeouts, which are pretty rare now), the total chance of this happening
again is very small. And even if it happens a CMOS reset is enough, most
likely load setup defaults is enough, that fixed a similar issue for me
when I completly trashed the uguru during driver development (I was
probing random uguru bank addresses, which the uguru did not like).
To make really really sure we never get bitten by this I've added code
to the sensors detect code, to mask out any invalid bits from the
original settings byte before restoring it to the uguru.
Sunil,
Could you try the attached abituguru.c, after loading it once, it should
have fixed those invalid settings and you should be able to reenter the
relevant screen in your BIOS' setup just fine.
Here is a complete list of all the changes in this version compared to
the one in the kernel. I've also attached a unified diff between the
latest in kernel version and the attached one, so that it can get an
initial review. If this version fixes Sunil's problems as expected
without him needing to reset his BIOS, I'll then send an offical PATCH
with these changes for merging:
-Much improved timeout / wait for status handling. Many thanks to Sunil
Kumar, for all his testing, ideas and patches! The code now first busy
waits, polling the uguru for the expected status as this usual succeeds and
it succeeds pretty quickly (within 90 reads). To avoid unnescesarry CPU
burn
in timeout conditions, the amount of busy waiting has been halved from
previous versions (120 tries instead of 250). This is not a problem,
because
this version goes to sleep after 120 attemps for 1ms and then tries
again,
it does this sleep and try again 5 times before finally giving up. This
(almost?) completly removes the timeout errors some people have seen
regulary. Appearantly some older uguru versions sometimes are distracted
for a (relativly) long time. This solves this.
-These timeout errors not only occur in the sending address part of reading
the uguru but also in the wait for read state, so errors in this state are
now handled as retryable just like send address state errors and are only
logged and reported to userspace if 3 executive tries fail.
-Add rudimentary suspend / resume support, this protects the uguru and
the driver against suspend / resume cycles, so there is no reason to unload
the driver in your suspend / resume scripts.
-Fix a very nasty bug in the bank1 sensor type detection code, where it
would not restore the original settings in any of the error paths!
-Since not successfully restoring the original settings can seriously
confuse the system BIOS (hang when entering the relevant setup menu), we
now try restoring them 3 times before giving up.
-On successfull dectection of bank1 sensor type mask out any bits invalid
for this type from the sensors original settings before restoring them.
This should restore any damage done by the bug mentioned above, and should
also repair any damage done in the highly unlikely scenario of the original
sensor settings restoring failing.
Regards,
Hans
-------------- next part --------------
A non-text attachment was scrubbed...
Name: abituguru.c
Type: text/x-csrc
Size: 53013 bytes
Desc: not available
Url : http://lists.lm-sensors.org/pipermail/lm-sensors/attachments/20060809/0a222b95/attachment-0001.bin
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: diff
Url: http://lists.lm-sensors.org/pipermail/lm-sensors/attachments/20060809/0a222b95/attachment-0001.pl
^ permalink raw reply [flat|nested] 13+ messages in thread
* [lm-sensors] BIOS Corruption (was : new abituguru driver in mm
2006-08-06 5:22 [lm-sensors] BIOS Corruption (was : new abituguru driver in mm Sunil Kumar
` (5 preceding siblings ...)
2006-08-09 18:35 ` Hans de Goede
@ 2006-08-09 20:48 ` Sunil Kumar
2006-08-10 11:57 ` Hans de Goede
` (4 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Sunil Kumar @ 2006-08-09 20:48 UTC (permalink / raw)
To: lm-sensors
ok, loading the patched abituguru didn't help the BIOS hang, its more than
those bits and BIOS may be really buggy. I finally noted down all the custom
changes to the BIOS settings (actually quite a few, it turned out) and did a
'load optimised defaults'. That helped clear up the hung screen in BIOS.
I am back to my original settings and things look good right now.
One thing I noted is that earlier 'modprobe abituguru' used to take about
560ms and now it takes almost twice that. As I suggested earlier, can we
please use msleep(0) instead of msleep(1). msleep(0) is a 1 jiffy sleep (1
ms for HZ 1000) while msleep(1) is twice that. I don't mind making
TIMEOUT_SLEEP 10 if we have to (to get the same effect as your current
code), but its better to sleep as fine in each step as you can.
Thanks
-Sunil
On 8/9/06, Hans de Goede <j.w.r.degoede at hhs.nl> wrote:
>
> Hi All,
>
> I was on a long weekend / short vacation hence the slow response.
>
> Jean Delvare wrote:
> > Hi Sunil, Hans,
> >
> >> I found an issue with the driver. It seems to have put something in the
> >> uguru part of BIOS, which makes my BIOS hang when I enter Abit uGuru
> >> temperature monitor screen in the BIOS. No keys work and only way for
> me at
> >> that point is to give it the three finger salute. The peculiar thing I
> >> noticed was that somehow it modified the shutdown and beep CPU
> temperatures
> >> to 245C and 235C while I had them set at 65C and 75C.
> >>
> >> As soon as the temp monitor displays 245C in CPU row, it just hangs.
> The
> >> next values to be displayed are the enable bit for these temps, I
> think.
> >>
> >> I never used the driver to write anything to BIOS, so how did it end up
> >> updating those values?
> >
> > Err, this is bad :(
> >
> > Hans, see why I was reluctant to having your driver in the kernel tree?
> > Nothing personal, but that kind of problem is to be expected when
> > writing a driver without a datasheet :(
> >
> This would / could have happened with a datasheet too, its a combination
> of a buggy chip (the uguru) which is a pain to work with (it would be
> even with a datasheet) and a bug in my driver.
>
> > What should we do now? Mark the driver broken in the kernel tree?
> > 2.6.18 is just around the corner, and we sure don't want people to
> > corrupt their BIOS.
> >
>
> Things are not as bad as they look, the driver did not corrupt the BIOS,
> nor the CMOS ram. For some strange reason (design probably) the Abit
> uGuru motherboards BIOS saves the uguru settings to CMOS on successfull
> (ACPI) poweroff. So all the driver did was write the uGuru register it
> was designed to write, its not writing over random memory.
>
> Also notice that the BIOS is also being buggy in this case, because it
> hangs because of a single invalid byte in the uguru's settings, whereas
> it should just be checking the relevant bits and ignoring the irrelevant
> ones.
>
> The problem has been very correctly analysed by Sergey Vlasov
> <vsu at altlinux.ru>, there is indeed a bug in the driver where it fails to
> restore the original settings when any read fails during the sensor type
> detect code. Attached is a version which fixes this bug and which will
> always restore the settings (if changed) no matter how that function is
> left. To be really sure it tries the restore up to 3 times in case of a
> writing error.
>
> Sergey also correctly points out that this code is writing the uguru
> todo the probing and that this is not ideal, unfortunatly, I cannot
> think of another way.
>
> I do not believe this bug will reoccur, not with sunil's new and
> improved timeout handling, which make the chance of actually getting a
> timeout error pretty small, combine this with the always restoring the
> settings on an error, instead of the current buggy behaviour and
> retrying this restore in case of a write failure (due to those same
> timeouts, which are pretty rare now), the total chance of this happening
> again is very small. And even if it happens a CMOS reset is enough, most
> likely load setup defaults is enough, that fixed a similar issue for me
> when I completly trashed the uguru during driver development (I was
> probing random uguru bank addresses, which the uguru did not like).
>
> To make really really sure we never get bitten by this I've added code
> to the sensors detect code, to mask out any invalid bits from the
> original settings byte before restoring it to the uguru.
>
> Sunil,
>
> Could you try the attached abituguru.c, after loading it once, it should
> have fixed those invalid settings and you should be able to reenter the
> relevant screen in your BIOS' setup just fine.
>
>
> Here is a complete list of all the changes in this version compared to
> the one in the kernel. I've also attached a unified diff between the
> latest in kernel version and the attached one, so that it can get an
> initial review. If this version fixes Sunil's problems as expected
> without him needing to reset his BIOS, I'll then send an offical PATCH
> with these changes for merging:
>
> -Much improved timeout / wait for status handling. Many thanks to Sunil
> Kumar, for all his testing, ideas and patches! The code now first busy
> waits, polling the uguru for the expected status as this usual succeeds
> and
> it succeeds pretty quickly (within 90 reads). To avoid unnescesarry CPU
> burn
> in timeout conditions, the amount of busy waiting has been halved from
> previous versions (120 tries instead of 250). This is not a problem,
> because
> this version goes to sleep after 120 attemps for 1ms and then tries
> again,
> it does this sleep and try again 5 times before finally giving up. This
> (almost?) completly removes the timeout errors some people have seen
> regulary. Appearantly some older uguru versions sometimes are distracted
> for a (relativly) long time. This solves this.
> -These timeout errors not only occur in the sending address part of
> reading
> the uguru but also in the wait for read state, so errors in this state are
> now handled as retryable just like send address state errors and are only
> logged and reported to userspace if 3 executive tries fail.
> -Add rudimentary suspend / resume support, this protects the uguru and
> the driver against suspend / resume cycles, so there is no reason to
> unload
> the driver in your suspend / resume scripts.
> -Fix a very nasty bug in the bank1 sensor type detection code, where it
> would not restore the original settings in any of the error paths!
> -Since not successfully restoring the original settings can seriously
> confuse the system BIOS (hang when entering the relevant setup menu), we
> now try restoring them 3 times before giving up.
> -On successfull dectection of bank1 sensor type mask out any bits invalid
> for this type from the sensors original settings before restoring them.
> This should restore any damage done by the bug mentioned above, and should
> also repair any damage done in the highly unlikely scenario of the
> original
> sensor settings restoring failing.
>
> Regards,
>
> Hans
>
>
>
> --- abituguru-1.1.10.c 2006-06-27 14:00:02.000000000 +0200
> +++ abituguru.c 2006-08-09 20:08:24.000000000 +0200
> @@ -26,6 +26,7 @@
> #include <linux/jiffies.h>
> #include <linux/mutex.h>
> #include <linux/err.h>
> +#include <linux/delay.h>
> #include <linux/platform_device.h>
> #include <linux/hwmon.h>
> #include <linux/hwmon-sysfs.h>
> @@ -64,17 +65,17 @@
> #define ABIT_UGURU_IN_SENSOR 0
> #define ABIT_UGURU_TEMP_SENSOR 1
> #define ABIT_UGURU_NC 2
> -/* Timeouts / Retries, if these turn out to need a lot of fiddling we
> could
> - convert them to params. */
> -/* 250 was determined by trial and error, 200 works most of the time, but
> not
> - always. I assume this is cpu-speed independent, since the ISA-bus and
> not
> - the CPU should be the bottleneck. Note that 250 sometimes is still not
> - enough (only reported on AN7 mb) this is handled by a higher layer. */
> -#define ABIT_UGURU_WAIT_TIMEOUT 250
> +/* In many cases we need to wait for the uGuru to reach a certain status,
> most
> + of the time it will reach this status within 30 - 90 ISA reads, and
> thus we
> + can best busy wait. This define gives the total amount of reads to
> try. */
> +#define ABIT_UGURU_WAIT_TIMEOUT 125
> +/* However sometimes older versions of the uGuru seem to be distracted
> and they
> + do not respond for a long time. To handle this we sleep before each of
> the
> + last ABIT_UGURU_WAIT_TIMEOUT_SLEEP tries. */
> +#define ABIT_UGURU_WAIT_TIMEOUT_SLEEP 5
> /* Normally all expected status in abituguru_ready, are reported after the
> - first read, but sometimes not and we need to poll, 5 polls was not
> enough
> - 50 sofar is. */
> -#define ABIT_UGURU_READY_TIMEOUT 50
> + first read, but sometimes not and we need to poll. */
> +#define ABIT_UGURU_READY_TIMEOUT 5
> /* Maximum 3 retries on timedout reads/writes, delay 200 ms before
> retrying */
> #define ABIT_UGURU_MAX_RETRIES 3
> #define ABIT_UGURU_RETRY_DELAY (HZ/5)
> @@ -226,6 +227,10 @@
> timeout--;
> if (timeout = 0)
> return -EBUSY;
> + /* sleep a bit before our last few tries, see the comment
> on
> + this where ABIT_UGURU_WAIT_TIMEOUT_SLEEP is defined. */
> + if (timeout <= ABIT_UGURU_WAIT_TIMEOUT_SLEEP)
> + msleep(1);
> }
> return 0;
> }
> @@ -256,6 +261,7 @@
> "CMD reg does not hold 0xAC after ready
> command\n");
> return -EIO;
> }
> + msleep(1);
> }
>
> /* After this the ABIT_UGURU_DATA port should contain
> @@ -268,6 +274,7 @@
> "state != more input after ready
> command\n");
> return -EIO;
> }
> + msleep(1);
> }
>
> data->uguru_ready = 1;
> @@ -331,7 +338,8 @@
> /* And read the data */
> for (i = 0; i < count; i++) {
> if (abituguru_wait(data, ABIT_UGURU_STATUS_READ)) {
> - ABIT_UGURU_DEBUG(1, "timeout exceeded waiting for
> "
> + ABIT_UGURU_DEBUG(retries? 1:3,
> + "timeout exceeded waiting for "
> "read state (bank: %d, sensor: %d)\n",
> (int)bank_addr, (int)sensor_addr);
> break;
> @@ -350,7 +358,9 @@
> static int abituguru_write(struct abituguru_data *data,
> u8 bank_addr, u8 sensor_addr, u8 *buf, int count)
> {
> - int i;
> + /* We use the ready timeout as we have to wait for 0xAC just like
> the
> + ready function */
> + int i, timeout = ABIT_UGURU_READY_TIMEOUT;
>
> /* Send the address */
> i = abituguru_send_address(data, bank_addr, sensor_addr,
> @@ -370,7 +380,8 @@
> }
>
> /* Now we need to wait till the chip is ready to be read again,
> - don't ask why */
> + so that we can read 0xAC as confirmation that our write has
> + succeeded. */
> if (abituguru_wait(data, ABIT_UGURU_STATUS_READ)) {
> ABIT_UGURU_DEBUG(1, "timeout exceeded waiting for read
> state "
> "after write (bank: %d, sensor: %d)\n",
> (int)bank_addr,
> @@ -379,11 +390,15 @@
> }
>
> /* Cmd port MUST be read now and should contain 0xAC */
> - if (inb_p(data->addr + ABIT_UGURU_CMD) != 0xAC) {
> - ABIT_UGURU_DEBUG(1, "CMD reg does not hold 0xAC after
> write "
> - "(bank: %d, sensor: %d)\n", (int)bank_addr,
> - (int)sensor_addr);
> - return -EIO;
> + while (inb_p(data->addr + ABIT_UGURU_CMD) != 0xAC) {
> + timeout--;
> + if (timeout = 0) {
> + ABIT_UGURU_DEBUG(1, "CMD reg does not hold 0xAC
> after "
> + "write (bank: %d, sensor: %d)\n",
> + (int)bank_addr, (int)sensor_addr);
> + return -EIO;
> + }
> + msleep(1);
> }
>
> /* Last put the chip back in ready state */
> @@ -403,7 +418,7 @@
> u8 sensor_addr)
> {
> u8 val, buf[3];
> - int ret = ABIT_UGURU_NC;
> + int i, ret = -ENODEV; /* error is the most commen used retval :|
> */
>
> /* If overriden by the user return the user selected type */
> if (bank1_types[sensor_addr] >= ABIT_UGURU_IN_SENSOR &&
> @@ -439,7 +454,7 @@
> buf[2] = 250;
> if (abituguru_write(data, ABIT_UGURU_SENSOR_BANK1 + 2,
> sensor_addr,
> buf, 3) != 3)
> - return -ENODEV;
> + goto abituguru_detect_bank1_sensor_type_exit;
> /* Now we need 20 ms to give the uguru time to read the sensors
> and raise a voltage alarm */
> set_current_state(TASK_UNINTERRUPTIBLE);
> @@ -447,21 +462,29 @@
> /* Check for alarm and check the alarm is a volt low alarm. */
> if (abituguru_read(data, ABIT_UGURU_ALARM_BANK, 0, buf, 3,
> ABIT_UGURU_MAX_RETRIES) != 3)
> - return -ENODEV;
> + goto abituguru_detect_bank1_sensor_type_exit;
> if (buf[sensor_addr/8] & (0x01 << (sensor_addr % 8))) {
> if (abituguru_read(data, ABIT_UGURU_SENSOR_BANK1 + 1,
> sensor_addr, buf, 3,
> ABIT_UGURU_MAX_RETRIES) != 3)
> - return -ENODEV;
> + goto abituguru_detect_bank1_sensor_type_exit;
> if (buf[0] & ABIT_UGURU_VOLT_LOW_ALARM_FLAG) {
> - /* Restore original settings */
> - if (abituguru_write(data, ABIT_UGURU_SENSOR_BANK1
> + 2,
> - sensor_addr,
> - data->bank1_settings[sensor_addr],
> - 3) != 3)
> - return -ENODEV;
> ABIT_UGURU_DEBUG(2, " found volt sensor\n");
> - return ABIT_UGURU_IN_SENSOR;
> + /* Now that we are sure this is a volt sensor mask
> out
> + any invalid bits from the settings, this is
> done to
> + cleanup the settings in case they got messed
> up,
> + which could happen if we failed to restore the
> + original settings during a previous sensor type
> + detection. This has happened with older
> versions of
> + the driver due to broken error paths, and could
> + theorically still happen with this version */
> + data->bank1_settings[sensor_addr][0] &> + ABIT_UGURU_VOLT_LOW_ALARM_ENABLE |
> + ABIT_UGURU_VOLT_HIGH_ALARM_ENABLE |
> + ABIT_UGURU_BEEP_ENABLE |
> + ABIT_UGURU_SHUTDOWN_ENABLE;
> + ret = ABIT_UGURU_IN_SENSOR;
> + goto abituguru_detect_bank1_sensor_type_exit;
> } else
> ABIT_UGURU_DEBUG(2, " alarm raised during volt "
> "sensor test, but volt low flag not
> set\n");
> @@ -477,7 +500,7 @@
> buf[2] = 10;
> if (abituguru_write(data, ABIT_UGURU_SENSOR_BANK1 + 2,
> sensor_addr,
> buf, 3) != 3)
> - return -ENODEV;
> + goto abituguru_detect_bank1_sensor_type_exit;
> /* Now we need 50 ms to give the uguru time to read the sensors
> and raise a temp alarm */
> set_current_state(TASK_UNINTERRUPTIBLE);
> @@ -485,15 +508,22 @@
> /* Check for alarm and check the alarm is a temp high alarm. */
> if (abituguru_read(data, ABIT_UGURU_ALARM_BANK, 0, buf, 3,
> ABIT_UGURU_MAX_RETRIES) != 3)
> - return -ENODEV;
> + goto abituguru_detect_bank1_sensor_type_exit;
> if (buf[sensor_addr/8] & (0x01 << (sensor_addr % 8))) {
> if (abituguru_read(data, ABIT_UGURU_SENSOR_BANK1 + 1,
> sensor_addr, buf, 3,
> ABIT_UGURU_MAX_RETRIES) != 3)
> - return -ENODEV;
> + goto abituguru_detect_bank1_sensor_type_exit;
> if (buf[0] & ABIT_UGURU_TEMP_HIGH_ALARM_FLAG) {
> - ret = ABIT_UGURU_TEMP_SENSOR;
> ABIT_UGURU_DEBUG(2, " found temp sensor\n");
> + /* Now that we are sure this is a temp sensor mask
> out
> + any invalid bits from the settings */
> + data->bank1_settings[sensor_addr][0] &> + ABIT_UGURU_TEMP_HIGH_ALARM_ENABLE |
> + ABIT_UGURU_BEEP_ENABLE |
> + ABIT_UGURU_SHUTDOWN_ENABLE;
> + ret = ABIT_UGURU_TEMP_SENSOR;
> + goto abituguru_detect_bank1_sensor_type_exit;
> } else
> ABIT_UGURU_DEBUG(2, " alarm raised during temp "
> "sensor test, but temp high flag not
> set\n");
> @@ -501,11 +531,23 @@
> ABIT_UGURU_DEBUG(2, " alarm not raised during temp sensor
> "
> "test\n");
>
> - /* Restore original settings */
> - if (abituguru_write(data, ABIT_UGURU_SENSOR_BANK1 + 2,
> sensor_addr,
> - data->bank1_settings[sensor_addr], 3) != 3)
> + ret = ABIT_UGURU_NC;
> +abituguru_detect_bank1_sensor_type_exit:
> + /* Restore original settings, failing here is really BAD, it has
> been
> + reported that some BIOS-es hang when entering the uGuru menu
> with
> + invalid settings present in the uGuru, so we try this 3 times.
> */
> + for (i = 0; i < 3; i++)
> + if (abituguru_write(data, ABIT_UGURU_SENSOR_BANK1 + 2,
> + sensor_addr,
> data->bank1_settings[sensor_addr],
> + 3) = 3)
> + break;
> + if (i = 3) {
> + printk(KERN_ERR ABIT_UGURU_NAME
> + ": Fatal error could not restore original
> settings. "
> + "This should never happen please report this to
> the "
> + "abituguru maintainer (see MAINTAINERS)\n");
> return -ENODEV;
> -
> + }
> return ret;
> }
>
> @@ -1234,7 +1276,9 @@
> /* Fail safe check, this should never happen! */
> if (sysfs_names_free < 0) {
> printk(KERN_ERR ABIT_UGURU_NAME
> - ": Fatal error ran out of space for sysfs attr
> names. This should never happen please report to the abituguru maintainer
> (see MAINTAINERS)\n");
> + ": Fatal error ran out of space for sysfs attr
> names."
> + " This should never happen please report to the "
> + "abituguru maintainer (see MAINTAINERS)\n");
> res = -ENAMETOOLONG;
> goto abituguru_probe_error;
> }
> @@ -1303,7 +1347,7 @@
> data->update_timeouts = 0;
> LEAVE_UPDATE:
> /* handle timeout condition */
> - if (err = -EBUSY) {
> + if (!success && (err = -EBUSY || err >= 0)) {
> /* No overflow please */
> if (data->update_timeouts < 255u)
> data->update_timeouts++;
> @@ -1329,6 +1373,25 @@
> return NULL;
> }
>
> +static int abituguru_suspend(struct platform_device *pdev, pm_message_t
> state)
> +{
> + struct abituguru_data *data = platform_get_drvdata(pdev);
> + /* make sure all communications with the uguru are done and no new
> + ones are started */
> + mutex_lock(&data->update_lock);
> + return 0;
> +}
> +
> +static int abituguru_resume(struct platform_device *pdev)
> +{
> + struct abituguru_data *data = platform_get_drvdata(pdev);
> + /* See if the uGuru is still ready */
> + if (inb_p(data->addr + ABIT_UGURU_DATA) !> ABIT_UGURU_STATUS_INPUT)
> + data->uguru_ready = 0;
> + mutex_unlock(&data->update_lock);
> + return 0;
> +}
> +
> static struct platform_driver abituguru_driver = {
> .driver = {
> .owner = THIS_MODULE,
> @@ -1336,6 +1399,8 @@
> },
> .probe = abituguru_probe,
> .remove = __devexit_p(abituguru_remove),
> + .suspend = abituguru_suspend,
> + .resume = abituguru_resume
> };
>
> static int __init abituguru_detect(void)
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.lm-sensors.org/pipermail/lm-sensors/attachments/20060809/2815bbac/attachment-0001.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* [lm-sensors] BIOS Corruption (was : new abituguru driver in mm
2006-08-06 5:22 [lm-sensors] BIOS Corruption (was : new abituguru driver in mm Sunil Kumar
` (6 preceding siblings ...)
2006-08-09 20:48 ` Sunil Kumar
@ 2006-08-10 11:57 ` Hans de Goede
2006-08-10 15:48 ` Sunil Kumar
` (3 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Hans de Goede @ 2006-08-10 11:57 UTC (permalink / raw)
To: lm-sensors
Sunil Kumar wrote:
> ok, loading the patched abituguru didn't help the BIOS hang, its more than
> those bits and BIOS may be really buggy. I finally noted down all the
> custom
> changes to the BIOS settings (actually quite a few, it turned out) and
> did a
> 'load optimised defaults'. That helped clear up the hung screen in BIOS.
>
> I am back to my original settings and things look good right now.
>
Good, so you didn't have to open your PC and use the CMOS reset jumper,
I'm glad to hear that, that makes this bug much less severe IMHO.
> One thing I noted is that earlier 'modprobe abituguru' used to take about
> 560ms and now it takes almost twice that. As I suggested earlier, can we
> please use msleep(0) instead of msleep(1). msleep(0) is a 1 jiffy sleep (1
> ms for HZ 1000) while msleep(1) is twice that.
I must have missed your request for this change earlier, sleeping 1
jiffy is what I had in mind. So I've changed this. Here is my latest
version:
-Change msleep(1) to msleep(0) for shorter sleeps.
-Remove the masking out of invalid setting bits after successfull sensor
type detection, this didn't help the BIOS confusion, so its useless and
it could have unforseen unwanted side effects.
I'll send an official patch to the list soon to upgrade the in kernel
version to this too, so that others can benefit from your improved
timeout code, and more important so that the error paths in the sensor
type detect code are fixed to always restore the original settings!
Regards,
Hans
-------------- next part --------------
A non-text attachment was scrubbed...
Name: abituguru.c
Type: text/x-csrc
Size: 52142 bytes
Desc: not available
Url : http://lists.lm-sensors.org/pipermail/lm-sensors/attachments/20060810/218f2abf/attachment-0001.bin
^ permalink raw reply [flat|nested] 13+ messages in thread
* [lm-sensors] BIOS Corruption (was : new abituguru driver in mm
2006-08-06 5:22 [lm-sensors] BIOS Corruption (was : new abituguru driver in mm Sunil Kumar
` (7 preceding siblings ...)
2006-08-10 11:57 ` Hans de Goede
@ 2006-08-10 15:48 ` Sunil Kumar
2006-08-11 7:47 ` Jean Delvare
` (2 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Sunil Kumar @ 2006-08-10 15:48 UTC (permalink / raw)
To: lm-sensors
I am happy to report that after 24 hours uptime, I haven't seen any timeouts
(or corruption ...;-)). So, it looks like we are all good.
Thanks, Hans.
-Sunil
On 8/10/06, Hans de Goede <j.w.r.degoede at hhs.nl> wrote:
>
>
>
> Sunil Kumar wrote:
> > ok, loading the patched abituguru didn't help the BIOS hang, its more
> than
> > those bits and BIOS may be really buggy. I finally noted down all the
> > custom
> > changes to the BIOS settings (actually quite a few, it turned out) and
> > did a
> > 'load optimised defaults'. That helped clear up the hung screen in BIOS.
> >
> > I am back to my original settings and things look good right now.
> >
>
> Good, so you didn't have to open your PC and use the CMOS reset jumper,
> I'm glad to hear that, that makes this bug much less severe IMHO.
>
> > One thing I noted is that earlier 'modprobe abituguru' used to take
> about
> > 560ms and now it takes almost twice that. As I suggested earlier, can we
> > please use msleep(0) instead of msleep(1). msleep(0) is a 1 jiffy sleep
> (1
> > ms for HZ 1000) while msleep(1) is twice that.
>
> I must have missed your request for this change earlier, sleeping 1
> jiffy is what I had in mind. So I've changed this. Here is my latest
> version:
>
> -Change msleep(1) to msleep(0) for shorter sleeps.
> -Remove the masking out of invalid setting bits after successfull sensor
> type detection, this didn't help the BIOS confusion, so its useless and
> it could have unforseen unwanted side effects.
>
>
> I'll send an official patch to the list soon to upgrade the in kernel
> version to this too, so that others can benefit from your improved
> timeout code, and more important so that the error paths in the sensor
> type detect code are fixed to always restore the original settings!
>
> Regards,
>
> Hans
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.lm-sensors.org/pipermail/lm-sensors/attachments/20060810/f992f2b1/attachment-0001.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* [lm-sensors] BIOS Corruption (was : new abituguru driver in mm
2006-08-06 5:22 [lm-sensors] BIOS Corruption (was : new abituguru driver in mm Sunil Kumar
` (8 preceding siblings ...)
2006-08-10 15:48 ` Sunil Kumar
@ 2006-08-11 7:47 ` Jean Delvare
2006-08-11 7:50 ` Hans de Goede
2006-08-11 9:05 ` Jean Delvare
11 siblings, 0 replies; 13+ messages in thread
From: Jean Delvare @ 2006-08-11 7:47 UTC (permalink / raw)
To: lm-sensors
Hi Hans,
> > Hans, see why I was reluctant to having your driver in the kernel tree?
> > Nothing personal, but that kind of problem is to be expected when
> > writing a driver without a datasheet :(
>
> This would / could have happened with a datasheet too, its a combination
> of a buggy chip (the uguru) which is a pain to work with (it would be
> even with a datasheet) and a bug in my driver.
I believe this is less likely to happen when a chip has a datasheet.
Reasons for this are:
1* A public datasheet means more people can work on the driver, so bugs
are more likely to be caught early.
2* A public datasheet means the manufacturer is ashamed to make buggy
hardware or firmware and pays more attention (in the long run, of
course.)
It is my experience that the availability and quality of technical
documentation comes with well designed hardware.
> Things are not as bad as they look, the driver did not corrupt the BIOS,
> nor the CMOS ram. For some strange reason (design probably) the Abit
> uGuru motherboards BIOS saves the uguru settings to CMOS on successfull
> (ACPI) poweroff. So all the driver did was write the uGuru register it
> was designed to write, its not writing over random memory.
OK.
> Also notice that the BIOS is also being buggy in this case, because it
> hangs because of a single invalid byte in the uguru's settings, whereas
> it should just be checking the relevant bits and ignoring the irrelevant
> ones.
Yeah, this is quite common that BIOS assume lots of things where it
shouldn't. But users want their machines to just work when they use our
drivers, so we must keep the BIOS happy.
> Sergey also correctly points out that this code is writing the uguru
> todo the probing and that this is not ideal, unfortunatly, I cannot
> think of another way.
Same goes for all legacy ISA hardware monitoring chips, and to a lesser
extent for Super-I/O chips. We try to limit the writes to be as safe as
possible, and nobody ever reported a problem with that so far.
> I do not believe this bug will reoccur, not with sunil's new and
> improved timeout handling, which make the chance of actually getting a
> timeout error pretty small, combine this with the always restoring the
> settings on an error, instead of the current buggy behaviour and
> retrying this restore in case of a write failure (due to those same
> timeouts, which are pretty rare now), the total chance of this happening
> again is very small. And even if it happens a CMOS reset is enough, most
> likely load setup defaults is enough, that fixed a similar issue for me
> when I completly trashed the uguru during driver development (I was
> probing random uguru bank addresses, which the uguru did not like).
>
> To make really really sure we never get bitten by this I've added code
> to the sensors detect code, to mask out any invalid bits from the
> original settings byte before restoring it to the uguru.
Great, thanks.
(As a side note, our sensors-detect script still doesn't know about the
uGuru. Don't you want to add support for it? Or is it just too
dangerous because of the non-standard I/O addresses?)
--
Jean Delvare
^ permalink raw reply [flat|nested] 13+ messages in thread
* [lm-sensors] BIOS Corruption (was : new abituguru driver in mm
2006-08-06 5:22 [lm-sensors] BIOS Corruption (was : new abituguru driver in mm Sunil Kumar
` (9 preceding siblings ...)
2006-08-11 7:47 ` Jean Delvare
@ 2006-08-11 7:50 ` Hans de Goede
2006-08-11 9:05 ` Jean Delvare
11 siblings, 0 replies; 13+ messages in thread
From: Hans de Goede @ 2006-08-11 7:50 UTC (permalink / raw)
To: lm-sensors
Jean Delvare wrote:
>
> (As a side note, our sensors-detect script still doesn't know about the
> uGuru. Don't you want to add support for it? Or is it just too
> dangerous because of the non-standard I/O addresses?)
>
I don't know what else could be behind those addresses, so thats not the
reason I haven't worked on adding it to the script yets. The real
problem is that the current detection code in the driver could very well
miss-detect. The only good way of doing detection would be todo a proof
read. A better idea would be to use dmidecode and sue the info from
there, atleast for Abit motherboards this info seems sane. I could do
that if you want me to (as time permits).
Regards,
Hans
^ permalink raw reply [flat|nested] 13+ messages in thread
* [lm-sensors] BIOS Corruption (was : new abituguru driver in mm
2006-08-06 5:22 [lm-sensors] BIOS Corruption (was : new abituguru driver in mm Sunil Kumar
` (10 preceding siblings ...)
2006-08-11 7:50 ` Hans de Goede
@ 2006-08-11 9:05 ` Jean Delvare
11 siblings, 0 replies; 13+ messages in thread
From: Jean Delvare @ 2006-08-11 9:05 UTC (permalink / raw)
To: lm-sensors
Hi Hans,
> Jean Delvare wrote:
> >
> > (As a side note, our sensors-detect script still doesn't know about the
> > uGuru. Don't you want to add support for it? Or is it just too
> > dangerous because of the non-standard I/O addresses?)
>
> I don't know what else could be behind those addresses, so thats not the
> reason I haven't worked on adding it to the script yets. The real
> problem is that the current detection code in the driver could very well
> miss-detect. The only good way of doing detection would be todo a proof
> read. A better idea would be to use dmidecode and sue the info from
> there, atleast for Abit motherboards this info seems sane. I could do
> that if you want me to (as time permits).
I don't really care myself, but that would probably bring you some more
testers. Indeed checking that the board is an Abit board would make
some sense. Not only in sensors-detect, but also in the driver itself.
--
Jean Delvare
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2006-08-11 9:05 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-08-06 5:22 [lm-sensors] BIOS Corruption (was : new abituguru driver in mm Sunil Kumar
2006-08-07 10:19 ` Jean Delvare
2006-08-07 16:10 ` Sunil Kumar
2006-08-07 19:34 ` Sergey Vlasov
2006-08-07 20:53 ` Sunil Kumar
2006-08-08 15:25 ` Sergey Vlasov
2006-08-09 18:35 ` Hans de Goede
2006-08-09 20:48 ` Sunil Kumar
2006-08-10 11:57 ` Hans de Goede
2006-08-10 15:48 ` Sunil Kumar
2006-08-11 7:47 ` Jean Delvare
2006-08-11 7:50 ` Hans de Goede
2006-08-11 9:05 ` Jean Delvare
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.