* [lm-sensors] [Ticket #2078] sensor values are sometimes wrong
@ 2005-10-21 10:37 Christian Hammers
2005-10-21 11:39 ` Grant Coady
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Christian Hammers @ 2005-10-21 10:37 UTC (permalink / raw)
To: lm-sensors
Hello
It seems to be related to the above mentioned bug number but the FAQ was
a bit unclear how to include the ticket number in the subject :)
My problem is that sensors sometimes reports confusing values, i.e.
- it prints ALERT although the corresponding values are fine; the ALERT
then vanishes on the next read
- sensor values suddenly read 0 or arbitrary values
- configured limits are 0 for one or two reads and then go back to the
defined value
- chassis intrusion is set for just a couple of seconds
This makes it absolutely unusable for monitoring in production use.
I have had this before on several other mainboards, too. It can be
circumvented by reading the sensors output thrice and only alerting if you
get an alert in all three readings, but this should not be the task of
the user :) On this mainboard it is worse than ever, though.
bye,
-christian-
Data for this mainboard although the problem is more general and
experienced on several other models and brands, too:
# lsmod
Module Size Used by
i2c_dev 10880 0
eeprom 7944 0
asb100 23872 0
i2c_sensor 3104 2 eeprom,asb100
i2c_i801 8400 0
i2c_core 24576 5 i2c_dev,eeprom,asb100,i2c_sensor,i2c_i801
...
# lspci -n
0000:00:00.0 0600: 8086:2560 (rev 02)
0000:00:01.0 0604: 8086:2561 (rev 02)
0000:00:1d.0 0c03: 8086:24c2 (rev 02)
0000:00:1d.1 0c03: 8086:24c4 (rev 02)
0000:00:1d.2 0c03: 8086:24c7 (rev 02)
0000:00:1d.7 0c03: 8086:24cd (rev 02)
0000:00:1e.0 0604: 8086:244e (rev 82)
0000:00:1f.0 0601: 8086:24c0 (rev 02)
0000:00:1f.1 0101: 8086:24cb (rev 02)
0000:00:1f.3 0c05: 8086:24c3 (rev 02)
0000:00:1f.5 0401: 8086:24c5 (rev 02)
0000:01:00.0 0300: 10de:002c (rev 15)
0000:02:03.0 0c00: 1106:3044 (rev 80)
0000:02:04.0 0104: 105a:3376 (rev 02)
0000:02:05.0 0200: 14e4:16a6 (rev 02)
0000:02:0a.0 0200: 8086:1076
0000:02:0b.0 0200: 10b7:9055 (rev 30)
0000:02:0c.0 0100: 1000:000f (rev 03)
# lspci
0000:00:00.0 Host bridge: Intel Corp. 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface (rev 02)
0000:00:01.0 PCI bridge: Intel Corp. 82845G/GL[Brookdale-G]/GE/PE Host-to-AGP Bridge (rev 02)
0000:00:1d.0 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 02)
0000:00:1d.1 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 02)
0000:00:1d.2 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 02)
0000:00:1d.7 USB Controller: Intel Corp. 82801DB/DBM (ICH4/ICH4-M) USB 2.0 EHCI Controller (rev 02)
0000:00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev 82)
0000:00:1f.0 ISA bridge: Intel Corp. 82801DB/DBL (ICH4/ICH4-L) LPC Bridge (rev 02)
0000:00:1f.1 IDE interface: Intel Corp. 82801DB/DBL (ICH4/ICH4-L) UltraATA-100 IDE Controller (rev 02)
0000:00:1f.3 SMBus: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 02)
0000:00:1f.5 Multimedia audio controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 02)
0000:01:00.0 VGA compatible controller: nVidia Corporation NV6 [Vanta/Vanta LT] (rev 15)
0000:02:03.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 80)
0000:02:04.0 RAID bus controller: Promise Technology, Inc. PDC20376 (FastTrak 376) (rev 02)
0000:02:05.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5702X Gigabit Ethernet (rev 02)
0000:02:0a.0 Ethernet controller: Intel Corp. 82541GI/PI Gigabit Ethernet Controller
0000:02:0b.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30)
0000:02:0c.0 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 03)
Installed I2C busses:
i2c-0 unknown SMBus I801 adapter at e800
Algorithm unavailable
root@netflow:/home/ch# i2cdetect 0
WARNING! This program can confuse your I2C bus, cause data loss and
worse!
I will probe file /dev/i2c-0.
I will probe address range 0x03-0x77.
Continue? [Y/n]
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          XX XX XX XX XX 08 XX XX XX XX XX XX XX
10: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX
20: XX XX XX XX XX XX XX XX XX XX XX XX XX UU XX XX
30: 30 XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX
40: XX XX XX XX 44 XX XX XX UU UU XX XX XX XX XX XX
50: UU XX UU XX XX XX XX XX XX XX XX XX XX XX XX XX
60: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX
70: XX XX XX XX XX XX XX XX
# dmidecode
# dmidecode 2.6
SMBIOS 2.3 present.
BIOS Information
Vendor: Award Software, Inc.
Version: ASUS P4PE ACPI BIOS Revision 1002
Release Date: 10/28/2002
Base Board Information
Manufacturer: ASUSTeK Computer INC.
Product Name: P4PE
Version: REV 1.xx
# sensors -v
sensors version 2.9.1 with libsensors version 2.9.1
# uname -r
2.6.8-2-686-smp
--
Christian Hammers WESTEND GmbH | Internet-Business-Provider
Technik CISCO Systems Partner - Authorized Reseller
Lütticher Straße 10 Tel 0241/701333-11
ch@westend.com D-52064 Aachen Fax 0241/911879
* [lm-sensors] [Ticket #2078] sensor values are sometimes wrong
2005-10-21 10:37 [lm-sensors] [Ticket #2078] sensor values are sometimes wrong Christian Hammers
@ 2005-10-21 11:39 ` Grant Coady
2005-10-21 12:07 ` Christian Hammers
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Grant Coady @ 2005-10-21 11:39 UTC (permalink / raw)
To: lm-sensors
Christian Hammers wrote:
> Hello
>
> It seems to be related to the above mentioned bug number but the FAQ was
> a bit unclear how to include the ticket number in the subject :)
>
> My problem is that sensors sometimes reports confusing values, i.e.
> - it prints ALERT although the corresponding values are fine; the ALERT
> then vanishes on the next read

Yup, that is what the sensor chip reports. It is up to _you_ to properly
process the signals to suit your requirements. ALERT is typically
cleared by the act of reading a value.

> - sensor values suddenly read 0 or arbitrary values

The sensor chips operate in an extremely noisy electrical environment;
stuff happens, and your application software may debounce this.

> - configured limits are 0 for one or two reads and then go back to the
> defined value

Again, nature of the beast: your expectations do not match reality :o)

> - chassis intrusion is set for just a couple of seconds

So debounce it?

>
> This makes it absolutely unusable for monitoring in production use.

No, you expect a kernel driver to do the work of a custom user-space
solution, IOW you want the kernel to enforce policy. The kernel +
drivers do not enforce policy, they control access to resources.

> I have had this before on several other mainboards, too. It can be
> circumvented by reading the sensors output thrice and only alerting if you
> get an alert in all three readings, but this should not be the task of
> the user :) On this mainboard it is worse than ever, though.

Yes, because you need to better understand the issues. When everything
looks wrong, it is time to check the viewer. No?

> Data for this mainboard although the problem is more general and
> experienced on several other models and brands, too:

I don't do free consulting.

Grant.
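The user-space filtering discussed in this message (act on an alarm only once it has persisted across several consecutive reads) could look roughly like the sketch below. The `struct debounce` type and `debounce_update()` are hypothetical names made up for illustration; they are not part of lm-sensors, sensors or sensord, and the threshold of three reads is simply the workaround quoted above.

```c
#include <stdbool.h>

/* Hypothetical debounce state; not an lm-sensors data structure. */
struct debounce {
    unsigned required;  /* consecutive alarmed reads needed before reporting */
    unsigned streak;    /* alarmed reads seen in a row so far */
};

/*
 * Feed one raw alarm reading (true = the chip reported ALERT on this read).
 * Returns true only once the alarm has been seen in `required` consecutive
 * reads. A single clear read resets the streak.
 */
static bool debounce_update(struct debounce *d, bool raw_alarm)
{
    if (!raw_alarm) {
        d->streak = 0;
        return false;
    }
    if (d->streak < d->required)
        d->streak++;
    return d->streak >= d->required;
}
```

A monitoring script would keep one such state per alarm bit, initialise it with `required = 3`, and call `debounce_update()` once per polling interval. A one-read glitch then never reaches the operator, at the cost of reporting a real alarm two polling intervals later.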
* [lm-sensors] [Ticket #2078] sensor values are sometimes wrong
2005-10-21 10:37 [lm-sensors] [Ticket #2078] sensor values are sometimes wrong Christian Hammers
2005-10-21 11:39 ` Grant Coady
@ 2005-10-21 12:07 ` Christian Hammers
2005-10-23 22:40 ` Rudolf Marek
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Christian Hammers @ 2005-10-21 12:07 UTC (permalink / raw)
To: lm-sensors
Hello

On Fri, Oct 21, 2005 at 07:37:12PM +1000, Grant Coady wrote:
> > - sensor values suddenly read 0 or arbitrary values
>
> The sensor chips operate in an extremely noisy electrical environment;
> stuff happens, and your application software may debounce this.
...
> > This makes it absolutely unusable for monitoring in production use.
>
> No, you expect a kernel driver to do the work of a custom user-space
> solution, IOW you want the kernel to enforce policy. The kernel +
> drivers do not enforce policy, they control access to resources.

A user normally does not access the kernel driver, not even via /proc.
He uses /usr/bin/sensors or /usr/sbin/sensord, which look like they
should be considered applications. So if the current problems arise from
/usr/bin/sensors being too low-level, it should be enhanced with a flag
that applies a certain policy, e.g. hiding temperature alerts that exist
for only one second, as the CPU surely does not heat up to 999°C and
cool back down to 42°C within that time. This would make it far more
usable.

> > Data for this mainboard although the problem is more general and
> > experienced on several other models and brands, too:
>
> I don't do free consulting.

Writing that sounds stupid and, moreover, makes users feel offended,
especially as they only supply exactly the information that the
project's FAQ asked them to include in each bug report.

-> *plonk*

bye,
-christian-
--
Christian Hammers WESTEND GmbH | Internet-Business-Provider
Technik CISCO Systems Partner - Authorized Reseller
Lütticher Straße 10 Tel 0241/701333-11
ch@westend.com D-52064 Aachen Fax 0241/911879
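The kind of policy proposed in this message (do not believe a temperature that jumps to 999°C and is back at 42°C one read later) can be sketched as a small plausibility filter. Everything here is an assumption for illustration: `struct temp_filter`, `temp_filter_update()` and the millidegree units are invented names and choices, and no such flag exists in /usr/bin/sensors as far as this thread shows. The sketch holds back a reading that differs from the last accepted one by more than a configured step, and accepts the jump only if the following reading confirms it.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical filter state; values are in millidegrees Celsius. */
struct temp_filter {
    bool have_last;     /* at least one reading has been accepted */
    bool have_pending;  /* an implausible reading is waiting for confirmation */
    long last_mC;       /* last accepted reading */
    long pending_mC;    /* held-back reading */
    long max_step_mC;   /* largest believable change between two reads */
};

/* Feed one raw reading; returns the value that should be reported. */
static long temp_filter_update(struct temp_filter *f, long raw_mC)
{
    bool plausible = !f->have_last ||
                     labs(raw_mC - f->last_mC) <= f->max_step_mC;
    bool confirmed = f->have_pending &&
                     labs(raw_mC - f->pending_mC) <= f->max_step_mC;

    if (plausible || confirmed) {
        /* Normal reading, or a jump seen twice in a row: accept it. */
        f->last_mC = raw_mC;
        f->have_last = true;
        f->have_pending = false;
        return raw_mC;
    }

    /* Implausible jump: remember it, keep reporting the old value. */
    f->pending_mC = raw_mC;
    f->have_pending = true;
    return f->last_mC;
}
```

With `max_step_mC` set to 10000, the sequence 42000, 999000, 42000 is reported as 42000 three times, while a jump that persists for two reads is passed through on the second read so that a genuine fault is only delayed, not hidden.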
* [lm-sensors] [Ticket #2078] sensor values are sometimes wrong
2005-10-21 10:37 [lm-sensors] [Ticket #2078] sensor values are sometimes wrong Christian Hammers
2005-10-21 11:39 ` Grant Coady
2005-10-21 12:07 ` Christian Hammers
@ 2005-10-23 22:40 ` Rudolf Marek
2005-10-24 10:11 ` Christian Hammers
2005-10-25 22:49 ` Rudolf Marek
4 siblings, 0 replies; 6+ messages in thread
From: Rudolf Marek @ 2005-10-23 22:40 UTC (permalink / raw)
To: lm-sensors
Hello Christian,

> My problem is that sensors sometimes reports confusing values, i.e.
> - it prints ALERT although the corresponding values are fine; the ALERT
> then vanishes on the next read
> - sensor values suddenly read 0 or arbitrary values
> - configured limits are 0 for one or two reads and then go back to the
> defined value

This sounds like there might be something else accessing the chip. Did
you check the log for any i2c bus related problems?

We may also implement some checks in the asb100 read routine to be sure
that the reads actually succeed instead of failing silently.

Regards
Rudolf
* [lm-sensors] [Ticket #2078] sensor values are sometimes wrong
2005-10-21 10:37 [lm-sensors] [Ticket #2078] sensor values are sometimes wrong Christian Hammers
` (2 preceding siblings ...)
2005-10-23 22:40 ` Rudolf Marek
@ 2005-10-24 10:11 ` Christian Hammers
2005-10-25 22:49 ` Rudolf Marek
4 siblings, 0 replies; 6+ messages in thread
From: Christian Hammers @ 2005-10-24 10:11 UTC (permalink / raw)
To: lm-sensors
Hello

On Sun, Oct 23, 2005 at 10:40:38PM +0200, Rudolf Marek wrote:
> > My problem is that sensors sometimes reports confusing values, i.e.
> > - it prints ALERT although the corresponding values are fine; the ALERT
> > then vanishes on the next read
> > - sensor values suddenly read 0 or arbitrary values
> > - configured limits are 0 for one or two reads and then go back to the
> > defined value
>
> This sounds like there might be something else accessing the chip. Did
> you check the log for any i2c bus related problems?

There are no related syslog messages and I'm not aware of any other
application using the i2c bus. There's no TV card in this server; are
there other likely candidates?

> We may also implement some checks in the asb100 read routine to be sure
> that the reads actually succeed instead of failing silently.

bye,
-christian-
--
Christian Hammers WESTEND GmbH | Internet-Business-Provider
Technik CISCO Systems Partner - Authorized Reseller
Lütticher Straße 10 Tel 0241/701333-11
ch@westend.com D-52064 Aachen Fax 0241/911879
* [lm-sensors] [Ticket #2078] sensor values are sometimes wrong
2005-10-21 10:37 [lm-sensors] [Ticket #2078] sensor values are sometimes wrong Christian Hammers
` (3 preceding siblings ...)
2005-10-24 10:11 ` Christian Hammers
@ 2005-10-25 22:49 ` Rudolf Marek
4 siblings, 0 replies; 6+ messages in thread
From: Rudolf Marek @ 2005-10-25 22:49 UTC (permalink / raw)
To: lm-sensors
> There are no related syslog messages and I'm not aware of any other
> application using the i2c bus. There's no TV card in this server; are
> there other likely candidates?

OK. This ASB100 chip was reverse engineered, so it is very hard to
support such a chip. We can try one last thing. Please apply the
attached patch to asb100.c. It should retry the read when it fails. I
hope you know C; please fix any typos in the code. If not, just post the
errors back to the list. I'm very busy nowadays, so I'm sorry I can't
test it...

Please let it run with the patch for some time, ideally until you see
strange values. Check the system log for details. It might happen that
we won't see any problems because this patch solved them :) In that case
please change dev_dbg to dev_err and we will see.

I hope this helps,

regards
Rudolf
-------------- next part --------------
diff -Naur a/asb100.c b/asb100.c
--- a/asb100.c	2005-10-20 08:23:05.000000000 +0200
+++ b/asb100.c	2005-10-25 22:44:53.352492250 +0200
@@ -880,6 +880,54 @@
 	return 0;
 }
 
+#define MAX_RETRIES 30
+static int do_read_8(struct i2c_client *client, u8 reg, u8 defval)
+{
+	int value, i;
+
+	/* Frequent read errors have been reported on Asus boards, so we
+	 * retry on read errors. If it still fails (unlikely), return the
+	 * default value requested by the caller. */
+	for (i = 1; i <= MAX_RETRIES; i++) {
+		value = i2c_smbus_read_byte_data(client, reg);
+		if (value >= 0) {
+			dev_dbg(&client->dev, "Read 0x%02x from register "
+				"0x%02x.\n", value, reg);
+			return value;
+		}
+		dev_dbg(&client->dev, "Read failed, will retry in %d.\n", i);
+		msleep(i);
+	}
+
+	dev_err(&client->dev, "Couldn't read value from register 0x%02x. "
+		"Please report.\n", reg);
+	return defval;
+}
+
+static int do_read_16(struct i2c_client *client, u16 reg, u8 defval)
+{
+	int value, i;
+
+	/* Frequent read errors have been reported on Asus boards, so we
+	 * retry on read errors. If it still fails (unlikely), return the
+	 * default value requested by the caller. */
+	for (i = 1; i <= MAX_RETRIES; i++) {
+		value = i2c_smbus_read_word_data(client, reg);
+		if (value >= 0) {
+			dev_dbg(&client->dev, "Read 0x%02x from register "
+				"0x%02x.\n", value, reg);
+			return value;
+		}
+		dev_dbg(&client->dev, "Read failed, will retry in %d.\n", i);
+		msleep(i);
+	}
+
+	dev_err(&client->dev, "Couldn't read value from register 0x%02x. "
+		"Please report.\n", reg);
+	return defval;
+}
+
+
 /* The SMBus locks itself, usually, but nothing may access the chip between
    bank switches. */
 static int asb100_read_value(struct i2c_client *client, u16 reg)
@@ -896,7 +944,7 @@
 		i2c_smbus_write_byte_data(client, ASB100_REG_BANK, bank);
 
 	if (bank == 0 || bank > 2) {
-		res = i2c_smbus_read_byte_data(client, reg & 0xff);
+		res = do_read_8(client, reg & 0xff,0x0);
 	} else {
 		/* switch to subclient */
 		cl = data->lm75[bank - 1];
@@ -904,17 +952,17 @@
 		/* convert from ISA to LM75 I2C addresses */
 		switch (reg & 0xff) {
 		case 0x50: /* TEMP */
-			res = swab16(i2c_smbus_read_word_data (cl, 0));
+			res = swab16(do_read_16(cl, 0,0x0));
 			break;
 		case 0x52: /* CONFIG */
-			res = i2c_smbus_read_byte_data(cl, 1);
+			res = do_read_8(cl, 1,0x0);
 			break;
 		case 0x53: /* HYST */
-			res = swab16(i2c_smbus_read_word_data (cl, 2));
+			res = swab16(do_read_16(cl, 2,0x0));
 			break;
 		case 0x55: /* MAX */
 		default:
-			res = swab16(i2c_smbus_read_word_data (cl, 3));
+			res = swab16(do_read_16 (cl, 3,0x0));
 			break;
 		}
 	}
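The retry logic in the patch can be tried out in user space without the kernel or the chip. The sketch below is only an illustration of the same pattern (attempt i sleeps i milliseconds before the next try, a default value is returned when all attempts fail); `read_with_retry()`, `worst_case_delay_ms()`, `struct flaky_source` and `flaky_read()` are hypothetical names, and the read callback stands in for `i2c_smbus_read_byte_data()`, which reports failure with a negative return value.

```c
#include <stddef.h>

/* Stand-ins for the bus read and for msleep(); both are assumptions. */
typedef int (*read_fn)(void *ctx);      /* >= 0 on success, < 0 on error */
typedef void (*sleep_fn)(unsigned ms);  /* may be NULL to skip sleeping */

/* Try up to max_retries reads, waiting i ms after the i-th failure. */
static int read_with_retry(read_fn rd, void *ctx, sleep_fn slp,
                           int max_retries, int defval)
{
    for (int i = 1; i <= max_retries; i++) {
        int value = rd(ctx);
        if (value >= 0)
            return value;
        if (slp)
            slp((unsigned)i);
    }
    return defval;  /* every attempt failed */
}

/* Total delay requested if every attempt fails: 1 + 2 + ... + n ms. */
static unsigned worst_case_delay_ms(int max_retries)
{
    return (unsigned)(max_retries * (max_retries + 1) / 2);
}

/* Fake data source that fails a fixed number of times, then succeeds. */
struct flaky_source {
    int failures_left;
    int value;
};

static int flaky_read(void *ctx)
{
    struct flaky_source *s = ctx;
    if (s->failures_left > 0) {
        s->failures_left--;
        return -1;
    }
    return s->value;
}
```

With 30 retries the requested sleeps add up to 1 + 2 + ... + 30 = 465 ms for a read that never succeeds. Note also that a default of 0, as passed at every call site in the patch, cannot be told apart from a genuine reading of 0 by the caller; whether that matters depends on how the result is used.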