* [lm-sensors] latest findings - my older posting from ~ 11 days
2006-05-02 19:28 [lm-sensors] latest findings - my older posting from ~ 11 days ago Dieter Jurzitza
@ 2006-05-03 21:37 ` Rudolf Marek
2006-05-05 19:54 ` Rudolf Marek
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Rudolf Marek @ 2006-05-03 21:37 UTC (permalink / raw)
To: lm-sensors
Hello Dieter
> after having configured the debugging nothing bad had happened - but I readily
> knew that could happen so I was patient.
> Today returning from home the problem had had reappeared:
>
> Please see file attached. You will find a "SMBus collision", and, after that,
> "sensors" is dead.
OK got it.
To me it looks like one of the following:
1) a silicon bug in the AMD chipset state machine
2) a design error on the motherboard, or some malfunctioning chip hooked up to the SMBus
Let's see if there is a register somewhere that holds the state of the state machine ;)
so we know whether it is #1. Jordan just contacted a chip designer, so we will have
more information later.
Now my questions:
Does it start to work after a reboot? After a cold boot? After unplugging the power?
Do you have ACPI in the kernel? Is the thermal module compiled in or loaded?
If it happens again in the future, you can try the following:
1) Obtain the SMBus base address:
AMD756_smba = xxx
It should be in your debug log; you can use the value from an older log because it
does not change. Alternatively, you can look at the output of cat /proc/ioports.
2) Then use the isadump tool:
isadump -f xxx
(please replace xxx with the base address from the log). This will allow us
to check the status of the SMBDATA and SMBCLK lines and see if they are stuck
low. Later we can try to exercise the DATA and CLK lines to see if they
are working properly.
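The base-address lookup in step 1 could be scripted roughly as follows. This is only a sketch: ioport_base is a made-up helper name, and the pattern to match (amd756 here) is an assumption about how the driver labels its I/O region in /proc/ioports, so check the actual file first. On recent kernels the file may need to be read as root to show real addresses.

```shell
#!/bin/sh
# ioport_base PATTERN
# Reads /proc/ioports-style text on stdin and prints the start address
# (with a 0x prefix) of the first region whose line matches PATTERN.
# Hypothetical helper; lines are expected to look like "80e0-80ef : name".
ioport_base() {
    awk -v pat="$1" '$0 ~ pat { split($1, range, "-"); print "0x" range[1]; exit }'
}

# Usage (not run here; the pattern "amd756" is an assumption):
#   base=$(ioport_base amd756 < /proc/ioports)
#   isadump -f "$base"
```

Taking the pattern as a parameter keeps the helper usable when the region carries a different label on a given kernel.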
You may also try to provoke the failure by running this:
modprobe i2c-dev
modprobe i2c-amd756
while true; do i2cdump -y 0 0x2d b > /dev/null; done
This will read from the bus again and again, and I expect that it will fail sooner
or later. You don't need to enable debugging for this; I guess it won't
reveal anything new.
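To put a number on "sooner or later", the loop above can be wrapped in a counter. A minimal sketch: run_until_failure is a hypothetical helper that repeats a command until it returns a non-zero exit status (or a cap is reached) and prints how many runs succeeded. Whether i2cdump signals a failed read through its exit status is an unverified assumption here; if it does not, the command would need a small wrapper that inspects the output instead.

```shell
#!/bin/sh
# run_until_failure MAX CMD [ARGS...]
# Runs CMD repeatedly, discarding its output, until it exits non-zero or
# MAX successful runs have been completed. Prints the number of successes.
run_until_failure() {
    max=$1
    shift
    count=0
    while [ "$count" -lt "$max" ] && "$@" >/dev/null 2>&1; do
        count=$((count + 1))
    done
    echo "$count"
}

# Usage (not run here; needs root and the i2c-dev module loaded):
#   run_until_failure 200000 i2cdump -y 0 0x2d b
```

The cap is there so that a healthy bus does not keep the loop running forever.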
Regards
Rudolf
* [lm-sensors] latest findings - my older posting from ~ 11 days
From: Rudolf Marek @ 2006-05-05 19:54 UTC (permalink / raw)
To: lm-sensors
Hello Dieter,
Please don't forget to CC the list.
> No! The reset button does *not* cure the problem. This is different from
> "normal" behaviour"
Ok
>
>>Have you ACPI in kernel?
>
> Yes. Any info required on that?
Please mail dsdt.bin
cat /proc/acpi/dsdt > /tmp/dsdt.bin
>>Have you compiled/loaded/in kernel the thermal module?
And is there no support for the "thermal" module? (thermal is the module name, or the
ACPI subsystem option in the kernel.)
> It will :-(
> This is my homework, I'll be back with you as soon as I have results to tell.
>
>>1) obtain the smb base addr
>> AMD756_smba = xxx
>>Should be in your debug log, you may use value from older log because it
>>does not change. Alternatively you may also look into cat /proc/ioports
>>
>>then you may use the isadump tool
>>2)
>>isadump -f xxx
Please don't forget the 0x prefix before the address.
>>while true; do i2cdump -y 0 0x2d b > /dev/null; done
You could insert a sleep 1 here. But let's try without it first.
> Thanks for helping that far. I will unload the debug kernel now as it spits
> about 400 MByte of i2c-debug logs into /var/log/messages.
Good.
regards
Rudolf
* [lm-sensors] latest findings - my older posting from ~ 11 days
From: Dieter Jurzitza @ 2006-05-05 21:06 UTC (permalink / raw)
To: lm-sensors
Hello Rudolf,
> Please dont forget to CC the list.
It will not happen again. I will simply send to the list; you are subscribed
anyway, so you don't need it twice.
> Please mail dsdt.bin
Attached to this email
> And no support for "thermal" module (thermal is the module name or ACPI
> subsytem option in kernel)
There is a /proc/acpi/thermal_zone directory, but it is empty.
/home/fred> ls -l /proc/acpi/
total 0
dr-xr-xr-x 10 root root 0 2006-05-05 22:47 ./
dr-xr-xr-x 166 root root 0 2006-05-05 22:46 ../
dr-xr-xr-x 2 root root 0 2006-05-05 23:04 ac_adapter/
-rw-r--r-- 1 root root 0 2006-05-05 23:04 alarm
dr-xr-xr-x 2 root root 0 2006-05-05 23:04 battery/
dr-xr-xr-x 4 root root 0 2006-05-05 23:04 button/
-r-------- 1 root root 0 2006-05-05 23:04 dsdt
dr-xr-xr-x 2 root root 0 2006-05-05 23:04 embedded_controller/
-r-------- 1 root root 0 2006-05-05 22:47 event
-r-------- 1 root root 0 2006-05-05 23:04 fadt
dr-xr-xr-x 2 root root 0 2006-05-05 23:04 fan/
-r--r--r-- 1 root root 0 2006-05-05 23:04 info
dr-xr-xr-x 2 root root 0 2006-05-05 23:04 power_resource/
dr-xr-xr-x 4 root root 0 2006-05-05 23:04 processor/
-rw-r--r-- 1 root root 0 2006-05-05 23:04 sleep
dr-xr-xr-x 2 root root 0 2006-05-05 23:04 thermal_zone/
-rw-r--r-- 1 root root 0 2006-05-05 23:04 wakeup << empty!!!
That's it for the moment! I wonder what findings you'll bring up!
Thanks again,
take care
Dieter
--
-----------------------------------------------------------
|
\
/\_/\ |
| ~x~ |/-----\ /
\ /- \_/
^^__ _ / _ ____ /
<??__ \- \_/ | |/ | |
|| || _| _| _| _|
if you really want to see the pictures above - use some font
with constant spacing like courier! :-)
-----------------------------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dsdt.bin.zip
Type: application/x-zip
Size: 3542 bytes
Desc: not available
Url : http://lists.lm-sensors.org/pipermail/lm-sensors/attachments/20060505/9c997768/dsdt.bin-0001.bin
* [lm-sensors] latest findings - my older posting from ~ 11 days
From: Rudolf Marek @ 2006-05-05 21:33 UTC (permalink / raw)
To: lm-sensors
Thanks for the report.
80c0: 68 48 20 2c 2c 6c 6c 6c 65 65 44 65 28 4c 20 44
The 0x48 means that the DATA line of the SMBus is forced low,
so the bus is stuck. Bit 5 gives the real-time status
of the bus line.
(http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/23167.pdf
page 83)
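The stuck value 0x48 and the value 0x68 differ only in bit 5 (0x20). As a quick check of that arithmetic, here is a small shell sketch; bit_of is a made-up helper name, not part of any tool mentioned in this thread:

```shell
#!/bin/sh
# bit_of VALUE BIT
# Prints 1 if bit number BIT (0 = least significant) is set in VALUE, else 0.
# VALUE may be decimal or 0x-prefixed hex.
bit_of() {
    echo $(( ($1 >> $2) & 1 ))
}

bit_of 0x48 5    # prints 0: bit 5 clear, line reads low
bit_of 0x68 5    # prints 1: bit 5 set, line reads high
```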
When the bus is stuck again, you can try the following:
rmmod i2c-amd756
Now let's reprogram the chipset to force the DATA line to 1.
If isadump then shows 0x68, the chipset has a bug;
otherwise some device on the bus has the bug.
isaset -y -f 0x80c0 0x5
isaset -y -f 0x80c1 0x5
isadump -y -f 0x8000
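To pick the relevant byte out of the isadump table without counting columns by eye, something like the following sketch may help. dump_byte is a hypothetical helper; it assumes the table layout seen in this thread (a four-digit hex row address, a colon, then sixteen hex bytes per row).

```shell
#!/bin/sh
# dump_byte ADDRESS
# Reads an isadump-style table on stdin and prints the byte stored at ADDRESS.
dump_byte() {
    addr=$(( $1 ))
    row=$(printf '%04x:' $(( $addr & ~0xf )))   # row label, e.g. "80c0:"
    col=$(( ($addr & 0xf) + 2 ))                # field 1 is the row label
    awk -v row="$row" -v col="$col" '$1 == row { print $col; exit }'
}

# Usage (not run here; needs root):
#   isadump -y -f 0x8000 | dump_byte 0x80c1
```

The result is printed without a 0x prefix, exactly as it appears in the dump.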
Regards
Rudolf
* [lm-sensors] latest findings - my older posting from ~ 11 days
From: Rudolf Marek @ 2006-05-05 21:38 UTC (permalink / raw)
To: lm-sensors
Thanks for the info,
ACPI is not to blame in this case. Let's focus on
a hardware bug in the chipset or in some other chip.
Regards
Rudolf
* [lm-sensors] latest findings - my older posting from ~ 11 days
From: Dieter Jurzitza @ 2006-05-08 19:37 UTC (permalink / raw)
To: lm-sensors
Dear Rudolf,
That would have been a nice workaround, if it had worked :-((( Unfortunately,
the register at 0x80c1 stays at "48" rather than going to "68" when it is
reprogrammed.
Please take a look at the logfile I attached; it is the result of the commands
below. Repeatedly unloading and reloading all i2c drivers does not cure the
issue.
Thank you very much,
take care
Dieter Jurzitza
*********************************
> rmmod i2c-amd756
> isaset -y -f 0x80c0 0x5
> isaset -y -f 0x80c1 0x5
> isadump -y -f 0x8000
*********************************
-------------- next part --------------
0 1 2 3 4 5 6 7 8 9 a b c d e f
8000: 01 04 20 03 01 00 00 00 49 04 63 00 00 00 00 00
8010: 0c 00 00 00 00 00 c3 03 00 00 ff ff 00 00 f0 00
8020: 00 00 00 00 e0 00 00 17 02 00 80 00 01 00 00 f0
8030: 0a 00 00 00 00 00 00 00 38 80 02 00 00 00 00 00
8040: 00 00 00 00 08 00 03 00 00 00 00 00 00 00 00 00
8050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80a0: 00 00 00 00 00 00 00 00 ff 86 00 00 00 00 00 00
80b0: ff 0d 00 00 fa f2 00 00 00 00 00 00 00 00 00 00
80c0: 68 48 20 2c 2c 6c 6c 6c 65 65 44 65 28 4c 20 44
80d0: 65 20 40 40 02 af cd 0f 00 00 00 00 ff ff ff ff
80e0: 00 08 22 00 5a 00 00 00 4e 00 42 00 ff 00 10 10
80f0: 00 00 00 00 20 28 65 65 65 44 44 44 20 20 20 20
* [lm-sensors] latest findings - my older posting from ~ 11 days
From: Dieter Jurzitza @ 2006-05-08 20:02 UTC (permalink / raw)
To: lm-sensors
Dear Rudolf,
I just discovered that I replied to the wrong email. What I had actually
written was
isaset -y -f 0x80c0 0x8
isaset -y -f 0x80c1 0x8
not 0x5 as stated in this mail.
Take care
Dieter Jurzitza
On Monday, 8 May 2006 21:37, Dieter Jurzitza wrote:
>>>>> > isaset -y -f 0x80c0 0x5
>>>>> > isaset -y -f 0x80c1 0x5
* [lm-sensors] latest findings - my older posting from ~ 11 days
From: Dieter Jurzitza @ 2006-05-16 19:58 UTC (permalink / raw)
To: lm-sensors
Hi folks,
dear Rudolf,
I think I've solved the issue. Since it is a very "heuristic" approach, I can only
tell you what I did, and you can comment on whether it makes any
sense ...
Nevertheless, I have now been running the loop
while true; do
i2cdump -y 0 0x2d b > /dev/null
done
for more than 105000 iterations (the best value I had reached before was around
20000, and only with debugging turned on), so I think it is fair to say that things
have at least improved significantly.
I already mentioned my impression that things were less
problematic with debugging turned on. Now, what does debugging do? It issues
many printk or printf calls, thereby slowing the process down.
Rudolf pointed me to the responsible file, i2c-amd756.c. I simply added a single
msleep(1);
to /usr/src/linux/drivers/i2c/busses/i2c-amd756.c (referring to linux-2.6.11.4):
static int amd756_transaction(struct i2c_adapter *adap)
{
	int temp;
	int result = 0;
	int timeout = 0;

	dev_dbg(&adap->dev, "Transaction (pre): GS=%04x, GE=%04x, ADD=%04x, "
		"DAT=%04x\n", inw_p(SMB_GLOBAL_STATUS),
		inw_p(SMB_GLOBAL_ENABLE), inw_p(SMB_HOST_ADDRESS),
		inb_p(SMB_HOST_DATA));

	/* Make sure the SMBus host is ready to start transmitting */
	/* but always wait one millisecond just in case ... */
>>	msleep(1); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Here!
	if ((temp = inw_p(SMB_GLOBAL_STATUS)) & (GS_HST_STS | GS_SMB_STS)) {
and, so far, this seems to do the trick (knock on wood!!!). Tell me why
this cannot be the fix; as a matter of fact, it seems to work. The explanation is
up to you, folks ;-).
Take care
Dieter Jurzitza
-------------- next part --------------
A non-text attachment was scrubbed...
Name: i2c-amd756.patch
Type: text/x-diff
Size: 410 bytes
Desc: not available
Url : http://lists.lm-sensors.org/pipermail/lm-sensors/attachments/20060516/45a19749/i2c-amd756.bin
* [lm-sensors] latest findings - my older posting from ~ 11 days
From: Rudolf Marek @ 2006-05-16 20:53 UTC (permalink / raw)
To: lm-sensors
Hi all,
> Hi folks,
> dear Rudolf,
> I think I've solved the issue. As it is a very "heuristic" approach I can only
> tell what I did and you can give me a commentary on it saying it makes no
> sense this way or whatsoever ...
>
> Nevertheless, I keep crash-looped now doing
>
> while true; do
> i2cdump -y 0 0x2d b > /dev/null
> done
>
> for more than 105000 times (the best value I had had reached before was around
> 20000, and only with debugging turned on) so I think it is fair to say things
> have improved significantly at least.
>
> I readily mentioned that I had had the impression that things were less
> problematic with debugging turned on. Now, what does debugging do? It does
> many printk's or printf's thereby making the process slower.
>
> Rudolf pointed me to the file in charge: i2c-amd756.c. I simply added a single
> msleep(1);
Good.
>
> /usr/src/linux/drivers/i2c/busses/i2c-amd756.c (referring to linux-2.6.11.4):
>
> static int amd756_transaction(struct i2c_adapter *adap)
> {
> int temp;
> int result = 0;
> int timeout = 0;
>
> dev_dbg(&adap->dev, "Transaction (pre): GS=%04x, GE=%04x, ADD=%04x, "
> "DAT=%04x\n", inw_p(SMB_GLOBAL_STATUS),
> inw_p(SMB_GLOBAL_ENABLE), inw_p(SMB_HOST_ADDRESS),
> inb_p(SMB_HOST_DATA));
>
> /* Make sure the SMBus host is ready to start transmitting */
> /* but always wait one millisecond just in case ... */
>>> msleep(1); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Here!
> if ((temp = inw_p(SMB_GLOBAL_STATUS)) & (GS_HST_STS | GS_SMB_STS)) {
>
> and, until now, this seems to do the trick (knock on wood!!!). Tell me why
> this cannot happen - as a matter of fact it seems to work. The explanation is
> up to you, folks ;-).
I'm still waiting for Raphael to contact someone at AMD; that was the reason for
the silence on this case...
This "sleep" is a good idea; there must be something wrong with the controller,
as we already know ...
An explanation, hmm :) hard to tell...
I checked the sequence of commands and found nothing really strange...
Have you seen any failure even with this delay?
Thanks
Regards
Rudolf