* [lm-sensors] latest findings - my older posting from ~ 11 days ago
@ 2006-05-02 19:28 Dieter Jurzitza
  2006-05-03 21:37 ` [lm-sensors] latest findings - my older posting from ~ 11 days Rudolf Marek
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Dieter Jurzitza @ 2006-05-02 19:28 UTC
  To: lm-sensors

Dear Rudolf,
dear listmembers,
after I had configured the debugging, nothing bad happened at first - but I already 
knew that could happen, so I was patient.
Today, on returning home, I found the problem had reappeared:

Please see file attached. You will find a "SMBus collision", and, after that, 
"sensors" is dead.

So I added several lines at the top in order to (possibly ...) help whoever 
understands what is happening here.

This time the modules loaded are:
i2c-amd756, i2c-isa, w83781d and eeprom

Let me know how to proceed, thank you in advance, take care



Dieter

P.S. please see the excerpt of a messages file attached. I bzip2-ed it in 
order to save space.

-- 
-----------------------------------------------------------

                               |
                                \
                 /\_/\           |
                | ~x~ |/-----\   /
                 \   /-       \_/
  ^^__   _        /  _  ____   /
 <??__ \- \_/     |  |/    |  |
  ||  ||         _| _|    _| _|

if you really want to see the pictures above - use some font
with constant spacing like courier! :-)
-----------------------------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: collision.bz2
Type: application/x-bzip2
Size: 1407 bytes
Desc: not available
Url : http://lists.lm-sensors.org/pipermail/lm-sensors/attachments/20060502/c8bd81a2/collision.bz2


* [lm-sensors] latest findings - my older posting from ~ 11 days
From: Rudolf Marek @ 2006-05-03 21:37 UTC
  To: lm-sensors

Hello Dieter

> after having configured the debugging nothing bad had happened - but I readily 
> knew that could happen so I was patient.
> Today returning from home the problem had had reappeared:
> 
> Please see file attached. You will find a "SMBus collision", and, after that, 
> "sensors" is dead.

OK got it.

To me it looks like either:

1) a silicon bug in the AMD chipset state machine
2) a design error in the motherboard, or some malfunctioning chip hooked to the SMBus

Let's see if there is a register somewhere holding the state of the state machine ;)
so we know whether it is #1. Jordan just contacted a chip designer, so we will have 
more information later.

Now my questions:

Does it start working again after a reboot? After a cold boot? After unplugging the power?
Do you have ACPI in the kernel? Is the thermal module compiled in or loaded?

If it happens again in the future, you may try the following:

1) obtain the smb base addr
	AMD756_smba = xxx
It should be in your debug log; you may use the value from an older log because it 
does not change. Alternatively, you may look into /proc/ioports (cat /proc/ioports).

2) then use the isadump tool:
isadump -f xxx

(please replace xxx with the base address from the log). This will allow us 
to check the status of the SMBDATA and SMBCLK lines and see if they are stuck 
low. If needed, we can then try to exercise the DATA and CLK lines to see if they
are working properly.
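
As a rough sketch of step 1, the helper below pulls the start of an I/O range
out of /proc/ioports-style text and prints it with a 0x prefix. The function
name ioports_base and the search pattern are assumptions made for illustration
only; the label under which the SMBus range is listed depends on the driver, so
it is worth checking the file by eye first.

```shell
# ioports_base PATTERN
# Hypothetical helper: reads /proc/ioports-style lines ("start-end : label")
# on stdin and prints "0x<start>" for every line whose text matches PATTERN
# (case-insensitive).
ioports_base() {
    awk -v pat="$1" '
        tolower($0) ~ tolower(pat) {
            split($1, range, "-")     # $1 looks like "start-end"
            print "0x" range[1]
        }'
}

# Usage (the pattern "smb" is only a guess at how the range is labelled):
#   ioports_base smb < /proc/ioports
```

The printed value can then be used in place of xxx in the isadump call.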

You may also try to provoke the failure by running this:

modprobe i2c-dev
modprobe i2c-amd756
while true; do i2cdump -y 0 0x2d b > /dev/null; done

This will read from the bus again and again, and I expect that it will fail sooner 
or later. You don't need to enable the debugging now; I guess it won't 
come up with anything new.
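
To get a number out of such a run, the loop can be wrapped so that it counts
iterations and stops at the first failure. This is only a sketch: the function
name run_until_failure is made up here, and it assumes that i2cdump exits with
a non-zero status once the bus is dead, which has not been verified in this
thread. If it keeps returning success, the loop will simply not stop.

```shell
# run_until_failure CMD [ARGS...]
# Hypothetical helper: runs CMD repeatedly, discarding its output, and prints
# the number of successful runs once CMD returns a non-zero exit status.
run_until_failure() {
    count=0
    while "$@" > /dev/null 2>&1; do
        count=$((count + 1))
    done
    echo "$count"
}

# Usage on the affected machine (as root, with i2c-dev and i2c-amd756 loaded):
#   run_until_failure i2cdump -y 0 0x2d b
```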

Regards
Rudolf



* [lm-sensors] latest findings - my older posting from ~ 11 days
From: Rudolf Marek @ 2006-05-05 19:54 UTC
  To: lm-sensors

Hello Dieter,

Please don't forget to CC the list.

> No! The reset button does *not* cure the problem. This is different from 
> "normal" behaviour"

Ok

> 
>>Have you ACPI in kernel? 
> 
> Yes. Any info required on that?

Please mail dsdt.bin

cat /proc/acpi/dsdt > /tmp/dsdt.bin

>>Have you compiled/loaded/in kernel the thermal module?

And no support for the "thermal" module (thermal is the module name, or the ACPI
subsystem option in the kernel)

> It will :-(
> This is my homework, I'll be back with you as soon as I have results to tell.
> 
>>1) obtain the smb base addr
>>	AMD756_smba = xxx
>>Should be in your debug log, you may use value from older log because it
>>does not change. Alternatively you may also look into cat /proc/ioports
>>
>>then you may use the isadump tool
>>2)
>>isadump -f xxx

Please don't forget the 0x prefix before the address.

>>while true; do i2cdump -y 0 0x2d b > /dev/null; done

You may want to insert a "sleep 1" here. But let's try without it first.

> Thanks for helping that far. I will unload the debug kernel now as it spits 
> about 400 MByte of i2c-debug logs into /var/log/messages.

Good.

regards
Rudolf



* [lm-sensors] latest findings - my older posting from ~ 11 days
From: Dieter Jurzitza @ 2006-05-05 21:06 UTC
  To: lm-sensors

Hello Rudolf,
> Please dont forget to CC the list.
It will not happen again. It is best to send to the list only - you are subscribed 
anyway, so you won't need it twice.
> Please mail dsdt.bin
Attached to this email
> And no support for "thermal" module (thermal is the module name or ACPI
> subsytem option in kernel)
there is a /proc/acpi/thermal_zone directory, but it is empty.

/home/fred> ls -l /proc/acpi/
total 0
dr-xr-xr-x   10 root root 0 2006-05-05 22:47 ./
dr-xr-xr-x  166 root root 0 2006-05-05 22:46 ../
dr-xr-xr-x    2 root root 0 2006-05-05 23:04 ac_adapter/
-rw-r--r--    1 root root 0 2006-05-05 23:04 alarm
dr-xr-xr-x    2 root root 0 2006-05-05 23:04 battery/
dr-xr-xr-x    4 root root 0 2006-05-05 23:04 button/
-r--------    1 root root 0 2006-05-05 23:04 dsdt
dr-xr-xr-x    2 root root 0 2006-05-05 23:04 embedded_controller/
-r--------    1 root root 0 2006-05-05 22:47 event
-r--------    1 root root 0 2006-05-05 23:04 fadt
dr-xr-xr-x    2 root root 0 2006-05-05 23:04 fan/
-r--r--r--    1 root root 0 2006-05-05 23:04 info
dr-xr-xr-x    2 root root 0 2006-05-05 23:04 power_resource/
dr-xr-xr-x    4 root root 0 2006-05-05 23:04 processor/
-rw-r--r--    1 root root 0 2006-05-05 23:04 sleep
dr-xr-xr-x    2 root root 0 2006-05-05 23:04 thermal_zone/ << empty!!!
-rw-r--r--    1 root root 0 2006-05-05 23:04 wakeup


That's it for the moment! I wonder what findings you'll bring up!
Thanks again,
take care


Dieter

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dsdt.bin.zip
Type: application/x-zip
Size: 3542 bytes
Desc: not available
Url : http://lists.lm-sensors.org/pipermail/lm-sensors/attachments/20060505/9c997768/dsdt.bin-0001.bin


* [lm-sensors] latest findings - my older posting from ~ 11 days
From: Rudolf Marek @ 2006-05-05 21:33 UTC
  To: lm-sensors

Thanks for the report.

80c0: 68 48 20 2c 2c 6c 6c 6c 65 65 44 65 28 4c 20 44

The 0x48 means that the DATA line of the SMBus is forced low,
so the bus is stuck. Bit 5 gives the real-time status
of the bus line.

(http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/23167.pdf
page 83)
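
As a small worked example of that reading, the two values seen in the dump
differ only in bit 5 (mask 0x20): 0x68 has it set, 0x48 has it cleared. The
helper name bit5 is just an illustration.

```shell
# bit5 VALUE
# Prints bit 5 (mask 0x20) of a register value: 1 if set, 0 if cleared.
bit5() {
    echo $(( ($1 >> 5) & 1 ))
}

bit5 0x68   # prints 1
bit5 0x48   # prints 0
```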

When the bus is stuck again, you may try the following:

rmmod i2c-amd756

Now let's reprogram the chipset to force a 1 onto the DATA line.
If isadump then shows 0x68, the chipset has a bug;
otherwise some device on the bus has the bug.

isaset -y -f 0x80c0 0x5
isaset -y -f 0x80c1 0x5
isadump -y -f 0x8000
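
To avoid reading the dump by eye, a single byte can be picked out of the
isadump output. This is a sketch that assumes the output has the row layout
shown above ("80c0: 68 48 ..."); the function name dump_byte is made up for
illustration.

```shell
# dump_byte ADDR
# Hypothetical helper: reads isadump-style rows ("80c0: 68 48 20 ...") on
# stdin and prints the byte stored at ADDR.
dump_byte() {
    addr=$(( $1 ))
    row=$(printf '%04x' $(( addr / 16 * 16 )))   # row label, e.g. 80c0
    col=$(( addr % 16 + 2 ))                     # field 1 is the row label
    awk -v row="$row:" -v col="$col" '$1 == row { print $col }'
}

# The row quoted above, used as sample input:
echo '80c0: 68 48 20 2c 2c 6c 6c 6c 65 65 44 65 28 4c 20 44' | dump_byte 0x80c1
# prints 48

# Usage on the affected machine:
#   isadump -y -f 0x8000 | dump_byte 0x80c1
```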

Regards
Rudolf




* [lm-sensors] latest findings - my older posting from ~ 11 days
From: Rudolf Marek @ 2006-05-05 21:38 UTC
  To: lm-sensors

Thanks for the info,

ACPI is not to blame in this case. Let's focus on
the HW chipset/other chip bug.

Regards
Rudolf



* [lm-sensors] latest findings - my older posting from ~ 11 days
From: Dieter Jurzitza @ 2006-05-08 19:37 UTC
  To: lm-sensors

Dear Rudolf,
this would have been a nice workaround - if it had worked :-((( Unfortunately, 
the value at 0x80c1 remains at "48" rather than changing to "68" after being 
reprogrammed.
Please take a look at the logfile I attached; it is the result of the commands 
below. Repeatedly unloading and loading all i2c drivers does not cure the 
issue.

Thank you very much,
take care


Dieter Jurzitza

*********************************
> rmmod i2c-amd756
> isaset -y -f 0x80c0 0x5
> isaset -y -f 0x80c1 0x5
> isadump -y -f 0x8000
*********************************

-------------- next part --------------
       0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
8000: 01 04 20 03 01 00 00 00 49 04 63 00 00 00 00 00 
8010: 0c 00 00 00 00 00 c3 03 00 00 ff ff 00 00 f0 00 
8020: 00 00 00 00 e0 00 00 17 02 00 80 00 01 00 00 f0 
8030: 0a 00 00 00 00 00 00 00 38 80 02 00 00 00 00 00 
8040: 00 00 00 00 08 00 03 00 00 00 00 00 00 00 00 00 
8050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
8060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
8070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
8080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
8090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
80a0: 00 00 00 00 00 00 00 00 ff 86 00 00 00 00 00 00 
80b0: ff 0d 00 00 fa f2 00 00 00 00 00 00 00 00 00 00 
80c0: 68 48 20 2c 2c 6c 6c 6c 65 65 44 65 28 4c 20 44 
80d0: 65 20 40 40 02 af cd 0f 00 00 00 00 ff ff ff ff 
80e0: 00 08 22 00 5a 00 00 00 4e 00 42 00 ff 00 10 10 
80f0: 00 00 00 00 20 28 65 65 65 44 44 44 20 20 20 20 


* [lm-sensors] latest findings - my older posting from ~ 11 days
From: Dieter Jurzitza @ 2006-05-08 20:02 UTC
  To: lm-sensors

Dear Rudolf,
I just discovered that I replied to the wrong email. I had actually 
written
isaset -y -f 0x80c0 0x8
isaset -y -f 0x80c1 0x8
not 0x5 as stated in this mail.
Take care

Dieter Jurzitza

On Monday, 8 May 2006 21:37, Dieter Jurzitza wrote:
>>>>> > isaset -y -f 0x80c0 0x5
>>>>> > isaset -y -f 0x80c1 0x5




* [lm-sensors] latest findings - my older posting from ~ 11 days
From: Dieter Jurzitza @ 2006-05-16 19:58 UTC
  To: lm-sensors

Hi folks,
dear Rudolf,
I think I've solved the issue. As it is a very "heuristic" approach, I can only 
tell you what I did, and you can comment on whether it makes any 
sense this way ...

Nevertheless, I have now kept this stress loop running:

while true; do
	i2cdump -y 0 0x2d b > /dev/null
done

for more than 105000 iterations (the best value I had reached before was around 
20000, and only with debugging turned on), so I think it is fair to say things 
have improved significantly at least.

I already mentioned my impression that things were less 
problematic with debugging turned on. Now, what does debugging do? It issues 
many printk's or printf's, thereby making the process slower.

Rudolf pointed me to the relevant file, i2c-amd756.c. I simply added a single 
msleep(1); call:

/usr/src/linux/drivers/i2c/busses/i2c-amd756.c (referring to linux-2.6.11.4):

static int amd756_transaction(struct i2c_adapter *adap)
{
        int temp;
        int result = 0;
        int timeout = 0;

        dev_dbg(&adap->dev, "Transaction (pre): GS=%04x, GE=%04x, ADD=%04x, "
                "DAT=%04x\n", inw_p(SMB_GLOBAL_STATUS),
                inw_p(SMB_GLOBAL_ENABLE), inw_p(SMB_HOST_ADDRESS),
                inb_p(SMB_HOST_DATA));  

        /* Make sure the SMBus host is ready to start transmitting */
        /* but always wait one millisecond just in case ... */
        msleep(1);      /* <<< the added line */
        if ((temp = inw_p(SMB_GLOBAL_STATUS)) & (GS_HST_STS | GS_SMB_STS)) {

and so far this seems to do the trick (knock on wood!!!). Tell me why 
this should not work - as a matter of fact, it seems to. The explanation is 
up to you, folks ;-).

Take care




Dieter Jurzitza

-------------- next part --------------
A non-text attachment was scrubbed...
Name: i2c-amd756.patch
Type: text/x-diff
Size: 410 bytes
Desc: not available
Url : http://lists.lm-sensors.org/pipermail/lm-sensors/attachments/20060516/45a19749/i2c-amd756.bin


* [lm-sensors] latest findings - my older posting from ~ 11 days
From: Rudolf Marek @ 2006-05-16 20:53 UTC
  To: lm-sensors

Hi all,

> Hi folks,
> dear Rudolf,
> I think I've solved the issue. As it is a very "heuristic" approach I can only 
> tell what I did and you can give me a commentary on it saying it makes no 
> sense this way or whatsoever ...
> 
> Nevertheless, I keep crash-looped now doing 
> 
> while true; do
> 	i2cdump -y 0 0x2d b > /dev/null
> done
> 
> for more than 105000 times (the best value I had had reached before was around 
> 20000, and only with debugging turned on) so I think it is fair to say things 
> have improved significantly at least.
> 
> I readily mentioned that I had had the impression that things were less 
> problematic with debugging turned on. Now, what does debugging do? It does 
> many printk's or printf's thereby making the process slower.
> 
> Rudolf pointed me to the file in charge: i2c-amd756.c. I simply added a single 
> msleep(1);

Good.

> 
> /usr/src/linux/drivers/i2c/busses/i2c-amd756.c (referring to linux-2.6.11.4):
> 
> static int amd756_transaction(struct i2c_adapter *adap)
> {
>         int temp;
>         int result = 0;
>         int timeout = 0;
> 
>         dev_dbg(&adap->dev, "Transaction (pre): GS=%04x, GE=%04x, ADD=%04x, "
>                 "DAT=%04x\n", inw_p(SMB_GLOBAL_STATUS),
>                 inw_p(SMB_GLOBAL_ENABLE), inw_p(SMB_HOST_ADDRESS),
>                 inb_p(SMB_HOST_DATA));  
> 
>         /* Make sure the SMBus host is ready to start transmitting */
>         /* but always wait one millisecond just in case ... */
>>>      msleep(1); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<  Here!
>         if ((temp = inw_p(SMB_GLOBAL_STATUS)) & (GS_HST_STS | GS_SMB_STS)) {
> 
> and, until now, this seems to do the trick (knock on wood!!!). Tell me why 
> this cannot happen - as a matter of fact it seems to work. The explanation is 
> up to you, folks ;-).

I'm still waiting for Raphael to contact someone at AMD; that was the reason for 
the silence on this case...

This "sleep" is a good idea; there must be something wrong with the controller, 
as we already know ...

An explanation, hmm :) hard to tell...

I checked the sequence of commands and found nothing really strange...

Have you seen any crash even with this delay?

Thanks
Regards
Rudolf

