* [lm-sensors] Tyan S2877 CPU Temp Erratic
2006-11-12 23:47 [lm-sensors] Tyan S2877 CPU Temp Erratic Paul Reilly
@ 2006-11-24 13:07 ` Rudolf Marek
2006-11-25 10:02 ` Paul Reilly
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Rudolf Marek @ 2006-11-24 13:07 UTC (permalink / raw)
To: lm-sensors
Paul Reilly wrote:
> Hello,
> I have a Tyan S2877 motherboard. I had lm-sensors working perfectly,
> but every now and then the CPU temp field, goes wrong. For instance
> it was reporting around 50'C for ages, then after a reboot, it now
> reports -199.75°C. Opening the box, and checking all is OK. BIOS
> reports correct temp (around 48'C). Restarting multiple times, and
> still can't get it to go back to report correct CPU temp.
Hmm, this may be an electrical problem or a problem with the bus driver. Could
you please switch on i2c bus debugging in the kernel? If it is a production
machine and you cannot experiment much with it, you may try the k8temp
driver, which reads the CPU temperature directly from the processor.
The driver is in 2.6.19.
1) you may patch your kernel with the patches for 2.6.19:
http://lists.lm-sensors.org/pipermail/lm-sensors/2006-August/017426.html
2) use the standalone version from here:
http://assembler.cz/download/amd_digital_temp.tar.gz
3) lm-sensors 2.10.1 has the userspace support.
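If you want to check the driver without the userspace tools, a minimal Python sketch like the one below could read the raw values. It assumes the driver exposes the usual hwmon sysfs files named tempN_input holding millidegrees Celsius; the directory you pass in and the helper name read_temps are hypothetical, and the exact sysfs location depends on your kernel.

```python
from pathlib import Path


def read_temps(device_dir):
    """Return {"temp1": 54.0, ...} for every readable tempN_input file.

    Assumption: each file holds one integer in millidegrees Celsius,
    as the hwmon sysfs convention describes.
    """
    temps = {}
    for entry in sorted(Path(device_dir).glob("temp*_input")):
        try:
            raw = entry.read_text().strip()
            value = int(raw) / 1000.0
        except (OSError, ValueError):
            # Unreadable or non-numeric file: skip it rather than guess.
            continue
        temps[entry.name[: -len("_input")]] = value
    return temps
```

Calling read_temps() on the sysfs directory of the k8temp device should then give one entry per sensor the CPU actually implements.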
> I am using the LM85 module/code forced to use a ADT7463 chip
> as this best matches my hardware. My sensors.conf and output of
> sensors is shown below. NOTE, I only have one CPU in this dual
> socket board.
>
> CPU1 Temp:-199.75°C (low = -71°C, high = -73°C)
>
> Any one seen this problem before or know a fix?
> The CPU is a dual-core Opteron 265.
Oh, one more idea. Please remove the "thermal" driver from the kernel (rmmod thermal)
or compile the kernel without thermal zone support. Also, please send us the file
produced by:
cat /proc/acpi/dsdt > /tmp/dsdt.bin
Thanks,
Regards
Rudolf
* [lm-sensors] Tyan S2877 CPU Temp Erratic
From: Paul Reilly @ 2006-11-25 10:02 UTC (permalink / raw)
To: lm-sensors
Thanks Rudolf,
I now use K8-temp and it seems to work.
k8temp-pci-00c3
Adapter: PCI adapter
CPU#0 Core1: +54°C
CPU#0 Core2: +59°C
I am using a sensors.conf of:
chip "k8temp-*"
label temp1 "CPU#0 Core1"
label temp3 "CPU#0 Core2"
label temp2 "CPU#1 Core1"
label temp4 "CPU#1 Core2"
Is this correct?
I get no reading on temp2 or temp4, which is right as I only have
one dual core Opteron in this two socket board.
Thanks,
Paul
* [lm-sensors] Tyan S2877 CPU Temp Erratic
From: Rudolf Marek @ 2006-11-25 10:04 UTC (permalink / raw)
To: lm-sensors
Hi,
Please CC to the list.
Paul Reilly wrote:
> Hi Rudolf,
>
> Thanks for your help. The temp readings came back for a while,
> but are now gone again. Interesting they do change, from -199.5
> to -197.5 etc. So it seems to be affected by possible the real
> reading from the CPU.
Or corrupted.
>
>> cat /proc/acpi/dsdt > /tmp/dsdt.bin
>
> http://astropaul.com/export/dsdt.bin
OK, I did not find anything suspicious here.
> I cannot make too many changes to this machine. It runs a static
> kernel ( LM85 module is also compiled statically). As suggested I
> have turned off ACPI thermal zone, and turned on i2c bus debugging.
> It will be rebooted tomorrow, and I will report back
Good thanks.
> Will K8 Temp work with Opteron chip?
It will, but for some reason this cool feature is undocumented by AMD (for the
pre-rev. F cores).
regards
Rudolf
* [lm-sensors] Tyan S2877 CPU Temp Erratic
From: Rudolf Marek @ 2006-11-25 10:07 UTC (permalink / raw)
To: lm-sensors
Hi,
> OK, I have patched my kernel 2.6.16.30 with support for K8 Temp
> and
>
> [*] AMD K8 processor sensor
> [*] National Semiconductor LM85 and compatibles
Good.
> I will not be able to reboot as it is a production machine.
> We can reboot over the weekend only.
Ok.
> Do you have an example sensors.conf entry for the K8
> sensor to get the temperature values? It looks like there
> 2 sensors per core, so I should be able to monitor 4 temp
> with my Opteron? Great!
If your Opteron supports 2 temps/core, then yes ;)
Yes please check this page:
http://www.lm-sensors.org/browser/lm-sensors/trunk/etc/sensors.conf.eg
chip "k8temp-*"
label temp1 "Core0 Temp"
label temp2 "Core0 Temp"
label temp3 "Core1 Temp"
label temp4 "Core1 Temp"
Rudolf
* [lm-sensors] Tyan S2877 CPU Temp Erratic
From: Rudolf Marek @ 2006-11-25 12:48 UTC (permalink / raw)
To: lm-sensors
Hi,
> Thanks Rudolf,
Heh good ;)
> I now use K8-temp and it seems to work.
>
> k8temp-pci-00c3
> Adapter: PCI adapter
> CPU#0 Core1: +54°C
> CPU#0 Core2: +59°C
>
> I am using a sensors.conf of:
>
> chip "k8temp-*"
> label temp1 "CPU#0 Core1"
> label temp3 "CPU#0 Core2"
> label temp2 "CPU#1 Core1"
> label temp4 "CPU#1 Core2"
>
> Is this correct?
> I get no reading on temp2 or temp4, which is right as I only have
> one dual core Opteron in this two socket board.
Theoretically there could be 4 temps for one physical CPU: two for each core,
times 2 cores = 4. Because your CPU is dual-core and supports only 1
temperature/core, you see two of them ;) I hope that clarifies it.
So all in all, this works fine. Let's check the i2c bus related problem. I will
wait for some logs from you ;)
Regards
Rudolf
* [lm-sensors] Tyan S2877 CPU Temp Erratic
From: Paul Reilly @ 2006-11-25 13:21 UTC (permalink / raw)
To: lm-sensors
Hi Rudolf,
> Theoretically there could be 4 temps for one physical CPU. Two for each core
> * 2 = 4. And because your CPU is dualcore, and your CPU supports only 1
> temperature/core so you see two of them ;) I hope I clarified that.
Yes, I see from the k8temp.c
static SENSOR_DEVICE_ATTR_2(temp1_input, S_IRUGO, show_temp, NULL, 0, 0);
static SENSOR_DEVICE_ATTR_2(temp2_input, S_IRUGO, show_temp, NULL, 0, 1);
static SENSOR_DEVICE_ATTR_2(temp3_input, S_IRUGO, show_temp, NULL, 1, 0);
static SENSOR_DEVICE_ATTR_2(temp4_input, S_IRUGO, show_temp, NULL, 1, 1);
So,
temp1 = CPU#0 Core#0 Temp #0
temp2 = CPU#0 Core#0 Temp #1
temp3 = CPU#0 Core#1 Temp #0
temp4 = CPU#0 Core#1 Temp #1
My Dual Core Opteron (265) only reports temp1, and temp3.
That is OK. One temperature per core.
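The numbering above can be written down as a small calculation. This is only a sketch of the mapping as read from the four SENSOR_DEVICE_ATTR_2 lines, assuming the last two macro arguments are the core and the sensor within that core; the function name k8temp_channel is hypothetical and not part of the driver.

```python
def k8temp_channel(temp_index):
    """Map a tempN number (1..4) to (core, sensor_within_core).

    Assumption: temp1..temp4 are laid out as in the quoted k8temp.c
    attribute list, i.e. (0,0), (0,1), (1,0), (1,1).
    """
    if temp_index not in (1, 2, 3, 4):
        raise ValueError("k8temp only defines temp1..temp4")
    core, sensor = divmod(temp_index - 1, 2)
    return core, sensor


if __name__ == "__main__":
    for n in range(1, 5):
        core, sensor = k8temp_channel(n)
        print(f"temp{n} = Core#{core} Temp #{sensor}")
```

With this mapping, temp1 and temp3 are sensor 0 of core 0 and core 1, which matches the two readings I get.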
What would it look like if I add a second physical processor in
the second socket? Would I get the same lmsensor names repeated?
> So all in all, this works fine. Lets check the i2c bus related problem. And
> wait for some logs from you ;)
Here are the logs.
It looks like a timeout problem....
syslog:Nov 25 08:31:58 discover kernel: i2c /dev entries driver
syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-9191: ISA main adapter registered
syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-0: nForce2 SMBus adapter at 0xa000
syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-1: nForce2 SMBus adapter at 0xa040
syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-0: SMBus Timeout! (0x10)
syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-1: SMBus Timeout! (0x10)
syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-9191: Driver w83627hf registered
syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-0: SMBus Timeout! (0x10)
syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-1: SMBus Timeout! (0x10)
syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-9191: Driver w83781d-isa registered
syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-0: SMBus Timeout! (0x10)
syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-1: SMBus Timeout! (0x10)
syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-1: SMBus Timeout! (0x10)
* [lm-sensors] Tyan S2877 CPU Temp Erratic
From: Rudolf Marek @ 2006-11-25 21:10 UTC (permalink / raw)
To: lm-sensors
Hi,
> What would it look like if I add a second physical processor in
> the second socket? Would I get the same lmsensor names repeated?
Yes, it will just be a new instance of the driver. So there will be a
k8temp-pci-00xx, where xx is a different value for the next physical CPU (in fact
it is the PCI bus address).
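To make the "PCI bus address" remark more concrete, here is a minimal Python sketch that decodes the suffix of such a chip name. It assumes the four hex digits pack the address as (bus << 8) | (slot << 3) | function; that packing is an assumption about how the sensors library builds the name, and decode_pci_suffix is a hypothetical helper, not part of lm-sensors.

```python
def decode_pci_suffix(chip_name):
    """Split e.g. 'k8temp-pci-00c3' into (bus, slot, function).

    Assumption: the suffix encodes (bus << 8) | (slot << 3) | function.
    """
    _prefix, bus_type, addr_hex = chip_name.rsplit("-", 2)
    if bus_type != "pci":
        raise ValueError("not a PCI chip name: " + chip_name)
    addr = int(addr_hex, 16)
    bus = (addr >> 8) & 0xFF
    slot = (addr >> 3) & 0x1F
    function = addr & 0x07
    return bus, slot, function


if __name__ == "__main__":
    bus, slot, function = decode_pci_suffix("k8temp-pci-00c3")
    print(f"bus {bus:02x}, slot {slot:02x}, function {function}")
```

Under that assumption, 00c3 decodes to bus 0, slot 0x18, function 3. If the second processor's device sat at slot 0x19, function 3 (again an assumption), its suffix would come out as 00cb.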
> Here are the logs.
> It looks like a timeout problem....
>
> syslog:Nov 25 08:31:58 discover kernel: i2c /dev entries driver
> syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-9191: ISA main adapter registered
> syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-0: nForce2 SMBus adapter at 0xa000
> syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-1: nForce2 SMBus adapter at 0xa040
> syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-0: SMBus Timeout! (0x10)
> syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-1: SMBus Timeout! (0x10)
> syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-9191: Driver w83627hf registered
> syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-0: SMBus Timeout! (0x10)
> syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-1: SMBus Timeout! (0x10)
> syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-9191: Driver w83781d-isa registered
> syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-0: SMBus Timeout! (0x10)
> syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-1: SMBus Timeout! (0x10)
> syslog:Nov 25 08:31:58 discover kernel: i2c_adapter i2c-1: SMBus Timeout! (0x10)
Well, this is from the probing phase. It may simply be that there is no device at
the probed address. The SMBus controller seems to follow the ACPI 2.00 proposal
(13.9.1.1 Status Register, SMB_STS), where 0x10 means the transaction
failed because the slave device address was not acknowledged.
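To see at a glance which adapter reports which status, the log can be tallied with a few lines of Python. This is only a sketch: the regular expression is written against the exact message format in the log above, the only status code it names is 0x10 as explained here, and count_smbus_errors and STATUS_MEANING are hypothetical names.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... kernel: i2c_adapter i2c-0: SMBus Timeout! (0x10)"
TIMEOUT_RE = re.compile(
    r"i2c_adapter (i2c-\d+): SMBus Timeout! \((0x[0-9a-fA-F]+)\)"
)

# Only the code discussed in this thread; other codes are left unnamed.
STATUS_MEANING = {0x10: "slave device address not acknowledged"}


def count_smbus_errors(lines):
    """Count (adapter, status) pairs among 'SMBus Timeout!' log lines."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2), 16))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "kernel: i2c_adapter i2c-0: SMBus Timeout! (0x10)",
        "kernel: i2c_adapter i2c-1: SMBus Timeout! (0x10)",
        "kernel: i2c_adapter i2c-9191: Driver w83627hf registered",
    ]
    for (adapter, status), n in sorted(count_smbus_errors(sample).items()):
        meaning = STATUS_MEANING.get(status, "unknown status")
        print(f"{adapter}: {n} x 0x{status:02x} ({meaning})")
```

If the same status also shows up on the adapter and address where the chip really sits, and not only during probing, that would point more towards the bus than towards missing devices.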
I will need the part of the log from just after you issue the "sensors" command
(when there is some error in the log). Also, when the error condition (strange
temperature) occurs, please send the output of the following command:
i2cdump -y 1 0x2e
This will show whether the chip reports sane values in its registers. If so, then
something is wrong with the "analog" path to the chip.
Does it work after you power off the machine, unplug the power cable, and try
again? What kernel are you using? When it was not working and you entered the
BIOS, I guess it worked in there? (No need to hurry with the real test; the i2cdump
may be enough.)
The problem is that I don't have the nVidia datasheet, so I will need to guess if it
turns out that there is a problem with the bus driver...
Regards
Rudolf