* [lm-sensors] Questions on sensors.conf
From: Joachim Schrod @ 2006-01-13 0:31 UTC
To: lm-sensors
Hi,
I started to use lm_sensors to monitor my hardware. In particular, I want to
monitor temperature and fan functionality.
The system has an Intel D865GBF ATX Mainboard, with a Pentium 4 3 GHz processor,
800 MHz FSB, 1 MB Cache. I'm using lm_sensors version 2.9.1 that came with SUSE
10.0.
sensors-detect went fine; it detected the LM85 sensor chip "lm85b-i2c-0-2e".
Starting the lm_sensors service worked as well, and then calling sensors gave:
CPU_Fan: 2669 RPM (min = 4000 RPM) ALARM
fan2: 0 RPM (min = 0 RPM)
fan3: 0 RPM (min = 0 RPM)
fan4: 0 RPM (min = 0 RPM)
CPU: +45 C (low = +10 C, high = +50 C)
Board: +36 C (low = +10 C, high = +35 C) ALARM
Remote: +37 C (low = +10 C, high = +35 C) ALARM
Oops, obviously too many alarms for my taste. ;-)
So I learned that I have to configure /etc/sensors.conf.
First, I wanted to check the measured values, so I rebooted and looked at the
BIOS readout.
My BIOS reports the CPU temperature as 56°C, System Zone 1 as 40°C, and
System Zone 2 as 45°C. (Wherever zones 1 and 2 are, I assume that they match
temp2_* and temp3_*, whereas temp1_* is the CPU.)
The CPU fan speed is the same, so that doesn't need any adjustment.
So here's my first question:
Is it best practice to assume that the BIOS values are correct and to add compute
statements that raise the lm_sensors values to match them?
Is the relation between BIOS measurements and sensors measurements usually linear?
I.e., do I just add or subtract the difference in a compute statement?
Or does your experience show that this needs a scale factor as well?
And my second question:
The semantics of the temperature limits in sensors.conf are still unclear to me.
There is temp#_min, temp#_max, temp#_hyst, and temp#_over.
min and max are not explained.
hyst and over are not used in the LM85 example configuration.
As far as I understand, if a temperature rises above temp#_over, ALARM is turned
on, and if it (subsequently) falls below temp#_hyst, ALARM is turned off again.
But what happens if the temperature is below temp#_min or above temp#_max?
How do these two variables relate to hyst/over?
In addition, the documentation of lm85 mentions other sensors.conf variables
named zone#_{limit,hyst,range,critical} which aren't used at all. It looks as if
the documentation is out of date and these variables don't exist any more. Is
that assumption true?
Last, the documentation mentions a /proc interface. I gather that this has been
replaced by the /sys interface. There I find files named
temp#_auto_temp_* that look interesting. Is there any documentation for them?
Sorry for so many questions. If there is documentation that I haven't found
that would answer them, I'd also appreciate a pointer to it.
Best,
Joachim
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Joachim Schrod Email: jschrod at acm.org
Roedermark, Germany
From: Philip Pokorny @ 2006-01-13 2:18 UTC
To: lm-sensors
Joachim Schrod wrote:
>Hi,
>
>I started to use lm_sensors to monitor my hardware. In particular, I want to
>monitor temperature and fan functionality.
>
>
Welcome!
>The system has an Intel D865GBF ATX Mainboard, with a Pentium 4 3 GHz processor,
>800 MHz FSB, 1 MB Cache. I'm using lm_sensors version 2.9.1 that came with SUSE
>10.0.
>
>
That should be fairly similar to the D865PERL, which was one of the boards on
which the lm85 driver was originally developed and tested.
>sensors-detect went fine; it detected the LM85 sensor chip "lm85b-i2c-0-2e".
>Starting the lm_sensors service worked as well, and then calling sensors gave:
>
>CPU_Fan: 2669 RPM (min = 4000 RPM) ALARM
>fan2: 0 RPM (min = 0 RPM)
>fan3: 0 RPM (min = 0 RPM)
>fan4: 0 RPM (min = 0 RPM)
>CPU: +45 C (low = +10 C, high = +50 C)
>Board: +36 C (low = +10 C, high = +35 C) ALARM
>Remote: +37 C (low = +10 C, high = +35 C) ALARM
>
>Oops, obviously too many alarms for my taste. ;-)
>So I learned that I have to configure /etc/sensors.conf.
>
>
Have you read the lm85 chip documentation in doc/chips/lm85?
>First, I wanted to check the measured values, so I rebooted and looked at the
>BIOS readout.
>My BIOS reports the CPU temperature as 56°C, System Zone 1 as 40°C, and
>System Zone 2 as 45°C. (Wherever zones 1 and 2 are, I assume that they match
>temp2_* and temp3_*, whereas temp1_* is the CPU.)
>The CPU fan speed is the same, so that doesn't need any adjustment.
>
>
Be careful. When a CPU is sitting in the BIOS, it is usually in a
spin loop, which puts the CPU in a maximum-power state. So
the CPU temperatures in the BIOS will frequently be higher than what you
observe under an operating system like Linux, where the CPU is put into
the HALT state when there isn't any work to do.
The "Board" sensor (temp2) is actually internal to the lm85 chip. So if
you can *find* the chip which is labeled lm85, that will be the location
of that temperature...
Generally, I'm surprised that the limits are set this low. They are too
low for a P4 CPU. If you know your fan spins slower, set a more
appropriate low limit. If the system temperatures are above the
limit, set a higher limit that makes sense. Most commercial
electronics can operate at up to 70 degC, but a reasonable
case-internal ambient temperature is 42 degC. A temperature sensor on
the motherboard that is close to a high-power component or function like
the VRM (Vcore power supply) for the CPU *will* read higher than ambient,
because the power transistors in the VRM dissipate heat through
the copper traces on the motherboard and through the board itself. This
heats components near them on the motherboard. It wouldn't be
unreasonable for the "Remote" temperature sensor to in fact be located
very near the VRM, in effect measuring the temperature of the VRM.
>So here's my first question:
>Is it best practice to assume that the BIOS values are correct and to add compute
>statements that raise the lm_sensors values to match them?
>
>
Unless the values are *way* off, or you have a way to measure the same
temperature or value by independent means _at the same time_, I would
recommend that you *not* adjust the readings returned by an lm_sensors chip
driver.
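For readers wondering what such an adjustment amounts to: a sensors.conf compute statement pairs a forward expression (raw chip value, written @, to displayed value) with its inverse. A pure offset would be written `compute temp1 @+5, @-5`. The Python sketch below only illustrates that linear model. The class and helper names and all numbers are hypothetical. A scale factor can only be derived from two simultaneous reference measurements at different temperatures, which is exactly the kind of independent measurement described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCorrection:
    """Hypothetical correction: displayed = raw * scale + offset."""
    scale: float = 1.0
    offset: float = 0.0

    def to_display(self, raw):
        # Forward direction, like the first expression of a compute statement.
        return raw * self.scale + self.offset

    def to_raw(self, displayed):
        # Inverse direction, needed so limits given in display units
        # can be written back to the chip in raw units.
        return (displayed - self.offset) / self.scale


def fit_two_points(raw_a, ref_a, raw_b, ref_b):
    """Derive scale and offset from two simultaneous (raw, reference) pairs."""
    scale = (ref_b - ref_a) / (raw_b - raw_a)
    offset = ref_a - raw_a * scale
    return LinearCorrection(scale, offset)
```

A single BIOS-versus-Linux comparison can suggest at most an offset. Because those two readings are taken neither at the same time nor under the same load, even that offset is unreliable.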
Did the BIOS program the temp#_offset registers? Can you report the
values from those configuration registers?
>And my second question:
>The semantics of the temperature limits in sensors.conf are still unclear to me.
>
>There is temp#_min, temp#_max, temp#_hyst, and temp#_over.
>
>
Some chips use "over" and "hyst" while others use "min" and "max". A
given temperature sensor almost never has both.
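To make the over/hyst semantics from the first message concrete, here is a hedged Python sketch of the hysteresis described there. The alarm latches once the reading rises above `over` and clears only after it drops below `hyst`. It models the idea only and is not driver code. Whether a given chip compares strictly or inclusively at the limits is an assumption here.

```python
def hysteresis_alarms(readings, over, hyst):
    """Return the alarm state after each reading (assumes hyst < over)."""
    alarm = False
    states = []
    for t in readings:
        if not alarm and t > over:
            alarm = True   # rose above the 'over' limit: alarm on
        elif alarm and t < hyst:
            alarm = False  # cooled below the 'hyst' limit: alarm off
        states.append(alarm)
    return states


print(hysteresis_alarms([45, 61, 58, 56, 54], over=60, hyst=55))
# → [False, True, True, True, False]
```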
>min and max are not explained.
>
>
What documentation are you reading?
>hyst and over are not used in the LM85 example configuration.
>
>
Because the lm85 uses minimum and maximum temperature limits.
If the temperature is less than the minimum or greater than the maximum,
an ALARM is signalled. If it's between the values, then there is no error.
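Below is a minimal Python sketch of that window check, applied to the readings from the first message. The function name is hypothetical, and strict comparisons at the limits are an assumption.

```python
def window_alarm(value, low, high):
    """True when value lies outside [low, high], as with min/max limits."""
    return value < low or value > high


# (value, low, high) as printed by `sensors` in the first message.
readings = {
    "CPU": (45, 10, 50),
    "Board": (36, 10, 35),
    "Remote": (37, 10, 35),
}

flagged = [name for name, (v, lo, hi) in readings.items() if window_alarm(v, lo, hi)]
print(flagged)
# → ['Board', 'Remote']
```

With those limits, only Board and Remote fall outside their windows. This matches the ALARM flags in the original sensors output.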
>In addition, the documentation of lm85 mentions other sensors.conf variables
>named zone#_{limit,hyst,range,critical} which aren't used at all. It looks as if
>the documentation is out of date and these variables don't exist any more. Is
>that assumption true?
>
>
The zone#_ values were originally implemented in the 2.4 version of the
driver. They control the automatic fan speed control features of the
lm85 chip. The first port of the lm85 driver to the 2.6 kernel did not
include those configuration registers. They have since been added back
to the 2.6 driver.
But in the 2.6 kernel, the lm_sensors team tried to enforce a consistent
set of values for automatic fan speed control. So, at least in 2.6.13,
the names and values have changed, and the driver works around the
differences to present the "standard" interface values:
temp#_auto_temp_{off,min,max,crit}
pwm#_auto_pwm_{min,minctl,freq}
:v)
From: Joachim Schrod @ 2006-01-13 11:57 UTC
To: lm-sensors
Philip Pokorny wrote:
>>The system has an Intel D865GBF ATX Mainboard
>>
> That should be fairly similar to the D865PERL, which was one of the boards on
> which the lm85 driver was originally developed and tested.
That's interesting. Does anybody have a sensors.conf for a D865PERL board that
they are willing to share? I didn't find one in the list archive.
The example configuration in sensors.conf.eg is for an Intel S845WD1-E and I
started with that.
>> [temperature is higher in BIOS than reported by lm_sensors]
>
> Be careful. When a CPU is sitting in the BIOS, it is usually in a
> spin loop, which puts the CPU in a maximum-power state. So
> the CPU temperatures in the BIOS will frequently be higher than what you
> observe under an operating system like Linux, where the CPU is put into
> the HALT state when there isn't any work to do.
>
> Unless the values are *way* off, or you have a way to measure the same
> temperature or value by independent means _at the same time_, I would
> recommend that you *not* adjust the readings returned by an lm_sensors chip
> driver.
OK, then I'll trust the sensors readings. Sadly, ACPI doesn't have thermal
information (the board's DSDT also has no _TMP variable, so this seems to be
deliberate), so I cannot get at the temperature in an independent way.
> Did the BIOS program the temp#_offset registers? Can you report the
> values from those configuration registers?
I assume that they would appear in /sys if they existed, right?
There are no temp#_offset variables.
There are temp#_auto_temp_off variables, with 86000 as their content.
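One quick way to see which temperature attributes the driver actually exposes is to list the temp* files in the chip's sysfs directory. The sketch below assumes that the chip at bus 0, address 0x2e ("lm85b-i2c-0-2e") appears as /sys/bus/i2c/devices/0-002e. That path and the helper names are assumptions, so adjust them for your system.

```python
from pathlib import Path

# Assumed sysfs directory for "lm85b-i2c-0-2e" (i2c bus 0, address 0x2e);
# adjust if your chip lives elsewhere.
DEVICE_DIR = Path("/sys/bus/i2c/devices/0-002e")


def temp_attributes(device_dir=DEVICE_DIR):
    """Return the sorted names of all temp* attribute files in device_dir."""
    return sorted(p.name for p in Path(device_dir).glob("temp*") if p.is_file())


def has_attribute(name, device_dir=DEVICE_DIR):
    """True if the driver exposes an attribute file with this exact name."""
    return (Path(device_dir) / name).is_file()


if __name__ == "__main__":
    for name in temp_attributes():
        print(name)
    print("temp1_offset present:", has_attribute("temp1_offset"))
```

This is roughly the same as running `ls` on that directory. The helper form simply makes it easy to test for one specific name such as temp1_offset.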
> The "Board" sensor (temp2) is actually internal to the lm85 chip. So if
> you can *find* the chip which is labeled lm85, that will be the location
> of that temperature...
>
> A temperature sensor on
> the motherboard that is close to a high-power component or function like
> the VRM (Vcore power supply) for the CPU *will* read higher than ambient,
> because the power transistors in the VRM dissipate heat through
> the copper traces on the motherboard and through the board itself. This
> heats components near them on the motherboard. It wouldn't be
> unreasonable for the "Remote" temperature sensor to in fact be located
> very near the VRM, in effect measuring the temperature of the VRM.
Hmm, I didn't find the lm85 chip at first sight, and I don't want to pull out all
the cards to check the board more thoroughly. Sadly, the board documentation (from
http://www.intel.com/design/motherbd/bf/bf_documentation.htm, in the spec
update) has information about its hot zones, but not about the temperature
measurement zones.
So I don't think I have much chance of associating the temp# measurements
with board or chassis areas. Since temp3 reads higher than temp2, maybe it is
near such a `hot zone'.
> Generally, I'm surprised that the limits are set this low. They are too
> low for a P4 CPU.
That would have been my next question -- what are good limits? :-)
I looked up the P4 specs, at
http://download.intel.com/design/Pentium4/datashts/29864312.pdf
It says that the chassis temperature shall be between 5 and 70°C; that would
have been my first guess for temp1_min and temp1_max.
But both the P4 and the board documentation also tell me that the ``chassis
maximum internal ambient temperature'' shall be 38°C. Actually, that is the
temperature at the CPU fan heatsink, with a worst-case limit of 40°C (on p. 84
of the P4 datasheet).
So maybe 38 or 40°C are good values for temp2_max or temp3_max?
On i2c/lm_sensors documentation:
>>There is temp#_min, temp#_max, temp#_hyst, and temp#_over.
>>
>>min and max are not explained.
>>
> What documentation are you reading?
All that I could find. I downloaded the current 2.9.2 distribution to be sure
that I had the current docs. I even checked some of the files in CVS.
I looked at sensors.conf.eg, doc/chips/lm85, and the FAQ, and did a grep for
'temp.*_min' over doc/chips/*. I also looked at the Web site. I found the
explanation of hyst and over in sensors.conf.eg, but none for min and max.
Anyhow, your following explanation was sufficient for me. :-)
> But in the 2.6 kernel, the lm_sensors team tried to enforce a consistent
> set of values for automatic fan speed control. So, at least in 2.6.13,
> the names and values have changed, and the driver works around the
> differences to present the "standard" interface values:
> temp#_auto_temp_{off,min,max,crit}
> pwm#_auto_pwm_{min,minctl,freq}
OK. Is there documentation that explains them?
Neither the lm_sensors nor the i2c distribution mentions them.
I also checked the list's archive and couldn't locate an explanation.
Some emails to this list mention a Linux kernel documentation file,
i2c/sysfs-interface, but I only have the one from Linux 2.6.8 (SUSE
9.2), where these variables are not mentioned; and in 2.6.13 (SUSE 10.0) this
file has disappeared.
I ask because the values in these files are quite high (if I interpret them
correctly as millidegrees C):
temp1_auto_temp_off 86000
temp1_auto_temp_min 90000
temp1_auto_temp_crit 100000
temp1_auto_temp_max 122000
Do they make sense? What are their semantics?
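For what it's worth, standard sysfs temperature attributes are integers in millidegrees Celsius, so reading 86000 as 86 degC fits that convention. The Python sketch below reads the four temp#_auto_temp_* files and converts them. The device directory argument and the helper names are assumptions. The sketch only converts units and says nothing about what the lm85 does with these values.

```python
from pathlib import Path


def read_millidegrees(path):
    """Read one sysfs temperature attribute (integer millidegrees C), return degC."""
    return int(Path(path).read_text().strip()) / 1000.0


def auto_temp_summary(device_dir, channel=1):
    """Collect temp<channel>_auto_temp_{off,min,max,crit} in degC; skip missing files."""
    summary = {}
    for suffix in ("off", "min", "max", "crit"):
        attr = Path(device_dir) / f"temp{channel}_auto_temp_{suffix}"
        if attr.is_file():
            summary[suffix] = read_millidegrees(attr)
    return summary
```

With the values listed above, auto_temp_summary would return {'off': 86.0, 'min': 90.0, 'max': 122.0, 'crit': 100.0}.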
Sorry for all the questions, but I really want to understand what's going
on in my system. :-)
Joachim
From: Philip Pokorny @ 2006-01-13 21:47 UTC
To: lm-sensors
Joachim Schrod wrote:
>Philip Pokorny wrote:
> > Unless the values are *way* off, or you have a way to measure the same
> > temperature or value by independent means _at the same time_, I would
> > recommend that you *not* adjust the readings returned by an lm_sensors chip
> > driver.
>
>OK, then I'll trust the sensors readings. Sadly, ACPI doesn't have thermal
>information (the board's DSDT also has no _TMP variable, so this seems to be
>deliberate), so I cannot get at the temperature in an independent way.
>
>
No, ACPI would just be reporting the values from the same sensor chip.
Independent in this case would mean a thermocouple, thermometer, or
non-contact temperature sensor.
> > Did the BIOS program the temp#_offset registers? Can you report the
> > values from those configuration registers?
>
>I assume that they would appear in /sys if they existed, right?
>There are no temp#_offset variables.
>
>
Perhaps I'm confusing this with another chip. Never mind.
>>Generally, I'm surprised that the limits are set this low. They are too
>>low for a P4 CPU.
>>
>>
>
>That would have been my next question -- what are good limits? :-)
>
>I looked up the P4 specs, at
>http://download.intel.com/design/Pentium4/datashts/29864312.pdf
>
>It says that the chassis temperature shall be between 5 and 70°C; that would
>have been my first guess for temp1_min and temp1_max.
>
>
No, that's not the "chassis" temperature; that's the temperature of the
lid/case of the CPU itself. But the temperature reported by the lm85 is
not the case temperature. It is the temperature of a sensing diode on the
CPU die itself, which will be several degrees hotter than the case
temperature because it's inside the CPU package.
If you check the "Thermal Diode" section of that spec, you'll see that
the diode is "characterized" at 75 degC.
(Note 2) There isn't a spec for the actual Tjunction temperature of a
P4, but I would guess that 75 degC is a better value for the maximum CPU
temperature.
>But both the P4 and the board documentation also tell me that the ``chassis
>maximum internal ambient temperature'' shall be 38°C. Actually, that is the
>temperature at the CPU fan heatsink, with a worst-case limit of 40°C (on p. 84
>of the P4 datasheet).
>
>So maybe 38 or 40°C are good values for temp2_max or temp3_max?
>
>
Yes, if they were measuring air temperature. But since they are mounted
on the board and perhaps located next to hot components, you may need to
raise them a little. Still, start with 40 and see how it goes.
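In sensors.conf, these suggestions would normally become `set` lines for the lm85 chip, such as `set temp1_max 75`, applied with `sensors -s`. As a cross-check on units, the Python sketch below converts the same starting limits into the millidegree strings that the sysfs temp#_max files use, and writes them only on request. The device path and helper names are assumptions. The 75 and 40 degC figures come from this discussion, not from a vendor recommendation.

```python
from pathlib import Path

# Assumed sysfs directory for "lm85b-i2c-0-2e"; adjust for your system.
DEVICE_DIR = Path("/sys/bus/i2c/devices/0-002e")

# Starting limits in degC from this thread: 75 for the CPU diode (temp1),
# 40 for the two board sensors (temp2, temp3). Tune after observing real loads.
PROPOSED_MAX = {"temp1_max": 75, "temp2_max": 40, "temp3_max": 40}


def to_sysfs_value(degrees_c):
    """Convert degC to the integer millidegree string sysfs expects."""
    return str(int(round(degrees_c * 1000)))


def apply_limits(limits, device_dir=DEVICE_DIR, apply=False):
    """Return {attribute: value}; only write the files when apply=True (needs root)."""
    plan = {name: to_sysfs_value(deg) for name, deg in limits.items()}
    if apply:
        for name, value in plan.items():
            (Path(device_dir) / name).write_text(value + "\n")
    return plan
```

By default, `apply_limits(PROPOSED_MAX)` only returns the plan, so the values can be reviewed before anything is written to the chip.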
:v)
From: Joachim Schrod @ 2006-01-14 17:20 UTC
To: lm-sensors
Thanks a lot for your help; now it works like a charm.
Integration into Nagios monitoring was very easy as well. Great software you've
made here!
Joachim