* Re: [lm-sensors] [PATCH] hwmon: (coretemp) Further relax
@ 2011-06-01 18:32 Yu, Fenghua
2011-06-06 13:57 ` Jean Delvare
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: Yu, Fenghua @ 2011-06-01 18:32 UTC (permalink / raw)
To: lm-sensors
> -----Original Message-----
> From: Guenter Roeck [mailto:guenter.roeck@ericsson.com]
> Sent: Wednesday, June 01, 2011 11:09 AM
> To: lm-sensors@lm-sensors.org
> Cc: Guenter Roeck; Carsten Emde; Yu, Fenghua; Jean Delvare
> Subject: [PATCH] hwmon: (coretemp) Further relax temperature range
> checks
>
> Further relax temperature range checks after reading the
> IA32_TEMPERATURE_TARGET
> register. If the register returns a value other than 0 in bits 16..32,
> assume
> that the returned value is correct.
>
> This change applies to both packet and core temperature limits.
>
> Cc: Carsten Emde <C.Emde@osadl.org>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Jean Delvare <khali@linux-fr.org>
> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
> ---
> drivers/hwmon/coretemp.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
> index 1680977..85e9379 100644
> --- a/drivers/hwmon/coretemp.c
> +++ b/drivers/hwmon/coretemp.c
> @@ -296,7 +296,7 @@ static int get_tjmax(struct cpuinfo_x86 *c, u32 id,
> struct device *dev)
> * If the TjMax is not plausible, an assumption
> * will be used
> */
> - if (val >= 70 && val <= 125) {
> + if (val) {
> dev_info(dev, "TjMax is %d C.\n", val);
> return val * 1000;
> }
> @@ -326,7 +326,7 @@ static int get_pkg_tjmax(unsigned int cpu, struct
> device *dev)
> err = rdmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET, &eax,
> &edx);
> if (!err) {
> val = (eax >> 16) & 0xff;
> - if (val > 80 && val < 120)
> + if (val)
> return val * 1000;
> }
> dev_warn(dev, "Unable to read Pkg-TjMax from CPU:%u\n", cpu);
> --
> 1.7.3.1
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [lm-sensors] [PATCH] hwmon: (coretemp) Further relax
2011-06-01 18:32 [lm-sensors] [PATCH] hwmon: (coretemp) Further relax Yu, Fenghua
@ 2011-06-06 13:57 ` Jean Delvare
2011-06-06 14:48 ` Guenter Roeck
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Jean Delvare @ 2011-06-06 13:57 UTC (permalink / raw)
To: lm-sensors
Hi Guenter, Fenghua,
On Wed, 1 Jun 2011 11:08:43 -0700, Guenter Roeck wrote:
> Further relax temperature range checks after reading the IA32_TEMPERATURE_TARGET
> register. If the register returns a value other than 0 in bits 16..32, assume
> that the returned value is correct.
>
> This change applies to both packet and core temperature limits.
Sorry for the later reply. Looks good to me, tested OK on my 3 systems.
See comment below though.
> Cc: Carsten Emde <C.Emde@osadl.org>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Jean Delvare <khali@linux-fr.org>
> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
> ---
> drivers/hwmon/coretemp.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
> index 1680977..85e9379 100644
> --- a/drivers/hwmon/coretemp.c
> +++ b/drivers/hwmon/coretemp.c
> @@ -296,7 +296,7 @@ static int get_tjmax(struct cpuinfo_x86 *c, u32 id, struct device *dev)
> * If the TjMax is not plausible, an assumption
> * will be used
> */
> - if (val >= 70 && val <= 125) {
> + if (val) {
> dev_info(dev, "TjMax is %d C.\n", val);
> return val * 1000;
> }
> @@ -326,7 +326,7 @@ static int get_pkg_tjmax(unsigned int cpu, struct device *dev)
> err = rdmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET, &eax, &edx);
> if (!err) {
> val = (eax >> 16) & 0xff;
> - if (val > 80 && val < 120)
> + if (val)
> return val * 1000;
> }
> dev_warn(dev, "Unable to read Pkg-TjMax from CPU:%u\n", cpu);
While we're here, I don't quite get the rationale for having separate
functions get_tjmax() and get_pkg_tjmax(). They do pretty much the
same, don't they? Except that get_pkg_tjmax defaults to 100°C when
get_tjmax calls adjust_tjmax(). This seems wrong, even though I guess
it makes no difference in practice as adjust_tjmax() should only be
needed for older CPU models without the package temperature sensor.
Going one step further, if MSR_IA32_TEMPERATURE_TARGET is the right MSR
for the package TjMax, then this pretty much means that this MSR (or at
least bits 23:16 in it) is not per-core but per-package (it will return
the same value on every core.) Then why are we calling get_tjmax (and
then adjust_tjmax) on every core if the result will always be the same?
This seems conceptually wrong and expensive.
So I would suggest that we move tjmax to struct pdev_entry, and only
fetch it once from MSR_IA32_TEMPERATURE_TARGET. This would also solve a
minor annoyance I'm seeing in my kernel logs with the latest version of
the driver:
coretemp coretemp.0: TjMax is 97 C.
coretemp coretemp.0: TjMax is 97 C.
coretemp coretemp.0: TjMax is 97 C.
coretemp coretemp.0: TjMax is 97 C.
Imagine the result on a system with more CPUs and more core per CPUs...
Not sure if the message itself is very valuable anyway, as tjmax will
be visible in the output of "sensors" or any other monitoring
application, but if we keep it, it should only be displayed once per
physical CPU.
--
Jean Delvare
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [lm-sensors] [PATCH] hwmon: (coretemp) Further relax
2011-06-01 18:32 [lm-sensors] [PATCH] hwmon: (coretemp) Further relax Yu, Fenghua
2011-06-06 13:57 ` Jean Delvare
@ 2011-06-06 14:48 ` Guenter Roeck
2011-06-08 9:31 ` Jean Delvare
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Guenter Roeck @ 2011-06-06 14:48 UTC (permalink / raw)
To: lm-sensors
Hi Jean,
On Mon, Jun 06, 2011 at 09:57:53AM -0400, Jean Delvare wrote:
> Hi Guenter, Fenghua,
>
> On Wed, 1 Jun 2011 11:08:43 -0700, Guenter Roeck wrote:
> > Further relax temperature range checks after reading the IA32_TEMPERATURE_TARGET
> > register. If the register returns a value other than 0 in bits 16..32, assume
> > that the returned value is correct.
> >
> > This change applies to both packet and core temperature limits.
>
> Sorry for the later reply. Looks good to me, tested OK on my 3 systems.
>
> See comment below though.
>
> > Cc: Carsten Emde <C.Emde@osadl.org>
> > Cc: Fenghua Yu <fenghua.yu@intel.com>
> > Cc: Jean Delvare <khali@linux-fr.org>
> > Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
> > ---
> > drivers/hwmon/coretemp.c | 4 ++--
> > 1 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
> > index 1680977..85e9379 100644
> > --- a/drivers/hwmon/coretemp.c
> > +++ b/drivers/hwmon/coretemp.c
> > @@ -296,7 +296,7 @@ static int get_tjmax(struct cpuinfo_x86 *c, u32 id, struct device *dev)
> > * If the TjMax is not plausible, an assumption
> > * will be used
> > */
> > - if (val >= 70 && val <= 125) {
> > + if (val) {
> > dev_info(dev, "TjMax is %d C.\n", val);
> > return val * 1000;
> > }
> > @@ -326,7 +326,7 @@ static int get_pkg_tjmax(unsigned int cpu, struct device *dev)
> > err = rdmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET, &eax, &edx);
> > if (!err) {
> > val = (eax >> 16) & 0xff;
> > - if (val > 80 && val < 120)
> > + if (val)
> > return val * 1000;
> > }
> > dev_warn(dev, "Unable to read Pkg-TjMax from CPU:%u\n", cpu);
>
> While we're here, I don't quite get the rationale for having separate
> functions get_tjmax() and get_pkg_tjmax(). They do pretty much the
> same, don't they? Except that get_pkg_tjmax defaults to 100°C when
> get_tjmax calls adjust_tjmax(). This seems wrong, even though I guess
> it makes no difference in practice as adjust_tjmax() should only be
> needed for older CPU models without the package temperature sensor.
>
That was my assumption as well.
> Going one step further, if MSR_IA32_TEMPERATURE_TARGET is the right MSR
> for the package TjMax, then this pretty much means that this MSR (or at
> least bits 23:16 in it) is not per-core but per-package (it will return
> the same value on every core.) Then why are we calling get_tjmax (and
> then adjust_tjmax) on every core if the result will always be the same?
> This seems conceptually wrong and expensive.
>
I have not really thought about it myself, but that seems to be likely.
> So I would suggest that we move tjmax to struct pdev_entry, and only
> fetch it once from MSR_IA32_TEMPERATURE_TARGET. This would also solve a
Makes sense to me, though I'd like to get input from Fenghua if your analysis
is correct.
> minor annoyance I'm seeing in my kernel logs with the latest version of
> the driver:
>
> coretemp coretemp.0: TjMax is 97 C.
> coretemp coretemp.0: TjMax is 97 C.
> coretemp coretemp.0: TjMax is 97 C.
> coretemp coretemp.0: TjMax is 97 C.
>
> Imagine the result on a system with more CPUs and more core per CPUs...
> Not sure if the message itself is very valuable anyway, as tjmax will
> be visible in the output of "sensors" or any other monitoring
> application, but if we keep it, it should only be displayed once per
> physical CPU.
>
We should probably make the message a debug message or remove it entirely.
After all, we don't display this kind of stuff for other drivers either. And,
yes, I do get it 16 times on one of the systems I tested it with ;).
Thanks,
Guenter
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [lm-sensors] [PATCH] hwmon: (coretemp) Further relax
2011-06-01 18:32 [lm-sensors] [PATCH] hwmon: (coretemp) Further relax Yu, Fenghua
2011-06-06 13:57 ` Jean Delvare
2011-06-06 14:48 ` Guenter Roeck
@ 2011-06-08 9:31 ` Jean Delvare
2011-06-08 10:17 ` R, Durgadoss
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Jean Delvare @ 2011-06-08 9:31 UTC (permalink / raw)
To: lm-sensors
On Mon, 6 Jun 2011 07:48:15 -0700, Guenter Roeck wrote:
> On Mon, Jun 06, 2011 at 09:57:53AM -0400, Jean Delvare wrote:
> > While we're here, I don't quite get the rationale for having separate
> > functions get_tjmax() and get_pkg_tjmax(). They do pretty much the
> > same, don't they? Except that get_pkg_tjmax defaults to 100°C when
> > get_tjmax calls adjust_tjmax(). This seems wrong, even though I guess
> > it makes no difference in practice as adjust_tjmax() should only be
> > needed for older CPU models without the package temperature sensor.
>
> That was my assumption as well.
For now I'll submit a patch dropping get_pkg_tjmax, as this seems
desirable regardless of the rest.
> > Going one step further, if MSR_IA32_TEMPERATURE_TARGET is the right MSR
> > for the package TjMax, then this pretty much means that this MSR (or at
> > least bits 23:16 in it) is not per-core but per-package (it will return
> > the same value on every core.) Then why are we calling get_tjmax (and
> > then adjust_tjmax) on every core if the result will always be the same?
> > This seems conceptually wrong and expensive.
>
> I have not really thought about it myself, but that seems to be likely.
>
> > So I would suggest that we move tjmax to struct pdev_entry, and only
> > fetch it once from MSR_IA32_TEMPERATURE_TARGET. This would also solve a
>
> Makes sense to me, though I'd like to get input from Fenghua if your analysis
> is correct.
I have investigated this a little further, and have questions / action
items for Fenghua and Durgadoss (or anyone else at Intel...)
The Software Developer's Manual says that MSR 1A2h has scope "Thread"
for Nehalem, but scope "Unique" for Sandy Bridge.
1* "Unique" isn't a valid scope value for a Sandy Bridge MSR. It should
be one of "Package", "Core" or "Thread". "Unique" is only used for
Pentium 4 and Atom models. This should be fixed by whoever is
responsible for the document at Intel.
2* "Thread" seems wrong for Nehalem. Experience shows that thermal
sensors are per-core [1], so I would expect scope "Core" for
MSR_TEMPERATURE_TARGET. I would love to have confirmation from
someone at Intel, and documentation fixed if needs be.
[1] See commit d883b9f0977269d519469da72faec6a7f72cb489,
"hwmon: (coretemp) Skip duplicate CPU entries".
3* Assuming that "Unique" becomes "Package" for Sandy Bridge, this can
mean two things: either the MSR was actually changed from
per-thread/core to per-package when Sandy Bridge was developed, or
it was already per-package in Nehalem and documentation is wrong. I
would love to have someone at Intel clarify this point, and
documentation fixed if needs be.
4* Regardless of the technically correct answer to point 3 above, I
have never seen any multi-core CPU supported by the coretemp driver
report different tjmax values for different cores. Did anyone ever
see this? If not, I would feel comfortable reading tjmax from
MSR_TEMPERATURE_TARGET only once per package for all models. This
would make driver initialization faster.
--
Jean Delvare
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [lm-sensors] [PATCH] hwmon: (coretemp) Further relax
2011-06-01 18:32 [lm-sensors] [PATCH] hwmon: (coretemp) Further relax Yu, Fenghua
` (2 preceding siblings ...)
2011-06-08 9:31 ` Jean Delvare
@ 2011-06-08 10:17 ` R, Durgadoss
2011-06-08 16:45 ` Yu, Fenghua
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: R, Durgadoss @ 2011-06-08 10:17 UTC (permalink / raw)
To: lm-sensors
Hi Jean,
> 4* Regardless of the technically correct answer to point 3 above, I
> have never seen any multi-core CPU supported by the coretemp driver
> report different tjmax values for different cores. Did anyone ever
> see this? If not, I would feel comfortable reading tjmax from
> MSR_TEMPERATURE_TARGET only once per package for all models. This
> would make driver initialization faster.
>
When I tested in 3 of my machines, it did not Show different TjMax.
So, I agree that we should have the TjMax per package.
Thanks,
Durga
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [lm-sensors] [PATCH] hwmon: (coretemp) Further relax
2011-06-01 18:32 [lm-sensors] [PATCH] hwmon: (coretemp) Further relax Yu, Fenghua
` (3 preceding siblings ...)
2011-06-08 10:17 ` R, Durgadoss
@ 2011-06-08 16:45 ` Yu, Fenghua
2011-06-08 18:32 ` Jean Delvare
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Yu, Fenghua @ 2011-06-08 16:45 UTC (permalink / raw)
To: lm-sensors
> -----Original Message-----
> From: R, Durgadoss
> Sent: Wednesday, June 08, 2011 3:06 AM
> To: Jean Delvare; Guenter Roeck
> Cc: lm-sensors@lm-sensors.org; Carsten Emde; Yu, Fenghua
> Subject: RE: [PATCH] hwmon: (coretemp) Further relax temperature range
> checks
>
> Hi Jean,
>
> > 4* Regardless of the technically correct answer to point 3 above, I
> > have never seen any multi-core CPU supported by the coretemp
> driver
> > report different tjmax values for different cores. Did anyone ever
> > see this? If not, I would feel comfortable reading tjmax from
> > MSR_TEMPERATURE_TARGET only once per package for all models. This
> > would make driver initialization faster.
> >
>
> When I tested in 3 of my machines, it did not Show different TjMax.
> So, I agree that we should have the TjMax per package.
According to SDM, MSR_TEMPERATURE_TARGET scope is thread. Reading it per package is not correct.
Coding this MSR based on observation is not right and may cause problem in other cases or in the future.
Thanks.
-Fenghua
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [lm-sensors] [PATCH] hwmon: (coretemp) Further relax
2011-06-01 18:32 [lm-sensors] [PATCH] hwmon: (coretemp) Further relax Yu, Fenghua
` (4 preceding siblings ...)
2011-06-08 16:45 ` Yu, Fenghua
@ 2011-06-08 18:32 ` Jean Delvare
2011-06-09 1:01 ` Yu, Fenghua
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Jean Delvare @ 2011-06-08 18:32 UTC (permalink / raw)
To: lm-sensors
Hi Fenghua,
On Wed, 8 Jun 2011 09:45:03 -0700, Yu, Fenghua wrote:
> > -----Original Message-----
> > From: R, Durgadoss
> > Sent: Wednesday, June 08, 2011 3:06 AM
> > To: Jean Delvare; Guenter Roeck
> > Cc: lm-sensors@lm-sensors.org; Carsten Emde; Yu, Fenghua
> > Subject: RE: [PATCH] hwmon: (coretemp) Further relax temperature range
> > checks
> >
> > Hi Jean,
> >
> > > 4* Regardless of the technically correct answer to point 3 above, I
> > > have never seen any multi-core CPU supported by the coretemp driver
> > > report different tjmax values for different cores. Did anyone ever
> > > see this? If not, I would feel comfortable reading tjmax from
> > > MSR_TEMPERATURE_TARGET only once per package for all models. This
> > > would make driver initialization faster.
> > >
> >
> > When I tested in 3 of my machines, it did not Show different TjMax.
> > So, I agree that we should have the TjMax per package.
>
> According to SDM, MSR_TEMPERATURE_TARGET scope is thread. Reading it per package is not correct.
As I explained in my original post, the SDM says something different
for Nehalem (scope thread) and Sandy Bridge (unique/package scope).
This is what I would like to see clarified. Please refer to my
questions in the original post.
Plus, we already don't follow the SDM to the letter, as we currently
read TjMax per core, rather than per thread. Which makes sense given
that MSR 19Ch (temperature measurement) is described as having scope
core. Obviously it makes no sense to have per-thread limits and
per-core measurements. In fact it is not even possible, as the
measurements are relative to the limit. So, the SDM is wrong with
regards to scope for either IA32_TEMPERATURE_TARGET (1A2h) or
IA32_THERM_STATUS (19Ch) on Nehalem.
> Coding this MSR based on observation is not right and may cause problem in other cases or in the future.
If you have actual examples of "other cases", I'm all ears.
I wouldn't care too much about the future. The fact that the package
TjMax is currently read from the same MSR bits as the per-core TjMax
implies that the value _is_ per-package for Sandy Bridge (as
documented, BTW.) If a future microarchitecture really offers a
per-package limit distinct from per-core limits, this will have to be
read from different MSR bits, so the coretemp driver will have to be
updated anyway.
As a summary, I'm fine following the documentation, but only as long as
it doesn't contradict logic and real-world experiments.
--
Jean Delvare
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [lm-sensors] [PATCH] hwmon: (coretemp) Further relax
2011-06-01 18:32 [lm-sensors] [PATCH] hwmon: (coretemp) Further relax Yu, Fenghua
` (5 preceding siblings ...)
2011-06-08 18:32 ` Jean Delvare
@ 2011-06-09 1:01 ` Yu, Fenghua
2011-06-09 8:19 ` Jean Delvare
2011-06-09 21:01 ` Yu, Fenghua
8 siblings, 0 replies; 10+ messages in thread
From: Yu, Fenghua @ 2011-06-09 1:01 UTC (permalink / raw)
To: lm-sensors
> From: Jean Delvare [mailto:khali@linux-fr.org]
> Sent: Wednesday, June 08, 2011 2:31 AM
> To: Guenter Roeck
> Cc: lm-sensors@lm-sensors.org; Carsten Emde; Yu, Fenghua; R, Durgadoss
> Subject: Re: [PATCH] hwmon: (coretemp) Further relax temperature range
> checks
>
> On Mon, 6 Jun 2011 07:48:15 -0700, Guenter Roeck wrote:
> > On Mon, Jun 06, 2011 at 09:57:53AM -0400, Jean Delvare wrote:
> > > While we're here, I don't quite get the rationale for having
> separate
> > > functions get_tjmax() and get_pkg_tjmax(). They do pretty much the
> > > same, don't they? Except that get_pkg_tjmax defaults to 100°C when
> > > get_tjmax calls adjust_tjmax(). This seems wrong, even though I
> guess
> > > it makes no difference in practice as adjust_tjmax() should only be
> > > needed for older CPU models without the package temperature sensor.
> >
> > That was my assumption as well.
>
> For now I'll submit a patch dropping get_pkg_tjmax, as this seems
> desirable regardless of the rest.
>
> > > Going one step further, if MSR_IA32_TEMPERATURE_TARGET is the right
> MSR
> > > for the package TjMax, then this pretty much means that this MSR
> (or at
> > > least bits 23:16 in it) is not per-core but per-package (it will
> return
> > > the same value on every core.) Then why are we calling get_tjmax
> (and
> > > then adjust_tjmax) on every core if the result will always be the
> same?
> > > This seems conceptually wrong and expensive.
> >
> > I have not really thought about it myself, but that seems to be
> likely.
> >
> > > So I would suggest that we move tjmax to struct pdev_entry, and
> only
> > > fetch it once from MSR_IA32_TEMPERATURE_TARGET. This would also
> solve a
> >
> > Makes sense to me, though I'd like to get input from Fenghua if your
> analysis
> > is correct.
>
> I have investigated this a little further, and have questions / action
> items for Fenghua and Durgadoss (or anyone else at Intel...)
>
> The Software Developer's Manual says that MSR 1A2h has scope "Thread"
> for Nehalem, but scope "Unique" for Sandy Bridge.
>
> 1* "Unique" isn't a valid scope value for a Sandy Bridge MSR. It should
> be one of "Package", "Core" or "Thread". "Unique" is only used for
> Pentium 4 and Atom models. This should be fixed by whoever is
> responsible for the document at Intel.
>
> 2* "Thread" seems wrong for Nehalem. Experience shows that thermal
> sensors are per-core [1], so I would expect scope "Core" for
> MSR_TEMPERATURE_TARGET. I would love to have confirmation from
> someone at Intel, and documentation fixed if needs be.
>
> [1] See commit d883b9f0977269d519469da72faec6a7f72cb489,
> "hwmon: (coretemp) Skip duplicate CPU entries".
>
> 3* Assuming that "Unique" becomes "Package" for Sandy Bridge, this can
> mean two things: either the MSR was actually changed from
> per-thread/core to per-package when Sandy Bridge was developed, or
> it was already per-package in Nehalem and documentation is wrong. I
> would love to have someone at Intel clarify this point, and
> documentation fixed if needs be.
>
> 4* Regardless of the technically correct answer to point 3 above, I
> have never seen any multi-core CPU supported by the coretemp driver
> report different tjmax values for different cores. Did anyone ever
> see this? If not, I would feel comfortable reading tjmax from
> MSR_TEMPERATURE_TARGET only once per package for all models. This
> would make driver initialization faster.
Hi, Jean,
I contacted SDM author and hardware architect. Yes, the "Unique" term for Sandy Bridge is wrong. It should be "Thread" instead. Next SDM will fix this.
The interface was defined by in P4 as per thread. They believe there is no viable usage of setting different temp targets on siblings.
So one could say the architecture of per-thread interface is a quirk of legacy.
And you can read the target temperature on one thread and it will apply to all cores in a package.
Thanks.
-Fenghua
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [lm-sensors] [PATCH] hwmon: (coretemp) Further relax
2011-06-01 18:32 [lm-sensors] [PATCH] hwmon: (coretemp) Further relax Yu, Fenghua
` (6 preceding siblings ...)
2011-06-09 1:01 ` Yu, Fenghua
@ 2011-06-09 8:19 ` Jean Delvare
2011-06-09 21:01 ` Yu, Fenghua
8 siblings, 0 replies; 10+ messages in thread
From: Jean Delvare @ 2011-06-09 8:19 UTC (permalink / raw)
To: lm-sensors
Hi Fenghua,
On Wed, 8 Jun 2011 18:01:12 -0700, Yu, Fenghua wrote:
> > From: Jean Delvare [mailto:khali@linux-fr.org]
> > Sent: Wednesday, June 08, 2011 2:31 AM
> > To: Guenter Roeck
> > Cc: lm-sensors@lm-sensors.org; Carsten Emde; Yu, Fenghua; R, Durgadoss
> > Subject: Re: [PATCH] hwmon: (coretemp) Further relax temperature range
> > checks
> >
> > I have investigated this a little further, and have questions / action
> > items for Fenghua and Durgadoss (or anyone else at Intel...)
> >
> > The Software Developer's Manual says that MSR 1A2h has scope "Thread"
> > for Nehalem, but scope "Unique" for Sandy Bridge.
> >
> > 1* "Unique" isn't a valid scope value for a Sandy Bridge MSR. It should
> > be one of "Package", "Core" or "Thread". "Unique" is only used for
> > Pentium 4 and Atom models. This should be fixed by whoever is
> > responsible for the document at Intel.
> >
> > 2* "Thread" seems wrong for Nehalem. Experience shows that thermal
> > sensors are per-core [1], so I would expect scope "Core" for
> > MSR_TEMPERATURE_TARGET. I would love to have confirmation from
> > someone at Intel, and documentation fixed if needs be.
> >
> > [1] See commit d883b9f0977269d519469da72faec6a7f72cb489,
> > "hwmon: (coretemp) Skip duplicate CPU entries".
> >
> > 3* Assuming that "Unique" becomes "Package" for Sandy Bridge, this can
> > mean two things: either the MSR was actually changed from
> > per-thread/core to per-package when Sandy Bridge was developed, or
> > it was already per-package in Nehalem and documentation is wrong. I
> > would love to have someone at Intel clarify this point, and
> > documentation fixed if needs be.
> >
> > 4* Regardless of the technically correct answer to point 3 above, I
> > have never seen any multi-core CPU supported by the coretemp driver
> > report different tjmax values for different cores. Did anyone ever
> > see this? If not, I would feel comfortable reading tjmax from
> > MSR_TEMPERATURE_TARGET only once per package for all models. This
> > would make driver initialization faster.
>
> I contacted SDM author and hardware architect.
Thanks a lot for doing this!
> Yes, the "Unique" term for Sandy Bridge is wrong. It should be "Thread" instead. Next SDM will fix this..
This would bring it in line with Nehalem. However this doesn't look
right. Remember that IA32_THERM_STATUS (MSR 19Ch) has scope "Core", so
IA32_TEMPERATURE_TARGET (MSR 1A2h) can't have scope "Thread", because
you need to combine the values from both MSRs to return the current
temperature. IA32_TEMPERATURE_TARGET must have a scope equal to or
broader than the scope of IA32_THERM_STATUS, otherwise the computation
is not possible. What do the SDM author and hardware architect have to
say about this?
Furthermore, on Sandy Bridge, IA32_PACKAGE_THERM_STATUS (MSR 1B1h) has
scope "Package" but it is a relative value (just as IA32_THERM_STATUS.)
To what can it be relative, if there is no package-level Temperature
Target available? For this reason I really believe that
IA32_TEMPERATURE_TARGET (MSR 1A2h) has scope "Package" on Sandy Bridge.
Again, I am curious what your contacts will have to say about this.
> The interface was defined by in P4 as per thread. They believe there is no viable usage of setting different temp targets on siblings.
> So one could say the architecture of per-thread interface is a quirk of legacy.
>
> And you can read the target temperature on one thread and it will apply to all cores in a package.
OK, great. I'll prepare a patch, I'll post it when I am done testing it.
Thanks,
--
Jean Delvare
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [lm-sensors] [PATCH] hwmon: (coretemp) Further relax
2011-06-01 18:32 [lm-sensors] [PATCH] hwmon: (coretemp) Further relax Yu, Fenghua
` (7 preceding siblings ...)
2011-06-09 8:19 ` Jean Delvare
@ 2011-06-09 21:01 ` Yu, Fenghua
8 siblings, 0 replies; 10+ messages in thread
From: Yu, Fenghua @ 2011-06-09 21:01 UTC (permalink / raw)
To: lm-sensors
> -----Original Message-----
> From: Jean Delvare [mailto:khali@linux-fr.org]
> Sent: Thursday, June 09, 2011 1:19 AM
> To: Yu, Fenghua
> Cc: Guenter Roeck; lm-sensors@lm-sensors.org; Carsten Emde; R,
> Durgadoss
> Subject: Re: [PATCH] hwmon: (coretemp) Further relax temperature range
> checks
>
> Hi Fenghua,
>
> On Wed, 8 Jun 2011 18:01:12 -0700, Yu, Fenghua wrote:
> > > From: Jean Delvare [mailto:khali@linux-fr.org]
> > > Sent: Wednesday, June 08, 2011 2:31 AM
> > > To: Guenter Roeck
> > > Cc: lm-sensors@lm-sensors.org; Carsten Emde; Yu, Fenghua; R,
> Durgadoss
> > > Subject: Re: [PATCH] hwmon: (coretemp) Further relax temperature
> range
> > > checks
> > >
> > > I have investigated this a little further, and have questions /
> action
> > > items for Fenghua and Durgadoss (or anyone else at Intel...)
> > >
> > > The Software Developer's Manual says that MSR 1A2h has scope
> "Thread"
> > > for Nehalem, but scope "Unique" for Sandy Bridge.
> > >
> > > 1* "Unique" isn't a valid scope value for a Sandy Bridge MSR. It
> should
> > > be one of "Package", "Core" or "Thread". "Unique" is only used
> for
> > > Pentium 4 and Atom models. This should be fixed by whoever is
> > > responsible for the document at Intel.
> > >
> > > 2* "Thread" seems wrong for Nehalem. Experience shows that thermal
> > > sensors are per-core [1], so I would expect scope "Core" for
> > > MSR_TEMPERATURE_TARGET. I would love to have confirmation from
> > > someone at Intel, and documentation fixed if needs be.
> > >
> > > [1] See commit d883b9f0977269d519469da72faec6a7f72cb489,
> > > "hwmon: (coretemp) Skip duplicate CPU entries".
> > >
> > > 3* Assuming that "Unique" becomes "Package" for Sandy Bridge, this
> can
> > > mean two things: either the MSR was actually changed from
> > > per-thread/core to per-package when Sandy Bridge was developed,
> or
> > > it was already per-package in Nehalem and documentation is
> wrong. I
> > > would love to have someone at Intel clarify this point, and
> > > documentation fixed if needs be.
> > >
> > > 4* Regardless of the technically correct answer to point 3 above, I
> > > have never seen any multi-core CPU supported by the coretemp
> driver
> > > report different tjmax values for different cores. Did anyone
> ever
> > > see this? If not, I would feel comfortable reading tjmax from
> > > MSR_TEMPERATURE_TARGET only once per package for all models.
> This
> > > would make driver initialization faster.
> >
> > I contacted SDM author and hardware architect.
>
> Thanks a lot for doing this!
>
> > Yes, the "Unique" term for Sandy Bridge is wrong. It should be
> "Thread" instead. Next SDM will fix this..
>
> This would bring it in line with Nehalem. However this doesn't look
> right. Remember that IA32_THERM_STATUS (MSR 19Ch) has scope "Core", so
> IA32_TEMPERATURE_TARGET (MSR 1A2h) can't have scope "Thread", because
> you need to combine the values from both MSRs to return the current
> temperature. IA32_TEMPERATURE_TARGET must have a scope equal to or
> broader than the scope of IA32_THERM_STATUS, otherwise the computation
> is not possible. What do the SDM author and hardware architect have to
> say about this?
>
> Furthermore, on Sandy Bridge, IA32_PACKAGE_THERM_STATUS (MSR 1B1h) has
> scope "Package" but it is a relative value (just as IA32_THERM_STATUS.)
> To what can it be relative, if there is no package-level Temperature
> Target available? For this reason I really believe that
> IA32_TEMPERATURE_TARGET (MSR 1A2h) has scope "Package" on Sandy Bridge.
> Again, I am curious what your contacts will have to say about this.
>
Your argument is valid. Next SDM will change the scope to "Package" for SNB. The scope for Nehalem is still "Thread". I think they just want to keep it a quirk of legacy on Nehalem.
> > The interface was defined by in P4 as per thread. They believe there
> is no viable usage of setting different temp targets on siblings.
> > So one could say the architecture of per-thread interface is a quirk
> of legacy.
> >
> > And you can read the target temperature on one thread and it will
> apply to all cores in a package.
>
> OK, great. I'll prepare a patch, I'll post it when I am done testing
> it.
I think we can work on the patch to view the package scope for this MSR. I believe you will add a comment to explain why we need this patch.
>
> Thanks,
> --
> Jean Delvare
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2011-06-09 21:01 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-06-01 18:32 [lm-sensors] [PATCH] hwmon: (coretemp) Further relax Yu, Fenghua
2011-06-06 13:57 ` Jean Delvare
2011-06-06 14:48 ` Guenter Roeck
2011-06-08 9:31 ` Jean Delvare
2011-06-08 10:17 ` R, Durgadoss
2011-06-08 16:45 ` Yu, Fenghua
2011-06-08 18:32 ` Jean Delvare
2011-06-09 1:01 ` Yu, Fenghua
2011-06-09 8:19 ` Jean Delvare
2011-06-09 21:01 ` Yu, Fenghua
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.