linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/5] Package Level Thermal Control and Power Limit Notification: feature enabling
       [not found] ` <4C485DF1.5050407@linux.intel.com>
@ 2010-07-22 16:21   ` Fenghua Yu
  2010-07-22 16:21   ` [PATCH 2/5] Package Level Thermal Control and Power Limit Notification: pkgtemp hwmon driver Fenghua Yu
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Fenghua Yu @ 2010-07-22 16:21 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Brown, Len,
	Chen Gong, Jean Delvare, Guenter Roeck, Huaxu Wan
  Cc: lkml, lm-sensors


Add package level thermal and power limit feature support in the kernel.

The two MSRs and features are new starting with Intel's Sandy Bridge processor.

Please check Intel 64 and IA-32 Architectures SDMV Vol 3A 14.5.6 Power Limit
Notification and 14.6 Package Level Thermal Management.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
---

 include/asm/cpufeature.h          |    2 ++
 include/asm/msr-index.h           |    3 +++
 kernel/cpu/addon_cpuid_features.c |    2 ++
 3 files changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 4681459..79517b5 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -162,6 +162,8 @@
 #define X86_FEATURE_IDA		(7*32+ 0) /* Intel Dynamic Acceleration */
 #define X86_FEATURE_ARAT	(7*32+ 1) /* Always Running APIC Timer */
 #define X86_FEATURE_CPB		(7*32+ 2) /* AMD Core Performance Boost */
+#define X86_FEATURE_PLN		(7*32+ 3) /* Intel Power Limit Notification */
+#define X86_FEATURE_PTS		(7*32+ 4) /* Intel Package Thermal Status */
 
 /* Virtualization flags: Linux defined */
 #define X86_FEATURE_TPR_SHADOW  (8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 8c7ae43..5eaad6f 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -239,6 +239,9 @@
 
 #define MSR_IA32_TEMPERATURE_TARGET	0x000001a2
 
+#define MSR_IA32_PACKAGE_THERM_STATUS		0x000001b1
+#define MSR_IA32_PACKAGE_THERM_INTERRUPT	0x000001b2
+
 /* MISC_ENABLE bits: architectural */
 #define MSR_IA32_MISC_ENABLE_FAST_STRING	(1ULL << 0)
 #define MSR_IA32_MISC_ENABLE_TCC		(1ULL << 1)
diff --git a/arch/x86/kernel/cpu/addon_cpuid_features.c b/arch/x86/kernel/cpu/addon_cpuid_features.c
index 10fa568..c1f4b98 100644
--- a/arch/x86/kernel/cpu/addon_cpuid_features.c
+++ b/arch/x86/kernel/cpu/addon_cpuid_features.c
@@ -32,6 +32,8 @@ void __cpuinit init_scattered_cpuid_features(struct cpuinfo_x86 *c)
 	static const struct cpuid_bit __cpuinitconst cpuid_bits[] = {
 		{ X86_FEATURE_IDA,   		CR_EAX, 1, 0x00000006 },
 		{ X86_FEATURE_ARAT,  		CR_EAX, 2, 0x00000006 },
+		{ X86_FEATURE_PLN,		CR_EAX, 4, 0x00000006 },
+		{ X86_FEATURE_PTS,		CR_EAX, 6, 0x00000006 },
 		{ X86_FEATURE_APERFMPERF,	CR_ECX, 0, 0x00000006 },
 		{ X86_FEATURE_CPB,   		CR_EDX, 9, 0x80000007 },
 		{ X86_FEATURE_NPT,   		CR_EDX, 0, 0x8000000a },

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/5] Package Level Thermal Control and Power Limit Notification: pkgtemp hwmon driver
       [not found] ` <4C485DF1.5050407@linux.intel.com>
  2010-07-22 16:21   ` [PATCH 1/5] Package Level Thermal Control and Power Limit Notification: feature enabling Fenghua Yu
@ 2010-07-22 16:21   ` Fenghua Yu
  2010-07-22 16:22   ` [PATCH 3/5] Package Level Thermal Control and Power Limit Notification: thermal throttling Fenghua Yu
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Fenghua Yu @ 2010-07-22 16:21 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Brown, Len,
	Chen Gong, Jean Delvare, Guenter Roeck, Huaxu Wan
  Cc: lkml, lm-sensors

This patch adds a hwmon driver for package level thermal control. The driver
dumps package level thermal information through sysfs interface so that upper
level application (e.g. lm_sensor) can retrive the information.

Instead of having the package level hwmon code in coretemp, I write a seperate
driver pkgtemp because:

First, package level thermal sensors include not only sensors for each core,
but also sensors for uncore, memory controller or other components in the
package. Logically it will be clear to have a seperate hwmon driver for package
level hwmon to monitor wider range of sensors in a package. Merging package
thermal driver into core thermal driver doesn't make sense and may mislead.

Secondly, merging the two drivers together may cause coding mess. It's easier
to include various package level sensors info if more sensor information is
implemented. Coretemp code needs to consider a lot of legacy machine cases.
Pkgtemp code only considers platform starting from Sandy Bridge.

On a 1Sx4Cx2T Sandy Bridge platform, lm-sensors dumps the pkgtemp and coretemp:

pkgtemp-isa-0000
Adapter: ISA adapter
Physical package id 0: +33.0°C  (high = +79.0°C, crit = +99.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:      +32.0°C  (high = +79.0°C, crit = +99.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Core 1:      +32.0°C  (high = +79.0°C, crit = +99.0°C)

coretemp-isa-0002
Adapter: ISA adapter
Core 2:      +32.0°C  (high = +79.0°C, crit = +99.0°C)

coretemp-isa-0003
Adapter: ISA adapter
Core 3:      +32.0°C  (high = +79.0°C, crit = +99.0°C)

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
---

 arch/x86/configs/i386_defconfig   |    1 
 arch/x86/configs/x86_64_defconfig |    1 
 drivers/hwmon/Kconfig             |    7 
 drivers/hwmon/Makefile            |    1 
 drivers/hwmon/pkgtemp.c           |  456 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 466 insertions(+)

diff --git a/arch/x86/configs/i386_defconfig b/arch/x86/configs/i386_defconfig
index d28fad1..e3a3243 100644
--- a/arch/x86/configs/i386_defconfig
+++ b/arch/x86/configs/i386_defconfig
@@ -1471,6 +1471,7 @@ CONFIG_HWMON=y
 # CONFIG_SENSORS_GL518SM is not set
 # CONFIG_SENSORS_GL520SM is not set
 # CONFIG_SENSORS_CORETEMP is not set
+# CONFIG_SENSORS_PKGTEMP is not set
 # CONFIG_SENSORS_IT87 is not set
 # CONFIG_SENSORS_LM63 is not set
 # CONFIG_SENSORS_LM75 is not set
diff --git a/arch/x86/configs/x86_64_defconfig b/arch/x86/configs/x86_64_defconfig
index 6c86acd..4251f83 100644
--- a/arch/x86/configs/x86_64_defconfig
+++ b/arch/x86/configs/x86_64_defconfig
@@ -1456,6 +1456,7 @@ CONFIG_HWMON=y
 # CONFIG_SENSORS_GL518SM is not set
 # CONFIG_SENSORS_GL520SM is not set
 # CONFIG_SENSORS_CORETEMP is not set
+# CONFIG_SENSORS_PKGTEMP is not set
 # CONFIG_SENSORS_IT87 is not set
 # CONFIG_SENSORS_LM63 is not set
 # CONFIG_SENSORS_LM75 is not set
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index e19cf8e..3a858e8 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -407,6 +407,13 @@ config SENSORS_CORETEMP
 	  sensor inside your CPU. Most of the family 6 CPUs
 	  are supported. Check documentation/driver for details.
 
+config SENSORS_PKGTEMP
+	tristate "Intel processor package temperature sensor"
+	depends on X86 && PCI && EXPERIMENTAL
+	help
+	  If you say yes here you get support for the package level temperature
+	  sensor inside your CPU. Check documentation/driver for details.
+
 config SENSORS_IBMAEM
 	tristate "IBM Active Energy Manager temperature/power sensors and control"
 	select IPMI_SI
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index 2138ceb..879814e 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -39,6 +39,7 @@ obj-$(CONFIG_SENSORS_AMS)	+= ams/
 obj-$(CONFIG_SENSORS_ASC7621)	+= asc7621.o
 obj-$(CONFIG_SENSORS_ATXP1)	+= atxp1.o
 obj-$(CONFIG_SENSORS_CORETEMP)	+= coretemp.o
+obj-$(CONFIG_SENSORS_PKGTEMP)	+= pkgtemp.o
 obj-$(CONFIG_SENSORS_DME1737)	+= dme1737.o
 obj-$(CONFIG_SENSORS_DS1621)	+= ds1621.o
 obj-$(CONFIG_SENSORS_EMC1403)	+= emc1403.o
diff --git a/drivers/hwmon/pkgtemp.c b/drivers/hwmon/pkgtemp.c
new file mode 100644
index 0000000..fcfe2df
--- /dev/null
+++ b/drivers/hwmon/pkgtemp.c
@@ -0,0 +1,456 @@
+/*
+ * pkgtemp.c - Linux kernel module for processor package hardware monitoring
+ *
+ * Copyright (C) 2010 Fenghua Yu <fenghua.yu@intel.com>
+ *
+ * Inspired from many hwmon drivers especially coretemp.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301 USA.
+ */
+
+#include <linux/module.h>
+#include <linux/delay.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/jiffies.h>
+#include <linux/hwmon.h>
+#include <linux/sysfs.h>
+#include <linux/hwmon-sysfs.h>
+#include <linux/err.h>
+#include <linux/mutex.h>
+#include <linux/list.h>
+#include <linux/platform_device.h>
+#include <linux/cpu.h>
+#include <linux/pci.h>
+#include <asm/msr.h>
+#include <asm/processor.h>
+
+#define DRVNAME	"pkgtemp"
+
+enum { SHOW_TEMP, SHOW_TJMAX, SHOW_TTARGET, SHOW_LABEL, SHOW_NAME } SHOW;
+
+/*
+ * Functions declaration
+ */
+
+static struct pkgtemp_data *pkgtemp_update_device(struct device *dev);
+
+struct pkgtemp_data {
+	struct device *hwmon_dev;
+	struct mutex update_lock;
+	const char *name;
+	u32 id;
+	u16 phys_proc_id;
+	char valid;		/* zero until following fields are valid */
+	unsigned long last_updated;	/* in jiffies */
+	int temp;
+	int tjmax;
+	int ttarget;
+	u8 alarm;
+};
+
+/*
+ * Sysfs stuff
+ */
+
+static ssize_t show_name(struct device *dev, struct device_attribute
+			  *devattr, char *buf)
+{
+	int ret;
+	struct sensor_device_attribute *attr = to_sensor_dev_attr(devattr);
+	struct pkgtemp_data *data = dev_get_drvdata(dev);
+
+	if (attr->index == SHOW_NAME)
+		ret = sprintf(buf, "%s\n", data->name);
+	else	/* show label */
+		ret = sprintf(buf, "Physical package id %d\n",
+			      data->phys_proc_id);
+	return ret;
+}
+
+static ssize_t show_alarm(struct device *dev, struct device_attribute
+			  *devattr, char *buf)
+{
+	struct pkgtemp_data *data = pkgtemp_update_device(dev);
+	/* read the Out-of-spec log, never clear */
+	return sprintf(buf, "%d\n", data->alarm);
+}
+
+static ssize_t show_temp(struct device *dev,
+			 struct device_attribute *devattr, char *buf)
+{
+	struct sensor_device_attribute *attr = to_sensor_dev_attr(devattr);
+	struct pkgtemp_data *data = pkgtemp_update_device(dev);
+	int err = 0;
+
+	if (attr->index == SHOW_TEMP)
+		err = data->valid ? sprintf(buf, "%d\n", data->temp) : -EAGAIN;
+	else if (attr->index == SHOW_TJMAX)
+		err = sprintf(buf, "%d\n", data->tjmax);
+	else
+		err = sprintf(buf, "%d\n", data->ttarget);
+	return err;
+}
+
+static SENSOR_DEVICE_ATTR(temp1_input, S_IRUGO, show_temp, NULL, SHOW_TEMP);
+static SENSOR_DEVICE_ATTR(temp1_crit, S_IRUGO, show_temp, NULL, SHOW_TJMAX);
+static SENSOR_DEVICE_ATTR(temp1_max, S_IRUGO, show_temp, NULL, SHOW_TTARGET);
+static DEVICE_ATTR(temp1_crit_alarm, S_IRUGO, show_alarm, NULL);
+static SENSOR_DEVICE_ATTR(temp1_label, S_IRUGO, show_name, NULL, SHOW_LABEL);
+static SENSOR_DEVICE_ATTR(name, S_IRUGO, show_name, NULL, SHOW_NAME);
+
+static struct attribute *pkgtemp_attributes[] = {
+	&sensor_dev_attr_name.dev_attr.attr,
+	&sensor_dev_attr_temp1_label.dev_attr.attr,
+	&dev_attr_temp1_crit_alarm.attr,
+	&sensor_dev_attr_temp1_input.dev_attr.attr,
+	&sensor_dev_attr_temp1_crit.dev_attr.attr,
+	NULL
+};
+
+static const struct attribute_group pkgtemp_group = {
+	.attrs = pkgtemp_attributes,
+};
+
+static struct pkgtemp_data *pkgtemp_update_device(struct device *dev)
+{
+	struct pkgtemp_data *data = dev_get_drvdata(dev);
+	unsigned int cpu;
+	int err;
+
+	mutex_lock(&data->update_lock);
+
+	if (!data->valid || time_after(jiffies, data->last_updated + HZ)) {
+		u32 eax, edx;
+
+		data->valid = 0;
+		cpu = data->id;
+		err = rdmsr_on_cpu(cpu, MSR_IA32_PACKAGE_THERM_STATUS,
+				   &eax, &edx);
+		if (!err) {
+			data->alarm = (eax >> 5) & 1;
+			data->temp = data->tjmax - (((eax >> 16)
+							& 0x7f) * 1000);
+			data->valid = 1;
+		} else
+			dev_dbg(dev, "Temperature data invalid (0x%x)\n", eax);
+
+		data->last_updated = jiffies;
+	}
+
+	mutex_unlock(&data->update_lock);
+	return data;
+}
+
+static int get_tjmax(int cpu, struct device *dev)
+{
+	int default_tjmax = 100000;
+	int err;
+	u32 eax, edx;
+	u32 val;
+
+	/* IA32_TEMPERATURE_TARGET contains the TjMax value */
+	err = rdmsr_safe_on_cpu(cpu, MSR_IA32_TEMPERATURE_TARGET, &eax, &edx);
+	if (!err) {
+		val = (eax >> 16) & 0xff;
+		if ((val > 80) && (val < 120)) {
+			dev_info(dev, "TjMax is %d C.\n", val);
+			return val * 1000;
+		}
+	}
+	dev_warn(dev, "Unable to read TjMax from CPU.\n");
+	return default_tjmax;
+}
+
+static int __devinit pkgtemp_probe(struct platform_device *pdev)
+{
+	struct pkgtemp_data *data;
+	int err;
+	u32 eax, edx;
+#ifdef CONFIG_SMP
+	struct cpuinfo_x86 *c = &cpu_data(pdev->id);
+#endif
+
+	data = kzalloc(sizeof(struct pkgtemp_data), GFP_KERNEL);
+	if (!data) {
+		err = -ENOMEM;
+		dev_err(&pdev->dev, "Out of memory\n");
+		goto exit;
+	}
+
+	data->id = pdev->id;
+#ifdef CONFIG_SMP
+	data->phys_proc_id = c->phys_proc_id;
+#endif
+	data->name = "pkgtemp";
+	mutex_init(&data->update_lock);
+
+	/* test if we can access the THERM_STATUS MSR */
+	err = rdmsr_safe_on_cpu(data->id, MSR_IA32_PACKAGE_THERM_STATUS,
+				&eax, &edx);
+	if (err) {
+		dev_err(&pdev->dev,
+			"Unable to access THERM_STATUS MSR, giving up\n");
+		goto exit_free;
+	}
+
+	data->tjmax = get_tjmax(data->id, &pdev->dev);
+	platform_set_drvdata(pdev, data);
+
+	err = rdmsr_safe_on_cpu(data->id, MSR_IA32_TEMPERATURE_TARGET,
+				&eax, &edx);
+	if (err) {
+		dev_warn(&pdev->dev, "Unable to read"
+				" IA32_TEMPERATURE_TARGET MSR\n");
+	} else {
+		data->ttarget = data->tjmax - (((eax >> 8) & 0xff) * 1000);
+		err = device_create_file(&pdev->dev,
+				&sensor_dev_attr_temp1_max.dev_attr);
+		if (err)
+			goto exit_free;
+	}
+
+	err = sysfs_create_group(&pdev->dev.kobj, &pkgtemp_group);
+	if (err)
+		goto exit_free;
+
+	data->hwmon_dev = hwmon_device_register(&pdev->dev);
+	if (IS_ERR(data->hwmon_dev)) {
+		err = PTR_ERR(data->hwmon_dev);
+		dev_err(&pdev->dev, "Class registration failed (%d)\n",
+			err);
+		goto exit_class;
+	}
+
+	return 0;
+
+exit_class:
+	sysfs_remove_group(&pdev->dev.kobj, &pkgtemp_group);
+exit_free:
+	kfree(data);
+exit:
+	return err;
+}
+
+static int __devexit pkgtemp_remove(struct platform_device *pdev)
+{
+	struct pkgtemp_data *data = platform_get_drvdata(pdev);
+
+	hwmon_device_unregister(data->hwmon_dev);
+	sysfs_remove_group(&pdev->dev.kobj, &pkgtemp_group);
+	platform_set_drvdata(pdev, NULL);
+	kfree(data);
+	return 0;
+}
+
+static struct platform_driver pkgtemp_driver = {
+	.driver = {
+		.owner = THIS_MODULE,
+		.name = DRVNAME,
+	},
+	.probe = pkgtemp_probe,
+	.remove = __devexit_p(pkgtemp_remove),
+};
+
+struct pdev_entry {
+	struct list_head list;
+	struct platform_device *pdev;
+	unsigned int cpu;
+#ifdef CONFIG_SMP
+	u16 phys_proc_id;
+#endif
+};
+
+static LIST_HEAD(pdev_list);
+static DEFINE_MUTEX(pdev_list_mutex);
+
+static int __cpuinit pkgtemp_device_add(unsigned int cpu)
+{
+	int err;
+	struct platform_device *pdev;
+	struct pdev_entry *pdev_entry;
+#ifdef CONFIG_SMP
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
+#endif
+
+	mutex_lock(&pdev_list_mutex);
+
+#ifdef CONFIG_SMP
+	/* Only keep the first entry in each package */
+	list_for_each_entry(pdev_entry, &pdev_list, list) {
+		if (c->phys_proc_id == pdev_entry->phys_proc_id) {
+			err = 0;        /* Not an error */
+			goto exit;
+		}
+	}
+#endif
+
+	pdev = platform_device_alloc(DRVNAME, cpu);
+	if (!pdev) {
+		err = -ENOMEM;
+		printk(KERN_ERR DRVNAME ": Device allocation failed\n");
+		goto exit;
+	}
+
+	pdev_entry = kzalloc(sizeof(struct pdev_entry), GFP_KERNEL);
+	if (!pdev_entry) {
+		err = -ENOMEM;
+		goto exit_device_put;
+	}
+
+	err = platform_device_add(pdev);
+	if (err) {
+		printk(KERN_ERR DRVNAME ": Device addition failed (%d)\n",
+		       err);
+		goto exit_device_free;
+	}
+
+#ifdef CONFIG_SMP
+	pdev_entry->phys_proc_id = c->phys_proc_id;
+#endif
+	pdev_entry->pdev = pdev;
+	pdev_entry->cpu = cpu;
+	list_add_tail(&pdev_entry->list, &pdev_list);
+	mutex_unlock(&pdev_list_mutex);
+
+	return 0;
+
+exit_device_free:
+	kfree(pdev_entry);
+exit_device_put:
+	platform_device_put(pdev);
+exit:
+	mutex_unlock(&pdev_list_mutex);
+	return err;
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
+static void pkgtemp_device_remove(unsigned int cpu)
+{
+	struct pdev_entry *p, *n;
+	unsigned int i;
+	int err;
+
+	mutex_lock(&pdev_list_mutex);
+	list_for_each_entry_safe(p, n, &pdev_list, list) {
+		if (p->cpu != cpu)
+			continue;
+
+		platform_device_unregister(p->pdev);
+		list_del(&p->list);
+		kfree(p);
+		for_each_cpu(i, cpu_core_mask(cpu)) {
+			if (i != cpu) {
+				err = pkgtemp_device_add(i);
+				if (!err)
+					break;
+			}
+		}
+		break;
+	}
+	mutex_unlock(&pdev_list_mutex);
+}
+
+static int __cpuinit pkgtemp_cpu_callback(struct notifier_block *nfb,
+				 unsigned long action, void *hcpu)
+{
+	unsigned int cpu = (unsigned long) hcpu;
+
+	switch (action) {
+	case CPU_ONLINE:
+	case CPU_DOWN_FAILED:
+		pkgtemp_device_add(cpu);
+		break;
+	case CPU_DOWN_PREPARE:
+		pkgtemp_device_remove(cpu);
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block pkgtemp_cpu_notifier __refdata = {
+	.notifier_call = pkgtemp_cpu_callback,
+};
+#endif				/* !CONFIG_HOTPLUG_CPU */
+
+static int __init pkgtemp_init(void)
+{
+	int i, err = -ENODEV;
+	struct pdev_entry *p, *n;
+
+	/* quick check if we run Intel */
+	if (cpu_data(0).x86_vendor != X86_VENDOR_INTEL)
+		goto exit;
+
+	err = platform_driver_register(&pkgtemp_driver);
+	if (err)
+		goto exit;
+
+	for_each_online_cpu(i) {
+		struct cpuinfo_x86 *c = &cpu_data(i);
+
+		if (!cpu_has(c, X86_FEATURE_PTS))
+			continue;
+
+		err = pkgtemp_device_add(i);
+		if (err)
+			goto exit_devices_unreg;
+	}
+	if (list_empty(&pdev_list)) {
+		err = -ENODEV;
+		goto exit_driver_unreg;
+	}
+
+#ifdef CONFIG_HOTPLUG_CPU
+	register_hotcpu_notifier(&pkgtemp_cpu_notifier);
+#endif
+	return 0;
+
+exit_devices_unreg:
+	mutex_lock(&pdev_list_mutex);
+	list_for_each_entry_safe(p, n, &pdev_list, list) {
+		platform_device_unregister(p->pdev);
+		list_del(&p->list);
+		kfree(p);
+	}
+	mutex_unlock(&pdev_list_mutex);
+exit_driver_unreg:
+	platform_driver_unregister(&pkgtemp_driver);
+exit:
+	return err;
+}
+
+static void __exit pkgtemp_exit(void)
+{
+	struct pdev_entry *p, *n;
+#ifdef CONFIG_HOTPLUG_CPU
+	unregister_hotcpu_notifier(&pkgtemp_cpu_notifier);
+#endif
+	mutex_lock(&pdev_list_mutex);
+	list_for_each_entry_safe(p, n, &pdev_list, list) {
+		platform_device_unregister(p->pdev);
+		list_del(&p->list);
+		kfree(p);
+	}
+	mutex_unlock(&pdev_list_mutex);
+	platform_driver_unregister(&pkgtemp_driver);
+}
+
+MODULE_AUTHOR("Fenghua Yu <fenghua.yu@intel.com>");
+MODULE_DESCRIPTION("Intel processor package temperature monitor");
+MODULE_LICENSE("GPL");
+
+module_init(pkgtemp_init)
+module_exit(pkgtemp_exit)

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/5] Package Level Thermal Control and Power Limit Notification: thermal throttling
       [not found] ` <4C485DF1.5050407@linux.intel.com>
  2010-07-22 16:21   ` [PATCH 1/5] Package Level Thermal Control and Power Limit Notification: feature enabling Fenghua Yu
  2010-07-22 16:21   ` [PATCH 2/5] Package Level Thermal Control and Power Limit Notification: pkgtemp hwmon driver Fenghua Yu
@ 2010-07-22 16:22   ` Fenghua Yu
  2010-07-22 16:22   ` [PATCH 4/5] Package Level Thermal Control and Power Limit Notification: power limit notification Fenghua Yu
  2010-07-22 16:22   ` [PATCH 5/5] Package Level Thermal Control and Power Limit Notification: pkgtemp doc Fenghua Yu
  4 siblings, 0 replies; 17+ messages in thread
From: Fenghua Yu @ 2010-07-22 16:22 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Brown, Len,
	Chen Gong, Jean Delvare, Guenter Roeck, Huaxu Wan
  Cc: lkml, lm-sensors

Add package level thermal throttle interrupt support. The interrupt handler
increases package level thermal throttle count. It also logs the event in MCE
log.

The package level thermal throttle interrupt happens across threads in a
package. Each thread handles the interrupt individually. User level application
is supposed to retrieve correct event count and log based on package/thread
topology. This is the same situation for core level interrupt handler.

core_throttle_count and package_throttle_count are used for user interface.
Previously only throttle_count is used for core throttle count. If you think
new core_throttle_count name breaks user interface, I can change this part.

This patch also fixes a bug which defines reverse THERM_INT_LOW_ENABLE bit and
THERM_INT_HIGH_ENABLE bit.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
---

 include/asm/msr-index.h         |   10 +++-
 kernel/cpu/mcheck/therm_throt.c |   89 +++++++++++++++++++++++++++++++---------
 2 files changed, 79 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 5eaad6f..f1c1b82 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -224,8 +224,8 @@
 #define MSR_IA32_THERM_CONTROL		0x0000019a
 #define MSR_IA32_THERM_INTERRUPT	0x0000019b
 
-#define THERM_INT_LOW_ENABLE		(1 << 0)
-#define THERM_INT_HIGH_ENABLE		(1 << 1)
+#define THERM_INT_HIGH_ENABLE		(1 << 0)
+#define THERM_INT_LOW_ENABLE		(1 << 1)
 
 #define MSR_IA32_THERM_STATUS		0x0000019c
 
@@ -240,8 +240,14 @@
 #define MSR_IA32_TEMPERATURE_TARGET	0x000001a2
 
 #define MSR_IA32_PACKAGE_THERM_STATUS		0x000001b1
+
+#define PACKAGE_THERM_STATUS_PROCHOT		(1 << 0)
+
 #define MSR_IA32_PACKAGE_THERM_INTERRUPT	0x000001b2
 
+#define PACKAGE_THERM_INT_HIGH_ENABLE		(1 << 0)
+#define PACKAGE_THERM_INT_LOW_ENABLE		(1 << 1)
+
 /* MISC_ENABLE bits: architectural */
 #define MSR_IA32_MISC_ENABLE_FAST_STRING	(1ULL << 0)
 #define MSR_IA32_MISC_ENABLE_TCC		(1ULL << 1)
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index e1a0a3b..d307f9f 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -37,7 +37,7 @@
 /*
  * Current thermal throttling state:
  */
-struct thermal_state {
+struct _thermal_state {
 	bool			is_throttled;
 
 	u64			next_check;
@@ -45,6 +45,11 @@ struct thermal_state {
 	unsigned long		last_throttle_count;
 };
 
+struct thermal_state {
+	struct _thermal_state core;
+	struct _thermal_state package;
+};
+
 static DEFINE_PER_CPU(struct thermal_state, thermal_state);
 
 static atomic_t therm_throt_en	= ATOMIC_INIT(0);
@@ -53,11 +58,13 @@ static u32 lvtthmr_init __read_mostly;
 
 #ifdef CONFIG_SYSFS
 #define define_therm_throt_sysdev_one_ro(_name)				\
-	static SYSDEV_ATTR(_name, 0444, therm_throt_sysdev_show_##_name, NULL)
+	static SYSDEV_ATTR(_name, 0444,					\
+			   therm_throt_sysdev_show_##_name,		\
+				   NULL)				\
 
-#define define_therm_throt_sysdev_show_func(name)			\
+#define define_therm_throt_sysdev_show_func(level, name)		\
 									\
-static ssize_t therm_throt_sysdev_show_##name(				\
+static ssize_t therm_throt_sysdev_show_##level##_##name(		\
 			struct sys_device *dev,				\
 			struct sysdev_attribute *attr,			\
 			char *buf)					\
@@ -66,21 +73,24 @@ static ssize_t therm_throt_sysdev_show_##name(				\
 	ssize_t ret;							\
 									\
 	preempt_disable();	/* CPU hotplug */			\
-	if (cpu_online(cpu))						\
+	if (cpu_online(cpu)) {						\
 		ret = sprintf(buf, "%lu\n",				\
-			      per_cpu(thermal_state, cpu).name);	\
-	else								\
+			      per_cpu(thermal_state, cpu).level.name);	\
+	} else								\
 		ret = 0;						\
 	preempt_enable();						\
 									\
 	return ret;							\
 }
 
-define_therm_throt_sysdev_show_func(throttle_count);
-define_therm_throt_sysdev_one_ro(throttle_count);
+define_therm_throt_sysdev_show_func(core, throttle_count);
+define_therm_throt_sysdev_one_ro(core_throttle_count);
+
+define_therm_throt_sysdev_show_func(package, throttle_count);
+define_therm_throt_sysdev_one_ro(package_throttle_count);
 
 static struct attribute *thermal_throttle_attrs[] = {
-	&attr_throttle_count.attr,
+	&attr_core_throttle_count.attr,
 	NULL
 };
 
@@ -106,16 +116,21 @@ static struct attribute_group thermal_throttle_attr_group = {
  *          1 : Event should be logged further, and a message has been
  *              printed to the syslog.
  */
-static int therm_throt_process(bool is_throttled)
+#define CORE_LEVEL	0
+#define PACKAGE_LEVEL	1
+static int therm_throt_process(bool is_throttled, int level)
 {
-	struct thermal_state *state;
+	struct _thermal_state *state;
 	unsigned int this_cpu;
 	bool was_throttled;
 	u64 now;
 
 	this_cpu = smp_processor_id();
 	now = get_jiffies_64();
-	state = &per_cpu(thermal_state, this_cpu);
+	if (level == CORE_LEVEL)
+		state = &per_cpu(thermal_state, this_cpu).core;
+	else
+		state = &per_cpu(thermal_state, this_cpu).package;
 
 	was_throttled = state->is_throttled;
 	state->is_throttled = is_throttled;
@@ -132,13 +147,18 @@ static int therm_throt_process(bool is_throttled)
 
 	/* if we just entered the thermal event */
 	if (is_throttled) {
-		printk(KERN_CRIT "CPU%d: Temperature above threshold, cpu clock throttled (total events = %lu)\n", this_cpu, state->throttle_count);
+		printk(KERN_CRIT "CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
+		      this_cpu,
+		      level == CORE_LEVEL ? "Core" : "Package",
+		      state->throttle_count);
 
 		add_taint(TAINT_MACHINE_CHECK);
 		return 1;
 	}
 	if (was_throttled) {
-		printk(KERN_INFO "CPU%d: Temperature/speed normal\n", this_cpu);
+		printk(KERN_INFO "CPU%d: %s temperature/speed normal\n",
+		       this_cpu,
+		       level == CORE_LEVEL ? "Core" : "Package");
 		return 1;
 	}
 
@@ -149,8 +169,19 @@ static int therm_throt_process(bool is_throttled)
 /* Add/Remove thermal_throttle interface for CPU device: */
 static __cpuinit int thermal_throttle_add_dev(struct sys_device *sys_dev)
 {
-	return sysfs_create_group(&sys_dev->kobj,
-				  &thermal_throttle_attr_group);
+	int err;
+	struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
+
+	err = sysfs_create_group(&sys_dev->kobj, &thermal_throttle_attr_group);
+	if (err)
+		return err;
+
+	if (cpu_has(c, X86_FEATURE_PTS))
+		err = sysfs_add_file_to_group(&sys_dev->kobj,
+					      &attr_package_throttle_count.attr,
+					      thermal_throttle_attr_group.name);
+
+	return err;
 }
 
 static __cpuinit void thermal_throttle_remove_dev(struct sys_device *sys_dev)
@@ -230,10 +261,25 @@ device_initcall(thermal_throttle_init_device);
 static void intel_thermal_interrupt(void)
 {
 	__u64 msr_val;
+	struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
 
 	rdmsrl(MSR_IA32_THERM_STATUS, msr_val);
-	if (therm_throt_process((msr_val & THERM_STATUS_PROCHOT) != 0))
+	if (therm_throt_process(msr_val & THERM_STATUS_PROCHOT,
+				CORE_LEVEL) != 0)
 		mce_log_therm_throt_event(msr_val);
+
+	if (cpu_has(c, X86_FEATURE_PTS)) {
+		rdmsrl(MSR_IA32_PACKAGE_THERM_STATUS, msr_val);
+		if (therm_throt_process(msr_val & PACKAGE_THERM_STATUS_PROCHOT,
+					PACKAGE_LEVEL) != 0)
+			/*
+			 * Set up the most significant bit to notify mce log
+			 * that this thermal event is a package level event.
+			 * This is a temp solution. May be changed in the future
+			 * with mce log infrasture.
+			 */
+			mce_log_therm_throt_event(((__u64)1 << 63) | msr_val);
+	}
 }
 
 static void unexpected_thermal_interrupt(void)
@@ -338,6 +384,13 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
 	wrmsr(MSR_IA32_THERM_INTERRUPT,
 		l | (THERM_INT_LOW_ENABLE | THERM_INT_HIGH_ENABLE), h);
 
+	if (cpu_has(c, X86_FEATURE_PTS)) {
+		rdmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT, l, h);
+		wrmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT,
+			l | (PACKAGE_THERM_INT_LOW_ENABLE
+		  | PACKAGE_THERM_INT_HIGH_ENABLE), h);
+	}
+
 	smp_thermal_vector = intel_thermal_interrupt;
 
 	rdmsr(MSR_IA32_MISC_ENABLE, l, h);

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/5] Package Level Thermal Control and Power Limit Notification: power limit notification
       [not found] ` <4C485DF1.5050407@linux.intel.com>
                     ` (2 preceding siblings ...)
  2010-07-22 16:22   ` [PATCH 3/5] Package Level Thermal Control and Power Limit Notification: thermal throttling Fenghua Yu
@ 2010-07-22 16:22   ` Fenghua Yu
  2010-07-22 16:22   ` [PATCH 5/5] Package Level Thermal Control and Power Limit Notification: pkgtemp doc Fenghua Yu
  4 siblings, 0 replies; 17+ messages in thread
From: Fenghua Yu @ 2010-07-22 16:22 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Brown, Len,
	Chen Gong, Jean Delvare, Guenter Roeck, Huaxu Wan
  Cc: lkml, lm-sensors

Power limit notification feature is published in Intel 64 and IA-32
Architectures SDMV Vol 3A 14.5.6 Power Limit Notification.

It is implemented first on Intel Sandy Bridge platform.

The patch handles notification interrupt. Interrupt handler dumps power limit
information in log_buf, logs the event in mce log, and increases the event
counters (core_power_limit and package_power_limit). Upper level applications
could use the data to detect system health or diagnose functionality/performance
issues.

In the future, the event could be handled in a more fancy way.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
---

 include/asm/msr-index.h         |    4 
 kernel/cpu/mcheck/therm_throt.c |  186 ++++++++++++++++++++++++++++------------
 2 files changed, 136 insertions(+), 54 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index f1c1b82..f3c8a7a 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -226,10 +226,12 @@
 
 #define THERM_INT_HIGH_ENABLE		(1 << 0)
 #define THERM_INT_LOW_ENABLE		(1 << 1)
+#define THERM_INT_PLN_ENABLE		(1 << 24)
 
 #define MSR_IA32_THERM_STATUS		0x0000019c
 
 #define THERM_STATUS_PROCHOT		(1 << 0)
+#define THERM_STATUS_POWER_LIMIT	(1 << 10)
 
 #define MSR_THERM2_CTL			0x0000019d
 
@@ -242,11 +244,13 @@
 #define MSR_IA32_PACKAGE_THERM_STATUS		0x000001b1
 
 #define PACKAGE_THERM_STATUS_PROCHOT		(1 << 0)
+#define PACKAGE_THERM_STATUS_POWER_LIMIT	(1 << 10)
 
 #define MSR_IA32_PACKAGE_THERM_INTERRUPT	0x000001b2
 
 #define PACKAGE_THERM_INT_HIGH_ENABLE		(1 << 0)
 #define PACKAGE_THERM_INT_LOW_ENABLE		(1 << 1)
+#define PACKAGE_THERM_INT_PLN_ENABLE		(1 << 24)
 
 /* MISC_ENABLE bits: architectural */
 #define MSR_IA32_MISC_ENABLE_FAST_STRING	(1ULL << 0)
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index d307f9f..0c3d1e0 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -34,20 +34,25 @@
 /* How long to wait between reporting thermal events */
 #define CHECK_INTERVAL		(300 * HZ)
 
+#define THERMAL_THROTTLING_EVENT	0
+#define POWER_LIMIT_EVENT		1
+
 /*
- * Current thermal throttling state:
+ * Current thermal event state:
  */
 struct _thermal_state {
-	bool			is_throttled;
-
+	bool			new_event;
+	int			event;
 	u64			next_check;
-	unsigned long		throttle_count;
-	unsigned long		last_throttle_count;
+	unsigned long		count;
+	unsigned long		last_count;
 };
 
 struct thermal_state {
-	struct _thermal_state core;
-	struct _thermal_state package;
+	struct _thermal_state core_throttle;
+	struct _thermal_state core_power_limit;
+	struct _thermal_state package_throttle;
+	struct _thermal_state package_power_limit;
 };
 
 static DEFINE_PER_CPU(struct thermal_state, thermal_state);
@@ -62,9 +67,9 @@ static u32 lvtthmr_init __read_mostly;
 			   therm_throt_sysdev_show_##_name,		\
 				   NULL)				\
 
-#define define_therm_throt_sysdev_show_func(level, name)		\
+#define define_therm_throt_sysdev_show_func(event, name)		\
 									\
-static ssize_t therm_throt_sysdev_show_##level##_##name(		\
+static ssize_t therm_throt_sysdev_show_##event##_##name(		\
 			struct sys_device *dev,				\
 			struct sysdev_attribute *attr,			\
 			char *buf)					\
@@ -75,7 +80,7 @@ static ssize_t therm_throt_sysdev_show_##level##_##name(		\
 	preempt_disable();	/* CPU hotplug */			\
 	if (cpu_online(cpu)) {						\
 		ret = sprintf(buf, "%lu\n",				\
-			      per_cpu(thermal_state, cpu).level.name);	\
+			      per_cpu(thermal_state, cpu).event.name);	\
 	} else								\
 		ret = 0;						\
 	preempt_enable();						\
@@ -83,23 +88,32 @@ static ssize_t therm_throt_sysdev_show_##level##_##name(		\
 	return ret;							\
 }
 
-define_therm_throt_sysdev_show_func(core, throttle_count);
+define_therm_throt_sysdev_show_func(core_throttle, count);
 define_therm_throt_sysdev_one_ro(core_throttle_count);
 
-define_therm_throt_sysdev_show_func(package, throttle_count);
+define_therm_throt_sysdev_show_func(core_power_limit, count);
+define_therm_throt_sysdev_one_ro(core_power_limit_count);
+
+define_therm_throt_sysdev_show_func(package_throttle, count);
 define_therm_throt_sysdev_one_ro(package_throttle_count);
 
+define_therm_throt_sysdev_show_func(package_power_limit, count);
+define_therm_throt_sysdev_one_ro(package_power_limit_count);
+
 static struct attribute *thermal_throttle_attrs[] = {
 	&attr_core_throttle_count.attr,
 	NULL
 };
 
-static struct attribute_group thermal_throttle_attr_group = {
+static struct attribute_group thermal_attr_group = {
 	.attrs	= thermal_throttle_attrs,
 	.name	= "thermal_throttle"
 };
 #endif /* CONFIG_SYSFS */
 
+#define CORE_LEVEL	0
+#define PACKAGE_LEVEL	1
+
 /***
  * therm_throt_process - Process thermal throttling event from interrupt
  * @curr: Whether the condition is current or not (boolean), since the
@@ -116,49 +130,73 @@ static struct attribute_group thermal_throttle_attr_group = {
  *          1 : Event should be logged further, and a message has been
  *              printed to the syslog.
  */
-#define CORE_LEVEL	0
-#define PACKAGE_LEVEL	1
-static int therm_throt_process(bool is_throttled, int level)
+static int therm_throt_process(bool new_event, int event, int level)
 {
 	struct _thermal_state *state;
-	unsigned int this_cpu;
-	bool was_throttled;
+	unsigned int this_cpu = smp_processor_id();
+	bool old_event;
 	u64 now;
+	struct thermal_state *pstate = &per_cpu(thermal_state, this_cpu);
+
+	if (!new_event)
+		return 0;
 
-	this_cpu = smp_processor_id();
 	now = get_jiffies_64();
-	if (level == CORE_LEVEL)
-		state = &per_cpu(thermal_state, this_cpu).core;
-	else
-		state = &per_cpu(thermal_state, this_cpu).package;
+	if (level == CORE_LEVEL) {
+		if (event == THERMAL_THROTTLING_EVENT)
+			state = &pstate->core_throttle;
+		else if (event == POWER_LIMIT_EVENT)
+			state = &pstate->core_power_limit;
+		else
+			 return 0;
+	} else if (level == PACKAGE_LEVEL) {
+		if (event == THERMAL_THROTTLING_EVENT)
+			state = &pstate->package_throttle;
+		else if (event == POWER_LIMIT_EVENT)
+			state = &pstate->package_power_limit;
+		else
+			return 0;
+	} else
+		return 0;
 
-	was_throttled = state->is_throttled;
-	state->is_throttled = is_throttled;
+	old_event = state->new_event;
+	state->new_event = new_event;
 
-	if (is_throttled)
-		state->throttle_count++;
+	if (new_event)
+		state->count++;
 
 	if (time_before64(now, state->next_check) &&
-			state->throttle_count != state->last_throttle_count)
+			state->count != state->last_count)
 		return 0;
 
 	state->next_check = now + CHECK_INTERVAL;
-	state->last_throttle_count = state->throttle_count;
+	state->last_count = state->count;
 
 	/* if we just entered the thermal event */
-	if (is_throttled) {
-		printk(KERN_CRIT "CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
-		      this_cpu,
-		      level == CORE_LEVEL ? "Core" : "Package",
-		      state->throttle_count);
+	if (new_event) {
+		if (event == THERMAL_THROTTLING_EVENT)
+			printk(KERN_CRIT "CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
+				this_cpu,
+				level == CORE_LEVEL ? "Core" : "Package",
+				state->count);
+		else
+			printk(KERN_CRIT "CPU%d: %s power limit notification (total events = %lu)\n",
+				this_cpu,
+				level == CORE_LEVEL ? "Core" : "Package",
+				state->count);
 
 		add_taint(TAINT_MACHINE_CHECK);
 		return 1;
 	}
-	if (was_throttled) {
-		printk(KERN_INFO "CPU%d: %s temperature/speed normal\n",
-		       this_cpu,
-		       level == CORE_LEVEL ? "Core" : "Package");
+	if (old_event) {
+		if (event == THERMAL_THROTTLING_EVENT)
+			printk(KERN_INFO "CPU%d: %s temperature/speed normal\n",
+				this_cpu,
+				level == CORE_LEVEL ? "Core" : "Package");
+		else
+			printk(KERN_INFO "CPU%d: %s power limit normal\n",
+				this_cpu,
+				level == CORE_LEVEL ? "Core" : "Package");
 		return 1;
 	}
 
@@ -172,21 +210,29 @@ static __cpuinit int thermal_throttle_add_dev(struct sys_device *sys_dev)
 	int err;
 	struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
 
-	err = sysfs_create_group(&sys_dev->kobj, &thermal_throttle_attr_group);
+	err = sysfs_create_group(&sys_dev->kobj, &thermal_attr_group);
 	if (err)
 		return err;
 
+	if (cpu_has(c, X86_FEATURE_PLN))
+		err = sysfs_add_file_to_group(&sys_dev->kobj,
+					      &attr_core_power_limit_count.attr,
+					      thermal_attr_group.name);
 	if (cpu_has(c, X86_FEATURE_PTS))
 		err = sysfs_add_file_to_group(&sys_dev->kobj,
 					      &attr_package_throttle_count.attr,
-					      thermal_throttle_attr_group.name);
+					      thermal_attr_group.name);
+		if (cpu_has(c, X86_FEATURE_PLN))
+			err = sysfs_add_file_to_group(&sys_dev->kobj,
+					&attr_package_power_limit_count.attr,
+					thermal_attr_group.name);
 
 	return err;
 }
 
 static __cpuinit void thermal_throttle_remove_dev(struct sys_device *sys_dev)
 {
-	sysfs_remove_group(&sys_dev->kobj, &thermal_throttle_attr_group);
+	sysfs_remove_group(&sys_dev->kobj, &thermal_attr_group);
 }
 
 /* Mutex protecting device creation against CPU hotplug: */
@@ -257,6 +303,17 @@ device_initcall(thermal_throttle_init_device);
 
 #endif /* CONFIG_SYSFS */
 
+/*
+ * Set up the most two significant bit to notify mce log that this thermal
+ * event type.
+ * This is a temp solution. May be changed in the future with mce log
+ * infrasture.
+ */
+#define CORE_THROTTLED		(0)
+#define CORE_POWER_LIMIT	((__u64)1 << 62)
+#define PACKAGE_THROTTLED	((__u64)2 << 62)
+#define PACKAGE_POWER_LIMIT	((__u64)3 << 62)
+
 /* Thermal transition interrupt handler */
 static void intel_thermal_interrupt(void)
 {
@@ -264,21 +321,31 @@ static void intel_thermal_interrupt(void)
 	struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
 
 	rdmsrl(MSR_IA32_THERM_STATUS, msr_val);
+
 	if (therm_throt_process(msr_val & THERM_STATUS_PROCHOT,
+				THERMAL_THROTTLING_EVENT,
 				CORE_LEVEL) != 0)
-		mce_log_therm_throt_event(msr_val);
+		mce_log_therm_throt_event(CORE_THROTTLED | msr_val);
+
+	if (cpu_has(c, X86_FEATURE_PLN))
+		if (therm_throt_process(msr_val & THERM_STATUS_POWER_LIMIT,
+					POWER_LIMIT_EVENT,
+					CORE_LEVEL) != 0)
+			mce_log_therm_throt_event(CORE_POWER_LIMIT | msr_val);
 
 	if (cpu_has(c, X86_FEATURE_PTS)) {
 		rdmsrl(MSR_IA32_PACKAGE_THERM_STATUS, msr_val);
 		if (therm_throt_process(msr_val & PACKAGE_THERM_STATUS_PROCHOT,
+					THERMAL_THROTTLING_EVENT,
 					PACKAGE_LEVEL) != 0)
-			/*
-			 * Set up the most significant bit to notify mce log
-			 * that this thermal event is a package level event.
-			 * This is a temp solution. May be changed in the future
-			 * with mce log infrasture.
-			 */
-			mce_log_therm_throt_event(((__u64)1 << 63) | msr_val);
+			mce_log_therm_throt_event(PACKAGE_THROTTLED | msr_val);
+		if (cpu_has(c, X86_FEATURE_PLN))
+			if (therm_throt_process(msr_val &
+					PACKAGE_THERM_STATUS_POWER_LIMIT,
+					POWER_LIMIT_EVENT,
+					PACKAGE_LEVEL) != 0)
+				mce_log_therm_throt_event(PACKAGE_POWER_LIMIT
+							  | msr_val);
 	}
 }
 
@@ -381,14 +448,25 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
 	apic_write(APIC_LVTTHMR, h);
 
 	rdmsr(MSR_IA32_THERM_INTERRUPT, l, h);
-	wrmsr(MSR_IA32_THERM_INTERRUPT,
-		l | (THERM_INT_LOW_ENABLE | THERM_INT_HIGH_ENABLE), h);
+	if (cpu_has(c, X86_FEATURE_PLN))
+		wrmsr(MSR_IA32_THERM_INTERRUPT,
+		      l | (THERM_INT_LOW_ENABLE
+			| THERM_INT_HIGH_ENABLE | THERM_INT_PLN_ENABLE), h);
+	else
+		wrmsr(MSR_IA32_THERM_INTERRUPT,
+		      l | (THERM_INT_LOW_ENABLE | THERM_INT_HIGH_ENABLE), h);
 
 	if (cpu_has(c, X86_FEATURE_PTS)) {
 		rdmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT, l, h);
-		wrmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT,
-			l | (PACKAGE_THERM_INT_LOW_ENABLE
-		  | PACKAGE_THERM_INT_HIGH_ENABLE), h);
+		if (cpu_has(c, X86_FEATURE_PLN))
+			wrmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT,
+			      l | (PACKAGE_THERM_INT_LOW_ENABLE
+				| PACKAGE_THERM_INT_HIGH_ENABLE
+				| PACKAGE_THERM_INT_PLN_ENABLE), h);
+		else
+			wrmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT,
+			      l | (PACKAGE_THERM_INT_LOW_ENABLE
+				| PACKAGE_THERM_INT_HIGH_ENABLE), h);
 	}
 
 	smp_thermal_vector = intel_thermal_interrupt;

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/5] Package Level Thermal Control and Power Limit Notification: pkgtemp doc
       [not found] ` <4C485DF1.5050407@linux.intel.com>
                     ` (3 preceding siblings ...)
  2010-07-22 16:22   ` [PATCH 4/5] Package Level Thermal Control and Power Limit Notification: power limit notification Fenghua Yu
@ 2010-07-22 16:22   ` Fenghua Yu
  2010-07-22 17:27     ` Guenter Roeck
  4 siblings, 1 reply; 17+ messages in thread
From: Fenghua Yu @ 2010-07-22 16:22 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Brown, Len,
	Chen Gong, Jean Delvare, Guenter Roeck, Huaxu Wan
  Cc: lkml, lm-sensors


Document for package level thermal hwmon driver.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
---

 pkgtemp |   36 ++++++++++++++++++++++++++++++++++++
 1 files changed, 36 insertions(+)

diff --git a/Documentation/hwmon/pkgtemp b/Documentation/hwmon/pkgtemp
new file mode 100644
index 0000000..a60d286
--- /dev/null
+++ b/Documentation/hwmon/pkgtemp
@@ -0,0 +1,36 @@
+Kernel driver pkgtemp
+======================
+
+Supported chips:
+  * Intel family
+    Prefix: 'pkgtemp'
+    CPUID:
+    Datasheet: Intel 64 and IA-32 Architectures Software Developer's Manual
+               Volume 3A: System Programming Guide
+
+Author: Fenghua Yu
+
+Description
+-----------
+
+This driver permits reading package level temperature sensor embedded inside
+Intel CPU package. The sensors can be in core, uncore, memroy controller, or
+other componenets in a package. The feature is first implemented in Intel Sandy
+Bridge platform.
+
+Temperature is measured in degrees Celsius and measurement resolution is
+1 degree C. Valid temperatures are from 0 to TjMax degrees C, because the actual
+value of temperature register is in fact a delta from TjMax.
+
+Temperature known as TjMax is the maximum junction temperature of package.
+Intel defines this temperature as 125C. At this temperature, protection
+mechanism will perform actions to forcibly cool down the processor. Alarm
+may be raised, if the temperature grows enough (more than TjMax) to trigger
+the Out-Of-Spec bit. Following table summarizes the exported sysfs files:
+
+temp1_input	 - Package temperature (in millidegrees Celsius).
+temp1_crit	 - Maximum junction temperature (in millidegrees Celsius).
+temp1_crit_alarm - Set when Out-of-spec bit is set, never clears.
+		   Correct CPU operation is no longer guaranteed.
+temp1_label	 - Contains string "Pysical package id X", where X is physical
+		   package id.

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] Package Level Thermal Control and Power Limit Notification: pkgtemp doc
  2010-07-22 16:22   ` [PATCH 5/5] Package Level Thermal Control and Power Limit Notification: pkgtemp doc Fenghua Yu
@ 2010-07-22 17:27     ` Guenter Roeck
  2010-07-22 17:52       ` Fenghua Yu
  0 siblings, 1 reply; 17+ messages in thread
From: Guenter Roeck @ 2010-07-22 17:27 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Brown, Len,
	Chen Gong, Jean Delvare, Huaxu Wan, lkml,
	lm-sensors@lm-sensors.org

On Thu, Jul 22, 2010 at 12:22:23PM -0400, Fenghua Yu wrote:
> 
> Document for package level thermal hwmon driver.
> 
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Len Brown <len.brown@intel.com>
> ---
> 
>  pkgtemp |   36 ++++++++++++++++++++++++++++++++++++
>  1 files changed, 36 insertions(+)
> 
> diff --git a/Documentation/hwmon/pkgtemp b/Documentation/hwmon/pkgtemp
> new file mode 100644
> index 0000000..a60d286
> --- /dev/null
> +++ b/Documentation/hwmon/pkgtemp
> @@ -0,0 +1,36 @@
> +Kernel driver pkgtemp
> +======================
> +
> +Supported chips:
> +  * Intel family
> +    Prefix: 'pkgtemp'
> +    CPUID:
> +    Datasheet: Intel 64 and IA-32 Architectures Software Developer's Manual
> +               Volume 3A: System Programming Guide
> +
> +Author: Fenghua Yu
> +
> +Description
> +-----------
> +
> +This driver permits reading package level temperature sensor embedded inside
> +Intel CPU package. The sensors can be in core, uncore, memroy controller, or

memroy --> memory

> +other componenets in a package. The feature is first implemented in Intel Sandy

componenets --> components

> +Bridge platform.
> +
Just for clarification - you mention a number of sensors, but unless
I am missing something only the package sensor is implemented. Is that correct ?

> +Temperature is measured in degrees Celsius and measurement resolution is
> +1 degree C. Valid temperatures are from 0 to TjMax degrees C, because the actual
> +value of temperature register is in fact a delta from TjMax.
> +
>From the code, it seems that negative values can be reported. Is it guaranteed 
by the chip that (TjMax - MSR_IA32_TEMPERATURE_TARGET) >= 0 ? Otherwise, the
minimum temperature would be (TjMax - 127).

> +Temperature known as TjMax is the maximum junction temperature of package.
> +Intel defines this temperature as 125C. At this temperature, protection

Your driver bails out at TjMax >= 120, so there is some inconsistency.
Also, it seems that this is not a constant, since you are reading
MSR_IA32_TEMPERATURE_TARGET to get the value.

Since the CPUs supporting the package sensor presumably also all support
reading TjMax, maybe you can reword the above text to reflect this.

> +mechanism will perform actions to forcibly cool down the processor. Alarm
> +may be raised, if the temperature grows enough (more than TjMax) to trigger
> +the Out-Of-Spec bit. Following table summarizes the exported sysfs files:
> +
> +temp1_input	 - Package temperature (in millidegrees Celsius).
> +temp1_crit	 - Maximum junction temperature (in millidegrees Celsius).
> +temp1_crit_alarm - Set when Out-of-spec bit is set, never clears.
> +		   Correct CPU operation is no longer guaranteed.
> +temp1_label	 - Contains string "Pysical package id X", where X is physical

Pysical --> Physical.
I would suggest to drop the "id" for consistency. Core sensor names don't
include "id" either.

I am not sure if "physical" should be included in the first place.
Also, above description suggests that future CPUs might add more sensors.
If so, the name should probably reflect the location of the current sensor,
ie be something like "Package core X" or "Core package X" or "Package X (core)".

> +		   package id.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] Package Level Thermal Control and Power Limit Notification: pkgtemp doc
  2010-07-22 17:27     ` Guenter Roeck
@ 2010-07-22 17:52       ` Fenghua Yu
  2010-07-22 18:58         ` Guenter Roeck
  0 siblings, 1 reply; 17+ messages in thread
From: Fenghua Yu @ 2010-07-22 17:52 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Yu, Fenghua, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Brown, Len, Chen Gong, Jean Delvare, Wan, Huaxu, lkml,
	lm-sensors@lm-sensors.org

> > +This driver permits reading package level temperature sensor embedded inside
> > +Intel CPU package. The sensors can be in core, uncore, memroy controller, or
> 
> memroy --> memory
Will change the typo.

> > +other componenets in a package. The feature is first implemented in Intel Sandy
> 
> componenets --> components
> 
Ditto.

> > +Bridge platform.
> > +
> Just for clarification - you mention a number of sensors, but unless
> I am missing something only the package sensor is implemented. Is that correct ?
Do you mean "only the core sensor"? I think the Sandy Bridge only implements
core sensor now. I would think in the future more sensors could be implemented.
If only core sensors are implemented, there is no need to have the package
level thermal hw MSRs because OS can get all package level thermal info by
enumerating all core sensors in a package.

> 
> > +Temperature is measured in degrees Celsius and measurement resolution is
> > +1 degree C. Valid temperatures are from 0 to TjMax degrees C, because the actual
> > +value of temperature register is in fact a delta from TjMax.
> > +
> From the code, it seems that negative values can be reported. Is it guaranteed 
> by the chip that (TjMax - MSR_IA32_TEMPERATURE_TARGET) >= 0 ? Otherwise, the
> minimum temperature would be (TjMax - 127).
> 
> > +Temperature known as TjMax is the maximum junction temperature of package.
> > +Intel defines this temperature as 125C. At this temperature, protection
> 
> Your driver bails out at TjMax >= 120, so there is some inconsistency.
> Also, it seems that this is not a constant, since you are reading
> MSR_IA32_TEMPERATURE_TARGET to get the value.
> 
> Since the CPUs supporting the package sensor presumably also all support
> reading TjMax, maybe you can reword the above text to reflect this.
> 
You are right. I forgot to update this part. I'll reword it.

> > +mechanism will perform actions to forcibly cool down the processor. Alarm
> > +may be raised, if the temperature grows enough (more than TjMax) to trigger
> > +the Out-Of-Spec bit. Following table summarizes the exported sysfs files:
> > +
> > +temp1_input	 - Package temperature (in millidegrees Celsius).
> > +temp1_crit	 - Maximum junction temperature (in millidegrees Celsius).
> > +temp1_crit_alarm - Set when Out-of-spec bit is set, never clears.
> > +		   Correct CPU operation is no longer guaranteed.
> > +temp1_label	 - Contains string "Pysical package id X", where X is physical
> 
> Pysical --> Physical.
> I would suggest to drop the "id" for consistency. Core sensor names don't
> include "id" either.
> 
> I am not sure if "physical" should be included in the first place.
> Also, above description suggests that future CPUs might add more sensors.
> If so, the name should probably reflect the location of the current sensor,
> ie be something like "Package core X" or "Core package X" or "Package X (core)".
How about just using "package X" to align with coretemp for now? Currently there
is no interface to tell which sensor it is.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] Package Level Thermal Control and Power Limit Notification: pkgtemp doc
  2010-07-22 17:52       ` Fenghua Yu
@ 2010-07-22 18:58         ` Guenter Roeck
  2010-07-22 21:21           ` Fenghua Yu
  0 siblings, 1 reply; 17+ messages in thread
From: Guenter Roeck @ 2010-07-22 18:58 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Brown, Len,
	Chen Gong, Jean Delvare, Wan, Huaxu, lkml,
	lm-sensors@lm-sensors.org

On Thu, Jul 22, 2010 at 01:52:25PM -0400, Fenghua Yu wrote:
> > > +This driver permits reading package level temperature sensor embedded inside
> > > +Intel CPU package. The sensors can be in core, uncore, memroy controller, or
> > 
> > memroy --> memory
> Will change the typo.
> 
> > > +other componenets in a package. The feature is first implemented in Intel Sandy
> > 
> > componenets --> components
> > 
> Ditto.
> 
> > > +Bridge platform.
> > > +
> > Just for clarification - you mention a number of sensors, but unless
> > I am missing something only the package sensor is implemented. Is that correct ?
> Do you mean "only the core sensor"? I think the Sandy Bridge only implements
> core sensor now. I would think in the future more sensors could be implemented.
> If only core sensors are implemented, there is no need to have the package
> level thermal hw MSRs because OS can get all package level thermal info by
> enumerating all core sensors in a package.
> 
Now you lost me. I understand there are the per-core sensors, read through coretemp,
plus one package level sensor. Do you associate this existing package level sensor
with any of the locations mentioned above (core, uncore, memory, other), or is
that unknown ? If it is unknown, you might want to mention it.

> > 
> > > +Temperature is measured in degrees Celsius and measurement resolution is
> > > +1 degree C. Valid temperatures are from 0 to TjMax degrees C, because the actual
> > > +value of temperature register is in fact a delta from TjMax.
> > > +
> > From the code, it seems that negative values can be reported. Is it guaranteed 
> > by the chip that (TjMax - MSR_IA32_TEMPERATURE_TARGET) >= 0 ? Otherwise, the
> > minimum temperature would be (TjMax - 127).
> > 
> > > +Temperature known as TjMax is the maximum junction temperature of package.
> > > +Intel defines this temperature as 125C. At this temperature, protection
> > 
> > Your driver bails out at TjMax >= 120, so there is some inconsistency.
> > Also, it seems that this is not a constant, since you are reading
> > MSR_IA32_TEMPERATURE_TARGET to get the value.
> > 
> > Since the CPUs supporting the package sensor presumably also all support
> > reading TjMax, maybe you can reword the above text to reflect this.
> > 
> You are right. I forgot to update this part. I'll reword it.
> 
> > > +mechanism will perform actions to forcibly cool down the processor. Alarm
> > > +may be raised, if the temperature grows enough (more than TjMax) to trigger
> > > +the Out-Of-Spec bit. Following table summarizes the exported sysfs files:
> > > +
> > > +temp1_input	 - Package temperature (in millidegrees Celsius).
> > > +temp1_crit	 - Maximum junction temperature (in millidegrees Celsius).
> > > +temp1_crit_alarm - Set when Out-of-spec bit is set, never clears.
> > > +		   Correct CPU operation is no longer guaranteed.
> > > +temp1_label	 - Contains string "Pysical package id X", where X is physical
> > 
> > Pysical --> Physical.
> > I would suggest to drop the "id" for consistency. Core sensor names don't
> > include "id" either.
> > 
> > I am not sure if "physical" should be included in the first place.
> > Also, above description suggests that future CPUs might add more sensors.
> > If so, the name should probably reflect the location of the current sensor,
> > ie be something like "Package core X" or "Core package X" or "Package X (core)".
> How about just using "package X" to align with coretemp for now? Currently there
> is no interface to tell which sensor it is.
> 
Fine with me. Maybe wait for feedback from others.

You use the argument that there may be other package level sensors in the future.
Are there any plans for this, or is this just a theory ?

Next question is how to handle future sensor types. One hwmon instance per sensor,
additional sensors in this driver, or even a new driver ?

We had was a separate discussion if the coretemp driver should be redesigned
to one instance per CPU. The package sensor would fit into that model,
since you would have

coretemp-isa-0000
Core0
Core1
...
CoreN
Package

coretemp-isa-0001
Core0
Core1
...
CoreM
Package

I personally would prefer that approach. It would avoid ambiguity associating Package X
with specific cores, and it would also easily expand to additional non-core future sensors.

Guenter


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] Package Level Thermal Control and Power Limit Notification: pkgtemp doc
  2010-07-22 18:58         ` Guenter Roeck
@ 2010-07-22 21:21           ` Fenghua Yu
  2010-08-19 15:46             ` Jean Delvare
  0 siblings, 1 reply; 17+ messages in thread
From: Fenghua Yu @ 2010-07-22 21:21 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Yu, Fenghua, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Brown, Len, Chen Gong, Jean Delvare, Wan, Huaxu, lkml,
	lm-sensors@lm-sensors.org

On Thu, Jul 22, 2010 at 11:58:14AM -0700, Guenter Roeck wrote:
> On Thu, Jul 22, 2010 at 01:52:25PM -0400, Fenghua Yu wrote:
> > > Just for clarification - you mention a number of sensors, but unless
> > > I am missing something only the package sensor is implemented. Is that correct ?
> > Do you mean "only the core sensor"? I think the Sandy Bridge only implements
> > core sensor now. I would think in the future more sensors could be implemented.
> > If only core sensors are implemented, there is no need to have the package
> > level thermal hw MSRs because OS can get all package level thermal info by
> > enumerating all core sensors in a package.
> > 
> Now you lost me. I understand there are the per-core sensors, read through coretemp,
> plus one package level sensor. Do you associate this existing package level sensor
> with any of the locations mentioned above (core, uncore, memory, other), or is
> that unknown ? If it is unknown, you might want to mention it.

There are more than one package level sensor. The package level MSRs take into
account the maximum across the whole package, which encompasses all the cores
and other integrated components like memory controller, uncore, graphics, etc.
Sandy Bridge implements not only core sensors but also other sensors. I take
back what I said that Sandy Bridge only implements core sensors.


> 
> > > 
> > > > +Temperature is measured in degrees Celsius and measurement resolution is
> > > > +1 degree C. Valid temperatures are from 0 to TjMax degrees C, because the actual
> > > > +value of temperature register is in fact a delta from TjMax.
> > > > +
> > > From the code, it seems that negative values can be reported. Is it guaranteed 
> > > by the chip that (TjMax - MSR_IA32_TEMPERATURE_TARGET) >= 0 ? Otherwise, the
> > > minimum temperature would be (TjMax - 127).
> > > 
> > > > +Temperature known as TjMax is the maximum junction temperature of package.
> > > > +Intel defines this temperature as 125C. At this temperature, protection
> > > 
> > > Your driver bails out at TjMax >= 120, so there is some inconsistency.
> > > Also, it seems that this is not a constant, since you are reading
> > > MSR_IA32_TEMPERATURE_TARGET to get the value.
> > > 
> > > Since the CPUs supporting the package sensor presumably also all support
> > > reading TjMax, maybe you can reword the above text to reflect this.
> > > 
> > You are right. I forgot to update this part. I'll reword it.
> > 
> > > > +mechanism will perform actions to forcibly cool down the processor. Alarm
> > > > +may be raised, if the temperature grows enough (more than TjMax) to trigger
> > > > +the Out-Of-Spec bit. Following table summarizes the exported sysfs files:
> > > > +
> > > > +temp1_input	 - Package temperature (in millidegrees Celsius).
> > > > +temp1_crit	 - Maximum junction temperature (in millidegrees Celsius).
> > > > +temp1_crit_alarm - Set when Out-of-spec bit is set, never clears.
> > > > +		   Correct CPU operation is no longer guaranteed.
> > > > +temp1_label	 - Contains string "Pysical package id X", where X is physical
> > > 
> > > Pysical --> Physical.
> > > I would suggest to drop the "id" for consistency. Core sensor names don't
> > > include "id" either.
> > > 
> > > I am not sure if "physical" should be included in the first place.
> > > Also, above description suggests that future CPUs might add more sensors.
> > > If so, the name should probably reflect the location of the current sensor,
> > > ie be something like "Package core X" or "Core package X" or "Package X (core)".
> > How about just using "package X" to align with coretemp for now? Currently there
> > is no interface to tell which sensor it is.
> > 
> Fine with me. Maybe wait for feedback from others.
> 
> You use the argument that there may be other package level sensors in the future.
> Are there any plans for this, or is this just a theory ?
> 

Not just a theory. Sandy Bridge already implements other package level sensors.
If really need to know exactly which sensors are implemented, we might go
through a channel before releasing the info.

> Next question is how to handle future sensor types. One hwmon instance per sensor,
> additional sensors in this driver, or even a new driver ?

Currently package level thermal just reports the maximum temperature across
the package. Which sensor is reporting the highest temperature is not exposed.

> 
> We had was a separate discussion if the coretemp driver should be redesigned
> to one instance per CPU. The package sensor would fit into that model,
> since you would have
> 
> coretemp-isa-0000
> Core0
> Core1
> ...
> CoreN
> Package
> 
> coretemp-isa-0001
> Core0
> Core1
> ...
> CoreM
> Package
> 
> I personally would prefer that approach. It would avoid ambiguity associating Package X
> with specific cores, and it would also easily expand to additional non-core future sensors.

Package X shows Physical id which is unique in platform topology and can be
got from cpuinfo. Package X doesn't have that problem, right?

Maybe instead of showing "Package X", pkgtemp may show "Physical id" just like
what cpuinfo shows?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] Package Level Thermal Control and Power Limit Notification: pkgtemp doc
  2010-07-22 21:21           ` Fenghua Yu
@ 2010-08-19 15:46             ` Jean Delvare
  2010-08-19 16:27               ` Guenter Roeck
  0 siblings, 1 reply; 17+ messages in thread
From: Jean Delvare @ 2010-08-19 15:46 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Guenter Roeck, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Brown, Len, Chen Gong, Wan, Huaxu, lkml,
	lm-sensors@lm-sensors.org

Hi Fenghua, Guenter,

Sorry for joining the discussion a little late, I was on vacation when
it happened. I'll comment now, it's probably "too late" as the patch
set was merged meanwhile, but still...

On Thu, 22 Jul 2010 14:21:11 -0700, Fenghua Yu wrote:
> On Thu, Jul 22, 2010 at 11:58:14AM -0700, Guenter Roeck wrote:
> > You use the argument that there may be other package level sensors in the future.
> > Are there any plans for this, or is this just a theory ?
> 
> Not just a theory. Sandy Bridge already implements other package level sensors.
> If really need to know exactly which sensors are implemented, we might go
> through a channel before releasing the info.
> 
> > Next question is how to handle future sensor types. One hwmon instance per sensor,
> > additional sensors in this driver, or even a new driver ?
> 
> Currently package level thermal just reports the maximum temperature across
> the package. Which sensor is reporting the highest temperature is not exposed.

So this isn't a real physical sensor, but more of a meta-sensor? If
this is a case, then we don't need support for this at all. User-space
can compute a maximum by itself, we don't need a dedicated kernel
driver for that.

> > We had was a separate discussion if the coretemp driver should be redesigned
> > to one instance per CPU. The package sensor would fit into that model,
> > since you would have
> > 
> > coretemp-isa-0000
> > Core0
> > Core1
> > ...
> > CoreN
> > Package
> > 
> > coretemp-isa-0001
> > Core0
> > Core1
> > ...
> > CoreM
> > Package
> > 
> > I personally would prefer that approach. It would avoid ambiguity associating Package X
> > with specific cores, and it would also easily expand to additional non-core future sensors.

For the records, I totally support this approach. I want the coretemp
driver to be updated to present a single hwmon device per CPU, no
matter what happens to the "package temperature".

> Package X shows Physical id which is unique in platform topology and can be
> got from cpuinfo. Package X doesn't have that problem, right?
> 
> Maybe instead of showing "Package X", pkgtemp may show "Physical id" just like
> what cpuinfo shows?

What Guenter meant IMHO has nothing to do with labelling or numbering.
It has to do with presenting related temperature values (that belong to
the same physical CPU) as a single hwmon entry. This is definitely the
most obvious way to present a group of related temperatures to the user.

-- 
Jean Delvare

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] Package Level Thermal Control and Power Limit Notification: pkgtemp doc
  2010-08-19 15:46             ` Jean Delvare
@ 2010-08-19 16:27               ` Guenter Roeck
  2010-08-19 20:51                 ` Fenghua Yu
  0 siblings, 1 reply; 17+ messages in thread
From: Guenter Roeck @ 2010-08-19 16:27 UTC (permalink / raw)
  To: Jean Delvare
  Cc: Fenghua Yu, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Brown, Len, Chen Gong, Wan, Huaxu, lkml,
	lm-sensors@lm-sensors.org

On Thu, Aug 19, 2010 at 11:46:32AM -0400, Jean Delvare wrote:
> Hi Fenghua, Guenter,
> 
> Sorry for joining the discussion a little late, I was on vacation when
> it happened. I'll comment now, it's probably "too late" as the patch
> set was merged meanwhile, but still...
> 
There was no discussion at all, unfortunately.

> On Thu, 22 Jul 2010 14:21:11 -0700, Fenghua Yu wrote:
> > On Thu, Jul 22, 2010 at 11:58:14AM -0700, Guenter Roeck wrote:
> > > You use the argument that there may be other package level sensors in the future.
> > > Are there any plans for this, or is this just a theory ?
> > 
> > Not just a theory. Sandy Bridge already implements other package level sensors.
> > If really need to know exactly which sensors are implemented, we might go
> > through a channel before releasing the info.
> > 
> > > Next question is how to handle future sensor types. One hwmon instance per sensor,
> > > additional sensors in this driver, or even a new driver ?
> > 
> > Currently package level thermal just reports the maximum temperature across
> > the package. Which sensor is reporting the highest temperature is not exposed.
> 
> So this isn't a real physical sensor, but more of a meta-sensor? If
> this is a case, then we don't need support for this at all. User-space
> can compute a maximum by itself, we don't need a dedicated kernel
> driver for that.
> 
> > > We had was a separate discussion if the coretemp driver should be redesigned
> > > to one instance per CPU. The package sensor would fit into that model,
> > > since you would have
> > > 
> > > coretemp-isa-0000
> > > Core0
> > > Core1
> > > ...
> > > CoreN
> > > Package
> > > 
> > > coretemp-isa-0001
> > > Core0
> > > Core1
> > > ...
> > > CoreM
> > > Package
> > > 
> > > I personally would prefer that approach. It would avoid ambiguity associating Package X
> > > with specific cores, and it would also easily expand to additional non-core future sensors.
> 
> For the records, I totally support this approach. I want the coretemp
> driver to be updated to present a single hwmon device per CPU, no
> matter what happens to the "package temperature".
> 
I might spend some time rewriting the coretemp driver as described above,
unless someone else picks it up, and unless there is opposition. 
Obviously, that won't include the package sensor since there is now
a separate driver for it.

Guenter

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] Package Level Thermal Control and Power Limit Notification: pkgtemp doc
  2010-08-19 16:27               ` Guenter Roeck
@ 2010-08-19 20:51                 ` Fenghua Yu
  2010-08-19 21:06                   ` Guenter Roeck
                                     ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Fenghua Yu @ 2010-08-19 20:51 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Jean Delvare, Yu, Fenghua, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Brown, Len, Chen Gong, Wan, Huaxu, lkml,
	lm-sensors@lm-sensors.org

On Thu, Aug 19, 2010 at 09:27:19AM -0700, Guenter Roeck wrote:
> On Thu, Aug 19, 2010 at 11:46:32AM -0400, Jean Delvare wrote:
> > Hi Fenghua, Guenter,
> > 
> > Sorry for joining the discussion a little late, I was on vacation when
> > it happened. I'll comment now, it's probably "too late" as the patch
> > set was merged meanwhile, but still...
> > 
> There was no discussion at all, unfortunately.
> 
> > On Thu, 22 Jul 2010 14:21:11 -0700, Fenghua Yu wrote:
> > > On Thu, Jul 22, 2010 at 11:58:14AM -0700, Guenter Roeck wrote:
> > > > You use the argument that there may be other package level sensors in the future.
> > > > Are there any plans for this, or is this just a theory ?
> > > 
> > > Not just a theory. Sandy Bridge already implements other package level sensors.
> > > If really need to know exactly which sensors are implemented, we might go
> > > through a channel before releasing the info.
> > > 
> > > > Next question is how to handle future sensor types. One hwmon instance per sensor,
> > > > additional sensors in this driver, or even a new driver ?
> > > 
> > > Currently package level thermal just reports the maximum temperature across
> > > the package. Which sensor is reporting the highest temperature is not exposed.
> > 
> > So this isn't a real physical sensor, but more of a meta-sensor? If
> > this is a case, then we don't need support for this at all. User-space
> > can compute a maximum by itself, we don't need a dedicated kernel
> > driver for that.
> > 
The pkgtemp reports thermal status for a set of sensors in a package. Please
note the sensors in a package are not limited to processor sensors which are
handled by coretemp. The sensors in a package also include gfx sensors, cache
sensors, memory controllor sensors which are not handled by any hwmon drivers.

OS gets maximum package thermal status only from MSR PACKAGE_THERM_STATUS.
There is no detailed thermal info for each sensor in a package. User-space
can't compute a maximum by itself. So a piece of kernel driver code, whether
a seperate driver or a integrated driver, is necessary if user-space wants to
know thermal status of a package.

> > > > We had was a separate discussion if the coretemp driver should be redesigned
> > > > to one instance per CPU. The package sensor would fit into that model,
> > > > since you would have
> > > > 
> > > > coretemp-isa-0000
> > > > Core0
> > > > Core1
> > > > ...
> > > > CoreN
> > > > Package
> > > > 
> > > > coretemp-isa-0001
> > > > Core0
> > > > Core1
> > > > ...
> > > > CoreM
> > > > Package
> > > > 
> > > > I personally would prefer that approach. It would avoid ambiguity associating Package X
> > > > with specific cores, and it would also easily expand to additional non-core future sensors.
> > 
> > For the records, I totally support this approach. I want the coretemp
> > driver to be updated to present a single hwmon device per CPU, no
> > matter what happens to the "package temperature".
> > 
> I might spend some time rewriting the coretemp driver as described above,
> unless someone else picks it up, and unless there is opposition. 
> Obviously, that won't include the package sensor since there is now
> a separate driver for it.
I agree with this method too. On a multiple socket system, the current coretemp
output will cause confusion since it only outputs core# without package#.

If it's ok for you, I can rewrite this part to have hwmon device per CPU with
both core and package thermal info and send out RFC patch soon.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] Package Level Thermal Control and Power Limit Notification: pkgtemp doc
  2010-08-19 20:51                 ` Fenghua Yu
@ 2010-08-19 21:06                   ` Guenter Roeck
  2010-08-20  8:33                   ` Jean Delvare
  2010-08-21 10:02                   ` Jean Delvare
  2 siblings, 0 replies; 17+ messages in thread
From: Guenter Roeck @ 2010-08-19 21:06 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Jean Delvare, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Brown, Len, Chen Gong, Wan, Huaxu, lkml,
	lm-sensors@lm-sensors.org

On Thu, Aug 19, 2010 at 04:51:20PM -0400, Fenghua Yu wrote:
> On Thu, Aug 19, 2010 at 09:27:19AM -0700, Guenter Roeck wrote:
> > On Thu, Aug 19, 2010 at 11:46:32AM -0400, Jean Delvare wrote:
> > > Hi Fenghua, Guenter,
> > > 
> > > Sorry for joining the discussion a little late, I was on vacation when
> > > it happened. I'll comment now, it's probably "too late" as the patch
> > > set was merged meanwhile, but still...
> > > 
> > There was no discussion at all, unfortunately.
> > 
> > > On Thu, 22 Jul 2010 14:21:11 -0700, Fenghua Yu wrote:
> > > > On Thu, Jul 22, 2010 at 11:58:14AM -0700, Guenter Roeck wrote:
> > > > > You use the argument that there may be other package level sensors in the future.
> > > > > Are there any plans for this, or is this just a theory ?
> > > > 
> > > > Not just a theory. Sandy Bridge already implements other package level sensors.
> > > > If really need to know exactly which sensors are implemented, we might go
> > > > through a channel before releasing the info.
> > > > 
> > > > > Next question is how to handle future sensor types. One hwmon instance per sensor,
> > > > > additional sensors in this driver, or even a new driver ?
> > > > 
> > > > Currently package level thermal just reports the maximum temperature across
> > > > the package. Which sensor is reporting the highest temperature is not exposed.
> > > 
> > > So this isn't a real physical sensor, but more of a meta-sensor? If
> > > this is a case, then we don't need support for this at all. User-space
> > > can compute a maximum by itself, we don't need a dedicated kernel
> > > driver for that.
> > > 
> The pkgtemp reports thermal status for a set of sensors in a package. Please
> note the sensors in a package are not limited to processor sensors which are
> handled by coretemp. The sensors in a package also include gfx sensors, cache
> sensors, memory controllor sensors which are not handled by any hwmon drivers.
> 
> OS gets maximum package thermal status only from MSR PACKAGE_THERM_STATUS.
> There is no detailed thermal info for each sensor in a package. User-space
> can't compute a maximum by itself. So a piece of kernel driver code, whether
> a seperate driver or a integrated driver, is necessary if user-space wants to
> know thermal status of a package.
> 
> > > > > We had was a separate discussion if the coretemp driver should be redesigned
> > > > > to one instance per CPU. The package sensor would fit into that model,
> > > > > since you would have
> > > > > 
> > > > > coretemp-isa-0000
> > > > > Core0
> > > > > Core1
> > > > > ...
> > > > > CoreN
> > > > > Package
> > > > > 
> > > > > coretemp-isa-0001
> > > > > Core0
> > > > > Core1
> > > > > ...
> > > > > CoreM
> > > > > Package
> > > > > 
> > > > > I personally would prefer that approach. It would avoid ambiguity associating Package X
> > > > > with specific cores, and it would also easily expand to additional non-core future sensors.
> > > 
> > > For the records, I totally support this approach. I want the coretemp
> > > driver to be updated to present a single hwmon device per CPU, no
> > > matter what happens to the "package temperature".
> > > 
> > I might spend some time rewriting the coretemp driver as described above,
> > unless someone else picks it up, and unless there is opposition. 
> > Obviously, that won't include the package sensor since there is now
> > a separate driver for it.
> I agree with this method too. On a multiple socket system, the current coretemp
> output will cause confusion since it only outputs core# without package#.
> 
> If it's ok for you, I can rewrite this part to have hwmon device per CPU with
> both core and package thermal info and send out RFC patch soon.
> 
Sounds good to me.

Guenter


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] Package Level Thermal Control and Power Limit Notification: pkgtemp doc
  2010-08-19 20:51                 ` Fenghua Yu
  2010-08-19 21:06                   ` Guenter Roeck
@ 2010-08-20  8:33                   ` Jean Delvare
  2010-08-20 16:58                     ` Fenghua Yu
  2010-08-20 18:39                     ` H. Peter Anvin
  2010-08-21 10:02                   ` Jean Delvare
  2 siblings, 2 replies; 17+ messages in thread
From: Jean Delvare @ 2010-08-20  8:33 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Guenter Roeck, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Brown, Len, Chen Gong, Wan, Huaxu, lkml,
	lm-sensors@lm-sensors.org

Fenghua,

On Thu, 19 Aug 2010 13:51:20 -0700, Fenghua Yu wrote:
> On Thu, Aug 19, 2010 at 09:27:19AM -0700, Guenter Roeck wrote:
> > I might spend some time rewriting the coretemp driver as described above,
> > unless someone else picks it up, and unless there is opposition. 
> > Obviously, that won't include the package sensor since there is now
> > a separate driver for it.
>
> I agree with this method too. On a multiple socket system, the current coretemp
> output will cause confusion since it only outputs core# without package#.

Good point.

> If it's ok for you, I can rewrite this part to have hwmon device per CPU with
> both core and package thermal info and send out RFC patch soon.

Yes, please! If you have time to work on this, it would be very great.
I am really curious to see how the driver would look like if we go with
this approach. I can test the code, too (although I understand you
won't have any difficulties getting your hands on recent Intel
systems ;)

Also see my reply in the other thread about the handling of removed
siblings. I suspect it will be very easy to add to the new design.

Side question: is it safe to assume a maximum of 2 siblings per core on
Intel x86 CPUs?

-- 
Jean Delvare

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] Package Level Thermal Control and Power Limit Notification: pkgtemp doc
  2010-08-20  8:33                   ` Jean Delvare
@ 2010-08-20 16:58                     ` Fenghua Yu
  2010-08-20 18:39                     ` H. Peter Anvin
  1 sibling, 0 replies; 17+ messages in thread
From: Fenghua Yu @ 2010-08-20 16:58 UTC (permalink / raw)
  To: Jean Delvare
  Cc: Yu, Fenghua, Guenter Roeck, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Brown, Len, Chen Gong, Wan, Huaxu, lkml,
	lm-sensors@lm-sensors.org

On Fri, Aug 20, 2010 at 01:33:56AM -0700, Jean Delvare wrote:
> Fenghua,
> 
> On Thu, 19 Aug 2010 13:51:20 -0700, Fenghua Yu wrote:
> > On Thu, Aug 19, 2010 at 09:27:19AM -0700, Guenter Roeck wrote:
> > > I might spend some time rewriting the coretemp driver as described above,
> > > unless someone else picks it up, and unless there is opposition. 
> > > Obviously, that won't include the package sensor since there is now
> > > a separate driver for it.
> >
> > I agree with this method too. On a multiple socket system, the current coretemp
> > output will cause confusion since it only outputs core# without package#.
> 
> Good point.
> 
> > If it's ok for you, I can rewrite this part to have hwmon device per CPU with
> > both core and package thermal info and send out RFC patch soon.
> 
> Yes, please! If you have time to work on this, it would be very great.
> I am really curious to see how the driver would look like if we go with
> this approach. I can test the code, too (although I understand you
> won't have any difficulties getting your hands on recent Intel
> systems ;)
> 
> Also see my reply in the other thread about the handling of removed
> siblings. I suspect it will be very easy to add to the new design.
> 
> Side question: is it safe to assume a maximum of 2 siblings per core on
> Intel x86 CPUs?
I think architecturally it's not safe to assume 2 siblings per core on x86
although so far HT implementations have been having 2 siblings per core.

Linux kernel doesn't assume 2 siblings per core during initialization (please
check arch/x86/kernel/smpboot.c). This is right way to handle potential non 2
sibling case in the future.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] Package Level Thermal Control and Power Limit Notification: pkgtemp doc
  2010-08-20  8:33                   ` Jean Delvare
  2010-08-20 16:58                     ` Fenghua Yu
@ 2010-08-20 18:39                     ` H. Peter Anvin
  1 sibling, 0 replies; 17+ messages in thread
From: H. Peter Anvin @ 2010-08-20 18:39 UTC (permalink / raw)
  To: Jean Delvare
  Cc: Fenghua Yu, Guenter Roeck, Ingo Molnar, Thomas Gleixner,
	Brown, Len, Chen Gong, Wan, Huaxu, lkml,
	lm-sensors@lm-sensors.org

On 08/20/2010 01:33 AM, Jean Delvare wrote:
> 
> Side question: is it safe to assume a maximum of 2 siblings per core on
> Intel x86 CPUs?
> 

That is true for all current CPUs, but is not guaranteed for the future.

	-hpa

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] Package Level Thermal Control and Power Limit Notification: pkgtemp doc
  2010-08-19 20:51                 ` Fenghua Yu
  2010-08-19 21:06                   ` Guenter Roeck
  2010-08-20  8:33                   ` Jean Delvare
@ 2010-08-21 10:02                   ` Jean Delvare
  2 siblings, 0 replies; 17+ messages in thread
From: Jean Delvare @ 2010-08-21 10:02 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Guenter Roeck, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Brown, Len, Chen Gong, Wan, Huaxu, lkml,
	lm-sensors@lm-sensors.org

One more thing...

On Thu, 19 Aug 2010 13:51:20 -0700, Fenghua Yu wrote:
> The pkgtemp reports thermal status for a set of sensors in a package. Please
> note the sensors in a package are not limited to processor sensors which are
> handled by coretemp. The sensors in a package also include gfx sensors, cache
> sensors, memory controllor sensors which are not handled by any hwmon drivers.

By "package" you mean CPU package, right? So by "gfx sensors" and
"memory controllor sensors", you refer to features possibly embedded
into the CPU package, not sensors outside the CPU, right?

> OS gets maximum package thermal status only from MSR PACKAGE_THERM_STATUS.
> There is no detailed thermal info for each sensor in a package. User-space
> can't compute a maximum by itself. So a piece of kernel driver code, whether
> a seperate driver or a integrated driver, is necessary if user-space wants to
> know thermal status of a package.

OK, now I understand, thanks for the clarification.

-- 
Jean Delvare

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2010-08-21 10:03 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <94E56C79ECC49A4B87113985F1FEBA8D03F8AE2A4B@bgsmsx502.gar.corp.intel.com>
     [not found] ` <4C485DF1.5050407@linux.intel.com>
2010-07-22 16:21   ` [PATCH 1/5] Package Level Thermal Control and Power Limit Notification: feature enabling Fenghua Yu
2010-07-22 16:21   ` [PATCH 2/5] Package Level Thermal Control and Power Limit Notification: pkgtemp hwmon driver Fenghua Yu
2010-07-22 16:22   ` [PATCH 3/5] Package Level Thermal Control and Power Limit Notification: thermal throttling Fenghua Yu
2010-07-22 16:22   ` [PATCH 4/5] Package Level Thermal Control and Power Limit Notification: power limit notification Fenghua Yu
2010-07-22 16:22   ` [PATCH 5/5] Package Level Thermal Control and Power Limit Notification: pkgtemp doc Fenghua Yu
2010-07-22 17:27     ` Guenter Roeck
2010-07-22 17:52       ` Fenghua Yu
2010-07-22 18:58         ` Guenter Roeck
2010-07-22 21:21           ` Fenghua Yu
2010-08-19 15:46             ` Jean Delvare
2010-08-19 16:27               ` Guenter Roeck
2010-08-19 20:51                 ` Fenghua Yu
2010-08-19 21:06                   ` Guenter Roeck
2010-08-20  8:33                   ` Jean Delvare
2010-08-20 16:58                     ` Fenghua Yu
2010-08-20 18:39                     ` H. Peter Anvin
2010-08-21 10:02                   ` Jean Delvare

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).