From: Guenter Roeck <linux@roeck-us.net>
To: "Huang Rui" <ray.huang@amd.com>, "Borislav Petkov" <bp@suse.de>,
"Peter Zijlstra" <peterz@infradead.org>,
"Jean Delvare" <jdelvare@suse.de>,
"Andy Lutomirski" <luto@amacapital.net>,
"Andreas Herrmann" <herrmann.der.user@gmail.com>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Ingo Molnar" <mingo@kernel.org>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
"Len Brown" <lenb@kernel.org>,
"John Stultz" <john.stultz@linaro.org>,
"Frédéric Weisbecker" <fweisbec@gmail.com>
Cc: lm-sensors@lm-sensors.org, linux-kernel@vger.kernel.org,
x86@kernel.org,
Andreas Herrmann <herrmann.der.user@googlemail.com>,
Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>,
Borislav Petkov <bp@alien8.de>,
Fengguang Wu <fengguang.wu@intel.com>,
Aaron Lu <aaron.lu@intel.com>, Tony Li <tony.li@amd.com>
Subject: Re: [PATCH v2 07/10] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm
Date: Fri, 23 Oct 2015 07:20:59 -0700 [thread overview]
Message-ID: <562A424B.2070108@roeck-us.net> (raw)
In-Reply-To: <1445308109-17970-8-git-send-email-ray.huang@amd.com>
On 10/19/2015 07:28 PM, Huang Rui wrote:
> This patch introduces an algorithm that computes the average power by
> reading a delta value of “core power accumulator” register during
> measurement interval, and then dividing delta value by the length of
> the time interval.
>
> User is able to use power1_average entry to measure the processor power
> consumption and power1_average_interval entry to set the interval.
>
> A simple example:
>
> ray@hr-ub:~/tip$ sensors
> fam15h_power-pci-00c4
> Adapter: PCI adapter
> power1: 23.73 mW (avg = 634.63 mW, interval = 0.01 s)
> (crit = 15.00 W)
>
> ...
>
> The result is current average processor power consumption in 10
> millisecond. The unit of the result is uWatt.
>
> Suggested-by: Guenter Roeck <linux@roeck-us.net>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> ---
> drivers/hwmon/fam15h_power.c | 120 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 120 insertions(+)
>
> diff --git a/drivers/hwmon/fam15h_power.c b/drivers/hwmon/fam15h_power.c
> index 6321f73..a5a539e 100644
> --- a/drivers/hwmon/fam15h_power.c
> +++ b/drivers/hwmon/fam15h_power.c
> @@ -26,6 +26,9 @@
> #include <linux/pci.h>
> #include <linux/bitops.h>
> #include <linux/cpumask.h>
> +#include <linux/mutex.h>
> +#include <linux/time.h>
> +#include <linux/sched.h>
> #include <asm/processor.h>
> #include <asm/msr.h>
>
> @@ -64,6 +67,10 @@ struct fam15h_power_data {
> u64 cu_acc_power[MAX_CUS];
> /* performance timestamp counter */
> u64 cpu_sw_pwr_ptsc[MAX_CUS];
> + /* online/offline status of current compute unit */
> + int cu_on[MAX_CUS];
> + unsigned long power_period;
> + struct mutex acc_pwr_mutex;
Can you elaborate a bit about what this mutex is supposed to protect ?
To me it seems that it doesn't really protect anything.
> };
>
> static ssize_t show_power(struct device *dev,
> @@ -132,11 +139,15 @@ static void do_read_registers_on_cu(void *_data)
> cores_per_cu = amd_get_cores_per_cu();
> cu = cpu / cores_per_cu;
>
> + mutex_lock(&data->acc_pwr_mutex);
> WARN_ON(rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR,
> &data->cu_acc_power[cu]));
>
> WARN_ON(rdmsrl_safe(MSR_F15H_PTSC,
> &data->cpu_sw_pwr_ptsc[cu]));
> +
> + data->cu_on[cu] = 1;
> + mutex_unlock(&data->acc_pwr_mutex);
... for example, while this protects cu_on[cu],
> }
>
> static int read_registers(struct fam15h_power_data *data)
> @@ -148,6 +159,10 @@ static int read_registers(struct fam15h_power_data *data)
> cores_per_cu = amd_get_cores_per_cu();
> cu_num = boot_cpu_data.x86_max_cores / cores_per_cu;
>
> + mutex_lock(&data->acc_pwr_mutex);
> + memset(data->cu_on, 0, sizeof(int) * MAX_CUS);
> + mutex_unlock(&data->acc_pwr_mutex);
... this code may well overwrite that same value.
> +
> WARN_ON_ONCE(cu_num > MAX_CUS);
>
> ret = zalloc_cpumask_var(&mask, GFP_KERNEL);
> @@ -184,18 +199,113 @@ static int read_registers(struct fam15h_power_data *data)
> return 0;
> }
>
> +static ssize_t acc_show_power(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct fam15h_power_data *data = dev_get_drvdata(dev);
> + u64 prev_cu_acc_power[MAX_CUS], prev_ptsc[MAX_CUS],
> + jdelta[MAX_CUS];
> + u64 tdelta, avg_acc;
> + int cu, cu_num, cores_per_cu, ret;
> + signed long leftover;
> +
> + cores_per_cu = amd_get_cores_per_cu();
> + cu_num = boot_cpu_data.x86_max_cores / cores_per_cu;
> +
> + ret = read_registers(data);
> + if (ret)
> + return 0;
> +
> + cu = 0;
> + while(cu++ < cu_num) {
> + prev_cu_acc_power[cu] = data->cu_acc_power[cu];
> + prev_ptsc[cu] = data->cpu_sw_pwr_ptsc[cu];
> + }
... and multiple parallel reads on the power attribute must produce
pretty random values, unless I am really missing something.
> +
> + leftover = schedule_timeout_interruptible(
> + msecs_to_jiffies(data->power_period)
> + );
> + if (leftover)
> + return 0;
> +
> + ret = read_registers(data);
> + if (ret)
> + return 0;
> +
With a 10ms period, I wonder how accurate this really is.
> + for (cu = 0, avg_acc = 0; cu < cu_num; cu++) {
> + /* check if current compute unit is online */
> + if (data->cu_on[cu] == 0)
> + continue;
> +
> + if (data->cu_acc_power[cu] < prev_cu_acc_power[cu]) {
> + jdelta[cu] = data->max_cu_acc_power + data->cu_acc_power[cu];
> + jdelta[cu] -= prev_cu_acc_power[cu];
> + } else {
> + jdelta[cu] = data->cu_acc_power[cu] - prev_cu_acc_power[cu];
> + }
> + tdelta = data->cpu_sw_pwr_ptsc[cu] - prev_ptsc[cu];
> + jdelta[cu] *= data->cpu_pwr_sample_ratio * 1000;
> + do_div(jdelta[cu], tdelta);
> +
> + /* the unit is microWatt */
> + avg_acc += jdelta[cu];
> + }
> +
> + return sprintf(buf, "%llu\n", (unsigned long long)avg_acc);
> +}
> +static DEVICE_ATTR(power1_average, S_IRUGO, acc_show_power, NULL);
> +
> +
> +static ssize_t acc_show_power_period(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct fam15h_power_data *data = dev_get_drvdata(dev);
> +
> + return sprintf(buf, "%lu\n", data->power_period);
> +}
> +
> +static ssize_t acc_set_power_period(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct fam15h_power_data *data = dev_get_drvdata(dev);
> + unsigned long temp;
> + int ret;
> +
> + ret = kstrtoul(buf, 10, &temp);
> + if (ret)
> + return ret;
> +
> + mutex_lock(&data->acc_pwr_mutex);
> + data->power_period = temp;
> + mutex_unlock(&data->acc_pwr_mutex);
This doesn't really protect anything either except that power_period
can not be updated while the lock is active. But the code using
power_period does not run under mutex protection, so that seems pretty
pointless.
Also, this accepts an unlimited timeout. If I understand correctly,
setting the timeout to, say, 10 seconds will cause the read function
to hang for that period of time. Setting it to 1 hour will cause the read
function to hang for 1 hour.
Does this really make sense ?
> +
> + return count;
> +}
> +static DEVICE_ATTR(power1_average_interval, S_IRUGO | S_IWUSR,
> + acc_show_power_period, acc_set_power_period);
> +
> static int fam15h_power_init_attrs(struct pci_dev *pdev,
> struct fam15h_power_data *data)
> {
> int n = FAM15H_MIN_NUM_ATTRS;
> struct attribute **fam15h_power_attrs;
> struct cpuinfo_x86 *c = &boot_cpu_data;
> + u32 cpuid;
>
> if (c->x86 == 0x15 &&
> ((c->x86_model <= 0xf) ||
> (c->x86_model >= 0x60 && c->x86_model <= 0x6f)))
> n += 1;
>
> + cpuid = cpuid_edx(0x80000007);
> +
> + /* check if processor supports accumulated power */
> + if (cpuid & BIT(12))
> + n += 2;
> +
> fam15h_power_attrs = devm_kcalloc(&pdev->dev, n,
> sizeof(*fam15h_power_attrs),
> GFP_KERNEL);
> @@ -210,6 +320,11 @@ static int fam15h_power_init_attrs(struct pci_dev *pdev,
> (c->x86_model >= 0x60 && c->x86_model <= 0x6f)))
> fam15h_power_attrs[n++] = &dev_attr_power1_input.attr;
>
> + if (cpuid & BIT(12)) {
> + fam15h_power_attrs[n++] = &dev_attr_power1_average.attr;
> + fam15h_power_attrs[n++] = &dev_attr_power1_average_interval.attr;
> + }
> +
> data->fam15h_power_group.attrs = fam15h_power_attrs;
>
> return 0;
> @@ -322,6 +437,9 @@ static int fam15h_power_init_data(struct pci_dev *f4,
>
> data->max_cu_acc_power = tmp;
>
> + /* set default interval as 10 ms */
> + data->power_period = 10;
> +
> ret = read_registers(data);
>
> return ret;
> @@ -349,6 +467,8 @@ static int fam15h_power_probe(struct pci_dev *pdev,
> if (!data)
> return -ENOMEM;
>
> + mutex_init(&data->acc_pwr_mutex);
> +
> ret = fam15h_power_init_data(pdev, data);
> if (ret)
> return ret;
>
next prev parent reply other threads:[~2015-10-23 14:21 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-10-20 2:28 [PATCH v2 00/10] hwmon: (fam15h_power) Introduce an accumulated power reporting algorithm Huang Rui
2015-10-20 2:28 ` [PATCH v2 01/10] hwmon: (fam15h_power) Refactor attributes for dynamically added Huang Rui
2015-10-23 13:42 ` Guenter Roeck
2015-10-26 1:58 ` Huang Rui
2015-10-26 2:14 ` Guenter Roeck
2015-10-26 2:25 ` Huang Rui
2015-10-20 2:28 ` [PATCH v2 02/10] hwmon: (fam15h_power) Enable power1_input on AMD Carrizo Huang Rui
2015-10-23 13:45 ` Guenter Roeck
2015-10-26 2:19 ` Huang Rui
2015-10-20 2:28 ` [PATCH v2 03/10] hwmon: (fam15h_power) Add max compute unit accumulated power Huang Rui
2015-10-20 2:28 ` [PATCH v2 04/10] x86, amd: add accessor for number of cores per compute unit Huang Rui
2015-10-20 2:28 ` [PATCH v2 05/10] hwmon: (fam15h_power) Add compute unit accumulated power Huang Rui
2015-10-20 7:24 ` kbuild test robot
2015-10-21 1:42 ` Huang Rui
2015-10-21 2:15 ` Guenter Roeck
2015-10-21 2:40 ` Huang Rui
2015-10-21 2:49 ` Guenter Roeck
2015-10-21 3:04 ` Huang Rui
2015-10-21 6:05 ` Jean Delvare
2015-10-21 6:28 ` Huang Rui
2015-10-23 13:27 ` Borislav Petkov
2015-10-23 13:37 ` Guenter Roeck
2015-10-27 2:53 ` Huang Rui
2015-10-28 0:34 ` Borislav Petkov
2015-10-28 5:06 ` Huang Rui
2015-10-20 2:28 ` [PATCH v2 06/10] hwmon: (fam15h_power) Add ptsc counter value for " Huang Rui
2015-10-23 13:59 ` Guenter Roeck
2015-10-26 3:37 ` Huang Rui
2015-10-26 4:36 ` Borislav Petkov
2015-10-20 2:28 ` [PATCH v2 07/10] hwmon: (fam15h_power) Introduce a cpu accumulated power reporting algorithm Huang Rui
2015-10-23 13:28 ` Borislav Petkov
2015-10-26 3:46 ` Huang Rui
2015-10-26 4:41 ` Borislav Petkov
2015-10-23 14:20 ` Guenter Roeck [this message]
2015-10-27 3:36 ` Huang Rui
2015-10-20 2:28 ` [PATCH v2 08/10] hwmon: (fam15h_power) Add documentation for previous TDP reporting Huang Rui
2015-10-20 2:28 ` [PATCH v2 09/10] hwmon: (fam15h_power) Add documentation for accumulated power algorithm Huang Rui
2015-10-23 10:22 ` Borislav Petkov
2015-10-20 2:28 ` [PATCH v2 10/10] MAINTAINERS: change the maintainer of fam15h_power driver Huang Rui
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=562A424B.2070108@roeck-us.net \
--to=linux@roeck-us.net \
--cc=Aravind.Gopalakrishnan@amd.com \
--cc=aaron.lu@intel.com \
--cc=bp@alien8.de \
--cc=bp@suse.de \
--cc=fengguang.wu@intel.com \
--cc=fweisbec@gmail.com \
--cc=herrmann.der.user@gmail.com \
--cc=herrmann.der.user@googlemail.com \
--cc=jdelvare@suse.de \
--cc=john.stultz@linaro.org \
--cc=lenb@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lm-sensors@lm-sensors.org \
--cc=luto@amacapital.net \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=ray.huang@amd.com \
--cc=rjw@rjwysocki.net \
--cc=tglx@linutronix.de \
--cc=tony.li@amd.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.