From: kazutomo <kazutomo.yoshii@gmail.com>
To: Jacob Pan <jacob.jun.pan@linux.intel.com>,
Linux PM <linux-pm@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>,
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
Len Brown <len.brown@intel.com>
Subject: Re: [PATCH] powercap/rapl: handle domain energy unit
Date: Wed, 11 Mar 2015 16:25:52 -0500
Message-ID: <5500B2E0.9000107@gmail.com>
In-Reply-To: <1426078509-3767-1-git-send-email-jacob.jun.pan@linux.intel.com>
Hi Jacob,
Wow, this is a real pitfall for people who are writing their own RAPL
tool.
Anyway, I've tested your patch on a Haswell system (2699v3) running a
dgemm benchmark. NOTE: the userspace governor is selected, all cores are
set to 2.3 GHz, and no power cap is set.
# before the patch is applied
$ cd /sys/class/powercap/intel-rapl:0:0
$ cat name
dram
$ for i in 1 2 3 ; do a=`cat energy_uj` ; sleep 1 ; b=`cat energy_uj` ;
expr $b - $a ; done
16853445
16829355
16666320
# after the patch is applied
$ for i in 1 2 3 ; do a=`cat energy_uj` ; sleep 1 ; b=`cat energy_uj` ;
expr $b - $a ; done
69751487
68153897
69689816
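For anyone repeating this measurement from a program rather than the
shell loop, here is a minimal C sketch of the same arithmetic. The helper
names are mine and are not part of the driver or of any RAPL library.
max_range_uj is assumed to be the value reported by max_energy_range_uj
in the same zone directory, and only a single wraparound between the two
samples is handled.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Difference between two energy_uj samples (hypothetical helper).
 * If the second sample is smaller, assume the counter wrapped once at
 * max_range_uj. This may undercount by about one counter unit, because
 * the real counter wraps one unit past its maximum value.
 */
static uint64_t energy_delta_uj(uint64_t before, uint64_t after,
                                uint64_t max_range_uj)
{
    assert(before <= max_range_uj && after <= max_range_uj);
    if (after >= before)
        return after - before;
    /* counter wrapped once between the two samples */
    return (max_range_uj - before) + after;
}

/* Average power in watts for an energy delta in uJ over interval_s seconds. */
static double average_power_watts(uint64_t delta_uj, double interval_s)
{
    assert(interval_s > 0.0);
    return (double)delta_uj / 1e6 / interval_s;
}
```

With a 1 second interval the delta in uJ is simply the average power in
uW, so the first "before" sample of 16853445 corresponds to roughly
16.85 W.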
I have a couple of questions.
1. Is it possible to retrieve the DRAM energy unit from some MSRs
*eventually* like the domain energy unit?
2. Do you know whether the Intel Software Developer's Manual (vol. 3B)
will be updated accordingly? I'm assuming that you work at Intel.
3. Is get_max_energy_range_uj still the same as other counters?
4. The current driver maintains the unit as an integer, instead of a
shift value, and the multiplier is a relatively small number. I guess
the DRAM energy unit is technically ~15.2587 uJ = (0.5 ** 16) * 1e6,
so with a unit of 15 it always reports an approx. 2 % smaller energy
number, while the pkg energy unit is ~61.0351 uJ, so the error there is
only ~0.06 %. An easier solution would be to maintain the unit in pJ,
instead of uJ. Or am I worrying too much? I guess the RAPL energy
estimation may have some error of its own, so maybe it cancels out.
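To make point 4 concrete, here is a small C sketch of the truncation
arithmetic. It is only an illustration under my own assumptions: the
function names are hypothetical, and it derives the unit from a shift
value (16 for DRAM, 14 for the package, matching the figures above),
whereas the patch simply hardcodes 15 for DRAM. Passing a scale of 1e6
gives an integer unit in uJ, and 1e12 gives one in pJ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Integer energy unit for a given shift (hypothetical helper).
 * scale = 1000000 yields uJ, scale = 1000000000000 yields pJ.
 * Integer division truncates, as an integer unit field would.
 */
static uint64_t energy_unit_scaled(unsigned int shift, uint64_t scale)
{
    assert(shift < 64);
    return scale / (1ULL << shift);
}

/* Relative truncation error in percent: (exact - truncated) / exact. */
static double truncation_error_percent(unsigned int shift, uint64_t scale)
{
    double exact = (double)scale / (double)(1ULL << shift);
    double truncated = (double)energy_unit_scaled(shift, scale);

    return (exact - truncated) / exact * 100.0;
}
```

Calling truncation_error_percent() with the same shift but the two
scales shows how much of the error comes purely from keeping the unit
as a whole number of uJ.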
- kaz
On 03/11/2015 07:55 AM, Jacob Pan wrote:
> The current driver assumes all RAPL domains within a CPU package
> have the same energy unit. This is no longer true for HSW server
> CPUs since DRAM domain has its own fixed energy unit which can be
> different than the package energy unit enumerated by package
> power MSR. In fact, the default HSW EP package power unit is 61uJ
> whereas DRAM domain unit is 15.3uJ. The result is that DRAM power
> consumption is counted 4x more than real power reported by energy
> counters.
>
> This patch adds a domain-specific energy unit per CPU type; it allows
> the domain energy unit to override the package energy unit if non-zero.
>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> drivers/powercap/intel_rapl.c | 35 ++++++++++++++++++++++++++++-------
> 1 file changed, 28 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
> index 97b5e4e..af4c61e 100644
> --- a/drivers/powercap/intel_rapl.c
> +++ b/drivers/powercap/intel_rapl.c
> @@ -158,6 +158,7 @@ struct rapl_domain {
> struct rapl_power_limit rpl[NR_POWER_LIMITS];
> u64 attr_map; /* track capabilities */
> unsigned int state;
> + unsigned int domain_energy_unit;
> int package_id;
> };
> #define power_zone_to_rapl_domain(_zone) \
> @@ -190,6 +191,7 @@ struct rapl_defaults {
> void (*set_floor_freq)(struct rapl_domain *rd, bool mode);
> u64 (*compute_time_window)(struct rapl_package *rp, u64 val,
> bool to_raw);
> + unsigned int dram_domain_energy_unit;
> };
> static struct rapl_defaults *rapl_defaults;
>
> @@ -227,7 +229,8 @@ static int rapl_read_data_raw(struct rapl_domain *rd,
> static int rapl_write_data_raw(struct rapl_domain *rd,
> enum rapl_primitives prim,
> unsigned long long value);
> -static u64 rapl_unit_xlate(int package, enum unit_type type, u64 value,
> +static u64 rapl_unit_xlate(struct rapl_domain *rd, int package,
> + enum unit_type type, u64 value,
> int to_raw);
> static void package_power_limit_irq_save(int package_id);
>
> @@ -305,7 +308,8 @@ static int get_energy_counter(struct powercap_zone *power_zone, u64 *energy_raw)
>
> static int get_max_energy_counter(struct powercap_zone *pcd_dev, u64 *energy)
> {
> - *energy = rapl_unit_xlate(0, ENERGY_UNIT, ENERGY_STATUS_MASK, 0);
> + /* package domain is the largest */
> + *energy = rapl_unit_xlate(NULL, 0, ENERGY_UNIT, ENERGY_STATUS_MASK, 0);
> return 0;
> }
>
> @@ -639,6 +643,11 @@ static void rapl_init_domains(struct rapl_package *rp)
> rd->msrs[4] = MSR_DRAM_POWER_INFO;
> rd->rpl[0].prim_id = PL1_ENABLE;
> rd->rpl[0].name = pl1_name;
> + rd->domain_energy_unit =
> + rapl_defaults->dram_domain_energy_unit;
> + if (rd->domain_energy_unit)
> + pr_info("DRAM domain energy unit %duj\n",
> + rd->domain_energy_unit);
> break;
> }
> if (mask) {
> @@ -648,7 +657,8 @@ static void rapl_init_domains(struct rapl_package *rp)
> }
> }
>
> -static u64 rapl_unit_xlate(int package, enum unit_type type, u64 value,
> +static u64 rapl_unit_xlate(struct rapl_domain *rd, int package,
> + enum unit_type type, u64 value,
> int to_raw)
> {
> u64 units = 1;
> @@ -663,7 +673,11 @@ static u64 rapl_unit_xlate(int package, enum unit_type type, u64 value,
> units = rp->power_unit;
> break;
> case ENERGY_UNIT:
> - units = rp->energy_unit;
> + /* per domain unit takes precedence */
> + if (rd && rd->domain_energy_unit)
> + units = rd->domain_energy_unit;
> + else
> + units = rp->energy_unit;
> break;
> case TIME_UNIT:
> return rapl_defaults->compute_time_window(rp, value, to_raw);
> @@ -773,7 +787,7 @@ static int rapl_read_data_raw(struct rapl_domain *rd,
> final = value & rp->mask;
> final = final >> rp->shift;
> if (xlate)
> - *data = rapl_unit_xlate(rd->package_id, rp->unit, final, 0);
> + *data = rapl_unit_xlate(rd, rd->package_id, rp->unit, final, 0);
> else
> *data = final;
>
> @@ -799,7 +813,7 @@ static int rapl_write_data_raw(struct rapl_domain *rd,
> "failed to read msr 0x%x on cpu %d\n", msr, cpu);
> return -EIO;
> }
> - value = rapl_unit_xlate(rd->package_id, rp->unit, value, 1);
> + value = rapl_unit_xlate(rd, rd->package_id, rp->unit, value, 1);
> msr_val &= ~rp->mask;
> msr_val |= value << rp->shift;
> if (wrmsrl_safe_on_cpu(cpu, msr, msr_val)) {
> @@ -1017,6 +1031,13 @@ static const struct rapl_defaults rapl_defaults_core = {
> .compute_time_window = rapl_compute_time_window_core,
> };
>
> +static const struct rapl_defaults rapl_defaults_hsw_server = {
> + .check_unit = rapl_check_unit_core,
> + .set_floor_freq = set_floor_freq_default,
> + .compute_time_window = rapl_compute_time_window_core,
> + .dram_domain_energy_unit = 15,
> +};
> +
> static const struct rapl_defaults rapl_defaults_atom = {
> .check_unit = rapl_check_unit_atom,
> .set_floor_freq = set_floor_freq_atom,
> @@ -1037,7 +1058,7 @@ static const struct x86_cpu_id rapl_ids[] = {
> RAPL_CPU(0x3a, rapl_defaults_core),/* Ivy Bridge */
> RAPL_CPU(0x3c, rapl_defaults_core),/* Haswell */
> RAPL_CPU(0x3d, rapl_defaults_core),/* Broadwell */
> - RAPL_CPU(0x3f, rapl_defaults_core),/* Haswell */
> + RAPL_CPU(0x3f, rapl_defaults_hsw_server),/* Haswell servers */
> RAPL_CPU(0x45, rapl_defaults_core),/* Haswell ULT */
> RAPL_CPU(0x4C, rapl_defaults_atom),/* Braswell */
> RAPL_CPU(0x4A, rapl_defaults_atom),/* Tangier */