From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>,
venkatesh.pallipadi@intel.com, suresh.b.siddha@intel.com,
Michael Neuling <mikey@neuling.org>,
"Amit K. Arora" <aarora@linux.vnet.ibm.com>
Subject: Re: [RFC PATCH v1 1/3] General framework for APERF/MPERF access and accounting
Date: Mon, 26 May 2008 23:41:02 +0530 [thread overview]
Message-ID: <483AFD36.7050407@linux.vnet.ibm.com> (raw)
In-Reply-To: <20080526143138.24680.99086.stgit@drishya.in.ibm.com>
Vaidyanathan Srinivasan wrote:
> General framework for low level APERF/MPERF access.
> * Avoid resetting APERF/MPERF in acpi-cpufreq.c
> * Implement functions that will calculate the scaled stats
> * acpi_get_pm_msrs_delta() will give delta values after reading the current values
> * Change get_measured_perf() to use acpi_get_pm_msrs_delta()
> * scaled_stats_init() detect availability of APERF/MPERF using cpuid
> * reset_for_scaled_stats() called when a process occupies the CPU
> * account_scaled_stats() is called when the process leaves the CPU
>
> Signed-off-by: Amit K. Arora <aarora@linux.vnet.ibm.com>
> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
> ---
>
> arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c | 21 +++
> arch/x86/kernel/time_32.c | 171 ++++++++++++++++++++++++++++
> 2 files changed, 188 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> index b0c8208..761beec 100644
> --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> +++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> @@ -59,6 +59,13 @@ enum {
> #define INTEL_MSR_RANGE (0xffff)
> #define CPUID_6_ECX_APERFMPERF_CAPABILITY (0x1)
>
> +/* Buffer to store old snapshot values */
> +DEFINE_PER_CPU(u64, cpufreq_old_aperf);
> +DEFINE_PER_CPU(u64, cpufreq_old_mperf);
> +
> +extern void acpi_get_pm_msrs_delta(u64 *aperf_delta, u64 *mperf_delta,
> + u64 *aperf_old, u64 *mperf_old, int reset);
> +
> struct acpi_cpufreq_data {
> struct acpi_processor_performance *acpi_data;
> struct cpufreq_frequency_table *freq_table;
> @@ -265,6 +272,7 @@ static unsigned int get_measured_perf(unsigned int cpu)
> } split;
> u64 whole;
> } aperf_cur, mperf_cur;
> + u64 *aperf_old, *mperf_old;
>
> cpumask_t saved_mask;
> unsigned int perf_percent;
> @@ -278,11 +286,16 @@ static unsigned int get_measured_perf(unsigned int cpu)
> return 0;
> }
>
> - rdmsr(MSR_IA32_APERF, aperf_cur.split.lo, aperf_cur.split.hi);
> - rdmsr(MSR_IA32_MPERF, mperf_cur.split.lo, mperf_cur.split.hi);
> + /*
> + * Get the old APERF/MPERF values for this cpu and pass it to
> + * acpi_get_pm_msrs_delta() which will read the current values
> + * and return the delta.
> + */
> + aperf_old = &(per_cpu(cpufreq_old_aperf, smp_processor_id()));
> + mperf_old = &(per_cpu(cpufreq_old_mperf, smp_processor_id()));
>
> - wrmsr(MSR_IA32_APERF, 0,0);
> - wrmsr(MSR_IA32_MPERF, 0,0);
> + acpi_get_pm_msrs_delta(&aperf_cur.whole, &mperf_cur.whole,
> + aperf_old, mperf_old, 1);
>
> #ifdef __i386__
> /*
> diff --git a/arch/x86/kernel/time_32.c b/arch/x86/kernel/time_32.c
> index 2ff21f3..5131e01 100644
> --- a/arch/x86/kernel/time_32.c
> +++ b/arch/x86/kernel/time_32.c
> @@ -32,10 +32,13 @@
> #include <linux/interrupt.h>
> #include <linux/time.h>
> #include <linux/mca.h>
> +#include <linux/kernel_stat.h>
>
> #include <asm/arch_hooks.h>
> #include <asm/hpet.h>
> #include <asm/time.h>
> +#include <asm/processor.h>
> +#include <asm/cputime.h>
>
> #include "do_timer.h"
>
> @@ -136,3 +139,171 @@ void __init time_init(void)
> tsc_init();
> late_time_init = choose_time_init();
> }
> +
> +/*
> + * This function should be used to get the APERF/MPERF MSRS delta from the cpu.
> + * We let the individual users of this function store the old values of APERF
> + * and MPERF registers in per cpu variables. They pass these old values as 3rd
> + * and 4th arguments. 'reset' tells if the old values should be reset or not.
> + * Mostly, users of this function will like to reset the old values.
> + */
> +void acpi_get_pm_msrs_delta(u64 *aperf_delta, u64 *mperf_delta, u64 *aperf_old,
> + u64 *mperf_old, int reset)
> +{
> + union {
> + struct {
> + u32 lo;
> + u32 hi;
> + } split;
> + u64 whole;
> + } aperf_cur, mperf_cur;
> + unsigned long flags;
> +
> + /* Read current values of APERF and MPERF MSRs*/
> + local_irq_save(flags);
Why do we need to do this? We've already disabled pre-emption in the caller?
> + rdmsr(MSR_IA32_MPERF, mperf_cur.split.lo, mperf_cur.split.hi);
> + rdmsr(MSR_IA32_APERF, aperf_cur.split.lo, aperf_cur.split.hi);
> + local_irq_restore(flags);
> +
> + /*
> + * If the new values are less than the previous values, there has
> + * been an overflow and both APERF and MPERF have been reset to
> + * zero. In this case consider absolute value as diff/delta.
> + * Note that we do not check for 'reset' here, since resetting here
> + * is no more optional and has to be done for values to make sense.
> + */
> + if (unlikely((mperf_cur.whole <= *mperf_old) ||
> + (aperf_cur.whole <= *aperf_old)))
> + {
> + *aperf_old = 0;
> + *mperf_old = 0;
> + }
> +
> + /* Calculate the delta from the current and per cpu old values */
> + *mperf_delta = mperf_cur.whole - *mperf_old;
> + *aperf_delta = aperf_cur.whole - *aperf_old;
> +
> + /* Set the per cpu variables to current readings */
> + if (reset) {
> + *mperf_old = mperf_cur.whole;
> + *aperf_old = aperf_cur.whole;
> + }
> +}
> +EXPORT_SYMBOL(acpi_get_pm_msrs_delta);
> +
> +
> +DEFINE_PER_CPU(u64, cputime_old_aperf);
> +DEFINE_PER_CPU(u64, cputime_old_mperf);
> +
> +DEFINE_PER_CPU(cputime_t, task_utime_old);
> +DEFINE_PER_CPU(cputime_t, task_stime_old);
> +
I fail to understand the per cpu variable for task_utime_old and task_stime_old?
What does it represent? Why is it global?
> +
> +#define CPUID_6_ECX_APERFMPERF_CAPABILITY (0x1)
> +
> +static int cpu_supports_freq_scaling;
> +
> +/* Initialize scaled stat functions */
> +void scaled_stats_init(void)
> +{
> + struct cpuinfo_x86 *c = &cpu_data(0);
> +
> + /* Check for APERF/MPERF support in hardware */
> + if (c->x86_vendor == X86_VENDOR_INTEL && c->cpuid_level >= 6) {
> + unsigned int ecx;
> + ecx = cpuid_ecx(6);
> + if (ecx & CPUID_6_ECX_APERFMPERF_CAPABILITY)
> + cpu_supports_freq_scaling = 1;
> + else
> + cpu_supports_freq_scaling = -1;
> + }
> +}
> +
> +/*
> + * Reset the old utime and stime (percpu) value to the new task
> + * that we are going to switch to.
> + */
> +void reset_for_scaled_stats(struct task_struct *tsk)
> +{
> + u64 aperf_delta, mperf_delta;
> + u64 *aperf_old, *mperf_old;
> +
> + if (cpu_supports_freq_scaling < 0)
> + return;
> +
> + if(!cpu_supports_freq_scaling) {
> + scaled_stats_init();
> + if(cpu_supports_freq_scaling < 0)
> + return;
> + }
> +
> + aperf_old = &(per_cpu(cputime_old_aperf, smp_processor_id()));
> + mperf_old = &(per_cpu(cputime_old_mperf, smp_processor_id()));
> + acpi_get_pm_msrs_delta(&aperf_delta, &mperf_delta, aperf_old,
> + mperf_old, 1);
> +
> + per_cpu(task_utime_old, smp_processor_id()) = tsk->utime;
> + per_cpu(task_stime_old, smp_processor_id()) = tsk->stime;
> +}
> +
I hope this routine is called with preemption disabled
> +
> +/* Account scaled statistics for a task on context switch */
> +void account_scaled_stats(struct task_struct *tsk)
> +{
> + u64 aperf_delta, mperf_delta;
> + u64 *aperf_old, *mperf_old;
> + cputime_t time;
> + u64 time_msec;
> + int readmsrs = 1;
> +
> + if(!cpu_supports_freq_scaling)
> + scaled_stats_init();
> +
> + /*
> + * Get the old APERF/MPERF values for this cpu and pass it to
> + * acpi_get_pm_msrs_delta() which will read the current values
> + * and return the delta.
> + */
> + aperf_old = &(per_cpu(cputime_old_aperf, smp_processor_id()));
> + mperf_old = &(per_cpu(cputime_old_mperf, smp_processor_id()));
> +
> + if (cputime_gt(tsk->utime, per_cpu(task_utime_old,
> + smp_processor_id()))) {
> + acpi_get_pm_msrs_delta(&aperf_delta, &mperf_delta, aperf_old,
> + mperf_old, 1);
> + readmsrs = 0;
> + time = cputime_sub(tsk->utime, per_cpu(task_utime_old,
> + smp_processor_id()));
> + time_msec = cputime_to_msecs(time);
> + time_msec *= 1000; /* Scale it to hold fractional values */
What is 1000? The code is not clear
> + if (cpu_supports_freq_scaling == 1) {
> + time_msec *= aperf_delta;
> + time_msec = div64_u64(time_msec, mperf_delta);
> + }
> + time = msecs_to_cputime(time_msec);
> + account_user_time_scaled(tsk, time);
> + per_cpu(task_utime_old, smp_processor_id()) = tsk->utime;
> + }
> +
> + if (cputime_gt(tsk->stime, per_cpu(task_stime_old,
> + smp_processor_id()))) {
> + if (readmsrs)
> + acpi_get_pm_msrs_delta(&aperf_delta, &mperf_delta,
> + aperf_old, mperf_old, 1);
> + time = cputime_sub(tsk->stime, per_cpu(task_stime_old,
> + smp_processor_id()));
> +
> + time_msec = cputime_to_msecs(time);
> + time_msec *= 1000; /* Scale it to hold fractional values */
> + if (cpu_supports_freq_scaling == 1) {
> + time_msec *= aperf_delta;
> + time_msec = div64_u64(time_msec, mperf_delta);
> + }
> + time = msecs_to_cputime(time_msec);
> + account_system_time_scaled(tsk, time);
> + per_cpu(task_stime_old, smp_processor_id()) = tsk->stime;
> + }
> +
> +}
> +EXPORT_SYMBOL(account_scaled_stats);
> +
>
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
next prev parent reply other threads:[~2008-05-26 18:12 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-05-26 14:31 [RFC PATCH v1 0/3] Scaled statistics using APERF/MPERF in x86 Vaidyanathan Srinivasan
2008-05-26 14:31 ` [RFC PATCH v1 1/3] General framework for APERF/MPERF access and accounting Vaidyanathan Srinivasan
2008-05-26 18:11 ` Balbir Singh [this message]
2008-05-27 14:54 ` Vaidyanathan Srinivasan
2008-05-26 14:31 ` [RFC PATCH v1 2/3] Make calls to account_scaled_stats Vaidyanathan Srinivasan
2008-05-26 18:18 ` Balbir Singh
2008-05-27 15:02 ` Vaidyanathan Srinivasan
2008-05-29 15:18 ` Michael Neuling
2008-05-29 18:23 ` Vaidyanathan Srinivasan
2008-05-26 14:31 ` [RFC PATCH v1 3/3] Print scaled utime and stime in getdelays Vaidyanathan Srinivasan
2008-05-26 15:50 ` [RFC PATCH v1 0/3] Scaled statistics using APERF/MPERF in x86 Arjan van de Ven
2008-05-26 17:24 ` Balbir Singh
2008-05-26 18:00 ` Arjan van de Ven
2008-05-26 18:36 ` Balbir Singh
2008-05-26 18:51 ` Arjan van de Ven
2008-05-27 12:59 ` Balbir Singh
2008-05-27 13:19 ` Vaidyanathan Srinivasan
2008-05-27 14:15 ` Arjan van de Ven
2008-05-27 15:27 ` Vaidyanathan Srinivasan
2008-05-31 21:27 ` Pavel Machek
2008-06-02 17:54 ` Vaidyanathan Srinivasan
2008-06-03 2:20 ` Arjan van de Ven
2008-05-27 13:29 ` Vaidyanathan Srinivasan
2008-05-27 14:19 ` Arjan van de Ven
2008-05-27 15:20 ` Vaidyanathan Srinivasan
2008-05-27 14:04 ` Vaidyanathan Srinivasan
2008-05-27 16:40 ` Arjan van de Ven
2008-05-27 18:26 ` Vaidyanathan Srinivasan
2008-05-31 21:17 ` Pavel Machek
2008-05-31 21:13 ` Pavel Machek
2008-06-02 6:08 ` Balbir Singh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=483AFD36.7050407@linux.vnet.ibm.com \
--to=balbir@linux.vnet.ibm.com \
--cc=aarora@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mikey@neuling.org \
--cc=suresh.b.siddha@intel.com \
--cc=svaidy@linux.vnet.ibm.com \
--cc=venkatesh.pallipadi@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.