public inbox for linux-kernel@vger.kernel.org
* [RFC PATCH v1 0/3] Scaled statistics using APERF/MPERF in x86
@ 2008-05-26 14:31 Vaidyanathan Srinivasan
  2008-05-26 14:31 ` [RFC PATCH v1 1/3] General framework for APERF/MPERF access and accounting Vaidyanathan Srinivasan
                   ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Vaidyanathan Srinivasan @ 2008-05-26 14:31 UTC (permalink / raw)
  To: Linux Kernel, venkatesh.pallipadi, suresh.b.siddha,
	Michael Neuling, Balbir Singh, Amit K. Arora

The following RFC patch set tries to implement scaled CPU utilisation statistics
using the APERF and MPERF MSRs on x86 platforms.

CPU capacity changes significantly when the CPU's frequency is reduced to save
power.  By default, applications that run at such lower CPU frequencies are
still charged for real CPU time.  If the applications had been run at full CPU
frequency, they would have finished the work faster and would not have been
charged for the extra CPU time.

One solution to this problem is to scale the utime and stime entitlement
for the process as per the current CPU frequency.  This technique is used in
the powerpc architecture with the help of hardware registers that accurately
capture the entitlement.

On x86 hardware, APERF and MPERF are MSRs that can provide feedback on the
current CPU frequency.  Currently these registers are used to detect the current
CPU frequency on each core in a multi-core x86 processor where the frequency of
the entire package is changed.
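
As an illustration of how the two counters can give frequency feedback, here is
a minimal sketch, not the code from this series.  It assumes that APERF advances
at the delivered frequency and MPERF at the fixed maximum (reference) frequency,
so the ratio of their deltas over an interval approximates the fraction of full
speed.  The struct and function names are hypothetical, and the counter values
are assumed to have been sampled already (in the kernel, via rdmsr on the CPU in
question).

```c
#include <stdint.h>

/* Hypothetical snapshot of the two counters taken on one CPU. */
struct example_perf_sample {
	uint64_t aperf;	/* assumed: counts at the delivered frequency */
	uint64_t mperf;	/* assumed: counts at the maximum frequency */
};

/*
 * Return delivered/maximum frequency in parts per thousand for the
 * interval between two samples.  Unsigned subtraction keeps the deltas
 * correct across a single counter wrap.  The multiplication assumes
 * the APERF delta is small enough that delta * 1000 fits in 64 bits.
 */
static uint32_t example_freq_ratio_permille(struct example_perf_sample prev,
					    struct example_perf_sample cur)
{
	uint64_t aperf_delta = cur.aperf - prev.aperf;
	uint64_t mperf_delta = cur.mperf - prev.mperf;

	if (mperf_delta == 0)
		return 1000;	/* no reference ticks: treat as full speed */

	return (uint32_t)((aperf_delta * 1000) / mperf_delta);
}
```

Working in parts per thousand is only a choice made for this sketch to avoid
floating point; any fixed-point representation would do.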

This patch set demonstrates the idea of scaling utime and stime based on CPU
frequency.  The scaled values are exported through the taskstats delay
accounting infrastructure.

Example:

On a two-socket, two-CPU x86 system:
./getdelays -d -l -m0-3  

PID     4172


CPU             count     real total  virtual total    delay total
                43873      148009250     3368915732       28751295
IO              count    delay total
                    0              0
MEM             count    delay total
                    0              0
                utime          stime
                40000         108000
         scaled utime   scaled stime          total
                26676          72032       98714169

The utime/stime and scaled utime/stime are printed in microseconds while the
totals are in nanoseconds.  The CPU was running at 66% of its maximum frequency.

We can observe that the scaled utime/stime values are 66% of their normal
accumulated runtime values, and total is 66% of 'real total'.
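
The 66% figure can be checked directly from the numbers printed above.  The
helper below is a hypothetical illustration (it is not part of getdelays) that
computes the truncated integer percentage of a scaled value relative to its
unscaled counterpart.

```c
#include <stdint.h>

/*
 * Integer percentage of 'scaled' relative to 'raw', truncated towards
 * zero.  Returns 0 when 'raw' is 0 so the caller never divides by zero.
 */
static uint64_t example_percent(uint64_t scaled, uint64_t raw)
{
	if (raw == 0)
		return 0;
	return (scaled * 100) / raw;
}
```

With the values from the output, example_percent(26676, 40000),
example_percent(72032, 108000) and example_percent(98714169, 148009250) all
evaluate to 66.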

The following output is for a CPU-intensive job running for 10s:

PID     4134


CPU             count     real total  virtual total    delay total
                   61    10000625000     9807860434              2
IO              count    delay total
                    0              0
MEM             count    delay total
                    0              0
                utime          stime
             10000000              0
         scaled utime   scaled stime          total
              9886696              0     9887313918

The ondemand governor was running and it took some time to switch the frequency
to maximum.  Hence the scaled values are marginally less than the elapsed
utime.


Limitations:

* RFC patch to communicate just the idea; implementation may need rework
* Works only for 32-bit x86 hardware
* The MSRs are read and the APERF/MPERF ratio is calculated at every context
  switch, which is very slow
* Hacked cputime_t task_struct->utime to hold 'jiffies * 1000' values just to
  account for fractional jiffies.  Since cputime_t is in jiffies on x86, we
  cannot add fractional jiffies at each context switch.  Need to convert the
  scaled utime/stime data types and units to microseconds or nanoseconds.
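
To illustrate the last limitation, here is a minimal sketch of accumulating
scaled time in nanoseconds rather than in 'jiffies * 1000'.  It is not the code
in this series; the function name and parameters are hypothetical, and the
counter deltas are assumed to have been sampled over the same interval as
delta_ns.

```c
#include <stdint.h>

/*
 * Add one interval of run time to a nanosecond accumulator, scaled by
 * the APERF/MPERF delta ratio for that interval.  Multiplying before
 * dividing preserves sub-jiffy precision.  The sketch assumes
 * delta_ns * aperf_delta fits in 64 bits, which holds for intervals
 * of context-switch length.
 */
static void example_add_scaled_ns(uint64_t *acc_ns, uint64_t delta_ns,
				  uint64_t aperf_delta, uint64_t mperf_delta)
{
	if (mperf_delta == 0) {
		*acc_ns += delta_ns;	/* no ratio available: charge unscaled */
		return;
	}
	*acc_ns += (delta_ns * aperf_delta) / mperf_delta;
}
```

With a nanosecond accumulator, a 1 ms slice at two thirds of full speed adds
666666 ns, a value that a jiffy-granular counter cannot represent.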


ToDo:

* Compute scaling ratio per package only at each frequency switch
  -- Notify frequency change to all affected CPUs
* Use more accurate time unit for x86 scaled utime and stime  
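
One possible shape for the first ToDo item, given only as a sketch under stated
assumptions: a per-package ratio is cached whenever the frequency changes, and
the context-switch path only reads the cached value.  All names here are
hypothetical, the package count is an arbitrary constant, and locking and the
notification mechanism are left out.

```c
#include <stdint.h>

#define EXAMPLE_MAX_PACKAGES 8	/* arbitrary limit for this sketch */

/* Cached ratio in parts per thousand; 0 means "not yet known". */
static uint32_t example_ratio_permille[EXAMPLE_MAX_PACKAGES];

/* Called once per frequency switch for the affected package. */
static void example_on_freq_change(unsigned int pkg, uint32_t ratio_permille)
{
	if (pkg < EXAMPLE_MAX_PACKAGES)
		example_ratio_permille[pkg] = ratio_permille;
}

/* Cheap path for context switch: no MSR access, just one lookup. */
static uint64_t example_scale_us(unsigned int pkg, uint64_t delta_us)
{
	uint32_t ratio;

	if (pkg >= EXAMPLE_MAX_PACKAGES)
		return delta_us;	/* unknown package: leave unscaled */

	ratio = example_ratio_permille[pkg];
	if (ratio == 0)
		return delta_us;	/* no ratio cached yet: leave unscaled */

	return (delta_us * ratio) / 1000;
}
```

The trade-off in this sketch is that time accrued between a frequency change
and the cache update is scaled with the stale ratio.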

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

---

Vaidyanathan Srinivasan (3):
      Print scaled utime and stime in getdelays
      Make calls to account_scaled_stats
      General framework for APERF/MPERF access and accounting


 Documentation/accounting/getdelays.c       |   13 ++
 arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c |   21 +++
 arch/x86/kernel/process_32.c               |    8 +
 arch/x86/kernel/time_32.c                  |  171 ++++++++++++++++++++++++++++
 include/linux/hardirq.h                    |    4 +
 kernel/delayacct.c                         |    7 +
 kernel/timer.c                             |    2 
 kernel/tsacct.c                            |   10 +-
 8 files changed, 225 insertions(+), 11 deletions(-)

-- 
        Vaidyanathan Srinivasan,
        Linux Technology Center,
        IBM India Systems and Technology Labs.


Thread overview: 31+ messages
2008-05-26 14:31 [RFC PATCH v1 0/3] Scaled statistics using APERF/MPERF in x86 Vaidyanathan Srinivasan
2008-05-26 14:31 ` [RFC PATCH v1 1/3] General framework for APERF/MPERF access and accounting Vaidyanathan Srinivasan
2008-05-26 18:11   ` Balbir Singh
2008-05-27 14:54     ` Vaidyanathan Srinivasan
2008-05-26 14:31 ` [RFC PATCH v1 2/3] Make calls to account_scaled_stats Vaidyanathan Srinivasan
2008-05-26 18:18   ` Balbir Singh
2008-05-27 15:02     ` Vaidyanathan Srinivasan
2008-05-29 15:18   ` Michael Neuling
2008-05-29 18:23     ` Vaidyanathan Srinivasan
2008-05-26 14:31 ` [RFC PATCH v1 3/3] Print scaled utime and stime in getdelays Vaidyanathan Srinivasan
2008-05-26 15:50 ` [RFC PATCH v1 0/3] Scaled statistics using APERF/MPERF in x86 Arjan van de Ven
2008-05-26 17:24   ` Balbir Singh
2008-05-26 18:00     ` Arjan van de Ven
2008-05-26 18:36       ` Balbir Singh
2008-05-26 18:51         ` Arjan van de Ven
2008-05-27 12:59           ` Balbir Singh
2008-05-27 13:19             ` Vaidyanathan Srinivasan
2008-05-27 14:15               ` Arjan van de Ven
2008-05-27 15:27                 ` Vaidyanathan Srinivasan
2008-05-31 21:27             ` Pavel Machek
2008-06-02 17:54               ` Vaidyanathan Srinivasan
2008-06-03  2:20                 ` Arjan van de Ven
2008-05-27 13:29           ` Vaidyanathan Srinivasan
2008-05-27 14:19             ` Arjan van de Ven
2008-05-27 15:20               ` Vaidyanathan Srinivasan
2008-05-27 14:04   ` Vaidyanathan Srinivasan
2008-05-27 16:40     ` Arjan van de Ven
2008-05-27 18:26       ` Vaidyanathan Srinivasan
2008-05-31 21:17     ` Pavel Machek
2008-05-31 21:13   ` Pavel Machek
2008-06-02  6:08     ` Balbir Singh
