public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: dino@in.ibm.com
To: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org,
	John Stultz <johnstul@us.ibm.com>,
	Darren Hart <dvhltc@us.ibm.com>, John Kacur <jkacur@redhat.com>
Subject: [patch -rt 12/17] x86: sched: provide arch implementations using aperf/mperf
Date: Thu, 22 Oct 2009 18:07:55 +0530	[thread overview]
Message-ID: <20091022124112.184099152@spinlock.in.ibm.com> (raw)
In-Reply-To: 20091022123743.506956796@spinlock.in.ibm.com

[-- Attachment #1: sched-lb-11.patch --]
[-- Type: text/plain, Size: 3637 bytes --]

APERF/MPERF support for cpu_power.

APERF/MPERF is arch defined to be a relative scale of work capacity
per logical cpu, this is assumed to include SMT and Turbo mode.

APERF/MPERF are specified to both reset to 0 when either counter
wraps, which is highly inconvenient, since that'll give a blimp when
that happens. The manual specifies writing 0 to the counters after
each read, but that's 1) too expensive, and 2) destroys the
possibility of sharing these counters with other users, so we live
with the blimp - the other existing user does too.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
---
 arch/x86/kernel/cpu/Makefile |    2 -
 arch/x86/kernel/cpu/sched.c  |   58 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/sched.h        |    4 ++
 3 files changed, 63 insertions(+), 1 deletion(-)

Index: linux-2.6.31.4-rt14-lb1/arch/x86/kernel/cpu/Makefile
===================================================================
--- linux-2.6.31.4-rt14-lb1.orig/arch/x86/kernel/cpu/Makefile	2009-10-21 10:47:15.000000000 -0400
+++ linux-2.6.31.4-rt14-lb1/arch/x86/kernel/cpu/Makefile	2009-10-21 10:49:00.000000000 -0400
@@ -13,7 +13,7 @@
 
 obj-y			:= intel_cacheinfo.o addon_cpuid_features.o
 obj-y			+= proc.o capflags.o powerflags.o common.o
-obj-y			+= vmware.o hypervisor.o
+obj-y			+= vmware.o hypervisor.o sched.o
 
 obj-$(CONFIG_X86_32)	+= bugs.o cmpxchg.o
 obj-$(CONFIG_X86_64)	+= bugs_64.o
Index: linux-2.6.31.4-rt14-lb1/arch/x86/kernel/cpu/sched.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.31.4-rt14-lb1/arch/x86/kernel/cpu/sched.c	2009-10-21 10:49:00.000000000 -0400
@@ -0,0 +1,58 @@
+#include <linux/sched.h>
+#include <linux/math64.h>
+#include <linux/percpu.h>
+#include <linux/irqflags.h>
+
+#include <asm/cpufeature.h>
+#include <asm/processor.h>
+
+static DEFINE_PER_CPU(struct aperfmperf, old_aperfmperf);
+
+static unsigned long scale_aperfmperf(void)
+{
+	struct aperfmperf cur, val, *old = &__get_cpu_var(old_aperfmperf);
+	unsigned long ratio = SCHED_LOAD_SCALE;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	get_aperfmperf(&val);
+	local_irq_restore(flags);
+
+	cur = val;
+	cur.aperf -= old->aperf;
+	cur.mperf -= old->mperf;
+	*old = val;
+
+	cur.mperf >>= SCHED_LOAD_SHIFT;
+	if (cur.mperf)
+		ratio = div_u64(cur.aperf, cur.mperf);
+
+	return ratio;
+}
+
+unsigned long arch_scale_freq_power(struct sched_domain *sd, int cpu)
+{
+	/*
+	 * do aperf/mperf on the cpu level because it includes things
+	 * like turbo mode, which are relevant to full cores.
+	 */
+	if (boot_cpu_has(X86_FEATURE_APERFMPERF))
+		return scale_aperfmperf();
+
+	/*
+	 * maybe have something cpufreq here
+	 */
+
+	return default_scale_freq_power(sd, cpu);
+}
+
+unsigned long arch_scale_smt_power(struct sched_domain *sd, int cpu)
+{
+	/*
+	 * aperf/mperf already includes the smt gain
+	 */
+	if (boot_cpu_has(X86_FEATURE_APERFMPERF))
+		return SCHED_LOAD_SCALE;
+
+	return default_scale_smt_power(sd, cpu);
+}
Index: linux-2.6.31.4-rt14-lb1/include/linux/sched.h
===================================================================
--- linux-2.6.31.4-rt14-lb1.orig/include/linux/sched.h	2009-10-21 10:47:15.000000000 -0400
+++ linux-2.6.31.4-rt14-lb1/include/linux/sched.h	2009-10-21 10:49:00.000000000 -0400
@@ -1047,6 +1047,10 @@
 }
 #endif	/* !CONFIG_SMP */
 
+
+unsigned long default_scale_freq_power(struct sched_domain *sd, int cpu);
+unsigned long default_scale_smt_power(struct sched_domain *sd, int cpu);
+
 struct io_context;			/* See blkdev.h */
 
 

--

  parent reply	other threads:[~2009-10-22 12:41 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-10-22 12:37 [patch -rt 00/17] [patch -rt] Sched load balance backport dino
2009-10-22 12:37 ` [patch -rt 01/17] sched: restore __cpu_power to a straight sum of power dino
2009-10-22 12:37 ` [patch -rt 02/17] sched: SD_PREFER_SIBLING dino
2009-10-22 12:37 ` [patch -rt 03/17] sched: update the cpu_power sum during load-balance dino
2009-10-22 12:37 ` [patch -rt 04/17] sched: add smt_gain dino
2009-10-22 12:37 ` [patch -rt 05/17] sched: dynamic cpu_power dino
2009-10-22 12:37 ` [patch -rt 06/17] sched: scale down cpu_power due to RT tasks dino
2009-10-22 12:37 ` [patch -rt 07/17] sched: try to deal with low capacity dino
2009-10-22 12:37 ` [patch -rt 08/17] sched: remove reciprocal for cpu_power dino
2009-10-22 12:37 ` [patch -rt 09/17] x86: move APERF/MPERF into a X86_FEATURE dino
2009-10-22 12:37 ` [patch -rt 10/17] x86: Add generic aperf/mperf code dino
2009-10-22 12:37 ` [patch -rt 11/17] Provide an arch specific hook for cpufreq based scaling of cpu_power dino
2009-10-22 12:37 ` dino [this message]
2009-10-22 12:37 ` [patch -rt 13/17] sched: cleanup wake_idle power saving dino
2009-10-22 12:37 ` [patch -rt 14/17] sched: cleanup wake_idle dino
2009-10-22 12:37 ` [patch -rt 15/17] sched: Add a missing = dino
2009-10-22 12:37 ` [patch -rt 16/17] sched: Deal with low-load in wake_affine() dino
2009-10-22 12:38 ` [patch -rt 17/17] sched: Fix dynamic power-balancing crash dino

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091022124112.184099152@spinlock.in.ibm.com \
    --to=dino@in.ibm.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=dvhltc@us.ibm.com \
    --cc=jkacur@redhat.com \
    --cc=johnstul@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox