From: Steven Rostedt <rostedt@goodmis.org>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Clark Williams <clark.williams@gmail.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Li Zefan <lizf@cn.fujitsu.com>, Ingo Molnar <mingo@kernel.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Mike Galbraith <efault@gmx.de>,
	Alessio Igor Bogani <abogani@kernel.org>,
	Avi Kivity <avi@redhat.com>, Chris Metcalf <cmetcalf@tilera.com>,
	Christoph Lameter <cl@linux.com>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Geoff Levand <geoff@infradead.org>,
	Gilad Ben Yossef <gilad@benyossef.com>,
	Hakan Akkan <hakanakkan@gmail.com>, Kevin Hilman <khilman@ti.com>,
	Max Krasnyansky <maxk@qualcomm.com>,
	Stephen Hemminger <shemminger@vyatta.com>,
	Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Subject: [PATCH 05/32] nohz: Adaptive tick stop and restart on nohz cpuset
Date: Mon, 29 Oct 2012 16:27:16 -0400
Message-ID: <20121029203847.391136599@goodmis.org>
In-Reply-To: 20121029202711.062749374@goodmis.org

[-- Attachment #1: 0005-nohz-Adaptive-tick-stop-and-restart-on-nohz-cpuset.patch --]
[-- Type: text/plain, Size: 11057 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

When a CPU is included in a nohz cpuset, try to switch
it to nohz mode from the interrupt exit path if it is running
a single non-idle task.

Then, if a second task is enqueued while the tick is stopped,
rearm the scheduler tick.

[TODO: Handle the many things done from scheduler_tick()]

[ Included build fix from Geoff Levand ]

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/smp.c    |    2 ++
 include/linux/sched.h    |    6 ++++
 include/linux/tick.h     |   11 +++++-
 init/Kconfig             |    2 +-
 kernel/sched/core.c      |   24 +++++++++++++
 kernel/sched/sched.h     |   12 +++++++
 kernel/softirq.c         |    6 ++--
 kernel/time/tick-sched.c |   86 +++++++++++++++++++++++++++++++++++++++++-----
 8 files changed, 137 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 4c0b7d2..0bad72d 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -23,6 +23,7 @@
 #include <linux/interrupt.h>
 #include <linux/cpu.h>
 #include <linux/gfp.h>
+#include <linux/tick.h>
 
 #include <asm/mtrr.h>
 #include <asm/tlbflush.h>
@@ -275,6 +276,7 @@ void smp_cpuset_update_nohz_interrupt(struct pt_regs *regs)
 {
 	ack_APIC_irq();
 	irq_enter();
+	tick_nohz_check_adaptive();
 	inc_irq_stat(irq_call_count);
 	irq_exit();
 }
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0dd42a0..749752e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2753,6 +2753,12 @@ static inline void inc_syscw(struct task_struct *tsk)
 #define TASK_SIZE_OF(tsk)	TASK_SIZE
 #endif
 
+#ifdef CONFIG_CPUSETS_NO_HZ
+extern bool sched_can_stop_tick(void);
+#else
+static inline bool sched_can_stop_tick(void) { return false; }
+#endif
+
 #ifdef CONFIG_MM_OWNER
 extern void mm_update_next_owner(struct mm_struct *mm);
 extern void mm_init_owner(struct mm_struct *mm, struct task_struct *p);
diff --git a/include/linux/tick.h b/include/linux/tick.h
index f37fceb..9b66fd3 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -124,11 +124,12 @@ static inline int tick_oneshot_mode_active(void) { return 0; }
 # ifdef CONFIG_NO_HZ
 extern void tick_nohz_idle_enter(void);
 extern void tick_nohz_idle_exit(void);
+extern void tick_nohz_restart_sched_tick(void);
 extern void tick_nohz_irq_exit(void);
 extern ktime_t tick_nohz_get_sleep_length(void);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
 extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
-# else
+# else /* !NO_HZ */
 static inline void tick_nohz_idle_enter(void) { }
 static inline void tick_nohz_idle_exit(void) { }
 
@@ -142,4 +143,12 @@ static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
 static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
 # endif /* !NO_HZ */
 
+#ifdef CONFIG_CPUSETS_NO_HZ
+extern void tick_nohz_check_adaptive(void);
+extern void tick_nohz_post_schedule(void);
+#else /* !CPUSETS_NO_HZ */
+static inline void tick_nohz_check_adaptive(void) { }
+static inline void tick_nohz_post_schedule(void) { }
+#endif /* CPUSETS_NO_HZ */
+
 #endif
diff --git a/init/Kconfig b/init/Kconfig
index ffdeeab..418e078 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -751,7 +751,7 @@ config PROC_PID_CPUSET
 
 config CPUSETS_NO_HZ
        bool "Tickless cpusets"
-       depends on CPUSETS && HAVE_CPUSETS_NO_HZ
+       depends on CPUSETS && HAVE_CPUSETS_NO_HZ && NO_HZ && HIGH_RES_TIMERS
        help
          This options let you apply a nohz property to a cpuset such
 	 that the periodic timer tick tries to be avoided when possible on
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..2716b79 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1196,6 +1196,29 @@ static void update_avg(u64 *avg, u64 sample)
 }
 #endif
 
+#ifdef CONFIG_CPUSETS_NO_HZ
+bool sched_can_stop_tick(void)
+{
+	struct rq *rq;
+
+	rq = this_rq();
+
+	/*
+	 * This is called right after cpuset_adaptive_nohz() that
+	 * uses atomic_add_return() so that we are ordered against
+	 * cpu_adaptive_nohz_ref. When inc_nr_running() sends an
+	 * IPI to this CPU, we are guaranteed to see the update on
+	 * nr_running.
+	 */
+
+	/* More than one running task needs preemption */
+	if (rq->nr_running > 1)
+		return false;
+
+	return true;
+}
+#endif
+
 static void
 ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
 {
@@ -1897,6 +1920,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	 * frame will be invalid.
 	 */
 	finish_task_switch(this_rq(), prev);
+	tick_nohz_post_schedule();
 }
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7a7db09..c6cd9ec 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1,6 +1,7 @@
 
 #include <linux/sched.h>
 #include <linux/mutex.h>
+#include <linux/cpuset.h>
 #include <linux/spinlock.h>
 #include <linux/stop_machine.h>
 
@@ -927,6 +928,17 @@ static inline u64 steal_ticks(u64 steal)
 static inline void inc_nr_running(struct rq *rq)
 {
 	rq->nr_running++;
+
+	if (rq->nr_running == 2) {
+		/*
+		 * cpuset_cpu_adaptive_nohz() uses atomic_add_return()
+		 * to order against rq->nr_running updates. This way
+		 * the CPU that receives the IPI is guaranteed to see
+		 * the update on nr_running without the rq->lock.
+		 */
+		if (cpuset_cpu_adaptive_nohz(rq->cpu))
+			smp_cpuset_update_nohz(rq->cpu);
+	}
 }
 
 static inline void dec_nr_running(struct rq *rq)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index cc96bdc..e06b8eb 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -25,6 +25,7 @@
 #include <linux/smp.h>
 #include <linux/smpboot.h>
 #include <linux/tick.h>
+#include <linux/cpuset.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/irq.h>
@@ -307,7 +308,8 @@ void irq_enter(void)
 	int cpu = smp_processor_id();
 
 	rcu_irq_enter();
-	if (is_idle_task(current) && !in_interrupt()) {
+
+	if ((is_idle_task(current) || cpuset_adaptive_nohz()) && !in_interrupt()) {
 		/*
 		 * Prevent raise_softirq from needlessly waking up ksoftirqd
 		 * here, as softirq will be serviced on return from interrupt.
@@ -349,7 +351,7 @@ void irq_exit(void)
 
 #ifdef CONFIG_NO_HZ
 	/* Make sure that timer wheel updates are propagated */
-	if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
+	if (!in_interrupt())
 		tick_nohz_irq_exit();
 #endif
 	rcu_irq_exit();
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index c7a78c6..35047b2 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -512,6 +512,24 @@ void tick_nohz_idle_enter(void)
 	local_irq_enable();
 }
 
+static void tick_nohz_cpuset_stop_tick(struct tick_sched *ts)
+{
+#ifdef CONFIG_CPUSETS_NO_HZ
+	int cpu = smp_processor_id();
+
+	if (!cpuset_adaptive_nohz() || is_idle_task(current))
+		return;
+
+	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
+		return;
+
+	if (!sched_can_stop_tick())
+		return;
+
+	tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
+#endif
+}
+
 /**
  * tick_nohz_irq_exit - update next tick event from interrupt exit
  *
@@ -524,10 +542,12 @@ void tick_nohz_irq_exit(void)
 {
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
 
-	if (!ts->inidle)
-		return;
-
-	__tick_nohz_idle_enter(ts);
+	if (ts->inidle) {
+		if (!need_resched())
+			__tick_nohz_idle_enter(ts);
+	} else {
+		tick_nohz_cpuset_stop_tick(ts);
+	}
 }
 
 /**
@@ -568,7 +588,7 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
 	}
 }
 
-static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
+static void __tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 {
 	/* Update jiffies first */
 	tick_do_update_jiffies64(now);
@@ -584,6 +604,31 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 	tick_nohz_restart(ts, now);
 }
 
+/**
+ * tick_nohz_restart_sched_tick - restart the tick for a tickless CPU
+ *
+ * Restart the tick when the CPU is in adaptive tickless mode.
+ */
+void tick_nohz_restart_sched_tick(void)
+{
+	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+	unsigned long flags;
+	ktime_t now;
+
+	local_irq_save(flags);
+
+	if (!ts->tick_stopped) {
+		local_irq_restore(flags);
+		return;
+	}
+
+	now = ktime_get();
+	__tick_nohz_restart_sched_tick(ts, now);
+
+	local_irq_restore(flags);
+}
+
+
 static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
 {
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING
@@ -630,7 +675,7 @@ void tick_nohz_idle_exit(void)
 	if (ts->tick_stopped) {
 		nohz_balance_enter_idle(cpu);
 		calc_load_exit_idle();
-		tick_nohz_restart_sched_tick(ts, now);
+		__tick_nohz_restart_sched_tick(ts, now);
 		tick_nohz_account_idle_ticks(ts);
 	}
 
@@ -791,7 +836,6 @@ void tick_check_idle(int cpu)
 }
 
 #ifdef CONFIG_CPUSETS_NO_HZ
-
 /*
  * Take the timer duty if nobody is taking care of it.
  * If a CPU already does and and it's in a nohz cpuset,
@@ -810,6 +854,31 @@ static void tick_do_timer_check_handler(int cpu)
 	}
 }
 
+void tick_nohz_check_adaptive(void)
+{
+	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+	if (cpuset_adaptive_nohz()) {
+		if (ts->tick_stopped && !is_idle_task(current)) {
+			if (!sched_can_stop_tick())
+				tick_nohz_restart_sched_tick();
+		}
+	}
+}
+
+void tick_nohz_post_schedule(void)
+{
+	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+	/*
+	 * No need to disable irqs here. The worst that can happen
+	 * is an irq that comes and restarts the tick before us.
+	 * tick_nohz_restart_sched_tick() is irq safe.
+	 */
+	if (ts->tick_stopped)
+		tick_nohz_restart_sched_tick();
+}
+
 #else
 
 static void tick_do_timer_check_handler(int cpu)
@@ -856,6 +925,7 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 	 * no valid regs pointer
 	 */
 	if (regs) {
+		int user = user_mode(regs);
 		/*
 		 * When we are idle and the tick is stopped, we have to touch
 		 * the watchdog as we might not schedule for a really long
@@ -869,7 +939,7 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 			if (is_idle_task(current))
 				ts->idle_jiffies++;
 		}
-		update_process_times(user_mode(regs));
+		update_process_times(user);
 		profile_tick(CPU_PROFILING);
 	}
 
-- 
1.7.10.4



Thread overview: 60+ messages
2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
2012-10-29 20:27 ` [PATCH 01/32] nohz: Move nohz load balancer selection into idle logic Steven Rostedt
2012-10-30  8:32   ` Charles Wang
2012-10-30 15:39     ` Steven Rostedt
2012-10-29 20:27 ` [PATCH 02/32] cpuset: Set up interface for nohz flag Steven Rostedt
2012-10-30 17:16   ` Steven Rostedt
2012-10-29 20:27 ` [PATCH 03/32] nohz: Try not to give the timekeeping duty to an adaptive tickless cpu Steven Rostedt
2012-10-30 17:33   ` Steven Rostedt
2012-10-29 20:27 ` [PATCH 04/32] x86: New cpuset nohz irq vector Steven Rostedt
2012-10-30 17:39   ` Steven Rostedt
2012-10-30 23:51     ` Frederic Weisbecker
2012-10-31  0:07       ` Steven Rostedt
2012-10-31  0:45         ` Frederic Weisbecker
2012-10-29 20:27 ` Steven Rostedt [this message]
2012-10-30 18:23   ` [PATCH 05/32] nohz: Adaptive tick stop and restart on nohz cpuset Steven Rostedt
2012-10-29 20:27 ` [PATCH 06/32] nohz/cpuset: Dont turn off the tick if rcu needs it Steven Rostedt
2012-10-30 18:30   ` Steven Rostedt
2012-10-29 20:27 ` [PATCH 07/32] nohz/cpuset: Wake up adaptive nohz CPU when a timer gets enqueued Steven Rostedt
2012-10-29 20:27 ` [PATCH 08/32] nohz/cpuset: Dont stop the tick if posix cpu timers are running Steven Rostedt
2012-10-29 20:27 ` [PATCH 09/32] nohz/cpuset: Restart tick when nohz flag is cleared on cpuset Steven Rostedt
2012-10-30 18:55   ` Steven Rostedt
2012-10-29 20:27 ` [PATCH 10/32] nohz/cpuset: Restart the tick if printk needs it Steven Rostedt
2012-10-30 19:01   ` Steven Rostedt
2012-10-30 23:54     ` Frederic Weisbecker
2012-10-29 20:27 ` [PATCH 11/32] rcu: Restart the tick on non-responding adaptive nohz CPUs Steven Rostedt
2012-10-29 20:27 ` [PATCH 12/32] rcu: Restart tick if we enqueue a callback in a nohz/cpuset CPU Steven Rostedt
2012-10-29 20:27 ` [PATCH 13/32] nohz: Generalize tickless cpu time accounting Steven Rostedt
2012-10-29 20:27 ` [PATCH 14/32] nohz/cpuset: Account user and system times in adaptive nohz mode Steven Rostedt
2012-10-29 20:27 ` [PATCH 15/32] nohz/cpuset: New API to flush cputimes on nohz cpusets Steven Rostedt
2012-10-29 20:27 ` [PATCH 16/32] nohz/cpuset: Flush cputime on threads in nohz cpusets when waiting leader Steven Rostedt
2012-10-29 20:27 ` [PATCH 17/32] nohz/cpuset: Flush cputimes on procfs stat file read Steven Rostedt
2012-10-29 20:27 ` [PATCH 18/32] nohz/cpuset: Flush cputimes for getrusage() and times() syscalls Steven Rostedt
2012-10-29 20:27 ` [PATCH 19/32] x86: Syscall hooks for nohz cpusets Steven Rostedt
2012-10-29 20:27 ` [PATCH 20/32] nohz/cpuset: enable addition&removal of cpus while in adaptive nohz mode Steven Rostedt
2012-10-29 20:27 ` [PATCH 21/32] nohz: Dont restart the tick before scheduling to idle Steven Rostedt
2012-10-29 20:27 ` [PATCH 22/32] sched: Comment on rq->clock correctness in ttwu_do_wakeup() in nohz Steven Rostedt
2012-10-29 20:27 ` [PATCH 23/32] sched: Update rq clock on nohz CPU before migrating tasks Steven Rostedt
2012-10-29 20:27 ` [PATCH 24/32] sched: Update rq clock on nohz CPU before setting fair group shares Steven Rostedt
2012-10-29 20:27 ` [PATCH 25/32] sched: Update rq clock on tickless CPUs before calling check_preempt_curr() Steven Rostedt
2012-10-29 20:27 ` [PATCH 26/32] sched: Update rq clock earlier in unthrottle_cfs_rq Steven Rostedt
2012-10-29 20:27 ` [PATCH 27/32] sched: Update clock of nohz busiest rq before balancing Steven Rostedt
2012-10-29 20:27 ` [PATCH 28/32] sched: Update rq clock before idle balancing Steven Rostedt
2012-10-29 20:27 ` [PATCH 29/32] sched: Update nohz rq clock before searching busiest group on load balancing Steven Rostedt
2012-10-29 20:27 ` [PATCH 30/32] rcu: Switch to extended quiescent state in userspace from nohz cpuset Steven Rostedt
2012-10-29 20:27 ` [PATCH 31/32] nohz/cpuset: Disable under some configs Steven Rostedt
2012-10-29 20:27 ` [PATCH 32/32] nohz, not for merge: Add tickless tracing Steven Rostedt
2012-10-30 14:02 ` [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Gilad Ben-Yossef
2012-11-02 14:23 ` Christoph Lameter
2012-11-02 14:37   ` Steven Rostedt
2012-11-02 14:50     ` David Nyström
2012-11-02 15:03     ` Christoph Lameter
2012-11-02 15:14       ` Steven Rostedt
2012-11-02 18:35       ` Paul E. McKenney
2012-11-02 20:16         ` Christoph Lameter
2012-11-02 20:41           ` Paul E. McKenney
2012-11-02 20:51             ` Steven Rostedt
2012-11-03  2:08               ` Paul E. McKenney
2012-11-05 15:17                 ` Christoph Lameter
2012-11-05 22:41                   ` Frederic Weisbecker
2012-11-05 22:32   ` Frederic Weisbecker
