All of lore.kernel.org
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Clark Williams <clark.williams@gmail.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Li Zefan <lizf@cn.fujitsu.com>, Ingo Molnar <mingo@kernel.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Mike Galbraith <efault@gmx.de>,
	Alessio Igor Bogani <abogani@kernel.org>,
	Avi Kivity <avi@redhat.com>, Chris Metcalf <cmetcalf@tilera.com>,
	Christoph Lameter <cl@linux.com>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Geoff Levand <geoff@infradead.org>,
	Gilad Ben Yossef <gilad@benyossef.com>,
	Hakan Akkan <hakanakkan@gmail.com>, Kevin Hilman <khilman@ti.com>,
	Max Krasnyansky <maxk@qualcomm.com>,
	Stephen Hemminger <shemminger@vyatta.com>,
	Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Subject: [PATCH 14/32] nohz/cpuset: Account user and system times in adaptive nohz mode
Date: Mon, 29 Oct 2012 16:27:25 -0400	[thread overview]
Message-ID: <20121029203848.668515857@goodmis.org> (raw)
In-Reply-To: 20121029202711.062749374@goodmis.org

[-- Attachment #1: 0014-nohz-cpuset-Account-user-and-system-times-in-adaptiv.patch --]
[-- Type: text/plain, Size: 8067 bytes --]

From: Frederic Weisbecker <fweisbec@gmail.com>

If we are not running the tick, we are not anymore regularly counting
the user/system cputime at every jiffies.

To solve this, save a snapshot of the jiffies when we stop the tick
and keep track of where we saved it: user or system. On top of this,
we account the cputime elapsed when we cross the kernel entry/exit
boundaries and when we restart the tick.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/tick.h     |   12 +++++
 kernel/sched/core.c      |    1 +
 kernel/time/tick-sched.c |  129 +++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 140 insertions(+), 2 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 03b6edd..598b492 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -153,11 +153,23 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
 # endif /* !NO_HZ */
 
 #ifdef CONFIG_CPUSETS_NO_HZ
+extern void tick_nohz_enter_kernel(void);
+extern void tick_nohz_exit_kernel(void);
+extern void tick_nohz_enter_exception(struct pt_regs *regs);
+extern void tick_nohz_exit_exception(struct pt_regs *regs);
 extern void tick_nohz_check_adaptive(void);
+extern void tick_nohz_pre_schedule(void);
 extern void tick_nohz_post_schedule(void);
+extern bool tick_nohz_account_tick(void);
 #else /* !CPUSETS_NO_HZ */
+static inline void tick_nohz_enter_kernel(void) { }
+static inline void tick_nohz_exit_kernel(void) { }
+static inline void tick_nohz_enter_exception(struct pt_regs *regs) { }
+static inline void tick_nohz_exit_exception(struct pt_regs *regs) { }
 static inline void tick_nohz_check_adaptive(void) { }
+static inline void tick_nohz_pre_schedule(void) { }
 static inline void tick_nohz_post_schedule(void) { }
+static inline bool tick_nohz_account_tick(void) { return false; }
 #endif /* CPUSETS_NO_HZ */
 
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7b35eda..bebea17 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1771,6 +1771,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
 		    struct task_struct *next)
 {
 	trace_sched_switch(prev, next);
+	tick_nohz_pre_schedule();
 	sched_info_switch(prev, next);
 	perf_event_task_sched_out(prev, next);
 	fire_sched_out_preempt_notifiers(prev, next);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index b8f3757..de8ba59 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -532,7 +532,13 @@ static bool can_stop_adaptive_tick(void)
 
 static void tick_nohz_cpuset_stop_tick(struct tick_sched *ts)
 {
+	struct pt_regs *regs = get_irq_regs();
 	int cpu = smp_processor_id();
+	int was_stopped;
+	int user = 0;
+
+	if (regs)
+		user = user_mode(regs);
 
 	if (!cpuset_adaptive_nohz() || is_idle_task(current))
 		return;
@@ -543,7 +549,36 @@ static void tick_nohz_cpuset_stop_tick(struct tick_sched *ts)
 	if (!can_stop_adaptive_tick())
 		return;
 
+	/*
+	 * If we stop the tick between the syscall exit hook and the actual
+	 * return to userspace, we'll think we are in system space (due to
+	 * user_mode() thinking so). And since we passed the syscall exit hook
+	 * already we won't realize we are in userspace. So the time spent
+	 * tickless would be spuriously accounted as belonging to system.
+	 *
+	 * To avoid this kind of problem, we only stop the tick from userspace
+	 * (until we find a better solution).
+	 * We can later enter the kernel and keep the tick stopped. But the place
+	 * where we stop the tick must be userspace.
+	 * We make an exception for kernel threads since they always execute in
+	 * kernel space.
+	 */
+	if (!user && current->mm)
+		return;
+
+	was_stopped = ts->tick_stopped;
 	tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
+
+	if (!was_stopped && ts->tick_stopped) {
+		WARN_ON_ONCE(ts->saved_jiffies_whence != JIFFIES_SAVED_NONE);
+		if (user)
+			ts->saved_jiffies_whence = JIFFIES_SAVED_USER;
+		else if (!current->mm)
+			ts->saved_jiffies_whence = JIFFIES_SAVED_SYS;
+
+		ts->saved_jiffies = jiffies;
+		set_thread_flag(TIF_NOHZ);
+	}
 }
 #else
 static void tick_nohz_cpuset_stop_tick(struct tick_sched *ts) { }
@@ -871,6 +906,68 @@ void tick_check_idle(int cpu)
 }
 
 #ifdef CONFIG_CPUSETS_NO_HZ
+void tick_nohz_exit_kernel(void)
+{
+	unsigned long flags;
+	struct tick_sched *ts;
+	unsigned long delta_jiffies;
+
+	if (!test_thread_flag(TIF_NOHZ))
+		return;
+
+	local_irq_save(flags);
+
+	ts = &__get_cpu_var(tick_cpu_sched);
+
+	WARN_ON_ONCE(!ts->tick_stopped);
+	WARN_ON_ONCE(ts->saved_jiffies_whence != JIFFIES_SAVED_SYS);
+
+	delta_jiffies = jiffies - ts->saved_jiffies;
+	account_system_ticks(current, delta_jiffies);
+
+	ts->saved_jiffies = jiffies;
+	ts->saved_jiffies_whence = JIFFIES_SAVED_USER;
+
+	local_irq_restore(flags);
+}
+
+void tick_nohz_enter_kernel(void)
+{
+	unsigned long flags;
+	struct tick_sched *ts;
+	unsigned long delta_jiffies;
+
+	if (!test_thread_flag(TIF_NOHZ))
+		return;
+
+	local_irq_save(flags);
+
+	ts = &__get_cpu_var(tick_cpu_sched);
+
+	WARN_ON_ONCE(!ts->tick_stopped);
+	WARN_ON_ONCE(ts->saved_jiffies_whence != JIFFIES_SAVED_USER);
+
+	delta_jiffies = jiffies - ts->saved_jiffies;
+	account_user_ticks(current, delta_jiffies);
+
+	ts->saved_jiffies = jiffies;
+	ts->saved_jiffies_whence = JIFFIES_SAVED_SYS;
+
+	local_irq_restore(flags);
+}
+
+void tick_nohz_enter_exception(struct pt_regs *regs)
+{
+	if (user_mode(regs))
+		tick_nohz_enter_kernel();
+}
+
+void tick_nohz_exit_exception(struct pt_regs *regs)
+{
+	if (user_mode(regs))
+		tick_nohz_exit_kernel();
+}
+
 /*
  * Take the timer duty if nobody is taking care of it.
  * If a CPU already does and and it's in a nohz cpuset,
@@ -889,6 +986,15 @@ static void tick_do_timer_check_handler(int cpu)
 	}
 }
 
+static void tick_nohz_restart_adaptive(void)
+{
+	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+	tick_nohz_account_ticks(ts);
+	tick_nohz_restart_sched_tick();
+	clear_thread_flag(TIF_NOHZ);
+}
+
 void tick_nohz_check_adaptive(void)
 {
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
@@ -896,7 +1002,7 @@ void tick_nohz_check_adaptive(void)
 	if (cpuset_adaptive_nohz()) {
 		if (ts->tick_stopped && !is_idle_task(current)) {
 			if (!can_stop_adaptive_tick())
-				tick_nohz_restart_sched_tick();
+				tick_nohz_restart_adaptive();
 		}
 	}
 }
@@ -909,6 +1015,26 @@ void cpuset_exit_nohz_interrupt(void *unused)
 		tick_nohz_restart_adaptive();
 }
 
+/*
+ * Flush cputime and clear hooks before context switch in case we
+ * haven't yet received the IPI that should take care of that.
+ */
+void tick_nohz_pre_schedule(void)
+{
+	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+	/*
+	 * We are holding the rq lock and if we restart the tick now
+	 * we could deadlock by acquiring the lock twice. Instead
+	 * we do that on post schedule time. For now do the cleanups
+	 * on the prev task.
+	 */
+	if (test_thread_flag(TIF_NOHZ)) {
+		tick_nohz_account_ticks(ts);
+		clear_thread_flag(TIF_NOHZ);
+	}
+}
+
 void tick_nohz_post_schedule(void)
 {
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
@@ -921,7 +1047,6 @@ void tick_nohz_post_schedule(void)
 	if (ts->tick_stopped)
 		tick_nohz_restart_sched_tick();
 }
-
 #else
 
 static void tick_do_timer_check_handler(int cpu)
-- 
1.7.10.4



  parent reply	other threads:[~2012-10-29 20:47 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-29 20:27 [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Steven Rostedt
2012-10-29 20:27 ` [PATCH 01/32] nohz: Move nohz load balancer selection into idle logic Steven Rostedt
2012-10-30  8:32   ` Charles Wang
2012-10-30 15:39     ` Steven Rostedt
2012-10-29 20:27 ` [PATCH 02/32] cpuset: Set up interface for nohz flag Steven Rostedt
2012-10-30 17:16   ` Steven Rostedt
2012-10-29 20:27 ` [PATCH 03/32] nohz: Try not to give the timekeeping duty to an adaptive tickless cpu Steven Rostedt
2012-10-30 17:33   ` Steven Rostedt
2012-10-29 20:27 ` [PATCH 04/32] x86: New cpuset nohz irq vector Steven Rostedt
2012-10-30 17:39   ` Steven Rostedt
2012-10-30 23:51     ` Frederic Weisbecker
2012-10-31  0:07       ` Steven Rostedt
2012-10-31  0:45         ` Frederic Weisbecker
2012-10-29 20:27 ` [PATCH 05/32] nohz: Adaptive tick stop and restart on nohz cpuset Steven Rostedt
2012-10-30 18:23   ` Steven Rostedt
2012-10-29 20:27 ` [PATCH 06/32] nohz/cpuset: Dont turn off the tick if rcu needs it Steven Rostedt
2012-10-30 18:30   ` Steven Rostedt
2012-10-29 20:27 ` [PATCH 07/32] nohz/cpuset: Wake up adaptive nohz CPU when a timer gets enqueued Steven Rostedt
2012-10-29 20:27 ` [PATCH 08/32] nohz/cpuset: Dont stop the tick if posix cpu timers are running Steven Rostedt
2012-10-29 20:27 ` [PATCH 09/32] nohz/cpuset: Restart tick when nohz flag is cleared on cpuset Steven Rostedt
2012-10-30 18:55   ` Steven Rostedt
2012-10-29 20:27 ` [PATCH 10/32] nohz/cpuset: Restart the tick if printk needs it Steven Rostedt
2012-10-30 19:01   ` Steven Rostedt
2012-10-30 23:54     ` Frederic Weisbecker
2012-10-29 20:27 ` [PATCH 11/32] rcu: Restart the tick on non-responding adaptive nohz CPUs Steven Rostedt
2012-10-29 20:27 ` [PATCH 12/32] rcu: Restart tick if we enqueue a callback in a nohz/cpuset CPU Steven Rostedt
2012-10-29 20:27 ` [PATCH 13/32] nohz: Generalize tickless cpu time accounting Steven Rostedt
2012-10-29 20:27 ` Steven Rostedt [this message]
2012-10-29 20:27 ` [PATCH 15/32] nohz/cpuset: New API to flush cputimes on nohz cpusets Steven Rostedt
2012-10-29 20:27 ` [PATCH 16/32] nohz/cpuset: Flush cputime on threads in nohz cpusets when waiting leader Steven Rostedt
2012-10-29 20:27 ` [PATCH 17/32] nohz/cpuset: Flush cputimes on procfs stat file read Steven Rostedt
2012-10-29 20:27 ` [PATCH 18/32] nohz/cpuset: Flush cputimes for getrusage() and times() syscalls Steven Rostedt
2012-10-29 20:27 ` [PATCH 19/32] x86: Syscall hooks for nohz cpusets Steven Rostedt
2012-10-29 20:27 ` [PATCH 20/32] nohz/cpuset: enable addition&removal of cpus while in adaptive nohz mode Steven Rostedt
2012-10-29 20:27 ` [PATCH 21/32] nohz: Dont restart the tick before scheduling to idle Steven Rostedt
2012-10-29 20:27 ` [PATCH 22/32] sched: Comment on rq->clock correctness in ttwu_do_wakeup() in nohz Steven Rostedt
2012-10-29 20:27 ` [PATCH 23/32] sched: Update rq clock on nohz CPU before migrating tasks Steven Rostedt
2012-10-29 20:27 ` [PATCH 24/32] sched: Update rq clock on nohz CPU before setting fair group shares Steven Rostedt
2012-10-29 20:27 ` [PATCH 25/32] sched: Update rq clock on tickless CPUs before calling check_preempt_curr() Steven Rostedt
2012-10-29 20:27 ` [PATCH 26/32] sched: Update rq clock earlier in unthrottle_cfs_rq Steven Rostedt
2012-10-29 20:27 ` [PATCH 27/32] sched: Update clock of nohz busiest rq before balancing Steven Rostedt
2012-10-29 20:27 ` [PATCH 28/32] sched: Update rq clock before idle balancing Steven Rostedt
2012-10-29 20:27 ` [PATCH 29/32] sched: Update nohz rq clock before searching busiest group on load balancing Steven Rostedt
2012-10-29 20:27 ` [PATCH 30/32] rcu: Switch to extended quiescent state in userspace from nohz cpuset Steven Rostedt
2012-10-29 20:27 ` [PATCH 31/32] nohz/cpuset: Disable under some configs Steven Rostedt
2012-10-29 20:27 ` [PATCH 32/32] nohz, not for merge: Add tickless tracing Steven Rostedt
2012-10-30 14:02 ` [PATCH 00/32] [RFC] nohz/cpuset: Start discussions on nohz CPUs Gilad Ben-Yossef
2012-11-02 14:23 ` Christoph Lameter
2012-11-02 14:37   ` Steven Rostedt
2012-11-02 14:50     ` David Nyström
2012-11-02 15:03     ` Christoph Lameter
2012-11-02 15:14       ` Steven Rostedt
2012-11-02 18:35       ` Paul E. McKenney
2012-11-02 20:16         ` Christoph Lameter
2012-11-02 20:41           ` Paul E. McKenney
2012-11-02 20:51             ` Steven Rostedt
2012-11-03  2:08               ` Paul E. McKenney
2012-11-05 15:17                 ` Christoph Lameter
2012-11-05 22:41                   ` Frederic Weisbecker
2012-11-05 22:32   ` Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121029203848.668515857@goodmis.org \
    --to=rostedt@goodmis.org \
    --cc=abogani@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=avi@redhat.com \
    --cc=cl@linux.com \
    --cc=clark.williams@gmail.com \
    --cc=cmetcalf@tilera.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=efault@gmx.de \
    --cc=fweisbec@gmail.com \
    --cc=geoff@infradead.org \
    --cc=gilad@benyossef.com \
    --cc=hakanakkan@gmail.com \
    --cc=khilman@ti.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizf@cn.fujitsu.com \
    --cc=maxk@qualcomm.com \
    --cc=mingo@kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=shemminger@vyatta.com \
    --cc=tglx@linutronix.de \
    --cc=thebigcorporation@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.