From: Frederic Weisbecker <fweisbec@gmail.com>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Anton Blanchard <anton@au1.ibm.com>, Avi Kivity <avi@redhat.com>,
Ingo Molnar <mingo@elte.hu>, Lai Jiangshan <laijs@cn.fujitsu.com>,
"Paul E . McKenney" <paulmck@linux.vnet.ibm.com>,
Paul Menage <menage@google.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Stephen Hemminger <shemminger@vyatta.com>,
Thomas Gleixner <tglx@linutronix.de>,
Tim Pepper <lnxninja@linux.vnet.ibm.com>
Subject: [PATCH 24/32] nohz/cpuset: Handle kernel entry/exit to account cputime
Date: Mon, 15 Aug 2011 17:52:21 +0200
Message-ID: <1313423549-27093-25-git-send-email-fweisbec@gmail.com>
In-Reply-To: <1313423549-27093-1-git-send-email-fweisbec@gmail.com>

Provide a few APIs that archs can call to signal that they are entering
or exiting the kernel, so that when we are in nohz adaptive mode
we know precisely where we need to account the cputime.

The new APIs are:

- tick_nohz_enter_kernel() (called when we enter a syscall)
- tick_nohz_exit_kernel() (called when we exit a syscall)
- tick_nohz_enter_exception() (called when we enter any
  exception, trap or fault, but not irqs)
- tick_nohz_exit_exception() (called when we exit any exception)

Hooks into syscalls are typically driven by the TIF_NOHZ thread
flag.
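
As a rough illustration of how the four hooks are meant to split the
elapsed jiffies, here is a small userspace model. It is only a sketch:
struct cpu_model, the model_* functions and the example numbers are
made up for this note, and the nohz adaptive mode check and the irq
disabling done by the real functions are left out.

```c
#include <assert.h>

/* Userspace model only: names echo the patch, but none of this is kernel code. */
enum whence { SAVED_NONE, SAVED_USER, SAVED_SYS };

struct cpu_model {
	unsigned long jiffies;       /* stand-in for the global jiffies counter */
	unsigned long saved_jiffies; /* snapshot taken at the last transition */
	enum whence whence;          /* which mode the snapshot was taken in */
	unsigned long utime;         /* accumulated user jiffies */
	unsigned long stime;         /* accumulated system jiffies */
};

/* Syscall entry: the time since the snapshot was spent in userspace. */
static void model_enter_kernel(struct cpu_model *c)
{
	if (c->whence == SAVED_USER)
		c->utime += c->jiffies - c->saved_jiffies;
	c->saved_jiffies = c->jiffies;
	c->whence = SAVED_SYS;
}

/* Syscall exit: the time since the snapshot was spent in the kernel. */
static void model_exit_kernel(struct cpu_model *c)
{
	if (c->whence == SAVED_SYS)
		c->stime += c->jiffies - c->saved_jiffies;
	c->saved_jiffies = c->jiffies;
	c->whence = SAVED_USER;
}

/* Exceptions only switch modes when they interrupted userspace. */
static void model_enter_exception(struct cpu_model *c, int from_user)
{
	if (from_user)
		model_enter_kernel(c);
}

static void model_exit_exception(struct cpu_model *c, int from_user)
{
	if (from_user)
		model_exit_kernel(c);
}

/* Hypothetical walk-through: 10 jiffies user, 3 syscall, 4 user, 2 fault. */
static void model_example(void)
{
	struct cpu_model c = { .whence = SAVED_USER };

	c.jiffies = 10; model_enter_kernel(&c);       /* utime = 10 */
	c.jiffies = 13; model_exit_kernel(&c);        /* stime = 3 */
	c.jiffies = 17; model_enter_exception(&c, 1); /* utime = 14 */
	c.jiffies = 19; model_exit_exception(&c, 1);  /* stime = 5 */

	assert(c.utime == 14 && c.stime == 5);
}
```

Calling model_example() from a test main() exercises one syscall and
one fault taken from userspace. An exception taken while already in
the kernel (from_user == 0) leaves the model untouched, which matches
the user_mode(regs) test in the exception hooks.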

In addition, we use the value returned by user_mode(regs) from
the timer interrupt to know where we are.
However, while we can rely on user_mode(regs) != 0 to know
we are in userspace, we can't rely on user_mode(regs) == 0
to know we are in system mode.

Consider the following scenario: we stop the tick on the syscall
return path, so we set TIF_NOHZ, but the syscall exit hook is already
behind us. If we haven't yet returned to userspace, then we have
user_mode(regs) == 0. If on top of that we assume we are in
system mode, and later we issue a syscall but the tick is restarted
right before we reach the syscall entry hook, then we have no clue
that the whole elapsed cputime was spent in userspace and not in
the system.

The only way to fix this is to enter nohz mode only once we know
for the first time that we are in userspace, such as when we reach
the kernel exit hook or when a timer tick fires with
user_mode(regs) != 0. Kernel threads don't have this worry.
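
The deferred start can be pictured with a second userspace model,
loosely following the decision made from the timer tick in the diff
below. Again this is only a sketch under assumptions: struct cpu_model
and the model_* names are invented here, the cpuset checks are
dropped, and the syscall exit hook (the other point where the origin
becomes known) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

enum whence { SAVED_NONE, SAVED_USER, SAVED_SYS };

struct cpu_model {
	bool nohz_mode;              /* stand-in for the per-cpu task_nohz_mode */
	unsigned long jiffies;       /* stand-in for the global jiffies counter */
	unsigned long saved_jiffies; /* accounting origin, valid if whence is set */
	enum whence whence;
};

static void model_snapshot(struct cpu_model *c, enum whence w)
{
	c->whence = w;
	c->saved_jiffies = c->jiffies;
}

/*
 * Timer-tick decision. 'user' models user_mode(regs) != 0 and 'has_mm'
 * models current->mm != NULL (false for kernel threads).
 */
static void model_tick(struct cpu_model *c, bool user, bool has_mm)
{
	if (c->nohz_mode) {
		/* Already in nohz mode: latch on the first tick seen in userspace. */
		if (user && c->whence == SAVED_NONE)
			model_snapshot(c, SAVED_USER);
		return;
	}

	c->nohz_mode = true;
	if (user)
		model_snapshot(c, SAVED_USER);
	else if (!has_mm)
		model_snapshot(c, SAVED_SYS); /* kernel thread: always system time */
	/* else: user task caught in the kernel, origin unknown, stay SAVED_NONE */
}

/* The tick may only be stopped once the accounting origin is known. */
static bool model_can_stop_tick(const struct cpu_model *c)
{
	return c->nohz_mode && c->whence != SAVED_NONE;
}

/* Hypothetical walk-through of the problematic case for a user task. */
static void model_deferred_example(void)
{
	struct cpu_model c = { 0 };

	model_tick(&c, false, true);  /* tick lands in the kernel */
	assert(!model_can_stop_tick(&c));

	c.jiffies = 4;
	model_tick(&c, true, true);   /* first tick seen in userspace */
	assert(model_can_stop_tick(&c) && c.whence == SAVED_USER);
}
```

A user task caught in the kernel keeps its tick until a later tick
(or, in the real patch, the kernel exit hook) proves it has reached
userspace. A kernel thread can start from SAVED_SYS right away since
it never returns to userspace.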

This sucks, but for now I have no better solution. Let's hope we
can find a better one.
TODO: wrap operation on jiffies?
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Pepper <lnxninja@linux.vnet.ibm.com>
---
include/linux/tick.h | 8 +++
kernel/sched.c | 1 +
kernel/time/tick-sched.c | 114 ++++++++++++++++++++++++++++++++++++++++------
3 files changed, 109 insertions(+), 14 deletions(-)
diff --git a/include/linux/tick.h b/include/linux/tick.h
index ea6dfb7..3ad649f 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -139,10 +139,18 @@ extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
#ifdef CONFIG_CPUSETS_NO_HZ
DECLARE_PER_CPU(int, task_nohz_mode);
+extern void tick_nohz_enter_kernel(void);
+extern void tick_nohz_exit_kernel(void);
+extern void tick_nohz_enter_exception(struct pt_regs *regs);
+extern void tick_nohz_exit_exception(struct pt_regs *regs);
extern int tick_nohz_adaptive_mode(void);
extern bool tick_nohz_account_tick(void);
extern void tick_nohz_flush_current_times(void);
#else /* !CPUSETS_NO_HZ */
+static inline void tick_nohz_enter_kernel(void) { }
+static inline void tick_nohz_exit_kernel(void) { }
+static inline void tick_nohz_enter_exception(struct pt_regs *regs) { }
+static inline void tick_nohz_exit_exception(struct pt_regs *regs) { }
static inline int tick_nohz_adaptive_mode(void) { return 0; }
static inline bool tick_nohz_account_tick(void) { return false; }
#endif /* CPUSETS_NO_HZ */
diff --git a/kernel/sched.c b/kernel/sched.c
index a58f993..c49c1b1 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2503,6 +2503,7 @@ static void cpuset_nohz_restart_tick(void)
tick_nohz_flush_current_times();
__get_cpu_var(task_nohz_mode) = 0;
tick_nohz_restart_sched_tick();
+ clear_thread_flag(TIF_NOHZ);
}
void cpuset_update_nohz(void)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index c3a8f26..d8f01b8 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -595,8 +595,9 @@ void tick_nohz_irq_exit(void)
if (ts->inidle && !need_resched())
__tick_nohz_enter_idle(ts, cpu);
else if (tick_nohz_adaptive_mode() && !idle_cpu(cpu)) {
- if (tick_nohz_can_stop_tick(cpu, ts))
- tick_nohz_stop_sched_tick(ktime_get(), cpu, ts);
+ if (ts->saved_jiffies_whence != JIFFIES_SAVED_NONE
+ && tick_nohz_can_stop_tick(cpu, ts))
+ tick_nohz_stop_sched_tick(ktime_get(), cpu, ts);
}
}
@@ -757,6 +758,74 @@ void tick_check_idle(int cpu)
#ifdef CONFIG_CPUSETS_NO_HZ
+void tick_nohz_exit_kernel(void)
+{
+ unsigned long flags;
+ struct tick_sched *ts;
+ unsigned long delta_jiffies;
+
+ local_irq_save(flags);
+
+ if (!tick_nohz_adaptive_mode()) {
+ local_irq_restore(flags);
+ return;
+ }
+
+ ts = &__get_cpu_var(tick_cpu_sched);
+
+ WARN_ON_ONCE(ts->saved_jiffies_whence == JIFFIES_SAVED_USER);
+
+ if (ts->saved_jiffies_whence == JIFFIES_SAVED_SYS) {
+ delta_jiffies = jiffies - ts->saved_jiffies;
+ account_system_jiffies(current, delta_jiffies);
+ }
+
+ ts->saved_jiffies = jiffies;
+ ts->saved_jiffies_whence = JIFFIES_SAVED_USER;
+
+ local_irq_restore(flags);
+}
+
+void tick_nohz_enter_kernel(void)
+{
+ unsigned long flags;
+ struct tick_sched *ts;
+ unsigned long delta_jiffies;
+
+ local_irq_save(flags);
+
+ if (!tick_nohz_adaptive_mode()) {
+ local_irq_restore(flags);
+ return;
+ }
+
+ ts = &__get_cpu_var(tick_cpu_sched);
+
+ WARN_ON_ONCE(ts->saved_jiffies_whence == JIFFIES_SAVED_SYS);
+
+ if (ts->saved_jiffies_whence == JIFFIES_SAVED_USER) {
+ delta_jiffies = jiffies - ts->saved_jiffies;
+ account_user_jiffies(current, delta_jiffies);
+ }
+
+ ts->saved_jiffies = jiffies;
+ ts->saved_jiffies_whence = JIFFIES_SAVED_SYS;
+
+ local_irq_restore(flags);
+}
+
+void tick_nohz_enter_exception(struct pt_regs *regs)
+{
+ if (user_mode(regs))
+ tick_nohz_enter_kernel();
+}
+
+void tick_nohz_exit_exception(struct pt_regs *regs)
+{
+ if (user_mode(regs))
+ tick_nohz_exit_kernel();
+}
+
int tick_nohz_adaptive_mode(void)
{
return __get_cpu_var(task_nohz_mode);
@@ -766,20 +835,33 @@ static void tick_nohz_cpuset_stop_tick(int user)
{
struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
- if (!cpuset_adaptive_nohz() || tick_nohz_adaptive_mode())
+ if (!cpuset_adaptive_nohz())
return;
+ if (tick_nohz_adaptive_mode()) {
+ if (user && ts->saved_jiffies_whence == JIFFIES_SAVED_NONE) {
+ ts->saved_jiffies_whence = JIFFIES_SAVED_USER;
+ ts->saved_jiffies = jiffies;
+ }
+
+ return;
+ }
+
if (cpuset_nohz_can_stop_tick()) {
__get_cpu_var(task_nohz_mode) = 1;
/* Nohz mode must be visible to wake_up_nohz_cpu() */
smp_wmb();
+ set_thread_flag(TIF_NOHZ);
WARN_ON_ONCE(ts->saved_jiffies_whence != JIFFIES_SAVED_NONE);
- ts->saved_jiffies = jiffies;
- if (user)
+
+ if (user) {
ts->saved_jiffies_whence = JIFFIES_SAVED_USER;
- else
+ ts->saved_jiffies = jiffies;
+ } else if (!current->mm) {
ts->saved_jiffies_whence = JIFFIES_SAVED_SYS;
+ ts->saved_jiffies = jiffies;
+ }
}
}
@@ -803,7 +885,7 @@ static void tick_do_timer_check_handler(int cpu)
bool tick_nohz_account_tick(void)
{
- struct tick_sched *ts;
+ struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
unsigned long delta_jiffies;
if (!tick_nohz_adaptive_mode())
@@ -811,11 +893,15 @@ bool tick_nohz_account_tick(void)
ts = &__get_cpu_var(tick_cpu_sched);
+ if (ts->saved_jiffies_whence == JIFFIES_SAVED_NONE)
+ return false;
+
delta_jiffies = jiffies - ts->saved_jiffies;
- if (ts->saved_jiffies_whence == JIFFIES_SAVED_SYS)
- account_system_jiffies(current, delta_jiffies);
- else
+
+ if (ts->saved_jiffies_whence == JIFFIES_SAVED_USER)
account_user_jiffies(current, delta_jiffies);
+ else
+ account_system_jiffies(current, delta_jiffies);
ts->saved_jiffies = jiffies;
@@ -825,12 +911,12 @@ bool tick_nohz_account_tick(void)
void tick_nohz_flush_current_times(void)
{
struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+ unsigned long delta_jiffies;
+ struct pt_regs *regs;
- tick_nohz_account_tick();
-
- ts->saved_jiffies_whence = JIFFIES_SAVED_NONE;
+ if (tick_nohz_account_tick())
+ ts->saved_jiffies_whence = JIFFIES_SAVED_NONE;
}
-
#else
static void tick_nohz_cpuset_stop_tick(int user) { }
--
1.7.5.4