From: Frederic Weisbecker <fweisbec@gmail.com>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>,
	Alessio Igor Bogani <abogani@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Avi Kivity <avi@redhat.com>, Chris Metcalf <cmetcalf@tilera.com>,
	Christoph Lameter <cl@linux.com>,
	Geoff Levand <geoff@infradead.org>,
	Gilad Ben Yossef <gilad@benyossef.com>,
	Hakan Akkan <hakanakkan@gmail.com>,
	Ingo Molnar <mingo@kernel.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Paul Gortmaker <paul.gortmaker@windriver.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: [PATCH 02/24] cputime: Generic on-demand virtual cputime accounting
Date: Thu, 20 Dec 2012 19:32:49 +0100
Message-ID: <1356028391-14427-3-git-send-email-fweisbec@gmail.com>
In-Reply-To: <1356028391-14427-1-git-send-email-fweisbec@gmail.com>

If we want to stop the tick beyond idle, we need to be
able to account the cputime without relying on the tick.

Virtual based cputime accounting solves that problem by
hooking into kernel/user boundaries.

However, implementing CONFIG_VIRT_CPU_ACCOUNTING requires
setting low level hooks and involves more overhead. But
we already have a generic context tracking subsystem,
which RCU already requires on archs that want to
shut down the tick outside idle.

This patch implements a generic virtual based cputime
accounting that relies on these generic kernel/user hooks.
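
As a rough illustration of the idea, here is a minimal userspace sketch
of accounting at the kernel/user boundaries. It is not the kernel code:
every sim_* name is hypothetical, and the "now" field merely stands in
for jiffies. Each boundary crossing flushes the ticks elapsed since the
previous flush into the bucket of the context being left.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum sim_state { SIM_IN_KERNEL = 0, SIM_IN_USER };

struct sim_cpu {
	bool active;           /* are the boundary hooks armed? */
	enum sim_state state;  /* which side of the boundary we are on */
	unsigned long now;     /* stand-in for jiffies */
	unsigned long last;    /* stand-in for the per-CPU snapshot */
	unsigned long utime;   /* ticks charged to user */
	unsigned long stime;   /* ticks charged to system */
};

/* Return the ticks elapsed since the last flush and advance the snapshot. */
static unsigned long sim_delta(struct sim_cpu *cpu)
{
	unsigned long delta;

	assert(cpu->now >= cpu->last);	/* the simulated clock never goes back */
	delta = cpu->now - cpu->last;
	cpu->last += delta;

	return delta;
}

/* Kernel -> user: everything that ran until now was kernel code. */
static void sim_user_enter(struct sim_cpu *cpu)
{
	if (cpu->active && cpu->state != SIM_IN_USER) {
		cpu->stime += sim_delta(cpu);
		cpu->state = SIM_IN_USER;
	}
}

/* User -> kernel: everything that ran until now was user code. */
static void sim_user_exit(struct sim_cpu *cpu)
{
	if (cpu->state == SIM_IN_USER) {
		cpu->utime += sim_delta(cpu);
		cpu->state = SIM_IN_KERNEL;
	}
}

/* Let the simulated clock advance by some ticks. */
static void sim_run(struct sim_cpu *cpu, unsigned long ticks)
{
	cpu->now += ticks;
}
```

With the hooks armed, running 3 ticks, calling sim_user_enter(), running
5 more ticks and calling sim_user_exit() charges 3 ticks to stime and 5
to utime. With active left false, both hooks are no-ops, which is the
property the dynamic (de)activation relies on.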

There are some upsides to doing this:

- This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING
if context tracking is already built (already necessary for RCU in full
tickless mode).

- We can rely on the generic context tracking subsystem to dynamically
(de)activate the hooks, so that we can switch anytime between virtual
and tick based accounting. This way we don't have the overhead
of the virtual accounting when the tick is running periodically.

And a few downsides:

- It relies on jiffies and the hooks are set in high level code. This
results in less precise cputime accounting than with a true native
virtual based cputime accounting, which hooks into low level code and
uses a cpu hardware clock. Precision is not the goal of this though.

- There is probably more overhead than a native virtual based cputime
accounting. But this relies on hooks that are already set anyway.
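
To make the precision loss concrete, here is a small userspace sketch,
again not kernel code. The jsim_* names are hypothetical and the 1000 us
jiffy (HZ=1000) is an assumption chosen for the example. Time is only
visible in whole jiffies, so whichever context happens to be running
when a jiffy boundary is crossed gets charged the entire jiffy.

```c
#include <assert.h>

#define JSIM_US_PER_JIFFY 1000UL	/* assumed tick period (HZ=1000) */

/* Hypothetical model of jiffies-granular accounting. */
struct jsim_acct {
	unsigned long now_us;		/* true elapsed time, microseconds */
	unsigned long last_jiffies;	/* jiffies seen at the last flush */
	unsigned long user_jiffies;	/* jiffies charged to user */
	unsigned long system_jiffies;	/* jiffies charged to system */
};

/* The only clock the accounting can see: whole jiffies. */
static unsigned long jsim_jiffies(const struct jsim_acct *a)
{
	return a->now_us / JSIM_US_PER_JIFFY;
}

/* Whole jiffies elapsed since the previous flush. */
static unsigned long jsim_flush(struct jsim_acct *a)
{
	unsigned long delta;

	assert(jsim_jiffies(a) >= a->last_jiffies);
	delta = jsim_jiffies(a) - a->last_jiffies;
	a->last_jiffies += delta;

	return delta;
}

/* Run in userspace for some microseconds, then flush on kernel entry. */
static void jsim_spend_user(struct jsim_acct *a, unsigned long us)
{
	a->now_us += us;
	a->user_jiffies += jsim_flush(a);
}

/* Run in the kernel for some microseconds, then flush on user entry. */
static void jsim_spend_system(struct jsim_acct *a, unsigned long us)
{
	a->now_us += us;
	a->system_jiffies += jsim_flush(a);
}
```

Starting from a zeroed struct, 900 us of user time followed by 200 us of
system time ends up as 0 user jiffies and 1 system jiffy: the user
stretch ends before the first jiffy boundary, and the kernel stretch is
the one that crosses it.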

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/context_tracking.h |   28 +++++++++++
 include/linux/vtime.h            |    4 ++
 init/Kconfig                     |   11 ++++-
 kernel/context_tracking.c        |   22 ++-------
 kernel/sched/cputime.c           |   93 +++++++++++++++++++++++++++++++++++--
 5 files changed, 135 insertions(+), 23 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index e24339c..9f33fbc 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -3,12 +3,40 @@
 
 #ifdef CONFIG_CONTEXT_TRACKING
 #include <linux/sched.h>
+#include <linux/percpu.h>
+
+struct context_tracking {
+	/*
+	 * When active is false, hooks are unset in order
+	 * to minimize overhead: TIF flags are cleared
+	 * and calls to user_enter/exit are ignored. This
+	 * may be further optimized using static keys.
+	 */
+	bool active;
+	enum {
+		IN_KERNEL = 0,
+		IN_USER,
+	} state;
+};
+
+DECLARE_PER_CPU(struct context_tracking, context_tracking);
+
+static inline bool context_tracking_in_user(void)
+{
+	return __this_cpu_read(context_tracking.state) == IN_USER;
+}
+
+static inline bool context_tracking_active(void)
+{
+	return __this_cpu_read(context_tracking.active);
+}
 
 extern void user_enter(void);
 extern void user_exit(void);
 extern void context_tracking_task_switch(struct task_struct *prev,
 					 struct task_struct *next);
 #else
+static inline bool context_tracking_in_user(void) { return false; }
 static inline void user_enter(void) { }
 static inline void user_exit(void) { }
 static inline void context_tracking_task_switch(struct task_struct *prev,
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index ae30ab5..58392aa 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -17,6 +17,10 @@ static inline void vtime_account_system_irqsafe(struct task_struct *tsk) { }
 static inline void vtime_account(struct task_struct *tsk) { }
 #endif
 
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
+#endif
+
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 extern void irqtime_account_irq(struct task_struct *tsk);
 #else
diff --git a/init/Kconfig b/init/Kconfig
index 60579d6..a64b3e8 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -340,7 +340,9 @@ config TICK_CPU_ACCOUNTING
 
 config VIRT_CPU_ACCOUNTING
 	bool "Deterministic task and CPU time accounting"
-	depends on HAVE_VIRT_CPU_ACCOUNTING
+	depends on HAVE_VIRT_CPU_ACCOUNTING || HAVE_CONTEXT_TRACKING
+	select VIRT_CPU_ACCOUNTING_GEN if !HAVE_VIRT_CPU_ACCOUNTING
+	default y if PPC64
 	help
 	  Select this option to enable more accurate task and CPU time
 	  accounting.  This is done by reading a CPU counter on each
@@ -363,6 +365,13 @@ config IRQ_TIME_ACCOUNTING
 
 endchoice
 
+config VIRT_CPU_ACCOUNTING_GEN
+	select CONTEXT_TRACKING
+	bool
+	help
+	  Implement a generic virtual based cputime accounting by using
+	  the context tracking subsystem.
+
 config BSD_PROCESS_ACCT
 	bool "BSD Process Accounting"
 	help
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 9f6c38f..ca1e073 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -17,24 +17,10 @@
 #include <linux/context_tracking.h>
 #include <linux/rcupdate.h>
 #include <linux/sched.h>
-#include <linux/percpu.h>
 #include <linux/hardirq.h>
 
-struct context_tracking {
-	/*
-	 * When active is false, hooks are unset in order
-	 * to minimize overhead: TIF flags are cleared
-	 * and calls to user_enter/exit are ignored. This
-	 * may be further optimized using static keys.
-	 */
-	bool active;
-	enum {
-		IN_KERNEL = 0,
-		IN_USER,
-	} state;
-};
 
-static DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
+DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
 #ifdef CONFIG_CONTEXT_TRACKING_FORCE
 	.active = true,
 #endif
@@ -70,7 +56,7 @@ void user_enter(void)
 	local_irq_save(flags);
 	if (__this_cpu_read(context_tracking.active) &&
 	    __this_cpu_read(context_tracking.state) != IN_USER) {
-		__this_cpu_write(context_tracking.state, IN_USER);
+		vtime_account_system(current);
 		/*
 		 * At this stage, only low level arch entry code remains and
 		 * then we'll run in userspace. We can assume there won't be
@@ -79,6 +65,7 @@ void user_enter(void)
 		 * on the tick.
 		 */
 		rcu_user_enter();
+		__this_cpu_write(context_tracking.state, IN_USER);
 	}
 	local_irq_restore(flags);
 }
@@ -104,12 +91,13 @@ void user_exit(void)
 
 	local_irq_save(flags);
 	if (__this_cpu_read(context_tracking.state) == IN_USER) {
-		__this_cpu_write(context_tracking.state, IN_KERNEL);
 		/*
 		 * We are going to run code that may use RCU. Inform
 		 * RCU core about that (ie: we may need the tick again).
 		 */
 		rcu_user_exit();
+		vtime_account_user(current);
+		__this_cpu_write(context_tracking.state, IN_KERNEL);
 	}
 	local_irq_restore(flags);
 }
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 293b202..da0a9e7 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -3,6 +3,7 @@
 #include <linux/tsacct_kern.h>
 #include <linux/kernel_stat.h>
 #include <linux/static_key.h>
+#include <linux/context_tracking.h>
 #include "sched.h"
 
 
@@ -495,10 +496,24 @@ void vtime_task_switch(struct task_struct *prev)
 #ifndef __ARCH_HAS_VTIME_ACCOUNT
 void vtime_account(struct task_struct *tsk)
 {
-	if (in_interrupt() || !is_idle_task(tsk))
-		vtime_account_system(tsk);
-	else
-		vtime_account_idle(tsk);
+	if (!in_interrupt()) {
+		/*
+		 * If we interrupted user, context_tracking_in_user()
+		 * is still true because context tracking doesn't hook
+		 * on irq entry/exit. This way we know if
+		 * we need to flush user time on kernel entry.
+		 */
+		if (context_tracking_in_user()) {
+			vtime_account_user(tsk);
+			return;
+		}
+
+		if (is_idle_task(tsk)) {
+			vtime_account_idle(tsk);
+			return;
+		}
+	}
+	vtime_account_system(tsk);
 }
 EXPORT_SYMBOL_GPL(vtime_account);
 #endif /* __ARCH_HAS_VTIME_ACCOUNT */
@@ -586,4 +601,72 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
 	thread_group_cputime(p, &cputime);
 	cputime_adjust(&cputime, &p->signal->prev_cputime, ut, st);
 }
-#endif
+
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+static DEFINE_PER_CPU(long, last_jiffies) = INITIAL_JIFFIES;
+
+static cputime_t get_vtime_delta(void)
+{
+	long delta;
+
+	delta = jiffies - __this_cpu_read(last_jiffies);
+	__this_cpu_add(last_jiffies, delta);
+
+	return jiffies_to_cputime(delta);
+}
+
+void vtime_account_system(struct task_struct *tsk)
+{
+	cputime_t delta_cpu = get_vtime_delta();
+
+	account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu));
+}
+
+void vtime_account_user(struct task_struct *tsk)
+{
+	cputime_t delta_cpu = get_vtime_delta();
+
+	/*
+	 * This is an unfortunate hack: if we flush user time only on
+	 * irq entry, we miss the jiffies update and the time is spuriously
+	 * accounted to system time.
+	 */
+	if (context_tracking_in_user())
+		account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
+}
+
+void vtime_account_idle(struct task_struct *tsk)
+{
+	cputime_t delta_cpu = get_vtime_delta();
+
+	account_idle_time(delta_cpu);
+}
+
+static int __cpuinit vtime_cpu_notify(struct notifier_block *self,
+				      unsigned long action, void *hcpu)
+{
+	long cpu = (long)hcpu;
+	long *last_jiffies_cpu = per_cpu_ptr(&last_jiffies, cpu);
+
+	switch (action) {
+	case CPU_UP_PREPARE:
+	case CPU_UP_PREPARE_FROZEN:
+		/*
+		 * CHECKME: ensure that's visible by the CPU
+		 * once it wakes up
+		 */
+		*last_jiffies_cpu = jiffies;
+	default:
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
+static int __init init_vtime(void)
+{
+	cpu_notifier(vtime_cpu_notify, 0);
+	return 0;
+}
+early_initcall(init_vtime);
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
-- 
1.7.5.4


