public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] context_tracking: Add comments on interface and internals
@ 2013-01-16 12:32 Frederic Weisbecker
  2013-01-16 13:04 ` Namhyung Kim
  0 siblings, 1 reply; 7+ messages in thread
From: Frederic Weisbecker @ 2013-01-16 12:32 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Li Zhong, Namhyung Kim, Paul Gortmaker,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

This subsystem lacks many explanations on its purpose and
design. Add these missing comments.

v3: Fix the "hook" based naming as per Ingo's suggestion

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/context_tracking.c |   73 ++++++++++++++++++++++++++++++++++++++------
 1 files changed, 63 insertions(+), 10 deletions(-)

diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index e0e07fd..4b360b4 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -1,3 +1,19 @@
+/*
+ * Context tracking: Probe on high level context boundaries such as kernel
+ * and userspace. This includes syscalls and exceptions entry/exit.
+ *
+ * This is used by RCU to remove its dependency on the timer tick while a CPU
+ * runs in userspace.
+ *
+ *  Started by Frederic Weisbecker:
+ *
+ * Copyright (C) 2012 Red Hat, Inc., Frederic Weisbecker <fweisbec@redhat.com>
+ *
+ * Many thanks to Gilad Ben-Yossef, Paul McKenney, Ingo Molnar, Andrew Morton,
+ * Steven Rostedt, Peter Zijlstra for suggestions and improvements.
+ *
+ */
+
 #include <linux/context_tracking.h>
 #include <linux/rcupdate.h>
 #include <linux/sched.h>
@@ -6,8 +22,8 @@
 
 struct context_tracking {
 	/*
-	 * When active is false, hooks are not set to
-	 * minimize overhead: TIF flags are cleared
+	 * When active is false, probes are unset in order
+	 * to minimize overhead: TIF flags are cleared
 	 * and calls to user_enter/exit are ignored. This
 	 * may be further optimized using static keys.
 	 */
@@ -24,6 +40,15 @@ static DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
 #endif
 };
 
+/**
+ * user_enter - Inform the context tracking that the CPU is going to
+ *              enter userspace mode.
+ *
+ * This function must be called right before we switch from the kernel
+ * to userspace, when it's guaranteed the remaining kernel instructions
+ * to execute won't use any RCU read side critical section because this
+ * function sets RCU in extended quiescent state.
+ */
 void user_enter(void)
 {
 	unsigned long flags;
@@ -39,40 +64,68 @@ void user_enter(void)
 	if (in_interrupt())
 		return;
 
+	/* Kernel threads aren't supposed to go to userspace */
 	WARN_ON_ONCE(!current->mm);
 
 	local_irq_save(flags);
 	if (__this_cpu_read(context_tracking.active) &&
 	    __this_cpu_read(context_tracking.state) != IN_USER) {
 		__this_cpu_write(context_tracking.state, IN_USER);
+		/*
+		 * At this stage, only low level arch entry code remains and
+		 * then we'll run in userspace. We can assume there won't be
+		 * any RCU read-side critical section until the next call to
+		 * user_exit() or rcu_irq_enter(). Let's remove RCU's dependency
+		 * on the tick.
+		 */
 		rcu_user_enter();
 	}
 	local_irq_restore(flags);
 }
 
+
+/**
+ * user_exit - Inform the context tracking that the CPU is
+ *             exiting userspace mode and entering the kernel.
+ *
+ * This function must be called after we entered the kernel from userspace
+ * before any use of RCU read side critical section. This potentially include
+ * any high level kernel code like syscalls, exceptions, signal handling, etc...
+ *
+ * This call supports re-entrancy. This way it can be called from any exception
+ * handler without needing to know if we came from userspace or not.
+ */
 void user_exit(void)
 {
 	unsigned long flags;
 
-	/*
-	 * Some contexts may involve an exception occuring in an irq,
-	 * leading to that nesting:
-	 * rcu_irq_enter() rcu_user_exit() rcu_user_exit() rcu_irq_exit()
-	 * This would mess up the dyntick_nesting count though. And rcu_irq_*()
-	 * helpers are enough to protect RCU uses inside the exception. So
-	 * just return immediately if we detect we are in an IRQ.
-	 */
 	if (in_interrupt())
 		return;
 
 	local_irq_save(flags);
 	if (__this_cpu_read(context_tracking.state) == IN_USER) {
 		__this_cpu_write(context_tracking.state, IN_KERNEL);
+		/*
+		 * We are going to run code that may use RCU. Inform
+		 * RCU core about that (ie: we may need the tick again).
+		 */
 		rcu_user_exit();
 	}
 	local_irq_restore(flags);
 }
 
+
+/**
+ * context_tracking_task_switch - context switch the syscall callbacks
+ *
+ * The context tracking uses the syscall slow path to implement its user-kernel
+ * boundaries probes on syscalls. This way it doesn't impact the syscall fast
+ * path on CPUs that don't do context tracking.
+ *
+ * But we need to clear the flag on the previous task because it may later
+ * migrate to some CPU that doesn't do the context tracking. As such the TIF
+ * flag may not be desired there.
+ */
 void context_tracking_task_switch(struct task_struct *prev,
 			     struct task_struct *next)
 {
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] context_tracking: Add comments on interface and internals
  2013-01-16 12:32 [PATCH] context_tracking: Add comments on interface and internals Frederic Weisbecker
@ 2013-01-16 13:04 ` Namhyung Kim
  2013-01-16 14:58   ` Frederic Weisbecker
  2013-01-16 16:16   ` [PATCH v4] " Frederic Weisbecker
  0 siblings, 2 replies; 7+ messages in thread
From: Namhyung Kim @ 2013-01-16 13:04 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Li Zhong, Namhyung Kim, Paul Gortmaker,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

Hi Frederic,

On Wed, 16 Jan 2013 13:32:57 +0100, Frederic Weisbecker wrote:
> This subsystem lacks many explanations on its purpose and
> design. Add these missing comments.
>
> v3: Fix the "hook" based naming as per Ingo's suggestion
[snip]
> +/**
> + * context_tracking_task_switch - context switch the syscall callbacks

To be more kernel-doc-friendly, it'd better adding descriptions for
arguments too:

 @prev: the task that is being switched out
 @next: the task we are going to switch to

Thanks,
Namhyung


> + *
> + * The context tracking uses the syscall slow path to implement its user-kernel
> + * boundaries probes on syscalls. This way it doesn't impact the syscall fast
> + * path on CPUs that don't do context tracking.
> + *
> + * But we need to clear the flag on the previous task because it may later
> + * migrate to some CPU that doesn't do the context tracking. As such the TIF
> + * flag may not be desired there.
> + */
>  void context_tracking_task_switch(struct task_struct *prev,
>  			     struct task_struct *next)
>  {

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] context_tracking: Add comments on interface and internals
  2013-01-16 13:04 ` Namhyung Kim
@ 2013-01-16 14:58   ` Frederic Weisbecker
  2013-01-16 16:16   ` [PATCH v4] " Frederic Weisbecker
  1 sibling, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2013-01-16 14:58 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Paul E. McKenney, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Li Zhong, Namhyung Kim, Paul Gortmaker,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

2013/1/16 Namhyung Kim <namhyung@kernel.org>:
> Hi Frederic,
>
> On Wed, 16 Jan 2013 13:32:57 +0100, Frederic Weisbecker wrote:
>> This subsystem lacks many explanations on its purpose and
>> design. Add these missing comments.
>>
>> v3: Fix the "hook" based naming as per Ingo's suggestion
> [snip]
>> +/**
>> + * context_tracking_task_switch - context switch the syscall callbacks
>
> To be more kernel-doc-friendly, it'd better adding descriptions for
> arguments too:
>
>  @prev: the task that is being switched out
>  @next: the task we are going to switch to

Right, I'm fixing that.

Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v4] context_tracking: Add comments on interface and internals
  2013-01-16 13:04 ` Namhyung Kim
  2013-01-16 14:58   ` Frederic Weisbecker
@ 2013-01-16 16:16   ` Frederic Weisbecker
  2013-01-16 21:18     ` Andrew Morton
  1 sibling, 1 reply; 7+ messages in thread
From: Frederic Weisbecker @ 2013-01-16 16:16 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Li Zhong, Namhyung Kim, Paul Gortmaker,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

This subsystem lacks many explanations on its purpose and
design. Add these missing comments.

v4: Document function parameter to be more kernel-doc
friendly, as per Namhyung suggestion.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/context_tracking.c |   75 +++++++++++++++++++++++++++++++++++++++------
 1 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index e0e07fd..d566aba 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -1,3 +1,19 @@
+/*
+ * Context tracking: Probe on high level context boundaries such as kernel
+ * and userspace. This includes syscalls and exceptions entry/exit.
+ *
+ * This is used by RCU to remove its dependency on the timer tick while a CPU
+ * runs in userspace.
+ *
+ *  Started by Frederic Weisbecker:
+ *
+ * Copyright (C) 2012 Red Hat, Inc., Frederic Weisbecker <fweisbec@redhat.com>
+ *
+ * Many thanks to Gilad Ben-Yossef, Paul McKenney, Ingo Molnar, Andrew Morton,
+ * Steven Rostedt, Peter Zijlstra for suggestions and improvements.
+ *
+ */
+
 #include <linux/context_tracking.h>
 #include <linux/rcupdate.h>
 #include <linux/sched.h>
@@ -6,8 +22,8 @@
 
 struct context_tracking {
 	/*
-	 * When active is false, hooks are not set to
-	 * minimize overhead: TIF flags are cleared
+	 * When active is false, probes are unset in order
+	 * to minimize overhead: TIF flags are cleared
 	 * and calls to user_enter/exit are ignored. This
 	 * may be further optimized using static keys.
 	 */
@@ -24,6 +40,15 @@ static DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
 #endif
 };
 
+/**
+ * user_enter - Inform the context tracking that the CPU is going to
+ *              enter userspace mode.
+ *
+ * This function must be called right before we switch from the kernel
+ * to userspace, when it's guaranteed the remaining kernel instructions
+ * to execute won't use any RCU read side critical section because this
+ * function sets RCU in extended quiescent state.
+ */
 void user_enter(void)
 {
 	unsigned long flags;
@@ -39,40 +64,70 @@ void user_enter(void)
 	if (in_interrupt())
 		return;
 
+	/* Kernel threads aren't supposed to go to userspace */
 	WARN_ON_ONCE(!current->mm);
 
 	local_irq_save(flags);
 	if (__this_cpu_read(context_tracking.active) &&
 	    __this_cpu_read(context_tracking.state) != IN_USER) {
 		__this_cpu_write(context_tracking.state, IN_USER);
+		/*
+		 * At this stage, only low level arch entry code remains and
+		 * then we'll run in userspace. We can assume there won't be
+		 * any RCU read-side critical section until the next call to
+		 * user_exit() or rcu_irq_enter(). Let's remove RCU's dependency
+		 * on the tick.
+		 */
 		rcu_user_enter();
 	}
 	local_irq_restore(flags);
 }
 
+
+/**
+ * user_exit - Inform the context tracking that the CPU is
+ *             exiting userspace mode and entering the kernel.
+ *
+ * This function must be called after we entered the kernel from userspace
+ * before any use of RCU read side critical section. This potentially include
+ * any high level kernel code like syscalls, exceptions, signal handling, etc...
+ *
+ * This call supports re-entrancy. This way it can be called from any exception
+ * handler without needing to know if we came from userspace or not.
+ */
 void user_exit(void)
 {
 	unsigned long flags;
 
-	/*
-	 * Some contexts may involve an exception occuring in an irq,
-	 * leading to that nesting:
-	 * rcu_irq_enter() rcu_user_exit() rcu_user_exit() rcu_irq_exit()
-	 * This would mess up the dyntick_nesting count though. And rcu_irq_*()
-	 * helpers are enough to protect RCU uses inside the exception. So
-	 * just return immediately if we detect we are in an IRQ.
-	 */
 	if (in_interrupt())
 		return;
 
 	local_irq_save(flags);
 	if (__this_cpu_read(context_tracking.state) == IN_USER) {
 		__this_cpu_write(context_tracking.state, IN_KERNEL);
+		/*
+		 * We are going to run code that may use RCU. Inform
+		 * RCU core about that (ie: we may need the tick again).
+		 */
 		rcu_user_exit();
 	}
 	local_irq_restore(flags);
 }
 
+
+/**
+ * context_tracking_task_switch - context switch the syscall callbacks
+ * @prev: the task that is being switched out
+ * @next: the task that is being switched in
+ *
+ * The context tracking uses the syscall slow path to implement its user-kernel
+ * boundaries probes on syscalls. This way it doesn't impact the syscall fast
+ * path on CPUs that don't do context tracking.
+ *
+ * But we need to clear the flag on the previous task because it may later
+ * migrate to some CPU that doesn't do the context tracking. As such the TIF
+ * flag may not be desired there.
+ */
 void context_tracking_task_switch(struct task_struct *prev,
 			     struct task_struct *next)
 {
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v4] context_tracking: Add comments on interface and internals
  2013-01-16 16:16   ` [PATCH v4] " Frederic Weisbecker
@ 2013-01-16 21:18     ` Andrew Morton
  2013-01-16 21:49       ` Steven Rostedt
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2013-01-16 21:18 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, LKML, Alessio Igor Bogani, Chris Metcalf,
	Christoph Lameter, Geoff Levand, Gilad Ben Yossef, Hakan Akkan,
	Ingo Molnar, Li Zhong, Namhyung Kim, Paul Gortmaker,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner

On Wed, 16 Jan 2013 17:16:37 +0100
Frederic Weisbecker <fweisbec@gmail.com> wrote:

> This subsystem lacks many explanations on its purpose and
> design. Add these missing comments.

Looks nice and is really helpful, thanks.


I still don't understand NOHZ's role in this whole thing :(

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4] context_tracking: Add comments on interface and internals
  2013-01-16 21:18     ` Andrew Morton
@ 2013-01-16 21:49       ` Steven Rostedt
  2013-01-16 22:06         ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2013-01-16 21:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederic Weisbecker, Paul E. McKenney, LKML, Alessio Igor Bogani,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Li Zhong, Namhyung Kim, Paul Gortmaker,
	Peter Zijlstra, Thomas Gleixner

On Wed, 2013-01-16 at 13:18 -0800, Andrew Morton wrote:
> On Wed, 16 Jan 2013 17:16:37 +0100

> I still don't understand NOHZ's role in this whole thing :(

It's not for idle NOHZ, but "process" NOHZ.

That is if we have only one task running on a CPU, we don't want a tick
interrupt to bother it. This is because there's lots of users out there
that want an uninterrupted task. A task that doesn't ever get bothered
by the kernel. If it's in userspace, it stays in userspace (no
interrupts), until it calls into the kernel itself (syscall). Even when
its in the kernel, we still don't need the tick interrupt if its the
only task.

But the scheduler isn't the only thing that uses this tick. To remove
the tick, we need to satisfy all the other users (printk, delayed work,
u/s-times).

To still keep up the stats of user and kernel times for the task, we
need to record when the task switches from user to kernel and back
again. Hence the context tracking code.

Makes more sense?

-- Steve



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4] context_tracking: Add comments on interface and internals
  2013-01-16 21:49       ` Steven Rostedt
@ 2013-01-16 22:06         ` Andrew Morton
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2013-01-16 22:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Paul E. McKenney, LKML, Alessio Igor Bogani,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Ingo Molnar, Li Zhong, Namhyung Kim, Paul Gortmaker,
	Peter Zijlstra, Thomas Gleixner

On Wed, 16 Jan 2013 16:49:21 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 2013-01-16 at 13:18 -0800, Andrew Morton wrote:
> > On Wed, 16 Jan 2013 17:16:37 +0100
> 
> > I still don't understand NOHZ's role in this whole thing :(
> 
> It's not for idle NOHZ, but "process" NOHZ.
> 
> That is if we have only one task running on a CPU, we don't want a tick
> interrupt to bother it. This is because there's lots of users out there
> that want an uninterrupted task. A task that doesn't ever get bothered
> by the kernel. If it's in userspace, it stays in userspace (no
> interrupts), until it calls into the kernel itself (syscall). Even when
> its in the kernel, we still don't need the tick interrupt if its the
> only task.

oh, is that what TIF_NOHZ does ;)

> But the scheduler isn't the only thing that uses this tick. To remove
> the tick, we need to satisfy all the other users (printk, delayed work,
> u/s-times).
> 
> To still keep up the stats of user and kernel times for the task, we
> need to record when the task switches from user to kernel and back
> again. Hence the context tracking code.
> 
> Makes more sense?

yup thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-01-16 22:06 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-01-16 12:32 [PATCH] context_tracking: Add comments on interface and internals Frederic Weisbecker
2013-01-16 13:04 ` Namhyung Kim
2013-01-16 14:58   ` Frederic Weisbecker
2013-01-16 16:16   ` [PATCH v4] " Frederic Weisbecker
2013-01-16 21:18     ` Andrew Morton
2013-01-16 21:49       ` Steven Rostedt
2013-01-16 22:06         ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox