public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3] rcu: Deinline rcu_read_lock_sched_held() if DEBUG_LOCK_ALLOC
@ 2015-05-26 15:48 Denys Vlasenko
  2015-05-26 16:04 ` Steven Rostedt
  0 siblings, 1 reply; 3+ messages in thread
From: Denys Vlasenko @ 2015-05-26 15:48 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Denys Vlasenko, Josh Triplett, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Tejun Heo, Oleg Nesterov, linux-kernel

DEBUG_LOCK_ALLOC=y is not a production setting, but it is
not very unusual either. Many developers routinely
use kernels built with it enabled.

Apart from being selected by hand, it is also auto-selected by
PROVE_LOCKING "Lock debugging: prove locking correctness" and
LOCK_STAT "Lock usage statistics" config options.
LOCK STAT is necessary for "perf lock" to work.

I wouldn't spend too much time optimizing it, but this particular
function has a very large cost in code size: when it is deinlined,
code size decreases by 830,000 bytes:

    text     data      bss       dec     hex filename
85674192 22294776 20627456 128596424 7aa39c8 vmlinux.before
84837612 22294424 20627456 127759492 79d7484 vmlinux

(with this config: http://busybox.net/~vda/kernel_config)

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Tejun Heo <tj@kernel.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: linux-kernel@vger.kernel.org
---
Changes in v2: removed an empty #else clause.
Changes in v3: switched to defined(FOO) syntax.

 include/linux/rcupdate.h | 40 ++-------------------------------------
 kernel/rcu/update.c      | 49 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 38 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 7809749..6024a65 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -439,46 +439,10 @@ int rcu_read_lock_bh_held(void);
  * If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an
  * RCU-sched read-side critical section.  In absence of
  * CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side
- * critical section unless it can prove otherwise.  Note that disabling
- * of preemption (including disabling irqs) counts as an RCU-sched
- * read-side critical section.  This is useful for debug checks in functions
- * that required that they be called within an RCU-sched read-side
- * critical section.
- *
- * Check debug_lockdep_rcu_enabled() to prevent false positives during boot
- * and while lockdep is disabled.
- *
- * Note that if the CPU is in the idle loop from an RCU point of
- * view (ie: that we are in the section between rcu_idle_enter() and
- * rcu_idle_exit()) then rcu_read_lock_held() returns false even if the CPU
- * did an rcu_read_lock().  The reason for this is that RCU ignores CPUs
- * that are in such a section, considering these as in extended quiescent
- * state, so such a CPU is effectively never in an RCU read-side critical
- * section regardless of what RCU primitives it invokes.  This state of
- * affairs is required --- we need to keep an RCU-free window in idle
- * where the CPU may possibly enter into low power mode. This way we can
- * notice an extended quiescent state to other CPUs that started a grace
- * period. Otherwise we would delay any grace period as long as we run in
- * the idle task.
- *
- * Similarly, we avoid claiming an SRCU read lock held if the current
- * CPU is offline.
+ * critical section unless it can prove otherwise.
  */
 #ifdef CONFIG_PREEMPT_COUNT
-static inline int rcu_read_lock_sched_held(void)
-{
-	int lockdep_opinion = 0;
-
-	if (!debug_lockdep_rcu_enabled())
-		return 1;
-	if (!rcu_is_watching())
-		return 0;
-	if (!rcu_lockdep_current_cpu_online())
-		return 0;
-	if (debug_locks)
-		lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
-	return lockdep_opinion || preempt_count() != 0 || irqs_disabled();
-}
+int rcu_read_lock_sched_held(void);
 #else /* #ifdef CONFIG_PREEMPT_COUNT */
 static inline int rcu_read_lock_sched_held(void)
 {
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index e0d31a3..d273c82 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -62,6 +62,55 @@ MODULE_ALIAS("rcupdate");
 
 module_param(rcu_expedited, int, 0);
 
+#if defined(CONFIG_DEBUG_LOCK_ALLOC) && defined(CONFIG_PREEMPT_COUNT)
+/**
+ * rcu_read_lock_sched_held() - might we be in RCU-sched read-side critical section?
+ *
+ * If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an
+ * RCU-sched read-side critical section.  In absence of
+ * CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side
+ * critical section unless it can prove otherwise.  Note that disabling
+ * of preemption (including disabling irqs) counts as an RCU-sched
+ * read-side critical section.  This is useful for debug checks in functions
+ * that required that they be called within an RCU-sched read-side
+ * critical section.
+ *
+ * Check debug_lockdep_rcu_enabled() to prevent false positives during boot
+ * and while lockdep is disabled.
+ *
+ * Note that if the CPU is in the idle loop from an RCU point of
+ * view (ie: that we are in the section between rcu_idle_enter() and
+ * rcu_idle_exit()) then rcu_read_lock_held() returns false even if the CPU
+ * did an rcu_read_lock().  The reason for this is that RCU ignores CPUs
+ * that are in such a section, considering these as in extended quiescent
+ * state, so such a CPU is effectively never in an RCU read-side critical
+ * section regardless of what RCU primitives it invokes.  This state of
+ * affairs is required --- we need to keep an RCU-free window in idle
+ * where the CPU may possibly enter into low power mode. This way we can
+ * notice an extended quiescent state to other CPUs that started a grace
+ * period. Otherwise we would delay any grace period as long as we run in
+ * the idle task.
+ *
+ * Similarly, we avoid claiming an SRCU read lock held if the current
+ * CPU is offline.
+ */
+int rcu_read_lock_sched_held(void)
+{
+	int lockdep_opinion = 0;
+
+	if (!debug_lockdep_rcu_enabled())
+		return 1;
+	if (!rcu_is_watching())
+		return 0;
+	if (!rcu_lockdep_current_cpu_online())
+		return 0;
+	if (debug_locks)
+		lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
+	return lockdep_opinion || preempt_count() != 0 || irqs_disabled();
+}
+EXPORT_SYMBOL(rcu_read_lock_sched_held);
+#endif
+
 #ifdef CONFIG_PREEMPT_RCU
 
 /*
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v3] rcu: Deinline rcu_read_lock_sched_held() if DEBUG_LOCK_ALLOC
  2015-05-26 15:48 [PATCH v3] rcu: Deinline rcu_read_lock_sched_held() if DEBUG_LOCK_ALLOC Denys Vlasenko
@ 2015-05-26 16:04 ` Steven Rostedt
  2015-05-26 16:28   ` Paul E. McKenney
  0 siblings, 1 reply; 3+ messages in thread
From: Steven Rostedt @ 2015-05-26 16:04 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Paul E. McKenney, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan,
	Tejun Heo, Oleg Nesterov, linux-kernel

On Tue, 26 May 2015 17:48:34 +0200
Denys Vlasenko <dvlasenk@redhat.com> wrote:

> DEBUG_LOCK_ALLOC=y is not a production setting, but it is
> not very unusual either. Many developers routinely
> use kernels built with it enabled.
> 
> Apart from being selected by hand, it is also auto-selected by
> PROVE_LOCKING "Lock debugging: prove locking correctness" and
> LOCK_STAT "Lock usage statistics" config options.
> LOCK STAT is necessary for "perf lock" to work.
> 
> I wouldn't spend too much time optimizing it, but this particular
> function has a very large cost in code size: when it is deinlined,
> code size decreases by 830,000 bytes:
> 
>     text     data      bss       dec     hex filename
> 85674192 22294776 20627456 128596424 7aa39c8 vmlinux.before
> 84837612 22294424 20627456 127759492 79d7484 vmlinux
> 
> (with this config: http://busybox.net/~vda/kernel_config)
> 
> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> CC: Josh Triplett <josh@joshtriplett.org>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Lai Jiangshan <laijs@cn.fujitsu.com>
> CC: Tejun Heo <tj@kernel.org>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: linux-kernel@vger.kernel.org

Reviewed-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v3] rcu: Deinline rcu_read_lock_sched_held() if DEBUG_LOCK_ALLOC
  2015-05-26 16:04 ` Steven Rostedt
@ 2015-05-26 16:28   ` Paul E. McKenney
  0 siblings, 0 replies; 3+ messages in thread
From: Paul E. McKenney @ 2015-05-26 16:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Denys Vlasenko, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan,
	Tejun Heo, Oleg Nesterov, linux-kernel

On Tue, May 26, 2015 at 12:04:46PM -0400, Steven Rostedt wrote:
> On Tue, 26 May 2015 17:48:34 +0200
> Denys Vlasenko <dvlasenk@redhat.com> wrote:
> 
> > DEBUG_LOCK_ALLOC=y is not a production setting, but it is
> > not very unusual either. Many developers routinely
> > use kernels built with it enabled.
> > 
> > Apart from being selected by hand, it is also auto-selected by
> > PROVE_LOCKING "Lock debugging: prove locking correctness" and
> > LOCK_STAT "Lock usage statistics" config options.
> > LOCK STAT is necessary for "perf lock" to work.
> > 
> > I wouldn't spend too much time optimizing it, but this particular
> > function has a very large cost in code size: when it is deinlined,
> > code size decreases by 830,000 bytes:
> > 
> >     text     data      bss       dec     hex filename
> > 85674192 22294776 20627456 128596424 7aa39c8 vmlinux.before
> > 84837612 22294424 20627456 127759492 79d7484 vmlinux
> > 
> > (with this config: http://busybox.net/~vda/kernel_config)
> > 
> > Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> > CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > CC: Josh Triplett <josh@joshtriplett.org>
> > CC: Steven Rostedt <rostedt@goodmis.org>
> > CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > CC: Lai Jiangshan <laijs@cn.fujitsu.com>
> > CC: Tejun Heo <tj@kernel.org>
> > CC: Oleg Nesterov <oleg@redhat.com>
> > CC: linux-kernel@vger.kernel.org
> 
> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>

Thank you both!

I had to apply by hand -- next time, please base on the rcu/dev branch of
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git.

Please also check to make sure that I didn't mess something up by
my hand-applying, please see below.

							Thanx, Paul

------------------------------------------------------------------------

commit 523d371ef86584a28def5d23288babffdbcf5f85
Author: Denys Vlasenko <dvlasenk@redhat.com>
Date:   Tue May 26 17:48:34 2015 +0200

    rcu: Deinline rcu_read_lock_sched_held() if DEBUG_LOCK_ALLOC
    
    DEBUG_LOCK_ALLOC=y is not a production setting, but it is
    not very unusual either. Many developers routinely
    use kernels built with it enabled.
    
    Apart from being selected by hand, it is also auto-selected by
    PROVE_LOCKING "Lock debugging: prove locking correctness" and
    LOCK_STAT "Lock usage statistics" config options.
    LOCK STAT is necessary for "perf lock" to work.
    
    I wouldn't spend too much time optimizing it, but this particular
    function has a very large cost in code size: when it is deinlined,
    code size decreases by 830,000 bytes:
    
        text     data      bss       dec     hex filename
    85674192 22294776 20627456 128596424 7aa39c8 vmlinux.before
    84837612 22294424 20627456 127759492 79d7484 vmlinux
    
    (with this config: http://busybox.net/~vda/kernel_config)
    
    Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
    CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
    CC: Josh Triplett <josh@joshtriplett.org>
    CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    CC: Lai Jiangshan <laijs@cn.fujitsu.com>
    CC: Tejun Heo <tj@kernel.org>
    CC: Oleg Nesterov <oleg@redhat.com>
    CC: linux-kernel@vger.kernel.org
    Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 18e377b92875..3a4d1bf430b1 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -467,46 +467,10 @@ int rcu_read_lock_bh_held(void);
  * If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an
  * RCU-sched read-side critical section.  In absence of
  * CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side
- * critical section unless it can prove otherwise.  Note that disabling
- * of preemption (including disabling irqs) counts as an RCU-sched
- * read-side critical section.  This is useful for debug checks in functions
- * that required that they be called within an RCU-sched read-side
- * critical section.
- *
- * Check debug_lockdep_rcu_enabled() to prevent false positives during boot
- * and while lockdep is disabled.
- *
- * Note that if the CPU is in the idle loop from an RCU point of
- * view (ie: that we are in the section between rcu_idle_enter() and
- * rcu_idle_exit()) then rcu_read_lock_held() returns false even if the CPU
- * did an rcu_read_lock().  The reason for this is that RCU ignores CPUs
- * that are in such a section, considering these as in extended quiescent
- * state, so such a CPU is effectively never in an RCU read-side critical
- * section regardless of what RCU primitives it invokes.  This state of
- * affairs is required --- we need to keep an RCU-free window in idle
- * where the CPU may possibly enter into low power mode. This way we can
- * notice an extended quiescent state to other CPUs that started a grace
- * period. Otherwise we would delay any grace period as long as we run in
- * the idle task.
- *
- * Similarly, we avoid claiming an SRCU read lock held if the current
- * CPU is offline.
+ * critical section unless it can prove otherwise.
  */
 #ifdef CONFIG_PREEMPT_COUNT
-static inline int rcu_read_lock_sched_held(void)
-{
-	int lockdep_opinion = 0;
-
-	if (!debug_lockdep_rcu_enabled())
-		return 1;
-	if (!rcu_is_watching())
-		return 0;
-	if (!rcu_lockdep_current_cpu_online())
-		return 0;
-	if (debug_locks)
-		lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
-	return lockdep_opinion || preempt_count() != 0 || irqs_disabled();
-}
+int rcu_read_lock_sched_held(void);
 #else /* #ifdef CONFIG_PREEMPT_COUNT */
 static inline int rcu_read_lock_sched_held(void)
 {
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index afaecb7a799a..fec5f48b8860 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -62,6 +62,55 @@ MODULE_ALIAS("rcupdate");
 
 module_param(rcu_expedited, int, 0);
 
+#if defined(CONFIG_DEBUG_LOCK_ALLOC) && defined(CONFIG_PREEMPT_COUNT)
+/**
+ * rcu_read_lock_sched_held() - might we be in RCU-sched read-side critical section?
+ *
+ * If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an
+ * RCU-sched read-side critical section.  In absence of
+ * CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side
+ * critical section unless it can prove otherwise.  Note that disabling
+ * of preemption (including disabling irqs) counts as an RCU-sched
+ * read-side critical section.  This is useful for debug checks in functions
+ * that required that they be called within an RCU-sched read-side
+ * critical section.
+ *
+ * Check debug_lockdep_rcu_enabled() to prevent false positives during boot
+ * and while lockdep is disabled.
+ *
+ * Note that if the CPU is in the idle loop from an RCU point of
+ * view (ie: that we are in the section between rcu_idle_enter() and
+ * rcu_idle_exit()) then rcu_read_lock_held() returns false even if the CPU
+ * did an rcu_read_lock().  The reason for this is that RCU ignores CPUs
+ * that are in such a section, considering these as in extended quiescent
+ * state, so such a CPU is effectively never in an RCU read-side critical
+ * section regardless of what RCU primitives it invokes.  This state of
+ * affairs is required --- we need to keep an RCU-free window in idle
+ * where the CPU may possibly enter into low power mode. This way we can
+ * notice an extended quiescent state to other CPUs that started a grace
+ * period. Otherwise we would delay any grace period as long as we run in
+ * the idle task.
+ *
+ * Similarly, we avoid claiming an SRCU read lock held if the current
+ * CPU is offline.
+ */
+int rcu_read_lock_sched_held(void)
+{
+	int lockdep_opinion = 0;
+
+	if (!debug_lockdep_rcu_enabled())
+		return 1;
+	if (!rcu_is_watching())
+		return 0;
+	if (!rcu_lockdep_current_cpu_online())
+		return 0;
+	if (debug_locks)
+		lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
+	return lockdep_opinion || preempt_count() != 0 || irqs_disabled();
+}
+EXPORT_SYMBOL(rcu_read_lock_sched_held);
+#endif
+
 #ifndef CONFIG_TINY_RCU
 
 static atomic_t rcu_expedited_nesting =


^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-05-26 16:28 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-05-26 15:48 [PATCH v3] rcu: Deinline rcu_read_lock_sched_held() if DEBUG_LOCK_ALLOC Denys Vlasenko
2015-05-26 16:04 ` Steven Rostedt
2015-05-26 16:28   ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox