* [PATCH] rcu: mark debug_lockdep_rcu_enabled() as pure
@ 2017-05-19  7:03 Konstantin Khlebnikov
  2017-05-20 16:42 ` Paul E. McKenney
  0 siblings, 1 reply; 4+ messages in thread

From: Konstantin Khlebnikov @ 2017-05-19  7:03 UTC (permalink / raw)
To: linux-kernel
Cc: Peter Zijlstra, Lai Jiangshan, Josh Triplett, Steven Rostedt,
    Ingo Molnar, Mathieu Desnoyers, Paul E. McKenney

This allows to get rid of unneeded invocations.

Function debug_lockdep_rcu_enabled() becomes really hot if several
debug options are enabled together with CONFIG_PROVE_RCU.

Hottest path ends with:
  debug_lockdep_rcu_enabled
  is_ftrace_trampoline
  __kernel_text_address

Here debug_lockdep_rcu_enabled() is called from condition
(debug_lockdep_rcu_enabled() && !__warned && (c)) inside macro
do_for_each_ftrace_op(), where "c" is false.

With this patch "netperf -H localhost" shows boost from 2400 to 2500.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 include/linux/rcupdate.h | 2 +-
 kernel/rcu/update.c      | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index e1e5d002fdb9..9ecb3cb715bd 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -472,7 +472,7 @@ extern struct lockdep_map rcu_lock_map;
 extern struct lockdep_map rcu_bh_lock_map;
 extern struct lockdep_map rcu_sched_lock_map;
 extern struct lockdep_map rcu_callback_map;
-int debug_lockdep_rcu_enabled(void);
+int __pure debug_lockdep_rcu_enabled(void);
 
 int rcu_read_lock_held(void);
 int rcu_read_lock_bh_held(void);
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 273e869ca21d..a0c30abefdcd 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -292,7 +292,7 @@ struct lockdep_map rcu_callback_map =
 	STATIC_LOCKDEP_MAP_INIT("rcu_callback", &rcu_callback_key);
 EXPORT_SYMBOL_GPL(rcu_callback_map);
 
-int notrace debug_lockdep_rcu_enabled(void)
+int __pure notrace debug_lockdep_rcu_enabled(void)
 {
 	return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks &&
 	       current->lockdep_recursion == 0;
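[Editorial aside: the effect of GCC's pure attribute, which the patch above relies on, can be sketched outside the kernel as follows. This is illustration only, not kernel code; the names checks_enabled, should_warn, debug_locks, and recursion are made-up stand-ins for debug_lockdep_rcu_enabled(), the lockdep condition in do_for_each_ftrace_op(), and the globals it reads.]

```c
/* Illustration only -- not kernel code. */

static int debug_locks = 1;
static int recursion  = 0;

/* "pure": the return value depends only on arguments and global
 * state, and the function has no side effects.  GCC is therefore
 * allowed to merge or eliminate repeated calls whose inputs cannot
 * have changed in between. */
__attribute__((pure))
static int checks_enabled(void)
{
	return debug_locks && recursion == 0;
}

/* A caller in the spirit of the macro's condition
 * (debug_lockdep_rcu_enabled() && !__warned && (c)): with "pure",
 * the compiler may reuse one evaluation of checks_enabled() across
 * consecutive tests instead of calling it every time. */
static int should_warn(int warned, int c)
{
	return checks_enabled() && !warned && c;
}
```

In the ftrace hot path the condition `(c)` is false, so eliminating the redundant `debug_lockdep_rcu_enabled()` evaluations is where the reported netperf gain would come from.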
* Re: [PATCH] rcu: mark debug_lockdep_rcu_enabled() as pure
  2017-05-19  7:03 [PATCH] rcu: mark debug_lockdep_rcu_enabled() as pure Konstantin Khlebnikov
@ 2017-05-20 16:42 ` Paul E. McKenney
  2017-05-21  8:51   ` Konstantin Khlebnikov
  0 siblings, 1 reply; 4+ messages in thread

From: Paul E. McKenney @ 2017-05-20 16:42 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: linux-kernel, Peter Zijlstra, Lai Jiangshan, Josh Triplett,
    Steven Rostedt, Ingo Molnar, Mathieu Desnoyers

On Fri, May 19, 2017 at 10:03:59AM +0300, Konstantin Khlebnikov wrote:
> This allows to get rid of unneeded invocations.
> 
> Function debug_lockdep_rcu_enabled() becomes really hot if several
> debug options are enabled together with CONFIG_PROVE_RCU.
> 
> Hottest path ends with:
>   debug_lockdep_rcu_enabled
>   is_ftrace_trampoline
>   __kernel_text_address
> 
> Here debug_lockdep_rcu_enabled() is called from condition
> (debug_lockdep_rcu_enabled() && !__warned && (c)) inside macro
> do_for_each_ftrace_op(), where "c" is false.
> 
> With this patch "netperf -H localhost" shows boost from 2400 to 2500.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

Nice performance increase!

The gcc documentation says that __attribute__((pure)) functions are
supposed to have return values that depend only on the function's
arguments and the values of global variables.  However, it also says:

	Interesting non-pure functions are functions with infinite loops
	or those depending on volatile memory or other system resource,
	that may change between two consecutive calls (such as feof in
	a multithreading environment).

This is OK for current->lockdep_recursion because this variable is
changed only by the current task (I think so, anyway).

It is sort of OK for debug_locks.  This could be set to zero at any
time by any other task, but if we have a race condition that very
rarely causes two lockdep splats instead of just one, so what?  (But I
am sure that some of the people on CC will correct me if I am wrong
here.)

It should be OK for rcu_scheduler_active because the transition from
RCU_SCHEDULER_INACTIVE to RCU_SCHEDULER_INIT happens before the first
context switch, and the various barrier() calls, implied as well as
explicit, should keep things straight.

But I don't totally trust my analysis.  Could you please get someone
more gcc-savvy to review this and give their ack/review?  Given that,
I will queue it.

							Thanx, Paul

> ---
>  include/linux/rcupdate.h | 2 +-
>  kernel/rcu/update.c      | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index e1e5d002fdb9..9ecb3cb715bd 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -472,7 +472,7 @@ extern struct lockdep_map rcu_lock_map;
>  extern struct lockdep_map rcu_bh_lock_map;
>  extern struct lockdep_map rcu_sched_lock_map;
>  extern struct lockdep_map rcu_callback_map;
> -int debug_lockdep_rcu_enabled(void);
> +int __pure debug_lockdep_rcu_enabled(void);
> 
>  int rcu_read_lock_held(void);
>  int rcu_read_lock_bh_held(void);
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index 273e869ca21d..a0c30abefdcd 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -292,7 +292,7 @@ struct lockdep_map rcu_callback_map =
>  	STATIC_LOCKDEP_MAP_INIT("rcu_callback", &rcu_callback_key);
>  EXPORT_SYMBOL_GPL(rcu_callback_map);
> 
> -int notrace debug_lockdep_rcu_enabled(void)
> +int __pure notrace debug_lockdep_rcu_enabled(void)
>  {
>  	return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks &&
>  	       current->lockdep_recursion == 0;
> 
* Re: [PATCH] rcu: mark debug_lockdep_rcu_enabled() as pure
  2017-05-20 16:42 ` Paul E. McKenney
@ 2017-05-21  8:51   ` Konstantin Khlebnikov
  2017-07-27 17:57     ` Paul E. McKenney
  0 siblings, 1 reply; 4+ messages in thread

From: Konstantin Khlebnikov @ 2017-05-21  8:51 UTC (permalink / raw)
To: paulmck
Cc: linux-kernel, Peter Zijlstra, Lai Jiangshan, Josh Triplett,
    Steven Rostedt, Ingo Molnar, Mathieu Desnoyers

On 20.05.2017 19:42, Paul E. McKenney wrote:
> On Fri, May 19, 2017 at 10:03:59AM +0300, Konstantin Khlebnikov wrote:
>> This allows to get rid of unneeded invocations.
>>
>> Function debug_lockdep_rcu_enabled() becomes really hot if several
>> debug options are enabled together with CONFIG_PROVE_RCU.
>>
>> Hottest path ends with:
>>   debug_lockdep_rcu_enabled
>>   is_ftrace_trampoline
>>   __kernel_text_address
>>
>> Here debug_lockdep_rcu_enabled() is called from condition
>> (debug_lockdep_rcu_enabled() && !__warned && (c)) inside macro
>> do_for_each_ftrace_op(), where "c" is false.
>>
>> With this patch "netperf -H localhost" shows boost from 2400 to 2500.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>
> Nice performance increase!
>
> The gcc documentation says that __attribute__((pure)) functions are
> supposed to have return values that depend only on the function's
> arguments and the values of global variables.  However, it also says:
>
> 	Interesting non-pure functions are functions with infinite loops
> 	or those depending on volatile memory or other system resource,
> 	that may change between two consecutive calls (such as feof in
> 	a multithreading environment).
>
> This is OK for current->lockdep_recursion because this variable is
> changed only by the current task (I think so, anyway).
>
> It is sort of OK for debug_locks.  This could be set to zero at any
> time by any other task, but if we have a race condition that very
> rarely causes two lockdep splats instead of just one, so what?  (But
> I am sure that some of the people on CC will correct me if I am wrong
> here.)
>
> It should be OK for rcu_scheduler_active because the transition from
> RCU_SCHEDULER_INACTIVE to RCU_SCHEDULER_INIT happens before the first
> context switch, and the various barrier() calls, implied as well as
> explicit, should keep things straight.
>
> But I don't totally trust my analysis.  Could you please get someone
> more gcc-savvy to review this and give their ack/review?  Given that,
> I will queue it.
>
> 							Thanx, Paul

Thank you for the analysis.

__attribute__((pure)) allows the compiler to deduplicate / eliminate
calls.  This indeed might expand race windows when global switches like
debug_locks change their state.  But strict synchronization here isn't
required.

>
>> ---
>>  include/linux/rcupdate.h | 2 +-
>>  kernel/rcu/update.c      | 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
>> index e1e5d002fdb9..9ecb3cb715bd 100644
>> --- a/include/linux/rcupdate.h
>> +++ b/include/linux/rcupdate.h
>> @@ -472,7 +472,7 @@ extern struct lockdep_map rcu_lock_map;
>>  extern struct lockdep_map rcu_bh_lock_map;
>>  extern struct lockdep_map rcu_sched_lock_map;
>>  extern struct lockdep_map rcu_callback_map;
>> -int debug_lockdep_rcu_enabled(void);
>> +int __pure debug_lockdep_rcu_enabled(void);
>>
>>  int rcu_read_lock_held(void);
>>  int rcu_read_lock_bh_held(void);
>> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
>> index 273e869ca21d..a0c30abefdcd 100644
>> --- a/kernel/rcu/update.c
>> +++ b/kernel/rcu/update.c
>> @@ -292,7 +292,7 @@ struct lockdep_map rcu_callback_map =
>>  	STATIC_LOCKDEP_MAP_INIT("rcu_callback", &rcu_callback_key);
>>  EXPORT_SYMBOL_GPL(rcu_callback_map);
>>
>> -int notrace debug_lockdep_rcu_enabled(void)
>> +int __pure notrace debug_lockdep_rcu_enabled(void)
>>  {
>>  	return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks &&
>>  	       current->lockdep_recursion == 0;
>>
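[Editorial aside: the "expanded race window" Konstantin describes can be sketched as follows. Illustration only, not kernel code; checks_enabled and the count_* helpers are hypothetical names standing in for debug_lockdep_rcu_enabled() and its callers. The second helper spells out, in source form, a transformation the optimizer is permitted to perform on a pure call inside a loop.]

```c
static int debug_locks = 1;

/* Pure: no side effects, result depends only on global state. */
__attribute__((pure))
static int checks_enabled(void)
{
	return debug_locks;
}

/* What a caller writes: re-test the condition on every iteration. */
static int count_checked(int iters)
{
	int n = 0;
	for (int i = 0; i < iters; i++)
		if (checks_enabled())
			n++;
	return n;
}

/* What the optimizer may effectively produce, since nothing in the
 * loop body can change a pure function's inputs: one hoisted
 * evaluation.  If another task cleared debug_locks mid-loop, the
 * stale cached value would be used for the remaining iterations --
 * the expanded race window.  For lockdep this is acceptable: at
 * worst an extra or missed splat, so strict synchronization is not
 * needed. */
static int count_checked_hoisted(int iters)
{
	int enabled = checks_enabled();	/* evaluated once */
	int n = 0;
	for (int i = 0; i < iters; i++)
		if (enabled)
			n++;
	return n;
}
```

Single-threaded, the two helpers are indistinguishable; the difference only appears when another thread flips the global concurrently, which is exactly the case the thread argues is benign here.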
* Re: [PATCH] rcu: mark debug_lockdep_rcu_enabled() as pure
  2017-05-21  8:51 ` Konstantin Khlebnikov
@ 2017-07-27 17:57   ` Paul E. McKenney
  0 siblings, 0 replies; 4+ messages in thread

From: Paul E. McKenney @ 2017-07-27 17:57 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: linux-kernel, Peter Zijlstra, Lai Jiangshan, Josh Triplett,
    Steven Rostedt, Ingo Molnar, Mathieu Desnoyers

On Sun, May 21, 2017 at 11:51:39AM +0300, Konstantin Khlebnikov wrote:
> On 20.05.2017 19:42, Paul E. McKenney wrote:
> > On Fri, May 19, 2017 at 10:03:59AM +0300, Konstantin Khlebnikov wrote:
> >> This allows to get rid of unneeded invocations.
> >>
> >> Function debug_lockdep_rcu_enabled() becomes really hot if several
> >> debug options are enabled together with CONFIG_PROVE_RCU.
> >>
> >> Hottest path ends with:
> >>   debug_lockdep_rcu_enabled
> >>   is_ftrace_trampoline
> >>   __kernel_text_address
> >>
> >> Here debug_lockdep_rcu_enabled() is called from condition
> >> (debug_lockdep_rcu_enabled() && !__warned && (c)) inside macro
> >> do_for_each_ftrace_op(), where "c" is false.
> >>
> >> With this patch "netperf -H localhost" shows boost from 2400 to 2500.
> >>
> >> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> >
> > Nice performance increase!
> >
> > The gcc documentation says that __attribute__((pure)) functions are
> > supposed to have return values that depend only on the function's
> > arguments and the values of global variables.  However, it also says:
> >
> > 	Interesting non-pure functions are functions with infinite loops
> > 	or those depending on volatile memory or other system resource,
> > 	that may change between two consecutive calls (such as feof in
> > 	a multithreading environment).
> >
> > This is OK for current->lockdep_recursion because this variable is
> > changed only by the current task (I think so, anyway).
> >
> > It is sort of OK for debug_locks.  This could be set to zero at any
> > time by any other task, but if we have a race condition that very
> > rarely causes two lockdep splats instead of just one, so what?  (But
> > I am sure that some of the people on CC will correct me if I am
> > wrong here.)
> >
> > It should be OK for rcu_scheduler_active because the transition from
> > RCU_SCHEDULER_INACTIVE to RCU_SCHEDULER_INIT happens before the
> > first context switch, and the various barrier() calls, implied as
> > well as explicit, should keep things straight.
> >
> > But I don't totally trust my analysis.  Could you please get someone
> > more gcc-savvy to review this and give their ack/review?  Given
> > that, I will queue it.
> >
> > 							Thanx, Paul
> 
> Thank you for the analysis.
> 
> __attribute__((pure)) allows the compiler to deduplicate / eliminate
> calls.  This indeed might expand race windows when global switches
> like debug_locks change their state.  But strict synchronization here
> isn't required.

Any objections to this patch?

							Thanx, Paul

> >> ---
> >>  include/linux/rcupdate.h | 2 +-
> >>  kernel/rcu/update.c      | 2 +-
> >>  2 files changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> >> index e1e5d002fdb9..9ecb3cb715bd 100644
> >> --- a/include/linux/rcupdate.h
> >> +++ b/include/linux/rcupdate.h
> >> @@ -472,7 +472,7 @@ extern struct lockdep_map rcu_lock_map;
> >>  extern struct lockdep_map rcu_bh_lock_map;
> >>  extern struct lockdep_map rcu_sched_lock_map;
> >>  extern struct lockdep_map rcu_callback_map;
> >> -int debug_lockdep_rcu_enabled(void);
> >> +int __pure debug_lockdep_rcu_enabled(void);
> >>
> >>  int rcu_read_lock_held(void);
> >>  int rcu_read_lock_bh_held(void);
> >> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> >> index 273e869ca21d..a0c30abefdcd 100644
> >> --- a/kernel/rcu/update.c
> >> +++ b/kernel/rcu/update.c
> >> @@ -292,7 +292,7 @@ struct lockdep_map rcu_callback_map =
> >>  	STATIC_LOCKDEP_MAP_INIT("rcu_callback", &rcu_callback_key);
> >>  EXPORT_SYMBOL_GPL(rcu_callback_map);
> >>
> >> -int notrace debug_lockdep_rcu_enabled(void)
> >> +int __pure notrace debug_lockdep_rcu_enabled(void)
> >>  {
> >>  	return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks &&
> >>  	       current->lockdep_recursion == 0;
> >>
end of thread, other threads: [~2017-07-27 17:57 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-19  7:03 [PATCH] rcu: mark debug_lockdep_rcu_enabled() as pure Konstantin Khlebnikov
2017-05-20 16:42 ` Paul E. McKenney
2017-05-21  8:51   ` Konstantin Khlebnikov
2017-07-27 17:57     ` Paul E. McKenney