From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Rik van Riel <riel@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Ingo Molnar <mingo@kernel.org>,
Andy Lutomirski <luto@amacapital.net>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
X86 ML <x86@kernel.org>,
williams@redhat.com, Andrew Lutomirski <luto@kernel.org>,
fweisbec@redhat.com, Peter Zijlstra <peterz@infradead.org>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: question about RCU dynticks_nesting
Date: Tue, 5 May 2015 23:06:28 -0700 [thread overview]
Message-ID: <20150506060628.GZ5381@linux.vnet.ibm.com> (raw)
In-Reply-To: <1430883894.3805.2.camel@gmail.com>
On Wed, May 06, 2015 at 05:44:54AM +0200, Mike Galbraith wrote:
> On Wed, 2015-05-06 at 03:49 +0200, Mike Galbraith wrote:
> > On Mon, 2015-05-04 at 22:54 -0700, Paul E. McKenney wrote:
> >
> > > You have RCU_FAST_NO_HZ=y, correct? Could you please try measuring with
> > > RCU_FAST_NO_HZ=n?
> >
> > FWIW, the syscall numbers I posted were RCU_FAST_NO_HZ=n. (I didn't
> > profile to see where costs lie though)
>
> (did that)
Nice, thank you!!!
> 100000000 * stat() on isolated cpu
>
> NO_HZ_FULL off inactive housekeeper nohz_full
> real 0m14.266s 0m14.367s 0m20.427s 0m27.921s
> user 0m1.756s 0m1.553s 0m1.976s 0m10.447s
> sys 0m12.508s 0m12.769s 0m18.400s 0m17.464s
> (real) 1.000 1.007 1.431 1.957
>
> inactive housekeeper nohz_full
> ----------------------------------------------------------------------------------------------------------------------------------------------
> 7.61% [.] __xstat64 11.12% [k] context_tracking_exit 7.41% [k] context_tracking_exit
> 7.04% [k] system_call 6.18% [k] context_tracking_enter 6.02% [k] native_sched_clock
> 6.96% [k] copy_user_enhanced_fast_string 5.18% [.] __xstat64 4.69% [k] rcu_eqs_enter_common.isra.37
> 6.57% [k] path_init 4.89% [k] system_call 4.35% [k] _raw_spin_lock
> 5.92% [k] system_call_after_swapgs 4.84% [k] copy_user_enhanced_fast_string 4.30% [k] context_tracking_enter
> 5.44% [k] lockref_put_return 4.46% [k] path_init 4.25% [k] kmem_cache_alloc
> 4.69% [k] link_path_walk 4.30% [k] system_call_after_swapgs 4.14% [.] __xstat64
> 4.47% [k] lockref_get_not_dead 4.12% [k] kmem_cache_free 3.89% [k] rcu_eqs_exit_common.isra.38
> 4.46% [k] kmem_cache_free 3.78% [k] link_path_walk 3.50% [k] system_call
> 4.20% [k] kmem_cache_alloc 3.62% [k] lockref_put_return 3.48% [k] copy_user_enhanced_fast_string
> 4.09% [k] cp_new_stat 3.43% [k] kmem_cache_alloc 3.02% [k] system_call_after_swapgs
> 3.38% [k] vfs_getattr_nosec 2.95% [k] lockref_get_not_dead 2.97% [k] kmem_cache_free
> 2.82% [k] vfs_fstatat 2.87% [k] cp_new_stat 2.88% [k] lockref_put_return
> 2.60% [k] user_path_at_empty 2.62% [k] syscall_trace_leave 2.61% [k] link_path_walk
> 2.47% [k] path_lookupat 1.91% [k] vfs_getattr_nosec 2.58% [k] path_init
> 2.14% [k] strncpy_from_user 1.89% [k] syscall_trace_enter_phase1 2.15% [k] lockref_get_not_dead
> 2.11% [k] getname_flags 1.77% [k] path_lookupat 2.04% [k] cp_new_stat
> 2.10% [k] generic_fillattr 1.67% [k] complete_walk 1.89% [k] generic_fillattr
> 2.05% [.] main 1.65% [k] vfs_fstatat 1.67% [k] syscall_trace_leave
> 1.89% [k] complete_walk 1.56% [k] generic_fillattr 1.59% [k] vfs_getattr_nosec
> 1.73% [k] generic_permission 1.55% [k] user_path_at_empty 1.49% [k] get_vtime_delta
> 1.50% [k] system_call_fastpath 1.54% [k] strncpy_from_user 1.32% [k] user_path_at_empty
> 1.37% [k] legitimize_mnt 1.53% [k] getname_flags 1.30% [k] syscall_trace_enter_phase1
> 1.30% [k] dput 1.46% [k] legitimize_mnt 1.21% [k] rcu_eqs_exit
> 1.26% [k] putname 1.34% [.] main 1.21% [k] vfs_fstatat
> 1.19% [k] path_put 1.32% [k] int_with_check 1.18% [k] path_lookupat
> 1.18% [k] filename_lookup 1.28% [k] generic_permission 1.15% [k] getname_flags
> 1.01% [k] SYSC_newstat 1.16% [k] int_very_careful 1.03% [k] strncpy_from_user
> 0.96% [k] mntput_no_expire 1.04% [k] putname 1.01% [k] account_system_time
> 0.79% [k] path_cleanup 0.94% [k] dput 1.00% [k] complete_walk
> 0.79% [k] mntput 0.91% [k] context_tracking_user_exit 0.99% [k] vtime_account_user
>
So we have rcu_eqs_enter_common(4.69%), rcu_eqs_exit_common(3.89%),
and rcu_eqs_exit(1.21%). Interesting that rcu_eqs_exit() appears at all,
given that it just does very simple operations, and rcu_eqs_exit_common()
is apparently not inlined (OK, perhaps it is partially inlined?).
This suggests that there are useful gains to be had via simple changes,
for exmaple, placing the warnings behind CONFIG_NO_HZ_DEBUG or some such.
Or not, as this overhead might instead be due to cache misses on first
access to the relevant data structures. But worth a try, perhaps.
Does the attached patch help at all? If so, there might be some similar
gains to be had.
Thanx, Paul
------------------------------------------------------------------------
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4e6902005228..3f09e5abb7b0 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -616,7 +616,8 @@ static void rcu_eqs_enter_common(long long oldval, bool user)
struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
trace_rcu_dyntick(TPS("Start"), oldval, rdtp->dynticks_nesting);
- if (!user && !is_idle_task(current)) {
+ if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
+ !user && !is_idle_task(current)) {
struct task_struct *idle __maybe_unused =
idle_task(smp_processor_id());
@@ -635,7 +636,8 @@ static void rcu_eqs_enter_common(long long oldval, bool user)
smp_mb__before_atomic(); /* See above. */
atomic_inc(&rdtp->dynticks);
smp_mb__after_atomic(); /* Force ordering with next sojourn. */
- WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);
+ WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
+ atomic_read(&rdtp->dynticks) & 0x1);
rcu_dynticks_task_enter();
/*
@@ -661,7 +663,8 @@ static void rcu_eqs_enter(bool user)
rdtp = this_cpu_ptr(&rcu_dynticks);
oldval = rdtp->dynticks_nesting;
- WARN_ON_ONCE((oldval & DYNTICK_TASK_NEST_MASK) == 0);
+ WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
+ (oldval & DYNTICK_TASK_NEST_MASK) == 0);
if ((oldval & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE) {
rdtp->dynticks_nesting = 0;
rcu_eqs_enter_common(oldval, user);
@@ -734,7 +737,8 @@ void rcu_irq_exit(void)
rdtp = this_cpu_ptr(&rcu_dynticks);
oldval = rdtp->dynticks_nesting;
rdtp->dynticks_nesting--;
- WARN_ON_ONCE(rdtp->dynticks_nesting < 0);
+ WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
+ rdtp->dynticks_nesting < 0);
if (rdtp->dynticks_nesting)
trace_rcu_dyntick(TPS("--="), oldval, rdtp->dynticks_nesting);
else
@@ -759,10 +763,12 @@ static void rcu_eqs_exit_common(long long oldval, int user)
atomic_inc(&rdtp->dynticks);
/* CPUs seeing atomic_inc() must see later RCU read-side crit sects */
smp_mb__after_atomic(); /* See above. */
- WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
+ WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
+ !(atomic_read(&rdtp->dynticks) & 0x1));
rcu_cleanup_after_idle();
trace_rcu_dyntick(TPS("End"), oldval, rdtp->dynticks_nesting);
- if (!user && !is_idle_task(current)) {
+ if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
+ !user && !is_idle_task(current)) {
struct task_struct *idle __maybe_unused =
idle_task(smp_processor_id());
@@ -786,7 +792,7 @@ static void rcu_eqs_exit(bool user)
rdtp = this_cpu_ptr(&rcu_dynticks);
oldval = rdtp->dynticks_nesting;
- WARN_ON_ONCE(oldval < 0);
+ WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && oldval < 0);
if (oldval & DYNTICK_TASK_NEST_MASK) {
rdtp->dynticks_nesting += DYNTICK_TASK_NEST_VALUE;
} else {
@@ -859,7 +865,8 @@ void rcu_irq_enter(void)
rdtp = this_cpu_ptr(&rcu_dynticks);
oldval = rdtp->dynticks_nesting;
rdtp->dynticks_nesting++;
- WARN_ON_ONCE(rdtp->dynticks_nesting == 0);
+ WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
+ rdtp->dynticks_nesting == 0);
if (oldval)
trace_rcu_dyntick(TPS("++="), oldval, rdtp->dynticks_nesting);
else
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c4e1cf04cf57..b908048f8d6a 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1373,6 +1373,17 @@ config RCU_TRACE
Say Y here if you want to enable RCU tracing
Say N if you are unsure.
+config RCU_EQS_DEBUG
+ bool "Use this when adding any sort of NO_HZ support to your arch"
+ depends on DEBUG_KERNEL
+ help
+ This option provides consistency checks in RCU's handling of
+ NO_HZ. These checks have proven quite helpful in detecting
+ bugs in arch-specific NO_HZ code.
+
+ Say N here if you need ultimate kernel/user switch latencies
+ Say Y if you are unsure
+
endmenu # "RCU Debugging"
config DEBUG_BLOCK_EXT_DEVT
next prev parent reply other threads:[~2015-05-06 6:06 UTC|newest]
Thread overview: 83+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-30 21:23 [PATCH 0/3] reduce nohz_full syscall overhead by 10% riel
2015-04-30 21:23 ` [PATCH 1/3] reduce indentation in __acct_update_integrals riel
2015-04-30 21:23 ` [PATCH 2/3] remove local_irq_save from __acct_update_integrals riel
2015-04-30 21:23 ` [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry riel
2015-04-30 21:56 ` Andy Lutomirski
2015-05-01 6:40 ` Ingo Molnar
2015-05-01 15:20 ` Rik van Riel
2015-05-01 15:59 ` Ingo Molnar
2015-05-01 16:03 ` Andy Lutomirski
2015-05-01 16:21 ` Ingo Molnar
2015-05-01 16:26 ` Rik van Riel
2015-05-01 16:34 ` Ingo Molnar
2015-05-01 18:05 ` Rik van Riel
2015-05-01 18:40 ` Ingo Molnar
2015-05-01 19:11 ` Rik van Riel
2015-05-01 19:37 ` Andy Lutomirski
2015-05-02 5:27 ` Ingo Molnar
2015-05-02 18:27 ` Rik van Riel
2015-05-03 18:41 ` Andy Lutomirski
2015-05-07 10:35 ` Ingo Molnar
2015-05-04 9:26 ` Paolo Bonzini
2015-05-04 13:30 ` Rik van Riel
2015-05-04 14:06 ` Rik van Riel
2015-05-04 14:19 ` Rik van Riel
2015-05-04 15:59 ` question about RCU dynticks_nesting Rik van Riel
2015-05-04 18:39 ` Paul E. McKenney
2015-05-04 19:39 ` Rik van Riel
2015-05-04 20:02 ` Paul E. McKenney
2015-05-04 20:13 ` Rik van Riel
2015-05-04 20:38 ` Paul E. McKenney
2015-05-04 20:53 ` Rik van Riel
2015-05-05 5:54 ` Paul E. McKenney
2015-05-06 1:49 ` Mike Galbraith
2015-05-06 3:44 ` Mike Galbraith
2015-05-06 6:06 ` Paul E. McKenney [this message]
2015-05-06 6:52 ` Mike Galbraith
2015-05-06 7:01 ` Mike Galbraith
2015-05-07 0:59 ` Frederic Weisbecker
2015-05-07 15:44 ` Rik van Riel
2015-05-04 19:00 ` Rik van Riel
2015-05-04 19:39 ` Paul E. McKenney
2015-05-04 19:59 ` Rik van Riel
2015-05-04 20:40 ` Paul E. McKenney
2015-05-05 10:53 ` Peter Zijlstra
2015-05-05 12:34 ` Paul E. McKenney
2015-05-05 13:00 ` Peter Zijlstra
2015-05-05 18:35 ` Paul E. McKenney
2015-05-05 21:09 ` Rik van Riel
2015-05-06 5:41 ` Paul E. McKenney
2015-05-05 10:48 ` Peter Zijlstra
2015-05-05 10:51 ` Peter Zijlstra
2015-05-05 12:30 ` Paul E. McKenney
2015-05-02 4:06 ` [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry Mike Galbraith
2015-05-01 16:37 ` Ingo Molnar
2015-05-01 16:40 ` Rik van Riel
2015-05-01 16:45 ` Ingo Molnar
2015-05-01 16:54 ` Rik van Riel
2015-05-01 17:12 ` Ingo Molnar
2015-05-01 17:22 ` Rik van Riel
2015-05-01 17:59 ` Ingo Molnar
2015-05-01 16:22 ` Rik van Riel
2015-05-01 16:27 ` Ingo Molnar
2015-05-03 13:23 ` Mike Galbraith
2015-05-03 17:30 ` Rik van Riel
2015-05-03 18:24 ` Andy Lutomirski
2015-05-03 18:52 ` Rik van Riel
2015-05-07 10:48 ` Ingo Molnar
2015-05-07 12:18 ` Frederic Weisbecker
2015-05-07 12:29 ` Ingo Molnar
2015-05-07 15:47 ` Rik van Riel
2015-05-08 7:58 ` Ingo Molnar
2015-05-07 12:22 ` Andy Lutomirski
2015-05-07 12:44 ` Ingo Molnar
2015-05-07 12:49 ` Ingo Molnar
2015-05-08 6:17 ` Paul E. McKenney
2015-05-07 12:52 ` Andy Lutomirski
2015-05-07 15:08 ` Ingo Molnar
2015-05-07 17:47 ` Andy Lutomirski
2015-05-08 6:37 ` Ingo Molnar
2015-05-08 10:59 ` Andy Lutomirski
2015-05-08 11:27 ` Ingo Molnar
2015-05-08 12:56 ` Andy Lutomirski
2015-05-08 13:27 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150506060628.GZ5381@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=fweisbec@redhat.com \
--cc=heiko.carstens@de.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=luto@kernel.org \
--cc=mingo@kernel.org \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=umgwanakikbuti@gmail.com \
--cc=williams@redhat.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.