From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: rcu_process_callbacks irqsoff latency caused by taking spinlock with irqs disabled
Date: Wed, 4 Apr 2018 17:13:58 -0700 [thread overview]
Message-ID: <20180405001358.GK3948@linux.vnet.ibm.com> (raw)
In-Reply-To: <20180405093414.2273203e@roar.ozlabs.ibm.com>
On Thu, Apr 05, 2018 at 09:34:14AM +1000, Nicholas Piggin wrote:
> Hi Paul,
>
> Just looking at latencies, and RCU showed up as one of the maximums.
> This is a 2 socket system with (176 CPU threads). Just doing a
> `make -j 352` kernel build. Got a max latency of 3ms. I don't think
> that's anything to worry about really, but I wanted to check the
> cause.
Well, that 3 milliseconds would cause serious problems for some workloads...
> # tracer: irqsoff
> #
> # irqsoff latency trace v1.1.5 on 4.16.0-01530-g43d1859f0994
> # --------------------------------------------------------------------
> # latency: 3055 us, #19/19, CPU#135 | (M:server VP:0, KP:0, SP:0 HP:0 #P:176)
> # -----------------
> # | task: cc1-58689 (uid:1003 nice:0 policy:0 rt_prio:0)
> # -----------------
> # => started at: rcu_process_callbacks
> # => ended at: _raw_spin_unlock_irqrestore
> #
> #
> # _------=> CPU#
> # / _-----=> irqs-off
> # | / _----=> need-resched
> # || / _---=> hardirq/softirq
> # ||| / _--=> preempt-depth
> # |||| / delay
> # cmd pid ||||| time | caller
> # \ / ||||| \ | /
> <...>-58689 135d.s. 0us : rcu_process_callbacks
> <...>-58689 135d.s. 1us : cpu_needs_another_gp <-rcu_process_callbacks
> <...>-58689 135d.s. 2us : rcu_segcblist_future_gp_needed <-cpu_needs_another_gp
> <...>-58689 135d.s. 3us#: _raw_spin_lock <-rcu_process_callbacks
> <...>-58689 135d.s. 3047us : rcu_start_gp <-rcu_process_callbacks
> <...>-58689 135d.s. 3048us : rcu_advance_cbs <-rcu_start_gp
> <...>-58689 135d.s. 3049us : rcu_segcblist_pend_cbs <-rcu_advance_cbs
> <...>-58689 135d.s. 3049us : rcu_segcblist_advance <-rcu_advance_cbs
> <...>-58689 135d.s. 3050us : rcu_accelerate_cbs <-rcu_start_gp
> <...>-58689 135d.s. 3050us : rcu_segcblist_pend_cbs <-rcu_accelerate_cbs
> <...>-58689 135d.s. 3051us : rcu_segcblist_accelerate <-rcu_accelerate_cbs
> <...>-58689 135d.s. 3052us : trace_rcu_future_gp.isra.0 <-rcu_accelerate_cbs
> <...>-58689 135d.s. 3052us : trace_rcu_future_gp.isra.0 <-rcu_accelerate_cbs
> <...>-58689 135d.s. 3053us : rcu_start_gp_advanced.isra.35 <-rcu_start_gp
> <...>-58689 135d.s. 3053us : cpu_needs_another_gp <-rcu_start_gp_advanced.isra.35
> <...>-58689 135d.s. 3054us : _raw_spin_unlock_irqrestore <-rcu_process_callbacks
> <...>-58689 135d.s. 3055us : _raw_spin_unlock_irqrestore
> <...>-58689 135d.s. 3056us : trace_hardirqs_on <-_raw_spin_unlock_irqrestore
> <...>-58689 135d.s. 3061us : <stack trace>
>
> So it's taking a contende lock with interrupts disabled:
>
> static void
> __rcu_process_callbacks(struct rcu_state *rsp)
> {
> unsigned long flags;
> bool needwake;
> struct rcu_data *rdp = raw_cpu_ptr(rsp->rda);
>
> WARN_ON_ONCE(!rdp->beenonline);
>
> /* Update RCU state based on any recent quiescent states. */
> rcu_check_quiescent_state(rsp, rdp);
>
> /* Does this CPU require a not-yet-started grace period? */
> local_irq_save(flags);
> if (cpu_needs_another_gp(rsp, rdp)) {
> raw_spin_lock_rcu_node(rcu_get_root(rsp)); /* irqs disabled. */
> needwake = rcu_start_gp(rsp);
> raw_spin_unlock_irqrestore_rcu_node(rcu_get_root(rsp), flags);
> if (needwake)
> rcu_gp_kthread_wake(rsp);
> } else {
> local_irq_restore(flags);
> }
>
> Because irqs are disabled before taking the lock, we can't spin with
> interrupts enabled.
>
> cpu_needs_another_gp needs interrupts off to prevent races with normal
> callback registry, but that doesn't seem to be preventing any wider
> races in this code, because we immediately re-enable interrupts anyway
> if no gp is needed. So an interrupt can come in right after that and
> queue something up.
>
> So then the question is whether it's safe-albeit-racy to call with
> interrupts ensabled? Would be nice to move it to a spin_lock_irqsave.
If I recall correctly, the issue is that an unsynchronized (due to
interrupts being enabled) check in the "if" statement can cause extra
unneeded grace periods.
I am guessing that the long latency is caused by lots of CPUs suddenly
needing a new grace period at about the same time. If so, this is
a bottleneck that I have been expecting for some time, and one that
I would resolve by introducing funnel locking, sort of like SRCU and
expedited grace periods already use.
Or am I confused about the source of the contention?
Thanx, Paul
next prev parent reply other threads:[~2018-04-05 0:13 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-04 23:34 rcu_process_callbacks irqsoff latency caused by taking spinlock with irqs disabled Nicholas Piggin
2018-04-05 0:13 ` Paul E. McKenney [this message]
2018-04-05 0:45 ` Nicholas Piggin
2018-04-05 15:53 ` Paul E. McKenney
2018-04-06 21:40 ` Nicholas Piggin
2018-04-08 21:06 ` Paul E. McKenney
2018-04-17 15:43 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180405001358.GK3948@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=npiggin@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.