* Re: [tip:core/rcu] rcu: Break call_rcu() deadlock involving scheduler and perf
[not found] <tip-96d3fd0d315a949e30adc80f086031c5cdf070d1@git.kernel.org>
From: Peter Zijlstra @ 2013-12-16 15:26 UTC (permalink / raw)
To: linux-kernel, mingo, hpa, paulmck, tglx, davej; +Cc: linux-tip-commits
On Mon, Dec 16, 2013 at 07:19:22AM -0800, tip-bot for Paul E. McKenney wrote:
> The underlying problem is that perf is invoking call_rcu() with the
> scheduler locks held, but in NOCB mode, call_rcu() will with high
> probability invoke the scheduler -- which just might want to use its
> locks. The reason that call_rcu() needs to invoke the scheduler is
> to wake up the corresponding rcuo callback-offload kthread, which
> does the job of starting up a grace period and invoking the callbacks
> afterwards.
>
> One solution (championed on a related problem by Lai Jiangshan) is to
> simply defer the wakeup to some point where scheduler locks are no longer
> held. Since we don't want to unnecessarily incur the cost of such
> deferral, the task before us is threefold:
>
> 1. Determine when it is likely that a relevant scheduler lock is held.
>
> 2. Defer the wakeup in such cases.
>
> 3. Ensure that all deferred wakeups eventually happen, preferably
> sooner rather than later.
>
> We use irqs_disabled_flags() as a proxy for relevant scheduler locks
> being held. This works because the relevant locks are always acquired
> with interrupts disabled. We may defer more often than needed, but that
> is at least safe.
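The three steps in the quoted commit message can be illustrated with a small user-space sketch. This is only a model under stated assumptions, not the kernel implementation: every name below (sim_irqs_disabled, sim_call_rcu(), sim_do_deferred_wakeup(), and so on) is hypothetical, and a plain boolean stands in for the irqs_disabled_flags() check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
static bool sim_irqs_disabled;	/* models the irqs_disabled_flags() proxy */
static bool sim_wakeup_deferred;	/* a wakeup is pending */
static int sim_wakeups_done;	/* times the modelled rcuo kthread was woken */

/* Models waking the rcuo kthread, which in the kernel enters the scheduler. */
static void sim_wake_rcuo(void)
{
	sim_wakeups_done++;
}

/*
 * Steps 1 and 2: if interrupts look disabled, assume a scheduler lock
 * might be held and record the wakeup instead of performing it.
 */
static void sim_call_rcu(void)
{
	if (sim_irqs_disabled)
		sim_wakeup_deferred = true;
	else
		sim_wake_rcuo();
}

/* Step 3: called later, from a point where no scheduler lock is held. */
static void sim_do_deferred_wakeup(void)
{
	if (sim_wakeup_deferred) {
		sim_wakeup_deferred = false;
		sim_wake_rcuo();
	}
}

/* Walks through one deferred and one immediate wakeup. */
static int demo_deferral(void)
{
	sim_irqs_disabled = false;
	sim_wakeup_deferred = false;
	sim_wakeups_done = 0;

	sim_irqs_disabled = true;	/* pretend a scheduler lock is held */
	sim_call_rcu();
	assert(sim_wakeups_done == 0 && sim_wakeup_deferred);

	sim_irqs_disabled = false;	/* lock dropped, interrupts back on */
	sim_do_deferred_wakeup();
	assert(sim_wakeups_done == 1 && !sim_wakeup_deferred);

	sim_call_rcu();			/* safe case: wake immediately */
	assert(sim_wakeups_done == 2);
	return sim_wakeups_done;
}
```

Calling demo_deferral() from main() exercises both paths. The sketch also shows why over-deferring is safe: a deferred wakeup is only delayed, never lost, as long as sim_do_deferred_wakeup() eventually runs.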
This would also allow us to do away with things like the below patch,
right?
---
commit 058ebd0eba3aff16b144eabf4510ed9510e1416e
Author: Peter Zijlstra <peterz@infradead.org>
Date: Fri Jul 12 11:08:33 2013 +0200
perf: Fix perf_lock_task_context() vs RCU
Jiri managed to trigger this warning:
[] ======================================================
[] [ INFO: possible circular locking dependency detected ]
[] 3.10.0+ #228 Tainted: G W
[] -------------------------------------------------------
[] p/6613 is trying to acquire lock:
[] (rcu_node_0){..-...}, at: [<ffffffff810ca797>] rcu_read_unlock_special+0xa7/0x250
[]
[] but task is already holding lock:
[] (&ctx->lock){-.-...}, at: [<ffffffff810f2879>] perf_lock_task_context+0xd9/0x2c0
[]
[] which lock already depends on the new lock.
[]
[] the existing dependency chain (in reverse order) is:
[]
[] -> #4 (&ctx->lock){-.-...}:
[] -> #3 (&rq->lock){-.-.-.}:
[] -> #2 (&p->pi_lock){-.-.-.}:
[] -> #1 (&rnp->nocb_gp_wq[1]){......}:
[] -> #0 (rcu_node_0){..-...}:
Paul was quick to explain that due to preemptible RCU we cannot call
rcu_read_unlock() while holding scheduler (or nested) locks when part
of the read side critical section was preemptible.
Therefore solve it by making the entire RCU read side non-preemptible.
Also pull out the retry from under the non-preempt to play nice with RT.
Reported-by: Jiri Olsa <jolsa@redhat.com>
Helped-out-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ef5e7cc686e3..eba8fb5834ae 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -947,8 +947,18 @@ perf_lock_task_context(struct task_struct *task, int ctxn, unsigned long *flags)
{
struct perf_event_context *ctx;
- rcu_read_lock();
retry:
+ /*
+ * One of the few rules of preemptible RCU is that one cannot do
+ * rcu_read_unlock() while holding a scheduler (or nested) lock when
+ * part of the read side critical section was preemptible -- see
+ * rcu_read_unlock_special().
+ *
+ * Since ctx->lock nests under rq->lock we must ensure the entire read
+ * side critical section is non-preemptible.
+ */
+ preempt_disable();
+ rcu_read_lock();
ctx = rcu_dereference(task->perf_event_ctxp[ctxn]);
if (ctx) {
/*
@@ -964,6 +974,8 @@ perf_lock_task_context(struct task_struct *task, int ctxn, unsigned long *flags)
raw_spin_lock_irqsave(&ctx->lock, *flags);
if (ctx != rcu_dereference(task->perf_event_ctxp[ctxn])) {
raw_spin_unlock_irqrestore(&ctx->lock, *flags);
+ rcu_read_unlock();
+ preempt_enable();
goto retry;
}
@@ -973,6 +985,7 @@ perf_lock_task_context(struct task_struct *task, int ctxn, unsigned long *flags)
}
}
rcu_read_unlock();
+ preempt_enable();
return ctx;
}
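The shape of the patched function can be modelled in user space to check that every exit path, including the retry, releases the RCU read side and re-enables preemption in matched pairs. This is a sketch under assumptions: the sim_* helpers are hypothetical counters, not kernel primitives, a "locked" flag stands in for ctx->lock, and the concurrent context change is forced exactly once.

```c
#include <assert.h>
#include <stddef.h>

struct sim_ctx { int locked; };	/* "locked" models ctx->lock */

static int sim_preempt_depth;	/* models the preempt-disable nesting count */
static int sim_rcu_depth;	/* models the RCU read-side nesting count */
static struct sim_ctx ctx_a, ctx_b;
static struct sim_ctx *sim_current_ctx;	/* models task->perf_event_ctxp[ctxn] */
static int sim_swap_once;	/* force one racing context change */

static void sim_preempt_disable(void) { sim_preempt_depth++; }
static void sim_preempt_enable(void) { sim_preempt_depth--; }
static void sim_rcu_read_lock(void) { sim_rcu_depth++; }

static void sim_rcu_read_unlock(void)
{
	/* The rule from the patch: the read side must be non-preemptible. */
	assert(sim_preempt_depth > 0);
	sim_rcu_depth--;
}

static struct sim_ctx *sim_lock_task_context(int *retries)
{
	struct sim_ctx *ctx;

retry:
	sim_preempt_disable();
	sim_rcu_read_lock();
	ctx = sim_current_ctx;
	if (ctx) {
		ctx->locked = 1;
		if (sim_swap_once) {	/* simulate the context changing */
			sim_swap_once = 0;
			sim_current_ctx = &ctx_b;
		}
		if (ctx != sim_current_ctx) {
			ctx->locked = 0;
			/* Leave the non-preemptible region before retrying. */
			sim_rcu_read_unlock();
			sim_preempt_enable();
			(*retries)++;
			goto retry;
		}
	}
	sim_rcu_read_unlock();
	sim_preempt_enable();
	return ctx;
}

static int demo_retry(void)
{
	int retries = 0;
	struct sim_ctx *ctx;

	sim_preempt_depth = 0;
	sim_rcu_depth = 0;
	ctx_a.locked = 0;
	ctx_b.locked = 0;
	sim_current_ctx = &ctx_a;
	sim_swap_once = 1;

	ctx = sim_lock_task_context(&retries);
	assert(ctx == &ctx_b && ctx->locked);
	assert(sim_preempt_depth == 0 && sim_rcu_depth == 0);
	return retries;
}
```

Calling demo_retry() from main() takes the retry path once. Because the retry label sits above sim_preempt_disable(), each attempt opens and closes its own non-preemptible region, which mirrors the "pull out the retry from under the non-preempt" remark in the changelog.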
* Re: [tip:core/rcu] rcu: Break call_rcu() deadlock involving scheduler and perf
From: Paul E. McKenney @ 2013-12-16 15:32 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-kernel, mingo, hpa, tglx, davej, linux-tip-commits
On Mon, Dec 16, 2013 at 04:26:36PM +0100, Peter Zijlstra wrote:
> This would also allow us to do away with things like the below patch,
> right?
It takes care of one problem, but there are others, including
rcu_read_unlock() invoking the scheduler to deboost itself. So for the
moment, we still need the below patch.
Thanx, Paul
> ---
> commit 058ebd0eba3aff16b144eabf4510ed9510e1416e
> Author: Peter Zijlstra <peterz@infradead.org>
> Date: Fri Jul 12 11:08:33 2013 +0200
>
> perf: Fix perf_lock_task_context() vs RCU
>
> Jiri managed to trigger this warning:
>
> [] ======================================================
> [] [ INFO: possible circular locking dependency detected ]
> [] 3.10.0+ #228 Tainted: G W
> [] -------------------------------------------------------
> [] p/6613 is trying to acquire lock:
> [] (rcu_node_0){..-...}, at: [<ffffffff810ca797>] rcu_read_unlock_special+0xa7/0x250
> []
> [] but task is already holding lock:
> [] (&ctx->lock){-.-...}, at: [<ffffffff810f2879>] perf_lock_task_context+0xd9/0x2c0
> []
> [] which lock already depends on the new lock.
> []
> [] the existing dependency chain (in reverse order) is:
> []
> [] -> #4 (&ctx->lock){-.-...}:
> [] -> #3 (&rq->lock){-.-.-.}:
> [] -> #2 (&p->pi_lock){-.-.-.}:
> [] -> #1 (&rnp->nocb_gp_wq[1]){......}:
> [] -> #0 (rcu_node_0){..-...}:
>
> Paul was quick to explain that due to preemptible RCU we cannot call
> rcu_read_unlock() while holding scheduler (or nested) locks when part
> of the read side critical section was preemptible.
>
> Therefore solve it by making the entire RCU read side non-preemptible.
>
> Also pull out the retry from under the non-preempt to play nice with RT.
>
> Reported-by: Jiri Olsa <jolsa@redhat.com>
> Helped-out-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: <stable@kernel.org>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index ef5e7cc686e3..eba8fb5834ae 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -947,8 +947,18 @@ perf_lock_task_context(struct task_struct *task, int ctxn, unsigned long *flags)
> {
> struct perf_event_context *ctx;
>
> - rcu_read_lock();
> retry:
> + /*
> + * One of the few rules of preemptible RCU is that one cannot do
> + * rcu_read_unlock() while holding a scheduler (or nested) lock when
> + * part of the read side critical section was preemptible -- see
> + * rcu_read_unlock_special().
> + *
> + * Since ctx->lock nests under rq->lock we must ensure the entire read
> + * side critical section is non-preemptible.
> + */
> + preempt_disable();
> + rcu_read_lock();
> ctx = rcu_dereference(task->perf_event_ctxp[ctxn]);
> if (ctx) {
> /*
> @@ -964,6 +974,8 @@ perf_lock_task_context(struct task_struct *task, int ctxn, unsigned long *flags)
> raw_spin_lock_irqsave(&ctx->lock, *flags);
> if (ctx != rcu_dereference(task->perf_event_ctxp[ctxn])) {
> raw_spin_unlock_irqrestore(&ctx->lock, *flags);
> + rcu_read_unlock();
> + preempt_enable();
> goto retry;
> }
>
> @@ -973,6 +985,7 @@ perf_lock_task_context(struct task_struct *task, int ctxn, unsigned long *flags)
> }
> }
> rcu_read_unlock();
> + preempt_enable();
> return ctx;
> }
>
>
* Re: [tip:core/rcu] rcu: Break call_rcu() deadlock involving scheduler and perf
From: Peter Zijlstra @ 2013-12-16 15:45 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: linux-kernel, mingo, hpa, tglx, davej, linux-tip-commits
On Mon, Dec 16, 2013 at 07:32:48AM -0800, Paul E. McKenney wrote:
> On Mon, Dec 16, 2013 at 04:26:36PM +0100, Peter Zijlstra wrote:
> > This would also allow us to do away with things like the below patch,
> > right?
>
> It takes care of one problem, but there are others, including
> rcu_read_unlock() invoking the scheduler to deboost itself. So for the
> moment, we still need the below patch.
Oh right, see I knew I was forgetting something... :-)
* Re: [tip:core/rcu] rcu: Break call_rcu() deadlock involving scheduler and perf
From: Paul E. McKenney @ 2013-12-16 16:10 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, mingo, hpa, tglx, davej, linux-tip-commits, laijs
On Mon, Dec 16, 2013 at 04:45:39PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 16, 2013 at 07:32:48AM -0800, Paul E. McKenney wrote:
> > On Mon, Dec 16, 2013 at 04:26:36PM +0100, Peter Zijlstra wrote:
> > > This would also allow us to do away with things like the below patch,
> > > right?
> >
> > It takes care of one problem, but there are others, including
> > rcu_read_unlock() invoking the scheduler to deboost itself. So for the
> > moment, we still need the below patch.
>
> Oh right, see I knew I was forgetting something... :-)
I am hoping to make your patch unnecessary, but it ain't trivial. ;-)
We will get there! Especially if I can find Lai Jiangshan's old patch
that reworked deboosting. :-/
Thanx, Paul