From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>,
josh@joshtriplett.org, mathieu.desnoyers@efficios.com,
jiangshanlai@gmail.com, linux-kernel@vger.kernel.org,
sramana@codeaurora.org, prsood@codeaurora.org,
pkondeti@codeaurora.org, markivx@codeaurora.org,
peterz@infradead.org, byungchul.park@lge.com
Subject: Re: Query regarding synchronize_sched_expedited and resched_cpu
Date: Mon, 18 Sep 2017 16:53:11 -0700
Message-ID: <20170918235311.GA20177@linux.vnet.ibm.com>
In-Reply-To: <20170918165527.GN3521@linux.vnet.ibm.com>

On Mon, Sep 18, 2017 at 09:55:27AM -0700, Paul E. McKenney wrote:
> On Mon, Sep 18, 2017 at 12:29:31PM -0400, Steven Rostedt wrote:
> > On Mon, 18 Sep 2017 09:24:12 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> >
> >
> > > As soon as I work through the backlog of lockdep complaints that
> > > appeared in the last merge window... :-(
> > >
> > > sparse_irq_lock, I am looking at you!!! ;-)
> >
> > I just hit one too, and decided to write a patch to show a chain of 3
> > when applicable.
> >
> > For example:
> >
> > Chain exists of:
> > cpu_hotplug_lock.rw_sem --> smpboot_threads_lock --> (complete)&self->parked
> >
> > Possible unsafe locking scenario by crosslock:
> >
> > CPU0 CPU1 CPU2
> > ---- ---- ----
> > lock(smpboot_threads_lock);
> > lock((complete)&self->parked);
> > lock(cpu_hotplug_lock.rw_sem);
> > lock(smpboot_threads_lock);
> > lock(cpu_hotplug_lock.rw_sem);
> > unlock((complete)&self->parked);
> >
> > *** DEADLOCK ***
> >
> > :-)
>
> Nice!!!
>
> My next step is reverting 12ac1d0f6c3e ("genirq: Make sparse_irq_lock
> protect what it should protect") to see if that helps.

No joy, but it is amazing how much nicer "git bisect" is when your
failure happens deterministically within 35 seconds. ;-)

The bisection converged to the range starting with 7a46ec0e2f48
("locking/refcounts, x86/asm: Implement fast refcount overflow
protection") and ending with 0c2364791343 ("Merge branch 'x86/asm'
into locking/core"). All of these failed with an unrelated build
error, but there was a fix that could be merged. This flagged
d0541b0fa64b ("locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part
of CONFIG_PROVE_LOCKING"), which unfortunately does not revert cleanly.
However, the effect of a reversion can be obtained by removing the
selects of LOCKDEP_CROSSRELEASE and LOCKDEP_COMPLETE from
PROVE_LOCKING, which allows recent commits to complete a short
rcutorture test successfully.

So, Byungchul, any enlightenment? Please see lockdep splat below.

							Thanx, Paul

------------------------------------------------------------------------
[ 35.310179] ======================================================
[ 35.310749] WARNING: possible circular locking dependency detected
[ 35.310749] 4.13.0-rc4+ #1 Not tainted
[ 35.310749] ------------------------------------------------------
[ 35.310749] torture_onoff/766 is trying to acquire lock:
[ 35.313943] ((complete)&st->done){+.+.}, at: [<ffffffffb905f5a6>] takedown_cpu+0x86/0xf0
[ 35.313943]
[ 35.313943] but task is already holding lock:
[ 35.313943] (sparse_irq_lock){+.+.}, at: [<ffffffffb90c5e42>] irq_lock_sparse+0x12/0x20
[ 35.313943]
[ 35.313943] which lock already depends on the new lock.
[ 35.313943]
[ 35.313943]
[ 35.313943] the existing dependency chain (in reverse order) is:
[ 35.313943]
[ 35.313943] -> #1 (sparse_irq_lock){+.+.}:
[ 35.313943] __mutex_lock+0x65/0x960
[ 35.313943] mutex_lock_nested+0x16/0x20
[ 35.313943] irq_lock_sparse+0x12/0x20
[ 35.313943] irq_affinity_online_cpu+0x13/0xd0
[ 35.313943] cpuhp_invoke_callback+0xa7/0x8b0
[ 35.313943]
[ 35.313943] -> #0 ((complete)&st->done){+.+.}:
[ 35.313943] check_prev_add+0x401/0x800
[ 35.313943] __lock_acquire+0x1100/0x11a0
[ 35.313943] lock_acquire+0x9e/0x1e0
[ 35.313943] wait_for_completion+0x36/0x130
[ 35.313943] takedown_cpu+0x86/0xf0
[ 35.313943] cpuhp_invoke_callback+0xa7/0x8b0
[ 35.313943] cpuhp_down_callbacks+0x3d/0x80
[ 35.313943] _cpu_down+0xbb/0xf0
[ 35.313943] do_cpu_down+0x39/0x50
[ 35.313943] cpu_down+0xb/0x10
[ 35.313943] torture_offline+0x75/0x140
[ 35.313943] torture_onoff+0x102/0x1e0
[ 35.313943] kthread+0x142/0x180
[ 35.313943] ret_from_fork+0x27/0x40
[ 35.313943]
[ 35.313943] other info that might help us debug this:
[ 35.313943]
[ 35.313943] Possible unsafe locking scenario:
[ 35.313943]
[ 35.313943]        CPU0                    CPU1
[ 35.313943]        ----                    ----
[ 35.313943]   lock(sparse_irq_lock);
[ 35.313943]                                lock((complete)&st->done);
[ 35.313943]                                lock(sparse_irq_lock);
[ 35.313943]   lock((complete)&st->done);
[ 35.313943]
[ 35.313943] *** DEADLOCK ***
[ 35.313943]
[ 35.313943] 3 locks held by torture_onoff/766:
[ 35.313943] #0: (cpu_add_remove_lock){+.+.}, at: [<ffffffffb9060be2>] do_cpu_down+0x22/0x50
[ 35.313943] #1: (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffffb90acc41>] percpu_down_write+0x21/0xf0
[ 35.313943] #2: (sparse_irq_lock){+.+.}, at: [<ffffffffb90c5e42>] irq_lock_sparse+0x12/0x20
[ 35.313943]
[ 35.313943] stack backtrace:
[ 35.313943] CPU: 7 PID: 766 Comm: torture_onoff Not tainted 4.13.0-rc4+ #1
[ 35.313943] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 35.313943] Call Trace:
[ 35.313943] dump_stack+0x67/0x97
[ 35.313943] print_circular_bug+0x21d/0x330
[ 35.313943] ? add_lock_to_list.isra.31+0xc0/0xc0
[ 35.313943] check_prev_add+0x401/0x800
[ 35.313943] ? wake_up_q+0x70/0x70
[ 35.313943] __lock_acquire+0x1100/0x11a0
[ 35.313943] ? __lock_acquire+0x1100/0x11a0
[ 35.313943] ? add_lock_to_list.isra.31+0xc0/0xc0
[ 35.313943] lock_acquire+0x9e/0x1e0
[ 35.313943] ? takedown_cpu+0x86/0xf0
[ 35.313943] wait_for_completion+0x36/0x130
[ 35.313943] ? takedown_cpu+0x86/0xf0
[ 35.313943] ? stop_machine_cpuslocked+0xb9/0xd0
[ 35.313943] ? cpuhp_invoke_callback+0x8b0/0x8b0
[ 35.313943] ? cpuhp_complete_idle_dead+0x10/0x10
[ 35.313943] takedown_cpu+0x86/0xf0
[ 35.313943] cpuhp_invoke_callback+0xa7/0x8b0
[ 35.313943] cpuhp_down_callbacks+0x3d/0x80
[ 35.313943] _cpu_down+0xbb/0xf0
[ 35.313943] do_cpu_down+0x39/0x50
[ 35.313943] cpu_down+0xb/0x10
[ 35.313943] torture_offline+0x75/0x140
[ 35.313943] torture_onoff+0x102/0x1e0
[ 35.313943] kthread+0x142/0x180
[ 35.313943] ? torture_kthread_stopping+0x70/0x70
[ 35.313943] ? kthread_create_on_node+0x40/0x40
[ 35.313943] ret_from_fork+0x27/0x40
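
The "Possible unsafe locking scenario" in the splat above amounts to
two ordering edges that point at each other. The following is only an
illustrative sketch of that idea, not lockdep's actual implementation:
the function name find_cycle and the edge-list representation are
assumptions made for this example, and the two edges are read directly
off the CPU0/CPU1 columns of the scenario.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return one cycle in a directed lock-order graph, or None.

    Each edge (a, b) means "b was acquired (or waited on) while a was
    held". This is a plain depth-first search, not lockdep's algorithm.
    """
    graph = defaultdict(list)
    for held, wanted in edges:
        graph[held].append(wanted)

    visiting = []   # locks on the current DFS path, in order
    done = set()    # locks already shown not to lead into a cycle

    def dfs(node):
        if node in visiting:
            # Close the loop: path from the first visit back to node.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for nxt in graph[node]:
            cycle = dfs(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in list(graph):
        cycle = dfs(start)
        if cycle:
            return cycle
    return None


# Edges taken from the scenario in the splat (hypothetical encoding):
edges = [
    # CPU0: holds sparse_irq_lock, then waits on the completion.
    ("sparse_irq_lock", "(complete)&st->done"),
    # CPU1: owes the completion, then takes sparse_irq_lock.
    ("(complete)&st->done", "sparse_irq_lock"),
]

cycle = find_cycle(edges)
if cycle:
    print(" --> ".join(cycle))
```

Dropping either edge makes find_cycle() return None, which is the
sketch-level analogue of the report disappearing once completions are
no longer tracked as locks.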