All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>,
	josh@joshtriplett.org, mathieu.desnoyers@efficios.com,
	jiangshanlai@gmail.com, linux-kernel@vger.kernel.org,
	sramana@codeaurora.org, prsood@codeaurora.org,
	pkondeti@codeaurora.org, markivx@codeaurora.org,
	peterz@infradead.org, byungchul.park@lge.com
Subject: Re: Query regarding synchronize_sched_expedited and resched_cpu
Date: Mon, 18 Sep 2017 19:26:58 -0700	[thread overview]
Message-ID: <20170919022658.GZ3521@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170918212301.417c8b36@gandalf.local.home>

On Mon, Sep 18, 2017 at 09:23:01PM -0400, Steven Rostedt wrote:
> On Mon, 18 Sep 2017 16:53:11 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Mon, Sep 18, 2017 at 09:55:27AM -0700, Paul E. McKenney wrote:
> > > On Mon, Sep 18, 2017 at 12:29:31PM -0400, Steven Rostedt wrote:  
> > > > On Mon, 18 Sep 2017 09:24:12 -0700
> > > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > > 
> > > >   
> > > > > As soon as I work through the backlog of lockdep complaints that
> > > > > appeared in the last merge window...  :-(
> > > > > 
> > > > > sparse_irq_lock, I am looking at you!!!  ;-)  
> > > > 
> > > > I just hit one too, and decided to write a patch to show a chain of 3
> > > > when applicable.
> > > > 
> > > > For example:
> > > > 
> > > >  Chain exists of:
> > > >    cpu_hotplug_lock.rw_sem --> smpboot_threads_lock --> (complete)&self->parked
> > > > 
> > > >   Possible unsafe locking scenario by crosslock:
> > > > 
> > > >         CPU0                    CPU1                    CPU2
> > > >         ----                    ----                    ----
> > > >    lock(smpboot_threads_lock);
> > > >    lock((complete)&self->parked);
> > > >                                 lock(cpu_hotplug_lock.rw_sem);
> > > >                                 lock(smpboot_threads_lock);
> > > >                                                        lock(cpu_hotplug_lock.rw_sem);
> > > >                                                        unlock((complete)&self->parked);
> > > > 
> > > >   *** DEADLOCK ***
> > > > 
> > > > :-)  
> > > 
> > > Nice!!!
> 
> Note, the above lockdep splat does discover a bug.

Fair enough, but I unfortunately have several other much more bizarre
bugs stacked up and so I am not volunteering to fix this one.

> > > My next step is reverting 12ac1d0f6c3e ("genirq: Make sparse_irq_lock
> > > protect what it should protect") to see if that helps.  
> > 
> > No joy, but it is amazing how much nicer "git bisect" is when your
> > failure happens deterministically within 35 seconds.  ;-)
> > 
> > The bisection converged to the range starting with 7a46ec0e2f48
> > ("locking/refcounts, x86/asm: Implement fast refcount overflow
> > protection") and ending with 0c2364791343 ("Merge branch 'x86/asm'
> > into locking/core").  All of these failed with an unrelated build
> > error, but there was a fix that could be merged.  This flagged
> > d0541b0fa64b ("locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part
> > of CONFIG_PROVE_LOCKING"), which unfortunately does not revert cleanly.
> > However, the effect of a reversion can be obtained by removing the
> > selects of LOCKDEP_CROSSRELEASE and LOCKDEP_COMPLETE from
> > PROVE_LOCKING, which allows recent commits to complete a short
> > rcutorture test successfully.
> 
> I don't think you want to remove those. It appears that lockdep now
> covers completions, and it is uncovering a lot of bugs.

Actually, I do, at least in the short term.  This splat is getting in the
way of my diagnostics for the other bugs.  Please note that I am -not-
arguing that mainline should change, at least not yet.

> > So, Byungchul, any enlightenment?  Please see lockdep splat below.
> 
> Did you discover the below by reverting lockdep patches? It doesn't
> really make sense. It looks to me to be about completions but not
> fully covering it.

No, the splat below is what I get from stock v4.14-rc1 on these
rcutorture scenarios:  SRCU-P, TASKS01, TREE03, and TREE05.  If you
would like to try it yourself, TASKS01 requires only two CPUs and
the others require eight.

When I suppress LOCKDEP_CROSSRELEASE and LOCKDEP_COMPLETE, I don't
see anything that looks like that deadlock, but it is of course quite
possible that the deadlock is very low probability -- and I did short
30-minute runs.

							Thanx, Paul

> -- Steve
> 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > [   35.310179] ======================================================
> > [   35.310749] WARNING: possible circular locking dependency detected
> > [   35.310749] 4.13.0-rc4+ #1 Not tainted
> > [   35.310749] ------------------------------------------------------
> > [   35.310749] torture_onoff/766 is trying to acquire lock:
> > [   35.313943]  ((complete)&st->done){+.+.}, at: [<ffffffffb905f5a6>] takedown_cpu+0x86/0xf0
> > [   35.313943] 
> > [   35.313943] but task is already holding lock:
> > [   35.313943]  (sparse_irq_lock){+.+.}, at: [<ffffffffb90c5e42>] irq_lock_sparse+0x12/0x20
> > [   35.313943] 
> > [   35.313943] which lock already depends on the new lock.
> > [   35.313943] 
> > [   35.313943] 
> > [   35.313943] the existing dependency chain (in reverse order) is:
> > [   35.313943] 
> > [   35.313943] -> #1 (sparse_irq_lock){+.+.}:
> > [   35.313943]        __mutex_lock+0x65/0x960
> > [   35.313943]        mutex_lock_nested+0x16/0x20
> > [   35.313943]        irq_lock_sparse+0x12/0x20
> > [   35.313943]        irq_affinity_online_cpu+0x13/0xd0
> > [   35.313943]        cpuhp_invoke_callback+0xa7/0x8b0
> > [   35.313943] 
> > [   35.313943] -> #0 ((complete)&st->done){+.+.}:
> > [   35.313943]        check_prev_add+0x401/0x800
> > [   35.313943]        __lock_acquire+0x1100/0x11a0
> > [   35.313943]        lock_acquire+0x9e/0x1e0
> > [   35.313943]        wait_for_completion+0x36/0x130
> > [   35.313943]        takedown_cpu+0x86/0xf0
> > [   35.313943]        cpuhp_invoke_callback+0xa7/0x8b0
> > [   35.313943]        cpuhp_down_callbacks+0x3d/0x80
> > [   35.313943]        _cpu_down+0xbb/0xf0
> > [   35.313943]        do_cpu_down+0x39/0x50
> > [   35.313943]        cpu_down+0xb/0x10
> > [   35.313943]        torture_offline+0x75/0x140
> > [   35.313943]        torture_onoff+0x102/0x1e0
> > [   35.313943]        kthread+0x142/0x180
> > [   35.313943]        ret_from_fork+0x27/0x40
> > [   35.313943] 
> > [   35.313943] other info that might help us debug this:
> > [   35.313943] 
> > [   35.313943]  Possible unsafe locking scenario:
> > [   35.313943] 
> > [   35.313943]        CPU0                    CPU1
> > [   35.313943]        ----                    ----
> > [   35.313943]   lock(sparse_irq_lock);
> > [   35.313943]                                lock((complete)&st->done);
> > [   35.313943]                                lock(sparse_irq_lock);
> > [   35.313943]   lock((complete)&st->done);
> > [   35.313943] 
> > [   35.313943]  *** DEADLOCK ***
> > [   35.313943] 
> > [   35.313943] 3 locks held by torture_onoff/766:
> > [   35.313943]  #0:  (cpu_add_remove_lock){+.+.}, at: [<ffffffffb9060be2>] do_cpu_down+0x22/0x50
> > [   35.313943]  #1:  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffffb90acc41>] percpu_down_write+0x21/0xf0
> > [   35.313943]  #2:  (sparse_irq_lock){+.+.}, at: [<ffffffffb90c5e42>] irq_lock_sparse+0x12/0x20
> > [   35.313943] 
> > [   35.313943] stack backtrace:
> > [   35.313943] CPU: 7 PID: 766 Comm: torture_onoff Not tainted 4.13.0-rc4+ #1
> > [   35.313943] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > [   35.313943] Call Trace:
> > [   35.313943]  dump_stack+0x67/0x97
> > [   35.313943]  print_circular_bug+0x21d/0x330
> > [   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
> > [   35.313943]  check_prev_add+0x401/0x800
> > [   35.313943]  ? wake_up_q+0x70/0x70
> > [   35.313943]  __lock_acquire+0x1100/0x11a0
> > [   35.313943]  ? __lock_acquire+0x1100/0x11a0
> > [   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
> > [   35.313943]  lock_acquire+0x9e/0x1e0
> > [   35.313943]  ? takedown_cpu+0x86/0xf0
> > [   35.313943]  wait_for_completion+0x36/0x130
> > [   35.313943]  ? takedown_cpu+0x86/0xf0
> > [   35.313943]  ? stop_machine_cpuslocked+0xb9/0xd0
> > [   35.313943]  ? cpuhp_invoke_callback+0x8b0/0x8b0
> > [   35.313943]  ? cpuhp_complete_idle_dead+0x10/0x10
> > [   35.313943]  takedown_cpu+0x86/0xf0
> > [   35.313943]  cpuhp_invoke_callback+0xa7/0x8b0
> > [   35.313943]  cpuhp_down_callbacks+0x3d/0x80
> > [   35.313943]  _cpu_down+0xbb/0xf0
> > [   35.313943]  do_cpu_down+0x39/0x50
> > [   35.313943]  cpu_down+0xb/0x10
> > [   35.313943]  torture_offline+0x75/0x140
> > [   35.313943]  torture_onoff+0x102/0x1e0
> > [   35.313943]  kthread+0x142/0x180
> > [   35.313943]  ? torture_kthread_stopping+0x70/0x70
> > [   35.313943]  ? kthread_create_on_node+0x40/0x40
> > [   35.313943]  ret_from_fork+0x27/0x40
> 

  reply	other threads:[~2017-09-19  2:27 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-15 11:14 Query regarding synchronize_sched_expedited and resched_cpu Neeraj Upadhyay
2017-09-17  1:00 ` Paul E. McKenney
2017-09-17  6:07   ` Neeraj Upadhyay
2017-09-18 15:11     ` Steven Rostedt
2017-09-18 16:01       ` Paul E. McKenney
2017-09-18 16:12         ` Steven Rostedt
2017-09-18 16:24           ` Paul E. McKenney
2017-09-18 16:29             ` Steven Rostedt
2017-09-18 16:55               ` Paul E. McKenney
2017-09-18 23:53                 ` Paul E. McKenney
2017-09-19  1:23                   ` Steven Rostedt
2017-09-19  2:26                     ` Paul E. McKenney [this message]
2017-09-19  1:50                   ` Byungchul Park
2017-09-19  2:06                     ` Byungchul Park
2017-09-19  2:33                       ` Paul E. McKenney
2017-09-19  2:48                         ` Byungchul Park
2017-09-19  4:04                           ` Paul E. McKenney
2017-09-19  5:37                             ` Boqun Feng
2017-09-19  6:11                               ` Mike Galbraith
2017-09-19  6:53                                 ` Byungchul Park
2017-09-19 13:40                                 ` Paul E. McKenney
2017-09-21 13:57                 ` Peter Zijlstra
2017-09-21 15:33                   ` Paul E. McKenney
2017-09-19  1:55               ` Byungchul Park
2017-09-19 15:31             ` Paul E. McKenney
2017-09-19 15:58               ` Steven Rostedt
2017-09-19 16:12                 ` Paul E. McKenney
2017-09-21 13:59               ` Peter Zijlstra
2017-09-21 16:00                 ` Paul E. McKenney
2017-09-21 16:30                   ` Peter Zijlstra
2017-09-21 16:47                     ` Paul E. McKenney
2017-09-21 13:55       ` Peter Zijlstra
2017-09-21 15:31         ` Paul E. McKenney
2017-09-21 16:18           ` Peter Zijlstra
2017-09-21 15:46         ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170919022658.GZ3521@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=byungchul.park@lge.com \
    --cc=jiangshanlai@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=markivx@codeaurora.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=neeraju@codeaurora.org \
    --cc=peterz@infradead.org \
    --cc=pkondeti@codeaurora.org \
    --cc=prsood@codeaurora.org \
    --cc=rostedt@goodmis.org \
    --cc=sramana@codeaurora.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.