From: Mark Rutland <mark.rutland@arm.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>, Vince Weaver <vince@deater.net>,
Ingo Molnar <mingo@redhat.com>, Andi Kleen <ak@linux.intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Jiri Kosina <jkosina@suse.cz>, Borislav Petkov <bp@suse.de>,
Will Deacon <Will.Deacon@arm.com>
Subject: Re: perf fuzzer crash [PATCH] perf: Get group events reference before moving the group
Date: Wed, 21 Jan 2015 12:08:12 +0000
Message-ID: <20150121120812.GF5044@leverpostej>
In-Reply-To: <20150121010048.GU9719@linux.vnet.ibm.com>
[...]
> > > > On vanilla v3.19-rc5 and vanilla v3.18, I'm able to get my hotplug
> > > > script hung when run concurrently with the test case against the CCI PMU
> > > > driver (without migration). The v3.18 and v3.19-rc5 lockups are
> > > > identical:
> > > >
> > > > INFO: task hpall.sh:1506 blocked for more than 120 seconds.
> > > > Not tainted 3.19.0-rc5 #9
> > > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > hpall.sh D 804a6ffc 0 1506 1497 0x00000000
> > > > [<804a6ffc>] (__schedule) from [<80022308>] (cpu_hotplug_begin+0xa0/0xac)
> > > > [<80022308>] (cpu_hotplug_begin) from [<8002236c>] (_cpu_up+0x24/0x180)
> > > > [<8002236c>] (_cpu_up) from [<8002253c>] (cpu_up+0x74/0x98)
> > > > [<8002253c>] (cpu_up) from [<802bce60>] (device_online+0x64/0x90)
> > > > [<802bce60>] (device_online) from [<802bcef4>] (online_store+0x68/0x74)
> > > > [<802bcef4>] (online_store) from [<8014059c>] (kernfs_fop_write+0xbc/0x1a0)
> > > > [<8014059c>] (kernfs_fop_write) from [<800e71b0>] (vfs_write+0xa0/0x1ac)
> > > > [<800e71b0>] (vfs_write) from [<800e7808>] (SyS_write+0x44/0x9c)
> > > > [<800e7808>] (SyS_write) from [<8000e560>] (ret_fast_syscall+0x0/0x48)
> > > > 7 locks held by hpall.sh/1506:
> > > > #0: (sb_writers#6){.+.+.+}, at: [<800e729c>] vfs_write+0x18c/0x1ac
> > > > #1: (&of->mutex){+.+.+.}, at: [<8014052c>] kernfs_fop_write+0x4c/0x1a0
> > > > #2: (s_active#15){.+.+.+}, at: [<80140534>] kernfs_fop_write+0x54/0x1a0
> > > > #3: (device_hotplug_lock){+.+.+.}, at: [<802bbe44>] lock_device_hotplug_sysfs+0xc/0x4c
> > > > #4: (&dev->mutex){......}, at: [<802bce14>] device_online+0x18/0x90
> > > > #5: (cpu_add_remove_lock){+.+.+.}, at: [<80022508>] cpu_up+0x40/0x98
> > > > #6: (cpu_hotplug.lock){++++++}, at: [<80022268>] cpu_hotplug_begin+0x0/0xac
> > > >
> > > > I guess that lockup is my fundamental issue, and with your patch the
> > > > perf_rwsem manages to spread a transitive dependency on one of those
> > > > locks all over the perf subsystem. I haven't considered that in great
> > > > detail, however.
> > >
> > > I found that I couldn't trigger the issue with v3.17, and I was able to
> > > bisect down to commit b2c4623dcd07af4b ("rcu: More on deadlock between
> > > CPU hotplug and expedited grace periods").
> > >
> > > I'm currently stressing b2c4623dcd07af4b~1 to make sure my bisect hasn't
> > > misled me.
> >
> > That seems to be solid, and I think I see what's going on.
> >
> > The task doing hotplug (hpall.sh:1506) gets to cpu_hotplug_begin(), and
> > sets cpu_hotplug.active_writer to current (I assume writes to this are
> > protected by cpu_add_remove_lock from cpu_up()?). Then it loops, acquiring
> > cpu_hotplug.lock and testing the refcount, and if non-zero dropping the
> > lock and going into uninterruptible sleep, expecting to be woken by
> > put_online_cpus().
> >
> > Concurrently a task holding the refcount non-zero calls
> > put_online_cpus(), and finds there to be contention on cpu_hotplug.lock.
> > Thus it increments cpu_hotplug.puts_pending and goes off on its merry
> > way, without trying to wake the writer.
> >
> > So the writer is never woken and never gets to handle the non-zero
> > cpu_hotplug.puts_pending.
> >
> > I'm not sure what the right fix for that is. It looks like the writer
> > could observe the change to puts_pending and so
> > cpu_hotplug.active_writer could change under our feet unless we hold
> > cpu_hotplug.lock. But holding that would reintroduce the deadlock
> > b2c4623dcd07af4b was trying to avoid.
> >
> > Any ideas?
>
> You need 87af9e7ff9d90 (hotplugcpu: Avoid deadlocks by waking active_writer),
> which is in -rcu at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
>
> With some luck, this will be in -tip soon, and hit mainline during
> the next merge window.

Thanks Paul, that fixes the issue for me.

Peter, with that fix applied in addition to your patch, I don't see the
CCI PMU code exploding around hotplug, even with event migration hacked
into the driver.

Thanks,
Mark.
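
For reference, the lost wakeup described in the quoted analysis can be
modelled in user space roughly as below. This is only a single-threaded
sketch of that description, not the kernel code: refcount, puts_pending
and the lock come from the text above, while struct hotplug_model, the
helper names and the wake_on_contention flag are hypothetical. The flag
merely illustrates the general idea of waking the writer; it is not taken
from commit 87af9e7ff9d90.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state discussed in the thread. */
struct hotplug_model {
	int  refcount;        /* outstanding get_online_cpus() references */
	int  puts_pending;    /* puts deferred because the lock was contended */
	bool lock_held;       /* stands in for cpu_hotplug.lock */
	bool writer_blocked;  /* writer marked itself asleep, nobody woke it */
};

/*
 * Writer side, one loop iteration: take the lock, fold in deferred puts,
 * test the refcount. Returns true if the writer may proceed. Otherwise the
 * writer marks itself as sleeping while it still holds the lock.
 */
static bool writer_lock_and_test(struct hotplug_model *m)
{
	m->lock_held = true;
	m->refcount -= m->puts_pending;
	m->puts_pending = 0;
	if (m->refcount == 0)
		return true;
	m->writer_blocked = true;
	return false;
}

static void writer_unlock(struct hotplug_model *m)
{
	m->lock_held = false;
}

/*
 * Reader side: if the lock is contended, defer the put. With
 * wake_on_contention == false the reader just walks away, as described
 * in the analysis above.
 */
static void reader_put(struct hotplug_model *m, bool wake_on_contention)
{
	assert(m->refcount > 0);
	if (m->lock_held) {
		m->puts_pending++;
		if (wake_on_contention)
			m->writer_blocked = false;  /* models a wakeup */
		return;
	}
	if (--m->refcount == 0)
		m->writer_blocked = false;
}

/* Replays the interleaving; returns true if the writer makes progress. */
bool simulate(bool wake_on_contention)
{
	struct hotplug_model m = { .refcount = 1 };  /* one reader outstanding */

	if (writer_lock_and_test(&m))
		return true;
	/* The last reader's put races with the writer still holding the lock. */
	reader_put(&m, wake_on_contention);
	writer_unlock(&m);

	if (m.writer_blocked)
		return false;  /* no reader is left to wake the writer */
	return writer_lock_and_test(&m);
}
```

In this model simulate(false) leaves the writer blocked with puts_pending
equal to refcount and nobody left to wake it, whereas simulate(true) lets
the writer loop once more, fold in the pending put and proceed.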