From: Mark Rutland <mark.rutland@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>, Vince Weaver <vince@deater.net>,
Ingo Molnar <mingo@redhat.com>, Andi Kleen <ak@linux.intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Jiri Kosina <jkosina@suse.cz>, Borislav Petkov <bp@suse.de>,
Will Deacon <will.deacon@arm.com>
Subject: Re: perf fuzzer crash [PATCH] perf: Get group events reference before moving the group
Date: Tue, 20 Jan 2015 13:39:47 +0000 [thread overview]
Message-ID: <20150120133947.GD15924@leverpostej> (raw)
In-Reply-To: <20150119174009.GI21553@leverpostej>
On Mon, Jan 19, 2015 at 05:40:09PM +0000, Mark Rutland wrote:
> On Mon, Jan 19, 2015 at 02:40:28PM +0000, Mark Rutland wrote:
> > On Fri, Jan 16, 2015 at 02:11:04PM +0000, Peter Zijlstra wrote:
> > > On Fri, Jan 16, 2015 at 11:46:44AM +0100, Peter Zijlstra wrote:
> > > > Its a bandaid at best :/ The problem is (again) that we changes
> > > > event->ctx without any kind of serialization.
> > > >
> > > > The issue came up before:
> > > >
> > > > https://lkml.org/lkml/2014/9/5/397
> >
> > In the end neither the CCI or CCN perf drivers migrate events on
> > hotplug, so ARM is currently safe from the perf_pmu_migrate_context
> > case, but I see that you fix the move_group handling too.
> >
> > I had a go at testing this by hacking migration back into the CCI PMU
> > driver (atop of v3.19-rc5), but I'm seeing lockups after a few minutes
> > with my original test case (https://lkml.org/lkml/2014/9/1/569 with
> > PMU_TYPE and PMU_EVENT fixed up).
> >
> > I unfortunately don't have a suitable x86 box spare to run that on.
> > Would someone be able to give it a spin on something with an uncore PMU?
> >
> > I'll go and dig a bit further. I may just be hitting another latent
> > issue on my board.
>
> I'm able to trigger the lockups even without both your patch and the
> call to perf_pmu_migrate_context, so there is a latent issue.
>
> On vanilla v3.19-rc5 and vanilla v3.18, I'm able to get my hotplug
> script hung when run concurrently with the test case against the CCI PMU
> driver (without migration). The v3.18 and v3.19-rc5 lockups are
> identical:
>
> INFO: task hpall.sh:1506 blocked for more than 120 seconds.
> Not tainted 3.19.0-rc5 #9
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> hpall.sh D 804a6ffc 0 1506 1497 0x00000000
> [<804a6ffc>] (__schedule) from [<80022308>] (cpu_hotplug_begin+0xa0/0xac)
> [<80022308>] (cpu_hotplug_begin) from [<8002236c>] (_cpu_up+0x24/0x180)
> [<8002236c>] (_cpu_up) from [<8002253c>] (cpu_up+0x74/0x98)
> [<8002253c>] (cpu_up) from [<802bce60>] (device_online+0x64/0x90)
> [<802bce60>] (device_online) from [<802bcef4>] (online_store+0x68/0x74)
> [<802bcef4>] (online_store) from [<8014059c>] (kernfs_fop_write+0xbc/0x1a0)
> [<8014059c>] (kernfs_fop_write) from [<800e71b0>] (vfs_write+0xa0/0x1ac)
> [<800e71b0>] (vfs_write) from [<800e7808>] (SyS_write+0x44/0x9c)
> [<800e7808>] (SyS_write) from [<8000e560>] (ret_fast_syscall+0x0/0x48)
> 7 locks held by hpall.sh/1506:
> #0: (sb_writers#6){.+.+.+}, at: [<800e729c>] vfs_write+0x18c/0x1ac
> #1: (&of->mutex){+.+.+.}, at: [<8014052c>] kernfs_fop_write+0x4c/0x1a0
> #2: (s_active#15){.+.+.+}, at: [<80140534>] kernfs_fop_write+0x54/0x1a0
> #3: (device_hotplug_lock){+.+.+.}, at: [<802bbe44>] lock_device_hotplug_sysfs+0xc/0x4c
> #4: (&dev->mutex){......}, at: [<802bce14>] device_online+0x18/0x90
> #5: (cpu_add_remove_lock){+.+.+.}, at: [<80022508>] cpu_up+0x40/0x98
> #6: (cpu_hotplug.lock){++++++}, at: [<80022268>] cpu_hotplug_begin+0x0/0xac
>
> I guess that lockup is my fundamental issue, and with your patch the
> perf_rwsem manages to spread a transitive dependency on one of those
> locks all over the perf subsystem. I haven't considered that in great
> detail, however.
I found that I couldn't trigger the issue with v3.17, and I was able to
bisect down to commit b2c4623dcd07af4b ("rcu: More on deadlock between
CPU hotplug and expedited grace periods").
I'm currently stressing b2c4623dcd07af4b~1 to make sure my bisect hasn't
mislead me.
Thanks,
Mark.
next prev parent reply other threads:[~2015-01-20 13:40 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-01-16 7:57 perf fuzzer crash [PATCH] perf: Get group events reference before moving the group Jiri Olsa
2015-01-16 10:46 ` Peter Zijlstra
2015-01-16 14:11 ` Peter Zijlstra
2015-01-16 18:54 ` Vince Weaver
2015-01-19 3:49 ` Vince Weaver
2015-01-18 14:13 ` Ingo Molnar
2015-01-19 14:40 ` Mark Rutland
2015-01-19 17:40 ` Mark Rutland
2015-01-20 13:39 ` Mark Rutland [this message]
2015-01-20 14:35 ` Mark Rutland
2015-01-21 1:00 ` Paul E. McKenney
2015-01-21 12:08 ` Mark Rutland
2015-01-21 20:07 ` Paul E. McKenney
-- strict thread matches above, loose matches on Subject: below --
2015-01-19 18:09 Vince Weaver
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150120133947.GD15924@leverpostej \
--to=mark.rutland@arm.com \
--cc=ak@linux.intel.com \
--cc=bp@suse.de \
--cc=jkosina@suse.cz \
--cc=jolsa@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=torvalds@linux-foundation.org \
--cc=vince@deater.net \
--cc=will.deacon@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.