Re: perf fuzzer crash [PATCH] perf: Get group events reference before moving the group

From: Mark Rutland <mark.rutland@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>, Vince Weaver <vince@deater.net>,
	Ingo Molnar <mingo@redhat.com>, Andi Kleen <ak@linux.intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Jiri Kosina <jkosina@suse.cz>, Borislav Petkov <bp@suse.de>,
	Will Deacon <will.deacon@arm.com>
Subject: Re: perf fuzzer crash [PATCH] perf: Get group events reference before moving the group
Date: Tue, 20 Jan 2015 13:39:47 +0000
Message-ID: <20150120133947.GD15924@leverpostej>
In-Reply-To: <20150119174009.GI21553@leverpostej>

On Mon, Jan 19, 2015 at 05:40:09PM +0000, Mark Rutland wrote:
> On Mon, Jan 19, 2015 at 02:40:28PM +0000, Mark Rutland wrote:
> > On Fri, Jan 16, 2015 at 02:11:04PM +0000, Peter Zijlstra wrote:
> > > On Fri, Jan 16, 2015 at 11:46:44AM +0100, Peter Zijlstra wrote:
> > > > It's a bandaid at best :/ The problem is (again) that we change
> > > > event->ctx without any kind of serialization.
> > > >
> > > > The issue came up before:
> > > >
> > > >   https://lkml.org/lkml/2014/9/5/397
> > 
> > In the end neither the CCI nor CCN perf drivers migrate events on
> > hotplug, so ARM is currently safe from the perf_pmu_migrate_context
> > case, but I see that you fix the move_group handling too.
> > 
> > I had a go at testing this by hacking migration back into the CCI PMU
> > driver (atop of v3.19-rc5), but I'm seeing lockups after a few minutes
> > with my original test case (https://lkml.org/lkml/2014/9/1/569 with
> > PMU_TYPE and PMU_EVENT fixed up).
> > 
> > I unfortunately don't have a suitable x86 box spare to run that on.
> > Would someone be able to give it a spin on something with an uncore PMU?
> > 
> > I'll go and dig a bit further. I may just be hitting another latent
> > issue on my board.
> 
> I'm able to trigger the lockups even without both your patch and the
> call to perf_pmu_migrate_context, so there is a latent issue.
> 
> On vanilla v3.19-rc5 and vanilla v3.18, I'm able to get my hotplug
> script hung when run concurrently with the test case against the CCI PMU
> driver (without migration). The v3.18 and v3.19-rc5 lockups are
> identical:
> 
> INFO: task hpall.sh:1506 blocked for more than 120 seconds.
>       Not tainted 3.19.0-rc5 #9
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> hpall.sh        D 804a6ffc     0  1506   1497 0x00000000
> [<804a6ffc>] (__schedule) from [<80022308>] (cpu_hotplug_begin+0xa0/0xac)
> [<80022308>] (cpu_hotplug_begin) from [<8002236c>] (_cpu_up+0x24/0x180)
> [<8002236c>] (_cpu_up) from [<8002253c>] (cpu_up+0x74/0x98)
> [<8002253c>] (cpu_up) from [<802bce60>] (device_online+0x64/0x90)
> [<802bce60>] (device_online) from [<802bcef4>] (online_store+0x68/0x74)
> [<802bcef4>] (online_store) from [<8014059c>] (kernfs_fop_write+0xbc/0x1a0)
> [<8014059c>] (kernfs_fop_write) from [<800e71b0>] (vfs_write+0xa0/0x1ac)
> [<800e71b0>] (vfs_write) from [<800e7808>] (SyS_write+0x44/0x9c)
> [<800e7808>] (SyS_write) from [<8000e560>] (ret_fast_syscall+0x0/0x48)
> 7 locks held by hpall.sh/1506:
>  #0:  (sb_writers#6){.+.+.+}, at: [<800e729c>] vfs_write+0x18c/0x1ac
>  #1:  (&of->mutex){+.+.+.}, at: [<8014052c>] kernfs_fop_write+0x4c/0x1a0
>  #2:  (s_active#15){.+.+.+}, at: [<80140534>] kernfs_fop_write+0x54/0x1a0
>  #3:  (device_hotplug_lock){+.+.+.}, at: [<802bbe44>] lock_device_hotplug_sysfs+0xc/0x4c
>  #4:  (&dev->mutex){......}, at: [<802bce14>] device_online+0x18/0x90
>  #5:  (cpu_add_remove_lock){+.+.+.}, at: [<80022508>] cpu_up+0x40/0x98
>  #6:  (cpu_hotplug.lock){++++++}, at: [<80022268>] cpu_hotplug_begin+0x0/0xac
> 
> I guess that lockup is my fundamental issue, and with your patch the
> perf_rwsem manages to spread a transitive dependency on one of those
> locks all over the perf subsystem. I haven't considered that in great
> detail, however.

I found that I couldn't trigger the issue with v3.17, and I was able to
bisect down to commit b2c4623dcd07af4b ("rcu: More on deadlock between
CPU hotplug and expedited grace periods").

I'm currently stressing b2c4623dcd07af4b~1 to make sure my bisect hasn't
misled me.
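
For reference, a CPU hotplug stress loop of the kind mentioned above could
look roughly like the following sketch. It is not the hpall.sh script from
the report, whose contents are not shown here; the function names and the
CPU_ROOT override are assumptions made for illustration, and only the
standard /sys/devices/system/cpu/cpuN/online interface is relied upon.

```shell
#!/bin/sh
# Hypothetical CPU hotplug stress loop (NOT the hpall.sh from this thread).
# CPU_ROOT is an illustrative override so the loop can be dry-run against
# a scratch directory instead of real sysfs.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# Write the given state (0 = offline, 1 = online) to the "online" file of
# every CPU that has one. CPUs without an "online" file are skipped.
set_all_cpus() {
    state="$1"
    for f in "$CPU_ROOT"/cpu[0-9]*/online; do
        [ -w "$f" ] || continue
        # A write can fail (e.g. offlining the last online CPU); carry on.
        echo "$state" > "$f" 2>/dev/null || true
    done
}

# Take all hotpluggable CPUs offline and back online for N rounds.
hotplug_stress() {
    rounds="$1"
    i=0
    while [ "$i" -lt "$rounds" ]; do
        set_all_cpus 0
        set_all_cpus 1
        i=$((i + 1))
    done
}
```

Run as root, for example `hotplug_stress 100`, alongside the perf test case.
Pointing CPU_ROOT at a scratch directory gives a harmless dry run.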

Thanks,
Mark.
