From: Mark Rutland <mark.rutland@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Yan, Zheng" <zheng.z.yan@intel.com>,
	Stephane Eranian <eranian@google.com>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: Possible race between CPU hotplug and perf_pmu_migrate_context
Date: Tue, 2 Sep 2014 19:58:07 +0100
Message-ID: <20140902185807.GA7434@leverpostej>
In-Reply-To: <20140901190534.GC5806@worktop.ger.corp.intel.com>

On Mon, Sep 01, 2014 at 08:05:34PM +0100, Peter Zijlstra wrote:
> On Mon, Sep 01, 2014 at 07:18:08PM +0100, Mark Rutland wrote:
> > Hi all,
> 
> > [   66.780759]  [<ffffffff8109dd33>] rcu_process_callbacks+0x1e3/0x540
> 
> > Has anyone seen anything like this before? Is this a known issue?
> 
> I've not seen it reported.. sounds like 'fun' though.
> 

This has been a tremendous source of 'fun' so far...

The rcu_process_callbacks line is a red herring. What seems to be
happening is:

A CPU goes down, and perf_pmu_migrate_context removes all events from
per_cpu_ptr(pmu->pmu_cpu_context, src_cpu)->ctx. The events are in a
state of limbo, with their ctx pointers pointing at the old context,
whose refcount is 1. The src_ctx->mutex is unlocked.

Concurrently, on another CPU, the fds are closed, and perf_event_release
removes each event from its event->ctx. We skip the double detach in
list_del_event and carry on to __free_event, where we put_ctx the old
context a second time for each event. The refcount goes to 0 and we
queue a kfree_rcu of the context (which lives inside the PMU's percpu
perf_event_cpu_context, allocated with alloc_percpu).

We run the queued kfree_rcu, and explode trying to kfree something we
didn't k*alloc. I'm not sure when exactly we run the queued kfree_rcu
w.r.t. everything else.
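
To make the sequence above concrete, here is a minimal userspace sketch of
the refcounting as I understand it. It is a model, not kernel code: the
struct layouts, the `embedded` and `bad_free` flags, and the
`migrate_detach`, `release_event` and `simulate` helpers are all
hypothetical, and the real locking and RCU machinery is left out entirely.
It assumes the embedded context starts with a refcount of 1 and that each
event holds one further reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a perf event context; not the kernel's struct. */
struct ctx {
	int refcount;
	bool embedded;	/* lives inside a larger (percpu-like) allocation */
	bool bad_free;	/* set when the model would kfree() an embedded ctx */
};

/* Stands in for perf_event_cpu_context, which embeds its ctx. */
struct cpu_context {
	struct ctx ctx;
};

struct event {
	struct ctx *ctx;
	bool attached;
};

static void put_ctx(struct ctx *c)
{
	/* Dropping the last reference would queue a free of the ctx. */
	if (--c->refcount == 0 && c->embedded)
		c->bad_free = true;
}

/* Migration, first half: detach from the source and drop its reference.
 * event->ctx is left pointing at the source context. */
static void migrate_detach(struct event *e)
{
	e->attached = false;
	put_ctx(e->ctx);
}

/* Release path: the detach is skipped if already done, but the
 * reference on event->ctx is dropped unconditionally. */
static void release_event(struct event *e)
{
	if (e->attached)
		e->attached = false;
	put_ctx(e->ctx);
}

/* Returns true if an embedded context would have been freed. */
static bool simulate(int nr_events, bool close_mid_migration)
{
	struct cpu_context src = { .ctx = { .refcount = 1, .embedded = true } };
	struct cpu_context dst = { .ctx = { .refcount = 1, .embedded = true } };
	struct event ev[8];
	int i;

	assert(nr_events >= 1 && nr_events <= 8);

	for (i = 0; i < nr_events; i++) {
		ev[i].ctx = &src.ctx;
		ev[i].attached = true;
		src.ctx.refcount++;		/* each event holds a reference */
	}

	for (i = 0; i < nr_events; i++)
		migrate_detach(&ev[i]);		/* src refcount is back to 1 */

	if (!close_mid_migration) {
		/* Migration, second half: attach to the destination. */
		for (i = 0; i < nr_events; i++) {
			ev[i].ctx = &dst.ctx;
			ev[i].attached = true;
			dst.ctx.refcount++;
		}
	}

	for (i = 0; i < nr_events; i++)
		release_event(&ev[i]);		/* the fds are closed */

	return src.ctx.bad_free || dst.ctx.bad_free;
}
```

In this model simulate(2, true) returns true (the first release drops the
source context's refcount from 1 to 0), while simulate(2, false) returns
false because each event's put is balanced against the destination
context.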

So the problem here seems to be a race between
perf_pmu_migrate_context and something down the perf_event_release
callchain.

Any ideas?

Thanks,
Mark.
