From: Mark Rutland <mark.rutland@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Yan, Zheng" <zheng.z.yan@intel.com>,
	Stephane Eranian <eranian@google.com>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: Possible race between CPU hotplug and perf_pmu_migrate_context
Date: Tue, 2 Sep 2014 19:58:07 +0100
Message-ID: <20140902185807.GA7434@leverpostej>
In-Reply-To: <20140901190534.GC5806@worktop.ger.corp.intel.com>

On Mon, Sep 01, 2014 at 08:05:34PM +0100, Peter Zijlstra wrote:
> On Mon, Sep 01, 2014 at 07:18:08PM +0100, Mark Rutland wrote:
> > Hi all,
> 
> > [   66.780759]  [<ffffffff8109dd33>] rcu_process_callbacks+0x1e3/0x540
> 
> > Has anyone seen anything like this before? Is this a known issue?
> 
> I've not seen it reported.. sounds like 'fun' though.
> 

This has been a tremendous source of 'fun' so far...

The rcu_process_callbacks line is a red herring. What seems to be
happening is:

A CPU goes down, and perf_pmu_migrate_context removes all events from
per_cpu_ptr(pmu->pmu_cpu_context, src_cpu)->ctx. The events are in a
state of limbo, with their ctx pointers pointing at the old context,
whose refcount is 1. The src_ctx->mutex is unlocked.

Concurrently, on another CPU, the fds are closed, and perf_event_release
removes each event from its event->ctx. We skip the double detach in
list_del_event and carry on to __free_event, where we put_ctx the old
context a second time for each event. The refcount drops to 0 and we
queue a kfree_rcu of the context (which lives inside the PMU's percpu
perf_cpu_context, allocated with alloc_percpu).

We run the queued kfree_rcu, and explode trying to kfree something we
didn't k*alloc. I'm not sure when exactly we run the queued kfree_rcu
w.r.t. everything else.
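
To make the refcount accounting concrete, here is a rough userspace model
of the sequence above. It is only a sketch: every model_* name is made up
for illustration, there is no real concurrency (the racy interleaving is
simply replayed in order), and the bad_free flag stands in for the
kfree_rcu of memory that came from alloc_percpu.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; the model_* names are hypothetical, not kernel API. */
struct model_ctx {
	int  refcount;   /* starts at 1: the reference held by the percpu owner */
	bool kmalloced;  /* false for a context embedded in percpu memory */
	bool bad_free;   /* set when the last put would free non-kmalloc memory */
};

struct model_event {
	struct model_ctx *ctx;  /* keeps pointing at the old ctx during migration */
	bool attached;
};

static void model_get_ctx(struct model_ctx *ctx)
{
	ctx->refcount++;
}

static void model_put_ctx(struct model_ctx *ctx)
{
	assert(ctx->refcount > 0);
	if (--ctx->refcount == 0 && !ctx->kmalloced)
		ctx->bad_free = true;  /* stands in for kfree_rcu() of percpu memory */
}

static void model_attach(struct model_event *ev, struct model_ctx *ctx)
{
	model_get_ctx(ctx);
	ev->ctx = ctx;
	ev->attached = true;
}

/* First half of migration: detach the event and drop its reference. */
static void model_migrate_detach(struct model_event *ev)
{
	ev->attached = false;
	model_put_ctx(ev->ctx);  /* ev->ctx is left pointing at the old ctx */
}

/* Release path: detach if still attached, then drop the event's reference. */
static void model_release(struct model_event *ev)
{
	if (ev->attached)        /* skipped if migration already detached us */
		ev->attached = false;
	model_put_ctx(ev->ctx);  /* a second put if migration already did one */
}

/* Replays one interleaving; returns true if it ends in the bad free. */
static bool model_run(bool migrate_first)
{
	struct model_ctx src = { .refcount = 1, .kmalloced = false, .bad_free = false };
	struct model_event ev = { 0 };

	model_attach(&ev, &src);           /* refcount 1 -> 2 */
	if (migrate_first)
		model_migrate_detach(&ev); /* refcount 2 -> 1, event in limbo */
	model_release(&ev);                /* 1 -> 0 in the racy case, else 2 -> 1 */
	return src.bad_free;
}
```

In this model, model_run(true) replays the migrate-then-release ordering
and ends with the embedded context being "freed", while model_run(false)
(release without a migration in flight) leaves the owner's reference
intact.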

So the problem seems to be a race between perf_pmu_migrate_context and
something further down the perf_event_release callchain.

Any ideas?

Thanks,
Mark.
