Date: Thu, 7 Jan 2016 08:46:25 -0800
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Eric Dumazet
Cc: Peter Zijlstra, syzkaller, LKML, acme@redhat.com, Linus Torvalds,
	Alexander Potapenko, Kostya Serebryany, Dmitry Vyukov, Sasha Levin,
	"H. Peter Anvin", jolsa@redhat.com, Ingo Molnar, vincent.weaver@maine.edu,
	Thomas Gleixner, acme@kernel.org, Stephane Eranian,
	linux-tip-commits@vger.kernel.org
Subject: Re: [tip:perf/core] perf: Fix race in perf_event_exec()
Message-ID: <20160107164625.GQ3818@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20151210195740.GG6357@twins.programming.kicks-ass.net>
 <20160107134008.GZ6344@twins.programming.kicks-ass.net>
 <20160107162604.GP3818@linux.vnet.ibm.com>

On Thu, Jan 07, 2016 at 11:36:21AM -0500, Eric Dumazet wrote:
> On Thu, Jan 7, 2016 at 11:26 AM, Paul E. McKenney
> wrote:
> > On Thu, Jan 07, 2016 at 02:40:08PM +0100, Peter Zijlstra wrote:
> >> On Wed, Jan 06, 2016 at 01:56:56PM -0500, Eric Dumazet wrote:
> >> > On Wed, Jan 6, 2016 at 1:46 PM, tip-bot for Peter Zijlstra
> >> > wrote:
> >> >
> >> > > This is because context switches can swap the task_struct::perf_event_ctxp[]
> >> > > pointer around. Therefore you have to either disable preemption when looking
> >> > > at current, or hold ctx->lock.
> >> > >
> >> >
> >> > >
> >> > >  void perf_event_exec(void)
> >> > >  {
> >> > > -	struct perf_event_context *ctx;
> >> > >  	int ctxn;
> >> > >
> >> > >  	rcu_read_lock();
> >> >
> >> > Do we still need this rcu_read_lock(), if perf_event_enable_on_exec()
> >> > uses local_irq_save() ?
> >>
> >> Strictly speaking we should not rely on the fact that RCU grace periods
> >> do not progress with IRQs disabled, so yes.
> >
> > What Peter said!
> >
> > The current implementation would let you get away with IRQs disabled
> > (aside from lockdep splats), but it is easy to imagine implementations
> > that do not, hence the restriction.
> >
> > Is the extra rcu_read_lock() and rcu_read_unlock() causing a performance
> > problem?  If so, one option would be use of synchronize_sched() and
> > call_rcu_sched(), which explicitly recognize disabled IRQs as read-side
> > critical sections.  Of course, this might well put you in a position
> > where you would like to use IRQ disabling in some cases, BH disabling
> > (or softirq context) in others, and rcu_read_lock() in yet other cases.
>
> I guess I could not see why the rcu_read_lock() was needed around this loop.
>
>         rcu_read_lock();
>         for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
>                 perf_event_enable_on_exec(ctxn);
>         rcu_read_unlock();
>
> But apparently it is needed in perf_event_enable_on_exec() and we avoid calling
> rcu_read_lock() and rcu_read_unlock() (perf_nr_task_contexts - 1) times

If the overhead of rcu_read_lock() and rcu_read_unlock() really is a
noticeable fraction of the loop overhead, and if perf_nr_task_contexts
is large, then you might well have a problem.  I still suggest actually
measuring it, but you know me.  ;-)

> > If so, the good news is that RCU actually now supports this efficiently.
> > There is the shiny new synchronize_rcu_mult() macro that can be used
> > as follows to wait for all three types of grace periods:
> >
> >         synchronize_rcu_mult(call_rcu, call_rcu_bh, call_rcu_sched);
> >
> > This would wait concurrently for all three grace periods to elapse.
>
> Interesting ;)

;-)  ;-)  ;-)

							Thanx, Paul
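
[Editor's note: for readers skimming the archive, here is a minimal sketch
of the pattern the thread is discussing.  It is not the actual
kernel/events/core.c code; the helper name perf_event_enable_on_exec() and
the loop bound perf_nr_task_contexts are taken from the quoted hunk above,
and the helper is assumed to dereference current->perf_event_ctxp[ctxn]
and therefore to rely on the caller holding the RCU read lock.]

        #include <linux/rcupdate.h>
        #include <linux/perf_event.h>

        void perf_event_exec(void)
        {
                int ctxn;

                /*
                 * One read-side critical section covers every context,
                 * instead of entering and exiting RCU
                 * perf_nr_task_contexts times.  Note that local_irq_save()
                 * inside the helper is not, by itself, guaranteed to act
                 * as an RCU read-side critical section, hence the explicit
                 * rcu_read_lock() here.
                 */
                rcu_read_lock();
                for (ctxn = 0; ctxn < perf_nr_task_contexts; ctxn++)
                        perf_event_enable_on_exec(ctxn);
                rcu_read_unlock();
        }

If the read-side cost still mattered, the alternative raised in the thread
would be to rely on IRQ disabling and switch the update side to
synchronize_sched()/call_rcu_sched(), or to synchronize_rcu_mult() when
several grace-period types must be waited for at once.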