Date: Thu, 7 Jan 2016 08:46:25 -0800
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Eric Dumazet
Cc: Peter Zijlstra, syzkaller, LKML, acme@redhat.com, Linus Torvalds,
	Alexander Potapenko, Kostya Serebryany, Dmitry Vyukov, Sasha Levin,
	"H. Peter Anvin", jolsa@redhat.com, Ingo Molnar, vincent.weaver@maine.edu,
	Thomas Gleixner, acme@kernel.org, Stephane Eranian,
	linux-tip-commits@vger.kernel.org
Subject: Re: [tip:perf/core] perf: Fix race in perf_event_exec()
Message-ID: <20160107164625.GQ3818@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20151210195740.GG6357@twins.programming.kicks-ass.net>
 <20160107134008.GZ6344@twins.programming.kicks-ass.net>
 <20160107162604.GP3818@linux.vnet.ibm.com>

On Thu, Jan 07, 2016 at 11:36:21AM -0500, Eric Dumazet wrote:
> On Thu, Jan 7, 2016 at 11:26 AM, Paul E. McKenney
> wrote:
> > On Thu, Jan 07, 2016 at 02:40:08PM +0100, Peter Zijlstra wrote:
> >> On Wed, Jan 06, 2016 at 01:56:56PM -0500, Eric Dumazet wrote:
> >> > On Wed, Jan 6, 2016 at 1:46 PM, tip-bot for Peter Zijlstra
> >> > wrote:
> >> >
> >> > > This is because context switches can swap the task_struct::perf_event_ctxp[]
> >> > > pointer around. Therefore you have to either disable preemption when looking
> >> > > at current, or hold ctx->lock.
> >> > >
> >> >
> >> > >
> >> > >  void perf_event_exec(void)
> >> > >  {
> >> > > -	struct perf_event_context *ctx;
> >> > >  	int ctxn;
> >> > >
> >> > >  	rcu_read_lock();
> >> >
> >> > Do we still need this rcu_read_lock(), if perf_event_enable_on_exec()
> >> > uses local_irq_save() ?
> >>
> >> Strictly speaking we should not rely on the fact that RCU grace periods
> >> do not progress with IRQs disabled, so yes.
> >
> > What Peter said!
> >
> > The current implementation would let you get away with IRQs disabled
> > (aside from lockdep splats), but it is easy to imagine implementations
> > that do not, hence the restriction.
> >
> > Is the extra rcu_read_lock() and rcu_read_unlock() causing a performance
> > problem?  If so, one option would be use of synchronize_sched() and
> > call_rcu_sched(), which explicitly recognize disabled IRQs as read-side
> > critical sections.  Of course, this might well put you in a position
> > where you would like to use IRQ disabling in some cases, BH disabling
> > (or softirq context) in others, and rcu_read_lock() in yet other cases.
>
> I guess I could not see why the rcu_read_lock() was needed around this loop.
>
>         rcu_read_lock();
>         for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
>                 perf_event_enable_on_exec(ctxn);
>         rcu_read_unlock();
>
> But apparently it is needed in perf_event_enable_on_exec() and we avoid calling
> rcu_read_lock() and rcu_read_unlock() (perf_nr_task_contexts - 1) times

If the overhead of rcu_read_lock() and rcu_read_unlock() really is a
noticeable fraction of the loop overhead, and if perf_nr_task_contexts
is large, then you might well have a problem.  I still suggest actually
measuring it, but you know me.  ;-)

> > If so, the good news is that RCU actually now supports this efficiently.
> > There is the shiny new synchronize_rcu_mult() macro that can be used
> > as follows to wait for all three types of grace periods:
> >
> >         synchronize_rcu_mult(call_rcu, call_rcu_bh, call_rcu_sched);
> >
> > This would wait concurrently for all three grace periods to elapse.
>
> Interesting ;)

;-)  ;-)  ;-)

							Thanx, Paul
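
[Editor's note: for readers skimming the archive, here is a minimal sketch
of the pattern the thread is discussing.  It is not the actual
kernel/events/core.c code; the helper name perf_event_enable_on_exec() and
the loop bound perf_nr_task_contexts are taken from the quoted hunk above,
and the helper is assumed to dereference current->perf_event_ctxp[ctxn]
and therefore to rely on the caller holding the RCU read lock.]

        #include <linux/rcupdate.h>
        #include <linux/perf_event.h>

        void perf_event_exec(void)
        {
                int ctxn;

                /*
                 * One read-side critical section covers every context,
                 * instead of entering and exiting RCU
                 * perf_nr_task_contexts times.  Note that local_irq_save()
                 * inside the helper is not, by itself, guaranteed to act
                 * as an RCU read-side critical section, hence the explicit
                 * rcu_read_lock() here.
                 */
                rcu_read_lock();
                for (ctxn = 0; ctxn < perf_nr_task_contexts; ctxn++)
                        perf_event_enable_on_exec(ctxn);
                rcu_read_unlock();
        }

If the read-side cost still mattered, the alternative raised in the thread
would be to rely on IRQ disabling and switch the update side to
synchronize_sched()/call_rcu_sched(), or to synchronize_rcu_mult() when
several grace-period types must be waited for at once.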