From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3CC7EC77B75 for ; Wed, 17 May 2023 19:36:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229895AbjEQTgR (ORCPT ); Wed, 17 May 2023 15:36:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54430 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229437AbjEQTgO (ORCPT ); Wed, 17 May 2023 15:36:14 -0400 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id ABC99109; Wed, 17 May 2023 12:36:13 -0700 (PDT) Received: from W11-BEAU-MD.localdomain (unknown [76.135.27.212]) by linux.microsoft.com (Postfix) with ESMTPSA id EA0EE20F26AB; Wed, 17 May 2023 12:36:12 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com EA0EE20F26AB DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1684352173; bh=kOczaw1TlAZYv+vsEch6YqBha9eh9KoqTxDoaMXzpow=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=D8WrzbOo/RSmXAZ7HQMSpMgjIuPWUIYbH1jZWa4bcrvm7nP6GxHL5qyXaW1CidIUH 7jr6Nv19tpmoP6j3q/D6ZsDUCoM9CyI+fTIIMeKKz1BgMNdMT4Wd78WBrxTwr3DWID k1HRFJ9da2FNue7CRgxI/EJ+8isbaJ1/y6XPeWRU= Date: Wed, 17 May 2023 12:36:02 -0700 From: Beau Belgrave To: Linus Torvalds Cc: Steven Rostedt , Alexei Starovoitov , Masami Hiramatsu , LKML , linux-trace-kernel@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , bpf , David Vernet , dthaler@microsoft.com, brauner@kernel.org, hch@infradead.org Subject: Re: [PATCH] tracing/user_events: Run BPF program if attached Message-ID: <20230517193549.GA401@W11-BEAU-MD.localdomain> References: <20230515192407.GA85@W11-BEAU-MD.localdomain> <20230517003628.aqqlvmzffj7fzzoj@MacBook-Pro-8.local> <20230516222919.79bba667@rorschach.local.home> <20230517172243.GA152@W11-BEAU-MD.localdomain> <20230517190750.GA366@W11-BEAU-MD.localdomain> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-trace-kernel@vger.kernel.org On Wed, May 17, 2023 at 12:26:44PM -0700, Linus Torvalds wrote: > On Wed, May 17, 2023 at 12:08 PM Beau Belgrave > wrote: > > > > user_event_mm_dup() puts a new mm into the global list before the > > enablers list is fully populated. > > Then that simply needs to be fixed. > Agreed. > user_event_mm_dup() should not madd the mm into the global list until > it is *done*. > > Because if it makes that list visible to others in a half-way state, > then it needs to use the proper locking and use event_mutex. > > You can't say "this is so critical that we can't take a lock" and then > use that as an excuse to simply do buggy code. > Didn't mean that, I just have more work to do then just the RCU walks. I will fix these up. > Either take the lock in user_event_mm_dup(), or make sure that the > data structures are all completely local so that no lock is necessary. > > Here's a COMPLETELY UNTESTED patch that just separates out the notion > of "allocate" and "attach". > > NOTE NOTE NOTE! I am *not* claiming this patch works. It builds for > me. It looks right. It seems like it's the right thing to do. But it > might have some issues. > > With this, the newly dup'ed list is attached to the process once after > it is done, so nobody can see the list being built up. > > Also note that this does NOT fix the incorrect RCU walks. > > Linus Thanks for the patch and feedback! -Beau > kernel/trace/trace_events_user.c | 29 ++++++++++++++++++----------- > 1 file changed, 18 insertions(+), 11 deletions(-) > > diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c > index b1ecd7677642..b2aecbfbbd24 100644 > --- a/kernel/trace/trace_events_user.c > +++ b/kernel/trace/trace_events_user.c > @@ -538,10 +538,9 @@ static struct user_event_mm *user_event_mm_get_all(struct user_event *user) > return found; > } > > -static struct user_event_mm *user_event_mm_create(struct task_struct *t) > +static struct user_event_mm *user_event_mm_alloc(struct task_struct *t) > { > struct user_event_mm *user_mm; > - unsigned long flags; > > user_mm = kzalloc(sizeof(*user_mm), GFP_KERNEL_ACCOUNT); > > @@ -553,12 +552,6 @@ static struct user_event_mm *user_event_mm_create(struct task_struct *t) > refcount_set(&user_mm->refcnt, 1); > refcount_set(&user_mm->tasks, 1); > > - spin_lock_irqsave(&user_event_mms_lock, flags); > - list_add_rcu(&user_mm->link, &user_event_mms); > - spin_unlock_irqrestore(&user_event_mms_lock, flags); > - > - t->user_event_mm = user_mm; > - > /* > * The lifetime of the memory descriptor can slightly outlast > * the task lifetime if a ref to the user_event_mm is taken > @@ -572,6 +565,17 @@ static struct user_event_mm *user_event_mm_create(struct task_struct *t) > return user_mm; > } > > +static void user_event_mm_attach(struct user_event_mm *user_mm, struct task_struct *t) > +{ > + unsigned long flags; > + > + spin_lock_irqsave(&user_event_mms_lock, flags); > + list_add_rcu(&user_mm->link, &user_event_mms); > + spin_unlock_irqrestore(&user_event_mms_lock, flags); > + > + t->user_event_mm = user_mm; > +} > + > static struct user_event_mm *current_user_event_mm(void) > { > struct user_event_mm *user_mm = current->user_event_mm; > @@ -579,10 +583,12 @@ static struct user_event_mm *current_user_event_mm(void) > if (user_mm) > goto inc; > > - user_mm = user_event_mm_create(current); > + user_mm = user_event_mm_alloc(current); > > if (!user_mm) > goto error; > + > + user_event_mm_attach(user_mm, current); > inc: > refcount_inc(&user_mm->refcnt); > error: > @@ -670,7 +676,7 @@ void user_event_mm_remove(struct task_struct *t) > > void user_event_mm_dup(struct task_struct *t, struct user_event_mm *old_mm) > { > - struct user_event_mm *mm = user_event_mm_create(t); > + struct user_event_mm *mm = user_event_mm_alloc(t); > struct user_event_enabler *enabler; > > if (!mm) > @@ -684,10 +690,11 @@ void user_event_mm_dup(struct task_struct *t, struct user_event_mm *old_mm) > > rcu_read_unlock(); > > + user_event_mm_attach(mm, t); > return; > error: > rcu_read_unlock(); > - user_event_mm_remove(t); > + user_event_mm_destroy(mm); > } > > static bool current_user_event_enabler_exists(unsigned long uaddr,