linux-trace-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Beau Belgrave <beaub@linux.microsoft.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: mhiramat@kernel.org, mathieu.desnoyers@efficios.com,
	dcook@linux.microsoft.com, alanau@linux.microsoft.com,
	brauner@kernel.org, akpm@linux-foundation.org,
	ebiederm@xmission.com, keescook@chromium.org, tglx@linutronix.de,
	linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v8 04/11] tracing/user_events: Fixup enable faults asyncly
Date: Tue, 28 Mar 2023 14:42:54 -0700	[thread overview]
Message-ID: <20230328214254.GA85@W11-BEAU-MD.localdomain> (raw)
In-Reply-To: <20230328172049.10061257@gandalf.local.home>

On Tue, Mar 28, 2023 at 05:20:49PM -0400, Steven Rostedt wrote:
> On Tue, 21 Feb 2023 13:11:36 -0800
> Beau Belgrave <beaub@linux.microsoft.com> wrote:
> 
> > @@ -263,7 +277,85 @@ static int user_event_mm_fault_in(struct user_event_mm *mm, unsigned long uaddr)
> >  }
> >  
> >  static int user_event_enabler_write(struct user_event_mm *mm,
> > -				    struct user_event_enabler *enabler)
> > +				    struct user_event_enabler *enabler,
> > +				    bool fixup_fault);
> > +
> > +static void user_event_enabler_fault_fixup(struct work_struct *work)
> > +{
> > +	struct user_event_enabler_fault *fault = container_of(
> > +		work, struct user_event_enabler_fault, work);
> > +	struct user_event_enabler *enabler = fault->enabler;
> > +	struct user_event_mm *mm = fault->mm;
> > +	unsigned long uaddr = enabler->addr;
> > +	int ret;
> > +
> > +	ret = user_event_mm_fault_in(mm, uaddr);
> > +
> > +	if (ret && ret != -ENOENT) {
> > +		struct user_event *user = enabler->event;
> > +
> > +		pr_warn("user_events: Fault for mm: 0x%pK @ 0x%llx event: %s\n",
> > +			mm->mm, (unsigned long long)uaddr, EVENT_NAME(user));
> > +	}
> > +
> > +	/* Prevent state changes from racing */
> > +	mutex_lock(&event_mutex);
> > +
> > +	/*
> > +	 * If we managed to get the page, re-issue the write. We do not
> > +	 * want to get into a possible infinite loop, which is why we only
> > +	 * attempt again directly if the page came in. If we couldn't get
> > +	 * the page here, then we will try again the next time the event is
> > +	 * enabled/disabled.
> > +	 */
> 
> What case would we not get the page? A bad page mapping? User space doing
> something silly?
> 

A user space program unmapping the page is the most common I can think
of. A silly action would be unmapping the page while forgetting to call
the unregister IOCTL. We would then possibly see this if the event was
enabled in perf/ftrace before the process exited (and the mm getting
cleaned up).

> Or something else, for which how can it go into an infinite loop? Can that
> only happen if userspace is doing something mischievous?
> 

I'm not sure if changing page permissions on the user side would prevent
write permitted mapping in the kernel, but I wanted to ensure if that
type of thing did occur, we wouldn't loop forever. The code lets the mm
decide if a page is ever coming in via fixup_user_fault() with 
FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE set.

My understanding is that fixup_user_fault() will retry to get the page
up until it's decided it's either capable of coming in or not. It will
do this since we pass the unlocked bool as a parameter. I used
fixup_user_fault() since it was created for the futex code to handle
this scenario better.

From what I gather, the fault in should only fail for these reasons:
#define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS |	\
			VM_FAULT_SIGSEGV | VM_FAULT_HWPOISON |	\
			VM_FAULT_HWPOISON_LARGE | VM_FAULT_FALLBACK)

If these are hit, I don't believe we want to retry as they aren't likely
to ever get corrected.

Thanks,
-Beau

> -- Steve
> 
> 
> > +	clear_bit(ENABLE_VAL_FAULTING_BIT, ENABLE_BITOPS(enabler));
> > +
> > +	if (!ret) {
> > +		mmap_read_lock(mm->mm);
> > +		user_event_enabler_write(mm, enabler, true);
> > +		mmap_read_unlock(mm->mm);
> > +	}
> > +
> > +	mutex_unlock(&event_mutex);
> > +
> > +	/* In all cases we no longer need the mm or fault */
> > +	user_event_mm_put(mm);
> > +	kmem_cache_free(fault_cache, fault);
> > +}
> > +
> > +static bool user_event_enabler_queue_fault(struct user_event_mm *mm,
> > +					   struct user_event_enabler *enabler)
> > +{
> > +	struct user_event_enabler_fault *fault;
> > +
> > +	fault = kmem_cache_zalloc(fault_cache, GFP_NOWAIT | __GFP_NOWARN);
> > +
> > +	if (!fault)
> > +		return false;
> > +
> > +	INIT_WORK(&fault->work, user_event_enabler_fault_fixup);
> > +	fault->mm = user_event_mm_get(mm);
> > +	fault->enabler = enabler;
> > +
> > +	/* Don't try to queue in again while we have a pending fault */
> > +	set_bit(ENABLE_VAL_FAULTING_BIT, ENABLE_BITOPS(enabler));
> > +
> > +	if (!schedule_work(&fault->work)) {
> > +		/* Allow another attempt later */
> > +		clear_bit(ENABLE_VAL_FAULTING_BIT, ENABLE_BITOPS(enabler));
> > +
> > +		user_event_mm_put(mm);
> > +		kmem_cache_free(fault_cache, fault);
> > +
> > +		return false;
> > +	}
> > +
> > +	return true;
> > +}
> > +

  reply	other threads:[~2023-03-28 21:43 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-21 21:11 [PATCH v8 00/11] tracing/user_events: Remote write ABI Beau Belgrave
2023-02-21 21:11 ` [PATCH v8 01/11] tracing/user_events: Split header into uapi and kernel Beau Belgrave
2023-02-21 21:11 ` [PATCH v8 02/11] tracing/user_events: Track fork/exec/exit for mm lifetime Beau Belgrave
2023-02-21 21:11 ` [PATCH v8 03/11] tracing/user_events: Use remote writes for event enablement Beau Belgrave
2023-03-28 20:59   ` Steven Rostedt
2023-02-21 21:11 ` [PATCH v8 04/11] tracing/user_events: Fixup enable faults asyncly Beau Belgrave
2023-03-28 21:20   ` Steven Rostedt
2023-03-28 21:42     ` Beau Belgrave [this message]
2023-02-21 21:11 ` [PATCH v8 05/11] tracing/user_events: Add ioctl for disabling addresses Beau Belgrave
2023-02-21 21:11 ` [PATCH v8 06/11] tracing/user_events: Update self-tests to write ABI Beau Belgrave
2023-02-21 21:11 ` [PATCH v8 07/11] tracing/user_events: Add ABI self-test Beau Belgrave
2023-02-21 21:11 ` [PATCH v8 08/11] tracing/user_events: Use write ABI in example Beau Belgrave
2023-02-21 21:11 ` [PATCH v8 09/11] tracing/user_events: Update documentation for ABI Beau Belgrave
2023-03-24  0:06   ` Masami Hiramatsu
2023-03-24 16:47     ` Beau Belgrave
2023-02-21 21:11 ` [PATCH v8 10/11] tracing/user_events: Charge event allocs to cgroups Beau Belgrave
2023-02-21 21:11 ` [PATCH v8 11/11] tracing/user_events: Limit global user_event count Beau Belgrave
2023-03-24  0:18   ` Masami Hiramatsu
2023-03-24  8:54     ` Vlastimil Babka
2023-03-24 16:43       ` Beau Belgrave
2023-03-24 17:06         ` Steven Rostedt
2023-03-26 15:22           ` Masami Hiramatsu
2023-03-24  0:05 ` [PATCH v8 00/11] tracing/user_events: Remote write ABI Masami Hiramatsu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230328214254.GA85@W11-BEAU-MD.localdomain \
    --to=beaub@linux.microsoft.com \
    --cc=akpm@linux-foundation.org \
    --cc=alanau@linux.microsoft.com \
    --cc=brauner@kernel.org \
    --cc=dcook@linux.microsoft.com \
    --cc=ebiederm@xmission.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-trace-devel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhiramat@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).