All of lore.kernel.org
 help / color / mirror / Atom feed
From: Beau Belgrave <beaub@linux.microsoft.com>
To: Masami Hiramatsu <mhiramat@kernel.org>
Cc: rostedt@goodmis.org, linux-trace-devel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Tzvetomir Stoyanov <tstoyanov@vmware.com>,
	"Paul E. McKenney" <paulmck@linux.ibm.com>,
	bpf@vger.kernel.org
Subject: Re: [PATCH v3] user_events: Enable user processes to create and write to trace events
Date: Wed, 27 Oct 2021 12:14:28 -0700	[thread overview]
Message-ID: <20211027191428.GA1462@kbox> (raw)
In-Reply-To: <20211026172602.55843a03c5a5ba049b567b5a@kernel.org>

On Tue, Oct 26, 2021 at 05:26:02PM +0900, Masami Hiramatsu wrote:
> > > > > > +	} else if (strstr(field, "flag ") == field) {
> > > > > > +		field += sizeof("flag");
> > > > > > +
> > > > > > +		if (!strcmp(field, "bpf_iter"))
> > > > > > +			user->flags |= FLAG_BPF_ITER;
> > > > > > +
> > > > > 
> > > > > What is this flag?
> > > > > 
> > > > We want to enable certain sensitive events the ability to mark that
> > > > there should never be a buffer copy. When FLAG_BPF_ITER is used the raw
> > > > iovecs are exposed out to eBPF instead of any sort of copy to reduce
> > > > latency. We run user_events in some highly performant code and want to
> > > > monitor things with the least amount of overhead possible.
> > > 
> > > Would you mean the event with this flag is only available from eBPF?
> > > 
> > It means that if eBPF attaches we will honor the users request to make
> > the data as cheap as possible to them. If a user with proper access
> > enables ftrace or perf on these high performant events they will still
> > come through (we don't want to hide them).
> > 
> > We will not be able to do that at all if we copy to heap or stack. At
> > that point we've lost the ability to delay copy/probing up until the
> > eBPF states it is actually required.
> 
> I think the bpf optimization should be discussed in the other thread.
> 

Yep

> Anyway, here I would like to know is that the syntax of this flag. 
> If the flag is for the user event itself, it would be better to add the flag
> with a special separator, not the "flag", so that user puts the flags
> after fieldN.
> 
> name[:FLAG1[,FLAG2...]] [field1[;field2...]] 
> 

Agreed, will do that.

> > > > I also ran with CONFIG_PROVE_RCU and didn't see anything show up in
> > > > dmesg.
> > > 
> > > Hmm, that's strange, because copy_from_iter(user) may cause a fault
> > > and yielded. Isn't it an iovec?
> > > 
> > Yeah, likely I just haven't hit a page fault case. I'll try to force one
> > in our testing to ensure this case is properly covered and doesn't cause
> > issues.
> 
> If you can suppress the fault (just skip copying when the fault occurs),
> I think it is OK. e.g. copy_from_user_nofault().
> 

We want to handle faults in these paths, which means handling them
outside of preemption disabled. This limits where we can have the buffer.

> > > > > > +				/*
> > > > > > +				 * Probes advance the iterator so we
> > > > > > +				 * need to have a copy for each probe.
> > > > > > +				 */
> > > > > > +				copy = *i;
> > > > > > +
> > > > > > +				probe_func = probe_func_ptr->func;
> > > > > > +				tpdata = probe_func_ptr->data;
> > > > > > +				probe_func(user, &copy, tpdata);
> > > > > 
> > > > > You seems to try to copy in from user space in each probe func, but
> > > > > please copy it here to the temporary buffer and pass it to the
> > > > > each probe function. Such performacne optimization can postpone.
> > > > > Start with simple implementation.
> > > > > 
> > > > Yes, this avoids double copying of the data in the normal paths. Moving
> > > > to a temp buffer only really changes 1 line in the probe functions
> > > > (copy_from_iter to copy_from_user).
> > > > 
> > > > If I were to create a temp buffer for simplicity I guess I would have to
> > > > kmalloc on each call or move to a per-cpu buffer or use stack memory and
> > > > limit how much data can be copied.
> > > 
> > > Anyway, it should be limited. You can not write more than 1 page, and
> > > do you really need it? And allocating kmalloc object is relatively low
> > > cost compared with a system call.
> > > 
> > Really, it's that low?
> > 
> > We are tracking cycles counts to compare user_events with other
> > telemetry data we have. Some people care a lot about that number, some
> > don't.
> 
> OK, then you can use a static per-cpu buffer for copying.
> 

I can only use static per-cpu buffers if preemption is disabled during
the copy. This limits to not being able to fault in data. For example
simple migration disabled could still see another user_event getting
traced on the same processor and corrupt / partial fill that per-CPU
buffer.

For the simple version I will use kmalloc and then we can talk on the
other threads about better ways to go about it.

Thanks,
-Beau

  reply	other threads:[~2021-10-27 19:14 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-18 23:09 [PATCH v3] user_events: Enable user processes to create and write to trace events Beau Belgrave
2021-10-22 13:38 ` Masami Hiramatsu
2021-10-22 22:42   ` Beau Belgrave
2021-10-25  1:40     ` Masami Hiramatsu
2021-10-25 17:26       ` Beau Belgrave
2021-10-26  8:26         ` Masami Hiramatsu
2021-10-27 19:14           ` Beau Belgrave [this message]
2021-10-26  0:46 ` kernel test robot
2021-10-26  0:46   ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211027191428.GA1462@kbox \
    --to=beaub@linux.microsoft.com \
    --cc=bpf@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-devel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=paulmck@linux.ibm.com \
    --cc=rostedt@goodmis.org \
    --cc=tstoyanov@vmware.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.