bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Beau Belgrave <beaub@linux.microsoft.com>
To: Masami Hiramatsu <mhiramat@kernel.org>
Cc: rostedt@goodmis.org, linux-trace-devel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Tzvetomir Stoyanov <tstoyanov@vmware.com>,
	"Paul E. McKenney" <paulmck@linux.ibm.com>,
	bpf@vger.kernel.org
Subject: Re: [PATCH v3] user_events: Enable user processes to create and write to trace events
Date: Wed, 27 Oct 2021 12:14:28 -0700	[thread overview]
Message-ID: <20211027191428.GA1462@kbox> (raw)
In-Reply-To: <20211026172602.55843a03c5a5ba049b567b5a@kernel.org>

On Tue, Oct 26, 2021 at 05:26:02PM +0900, Masami Hiramatsu wrote:
> > > > > > +	} else if (strstr(field, "flag ") == field) {
> > > > > > +		field += sizeof("flag");
> > > > > > +
> > > > > > +		if (!strcmp(field, "bpf_iter"))
> > > > > > +			user->flags |= FLAG_BPF_ITER;
> > > > > > +
> > > > > 
> > > > > What is this flag?
> > > > > 
> > > > We want to enable certain sensitive events the ability to mark that
> > > > there should never be a buffer copy. When FLAG_BPF_ITER is used the raw
> > > > iovecs are exposed out to eBPF instead of any sort of copy to reduce
> > > > latency. We run user_events in some highly performant code and want to
> > > > monitor things with the least amount of overhead possible.
> > > 
> > > Would you mean the event with this flag is only available from eBPF?
> > > 
> > It means that if eBPF attaches we will honor the users request to make
> > the data as cheap as possible to them. If a user with proper access
> > enables ftrace or perf on these high performant events they will still
> > come through (we don't want to hide them).
> > 
> > We will not be able to do that at all if we copy to heap or stack. At
> > that point we've lost the ability to delay copy/probing up until the
> > eBPF states it is actually required.
> 
> I think the bpf optimization should be discussed in the other thread.
> 

Yep

> Anyway, here I would like to know is that the syntax of this flag. 
> If the flag is for the user event itself, it would be better to add the flag
> with a special separator, not the "flag", so that user puts the flags
> after fieldN.
> 
> name[:FLAG1[,FLAG2...]] [field1[;field2...]] 
> 

Agreed, will do that.

> > > > I also ran with CONFIG_PROVE_RCU and didn't see anything show up in
> > > > dmesg.
> > > 
> > > Hmm, that's strange, because copy_from_iter(user) may cause a fault
> > > and yielded. Isn't it an iovec?
> > > 
> > Yeah, likely I just haven't hit a page fault case. I'll try to force one
> > in our testing to ensure this case is properly covered and doesn't cause
> > issues.
> 
> If you can suppress the fault (just skip copying when the fault occurs),
> I think it is OK. e.g. copy_from_user_nofault().
> 

We want to handle faults in these paths, which means handling them
outside of preemption disabled. This limits where we can have the buffer.

> > > > > > +				/*
> > > > > > +				 * Probes advance the iterator so we
> > > > > > +				 * need to have a copy for each probe.
> > > > > > +				 */
> > > > > > +				copy = *i;
> > > > > > +
> > > > > > +				probe_func = probe_func_ptr->func;
> > > > > > +				tpdata = probe_func_ptr->data;
> > > > > > +				probe_func(user, &copy, tpdata);
> > > > > 
> > > > > You seems to try to copy in from user space in each probe func, but
> > > > > please copy it here to the temporary buffer and pass it to the
> > > > > each probe function. Such performacne optimization can postpone.
> > > > > Start with simple implementation.
> > > > > 
> > > > Yes, this avoids double copying of the data in the normal paths. Moving
> > > > to a temp buffer only really changes 1 line in the probe functions
> > > > (copy_from_iter to copy_from_user).
> > > > 
> > > > If I were to create a temp buffer for simplicity I guess I would have to
> > > > kmalloc on each call or move to a per-cpu buffer or use stack memory and
> > > > limit how much data can be copied.
> > > 
> > > Anyway, it should be limited. You can not write more than 1 page, and
> > > do you really need it? And allocating kmalloc object is relatively low
> > > cost compared with a system call.
> > > 
> > Really, it's that low?
> > 
> > We are tracking cycles counts to compare user_events with other
> > telemetry data we have. Some people care a lot about that number, some
> > don't.
> 
> OK, then you can use a static per-cpu buffer for copying.
> 

I can only use static per-cpu buffers if preemption is disabled during
the copy. This limits to not being able to fault in data. For example
simple migration disabled could still see another user_event getting
traced on the same processor and corrupt / partial fill that per-CPU
buffer.

For the simple version I will use kmalloc and then we can talk on the
other threads about better ways to go about it.

Thanks,
-Beau

      reply	other threads:[~2021-10-27 19:14 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20211018230957.3032-1-beaub@linux.microsoft.com>
     [not found] ` <20211022223811.d0b5f03a7eee147c619d0202@kernel.org>
     [not found]   ` <20211022224202.GA27683@kbox>
     [not found]     ` <20211025104006.a322e4a5b4a56cdf3552ebac@kernel.org>
     [not found]       ` <20211025172655.GA27927@kbox>
2021-10-26  8:26         ` [PATCH v3] user_events: Enable user processes to create and write to trace events Masami Hiramatsu
2021-10-27 19:14           ` Beau Belgrave [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211027191428.GA1462@kbox \
    --to=beaub@linux.microsoft.com \
    --cc=bpf@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-devel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=paulmck@linux.ibm.com \
    --cc=rostedt@goodmis.org \
    --cc=tstoyanov@vmware.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).