Re: [PATCH] tracing/user_events: Run BPF program if attached

linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
To: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Christian Brauner <brauner@kernel.org>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-trace-kernel@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>, bpf <bpf@vger.kernel.org>,
	David Vernet <void@manifault.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Dave Thaler <dthaler@microsoft.com>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH] tracing/user_events: Run BPF program if attached
Date: Tue, 6 Jun 2023 22:37:52 +0900	[thread overview]
Message-ID: <20230606223752.65dd725c04b11346b45e0546@kernel.org> (raw)
In-Reply-To: <20230601162921.GA152@W11-BEAU-MD.localdomain>

Hi Beau,

On Thu, 1 Jun 2023 09:29:21 -0700
Beau Belgrave <beaub@linux.microsoft.com> wrote:

> > > These are stubs to integrate namespace support. I've been working on a
> > > series that adds a tracing namespace support similiar to the IMA
> > > namespace work [1]. That series is ending up taking more time than I
> > 
> > Look, this is all well and nice but you've integrated user events with
> > tracefs. This is currently a single-instance global filesystem. So what
> > you're effectively implying is that you're namespacing tracefs by
> > hanging it off of struct user namespace making it mountable by
> > unprivileged users. Or what's the plan?
> > 
> 
> We don't have plans for unprivileged users currently. I think that is a
> great goal and requires a proper tracing namespace, which we currently
> don't have. I've done some thinking on this, but I would like to hear
> your thoughts and others on how to do this properly. We do talk about
> this in the tracefs meetings (those might be out of your time zone
> unfortunately).
> 
> > That alone is massive work with _wild_ security implications. My
> > appetite for exposing more stuff under user namespaces is very low given
> > the amount of CVEs we've had over the years.
> > 
> 
> Ok, I based that approach on the feedback given in LPC 2022 - Containers
> and Checkpoint/Retore MC [1]. I believe you gave feedback to use user
> namespaces to provide the encapsulation that was required :)

Even with the user namespace, I think we still need to provide separate
"eventname-space" for each application, since it may depend on the context
who and where it is launched. I think the easiest solution is (perhaps)
providing a PID-based new groups for each instance (the PID-prefix or 
suffix will be hidden from the application).
I think it may not good to allow unprivileged user processes to detect
the registered event name each other by default.

> 
> > > anticipated.
> > 
> > Yet you were confident enough to leave the namespacing stubs for this
> > functionality in the code. ;)
> > 
> > What is the overall goal here? Letting arbitrary unprivileged containers
> > define their own custom user event type by mounting tracefs inside
> > unprivileged containers? If so, what security story is going to
> > guarantee that writing arbitrary tracepoints from random unprivileged
> > containers is safe?
> > 
> 
> Unprivileged containers is not a goal, however, having a per-pod
> user_event system name, such as user_event_<pod_name>, would be ideal
> for certain diagnostic scenarios, such as monitoring the entire pod.

That can be done in the user-space tools, not in the kernel.

> When you have a lot of containers, you also want to limit how many
> tracepoints each container can create, even if they are given access to
> the tracefs file. The per-group can limit how many events/tracepoints
> that container can go create, since we currently only have 16-bit
> identifiers for trace_event's we need to be cautious we don't run out.

I agree, we need to have a knob to limit it to avoid DoS attack.

> user_events in general has tracepoint validators to ensure the payloads
> coming in are "safe" from what the kernel might do with them, such as
> filtering out data.

[...]
> > > changing the system name of user_events on a per-namespace basis.
> > 
> > What is the "system name" and how does it protect against namespaces
> > messing with each other?
> 
> trace_events in the tracing facility require both a system name and an
> event name. IE: sched/sched_waking, sched is the system name,
> sched_waking is the event name. For user_events in the root group, the
> system name is "user_events". When groups are introduced, the system
> name can be "user_events_<GUID>" for example.

So my suggestion is using PID in root pid namespace instead of GUID
by default.

Thank you,


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

next prev parent reply	other threads:[~2023-06-06 13:38 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-08 16:37 [PATCH] tracing/user_events: Run BPF program if attached Beau Belgrave
2023-05-09 15:24 ` Alexei Starovoitov
2023-05-09 17:01   ` Steven Rostedt
2023-05-09 20:30     ` Steven Rostedt
2023-05-09 20:42       ` Steven Rostedt
2023-05-15 16:57       ` Alexei Starovoitov
2023-05-15 18:33         ` Steven Rostedt
2023-05-15 19:35           ` Beau Belgrave
2023-05-15 21:38             ` Steven Rostedt
2023-05-15 19:24         ` Beau Belgrave
2023-05-15 21:57           ` Steven Rostedt
2023-05-17  0:36           ` Alexei Starovoitov
2023-05-17  0:56             ` Linus Torvalds
2023-05-17  1:46               ` Linus Torvalds
2023-05-17  2:29                 ` Steven Rostedt
2023-05-17  3:03                   ` Linus Torvalds
2023-05-17 17:22                     ` Beau Belgrave
2023-05-17 18:15                       ` Linus Torvalds
2023-05-17 19:07                         ` Beau Belgrave
2023-05-17 19:26                           ` Linus Torvalds
2023-05-17 19:36                             ` Beau Belgrave
2023-05-17 19:36                             ` Linus Torvalds
2023-05-17 19:37                               ` Linus Torvalds
2023-05-17 23:00                                 ` Beau Belgrave
2023-05-17 23:14                                   ` Linus Torvalds
2023-05-17 23:25                                     ` Steven Rostedt
2023-05-18  0:14                                       ` Beau Belgrave
2023-05-18  0:23                                         ` Linus Torvalds
2023-05-17 20:08                               ` Linus Torvalds
2023-05-17  1:26             ` Steven Rostedt
2023-05-17 16:50               ` Beau Belgrave
2023-05-18  0:10                 ` Alexei Starovoitov
2023-05-18  0:19                   ` Beau Belgrave
2023-05-18  0:56                     ` Alexei Starovoitov
2023-05-18  1:18                       ` Beau Belgrave
2023-05-18  2:08                         ` Steven Rostedt
2023-05-18  3:14                           ` Alexei Starovoitov
2023-05-18 13:36                             ` Steven Rostedt
2023-05-18 17:28                               ` Beau Belgrave
2023-06-01  9:46                   ` Christian Brauner
2023-06-01 15:24                     ` Beau Belgrave
2023-06-01 15:57                       ` Christian Brauner
2023-06-01 16:29                         ` Beau Belgrave
2023-06-06 13:37                           ` Masami Hiramatsu [this message]
2023-06-06 17:05                             ` Beau Belgrave
2023-06-07 14:07                               ` Masami Hiramatsu
2023-06-07 19:26                                 ` Beau Belgrave
2023-06-08  0:25                                   ` Masami Hiramatsu
2023-05-17 17:51             ` Beau Belgrave
2023-06-06 13:57             ` Masami Hiramatsu
2023-06-06 16:57               ` Andrii Nakryiko
2023-06-06 20:57                 ` Beau Belgrave

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230606223752.65dd725c04b11346b45e0546@kernel.org \
    --to=mhiramat@kernel.org \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=beaub@linux.microsoft.com \
    --cc=bpf@vger.kernel.org \
    --cc=brauner@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dthaler@microsoft.com \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=torvalds@linux-foundation.org \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).