Re: [PATCH v9 bpf-next 02/17] bpf: add BPF token delegation mount options to BPF FS

public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed

From: Christian Brauner <brauner@kernel.org>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>,
	bpf@vger.kernel.org, netdev@vger.kernel.org, paul@paul-moore.com,
	linux-fsdevel@vger.kernel.org,
	linux-security-module@vger.kernel.org, keescook@chromium.org,
	kernel-team@meta.com, sargun@sargun.me
Subject: Re: [PATCH v9 bpf-next 02/17] bpf: add BPF token delegation mount options to BPF FS
Date: Thu, 9 Nov 2023 09:47:59 +0100	[thread overview]
Message-ID: <20231109-linden-kursprogramm-15c2cbd860b3@brauner> (raw)
In-Reply-To: <CAEf4BzbanZO_QPhzyFgBEuB0i+uZZO4rZn7mO1qNp3aoPx+32g@mail.gmail.com>

On Wed, Nov 08, 2023 at 01:09:27PM -0800, Andrii Nakryiko wrote:
> On Wed, Nov 8, 2023 at 5:51 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Fri, Nov 03, 2023 at 12:05:08PM -0700, Andrii Nakryiko wrote:
> > > Add few new mount options to BPF FS that allow to specify that a given
> > > BPF FS instance allows creation of BPF token (added in the next patch),
> > > and what sort of operations are allowed under BPF token. As such, we get
> > > 4 new mount options, each is a bit mask
> > >   - `delegate_cmds` allow to specify which bpf() syscall commands are
> > >     allowed with BPF token derived from this BPF FS instance;
> > >   - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies
> > >     a set of allowable BPF map types that could be created with BPF token;
> > >   - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies
> > >     a set of allowable BPF program types that could be loaded with BPF token;
> > >   - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies
> > >     a set of allowable BPF program attach types that could be loaded with
> > >     BPF token; delegate_progs and delegate_attachs are meant to be used
> > >     together, as full BPF program type is, in general, determined
> > >     through both program type and program attach type.
> > >
> > > Currently, these mount options accept the following forms of values:
> > >   - a special value "any", that enables all possible values of a given
> > >   bit set;
> > >   - numeric value (decimal or hexadecimal, determined by kernel
> > >   automatically) that specifies a bit mask value directly;
> > >   - all the values for a given mount option are combined, if specified
> > >   multiple times. E.g., `mount -t bpf nodev /path/to/mount -o
> > >   delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3
> > >   mask.
> > >
> > > Ideally, more convenient (for humans) symbolic form derived from
> > > corresponding UAPI enums would be accepted (e.g., `-o
> > > delegate_progs=kprobe|tracepoint`) and I intend to implement this, but
> > > it requires a bunch of UAPI header churn, so I postponed it until this
> > > feature lands upstream or at least there is a definite consensus that
> > > this feature is acceptable and is going to make it, just to minimize
> > > amount of wasted effort and not increase amount of non-essential code to
> > > be reviewed.
> > >
> > > Attentive reader will notice that BPF FS is now marked as
> > > FS_USERNS_MOUNT, which theoretically makes it mountable inside non-init
> > > user namespace as long as the process has sufficient *namespaced*
> > > capabilities within that user namespace. But in reality we still
> > > restrict BPF FS to be mountable only by processes with CAP_SYS_ADMIN *in
> > > init userns* (extra check in bpf_fill_super()). FS_USERNS_MOUNT is added
> > > to allow creating BPF FS context object (i.e., fsopen("bpf")) from
> > > inside unprivileged process inside non-init userns, to capture that
> > > userns as the owning userns. It will still be required to pass this
> > > context object back to privileged process to instantiate and mount it.
> > >
> > > This manipulation is important, because capturing non-init userns as the
> > > owning userns of BPF FS instance (super block) allows to use that userns
> > > to constraint BPF token to that userns later on (see next patch). So
> > > creating BPF FS with delegation inside unprivileged userns will restrict
> > > derived BPF token objects to only "work" inside that intended userns,
> > > making it scoped to a intended "container".
> > >
> > > There is a set of selftests at the end of the patch set that simulates
> > > this sequence of steps and validates that everything works as intended.
> > > But careful review is requested to make sure there are no missed gaps in
> > > the implementation and testing.
> > >
> > > All this is based on suggestions and discussions with Christian Brauner
> > > ([0]), to the best of my ability to follow all the implications.
> >
> > "who will not be held responsible for any CVE future or present as he's
> >  not sure whether bpf token is a good idea in general"
> >
> > I'm not opposing it because it's really not my subsystem. But it'd be
> > nice if you also added a disclaimer that I'm not endorsing this. :)
> >
> 
> Sure, I'll clarify. I still appreciate your reviewing everything and
> pointing out all the gotchas (like the reconfiguration and other
> stuff), thanks!
> 
> > A comment below.
> >
> > >
> > >   [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > >  include/linux/bpf.h | 10 ++++++
> > >  kernel/bpf/inode.c  | 88 +++++++++++++++++++++++++++++++++++++++------
> > >  2 files changed, 88 insertions(+), 10 deletions(-)
> > >
> 
> [...]
> 
> > >       opt = fs_parse(fc, bpf_fs_parameters, param, &result);
> > >       if (opt < 0) {
> > > @@ -665,6 +692,25 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
> > >       case OPT_MODE:
> > >               opts->mode = result.uint_32 & S_IALLUGO;
> > >               break;
> > > +     case OPT_DELEGATE_CMDS:
> > > +     case OPT_DELEGATE_MAPS:
> > > +     case OPT_DELEGATE_PROGS:
> > > +     case OPT_DELEGATE_ATTACHS:
> > > +             if (strcmp(param->string, "any") == 0) {
> > > +                     msk = ~0ULL;
> > > +             } else {
> > > +                     err = kstrtou64(param->string, 0, &msk);
> > > +                     if (err)
> > > +                             return err;
> > > +             }
> > > +             switch (opt) {
> > > +             case OPT_DELEGATE_CMDS: opts->delegate_cmds |= msk; break;
> > > +             case OPT_DELEGATE_MAPS: opts->delegate_maps |= msk; break;
> > > +             case OPT_DELEGATE_PROGS: opts->delegate_progs |= msk; break;
> > > +             case OPT_DELEGATE_ATTACHS: opts->delegate_attachs |= msk; break;
> > > +             default: return -EINVAL;
> > > +             }
> > > +             break;
> > >       }
> >
> > So just to repeat that this will allow a container to set it's own
> > delegation options:
> >
> >         # unprivileged container
> >
> >         fd_fs = fsopen();
> >         fsconfig(fd_fs, FSCONFIG_BLA_BLA, "give-me-all-the-delegation");
> >
> >         # Now hand of that fd_fs to a privileged process
> >
> >         fsconfig(fd_fs, FSCONFIG_CREATE_CMD, ...)
> >
> > This means the container manager can't be part of your threat model
> > because you need to trust it to set delegation options.
> >
> > But if the container manager is part of your threat model then you can
> > never trust an fd_fs handed to you because the container manager might
> > have enabled arbitrary delegation privileges.
> >
> > There's ways around this:
> >
> > (1) kernel: Account for this in the kernel and require privileges when
> >     setting delegation options.
> 
> What sort of privilege would that be? We are in an unprivileged user
> namespace, so that would have to be some ns_capable() checks or
> something? I can add ns_capable(CAP_BPF), but what else did you have
> in mind?

You would require privileges in the initial namespace aka capable()
checks similar to what you require for superblock creation.

> 
> I think even if we say that privileged parent does FSCONFIG_SET_STRING
> and unprivileged child just does sys_fsopen("bpf", 0) and nothing
> more, we still can't be sure that child won't race with parent and set
> FSCONFIG_SET_STRING at the same time. Because they both have access to
> the same fs_fd.

Unless you require privileges as outlined above to set delegation
options in which case an unprivileged container cannot change delegation
options at all.

> 
> > (2) userspace: A trusted helper that allocates an fs_context fd in
> >     the target user namespace, then sets delegation options and creates
> >     superblock.
> >
> > (1) Is more restrictive but also more secure. (2) is less restrictive
> > but requires more care from userspace.
> >
> > Either way I would probably consider writing a document detailing
> > various delegation scenarios and possible pitfalls and implications
> > before advertising it.
> >
> > If you choose (2) then you also need to be aware that the security of
> > this also hinges on bpffs not allowing to reconfigure parameters once it
> > has been mounted. Otherwise an unprivileged container can change
> > delegation options.
> >
> > I would recommend that you either add a dummy bpf_reconfigure() method
> > with a comment in it or you add a comment on top of bpf_context_ops.
> > Something like:
> >
> > /*
> >  * Unprivileged mounts of bpffs are owned by the user namespace they are
> >  * mounted in. That means unprivileged users can change vfs mount
> >  * options (ro<->rw, nosuid, etc.).
> >  *
> >  * They currently cannot change bpffs specific mount options such as
> >  * delegation settings. If that is ever implemented it is necessary to
> >  * require rivileges in the initial namespace. Otherwise unprivileged
> >  * users can change delegation options to whatever they want.
> >  */
> 
> Yep, I will add a custom callback. I think we can allow reconfiguring
> towards less permissive delegation subset, but I'll need to look at
> the specifics to see if we can support that easily.

next prev parent reply	other threads:[~2023-11-09  8:48 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-03 19:05 [PATCH v9 bpf-next 00/17] BPF token and BPF FS-based delegation Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 01/17] bpf: align CAP_NET_ADMIN checks with bpf_capable() approach Andrii Nakryiko
2023-11-05  6:33   ` Yafang Shao
2023-11-03 19:05 ` [PATCH v9 bpf-next 02/17] bpf: add BPF token delegation mount options to BPF FS Andrii Nakryiko
2023-11-08 13:51   ` Christian Brauner
2023-11-08 21:09     ` Andrii Nakryiko
2023-11-09  8:47       ` Christian Brauner [this message]
2023-11-09 17:09         ` Andrii Nakryiko
2023-11-09 22:29           ` Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 03/17] bpf: introduce BPF token object Andrii Nakryiko
2023-11-08 14:28   ` Christian Brauner
2023-11-08 21:09     ` Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 04/17] bpf: add BPF token support to BPF_MAP_CREATE command Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 05/17] bpf: add BPF token support to BPF_BTF_LOAD command Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 06/17] bpf: add BPF token support to BPF_PROG_LOAD command Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 07/17] bpf: take into account BPF token when fetching helper protos Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 08/17] bpf: consistently use BPF token throughout BPF verifier logic Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 09/17] bpf,lsm: refactor bpf_prog_alloc/bpf_prog_free LSM hooks Andrii Nakryiko
2023-11-06  5:01   ` [PATCH v9 9/17] " Paul Moore
2023-11-06 19:03     ` Andrii Nakryiko
2023-11-06 22:29       ` Paul Moore
2023-11-03 19:05 ` [PATCH v9 bpf-next 10/17] bpf,lsm: refactor bpf_map_alloc/bpf_map_free " Andrii Nakryiko
2023-11-06  5:01   ` [PATCH v9 " Paul Moore
2023-11-03 19:05 ` [PATCH v9 bpf-next 11/17] bpf,lsm: add BPF token " Andrii Nakryiko
2023-11-04  0:36   ` kernel test robot
2023-11-04  3:20     ` Andrii Nakryiko
2023-11-06  5:01   ` [PATCH v9 " Paul Moore
2023-11-06 19:17     ` Andrii Nakryiko
2023-11-06 22:46       ` Paul Moore
2023-11-03 19:05 ` [PATCH v9 bpf-next 12/17] libbpf: add bpf_token_create() API Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 13/17] libbpf: add BPF token support to bpf_map_create() API Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 14/17] libbpf: add BPF token support to bpf_btf_load() API Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 15/17] libbpf: add BPF token support to bpf_prog_load() API Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 16/17] selftests/bpf: add BPF token-enabled tests Andrii Nakryiko
2023-11-03 19:05 ` [PATCH v9 bpf-next 17/17] bpf,selinux: allocate bpf_security_struct per BPF token Andrii Nakryiko
2023-11-06  5:01   ` [PATCH v9 " Paul Moore
2023-11-06 19:18     ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231109-linden-kursprogramm-15c2cbd860b3@brauner \
    --to=brauner@kernel.org \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=keescook@chromium.org \
    --cc=kernel-team@meta.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=paul@paul-moore.com \
    --cc=sargun@sargun.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox