From: "Mickaël Salaün" <mic@digikod.net>
To: Andy Lutomirski <luto@kernel.org>,
Daniel Mack <daniel@zonque.org>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Kees Cook <keescook@chromium.org>, Jann Horn <jann@thejh.net>,
Tejun Heo <tj@kernel.org>, David Ahern <dsahern@gmail.com>,
"David S. Miller" <davem@davemloft.net>,
Thomas Graf <tgraf@suug.ch>,
Michael Kerrisk <mtk.manpages@gmail.com>,
Peter Zijlstra <peterz@infradead.org>
Cc: Linux API <linux-api@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Network Development <netdev@vger.kernel.org>,
John Stultz <john.stultz@linaro.org>
Subject: Re: Potential issues (security and otherwise) with the current cgroup-bpf API
Date: Sat, 17 Dec 2016 20:26:41 +0100 [thread overview]
Message-ID: <58559171.30402@digikod.net> (raw)
In-Reply-To: <CALCETrV81oFwq2AgeRsN54HA1jR=b5cOZfAgve8H8zhx83DTyA@mail.gmail.com>
[-- Attachment #1.1: Type: text/plain, Size: 7074 bytes --]
On 17/12/2016 19:18, Andy Lutomirski wrote:
> Hi all-
>
> I apologize for being rather late with this. I didn't realize that
> cgroup-bpf was going to be submitted for Linux 4.10, and I didn't see
> it on the linux-api list, so I missed the discussion.
>
> I think that the inet ingress, egress etc filters are a neat feature,
> but I think the API has some issues that will bite us down the road
> if it becomes stable in its current form.
>
> Most of the problems I see are summarized in this transcript:
>
> # mkdir cg2
> # mount -t cgroup2 none cg2
> # mkdir cg2/nosockets
> # strace cgrp_socket_rule cg2/nosockets/ 0
> ...
> open("cg2/nosockets/", O_RDONLY|O_DIRECTORY) = 3
>
> ^^^^ You can modify a cgroup after opening it O_RDONLY?
I sent a patch to check the cgroup.procs permission before attaching a
BPF program to it [1], but it was not merged because not part of the
current security model (which may not be crystal clear). The thing is
that the current socket/BPF/cgroup feature is only available to a
process with the *global CAP_NET_ADMIN* and such a process can already
modify the network for every processes, so it doesn't make much sense to
check if it can modify the network for a subset of this processes.
[1] https://lkml.org/lkml/2016/9/19/854
However, needing a process to open a cgroup *directory* in write mode
may not make sense because the process does not modify the content of
the cgroup but only use it as a *reference* in the network stack.
Forcing an open with write mode may forbid to use this kind of
network-filtering feature in a read-only file-system but not necessarily
read-only *network configuration*.
Another point of view is that the CAP_NET_ADMIN may be an unneeded
privilege if the cgroup migration is using a no_new_privs-like feature
as I proposed with Landlock [2] (with an extra ptrace_may_access() check).
The new capability proposition for cgroup may be interesting too.
[2] https://lkml.org/lkml/2016/9/14/82
>
> bpf(BPF_PROG_LOAD, {prog_type=0x9 /* BPF_PROG_TYPE_??? */, insn_cnt=2,
> insns=0x7fffe3568c10, license="GPL", log_level=1, log_size=262144,
> log_buf=0x6020c0, kern_version=0}, 48) = 4
>
> ^^^^ This is fine. The bpf() syscall manipulates bpf objects.
>
> bpf(0x8 /* BPF_??? */, 0x7fffe3568bf0, 48) = 0
>
> ^^^^ This is not so good:
> ^^^^
> ^^^^ a) The bpf() syscall is supposed to manipulate bpf objects. This
> ^^^^ is manipulating a cgroup. There's no reason that a socket creation
> ^^^^ filter couldn't be written in a different language (new iptables
> ^^^^ table? Simple list of address families?), but if that happened,
> ^^^^ then using bpf() to install it would be entirely nonsensical.
Another point of view is to say that the BPF program (called by the
network stack) is using a reference to a set of processes thanks to a
cgroup.
> ^^^^
> ^^^^ b) This is starting to be an excessively ugly multiplexer. Among
> ^^^^ other things, it's very unfriendly to seccomp.
FWIW, Landlock will have the capability to filter this kind of action.
>
> # echo $$ >cg2/nosockets/cgroup.procs
> # ping 127.0.0.1
> ping: socket: Operation not permitted
> # ls cg2/nosockets/
> cgroup.controllers cgroup.events cgroup.procs cgroup.subtree_control
> # cat cg2/nosockets/cgroup.controllers
>
> ^^^^ Something in cgroupfs should give an indication that this cgroup
> ^^^^ filters socket creation, but there's nothing there. You should also
> ^^^^ be able to turn the filter off from cgroupfs.
Right. Everybody was OK at LPC to add such an information but it is not
there yet.
>
> # mkdir cg2/nosockets/sockets
> # /home/luto/apps/linux/samples/bpf/cgrp_socket_rule cg2/nosockets/sockets/ 1
>
> ^^^^ This succeeded, which means that, if this feature is enabled in 4.10,
> ^^^^ then we're stuck with its semantics. If it returned -EINVAL instead,
> ^^^^ there would be a chance to refine it.
This is indeed unfortunate.
>
> # echo $$ >cg2/nosockets/sockets/cgroup.procs
> # ping 127.0.0.1
> PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
> 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.029 ms
> ^C
> --- 127.0.0.1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.029/0.029/0.029/0.000 ms
>
> ^^^^ Bash was inside a cgroup that disallowed socket creation, but socket
> ^^^^ creation wasn't disallowed. This means that the obvious use of socket
> ^^^^ creation filters in nestable constainers fails insecurely.
>
>
> There's also a subtle but nasty potential security problem here.
> In 4.9 and before, cgroups has only one real effect in the kernel:
> resource control. A process in a malicious cgroup could be DoSed,
> but that was about the extent of the damage that a malicious cgroup
> could do.
>
> In 4.10 with With CONFIG_CGROUP_BPF=y, a cgroup can have bpf
> programs attached that can do things if various events occur. (Right
> now, this means socket operations, but there are plans in the works
> to do this for LSM hooks too.) These bpf programs can say yes or no,
> but they can also read out various data (including socket payloads!)
> and save them away where an attacker can find them. This sounds a
> lot like seccomp with a narrower scope but a much stronger ability to
> exfiltrate private information.
>
> Unfortunately, while seccomp is very, very careful to prevent
> injection of a privileged victim into a malicious sandbox, the
> CGROUP_BPF mechanism appears to have no real security model. There
> is nothing to prevent a program that's in a malicious cgroup from
> running a setuid binary, and there is nothing to prevent a program
> that has the ability to move itself or another program into a
> malicious cgroup from doing so and then, if needed for exploitation,
> exec a setuid binary.
>
> This isn't much of a problem yet because you currently need
> CAP_NET_ADMIN to create a malicious sandbox in the first place. I'm
> sure that, in the near future, someone will want to make this stuff
> work in containers with delegated cgroup hierarchies, and then there
> may be a real problem here.
>
>
> I've included a few security people on this thread. The current API
> looks abusable, and it would be nice to find all the holes before
> 4.10 comes out.
>
>
> (The cgrp_socket_rule source is attached. You can build it by sticking it
> in samples/bpf and doing:
>
> $ make headers_install
> $ cd samples/bpf
> $ gcc -o cgrp_socket_rule cgrp_socket_rule.c libbpf.c -I../../usr/include
> )
>
> --Andy
>
Right. Because this feature doesn't handle namespaces (only global
CAP_NET_ADMIN), nested containers should not be allowed to use it at all.
If we want this kind of feature to be usable by something other than a
process with a global capability, then we need an inheritance mechanism
similar to what I designed for Landlock. I think it could be added later.
Regards,
Mickaël
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2016-12-17 19:26 UTC|newest]
Thread overview: 73+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-12-17 18:18 Potential issues (security and otherwise) with the current cgroup-bpf API Andy Lutomirski
2016-12-17 18:18 ` Andy Lutomirski
2016-12-17 19:26 ` Mickaël Salaün [this message]
2016-12-17 20:02 ` Andy Lutomirski
2016-12-19 20:56 ` Alexei Starovoitov
2016-12-19 21:23 ` Andy Lutomirski
2016-12-20 0:02 ` Alexei Starovoitov
[not found] ` <20161220000254.GA58895-+o4/htvd0TDFYCXBM6kdu7fOX0fSgVTm@public.gmane.org>
2016-12-20 0:25 ` Andy Lutomirski
2016-12-20 0:25 ` Andy Lutomirski
2016-12-20 1:43 ` Andy Lutomirski
[not found] ` <CALCETrU1_bDVLfokQ7zasHVmeq7S-R+603GEw59V_wuj4eE1hw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-12-20 1:44 ` David Ahern
2016-12-20 1:44 ` David Ahern
[not found] ` <2dbec775-6304-e44c-19c5-fbf07877e7b1-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2016-12-20 1:56 ` Andy Lutomirski
2016-12-20 1:56 ` Andy Lutomirski
[not found] ` <CALCETrUW2jEYmjSsOrPj+MAjkDGGUCw_rdxQh+5Er0r4ReGLnA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-12-20 2:52 ` David Ahern
2016-12-20 2:52 ` David Ahern
2016-12-20 3:12 ` Andy Lutomirski
2016-12-20 4:44 ` Alexei Starovoitov
[not found] ` <20161220044440.GB86803-+o4/htvd0TDFYCXBM6kdu7fOX0fSgVTm@public.gmane.org>
2016-12-20 5:27 ` Andy Lutomirski
2016-12-20 5:27 ` Andy Lutomirski
[not found] ` <CALCETrVxkdZA3SsRv0KKhBz9YvNMsnmHSjS8HN1GHrgWRYNM1g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-12-20 5:32 ` Alexei Starovoitov
2016-12-20 5:32 ` Alexei Starovoitov
2016-12-20 9:11 ` Peter Zijlstra
2016-12-20 9:11 ` Peter Zijlstra
[not found] ` <20161220091150.GJ3124-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
2017-01-03 10:25 ` Michal Hocko
2017-01-03 10:25 ` Michal Hocko
[not found] ` <20170103102559.GA30129-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2017-01-16 1:19 ` Tejun Heo
2017-01-16 1:19 ` Tejun Heo
2017-01-17 13:03 ` Michal Hocko
2017-01-17 13:32 ` Peter Zijlstra
2017-01-17 13:58 ` Michal Hocko
[not found] ` <20170117135830.GO19699-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2017-01-17 20:23 ` Andy Lutomirski
2017-01-17 20:23 ` Andy Lutomirski
2017-01-18 22:18 ` Tejun Heo
2017-01-19 9:00 ` Michal Hocko
2016-12-20 3:18 ` Alexei Starovoitov
2016-12-20 3:18 ` Alexei Starovoitov
[not found] ` <20161220031802.GA77838-+o4/htvd0TDFYCXBM6kdu7fOX0fSgVTm@public.gmane.org>
2016-12-20 3:50 ` Andy Lutomirski
2016-12-20 3:50 ` Andy Lutomirski
2016-12-20 4:41 ` Alexei Starovoitov
[not found] ` <CALCETrXymvAo-9zhQe=amToz_fs9XGniK2KLZv5Fxc66qcUx6A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-12-20 10:21 ` Daniel Mack
2016-12-20 10:21 ` Daniel Mack
2016-12-20 17:23 ` Andy Lutomirski
[not found] ` <CALCETrXyp2ddf4HRsEoN=qEwTBaezOUX2XWj6nxPcbc4t13svw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-12-20 18:36 ` Daniel Mack
2016-12-20 18:36 ` Daniel Mack
2016-12-20 18:49 ` Andy Lutomirski
2016-12-21 4:01 ` Alexei Starovoitov
2016-12-20 1:34 ` David Miller
2016-12-20 1:34 ` David Miller
[not found] ` <20161219.203422.500916400463091702.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2016-12-20 1:40 ` Andy Lutomirski
2016-12-20 1:40 ` Andy Lutomirski
2016-12-20 4:51 ` Alexei Starovoitov
2016-12-20 5:26 ` Andy Lutomirski
-- strict thread matches above, loose matches on Subject: below --
2017-01-17 5:18 (unknown), Andy Lutomirski
[not found] ` <CALCETrW+ZQ1xMEfdEOzx6RK9ZoCvQiqQSnOyhDJ=-FZMwUbucg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-01-18 22:41 ` Potential issues (security and otherwise) with the current cgroup-bpf API Tejun Heo
2017-01-18 22:41 ` Tejun Heo
[not found] ` <20170118224120.GG9171-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2017-01-19 0:18 ` Andy Lutomirski
2017-01-19 0:18 ` Andy Lutomirski
[not found] ` <CALCETrXv3k+=C=9D5YGypzPk8_f4yo8ShQ3tM2fOo++gJFBM4A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-01-19 0:59 ` Tejun Heo
2017-01-19 0:59 ` Tejun Heo
2017-01-19 2:29 ` Andy Lutomirski
[not found] ` <CALCETrVrEnL4cvkdDu2LUhxmeOZ+SMEmF=yKKsm9OYoW2y1Kpg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-01-20 2:39 ` Alexei Starovoitov
2017-01-20 2:39 ` Alexei Starovoitov
2017-01-20 4:04 ` Andy Lutomirski
2017-01-23 4:31 ` Alexei Starovoitov
[not found] ` <20170123043154.GA93519-+o4/htvd0TDFYCXBM6kdu7fOX0fSgVTm@public.gmane.org>
2017-01-23 20:20 ` Andy Lutomirski
2017-01-23 20:20 ` Andy Lutomirski
[not found] ` <CALCETrXK65itdDqYTdhB-Td8d-Hzj00dcDScUOUh9psCZN_cLA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-02-03 21:07 ` Andy Lutomirski
2017-02-03 21:07 ` Andy Lutomirski
2017-02-03 23:21 ` Alexei Starovoitov
[not found] ` <20170203232154.GA30114-+o4/htvd0TDFYCXBM6kdu7fOX0fSgVTm@public.gmane.org>
2017-02-04 17:10 ` Andy Lutomirski
2017-02-04 17:10 ` Andy Lutomirski
2017-01-19 1:01 ` Mickaël Salaün
2017-01-19 1:01 ` Mickaël Salaün
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=58559171.30402@digikod.net \
--to=mic@digikod.net \
--cc=alexei.starovoitov@gmail.com \
--cc=daniel@zonque.org \
--cc=davem@davemloft.net \
--cc=dsahern@gmail.com \
--cc=jann@thejh.net \
--cc=john.stultz@linaro.org \
--cc=keescook@chromium.org \
--cc=linux-api@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@kernel.org \
--cc=mtk.manpages@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=tgraf@suug.ch \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.