public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Mickaël Salaün" <mic@digikod.net>
To: Andy Lutomirski <luto@kernel.org>,
	Daniel Mack <daniel@zonque.org>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Kees Cook <keescook@chromium.org>, Jann Horn <jann@thejh.net>,
	Tejun Heo <tj@kernel.org>, David Ahern <dsahern@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	Thomas Graf <tgraf@suug.ch>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>
Cc: Linux API <linux-api@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	John Stultz <john.stultz@linaro.org>
Subject: Re: Potential issues (security and otherwise) with the current cgroup-bpf API
Date: Sat, 17 Dec 2016 20:26:41 +0100	[thread overview]
Message-ID: <58559171.30402@digikod.net> (raw)
In-Reply-To: <CALCETrV81oFwq2AgeRsN54HA1jR=b5cOZfAgve8H8zhx83DTyA@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 7074 bytes --]


On 17/12/2016 19:18, Andy Lutomirski wrote:
> Hi all-
> 
> I apologize for being rather late with this.  I didn't realize that
> cgroup-bpf was going to be submitted for Linux 4.10, and I didn't see
> it on the linux-api list, so I missed the discussion.
> 
> I think that the inet ingress, egress etc filters are a neat feature,
> but I think the API has some issues that will bite us down the road
> if it becomes stable in its current form.
> 
> Most of the problems I see are summarized in this transcript:
> 
> # mkdir cg2
> # mount -t cgroup2 none cg2
> # mkdir cg2/nosockets
> # strace cgrp_socket_rule cg2/nosockets/ 0
> ...
> open("cg2/nosockets/", O_RDONLY|O_DIRECTORY) = 3
> 
> ^^^^ You can modify a cgroup after opening it O_RDONLY?

I sent a patch to check the cgroup.procs permission before attaching a
BPF program to it [1], but it was not merged because not part of the
current security model (which may not be crystal clear). The thing is
that the current socket/BPF/cgroup feature is only available to a
process with the *global CAP_NET_ADMIN* and such a process can already
modify the network for every processes, so it doesn't make much sense to
check if it can modify the network for a subset of this processes.

[1] https://lkml.org/lkml/2016/9/19/854

However, needing a process to open a cgroup *directory* in write mode
may not make sense because the process does not modify the content of
the cgroup but only use it as a *reference* in the network stack.
Forcing an open with write mode may forbid to use this kind of
network-filtering feature in a read-only file-system but not necessarily
read-only *network configuration*.

Another point of view is that the CAP_NET_ADMIN may be an unneeded
privilege if the cgroup migration is using a no_new_privs-like feature
as I proposed with Landlock [2] (with an extra ptrace_may_access() check).
The new capability proposition for cgroup may be interesting too.

[2] https://lkml.org/lkml/2016/9/14/82

> 
> bpf(BPF_PROG_LOAD, {prog_type=0x9 /* BPF_PROG_TYPE_??? */, insn_cnt=2,
> insns=0x7fffe3568c10, license="GPL", log_level=1, log_size=262144,
> log_buf=0x6020c0, kern_version=0}, 48) = 4
> 
> ^^^^ This is fine.  The bpf() syscall manipulates bpf objects.
> 
> bpf(0x8 /* BPF_??? */, 0x7fffe3568bf0, 48) = 0
> 
> ^^^^ This is not so good:
> ^^^^
> ^^^^ a) The bpf() syscall is supposed to manipulate bpf objects.  This
> ^^^^    is manipulating a cgroup.  There's no reason that a socket creation
> ^^^^    filter couldn't be written in a different language (new iptables
> ^^^^    table?  Simple list of address families?), but if that happened,
> ^^^^    then using bpf() to install it would be entirely nonsensical.

Another point of view is to say that the BPF program (called by the
network stack) is using a reference to a set of processes thanks to a
cgroup.

> ^^^^
> ^^^^ b) This is starting to be an excessively ugly multiplexer.  Among
> ^^^^    other things, it's very unfriendly to seccomp.

FWIW, Landlock will have the capability to filter this kind of action.

> 
> # echo $$ >cg2/nosockets/cgroup.procs
> # ping 127.0.0.1
> ping: socket: Operation not permitted
> # ls cg2/nosockets/
> cgroup.controllers  cgroup.events  cgroup.procs  cgroup.subtree_control
> # cat cg2/nosockets/cgroup.controllers
> 
> ^^^^ Something in cgroupfs should give an indication that this cgroup
> ^^^^ filters socket creation, but there's nothing there.  You should also
> ^^^^ be able to turn the filter off from cgroupfs.

Right. Everybody was OK at LPC to add such an information but it is not
there yet.

> 
> # mkdir cg2/nosockets/sockets
> # /home/luto/apps/linux/samples/bpf/cgrp_socket_rule cg2/nosockets/sockets/ 1
> 
> ^^^^ This succeeded, which means that, if this feature is enabled in 4.10,
> ^^^^ then we're stuck with its semantics.  If it returned -EINVAL instead,
> ^^^^ there would be a chance to refine it.

This is indeed unfortunate.

> 
> # echo $$ >cg2/nosockets/sockets/cgroup.procs
> # ping 127.0.0.1
> PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
> 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.029 ms
> ^C
> --- 127.0.0.1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.029/0.029/0.029/0.000 ms
> 
> ^^^^ Bash was inside a cgroup that disallowed socket creation, but socket
> ^^^^ creation wasn't disallowed.  This means that the obvious use of socket
> ^^^^ creation filters in nestable constainers fails insecurely.
> 
> 
> There's also a subtle but nasty potential security problem here.
> In 4.9 and before, cgroups has only one real effect in the kernel:
> resource control. A process in a malicious cgroup could be DoSed,
> but that was about the extent of the damage that a malicious cgroup
> could do.
> 
> In 4.10 with With CONFIG_CGROUP_BPF=y, a cgroup can have bpf
> programs attached that can do things if various events occur. (Right
> now, this means socket operations, but there are plans in the works
> to do this for LSM hooks too.) These bpf programs can say yes or no,
> but they can also read out various data (including socket payloads!)
> and save them away where an attacker can find them. This sounds a
> lot like seccomp with a narrower scope but a much stronger ability to
> exfiltrate private information.
> 
> Unfortunately, while seccomp is very, very careful to prevent
> injection of a privileged victim into a malicious sandbox, the
> CGROUP_BPF mechanism appears to have no real security model. There
> is nothing to prevent a program that's in a malicious cgroup from
> running a setuid binary, and there is nothing to prevent a program
> that has the ability to move itself or another program into a
> malicious cgroup from doing so and then, if needed for exploitation,
> exec a setuid binary.
> 
> This isn't much of a problem yet because you currently need
> CAP_NET_ADMIN to create a malicious sandbox in the first place.  I'm
> sure that, in the near future, someone will want to make this stuff
> work in containers with delegated cgroup hierarchies, and then there
> may be a real problem here.
> 
> 
> I've included a few security people on this thread.  The current API
> looks abusable, and it would be nice to find all the holes before
> 4.10 comes out.
> 
> 
> (The cgrp_socket_rule source is attached.  You can build it by sticking it
>  in samples/bpf and doing:
> 
>  $ make headers_install
>  $ cd samples/bpf
>  $ gcc -o cgrp_socket_rule cgrp_socket_rule.c libbpf.c -I../../usr/include
> )
> 
> --Andy
> 

Right. Because this feature doesn't handle namespaces (only global
CAP_NET_ADMIN), nested containers should not be allowed to use it at all.
If we want this kind of feature to be usable by something other than a
process with a global capability, then we need an inheritance mechanism
similar to what I designed for Landlock. I think it could be added later.

Regards,
 Mickaël


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2016-12-17 19:35 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-17 18:18 Potential issues (security and otherwise) with the current cgroup-bpf API Andy Lutomirski
2016-12-17 19:26 ` Mickaël Salaün [this message]
2016-12-17 20:02   ` Andy Lutomirski
2016-12-19 20:56 ` Alexei Starovoitov
2016-12-19 21:23   ` Andy Lutomirski
2016-12-20  0:02     ` Alexei Starovoitov
2016-12-20  0:25       ` Andy Lutomirski
2016-12-20  1:43         ` Andy Lutomirski
2016-12-20  1:44         ` David Ahern
2016-12-20  1:56           ` Andy Lutomirski
2016-12-20  2:52             ` David Ahern
2016-12-20  3:12               ` Andy Lutomirski
2016-12-20  4:44                 ` Alexei Starovoitov
2016-12-20  5:27                   ` Andy Lutomirski
2016-12-20  5:32                     ` Alexei Starovoitov
2016-12-20  9:11             ` Peter Zijlstra
2017-01-03 10:25               ` Michal Hocko
2017-01-16  1:19                 ` Tejun Heo
2017-01-17 13:03                   ` Michal Hocko
2017-01-17 13:32                     ` Peter Zijlstra
2017-01-17 13:58                       ` Michal Hocko
2017-01-17 20:23                         ` Andy Lutomirski
2017-01-18 22:18                         ` Tejun Heo
2017-01-19  9:00                           ` Michal Hocko
2016-12-20  3:18         ` Alexei Starovoitov
2016-12-20  3:50           ` Andy Lutomirski
2016-12-20  4:41             ` Alexei Starovoitov
2016-12-20 10:21             ` Daniel Mack
2016-12-20 17:23               ` Andy Lutomirski
2016-12-20 18:36                 ` Daniel Mack
2016-12-20 18:49                   ` Andy Lutomirski
2016-12-21  4:01                     ` Alexei Starovoitov
2016-12-20  1:34       ` David Miller
2016-12-20  1:40         ` Andy Lutomirski
2016-12-20  4:51           ` Alexei Starovoitov
2016-12-20  5:26             ` Andy Lutomirski
  -- strict thread matches above, loose matches on Subject: below --
2017-01-17  5:18 Andy Lutomirski
2017-01-18 22:41 ` Potential issues (security and otherwise) with the current cgroup-bpf API Tejun Heo
2017-01-19  0:18   ` Andy Lutomirski
2017-01-19  0:59     ` Tejun Heo
2017-01-19  2:29       ` Andy Lutomirski
2017-01-20  2:39         ` Alexei Starovoitov
2017-01-20  4:04           ` Andy Lutomirski
2017-01-23  4:31             ` Alexei Starovoitov
2017-01-23 20:20               ` Andy Lutomirski
2017-02-03 21:07                 ` Andy Lutomirski
2017-02-03 23:21                   ` Alexei Starovoitov
2017-02-04 17:10                     ` Andy Lutomirski
2017-01-19  1:01     ` Mickaël Salaün

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=58559171.30402@digikod.net \
    --to=mic@digikod.net \
    --cc=alexei.starovoitov@gmail.com \
    --cc=daniel@zonque.org \
    --cc=davem@davemloft.net \
    --cc=dsahern@gmail.com \
    --cc=jann@thejh.net \
    --cc=john.stultz@linaro.org \
    --cc=keescook@chromium.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mtk.manpages@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=tgraf@suug.ch \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox