From: Sargun Dhillon <sargun@sargun.me>
To: linux-kernel@vger.kernel.org,
containers@lists.linux-foundation.org, linux-api@vger.kernel.org
Cc: Sargun Dhillon <sargun@sargun.me>,
christian.brauner@ubuntu.com, tycho@tycho.ws,
keescook@chromium.org, cyphar@cyphar.com,
Jeffrey Vander Stoep <jeffv@google.com>,
jannh@google.com, rsesek@google.com, palmer@google.com
Subject: [PATCH 0/5] Add seccomp notifier ioctl that enables adding fds
Date: Sun, 24 May 2020 16:39:37 -0700
Message-ID: <20200524233942.8702-1-sargun@sargun.me>
This adds the capability for seccomp notifier listeners to add file
descriptors in response to a seccomp notification. This is useful for
syscalls for which the existing capabilities were not sufficient. The
current mechanism works well for syscalls that either have side effects
that are system / namespace wide (mount), or that operate on a specific
set of registers (reboot, mknod), and don't require dereferencing pointers.
The problem with dereferencing pointers in a supervisor is that it leaves
us vulnerable to TOC-TOU [1] style attacks. For syscalls that have a direct
effect on file descriptors, pidfd_getfd was added, allowing those file
descriptors to be directly operated upon by the supervisor [2].
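To make the TOC-TOU concern concrete, here is a minimal userspace sketch
in C. It is not taken from the patches: snapshot_path(), path_allowed()
and the "/tmp/" policy are hypothetical names chosen for illustration,
and the target's memory is modelled as a plain buffer. The point is that
a supervisor should validate a private copy of a pointer argument,
because the target can rewrite its own memory after the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy: only paths under /tmp/ are allowed. */
static bool path_allowed(const char *path)
{
	return strncmp(path, "/tmp/", 5) == 0;
}

/*
 * Copy a NUL-terminated string out of memory the target can still write
 * to (modelled here as an ordinary buffer) into supervisor-private
 * storage. Returns false if the string does not fit in priv.
 */
static bool snapshot_path(const char *target_mem, char *priv, size_t priv_len)
{
	size_t n;

	assert(target_mem != NULL && priv != NULL);
	for (n = 0; n < priv_len; n++) {
		priv[n] = target_mem[n];
		if (priv[n] == '\0')
			return true;
	}
	return false;
}
```

The copy only helps if the supervisor then acts on its own copy. Letting
the kernel re-read the target's memory after the check would reopen the
race.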
Unfortunately, this leaves system calls which return file descriptors
out of the picture. These are fairly common syscalls, such as openat,
socket, and perf_event_open, that return file descriptors and have
arguments that are pointers. These require that the supervisor be able to
verify the arguments, make the call on behalf of the process at hand,
and pass back the resulting file descriptor. This is where addfd comes
into play.
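As a rough sketch of the supervisor side, the helper below installs a
supervisor-owned file descriptor into the target. It assumes the UAPI as
it later landed in mainline <linux/seccomp.h> (struct seccomp_notif_addfd
and SECCOMP_IOCTL_NOTIF_ADDFD, Linux 5.9+); the structure proposed in
this series may differ. The helper name install_fd_in_target() is
hypothetical, and the fallback definitions exist only so the sketch
builds against older headers.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/seccomp.h>

#ifndef SECCOMP_IOCTL_NOTIF_ADDFD
/* Fallback for old headers; mirrors the mainline (5.9+) definition. */
struct seccomp_notif_addfd {
	__u64 id;
	__u32 flags;
	__u32 srcfd;
	__u32 newfd;
	__u32 newfd_flags;
};
#define SECCOMP_IOCTL_NOTIF_ADDFD _IOW('!', 3, struct seccomp_notif_addfd)
#endif

/*
 * Ask the kernel to install srcfd (an fd in the supervisor) into the task
 * that is blocked on notification `id`. With flags == 0 the kernel picks
 * the fd number. On success the ioctl returns the fd number allocated in
 * the target; on failure it returns -1 with errno set.
 */
static int install_fd_in_target(int notify_fd, __u64 id, int srcfd)
{
	struct seccomp_notif_addfd addfd;

	memset(&addfd, 0, sizeof(addfd));
	addfd.id = id;
	addfd.srcfd = (__u32)srcfd;

	return ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd);
}
```

In such a flow the supervisor would open the file or socket itself after
validating the arguments, call the helper, and then report the returned
number as the syscall's return value when it answers the notification.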
There is an additional flag that allows you to "set" an FD, rather than
add it with an arbitrary number. This has dup2 style semantics, and
installs the new file at that file descriptor, and atomically closes
the old one if it existed. This is useful for a particular use case
that we have, in which we want to swap out AF_INET sockets for AF_UNIX,
AF_INET6, and sockets in another namespace when doing "upconversion".
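For readers less familiar with the dup2 semantics referred to here, the
following userspace sketch shows the behaviour inside a single process.
It uses only dup2(2); replace_fd() is a hypothetical helper name. The
"set" flag applies the same replace-in-place idea to the target's fd
table.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Make target_fd refer to the same open file as new_fd. If target_fd was
 * already open, it is closed as part of the same operation, so the number
 * is never free in between. Returns target_fd, or -1 on error.
 */
static int replace_fd(int new_fd, int target_fd)
{
	/* dup2() is a no-op when both numbers are equal; rule that out. */
	assert(new_fd != target_fd);
	return dup2(new_fd, target_fd);
}
```

After replace_fd(b, a) succeeds, reads and writes on a operate on the
file that b refers to, and whatever a referred to before has been
released.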
My specific use case at Netflix is to enable our IPv4-IPv6 transition
mechanism, in which our namespaces have no real IPv4 reachability,
and when it comes time to do a connect(2), we get a socket from a
namespace with global IPv4 reachability.
In addition, we intend to use it for our service mesh: where the
mesh needs to intercept ingress traffic, the addfd capability will
act as a mechanism to do socket activation.
Addfd is not implemented as a separate syscall, a la pidfd_getfd, as
VFS makes some optimizations with regard to the fdtable, and assumes
that it is not modified by external processes. Although a mechanism
that scheduled something in the context of the task could work, it is
somewhat simpler to do it in the context of the ioctl as we control
the task while in kernel.
There is an additional flag (move) that was added to support the cgroup
v1 controllers (netprio, classid) when moving sockets, as a socket
can only be associated with one cgroup at a time.
[1]: https://lore.kernel.org/lkml/20190918084833.9369-2-christian.brauner@ubuntu.com/
[2]: https://lore.kernel.org/lkml/20200107175927.4558-1-sargun@sargun.me/
Sargun Dhillon (5):
  seccomp: Add find_notification helper
  seccomp: Introduce addfd ioctl to seccomp user notifier
  selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD
  seccomp: Add SECCOMP_ADDFD_FLAG_MOVE flag to add fd ioctl
  selftests/seccomp: Add test for addfd move semantics

 include/uapi/linux/seccomp.h                  |  33 +++
 kernel/seccomp.c                              | 228 +++++++++++++++--
 tools/testing/selftests/seccomp/seccomp_bpf.c | 235 ++++++++++++++++++
 3 files changed, 479 insertions(+), 17 deletions(-)
--
2.25.1
Thread overview: 18+ messages
2020-05-24 23:39 Sargun Dhillon [this message]
2020-05-24 23:39 ` [PATCH 1/5] seccomp: Add find_notification helper Sargun Dhillon
2020-05-24 23:55 ` Tycho Andersen
2020-05-25 13:26 ` Christian Brauner
2020-05-24 23:39 ` [PATCH 2/5] seccomp: Introduce addfd ioctl to seccomp user notifier Sargun Dhillon
2020-05-24 23:57 ` Tycho Andersen
2020-05-24 23:58 ` Tycho Andersen
2020-05-25 0:05 ` Al Viro
2020-05-25 0:27 ` Sargun Dhillon
2020-05-25 0:39 ` Al Viro
2020-05-25 13:50 ` Christian Brauner
2020-05-26 6:59 ` Sargun Dhillon
2020-05-26 8:22 ` Christian Brauner
2020-05-24 23:39 ` [PATCH 3/5] selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD Sargun Dhillon
2020-05-24 23:39 ` [PATCH 4/5] seccomp: Add SECCOMP_ADDFD_FLAG_MOVE flag to add fd ioctl Sargun Dhillon
2020-05-25 14:20 ` Christian Brauner
2020-05-26 6:08 ` Sargun Dhillon
2020-05-24 23:39 ` [PATCH 5/5] selftests/seccomp: Add test for addfd move semantics Sargun Dhillon