From: Gabriel Krisman Bertazi <krisman@collabora.com>
To: luto@kernel.org, tglx@linutronix.de, keescook@chromium.org
Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
linux-api@vger.kernel.org, willy@infradead.org,
linux-kselftest@vger.kernel.org, shuah@kernel.org,
Gabriel Krisman Bertazi <krisman@collabora.com>,
kernel@collabora.com
Subject: [PATCH v6 9/9] doc: Document Syscall User Dispatch
Date: Fri, 4 Sep 2020 16:31:47 -0400
Message-ID: <20200904203147.2908430-10-krisman@collabora.com>
In-Reply-To: <20200904203147.2908430-1-krisman@collabora.com>
Explain the interface, provide some background and security notes.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
.../admin-guide/syscall-user-dispatch.rst | 87 +++++++++++++++++++
1 file changed, 87 insertions(+)
create mode 100644 Documentation/admin-guide/syscall-user-dispatch.rst
diff --git a/Documentation/admin-guide/syscall-user-dispatch.rst b/Documentation/admin-guide/syscall-user-dispatch.rst
new file mode 100644
index 000000000000..96616660fded
--- /dev/null
+++ b/Documentation/admin-guide/syscall-user-dispatch.rst
@@ -0,0 +1,87 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Syscall User Dispatch
+=====================
+
+Background
+----------
+
+Compatibility layers like Wine need a way to efficiently emulate system
+calls of only a part of their process - the part that has the
+incompatible code - while being able to execute native syscalls without
+a high performance penalty on the native part of the process. Seccomp
+falls short on this task, since it has limited support for efficiently
+filtering syscalls based on memory regions, and it doesn't support
+removing filters. Therefore, a new mechanism is necessary.
+
+Syscall User Dispatch brings the filtering of the syscall dispatcher
+address back to userspace. The application is in control of a flip
+switch, indicating the current personality of the process. A
+multiple-personality application can then flip the switch without
+invoking the kernel, when crossing the compatibility layer API
+boundaries, to enable/disable the syscall redirection and execute
+syscalls directly (disabled) or send them to be emulated in userspace
+through a SIGSYS.
+
+The goal of this design is to provide very quick compatibility layer
+boundary crossings, which is achieved by not executing a syscall to
+change personality every time the compatibility layer executes.
+Instead, a userspace memory region exposed to the kernel indicates the
+current personality, and the application simply modifies that variable
+to configure the mechanism.
+
+There is a relatively high cost associated with handling signals on most
+architectures, like x86, but at least for Wine, syscalls issued by
+native Windows code are currently not known to be a performance problem,
+since they are quite rare, at least for modern gaming applications.
+
+Since this mechanism is designed to capture syscalls issued by
+non-native applications, it must function on syscalls whose invocation
+ABI is completely unexpected to Linux. Syscall User Dispatch therefore
+doesn't rely on any part of the syscall ABI to do the filtering. It
+uses only the syscall dispatcher address and the userspace selector.
+
+Interface
+---------
+
+A process can set up this mechanism on supported kernels
+(CONFIG_SYSCALL_USER_DISPATCH) by executing the following prctl:
+
+ prctl(PR_SET_SYSCALL_USER_DISPATCH, <op>, <start_addr>, <end_addr>, [selector])
+
+<op> is either PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF, to enable or
+disable the mechanism globally for that thread. When
+PR_SYS_DISPATCH_OFF is used, the other arguments must be zero.
+
+<start_addr> and <end_addr> delimit a closed memory region interval from
+which syscalls are always executed directly, regardless of the userspace
+selector. This provides a fast path for the C library, which includes
+the most common syscall dispatchers in the native code applications, and
+also provides a way for the signal handler to return without triggering
+a nested SIGSYS on (rt_)sigreturn. Users of this interface should make
+sure that at least the signal trampoline code is included in this
+region. In addition, for architectures that implement the trampoline
+code in the vDSO, that trampoline is never intercepted.
+
+[selector] is a pointer to a char-sized region in the process memory
+that provides a quick way to enable or disable syscall redirection
+thread-wide, without the need to invoke the kernel directly. selector
+can be set to PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF. Any other
+value should terminate the program with a SIGSYS.
+
+Security Notes
+--------------
+
+Syscall User Dispatch provides functionality for compatibility layers to
+quickly capture system calls issued by a non-native part of the
+application, while not impacting the Linux native regions of the
+process. It is not a mechanism for sandboxing system calls, and it
+should not be seen as a security mechanism, since it is trivial for a
+malicious application to subvert the mechanism by jumping to an allowed
+dispatcher region prior to executing the syscall, or to discover the
+address and modify the selector value. If the use case requires any
+kind of security sandboxing, Seccomp should be used instead.
+
+Any fork or exec of the existing process resets the mechanism to
+PR_SYS_DISPATCH_OFF.
--
2.28.0