* Re: [PATCH V9 13/24] LoongArch: Add system call support
[not found] ` <CAAhV-H4te_+AS69viO4eBz=abBUm5oQ6AfoY1Cb+nOCZyyeMdA@mail.gmail.com>
@ 2022-04-30 10:34 ` Arnd Bergmann
2022-05-07 12:11 ` Christian Brauner
0 siblings, 1 reply; 8+ messages in thread
From: Arnd Bergmann @ 2022-04-30 10:34 UTC
To: Huacai Chen
Cc: Arnd Bergmann, Huacai Chen, Andy Lutomirski, Thomas Gleixner,
Peter Zijlstra, Andrew Morton, David Airlie, Jonathan Corbet,
Linus Torvalds, linux-arch, open list:DOCUMENTATION,
Linux Kernel Mailing List, Xuefeng Li, Yanteng Si, Guo Ren,
Xuerui Wang, Jiaxun Yang, Christian Brauner, Linux API
On Sat, Apr 30, 2022 at 12:05 PM Huacai Chen <chenhuacai@gmail.com> wrote:
> On Sat, Apr 30, 2022 at 5:45 PM Arnd Bergmann <arnd@arndb.de> wrote:
> > On Sat, Apr 30, 2022 at 11:05 AM Huacai Chen <chenhuacai@loongson.cn> wrote:
> > >
> > > This patch adds system call support and related uaccess.h for LoongArch.
> > >
> > > Q: Why keep __ARCH_WANT_NEW_STAT definition while there is statx:
> > > A: Until the latest glibc release (2.34), statx is only used for 32-bit
> > > platforms, or 64-bit platforms with 32-bit timestamp. I.e., most
> > > 64-bit platforms still use newstat now.
> > >
> > > Q: Why keep __ARCH_WANT_SYS_CLONE definition while there is clone3:
> > > A: The latest glibc release (2.34) has some basic support for clone3 but
> > > it isn't complete. E.g., pthread_create() and spawni() have converted
> > > to use clone3 but fork() will still use clone. Moreover, some seccomp
> > > related applications can still not work perfectly with clone3. E.g.,
> > > Chromium sandbox cannot work at all and there is no solution for it,
> > > which is more terrible than the fork() story [1].
> > >
> > > [1] https://chromium-review.googlesource.com/c/chromium/src/+/2936184
> >
> > I still think these have to be removed. There is no mainline glibc or musl
> > port yet, and neither of them should actually be required. Please remove
> > them here, and modify your libc patches accordingly when you send those
> > upstream.
>
> If this is just a problem that can be resolved by upgrading
> glibc/musl, I will remove them. But the Chromium problem (or sandbox
> problem in general) seems to have no solution now.
I added Christian Brauner to Cc now, maybe he has come across the
sandbox problem before and has an idea for a solution.
Arnd
* Re: [PATCH V9 13/24] LoongArch: Add system call support
2022-04-30 10:34 ` [PATCH V9 13/24] LoongArch: Add system call support Arnd Bergmann
@ 2022-05-07 12:11 ` Christian Brauner
2022-05-09 10:00 ` Christian Brauner
0 siblings, 1 reply; 8+ messages in thread
From: Christian Brauner @ 2022-05-07 12:11 UTC
To: Arnd Bergmann
Cc: Huacai Chen, Huacai Chen, Andy Lutomirski, Thomas Gleixner,
Peter Zijlstra, Andrew Morton, David Airlie, Jonathan Corbet,
Linus Torvalds, linux-arch, open list:DOCUMENTATION,
Linux Kernel Mailing List, Xuefeng Li, Yanteng Si, Guo Ren,
Xuerui Wang, Jiaxun Yang, Linux API
On Sat, Apr 30, 2022 at 12:34:52PM +0200, Arnd Bergmann wrote:
> On Sat, Apr 30, 2022 at 12:05 PM Huacai Chen <chenhuacai@gmail.com> wrote:
> > On Sat, Apr 30, 2022 at 5:45 PM Arnd Bergmann <arnd@arndb.de> wrote:
> > > On Sat, Apr 30, 2022 at 11:05 AM Huacai Chen <chenhuacai@loongson.cn> wrote:
> > > >
> > > > This patch adds system call support and related uaccess.h for LoongArch.
> > > >
> > > > Q: Why keep __ARCH_WANT_NEW_STAT definition while there is statx:
> > > > A: Until the latest glibc release (2.34), statx is only used for 32-bit
> > > > platforms, or 64-bit platforms with 32-bit timestamp. I.e., most
> > > > 64-bit platforms still use newstat now.
> > > >
> > > > Q: Why keep __ARCH_WANT_SYS_CLONE definition while there is clone3:
> > > > A: The latest glibc release (2.34) has some basic support for clone3 but
> > > > it isn't complete. E.g., pthread_create() and spawni() have converted
> > > > to use clone3 but fork() will still use clone. Moreover, some seccomp
> > > > related applications can still not work perfectly with clone3. E.g.,
> > > > Chromium sandbox cannot work at all and there is no solution for it,
> > > > which is more terrible than the fork() story [1].
> > > >
> > > > [1] https://chromium-review.googlesource.com/c/chromium/src/+/2936184
> > >
> > > I still think these have to be removed. There is no mainline glibc or musl
> > > port yet, and neither of them should actually be required. Please remove
> > > them here, and modify your libc patches accordingly when you send those
> > > upstream.
> >
> > If this is just a problem that can be resolved by upgrading
> > glibc/musl, I will remove them. But the Chromium problem (or sandbox
> > problem in general) seems to have no solution now.
>
> I added Christian Brauner to Cc now, maybe he has come across the
> sandbox problem before and has an idea for a solution.
(I just got back from LSFMM so I'll reply in more detail next week. I'm
still pretty jet-lagged.)
* Re: [PATCH V9 13/24] LoongArch: Add system call support
2022-05-07 12:11 ` Christian Brauner
@ 2022-05-09 10:00 ` Christian Brauner
2022-05-11 7:11 ` Arnd Bergmann
2022-05-11 16:17 ` Florian Weimer
0 siblings, 2 replies; 8+ messages in thread
From: Christian Brauner @ 2022-05-09 10:00 UTC
To: Arnd Bergmann
Cc: Huacai Chen, Huacai Chen, Andy Lutomirski, Thomas Gleixner,
Peter Zijlstra, Andrew Morton, David Airlie, Jonathan Corbet,
Linus Torvalds, linux-arch, open list:DOCUMENTATION,
Linux Kernel Mailing List, Xuefeng Li, Yanteng Si, Guo Ren,
Xuerui Wang, Jiaxun Yang, Linux API
On Sat, May 07, 2022 at 02:11:04PM +0200, Christian Brauner wrote:
> On Sat, Apr 30, 2022 at 12:34:52PM +0200, Arnd Bergmann wrote:
> > On Sat, Apr 30, 2022 at 12:05 PM Huacai Chen <chenhuacai@gmail.com> wrote:
> > > On Sat, Apr 30, 2022 at 5:45 PM Arnd Bergmann <arnd@arndb.de> wrote:
> > > > On Sat, Apr 30, 2022 at 11:05 AM Huacai Chen <chenhuacai@loongson.cn> wrote:
> > > > >
> > > > > This patch adds system call support and related uaccess.h for LoongArch.
> > > > >
> > > > > Q: Why keep __ARCH_WANT_NEW_STAT definition while there is statx:
> > > > > A: Until the latest glibc release (2.34), statx is only used for 32-bit
> > > > > platforms, or 64-bit platforms with 32-bit timestamp. I.e., most
> > > > > 64-bit platforms still use newstat now.
> > > > >
> > > > > Q: Why keep __ARCH_WANT_SYS_CLONE definition while there is clone3:
> > > > > A: The latest glibc release (2.34) has some basic support for clone3 but
> > > > > it isn't complete. E.g., pthread_create() and spawni() have converted
> > > > > to use clone3 but fork() will still use clone. Moreover, some seccomp
> > > > > related applications can still not work perfectly with clone3. E.g.,
> > > > > Chromium sandbox cannot work at all and there is no solution for it,
> > > > > which is more terrible than the fork() story [1].
> > > > >
> > > > > [1] https://chromium-review.googlesource.com/c/chromium/src/+/2936184
> > > >
> > > > I still think these have to be removed. There is no mainline glibc or musl
> > > > port yet, and neither of them should actually be required. Please remove
> > > > them here, and modify your libc patches accordingly when you send those
> > > > upstream.
> > >
> > > If this is just a problem that can be resolved by upgrading
> > > glibc/musl, I will remove them. But the Chromium problem (or sandbox
> > > problem in general) seems to have no solution now.
> >
> > I added Christian Brauner to Cc now, maybe he has come across the
> > sandbox problem before and has an idea for a solution.
>
> (I just got back from LSFMM so I'll reply in more detail next week. I'm
> still pretty jet-lagged.)
Right, I forgot about the EPERM/ENOSYS sandbox thread.
Kees and I gave a talk about this problem at LPC 2019 (see [2]). The
proposed solution back then was to add basic deep argument inspection
for first-level pointers to seccomp.
There are problems with this approach: it is not usable on
second-level pointers (although we concluded that's ok), and if the input
args are very large, copying them from within seccomp becomes rather
costly. In general, the various approaches seemed handwavy at the
time.
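To make "deep argument inspection for first-level pointers" concrete, here is a minimal sketch in Python. It is a toy model, not kernel code. It assumes the first 64 bytes of struct clone_args are eight 64-bit fields with flags first (as in the uapi header, to my understanding) and a little-endian layout. The helper names snapshot_first_level and inspect_flags are hypothetical.

```python
import struct

# Assumed layout of the first 64 bytes of struct clone_args:
# flags, pidfd, child_tid, parent_tid, exit_signal, stack, stack_size, tls
CLONE_ARGS_V0 = struct.Struct("<8Q")
CLONE_NEWUSER = 0x10000000


def snapshot_first_level(user_memory, size):
    """Copy `size` bytes once, the way a kernel-side snapshot would."""
    if size < CLONE_ARGS_V0.size or size > len(user_memory):
        raise ValueError("unsupported clone_args size")
    return bytes(user_memory[:size])  # private copy, later writes do not affect it


def inspect_flags(snapshot):
    """Read the flags field from the snapshot, never from user memory."""
    return CLONE_ARGS_V0.unpack_from(snapshot)[0]
```

The cost concern is visible here: the copy is proportional to the size the caller passes, and any pointer stored inside the struct (a second-level pointer) would still be an opaque number in the snapshot.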
If seccomp were to be made to support some basic form of eBPF such that
it can still be safely called by unprivileged users then this would
likely be easier to do (famous last words), but given that the stance has
traditionally been to not port seccomp to eBPF it remains a tricky patch.
Some time after that I talked to Mathieu Desnoyers about this issue, who
used another angle of attack. The idea seems less complicated to me.
Instead of argument inspection we introduce basic syscall argument
checksumming for seccomp. It would only be done when seccomp is
interested in syscall input args and checksumming would be per syscall
argument. It would be validated within the syscall when it actually
reads the arguments; again, only if seccomp is used. If the checksums
mismatch, an error is returned or the calling process is terminated.
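The two-stage check can be sketched as follows. This is only an illustration of the idea: the thread does not name a checksum algorithm, so zlib.crc32 is a stand-in, and seccomp_stage, syscall_stage and ChecksumMismatch are hypothetical names.

```python
import zlib


class ChecksumMismatch(Exception):
    """Raised when an argument changed between filtering and use."""


def seccomp_stage(arg_buffers):
    """Filter time: record one checksum per inspected syscall argument."""
    return [zlib.crc32(buf) for buf in arg_buffers]


def syscall_stage(arg_buffers, recorded):
    """Syscall time: re-check each argument when it is actually read."""
    for index, (buf, expected) in enumerate(zip(arg_buffers, recorded)):
        if zlib.crc32(buf) != expected:
            raise ChecksumMismatch(f"argument {index} changed after filtering")
    return "proceed"
```

In this model a buffer that another thread rewrites between the two stages fails the second check, which is the TOCTOU window the checksum is meant to close.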
There's one case that deserves mentioning: since we introduced the
seccomp notifier we do allow advanced syscall interception and we do use
it extensively in various projects.
Roughly, it works by allowing a userspace process (the "supervisor") to
listen on a seccomp fd. The seccomp fd is an fd referring to the filter
of a target task (the "supervisee"). When the supervisee performs a
syscall listed in the seccomp notify filter the supervisor will receive
a notification on the seccomp fd for the filter.
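As a rough model of that flow (not the real seccomp notify API), the toy class below plays the role of the seccomp fd. The class and method names are hypothetical, and the real interface uses ioctl() calls on the fd rather than Python methods.

```python
from collections import deque


class ToyNotifyFd:
    """Stand-in for the seccomp fd shared by the filter and the supervisor."""

    def __init__(self, notified_syscalls):
        self.notified = set(notified_syscalls)  # syscalls listed in the notify filter
        self.pending = deque()                  # notifications not yet received
        self.next_id = 1

    def supervisee_syscall(self, name, args):
        """Supervisee side: queue a notification if the syscall is listed."""
        if name not in self.notified:
            return None                         # not intercepted, runs normally
        note = {"id": self.next_id, "syscall": name, "args": args}
        self.next_id += 1
        self.pending.append(note)
        return note["id"]

    def supervisor_receive(self):
        """Supervisor side: fetch the oldest pending notification, if any."""
        return self.pending.popleft() if self.pending else None
```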
I mention this because it is possible for the supervisor to e.g.
intercept a bpf() system call and then modify/create/attach a bpf
program for the supervisee and then update fields in the supervisee's
bpf struct that was passed to the bpf() syscall by it. So the supervisor
might rewrite syscall args and continue the syscall. (In general, it's
not recommended because of TOCTOU, but it is still doable in certain
scenarios where we can guarantee that this is safe even if syscall args
are rewritten to something else by a MITM attack.)
Arguably, the checksumming approach could even be made to work with this
if the seccomp fd learns a new ioctl() or similar to safely update the
checksum.
I can try and move a poc for this up the todo list.
Without an approach like this certain sandboxes will fall back to
ENOSYSing system calls they can't filter. This is a generic problem,
though, with clone3() being one prominent example.
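The reason ENOSYS is the chosen errno can be sketched from the caller's side. The model below assumes a libc-style wrapper that tries clone3 first and falls back to clone only on ENOSYS. The function create_thread and the dictionary of callables are hypothetical stand-ins for real syscalls.

```python
import errno


def create_thread(syscalls):
    """Try clone3 first; fall back to clone only when clone3 reports ENOSYS."""
    result = syscalls["clone3"]()
    if result != -errno.ENOSYS:
        return ("clone3", result)
    legacy = syscalls.get("clone")
    if legacy is None:
        # Architecture without clone(): nothing left to fall back to.
        return ("none", -errno.ENOSYS)
    return ("clone", legacy())
```

With a sandbox that answers clone3 with ENOSYS, the second branch is what an architecture lacking clone() would hit, which is the LoongArch concern raised earlier in the thread.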
[2]: https://www.youtube.com/watch?v=PnOSPsRzVYM&list=PLVsQ_xZBEyN2Ol7y8axxhbTsG47Va3Se2
* Re: [PATCH V9 13/24] LoongArch: Add system call support
2022-05-09 10:00 ` Christian Brauner
@ 2022-05-11 7:11 ` Arnd Bergmann
2022-05-11 21:12 ` [musl] " Rich Felker
2022-05-11 16:17 ` Florian Weimer
1 sibling, 1 reply; 8+ messages in thread
From: Arnd Bergmann @ 2022-05-11 7:11 UTC
To: Christian Brauner
Cc: Arnd Bergmann, Huacai Chen, Huacai Chen, Andy Lutomirski,
Thomas Gleixner, Peter Zijlstra, Andrew Morton, David Airlie,
Jonathan Corbet, Linus Torvalds, linux-arch,
open list:DOCUMENTATION, Linux Kernel Mailing List, Xuefeng Li,
Yanteng Si, Guo Ren, Xuerui Wang, Jiaxun Yang, Linux API,
GNU C Library, musl
On Mon, May 9, 2022 at 12:00 PM Christian Brauner <brauner@kernel.org> wrote:
....
> I can try and move a poc for this up the todo list.
>
> Without an approach like this certain sandboxes will fall back to
> ENOSYSing system calls they can't filter. This is a generic problem,
> though, with clone3() being one prominent example.
Thank you for the detailed reply. It sounds to me like this will eventually have
to get solved anyway, so we could move ahead without clone() on loongarch,
and just not have support for Chrome until this is fully solved.
As both the glibc and musl ports are being proposed for inclusion right
now, we should try to come to a decision so the libc ports can adjust if
necessary. Adding both mailing lists to Cc here, the discussion is archived
at [1].
Arnd
[1] https://lore.kernel.org/linux-arch/20220509100058.vmrgn5fkk3ayt63v@wittgenstein/
* Re: [PATCH V9 13/24] LoongArch: Add system call support
2022-05-09 10:00 ` Christian Brauner
2022-05-11 7:11 ` Arnd Bergmann
@ 2022-05-11 16:17 ` Florian Weimer
1 sibling, 0 replies; 8+ messages in thread
From: Florian Weimer @ 2022-05-11 16:17 UTC
To: Christian Brauner
Cc: Arnd Bergmann, Huacai Chen, Huacai Chen, Andy Lutomirski,
Thomas Gleixner, Peter Zijlstra, Andrew Morton, David Airlie,
Jonathan Corbet, Linus Torvalds, linux-arch,
open list:DOCUMENTATION, Linux Kernel Mailing List, Xuefeng Li,
Yanteng Si, Guo Ren, Xuerui Wang, Jiaxun Yang, Linux API
* Christian Brauner:
> Without an approach like this certain sandboxes will fall back to
> ENOSYSing system calls they can't filter. This is a generic problem,
> though, with clone3() being one prominent example.
Furthermore, for glibc (and I believe musl as well), the trick with
in-process emulation of clone3 using SIGSYS does not work here because
we must inhibit delivery of signals on the nascent thread, before it is
fully set up. This means that we have to block signals around the
clone/clone3 system call, so that the new thread is created with all
signals blocked. This means that instead of calling the SIGSYS handler,
the filtered system call simply terminates the process.
(I think there have been discussions of using out-of-process filtering,
but I don't know where we are with that.)
Thanks,
Florian
* Re: [musl] Re: [PATCH V9 13/24] LoongArch: Add system call support
2022-05-11 7:11 ` Arnd Bergmann
@ 2022-05-11 21:12 ` Rich Felker
2022-05-12 7:21 ` Arnd Bergmann
0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2022-05-11 21:12 UTC
To: Arnd Bergmann
Cc: Christian Brauner, Huacai Chen, Huacai Chen, Andy Lutomirski,
Thomas Gleixner, Peter Zijlstra, Andrew Morton, David Airlie,
Jonathan Corbet, Linus Torvalds, linux-arch,
open list:DOCUMENTATION, Linux Kernel Mailing List, Xuefeng Li,
Yanteng Si, Guo Ren, Xuerui Wang, Jiaxun Yang, Linux API,
GNU C Library, musl
On Wed, May 11, 2022 at 09:11:56AM +0200, Arnd Bergmann wrote:
> On Mon, May 9, 2022 at 12:00 PM Christian Brauner <brauner@kernel.org> wrote:
> .....
> > I can try and move a poc for this up the todo list.
> >
> > Without an approach like this certain sandboxes will fall back to
> > ENOSYSing system calls they can't filter. This is a generic problem,
> > though, with clone3() being one prominent example.
>
> Thank you for the detailed reply. It sounds to me like this will eventually have
> to get solved anyway, so we could move ahead without clone() on loongarch,
> and just not have support for Chrome until this is fully solved.
>
> As both the glibc and musl ports are being proposed for inclusion right
> now, we should try to come to a decision so the libc ports can adjust if
> necessary. Adding both mailing lists to Cc here, the discussion is archived
> at [1].
>
> Arnd
>
> [1] https://lore.kernel.org/linux-arch/20220509100058.vmrgn5fkk3ayt63v@wittgenstein/
Having read about the seccomp issue, I think it's a very strong
argument that __NR_clone should be kept permanently for all future
archs. Otherwise, at least AIUI, it's impossible to seccomp-sandbox
multithreaded programs (since you can't allow the creation of threads
without also allowing other unwanted use of clone3). It sounds like
there's some interest in extending seccomp to allow filtering of
argument blocks like clone3 uses, but some of what I read about was
checksum-based (thus a weak hardening measure at best, not a hard
privilege boundary) and even if something is eventually created that
works, it won't be available right away, and it won't be nearly as
easy to use as just allowing thread-creating clone syscalls on
existing archs.
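The asymmetry can be sketched with a toy filter. It assumes, as with classic seccomp filters, that only the syscall identity and the raw register values are visible. SeccompData and toy_filter are hypothetical names, and the syscall is identified by a string here where the real struct carries a number.

```python
from dataclasses import dataclass

CLONE_VM = 0x00000100
CLONE_THREAD = 0x00010000
CLONE_NEWUSER = 0x10000000


@dataclass(frozen=True)
class SeccompData:
    """Toy stand-in for what the filter is handed for one syscall."""
    nr: str      # syscall name in this model
    args: tuple  # six raw register values; pointers are not dereferenced


def toy_filter(data):
    """Decide using register values only, as a classic filter must."""
    if data.nr == "clone":
        flags = data.args[0]  # flags are passed by value for clone()
        return "EPERM" if flags & CLONE_NEWUSER else "ALLOW"
    if data.nr == "clone3":
        # args[0] is only an address; the flags live in user memory.
        return "ENOSYS"
    return "ALLOW"
```

For clone() the filter can allow thread creation and still reject a new user namespace; for clone3() it has nothing to distinguish the two, so it can only allow or refuse everything.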
Rich
* Re: [musl] Re: [PATCH V9 13/24] LoongArch: Add system call support
2022-05-11 21:12 ` [musl] " Rich Felker
@ 2022-05-12 7:21 ` Arnd Bergmann
2022-05-12 12:11 ` Rich Felker
0 siblings, 1 reply; 8+ messages in thread
From: Arnd Bergmann @ 2022-05-12 7:21 UTC
To: musl
Cc: Arnd Bergmann, Christian Brauner, Huacai Chen, Huacai Chen,
Andy Lutomirski, Thomas Gleixner, Peter Zijlstra, Andrew Morton,
David Airlie, Jonathan Corbet, Linus Torvalds, linux-arch,
open list:DOCUMENTATION, Linux Kernel Mailing List, Xuefeng Li,
Yanteng Si, Guo Ren, Xuerui Wang, Jiaxun Yang, Linux API,
GNU C Library
On Wed, May 11, 2022 at 11:12 PM Rich Felker <dalias@libc.org> wrote:
> On Wed, May 11, 2022 at 09:11:56AM +0200, Arnd Bergmann wrote:
> > On Mon, May 9, 2022 at 12:00 PM Christian Brauner <brauner@kernel.org> wrote:
> > .....
> > > I can try and move a poc for this up the todo list.
> > >
> > > Without an approach like this certain sandboxes will fall back to
> > > ENOSYSing system calls they can't filter. This is a generic problem,
> > > though, with clone3() being one prominent example.
> >
> > Thank you for the detailed reply. It sounds to me like this will eventually have
> > to get solved anyway, so we could move ahead without clone() on loongarch,
> > and just not have support for Chrome until this is fully solved.
> >
> > As both the glibc and musl ports are being proposed for inclusion right
> > now, we should try to come to a decision so the libc ports can adjust if
> > necessary. Adding both mailing lists to Cc here, the discussion is archived
> > at [1].
> >
> > Arnd
> >
> > [1] https://lore.kernel.org/linux-arch/20220509100058.vmrgn5fkk3ayt63v@wittgenstein/
>
> Having read about the seccomp issue, I think it's a very strong
> argument that __NR_clone should be kept permanently for all future
> archs.
Ok, let's keep clone() around for all architectures then. We should probably
just remove the __ARCH_WANT_SYS_CLONE macro and build the
code into the kernel unconditionally, but at the moment there
are still private versions for ia64 and sparc with the same name as
the generic version. Both are also still lacking support for clone3() and
don't have anyone actively working on them.
In this case, we probably don't need to change clone3() to allow the
zero-length stack after all, and the wrapper that was added to the
musl port should get removed again.
For the other syscalls, I think the latest musl patches already dropped
the old-style stat() implementation, but the glibc patches still have those
and need to drop them as well to match what the kernel will get.
Arnd
* Re: [musl] Re: [PATCH V9 13/24] LoongArch: Add system call support
2022-05-12 7:21 ` Arnd Bergmann
@ 2022-05-12 12:11 ` Rich Felker
0 siblings, 0 replies; 8+ messages in thread
From: Rich Felker @ 2022-05-12 12:11 UTC
To: Arnd Bergmann
Cc: musl, Christian Brauner, Huacai Chen, Huacai Chen,
Andy Lutomirski, Thomas Gleixner, Peter Zijlstra, Andrew Morton,
David Airlie, Jonathan Corbet, Linus Torvalds, linux-arch,
open list:DOCUMENTATION, Linux Kernel Mailing List, Xuefeng Li,
Yanteng Si, Guo Ren, Xuerui Wang, Jiaxun Yang, Linux API,
GNU C Library
On Thu, May 12, 2022 at 09:21:13AM +0200, Arnd Bergmann wrote:
> On Wed, May 11, 2022 at 11:12 PM Rich Felker <dalias@libc.org> wrote:
> > On Wed, May 11, 2022 at 09:11:56AM +0200, Arnd Bergmann wrote:
> > > On Mon, May 9, 2022 at 12:00 PM Christian Brauner <brauner@kernel.org> wrote:
> > > .....
> > > > I can try and move a poc for this up the todo list.
> > > >
> > > > Without an approach like this certain sandboxes will fall back to
> > > > ENOSYSing system calls they can't filter. This is a generic problem,
> > > > though, with clone3() being one prominent example.
> > >
> > > Thank you for the detailed reply. It sounds to me like this will eventually have
> > > to get solved anyway, so we could move ahead without clone() on loongarch,
> > > and just not have support for Chrome until this is fully solved.
> > >
> > > As both the glibc and musl ports are being proposed for inclusion right
> > > now, we should try to come to a decision so the libc ports can adjust if
> > > necessary. Adding both mailing lists to Cc here, the discussion is archived
> > > at [1].
> > >
> > > Arnd
> > >
> > > [1] https://lore.kernel.org/linux-arch/20220509100058.vmrgn5fkk3ayt63v@wittgenstein/
> >
> > Having read about the seccomp issue, I think it's a very strong
> > argument that __NR_clone should be kept permanently for all future
> > archs.
>
> Ok, let's keep clone() around for all architectures then. We should probably
> just remove the __ARCH_WANT_SYS_CLONE macro and build the
> code into the kernel unconditionally, but at the moment there
> are still private versions for ia64 and sparc with the same name as
> the generic version. Both are also still lacking support for clone3() and
> don't have anyone actively working on them.
>
> In this case, we probably don't need to change clone3() to allow the
> zero-length stack after all, and the wrapper that was added to the
> musl port should get removed again.
I still think disallowing a zero length (unknown length with caller
providing the start address only) stack is a gratuitous limitation on
the clone3 interface, and would welcome leaving the change to allow
zero-length in place. There does not seem to be any good justification
for forbidding it, and it does pose other real-world obstructions to
use. For example, if your main thread had exited (or if you're forking
from a non-main thread) and you wanted to create a new process using
the old main thread stack as your stack, you would not know a
size/lowest-address, only a starting address from which it extends
some long (and possibly expanding) amount.
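The limitation can be sketched with a small model of the stack check. It assumes the both-or-neither rule for the stack and stack_size fields, and a downward-growing stack, as I understand the kernel behaved around the time of this thread. That rule is an assumption of the sketch, not something quoted above, and initial_stack_pointer is a hypothetical helper.

```python
def clone3_stack_valid(stack, stack_size):
    """Model of the both-or-neither rule for clone3's stack fields."""
    if stack == 0:
        return stack_size == 0  # no stack given: a size must not be given either
    return stack_size != 0      # stack given: a non-zero size is required


def initial_stack_pointer(stack, stack_size):
    """With a downward-growing stack the child starts at the top of the range."""
    if not clone3_stack_valid(stack, stack_size):
        raise ValueError("EINVAL")
    return stack + stack_size
```

A caller that only knows the starting address would want to pass that address with a size of zero, which this rule rejects.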
Rich
Thread overview: 8+ messages
[not found] <20220430090518.3127980-1-chenhuacai@loongson.cn>
[not found] ` <20220430090518.3127980-14-chenhuacai@loongson.cn>
[not found] ` <CAK8P3a0A9dW4mwJ6JHDiJxizL7vWfr4r4c5KhbjtAY0sWbZJVA@mail.gmail.com>
[not found] ` <CAAhV-H4te_+AS69viO4eBz=abBUm5oQ6AfoY1Cb+nOCZyyeMdA@mail.gmail.com>
2022-04-30 10:34 ` [PATCH V9 13/24] LoongArch: Add system call support Arnd Bergmann
2022-05-07 12:11 ` Christian Brauner
2022-05-09 10:00 ` Christian Brauner
2022-05-11 7:11 ` Arnd Bergmann
2022-05-11 21:12 ` [musl] " Rich Felker
2022-05-12 7:21 ` Arnd Bergmann
2022-05-12 12:11 ` Rich Felker
2022-05-11 16:17 ` Florian Weimer