From: Hajime Tazaki <thehajime@gmail.com>
To: johannes@sipsolutions.net
Cc: linux-um@lists.infradead.org, ricarkol@google.com,
Liam.Howlett@oracle.com, gerg@linux-m68k.org,
geert@linux-m68k.org, dalias@libc.org
Subject: Re: [RFC PATCH v2 00/13] nommu UML
Date: Fri, 15 Nov 2024 23:48:03 +0900
Message-ID: <m27c94ft98.wl-thehajime@gmail.com>
In-Reply-To: <c85bd71e0f4797edf2635cf4059699d1a3862218.camel@sipsolutions.net>
Hello Johannes,
# added Geert, Greg, Rich to Cc (sorry for the noise)
# here is the original email of this thread: just in case.
# https://lore.kernel.org/linux-um/cover.1731290567.git.thehajime@gmail.com/
On Fri, 15 Nov 2024 19:12:39 +0900,
Johannes Berg wrote:
>
> On Mon, 2024-11-11 at 15:27 +0900, Hajime Tazaki wrote:
> > This is a series of patches of nommu arch addition to UML. It would
> > be nice to ask comments/opinions on this.
>
> So I've been thinking about this for a while now...
Thank you for your time!
> To be clear, I'm not really _against_ it. With around 1200 lines of
> code, it really isn't even big. But I also don't know how brittle it is?
> Testing it is made somewhat difficult with the map-at-zero requirement
> too.
Given that CI/testing facilities nowadays run on VMs, setting
/proc/sys/vm/mmap_min_addr to 0 in order to test this feature is not
so difficult.
> And really I keep coming back to asking myself what the use case is?
>
> Is it to test something for no-MMU platforms more easily? But I'm not
> sure what that would be? Have any no-MMU platform maintainers weighed in
> on this, have they even _seen_ it? Is that interesting? Is it more
> interesting than testing an emulated system with the right architecture?
Let me describe one recent experience as a use case.
While developing this patch series, I spotted (and fixed; the fix is
now in Linus' tree) an issue in the vma subsystem's use of the
maple-tree library. There is a (slightly) long thread discussing it
with the maple-tree maintainer, Liam (below).
- traversing vma on nommu
https://lists.infradead.org/pipermail/maple-tree/2024-November/003740.html
Bisecting showed that I can reproduce the issue from v6.12-rc1 onward,
but it never showed up on the other nommu archs (we tested m68k and
riscv, both with buildroot on qemu). Maybe because I'm more familiar
with nommu UML than with m68k/riscv qemu, I could comfortably
reproduce/debug/test what was going on with gdb, and finally proposed
a fix (a one-liner patch).
- the patch (I hope it lands in the 6.12 release)
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=247d720b2c5d22f7281437fd6054a138256986ba
This is just one example of its usefulness. I believe you can imagine
that the same thing can also happen with regular (MMU) UML.
I also privately run a CI test to verify that my patches don't break
MMU UML: a simple boot test (static/dynamic), 12 kunit tests in the
kernel tree, basic benchmarks with lmbench, etc. This is not specific
to nommu UML, though.
https://github.com/thehajime/linux/actions/runs/11811327291
# The above URL may expire in the future.
> With it this way you'd probably have to build the right libraries and
> binaries for x86-64 no-MMU, does such a thing already exist somewhere?
I'm preparing patches for upstream Alpine Linux so that such binaries
become available in an appropriate way. Note that I didn't modify the
code of the programs themselves (except for one clear bug); I just
build them with the NOMMU option, which is already implemented in
busybox/musl-libc.
https://gitlab.alpinelinux.org/thehajime/aports/-/merge_requests/2/diffs
I have not contacted the upstream developers yet, so this diff might change.
> It also doesn't look like it's meant to replace LKL? But even LKL I
> don't really know - are people using it, and if so what for? Seems
> lklfuse is a thing for some BSD folks?
>
> Is there something else to use it for?
This patchset is independent of, and unrelated to, LKL.
# you may confuse that I've still been working on LKL.
(off topic)
lklfuse is indeed used by FreeBSD, but it is not well maintained (afaik).
NixOS (a Linux distribution) also uses lklfuse, iirc.
> If it's the first (test no-MMU) then it probably should be smarter about
> not really relying on retpoline.
# I assume s/retpoline/zpoline/ in the rest of your message.
> Why is the focus so much on that
> anyway? If testing no-MMU was the most important thing then probably
> you'd have started with seccomp, and actually execute the syscalls from
> that, to not have all those restrictions that come from rewriting
> binaries, rather than ignoring the whole thing.
For the JIT part (and also syscalls from dlopen-ed binaries), as I
mentioned in the other reply, it can be implemented, but it is not
implemented yet.
The choice of zpoline is based on the speed of syscall invocation.
We have found that seccomp (and similar mechanisms such as SUD
(syscall user dispatch), ptrace, and int3 signaling) are still slower
than binary rewriting, because their mechanisms rely on signal
delivery. LD_PRELOAD with symbol rewrites is faster (even than binary
rewriting) but fundamentally cannot hook all syscalls.
zpoline tries to fill this gap, and we thought it fits the UML use
case.
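
To make the binary-rewriting idea concrete, here is a minimal sketch
of the core substitution as zpoline is publicly described: the 2-byte
`syscall` instruction (0f 05) is replaced in place by the equally
sized `call *%rax` (ff d0). Since %rax holds the syscall number at
that point, the call lands at a small address near zero, where a
trampoline is waiting, and no signal delivery is involved. This is not
the code from the patch series; the function name rewrite_syscalls is
hypothetical, and the naive byte scan is an assumption that only holds
for a buffer of pure instruction bytes.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Replace every `syscall` (0f 05) in buf with `call *%rax` (ff d0).
 * Returns the number of sites patched.
 *
 * Assumption: every 0f 05 pair in buf is a real syscall instruction.
 * A real rewriter has to disassemble the text section, because the
 * same byte pair can occur inside another instruction or in data.
 */
static size_t rewrite_syscalls(uint8_t *buf, size_t len)
{
	size_t patched = 0;

	for (size_t i = 0; i + 1 < len; i++) {
		if (buf[i] == 0x0f && buf[i + 1] == 0x05) {
			buf[i] = 0xff;
			buf[i + 1] = 0xd0;
			patched++;
			i++;	/* skip the second byte of the patched pair */
		}
	}
	return patched;
}
```

Because both encodings are two bytes long, no surrounding code has to
be moved, which is what makes the in-place rewrite practical.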
> Though of course you did
> add a filter now, but I think it'll just crash?
This part (just crashing w/ SIGSYS) can be improved.
> So I could perhaps see this use case, but then I'd probably think it
> should be more generic (i.e. able to execute all no-MMU binaries
> including ones that may be using JIT compilation etc.) and not _require_
> retpoline, but rather use it as an optimisation where that's possible
> (i.e. if you can map at zero)?
I understand your point.
> If the use case instead of more LKL-type usage, I guess I don't really
> understand it, though to be honest I also don't really fully understand
> LKL itself, but it always _seemed_ very different.
I didn't explain the comparison between LKL and nommu UML, as I
thought they are independent of each other.
> Somewhat hyperbolically, I'm wondering if it's just a tech demo for
> retpoline?
An additional reason we used zpoline to replace the syscall
instruction: our first implementation of this nommu UML used a
modified version of the (userspace) standard library (musl-libc),
without zpoline. We reimplemented the syscall wrappers to call a
syscall entry point (__kernel_vsyscall) exposed via the ELF aux vector.
Like this:
static __inline long __syscall0(long n)
{
	unsigned long ret = -1;
	__asm__ __volatile__ ("call *%1" : "=a"(ret)
			      : "r"(__sysinfo), "a"(n)
			      : "rcx", "r11", "memory");
	return ret;
}
# __sysinfo is the address exposed via the aux vector.
# this was actually not done by me; it is Ricardo's (in Cc) work.
https://github.com/nabla-containers/musl-libc/blob/e11be13e6abc06f7034d6b98552b5928d0ed0dfe/arch/x86_64/syscall_arch.h#L13-L20
With that, we can use unmodified binaries, but we need to modify
libc.so and ld.so, which I thought isn't trivial.
My motivation for applying zpoline here is to eliminate this
dependency; with zpoline, we don't have to modify the standard library
(musl).
In addition, since a NOMMU kernel shares one address space among
multiple userspace processes, we only have to prepare the trampoline
code once, while processes in the multiple-address-space model (the
MMU case) need to install the zpoline-related code on each process
invocation. This is not a direct motivation for using zpoline here,
but a side benefit in this environment.
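
For readers unfamiliar with the trampoline mentioned above, here is a
minimal sketch of how a zpoline-style trampoline can be laid out. It
is written into an ordinary buffer so it can be inspected, and it is
not the layout used by the patch series. The function name
build_trampoline, the handler address, and the choice of %r11 as
scratch register are assumptions for illustration (%r11 is already
clobbered by `syscall`, matching the clobber list in the wrapper
above).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRAMPOLINE_TAIL 13	/* movabs (10 bytes) + jmp (3 bytes) */

/*
 * Lay out a zpoline-style trampoline in buf:
 *   nr_syscalls bytes of nop, so that `call *%rax` with any valid
 *   syscall number in %rax lands inside the sled, followed by
 *   movabs $handler, %r11 ; jmp *%r11
 * Returns the number of bytes written, or 0 if buf is too small.
 * Assumes a little-endian host (x86-64) for the immediate encoding.
 */
static size_t build_trampoline(uint8_t *buf, size_t len,
			       size_t nr_syscalls, uint64_t handler)
{
	size_t off = nr_syscalls;

	if (len < nr_syscalls || len - nr_syscalls < TRAMPOLINE_TAIL)
		return 0;

	memset(buf, 0x90, nr_syscalls);		/* nop sled */
	buf[off++] = 0x49;			/* movabs $imm64, %r11 */
	buf[off++] = 0xbb;
	memcpy(buf + off, &handler, sizeof(handler));
	off += sizeof(handler);
	buf[off++] = 0x41;			/* jmp *%r11 */
	buf[off++] = 0xff;
	buf[off++] = 0xe3;
	return off;
}
```

In a real setup the bytes would be written to an executable page
mapped at address 0. In a shared address space that page has to be
populated only once, which is the side benefit described above.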
> So I dunno. Reading through it again there are a few minor things wrt.
> code style and debug things left over, but it's not awful ;-)
Oh, really? I'll double-check them, but it would be nice to know which
flaws you found.
> I'd also
> prefer the code to be more clearly "marked" (as nommu), perhaps putting
> new files into a nommu/ directory, or something like that. But that's
> pretty minor.
I understand. I'm afraid there will still be a number of ifdefs, since
nommu UML relies on various parts of the existing UML infrastructure.
> Still it's in a lot of places and chances are it'll make bigger
> refactoring (like seccomp mode) harder. Perhaps if at all it should come
> after seccomp mode and use that to execute syscalls if zpoline can't be
> done, and to catch all the cases where zpoline doesn't work (you have
> that in the docs)?
A fallback mechanism for when zpoline fails might be interesting.
> What do others think? Would you use it? What for?
-- Hajime