* [PATCH v13 00/13] nommu UML
@ 2025-11-08 8:05 Hajime Tazaki
2025-11-08 8:05 ` [PATCH v13 01/13] x86/um: nommu: elf loader for fdpic Hajime Tazaki
` (13 more replies)
0 siblings, 14 replies; 20+ messages in thread
From: Hajime Tazaki @ 2025-11-08 8:05 UTC (permalink / raw)
To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett, linux-kernel
This patchset is another spin of the nommu mode addition to UML. I would
appreciate hearing your opinions on it.
There are still several known limitations/issues; here is the list.
- memory mapped by loadable modules is not distinguished from
userspace memory.
- CONFIG_SMP is disabled as host_fs handling doesn't work with thread
local storage.
-- Hajime
v13:
- rebase with the latest uml/next branch, fixing a conflict ([06/13])
v12:
- rebase with the latest uml/next branch
- disable SMP and tls as those don't work with host_fs handling ([11/13])
- https://lore.kernel.org/all/cover.1762075876.git.thehajime@gmail.com/
v11:
- clean up userspace return routine and integrate to userspace() ([04/13])
- fix direction flag issue on using nolibc memcpy ([04/13])
- fix a crash issue when using usermode helper ([06/13])
- test with out-of-tree kunit-uapi patches (which uses umh)
- https://lore.kernel.org/all/20250626-kunit-kselftests-v4-0-48760534fef5@linutronix.de/
- https://lore.kernel.org/all/20250626195714.2123694-3-benjamin@sipsolutions.net/
- https://lore.kernel.org/all/cover.1758181109.git.thehajime@gmail.com/
v10:
- fix wrong comment on gs register handling ([09/13])
- remove unnecessary code of early syscall implementation ([04/13])
- https://lore.kernel.org/all/cover.1750594487.git.thehajime@gmail.com/
v9:
- rebase with the latest uml/next branch
- add performance numbers of new SECCOMP mode, and update results ([12/13])
- add a workaround for the upstream change on the MMU dependency of PCI drivers ([10/13])
- https://lore.kernel.org/all/cover.1750294482.git.thehajime@gmail.com/
v8:
- rebase with the latest uml/next branch
- clean up segv_handler to align with the latest uml ([9/12])
- https://lore.kernel.org/all/cover.1745980082.git.thehajime@gmail.com/
v7:
- properly handle FP register upon signal delivery [10/13]
- update benchmark result with new FP register handling [12/13]
- fix arch_has_single_step() for !MMU case [07/13]
- revert stack alignment as it is in uml/fixes tree [10/13]
- https://lore.kernel.org/all/cover.1737348399.git.thehajime@gmail.com/
v6:
- rebase to the latest uml/next tree
- more clean up on mmu/nommu for signal handling [10/13]
- rename functions of mcontext routines [06,10/13]
- added Acked-by tag for binfmt_elf_fdpic [02/13]
- https://lore.kernel.org/linux-um/cover.1736853925.git.thehajime@gmail.com/
v5:
- clean up stack manipulation code [05,06,07,10/13]
- https://lore.kernel.org/linux-um/cover.1733998168.git.thehajime@gmail.com/
v4:
- add arch/um/nommu, arch/x86/um/nommu to contain !MMU specific code
- remove zpoline patch
- drop binfmt_elf_fdpic patch
- reduce ifndef CONFIG_MMU if possible
- split to elf header cleanup patch [01/13]
- fix kernel test robot warnings [06/13]
- fix coding styles [07/13]
- move task_top_of_stack definition [05/13]
- https://lore.kernel.org/linux-um/cover.1733652929.git.thehajime@gmail.com/
v3:
- https://lore.kernel.org/linux-um/cover.1733199769.git.thehajime@gmail.com/
- add seccomp-based syscall hook in addition to zpoline [06/13]
- remove RFC, add a line to MAINTAINERS file
- fix kernel test robot warnings [02/13,08/13,10/13]
- add base-commit tag to cover letter
- pull the latest uml/next
- clean up SIGSEGV handling [10/13]
- detect fsgsbase availability with elf aux vector [08/13]
- simplify vdso code with macros [09/13]
RFC v2:
- https://lore.kernel.org/linux-um/cover.1731290567.git.thehajime@gmail.com/
- base branch is now uml/linux.git instead of torvalds/linux.git.
- reorganize the patch series to clean up
- fixed various coding style issues
- clean up exec code path [07/13]
- fixed the crash/SIGSEGV case on userspace programs [10/13]
- add seccomp filter to limit syscall caller address [06/13]
- detect fsgsbase availability with sigsetjmp/siglongjmp [08/13]
- removes unrelated changes
- removes unneeded ifndef CONFIG_MMU
- convert UML_CONFIG_MMU to CONFIG_MMU as using uml/linux.git
- proposed a patch of maple-tree issue (resolving a limitation in RFC v1)
https://lore.kernel.org/linux-mm/20241108222834.3625217-1-thehajime@gmail.com/
RFC:
- https://lore.kernel.org/linux-um/cover.1729770373.git.thehajime@gmail.com/
Hajime Tazaki (13):
x86/um: nommu: elf loader for fdpic
um: decouple MMU specific code from the common part
um: nommu: memory handling
x86/um: nommu: syscall handling
um: nommu: seccomp syscalls hook
x86/um: nommu: process/thread handling
um: nommu: configure fs register on host syscall invocation
x86/um/vdso: nommu: vdso memory update
x86/um: nommu: signal handling
um: change machine name for uname output
um: nommu: disable SMP on nommu UML
um: nommu: add documentation of nommu UML
um: nommu: plug nommu code into build system
Documentation/virt/uml/nommu-uml.rst | 180 ++++++++++++++++++++++
MAINTAINERS | 1 +
arch/um/Kconfig | 14 +-
arch/um/Makefile | 10 ++
arch/um/configs/x86_64_nommu_defconfig | 54 +++++++
arch/um/include/asm/futex.h | 4 +
arch/um/include/asm/mmu.h | 8 +
arch/um/include/asm/mmu_context.h | 2 +
arch/um/include/asm/ptrace-generic.h | 8 +-
arch/um/include/asm/uaccess.h | 7 +-
arch/um/include/shared/kern_util.h | 6 +
arch/um/include/shared/os.h | 16 ++
arch/um/kernel/Makefile | 5 +-
arch/um/kernel/mem-pgtable.c | 55 +++++++
arch/um/kernel/mem.c | 38 +----
arch/um/kernel/process.c | 38 +++++
arch/um/kernel/skas/process.c | 37 -----
arch/um/kernel/um_arch.c | 3 +
arch/um/nommu/Makefile | 3 +
arch/um/nommu/os-Linux/Makefile | 7 +
arch/um/nommu/os-Linux/seccomp.c | 87 +++++++++++
arch/um/nommu/os-Linux/signal.c | 24 +++
arch/um/nommu/trap.c | 201 +++++++++++++++++++++++++
arch/um/os-Linux/Makefile | 3 +-
arch/um/os-Linux/internal.h | 8 +
arch/um/os-Linux/mem.c | 4 +
arch/um/os-Linux/process.c | 139 ++++++++++++++++-
arch/um/os-Linux/signal.c | 11 +-
arch/um/os-Linux/skas/process.c | 127 ----------------
arch/um/os-Linux/start_up.c | 25 ++-
arch/um/os-Linux/util.c | 3 +-
arch/x86/um/Kconfig | 2 +-
arch/x86/um/Makefile | 7 +-
arch/x86/um/asm/elf.h | 8 +-
arch/x86/um/asm/syscall.h | 6 +
arch/x86/um/nommu/Makefile | 8 +
arch/x86/um/nommu/do_syscall_64.c | 75 +++++++++
arch/x86/um/nommu/entry_64.S | 114 ++++++++++++++
arch/x86/um/nommu/os-Linux/Makefile | 6 +
arch/x86/um/nommu/os-Linux/mcontext.c | 26 ++++
arch/x86/um/nommu/syscalls.h | 18 +++
arch/x86/um/nommu/syscalls_64.c | 121 +++++++++++++++
arch/x86/um/shared/sysdep/mcontext.h | 5 +
arch/x86/um/shared/sysdep/ptrace.h | 2 +-
arch/x86/um/vdso/vma.c | 17 ++-
fs/Kconfig.binfmt | 2 +-
46 files changed, 1322 insertions(+), 223 deletions(-)
create mode 100644 Documentation/virt/uml/nommu-uml.rst
create mode 100644 arch/um/configs/x86_64_nommu_defconfig
create mode 100644 arch/um/kernel/mem-pgtable.c
create mode 100644 arch/um/nommu/Makefile
create mode 100644 arch/um/nommu/os-Linux/Makefile
create mode 100644 arch/um/nommu/os-Linux/seccomp.c
create mode 100644 arch/um/nommu/os-Linux/signal.c
create mode 100644 arch/um/nommu/trap.c
create mode 100644 arch/x86/um/nommu/Makefile
create mode 100644 arch/x86/um/nommu/do_syscall_64.c
create mode 100644 arch/x86/um/nommu/entry_64.S
create mode 100644 arch/x86/um/nommu/os-Linux/Makefile
create mode 100644 arch/x86/um/nommu/os-Linux/mcontext.c
create mode 100644 arch/x86/um/nommu/syscalls.h
create mode 100644 arch/x86/um/nommu/syscalls_64.c
base-commit: 293f71435d14f5b5c46fc3398695fa265c69363d
--
2.43.0
* [PATCH v13 01/13] x86/um: nommu: elf loader for fdpic
From: Hajime Tazaki @ 2025-11-08 8:05 UTC (permalink / raw)
To: linux-um
Cc: thehajime, ricarkol, Liam.Howlett, linux-kernel, Eric Biederman,
Kees Cook, Alexander Viro, Christian Brauner, Jan Kara, linux-mm,
linux-fsdevel
As UML now supports the CONFIG_MMU=n case, it has to use an alternate ELF
loader, the FDPIC ELF loader. This commit adds the necessary
definitions to the arch code, as UML has not used this loader so far. It also
updates the Kconfig file so that BINFMT_ELF_FDPIC can be used by UML in a
!MMU environment.
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-mm@kvack.org
Cc: linux-fsdevel@vger.kernel.org
Acked-by: Kees Cook <kees@kernel.org>
Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
---
arch/um/include/asm/mmu.h | 5 +++++
arch/um/include/asm/ptrace-generic.h | 6 ++++++
arch/x86/um/asm/elf.h | 8 ++++++--
fs/Kconfig.binfmt | 2 +-
4 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/arch/um/include/asm/mmu.h b/arch/um/include/asm/mmu.h
index 07d48738b402..82a919132aff 100644
--- a/arch/um/include/asm/mmu.h
+++ b/arch/um/include/asm/mmu.h
@@ -21,6 +21,11 @@ typedef struct mm_context {
spinlock_t sync_tlb_lock;
unsigned long sync_tlb_range_from;
unsigned long sync_tlb_range_to;
+
+#ifdef CONFIG_BINFMT_ELF_FDPIC
+ unsigned long exec_fdpic_loadmap;
+ unsigned long interp_fdpic_loadmap;
+#endif
} mm_context_t;
#define INIT_MM_CONTEXT(mm) \
diff --git a/arch/um/include/asm/ptrace-generic.h b/arch/um/include/asm/ptrace-generic.h
index 86d74f9d33cf..62e9916078ec 100644
--- a/arch/um/include/asm/ptrace-generic.h
+++ b/arch/um/include/asm/ptrace-generic.h
@@ -29,6 +29,12 @@ struct pt_regs {
#define PTRACE_OLDSETOPTIONS 21
+#ifdef CONFIG_BINFMT_ELF_FDPIC
+#define PTRACE_GETFDPIC 31
+#define PTRACE_GETFDPIC_EXEC 0
+#define PTRACE_GETFDPIC_INTERP 1
+#endif
+
struct task_struct;
extern long subarch_ptrace(struct task_struct *child, long request,
diff --git a/arch/x86/um/asm/elf.h b/arch/x86/um/asm/elf.h
index 22d0111b543b..388fe669886c 100644
--- a/arch/x86/um/asm/elf.h
+++ b/arch/x86/um/asm/elf.h
@@ -9,6 +9,7 @@
#include <skas.h>
#define CORE_DUMP_USE_REGSET
+#define ELF_FDPIC_CORE_EFLAGS 0
#ifdef CONFIG_X86_32
@@ -158,8 +159,11 @@ extern int arch_setup_additional_pages(struct linux_binprm *bprm,
extern unsigned long um_vdso_addr;
#define AT_SYSINFO_EHDR 33
-#define ARCH_DLINFO NEW_AUX_ENT(AT_SYSINFO_EHDR, um_vdso_addr)
-
+#define ARCH_DLINFO \
+do { \
+ NEW_AUX_ENT(AT_SYSINFO_EHDR, um_vdso_addr); \
+ NEW_AUX_ENT(AT_MINSIGSTKSZ, 0); \
+} while (0)
#endif
typedef unsigned long elf_greg_t;
diff --git a/fs/Kconfig.binfmt b/fs/Kconfig.binfmt
index 1949e25c7741..0a92bebd5f75 100644
--- a/fs/Kconfig.binfmt
+++ b/fs/Kconfig.binfmt
@@ -58,7 +58,7 @@ config ARCH_USE_GNU_PROPERTY
config BINFMT_ELF_FDPIC
bool "Kernel support for FDPIC ELF binaries"
default y if !BINFMT_ELF
- depends on ARM || ((M68K || RISCV || SUPERH || XTENSA) && !MMU)
+ depends on ARM || ((M68K || RISCV || SUPERH || UML || XTENSA) && !MMU)
select ELFCORE
help
ELF FDPIC binaries are based on ELF, but allow the individual load
--
2.43.0
* [PATCH v13 02/13] um: decouple MMU specific code from the common part
From: Hajime Tazaki @ 2025-11-08 8:05 UTC (permalink / raw)
To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett, linux-kernel
This splits the memory- and process-related code into common and
MMU-specific parts in order to avoid ifdefs in .c files and duplication
between the MMU and !MMU code.
Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
---
arch/um/kernel/Makefile | 5 +-
arch/um/kernel/mem-pgtable.c | 55 ++++++++++++++
arch/um/kernel/mem.c | 35 ---------
arch/um/kernel/process.c | 38 ++++++++++
arch/um/kernel/skas/process.c | 37 ---------
arch/um/os-Linux/Makefile | 3 +-
arch/um/os-Linux/process.c | 129 ++++++++++++++++++++++++++++++++
arch/um/os-Linux/skas/process.c | 127 -------------------------------
8 files changed, 227 insertions(+), 202 deletions(-)
create mode 100644 arch/um/kernel/mem-pgtable.c
diff --git a/arch/um/kernel/Makefile b/arch/um/kernel/Makefile
index be60bc451b3f..76d36751973e 100644
--- a/arch/um/kernel/Makefile
+++ b/arch/um/kernel/Makefile
@@ -16,9 +16,10 @@ always-$(KBUILD_BUILTIN) := vmlinux.lds
obj-y = config.o exec.o exitcode.o irq.o ksyms.o mem.o \
physmem.o process.o ptrace.o reboot.o sigio.o \
- signal.o sysrq.o time.o tlb.o trap.o \
- um_arch.o umid.o kmsg_dump.o capflags.o skas/
+ signal.o sysrq.o time.o \
+ um_arch.o umid.o kmsg_dump.o capflags.o
obj-y += load_file.o
+obj-$(CONFIG_MMU) += mem-pgtable.o tlb.o trap.o skas/
obj-$(CONFIG_BLK_DEV_INITRD) += initrd.o
obj-$(CONFIG_GPROF) += gprof_syms.o
diff --git a/arch/um/kernel/mem-pgtable.c b/arch/um/kernel/mem-pgtable.c
new file mode 100644
index 000000000000..549da1d3bff0
--- /dev/null
+++ b/arch/um/kernel/mem-pgtable.c
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2000 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
+ */
+
+#include <linux/stddef.h>
+#include <linux/module.h>
+#include <linux/memblock.h>
+#include <linux/swap.h>
+#include <linux/slab.h>
+#include <asm/page.h>
+#include <asm/pgalloc.h>
+#include <as-layout.h>
+#include <init.h>
+#include <kern.h>
+#include <kern_util.h>
+#include <mem_user.h>
+#include <os.h>
+#include <um_malloc.h>
+
+
+/* Allocate and free page tables. */
+
+pgd_t *pgd_alloc(struct mm_struct *mm)
+{
+ pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL);
+
+ if (pgd) {
+ memset(pgd, 0, USER_PTRS_PER_PGD * sizeof(pgd_t));
+ memcpy(pgd + USER_PTRS_PER_PGD,
+ swapper_pg_dir + USER_PTRS_PER_PGD,
+ (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
+ }
+ return pgd;
+}
+
+static const pgprot_t protection_map[16] = {
+ [VM_NONE] = PAGE_NONE,
+ [VM_READ] = PAGE_READONLY,
+ [VM_WRITE] = PAGE_COPY,
+ [VM_WRITE | VM_READ] = PAGE_COPY,
+ [VM_EXEC] = PAGE_READONLY,
+ [VM_EXEC | VM_READ] = PAGE_READONLY,
+ [VM_EXEC | VM_WRITE] = PAGE_COPY,
+ [VM_EXEC | VM_WRITE | VM_READ] = PAGE_COPY,
+ [VM_SHARED] = PAGE_NONE,
+ [VM_SHARED | VM_READ] = PAGE_READONLY,
+ [VM_SHARED | VM_WRITE] = PAGE_SHARED,
+ [VM_SHARED | VM_WRITE | VM_READ] = PAGE_SHARED,
+ [VM_SHARED | VM_EXEC] = PAGE_READONLY,
+ [VM_SHARED | VM_EXEC | VM_READ] = PAGE_READONLY,
+ [VM_SHARED | VM_EXEC | VM_WRITE] = PAGE_SHARED,
+ [VM_SHARED | VM_EXEC | VM_WRITE | VM_READ] = PAGE_SHARED
+};
+DECLARE_VM_GET_PAGE_PROT
diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index 39c4a7e21c6f..f3258680bfbe 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -6,7 +6,6 @@
#include <linux/stddef.h>
#include <linux/module.h>
#include <linux/memblock.h>
-#include <linux/mm.h>
#include <linux/swap.h>
#include <linux/slab.h>
#include <linux/init.h>
@@ -107,45 +106,11 @@ void free_initmem(void)
{
}
-/* Allocate and free page tables. */
-
-pgd_t *pgd_alloc(struct mm_struct *mm)
-{
- pgd_t *pgd = __pgd_alloc(mm, 0);
-
- if (pgd)
- memcpy(pgd + USER_PTRS_PER_PGD,
- swapper_pg_dir + USER_PTRS_PER_PGD,
- (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
-
- return pgd;
-}
-
void *uml_kmalloc(int size, int flags)
{
return kmalloc(size, flags);
}
-static const pgprot_t protection_map[16] = {
- [VM_NONE] = PAGE_NONE,
- [VM_READ] = PAGE_READONLY,
- [VM_WRITE] = PAGE_COPY,
- [VM_WRITE | VM_READ] = PAGE_COPY,
- [VM_EXEC] = PAGE_READONLY,
- [VM_EXEC | VM_READ] = PAGE_READONLY,
- [VM_EXEC | VM_WRITE] = PAGE_COPY,
- [VM_EXEC | VM_WRITE | VM_READ] = PAGE_COPY,
- [VM_SHARED] = PAGE_NONE,
- [VM_SHARED | VM_READ] = PAGE_READONLY,
- [VM_SHARED | VM_WRITE] = PAGE_SHARED,
- [VM_SHARED | VM_WRITE | VM_READ] = PAGE_SHARED,
- [VM_SHARED | VM_EXEC] = PAGE_READONLY,
- [VM_SHARED | VM_EXEC | VM_READ] = PAGE_READONLY,
- [VM_SHARED | VM_EXEC | VM_WRITE] = PAGE_SHARED,
- [VM_SHARED | VM_EXEC | VM_WRITE | VM_READ] = PAGE_SHARED
-};
-DECLARE_VM_GET_PAGE_PROT
-
void mark_rodata_ro(void)
{
unsigned long rodata_start = PFN_ALIGN(__start_rodata);
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 63b38a3f73f7..b07c1f120910 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -25,6 +25,7 @@
#include <linux/tick.h>
#include <linux/threads.h>
#include <linux/resume_user_mode.h>
+#include <linux/start_kernel.h>
#include <asm/current.h>
#include <asm/mmu_context.h>
#include <asm/switch_to.h>
@@ -307,3 +308,40 @@ unsigned long __get_wchan(struct task_struct *p)
return 0;
}
+
+extern void start_kernel(void);
+
+static int __init start_kernel_proc(void *unused)
+{
+ block_signals_trace();
+
+ start_kernel();
+ return 0;
+}
+
+char cpu_irqstacks[NR_CPUS][THREAD_SIZE] __aligned(THREAD_SIZE);
+
+int __init start_uml(void)
+{
+ stack_protections((unsigned long) &cpu_irqstacks[0]);
+ set_sigstack(cpu_irqstacks[0], THREAD_SIZE);
+
+ init_new_thread_signals();
+
+ init_task.thread.request.thread.proc = start_kernel_proc;
+ init_task.thread.request.thread.arg = NULL;
+ return start_idle_thread(task_stack_page(&init_task),
+ &init_task.thread.switch_buf);
+}
+
+static DEFINE_SPINLOCK(initial_jmpbuf_spinlock);
+
+void initial_jmpbuf_lock(void)
+{
+ spin_lock_irq(&initial_jmpbuf_spinlock);
+}
+
+void initial_jmpbuf_unlock(void)
+{
+ spin_unlock_irq(&initial_jmpbuf_spinlock);
+}
diff --git a/arch/um/kernel/skas/process.c b/arch/um/kernel/skas/process.c
index 4a7673b0261a..d643854942bc 100644
--- a/arch/um/kernel/skas/process.c
+++ b/arch/um/kernel/skas/process.c
@@ -17,31 +17,6 @@
#include <skas.h>
#include <kern_util.h>
-extern void start_kernel(void);
-
-static int __init start_kernel_proc(void *unused)
-{
- block_signals_trace();
-
- start_kernel();
- return 0;
-}
-
-char cpu_irqstacks[NR_CPUS][THREAD_SIZE] __aligned(THREAD_SIZE);
-
-int __init start_uml(void)
-{
- stack_protections((unsigned long) &cpu_irqstacks[0]);
- set_sigstack(cpu_irqstacks[0], THREAD_SIZE);
-
- init_new_thread_signals();
-
- init_task.thread.request.thread.proc = start_kernel_proc;
- init_task.thread.request.thread.arg = NULL;
- return start_idle_thread(task_stack_page(&init_task),
- &init_task.thread.switch_buf);
-}
-
unsigned long current_stub_stack(void)
{
if (current->mm == NULL)
@@ -65,15 +40,3 @@ void current_mm_sync(void)
um_tlb_sync(current->mm);
}
-
-static DEFINE_SPINLOCK(initial_jmpbuf_spinlock);
-
-void initial_jmpbuf_lock(void)
-{
- spin_lock_irq(&initial_jmpbuf_spinlock);
-}
-
-void initial_jmpbuf_unlock(void)
-{
- spin_unlock_irq(&initial_jmpbuf_spinlock);
-}
diff --git a/arch/um/os-Linux/Makefile b/arch/um/os-Linux/Makefile
index f8d672d570d9..40e3e0eab6a0 100644
--- a/arch/um/os-Linux/Makefile
+++ b/arch/um/os-Linux/Makefile
@@ -8,7 +8,8 @@ KCOV_INSTRUMENT := n
obj-y = elf_aux.o execvp.o file.o helper.o irq.o main.o mem.o process.o \
registers.o sigio.o signal.o start_up.o time.o tty.o \
- umid.o user_syms.o util.o skas/
+ umid.o user_syms.o util.o
+obj-$(CONFIG_MMU) += skas/
CFLAGS_signal.o += -Wframe-larger-than=4096
diff --git a/arch/um/os-Linux/process.c b/arch/um/os-Linux/process.c
index 3a2a84ab9325..c50fa865d8c7 100644
--- a/arch/um/os-Linux/process.c
+++ b/arch/um/os-Linux/process.c
@@ -6,6 +6,7 @@
#include <stdio.h>
#include <stdlib.h>
+#include <stdbool.h>
#include <unistd.h>
#include <errno.h>
#include <signal.h>
@@ -17,10 +18,16 @@
#include <sys/prctl.h>
#include <sys/wait.h>
#include <asm/unistd.h>
+#include <linux/threads.h>
#include <init.h>
#include <longjmp.h>
#include <os.h>
#include <skas/skas.h>
+#include <as-layout.h>
+#include <kern_util.h>
+
+int using_seccomp;
+static int unscheduled_userspace_iterations;
void os_alarm_process(int pid)
{
@@ -209,3 +216,125 @@ int os_futex_wake(void *uaddr)
NULL, NULL, 0));
return r < 0 ? -errno : r;
}
+
+int is_skas_winch(int pid, int fd, void *data)
+{
+ return pid == getpgrp();
+}
+
+void new_thread(void *stack, jmp_buf *buf, void (*handler)(void))
+{
+ (*buf)[0].JB_IP = (unsigned long) handler;
+ (*buf)[0].JB_SP = (unsigned long) stack + UM_THREAD_SIZE -
+ sizeof(void *);
+}
+
+#define INIT_JMP_NEW_THREAD 0
+#define INIT_JMP_CALLBACK 1
+#define INIT_JMP_HALT 2
+#define INIT_JMP_REBOOT 3
+
+void switch_threads(jmp_buf *me, jmp_buf *you)
+{
+ unscheduled_userspace_iterations = 0;
+
+ if (UML_SETJMP(me) == 0)
+ UML_LONGJMP(you, 1);
+}
+
+static jmp_buf initial_jmpbuf;
+
+static __thread void (*cb_proc)(void *arg);
+static __thread void *cb_arg;
+static __thread jmp_buf *cb_back;
+
+int start_idle_thread(void *stack, jmp_buf *switch_buf)
+{
+ int n;
+
+ set_handler(SIGWINCH);
+
+ /*
+ * Can't use UML_SETJMP or UML_LONGJMP here because they save
+ * and restore signals, with the possible side-effect of
+ * trying to handle any signals which came when they were
+ * blocked, which can't be done on this stack.
+ * Signals must be blocked when jumping back here and restored
+ * after returning to the jumper.
+ */
+ n = setjmp(initial_jmpbuf);
+ switch (n) {
+ case INIT_JMP_NEW_THREAD:
+ (*switch_buf)[0].JB_IP = (unsigned long) uml_finishsetup;
+ (*switch_buf)[0].JB_SP = (unsigned long) stack +
+ UM_THREAD_SIZE - sizeof(void *);
+ break;
+ case INIT_JMP_CALLBACK:
+ (*cb_proc)(cb_arg);
+ longjmp(*cb_back, 1);
+ break;
+ case INIT_JMP_HALT:
+ kmalloc_ok = 0;
+ return 0;
+ case INIT_JMP_REBOOT:
+ kmalloc_ok = 0;
+ return 1;
+ default:
+ printk(UM_KERN_ERR "Bad sigsetjmp return in %s - %d\n",
+ __func__, n);
+ fatal_sigsegv();
+ }
+ longjmp(*switch_buf, 1);
+
+ /* unreachable */
+ printk(UM_KERN_ERR "impossible long jump!");
+ fatal_sigsegv();
+ return 0;
+}
+
+void initial_thread_cb_skas(void (*proc)(void *), void *arg)
+{
+ jmp_buf here;
+
+ cb_proc = proc;
+ cb_arg = arg;
+ cb_back = &here;
+
+ initial_jmpbuf_lock();
+ if (UML_SETJMP(&here) == 0)
+ UML_LONGJMP(&initial_jmpbuf, INIT_JMP_CALLBACK);
+ initial_jmpbuf_unlock();
+
+ cb_proc = NULL;
+ cb_arg = NULL;
+ cb_back = NULL;
+}
+
+void halt_skas(void)
+{
+ initial_jmpbuf_lock();
+ UML_LONGJMP(&initial_jmpbuf, INIT_JMP_HALT);
+ /* unreachable */
+}
+
+static bool noreboot;
+
+static int __init noreboot_cmd_param(char *str, int *add)
+{
+ *add = 0;
+ noreboot = true;
+ return 0;
+}
+
+__uml_setup("noreboot", noreboot_cmd_param,
+"noreboot\n"
+" Rather than rebooting, exit always, akin to QEMU's -no-reboot option.\n"
+" This is useful if you're using CONFIG_PANIC_TIMEOUT in order to catch\n"
+" crashes in CI\n\n");
+
+void reboot_skas(void)
+{
+ initial_jmpbuf_lock();
+ UML_LONGJMP(&initial_jmpbuf, noreboot ? INIT_JMP_HALT : INIT_JMP_REBOOT);
+ /* unreachable */
+}
diff --git a/arch/um/os-Linux/skas/process.c b/arch/um/os-Linux/skas/process.c
index d6c22f8aa06d..01814ad82f5d 100644
--- a/arch/um/os-Linux/skas/process.c
+++ b/arch/um/os-Linux/skas/process.c
@@ -18,7 +18,6 @@
#include <sys/stat.h>
#include <sys/socket.h>
#include <asm/unistd.h>
-#include <as-layout.h>
#include <init.h>
#include <kern_util.h>
#include <mem.h>
@@ -29,16 +28,10 @@
#include <sysdep/stub.h>
#include <sysdep/mcontext.h>
#include <linux/futex.h>
-#include <linux/threads.h>
#include <timetravel.h>
#include <asm-generic/rwonce.h>
#include "../internal.h"
-int is_skas_winch(int pid, int fd, void *data)
-{
- return pid == getpgrp();
-}
-
static const char *ptrace_reg_name(int idx)
{
#define R(n) case HOST_##n: return #n
@@ -426,8 +419,6 @@ static int __init init_stub_exe_fd(void)
}
__initcall(init_stub_exe_fd);
-int using_seccomp;
-
/**
* start_userspace() - prepare a new userspace process
* @mm_id: The corresponding struct mm_id
@@ -540,7 +531,6 @@ int start_userspace(struct mm_id *mm_id)
return err;
}
-static int unscheduled_userspace_iterations;
extern unsigned long tt_extra_sched_jiffies;
void userspace(struct uml_pt_regs *regs)
@@ -789,120 +779,3 @@ void userspace(struct uml_pt_regs *regs)
}
}
}
-
-void new_thread(void *stack, jmp_buf *buf, void (*handler)(void))
-{
- (*buf)[0].JB_IP = (unsigned long) handler;
- (*buf)[0].JB_SP = (unsigned long) stack + UM_THREAD_SIZE -
- sizeof(void *);
-}
-
-#define INIT_JMP_NEW_THREAD 0
-#define INIT_JMP_CALLBACK 1
-#define INIT_JMP_HALT 2
-#define INIT_JMP_REBOOT 3
-
-void switch_threads(jmp_buf *me, jmp_buf *you)
-{
- unscheduled_userspace_iterations = 0;
-
- if (UML_SETJMP(me) == 0)
- UML_LONGJMP(you, 1);
-}
-
-static jmp_buf initial_jmpbuf;
-
-static __thread void (*cb_proc)(void *arg);
-static __thread void *cb_arg;
-static __thread jmp_buf *cb_back;
-
-int start_idle_thread(void *stack, jmp_buf *switch_buf)
-{
- int n;
-
- set_handler(SIGWINCH);
-
- /*
- * Can't use UML_SETJMP or UML_LONGJMP here because they save
- * and restore signals, with the possible side-effect of
- * trying to handle any signals which came when they were
- * blocked, which can't be done on this stack.
- * Signals must be blocked when jumping back here and restored
- * after returning to the jumper.
- */
- n = setjmp(initial_jmpbuf);
- switch (n) {
- case INIT_JMP_NEW_THREAD:
- (*switch_buf)[0].JB_IP = (unsigned long) uml_finishsetup;
- (*switch_buf)[0].JB_SP = (unsigned long) stack +
- UM_THREAD_SIZE - sizeof(void *);
- break;
- case INIT_JMP_CALLBACK:
- (*cb_proc)(cb_arg);
- longjmp(*cb_back, 1);
- break;
- case INIT_JMP_HALT:
- kmalloc_ok = 0;
- return 0;
- case INIT_JMP_REBOOT:
- kmalloc_ok = 0;
- return 1;
- default:
- printk(UM_KERN_ERR "Bad sigsetjmp return in %s - %d\n",
- __func__, n);
- fatal_sigsegv();
- }
- longjmp(*switch_buf, 1);
-
- /* unreachable */
- printk(UM_KERN_ERR "impossible long jump!");
- fatal_sigsegv();
- return 0;
-}
-
-void initial_thread_cb_skas(void (*proc)(void *), void *arg)
-{
- jmp_buf here;
-
- cb_proc = proc;
- cb_arg = arg;
- cb_back = &here;
-
- initial_jmpbuf_lock();
- if (UML_SETJMP(&here) == 0)
- UML_LONGJMP(&initial_jmpbuf, INIT_JMP_CALLBACK);
- initial_jmpbuf_unlock();
-
- cb_proc = NULL;
- cb_arg = NULL;
- cb_back = NULL;
-}
-
-void halt_skas(void)
-{
- initial_jmpbuf_lock();
- UML_LONGJMP(&initial_jmpbuf, INIT_JMP_HALT);
- /* unreachable */
-}
-
-static bool noreboot;
-
-static int __init noreboot_cmd_param(char *str, int *add)
-{
- *add = 0;
- noreboot = true;
- return 0;
-}
-
-__uml_setup("noreboot", noreboot_cmd_param,
-"noreboot\n"
-" Rather than rebooting, exit always, akin to QEMU's -no-reboot option.\n"
-" This is useful if you're using CONFIG_PANIC_TIMEOUT in order to catch\n"
-" crashes in CI\n\n");
-
-void reboot_skas(void)
-{
- initial_jmpbuf_lock();
- UML_LONGJMP(&initial_jmpbuf, noreboot ? INIT_JMP_HALT : INIT_JMP_REBOOT);
- /* unreachable */
-}
--
2.43.0
* [PATCH v13 03/13] um: nommu: memory handling
From: Hajime Tazaki @ 2025-11-08 8:05 UTC (permalink / raw)
To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett, linux-kernel
This commit adds memory operations for UML in a !MMU environment.
Parts of the original UML code that rely on CONFIG_MMU are excluded
from compilation when !CONFIG_MMU. Additionally, the generic implementations
of uaccess, futex, and memcpy/strnlen/strncpy can be used, since user and
kernel space share one address space in !CONFIG_MMU mode.
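As a rough illustration of why the generic helpers suffice (a minimal
userspace sketch, not code from this series): when both sides live in one
address space, a copy-from-user helper can reduce to a plain memcpy. The
name sketch_copy_from_user is hypothetical; only the "return the number of
bytes not copied" convention is borrowed from raw_copy_from_user().

```c
#include <string.h>

/*
 * Hypothetical stand-in for a nommu copy-from-user helper (assumption:
 * not part of this patch). With a single shared address space the
 * "user" pointer can be dereferenced directly, so no page-table walk
 * or fault fixup is needed.
 *
 * Returns the number of bytes NOT copied (0 on success), mirroring the
 * convention of the kernel's raw_copy_from_user().
 */
static unsigned long sketch_copy_from_user(void *to, const void *from,
					   unsigned long n)
{
	memcpy(to, from, n);
	return 0;
}
```

A caller would treat a zero return as "everything was copied", exactly as
with the MMU variant; the difference is only that no translation step sits
between the two buffers.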
Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
---
arch/um/Makefile | 4 ++++
arch/um/include/asm/futex.h | 4 ++++
arch/um/include/asm/mmu.h | 3 +++
arch/um/include/asm/mmu_context.h | 2 ++
arch/um/include/asm/uaccess.h | 7 ++++---
arch/um/kernel/mem.c | 3 ++-
arch/um/os-Linux/mem.c | 4 ++++
arch/um/os-Linux/process.c | 4 ++--
8 files changed, 25 insertions(+), 6 deletions(-)
diff --git a/arch/um/Makefile b/arch/um/Makefile
index 7be0143b5ba3..5371c9a1b11e 100644
--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -46,6 +46,10 @@ ARCH_INCLUDE := -I$(srctree)/$(SHARED_HEADERS)
ARCH_INCLUDE += -I$(srctree)/$(HOST_DIR)/um/shared
KBUILD_CPPFLAGS += -I$(srctree)/$(HOST_DIR)/um
+ifneq ($(CONFIG_MMU),y)
+core-y += $(ARCH_DIR)/nommu/
+endif
+
# -Dvmap=kernel_vmap prevents anything from referencing the libpcap.o symbol so
# named - it's a common symbol in libpcap, so we get a binary which crashes.
#
diff --git a/arch/um/include/asm/futex.h b/arch/um/include/asm/futex.h
index 780aa6bfc050..785fd6649aa2 100644
--- a/arch/um/include/asm/futex.h
+++ b/arch/um/include/asm/futex.h
@@ -7,8 +7,12 @@
#include <asm/errno.h>
+#ifdef CONFIG_MMU
int arch_futex_atomic_op_inuser(int op, u32 oparg, int *oval, u32 __user *uaddr);
int futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
u32 oldval, u32 newval);
+#else
+#include <asm-generic/futex.h>
+#endif
#endif
diff --git a/arch/um/include/asm/mmu.h b/arch/um/include/asm/mmu.h
index 82a919132aff..c0b9ce3215c4 100644
--- a/arch/um/include/asm/mmu.h
+++ b/arch/um/include/asm/mmu.h
@@ -22,10 +22,13 @@ typedef struct mm_context {
unsigned long sync_tlb_range_from;
unsigned long sync_tlb_range_to;
+#ifndef CONFIG_MMU
+ unsigned long end_brk;
#ifdef CONFIG_BINFMT_ELF_FDPIC
unsigned long exec_fdpic_loadmap;
unsigned long interp_fdpic_loadmap;
#endif
+#endif /* !CONFIG_MMU */
} mm_context_t;
#define INIT_MM_CONTEXT(mm) \
diff --git a/arch/um/include/asm/mmu_context.h b/arch/um/include/asm/mmu_context.h
index c727e56ba116..528b217da285 100644
--- a/arch/um/include/asm/mmu_context.h
+++ b/arch/um/include/asm/mmu_context.h
@@ -18,11 +18,13 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
{
}
+#ifdef CONFIG_MMU
#define init_new_context init_new_context
extern int init_new_context(struct task_struct *task, struct mm_struct *mm);
#define destroy_context destroy_context
extern void destroy_context(struct mm_struct *mm);
+#endif
#include <asm-generic/mmu_context.h>
diff --git a/arch/um/include/asm/uaccess.h b/arch/um/include/asm/uaccess.h
index 0df9ea4abda8..031b357800b7 100644
--- a/arch/um/include/asm/uaccess.h
+++ b/arch/um/include/asm/uaccess.h
@@ -18,6 +18,7 @@
#define __addr_range_nowrap(addr, size) \
((unsigned long) (addr) <= ((unsigned long) (addr) + (size)))
+#ifdef CONFIG_MMU
extern unsigned long raw_copy_from_user(void *to, const void __user *from, unsigned long n);
extern unsigned long raw_copy_to_user(void __user *to, const void *from, unsigned long n);
extern unsigned long __clear_user(void __user *mem, unsigned long len);
@@ -29,9 +30,6 @@ static inline int __access_ok(const void __user *ptr, unsigned long size);
#define INLINE_COPY_FROM_USER
#define INLINE_COPY_TO_USER
-
-#include <asm-generic/uaccess.h>
-
static inline int __access_ok(const void __user *ptr, unsigned long size)
{
unsigned long addr = (unsigned long)ptr;
@@ -63,5 +61,8 @@ do { \
barrier(); \
current->thread.segv_continue = NULL; \
} while (0)
+#endif
+
+#include <asm-generic/uaccess.h>
#endif
diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index f3258680bfbe..e599b637c5fb 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -71,7 +71,8 @@ void __init arch_mm_preinit(void)
* to be turned on.
*/
brk_end = PAGE_ALIGN((unsigned long) sbrk(0));
- map_memory(brk_end, __pa(brk_end), uml_reserved - brk_end, 1, 1, 0);
+ map_memory(brk_end, __pa(brk_end), uml_reserved - brk_end, 1, 1,
+ !IS_ENABLED(CONFIG_MMU));
memblock_free((void *)brk_end, uml_reserved - brk_end);
uml_reserved = brk_end;
min_low_pfn = PFN_UP(__pa(uml_reserved));
diff --git a/arch/um/os-Linux/mem.c b/arch/um/os-Linux/mem.c
index 72f302f4d197..4f5d9a94f8e2 100644
--- a/arch/um/os-Linux/mem.c
+++ b/arch/um/os-Linux/mem.c
@@ -213,6 +213,10 @@ int __init create_mem_file(unsigned long long len)
{
int err, fd;
+ /* NOMMU kernel uses -1 as a fd for further use (e.g., mmap) */
+ if (!IS_ENABLED(CONFIG_MMU))
+ return -1;
+
fd = create_tmp_file(len);
err = os_set_exec_close(fd);
diff --git a/arch/um/os-Linux/process.c b/arch/um/os-Linux/process.c
index c50fa865d8c7..ddb5258d7720 100644
--- a/arch/um/os-Linux/process.c
+++ b/arch/um/os-Linux/process.c
@@ -100,8 +100,8 @@ int os_map_memory(void *virt, int fd, unsigned long long off, unsigned long len,
prot = (r ? PROT_READ : 0) | (w ? PROT_WRITE : 0) |
(x ? PROT_EXEC : 0);
- loc = mmap64((void *) virt, len, prot, MAP_SHARED | MAP_FIXED,
- fd, off);
+ loc = mmap64((void *) virt, len, prot, MAP_SHARED | MAP_FIXED |
+ (!IS_ENABLED(CONFIG_MMU) ? MAP_ANONYMOUS : 0), fd, off);
if (loc == MAP_FAILED)
return -errno;
return 0;
--
2.43.0
* [PATCH v13 04/13] x86/um: nommu: syscall handling
2025-11-08 8:05 [PATCH v13 00/13] nommu UML Hajime Tazaki
` (2 preceding siblings ...)
2025-11-08 8:05 ` [PATCH v13 03/13] um: nommu: memory handling Hajime Tazaki
@ 2025-11-08 8:05 ` Hajime Tazaki
2025-11-08 8:05 ` [PATCH v13 05/13] um: nommu: seccomp syscalls hook Hajime Tazaki
` (9 subsequent siblings)
13 siblings, 0 replies; 20+ messages in thread
From: Hajime Tazaki @ 2025-11-08 8:05 UTC (permalink / raw)
To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett, linux-kernel
This commit introduces the syscall entry point for !MMU mode. The entry
function, __kernel_vsyscall, is a kernel-wide global symbol that is
accessible from any location.
Although it is outside the scope of this commit, the entry point can also
be exposed via the vdso image, which is directly accessible from
userspace. A standard library (e.g., libc) can use it to implement its
syscall wrappers; it can also serve as the target when hooking the syscall
instructions of unmodified userspace applications/libraries, which is
implemented in the subsequent commit.
Only the 64-bit mode of the x86 architecture is supported.
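As a rough illustration of the dispatch step performed behind the entry
point, here is a minimal userspace sketch. All demo_* names are
hypothetical and are not part of this patch; the lower-bound check on the
syscall number is a choice made for the sketch only. It shows a
table-indexed call whose result defaults to -ENOSYS when the number is
out of range.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the saved register frame (not struct pt_regs). */
struct demo_regs {
	long nr;	/* syscall number (rax on entry) */
	long args[6];	/* rdi, rsi, rdx, r10, r8, r9 */
	long ret;	/* value handed back to the caller */
};

typedef long (*demo_syscall_fn)(long, long, long, long, long, long);

/* Toy handler: adds its first two arguments. */
static long demo_add(long a, long b, long c, long d, long e, long f)
{
	(void)c; (void)d; (void)e; (void)f;
	return a + b;
}

/* Hypothetical one-entry syscall table. */
static const demo_syscall_fn demo_table[] = { demo_add };
#define DEMO_NR_SYSCALLS (sizeof(demo_table) / sizeof(demo_table[0]))

/* Dispatch by number; unknown numbers leave -ENOSYS as the result. */
static void demo_do_syscall(struct demo_regs *regs)
{
	regs->ret = -ENOSYS;

	if (regs->nr >= 0 && (size_t)regs->nr < DEMO_NR_SYSCALLS)
		regs->ret = demo_table[regs->nr](regs->args[0], regs->args[1],
						 regs->args[2], regs->args[3],
						 regs->args[4], regs->args[5]);
}
```

Presetting the result mirrors the rax=$-ENOSYS default used when the
register frame is built in entry_64.S.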
Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
---
arch/x86/um/Makefile | 4 ++
arch/x86/um/asm/syscall.h | 6 ++
arch/x86/um/nommu/Makefile | 8 +++
arch/x86/um/nommu/do_syscall_64.c | 32 +++++++++
arch/x86/um/nommu/entry_64.S | 112 ++++++++++++++++++++++++++++++
arch/x86/um/nommu/syscalls.h | 16 +++++
6 files changed, 178 insertions(+)
create mode 100644 arch/x86/um/nommu/Makefile
create mode 100644 arch/x86/um/nommu/do_syscall_64.c
create mode 100644 arch/x86/um/nommu/entry_64.S
create mode 100644 arch/x86/um/nommu/syscalls.h
diff --git a/arch/x86/um/Makefile b/arch/x86/um/Makefile
index f9ea75bf43ac..39693807755a 100644
--- a/arch/x86/um/Makefile
+++ b/arch/x86/um/Makefile
@@ -31,6 +31,10 @@ obj-y += mem_64.o syscalls_64.o vdso/
subarch-y = ../lib/csum-partial_64.o ../lib/memcpy_64.o \
../lib/memmove_64.o ../lib/memset_64.o
+ifneq ($(CONFIG_MMU),y)
+obj-y += nommu/
+endif
+
endif
subarch-$(CONFIG_MODULES) += ../kernel/module.o
diff --git a/arch/x86/um/asm/syscall.h b/arch/x86/um/asm/syscall.h
index d6208d0fad51..bb4f6f011667 100644
--- a/arch/x86/um/asm/syscall.h
+++ b/arch/x86/um/asm/syscall.h
@@ -20,4 +20,10 @@ static inline int syscall_get_arch(struct task_struct *task)
#endif
}
+#ifndef CONFIG_MMU
+extern void do_syscall_64(struct pt_regs *regs);
+extern long __kernel_vsyscall(int64_t a0, int64_t a1, int64_t a2, int64_t a3,
+ int64_t a4, int64_t a5, int64_t a6);
+#endif
+
#endif /* __UM_ASM_SYSCALL_H */
diff --git a/arch/x86/um/nommu/Makefile b/arch/x86/um/nommu/Makefile
new file mode 100644
index 000000000000..d72c63afffa5
--- /dev/null
+++ b/arch/x86/um/nommu/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0
+ifeq ($(CONFIG_X86_32),y)
+ BITS := 32
+else
+ BITS := 64
+endif
+
+obj-y = do_syscall_$(BITS).o entry_$(BITS).o
diff --git a/arch/x86/um/nommu/do_syscall_64.c b/arch/x86/um/nommu/do_syscall_64.c
new file mode 100644
index 000000000000..292d7c578622
--- /dev/null
+++ b/arch/x86/um/nommu/do_syscall_64.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/kernel.h>
+#include <linux/ptrace.h>
+#include <kern_util.h>
+#include <asm/syscall.h>
+#include <os.h>
+
+__visible void do_syscall_64(struct pt_regs *regs)
+{
+ int syscall;
+
+ syscall = PT_SYSCALL_NR(regs->regs.gp);
+ UPT_SYSCALL_NR(&regs->regs) = syscall;
+
+ if (likely(syscall < NR_syscalls)) {
+ unsigned long ret;
+
+ ret = (*sys_call_table[syscall])(UPT_SYSCALL_ARG1(&regs->regs),
+ UPT_SYSCALL_ARG2(&regs->regs),
+ UPT_SYSCALL_ARG3(&regs->regs),
+ UPT_SYSCALL_ARG4(&regs->regs),
+ UPT_SYSCALL_ARG5(&regs->regs),
+ UPT_SYSCALL_ARG6(&regs->regs));
+ PT_REGS_SET_SYSCALL_RETURN(regs, ret);
+ }
+
+ PT_REGS_SYSCALL_RET(regs) = regs->regs.gp[HOST_AX];
+
+ /* handle tasks and signals at the end */
+ interrupt_end();
+}
diff --git a/arch/x86/um/nommu/entry_64.S b/arch/x86/um/nommu/entry_64.S
new file mode 100644
index 000000000000..485c578aae64
--- /dev/null
+++ b/arch/x86/um/nommu/entry_64.S
@@ -0,0 +1,112 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <asm/errno.h>
+
+#include <linux/linkage.h>
+#include <asm/percpu.h>
+#include <asm/desc.h>
+
+#include "../entry/calling.h"
+
+#ifdef CONFIG_SMP
+#error need to stash these variables somewhere else
+#endif
+
+#define UM_GLOBAL_VAR(x) .data; .align 8; .globl x; x:; .long 0
+
+UM_GLOBAL_VAR(current_top_of_stack)
+UM_GLOBAL_VAR(current_ptregs)
+
+.code64
+.section .entry.text, "ax"
+
+.align 8
+#undef ENTRY
+#define ENTRY(x) .text; .globl x; .type x,%function; x:
+#undef END
+#define END(x) .size x, . - x
+
+/*
+ * %rcx has the return address (we set it before entering __kernel_vsyscall).
+ *
+ * Registers on entry:
+ * rax system call number
+ * rcx return address
+ * rdi arg0
+ * rsi arg1
+ * rdx arg2
+ * r10 arg3
+ * r8 arg4
+ * r9 arg5
+ *
+ * (note: we are allowed to mess with r11: r11 is callee-clobbered
+ * register in C ABI)
+ */
+ENTRY(__kernel_vsyscall)
+
+ movq %rsp, %r11
+
+ /* Point rsp to the top of the ptregs array, so we can
+ just fill it with a bunch of push'es. */
+ movq current_ptregs, %rsp
+
+ /* 8 bytes * 20 registers (plus 8 for the push) */
+ addq $168, %rsp
+
+ /* Construct struct pt_regs on stack */
+ pushq $0 /* pt_regs->ss (index 20) */
+ pushq %r11 /* pt_regs->sp */
+ pushfq /* pt_regs->flags */
+ pushq $0 /* pt_regs->cs */
+ pushq %rcx /* pt_regs->ip */
+ pushq %rax /* pt_regs->orig_ax */
+
+ PUSH_AND_CLEAR_REGS rax=$-ENOSYS
+
+ mov %rsp, %rdi
+
+ /*
+ * Switch to current top of stack, so "current->" points
+ * to the right task.
+ */
+ movq current_top_of_stack, %rsp
+
+ call do_syscall_64
+
+ jmp userspace
+
+END(__kernel_vsyscall)
+
+/*
+ * common userspace returning routine
+ *
+ * all procedures such as syscalls, signal handlers, and umh processes pass
+ * through this routine to properly configure registers/stacks.
+ *
+ * void userspace(struct uml_pt_regs *regs)
+ */
+ENTRY(userspace)
+
+ /* clear direction flag to meet ABI */
+ cld
+ /* align the stack for x86_64 ABI */
+ and $-0x10, %rsp
+ /* Handle any immediate reschedules or signals */
+ call interrupt_end
+
+ movq current_ptregs, %rsp
+
+ POP_REGS
+
+ addq $8, %rsp /* skip orig_ax */
+ popq %rcx /* pt_regs->ip */
+ addq $8, %rsp /* skip cs */
+ addq $8, %rsp /* skip flags */
+ popq %rsp
+
+ /*
+ * not return w/ ret but w/ jmp as the stack is already popped before
+ * entering __kernel_vsyscall
+ */
+ jmp *%rcx
+
+END(userspace)
diff --git a/arch/x86/um/nommu/syscalls.h b/arch/x86/um/nommu/syscalls.h
new file mode 100644
index 000000000000..a2433756b1fc
--- /dev/null
+++ b/arch/x86/um/nommu/syscalls.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __UM_NOMMU_SYSCALLS_H
+#define __UM_NOMMU_SYSCALLS_H
+
+
+#define task_top_of_stack(task) \
+({ \
+ unsigned long __ptr = (unsigned long)task->stack; \
+ __ptr += THREAD_SIZE; \
+ __ptr; \
+})
+
+extern long current_top_of_stack;
+extern long current_ptregs;
+
+#endif
--
2.43.0
* [PATCH v13 05/13] um: nommu: seccomp syscalls hook
2025-11-08 8:05 [PATCH v13 00/13] nommu UML Hajime Tazaki
` (3 preceding siblings ...)
2025-11-08 8:05 ` [PATCH v13 04/13] x86/um: nommu: syscall handling Hajime Tazaki
@ 2025-11-08 8:05 ` Hajime Tazaki
2025-11-08 8:05 ` [PATCH v13 06/13] x86/um: nommu: process/thread handling Hajime Tazaki
` (8 subsequent siblings)
13 siblings, 0 replies; 20+ messages in thread
From: Hajime Tazaki @ 2025-11-08 8:05 UTC (permalink / raw)
To: linux-um
Cc: thehajime, ricarkol, Liam.Howlett, linux-kernel, Kenichi Yasukata
This commit adds a syscall hook based on seccomp.
The seccomp filter raises SIGSYS in the UML process when a syscall
instruction is executed. The (UML) kernel catches the signal and redirects
execution to the syscall entry point, __kernel_vsyscall, thereby hooking
the original syscall instruction.
SIGSYS is raised only for syscalls executed from addresses between
uml_reserved and high_physmem, the range where userspace memory is
located.
It also renames the existing static function sigsys_handler() in
start_up.c to _sigsys_handler() to avoid a name conflict with the new
handler.
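Since classic BPF compares 32-bit words, the filter has to test the
instruction pointer in two halves. The following sketch (hypothetical
demo_* helpers, plain C rather than BPF) mirrors the order of the
comparisons in the filter and can be checked against the direct 64-bit
range test. It is only an illustration of the intended decision, not the
filter itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* 64-bit view: trap only when start <= ip < end. */
static bool demo_in_user_range(uint64_t ip, uint64_t start, uint64_t end)
{
	return ip >= start && ip < end;
}

/* The same decision using 32-bit halves, in the order the filter tests them. */
static bool demo_filter_traps(uint64_t ip, uint64_t start, uint64_t end)
{
	uint32_t ip_hi = (uint32_t)(ip >> 32), ip_lo = (uint32_t)ip;
	uint32_t s_hi = (uint32_t)(start >> 32), s_lo = (uint32_t)start;
	uint32_t e_hi = (uint32_t)(end >> 32), e_lo = (uint32_t)end;

	if (ip_hi > e_hi)
		return false;	/* above the range: allow */
	if (ip_hi == e_hi && ip_lo >= e_lo)
		return false;	/* at or above the end: allow */
	if (ip_hi < s_hi)
		return false;	/* below the range: allow */
	if (ip_hi == s_hi && ip_lo < s_lo)
		return false;	/* below the start: allow */

	return true;		/* inside the userspace range: trap */
}
```

Syscalls issued by the UML kernel itself fall outside the range and are
allowed, so only userspace syscall instructions end up in the SIGSYS
handler.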
Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: Kenichi Yasukata <kenichi.yasukata@gmail.com>
---
arch/um/include/shared/kern_util.h | 2 +
arch/um/include/shared/os.h | 10 +++
arch/um/kernel/um_arch.c | 3 +
arch/um/nommu/Makefile | 3 +
arch/um/nommu/os-Linux/Makefile | 7 +++
arch/um/nommu/os-Linux/seccomp.c | 87 +++++++++++++++++++++++++++
arch/um/nommu/os-Linux/signal.c | 16 +++++
arch/um/os-Linux/signal.c | 8 +++
arch/um/os-Linux/start_up.c | 4 +-
arch/x86/um/nommu/Makefile | 2 +-
arch/x86/um/nommu/os-Linux/Makefile | 6 ++
arch/x86/um/nommu/os-Linux/mcontext.c | 15 +++++
arch/x86/um/shared/sysdep/mcontext.h | 4 ++
13 files changed, 164 insertions(+), 3 deletions(-)
create mode 100644 arch/um/nommu/Makefile
create mode 100644 arch/um/nommu/os-Linux/Makefile
create mode 100644 arch/um/nommu/os-Linux/seccomp.c
create mode 100644 arch/um/nommu/os-Linux/signal.c
create mode 100644 arch/x86/um/nommu/os-Linux/Makefile
create mode 100644 arch/x86/um/nommu/os-Linux/mcontext.c
diff --git a/arch/um/include/shared/kern_util.h b/arch/um/include/shared/kern_util.h
index 38321188c04c..7798f16a4677 100644
--- a/arch/um/include/shared/kern_util.h
+++ b/arch/um/include/shared/kern_util.h
@@ -63,6 +63,8 @@ extern void segv_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs
extern void winch(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs,
void *mc);
extern void fatal_sigsegv(void) __attribute__ ((noreturn));
+extern void sigsys_handler(int sig, struct siginfo *si, struct uml_pt_regs *regs,
+ void *mc);
void um_idle_sleep(void);
diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index b26e94292fc1..5451f9b1f41e 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -356,4 +356,14 @@ static inline void os_local_ipi_enable(void) { }
static inline void os_local_ipi_disable(void) { }
#endif /* CONFIG_SMP */
+/* seccomp.c */
+#ifdef CONFIG_MMU
+static inline int os_setup_seccomp(void)
+{
+ return 0;
+}
+#else
+extern int os_setup_seccomp(void);
+#endif
+
#endif
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index e2b24e1ecfa6..27c13423d9aa 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -423,6 +423,9 @@ void __init setup_arch(char **cmdline_p)
add_bootloader_randomness(rng_seed, sizeof(rng_seed));
memzero_explicit(rng_seed, sizeof(rng_seed));
}
+
+ /* install seccomp filter */
+ os_setup_seccomp();
}
void __init arch_cpu_finalize_init(void)
diff --git a/arch/um/nommu/Makefile b/arch/um/nommu/Makefile
new file mode 100644
index 000000000000..baab7c2f57c2
--- /dev/null
+++ b/arch/um/nommu/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-y := os-Linux/
diff --git a/arch/um/nommu/os-Linux/Makefile b/arch/um/nommu/os-Linux/Makefile
new file mode 100644
index 000000000000..805e26ccf63b
--- /dev/null
+++ b/arch/um/nommu/os-Linux/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-y := seccomp.o signal.o
+USER_OBJS := $(obj-y)
+
+include $(srctree)/arch/um/scripts/Makefile.rules
+USER_CFLAGS+=-I$(srctree)/arch/um/os-Linux
diff --git a/arch/um/nommu/os-Linux/seccomp.c b/arch/um/nommu/os-Linux/seccomp.c
new file mode 100644
index 000000000000..d1cfa6e3d632
--- /dev/null
+++ b/arch/um/nommu/os-Linux/seccomp.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+#include <signal.h>
+#include <sys/prctl.h>
+#include <sys/syscall.h> /* For SYS_xxx definitions */
+#include <init.h>
+#include <as-layout.h>
+#include <os.h>
+#include <linux/filter.h>
+#include <linux/seccomp.h>
+
+int __init os_setup_seccomp(void)
+{
+ int err;
+ unsigned long __userspace_start = uml_reserved,
+ __userspace_end = high_physmem;
+
+ struct sock_filter filter[] = {
+ /* if (IP_high > __userspace_end) allow; */
+ BPF_STMT(BPF_LD + BPF_W + BPF_ABS,
+ offsetof(struct seccomp_data, instruction_pointer) + 4),
+ BPF_JUMP(BPF_JMP + BPF_JGT + BPF_K, __userspace_end >> 32,
+ /*true-skip=*/0, /*false-skip=*/1),
+ BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_ALLOW),
+
+ /* if (IP_high == __userspace_end && IP_low >= __userspace_end) allow; */
+ BPF_STMT(BPF_LD + BPF_W + BPF_ABS,
+ offsetof(struct seccomp_data, instruction_pointer) + 4),
+ BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, __userspace_end >> 32,
+ /*true-skip=*/0, /*false-skip=*/3),
+ BPF_STMT(BPF_LD + BPF_W + BPF_ABS,
+ offsetof(struct seccomp_data, instruction_pointer)),
+ BPF_JUMP(BPF_JMP + BPF_JGE + BPF_K, __userspace_end,
+ /*true-skip=*/0, /*false-skip=*/1),
+ BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_ALLOW),
+
+ /* if (IP_high < __userspace_start) allow; */
+ BPF_STMT(BPF_LD + BPF_W + BPF_ABS,
+ offsetof(struct seccomp_data, instruction_pointer) + 4),
+ BPF_JUMP(BPF_JMP + BPF_JGE + BPF_K, __userspace_start >> 32,
+ /*true-skip=*/1, /*false-skip=*/0),
+ BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_ALLOW),
+
+ /* if (IP_high == __userspace_start && IP_low < __userspace_start) allow; */
+ BPF_STMT(BPF_LD + BPF_W + BPF_ABS,
+ offsetof(struct seccomp_data, instruction_pointer) + 4),
+ BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, __userspace_start >> 32,
+ /*true-skip=*/0, /*false-skip=*/3),
+ BPF_STMT(BPF_LD + BPF_W + BPF_ABS,
+ offsetof(struct seccomp_data, instruction_pointer)),
+ BPF_JUMP(BPF_JMP + BPF_JGE + BPF_K, __userspace_start,
+ /*true-skip=*/1, /*false-skip=*/0),
+ BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_ALLOW),
+
+ /* other address; trap */
+ BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_TRAP),
+ };
+ struct sock_fprog prog = {
+ .len = ARRAY_SIZE(filter),
+ .filter = filter,
+ };
+
+ err = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+ if (err)
+ os_warn("PR_SET_NO_NEW_PRIVS (err=%d, errno=%d)\n",
+ err, errno);
+
+ err = syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER,
+ SECCOMP_FILTER_FLAG_TSYNC, &prog);
+ if (err) {
+ os_warn("SECCOMP_SET_MODE_FILTER (err=%d, errno=%d)\n",
+ err, errno);
+ exit(1);
+ }
+
+ set_handler(SIGSYS);
+
+ os_info("seccomp: setup filter syscalls in the range: 0x%lx-0x%lx\n",
+ __userspace_start, __userspace_end);
+
+ return 0;
+}
+
diff --git a/arch/um/nommu/os-Linux/signal.c b/arch/um/nommu/os-Linux/signal.c
new file mode 100644
index 000000000000..19043b9652e2
--- /dev/null
+++ b/arch/um/nommu/os-Linux/signal.c
@@ -0,0 +1,16 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <signal.h>
+#include <kern_util.h>
+#include <os.h>
+#include <sysdep/mcontext.h>
+#include <sys/ucontext.h>
+
+void sigsys_handler(int sig, struct siginfo *si,
+ struct uml_pt_regs *regs, void *ptr)
+{
+ mcontext_t *mc = (mcontext_t *) ptr;
+
+ /* hook syscall via SIGSYS */
+ set_mc_sigsys_hook(mc);
+}
diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c
index 327fb3c52fc7..2f6795cd884c 100644
--- a/arch/um/os-Linux/signal.c
+++ b/arch/um/os-Linux/signal.c
@@ -20,6 +20,7 @@
#include <um_malloc.h>
#include <sys/ucontext.h>
#include <timetravel.h>
+#include <linux/compiler_attributes.h>
#include "internal.h"
void (*sig_info[NSIG])(int, struct siginfo *, struct uml_pt_regs *, void *mc) = {
@@ -31,6 +32,7 @@ void (*sig_info[NSIG])(int, struct siginfo *, struct uml_pt_regs *, void *mc) =
[SIGSEGV] = segv_handler,
[SIGIO] = sigio_handler,
[SIGCHLD] = sigchld_handler,
+ [SIGSYS] = sigsys_handler,
};
static void sig_handler_common(int sig, struct siginfo *si, mcontext_t *mc)
@@ -182,6 +184,11 @@ static void sigusr1_handler(int sig, struct siginfo *unused_si, mcontext_t *mc)
uml_pm_wake();
}
+__weak void sigsys_handler(int sig, struct siginfo *unused_si,
+ struct uml_pt_regs *regs, void *mc)
+{
+}
+
void register_pm_wake_signal(void)
{
set_handler(SIGUSR1);
@@ -193,6 +200,7 @@ static void (*handlers[_NSIG])(int sig, struct siginfo *si, mcontext_t *mc) = {
[SIGILL] = sig_handler,
[SIGFPE] = sig_handler,
[SIGTRAP] = sig_handler,
+ [SIGSYS] = sig_handler,
[SIGIO] = sig_handler,
[SIGWINCH] = sig_handler,
diff --git a/arch/um/os-Linux/start_up.c b/arch/um/os-Linux/start_up.c
index 054ac03bbf5e..33e039d2c1bf 100644
--- a/arch/um/os-Linux/start_up.c
+++ b/arch/um/os-Linux/start_up.c
@@ -239,7 +239,7 @@ extern unsigned long *exec_fp_regs;
__initdata static struct stub_data *seccomp_test_stub_data;
-static void __init sigsys_handler(int sig, siginfo_t *info, void *p)
+static void __init _sigsys_handler(int sig, siginfo_t *info, void *p)
{
ucontext_t *uc = p;
@@ -274,7 +274,7 @@ static int __init seccomp_helper(void *data)
sizeof(seccomp_test_stub_data->sigstack));
sa.sa_flags = SA_ONSTACK | SA_NODEFER | SA_SIGINFO;
- sa.sa_sigaction = (void *) sigsys_handler;
+ sa.sa_sigaction = (void *) _sigsys_handler;
sa.sa_restorer = NULL;
if (sigaction(SIGSYS, &sa, NULL) < 0)
exit(2);
diff --git a/arch/x86/um/nommu/Makefile b/arch/x86/um/nommu/Makefile
index d72c63afffa5..ebe47d4836f4 100644
--- a/arch/x86/um/nommu/Makefile
+++ b/arch/x86/um/nommu/Makefile
@@ -5,4 +5,4 @@ else
BITS := 64
endif
-obj-y = do_syscall_$(BITS).o entry_$(BITS).o
+obj-y = do_syscall_$(BITS).o entry_$(BITS).o os-Linux/
diff --git a/arch/x86/um/nommu/os-Linux/Makefile b/arch/x86/um/nommu/os-Linux/Makefile
new file mode 100644
index 000000000000..4571e403a6ff
--- /dev/null
+++ b/arch/x86/um/nommu/os-Linux/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-y = mcontext.o
+USER_OBJS := mcontext.o
+
+include $(srctree)/arch/um/scripts/Makefile.rules
diff --git a/arch/x86/um/nommu/os-Linux/mcontext.c b/arch/x86/um/nommu/os-Linux/mcontext.c
new file mode 100644
index 000000000000..b62a6195096f
--- /dev/null
+++ b/arch/x86/um/nommu/os-Linux/mcontext.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <sys/ucontext.h>
+#define __FRAME_OFFSETS
+#include <asm/ptrace.h>
+#include <sysdep/ptrace.h>
+#include <sysdep/mcontext.h>
+
+extern long __kernel_vsyscall(int64_t a0, int64_t a1, int64_t a2, int64_t a3,
+ int64_t a4, int64_t a5, int64_t a6);
+
+void set_mc_sigsys_hook(mcontext_t *mc)
+{
+ mc->gregs[REG_RCX] = mc->gregs[REG_RIP];
+ mc->gregs[REG_RIP] = (unsigned long) __kernel_vsyscall;
+}
diff --git a/arch/x86/um/shared/sysdep/mcontext.h b/arch/x86/um/shared/sysdep/mcontext.h
index 6fe490cc5b98..9a0d6087f357 100644
--- a/arch/x86/um/shared/sysdep/mcontext.h
+++ b/arch/x86/um/shared/sysdep/mcontext.h
@@ -17,6 +17,10 @@ extern int get_stub_state(struct uml_pt_regs *regs, struct stub_data *data,
extern int set_stub_state(struct uml_pt_regs *regs, struct stub_data *data,
int single_stepping);
+#ifndef CONFIG_MMU
+extern void set_mc_sigsys_hook(mcontext_t *mc);
+#endif
+
#ifdef __i386__
#define GET_FAULTINFO_FROM_MC(fi, mc) \
--
2.43.0
* [PATCH v13 06/13] x86/um: nommu: process/thread handling
2025-11-08 8:05 [PATCH v13 00/13] nommu UML Hajime Tazaki
` (4 preceding siblings ...)
2025-11-08 8:05 ` [PATCH v13 05/13] um: nommu: seccomp syscalls hook Hajime Tazaki
@ 2025-11-08 8:05 ` Hajime Tazaki
2025-11-08 8:05 ` [PATCH v13 07/13] um: nommu: configure fs register on host syscall invocation Hajime Tazaki
` (7 subsequent siblings)
13 siblings, 0 replies; 20+ messages in thread
From: Hajime Tazaki @ 2025-11-08 8:05 UTC (permalink / raw)
To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett, linux-kernel
Since the ptrace facility isn't used in !MMU mode of UML, processes and
threads are invoked through a different code path: no external process is
used, and some registers (e.g., the fs segment register for TLS) need to
be configured properly on every context switch.
Signals aren't delivered on syscall entry/exit without ptrace, so we also
need to handle pending signals by ourselves.
ptrace-related syscalls are not tested yet, so arch_has_single_step() is
marked as unsupported in the !MMU environment.
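The per-switch bookkeeping described above can be pictured with the
following userspace sketch. The demo_* names and the DEMO_THREAD_SIZE
value are assumptions made for illustration; the real code works on
struct task_struct and calls arch_prctl(). The sketch records the top of
the next task's stack and reprograms the FS base only for tasks that have
both a userspace mm and a non-zero saved FS base.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_THREAD_SIZE 16384UL	/* assumed stack size, illustration only */

/* Hypothetical, reduced view of a task. */
struct demo_task {
	void *stack;		/* base of the kernel stack */
	unsigned long fs_base;	/* saved userspace FS base (0 if unset) */
	bool has_mm;		/* false for kernel threads */
};

static unsigned long demo_current_top_of_stack;
static unsigned long demo_active_fs;	/* stands in for the FS base register */

static unsigned long demo_task_top_of_stack(const struct demo_task *t)
{
	return (unsigned long)(uintptr_t)t->stack + DEMO_THREAD_SIZE;
}

/* Returns true when the FS base was reprogrammed for the next task. */
static bool demo_switch_to(const struct demo_task *to)
{
	demo_current_top_of_stack = demo_task_top_of_stack(to);

	if (to->fs_base == 0 || !to->has_mm)
		return false;

	demo_active_fs = to->fs_base;
	return true;
}
```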
Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
---
arch/um/include/asm/ptrace-generic.h | 2 +-
arch/x86/um/Makefile | 3 +-
arch/x86/um/nommu/Makefile | 2 +-
arch/x86/um/nommu/entry_64.S | 2 ++
arch/x86/um/nommu/syscalls.h | 2 ++
arch/x86/um/nommu/syscalls_64.c | 50 ++++++++++++++++++++++++++++
6 files changed, 58 insertions(+), 3 deletions(-)
create mode 100644 arch/x86/um/nommu/syscalls_64.c
diff --git a/arch/um/include/asm/ptrace-generic.h b/arch/um/include/asm/ptrace-generic.h
index 62e9916078ec..5aa38fe6b2fb 100644
--- a/arch/um/include/asm/ptrace-generic.h
+++ b/arch/um/include/asm/ptrace-generic.h
@@ -14,7 +14,7 @@ struct pt_regs {
struct uml_pt_regs regs;
};
-#define arch_has_single_step() (1)
+#define arch_has_single_step() (IS_ENABLED(CONFIG_MMU))
#define EMPTY_REGS { .regs = EMPTY_UML_PT_REGS }
diff --git a/arch/x86/um/Makefile b/arch/x86/um/Makefile
index 39693807755a..98dc57afff83 100644
--- a/arch/x86/um/Makefile
+++ b/arch/x86/um/Makefile
@@ -26,7 +26,8 @@ subarch-y += ../kernel/sys_ia32.o
else
-obj-y += mem_64.o syscalls_64.o vdso/
+obj-y += mem_64.o vdso/
+obj-$(CONFIG_MMU) += syscalls_64.o
subarch-y = ../lib/csum-partial_64.o ../lib/memcpy_64.o \
../lib/memmove_64.o ../lib/memset_64.o
diff --git a/arch/x86/um/nommu/Makefile b/arch/x86/um/nommu/Makefile
index ebe47d4836f4..4018d9e0aba0 100644
--- a/arch/x86/um/nommu/Makefile
+++ b/arch/x86/um/nommu/Makefile
@@ -5,4 +5,4 @@ else
BITS := 64
endif
-obj-y = do_syscall_$(BITS).o entry_$(BITS).o os-Linux/
+obj-y = do_syscall_$(BITS).o entry_$(BITS).o syscalls_$(BITS).o os-Linux/
diff --git a/arch/x86/um/nommu/entry_64.S b/arch/x86/um/nommu/entry_64.S
index 485c578aae64..a58922fc81e5 100644
--- a/arch/x86/um/nommu/entry_64.S
+++ b/arch/x86/um/nommu/entry_64.S
@@ -86,6 +86,8 @@ END(__kernel_vsyscall)
*/
ENTRY(userspace)
+ /* set stack and pt_regs to the current task */
+ call arch_set_stack_to_current
/* clear direction flag to meet ABI */
cld
/* align the stack for x86_64 ABI */
diff --git a/arch/x86/um/nommu/syscalls.h b/arch/x86/um/nommu/syscalls.h
index a2433756b1fc..ce16bf8abd59 100644
--- a/arch/x86/um/nommu/syscalls.h
+++ b/arch/x86/um/nommu/syscalls.h
@@ -13,4 +13,6 @@
extern long current_top_of_stack;
extern long current_ptregs;
+void arch_set_stack_to_current(void);
+
#endif
diff --git a/arch/x86/um/nommu/syscalls_64.c b/arch/x86/um/nommu/syscalls_64.c
new file mode 100644
index 000000000000..d56027ebc651
--- /dev/null
+++ b/arch/x86/um/nommu/syscalls_64.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2003 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
+ * Copyright 2003 PathScale, Inc.
+ *
+ * Licensed under the GPL
+ */
+
+#include <linux/sched.h>
+#include <linux/sched/mm.h>
+#include <linux/syscalls.h>
+#include <linux/uaccess.h>
+#include <asm/prctl.h> /* XXX This should get the constants from libc */
+#include <registers.h>
+#include <os.h>
+#include "syscalls.h"
+
+void arch_set_stack_to_current(void)
+{
+ current_top_of_stack = task_top_of_stack(current);
+ current_ptregs = (long)task_pt_regs(current);
+}
+
+void arch_switch_to(struct task_struct *to)
+{
+ /*
+ * With !CONFIG_MMU, ptrace isn't used, so the FS_BASE register is
+ * configured here.
+ */
+ current_top_of_stack = task_top_of_stack(to);
+ current_ptregs = (long)task_pt_regs(to);
+
+ if ((to->thread.regs.regs.gp[FS_BASE / sizeof(unsigned long)] == 0) ||
+ (to->mm == NULL))
+ return;
+
+ /* this changes the FS on every context switch */
+ arch_prctl(to, ARCH_SET_FS,
+ (void __user *) to->thread.regs.regs.gp[FS_BASE / sizeof(unsigned long)]);
+}
+
+SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
+ unsigned long, prot, unsigned long, flags,
+ unsigned long, fd, unsigned long, off)
+{
+ if (off & ~PAGE_MASK)
+ return -EINVAL;
+
+ return ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
+}
--
2.43.0
* [PATCH v13 07/13] um: nommu: configure fs register on host syscall invocation
2025-11-08 8:05 [PATCH v13 00/13] nommu UML Hajime Tazaki
` (5 preceding siblings ...)
2025-11-08 8:05 ` [PATCH v13 06/13] x86/um: nommu: process/thread handling Hajime Tazaki
@ 2025-11-08 8:05 ` Hajime Tazaki
2025-11-08 8:05 ` [PATCH v13 08/13] x86/um/vdso: nommu: vdso memory update Hajime Tazaki
` (6 subsequent siblings)
13 siblings, 0 replies; 20+ messages in thread
From: Hajime Tazaki @ 2025-11-08 8:05 UTC (permalink / raw)
To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett, linux-kernel
Userspace on UML/!MMU also needs to configure the %fs register while it
is running in order to access its thread structure correctly. As a
result, host syscalls implemented in os-Linux drivers may get confused
when they are called with the userspace value of %fs. Thus the %fs
register has to be switched via arch_prctl(ARCH_SET_FS) on every host
syscall.
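The save/restore pattern can be modelled as below. This is a simulation
only: demo_fs_reg is a plain variable standing in for the FS base
register, and all demo_* names are hypothetical. The point is that the
handler runs with the host value and that the userspace value is put back
before returning.

```c
#include <stdint.h>

static uint64_t demo_fs_reg;	/* stands in for the hardware FS base */
static uint64_t demo_host_fs;	/* host libc's FS base, captured at boot */

/* Hypothetical per-thread state holding the userspace FS base. */
struct demo_thread {
	uint64_t fs_base;
};

typedef long (*demo_handler)(void);

/* Run a handler with the host FS base, then restore the userspace one. */
static long demo_handle_syscall(const struct demo_thread *cur, demo_handler fn)
{
	long ret;

	demo_fs_reg = demo_host_fs;	/* host code expects its own TLS */
	ret = fn();
	demo_fs_reg = cur->fs_base;	/* back to the userspace TLS */

	return ret;
}

/* Toy handler that records which FS base it observed. */
static uint64_t demo_seen_fs;

static long demo_probe(void)
{
	demo_seen_fs = demo_fs_reg;
	return 42;
}
```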
Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
---
arch/um/include/shared/os.h | 6 +++
arch/um/os-Linux/process.c | 6 +++
arch/um/os-Linux/start_up.c | 21 +++++++++
arch/x86/um/nommu/do_syscall_64.c | 37 ++++++++++++++++
arch/x86/um/nommu/syscalls_64.c | 71 +++++++++++++++++++++++++++++++
5 files changed, 141 insertions(+)
diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index 5451f9b1f41e..0ac87507e05e 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -189,6 +189,7 @@ extern void check_host_supports_tls(int *supports_tls, int *tls_min);
extern void get_host_cpu_features(
void (*flags_helper_func)(char *line),
void (*cache_helper_func)(char *line));
+extern int host_has_fsgsbase;
/* mem.c */
extern int create_mem_file(unsigned long long len);
@@ -213,6 +214,11 @@ extern int os_protect_memory(void *addr, unsigned long len,
extern int os_unmap_memory(void *addr, int len);
extern int os_drop_memory(void *addr, int length);
extern int can_drop_memory(void);
+extern int os_arch_prctl(int pid, int option, unsigned long *arg);
+#ifndef CONFIG_MMU
+extern long long host_fs;
+#endif
+
void os_set_pdeathsig(void);
diff --git a/arch/um/os-Linux/process.c b/arch/um/os-Linux/process.c
index ddb5258d7720..dacf63ac33c8 100644
--- a/arch/um/os-Linux/process.c
+++ b/arch/um/os-Linux/process.c
@@ -18,6 +18,7 @@
#include <sys/prctl.h>
#include <sys/wait.h>
#include <asm/unistd.h>
+#include <sys/syscall.h> /* For SYS_xxx definitions */
#include <linux/threads.h>
#include <init.h>
#include <longjmp.h>
@@ -179,6 +180,11 @@ int __init can_drop_memory(void)
return ok;
}
+int os_arch_prctl(int pid, int option, unsigned long *arg2)
+{
+ return syscall(SYS_arch_prctl, option, arg2);
+}
+
void init_new_thread_signals(void)
{
set_handler(SIGSEGV);
diff --git a/arch/um/os-Linux/start_up.c b/arch/um/os-Linux/start_up.c
index 33e039d2c1bf..c0afe5d8b559 100644
--- a/arch/um/os-Linux/start_up.c
+++ b/arch/um/os-Linux/start_up.c
@@ -20,6 +20,8 @@
#include <sys/resource.h>
#include <asm/ldt.h>
#include <asm/unistd.h>
+#include <sys/auxv.h>
+#include <asm/hwcap2.h>
#include <init.h>
#include <os.h>
#include <smp.h>
@@ -37,6 +39,8 @@
#include <skas.h>
#include "internal.h"
+int host_has_fsgsbase;
+
static void ptrace_child(void)
{
int ret;
@@ -460,6 +464,20 @@ __uml_setup("seccomp=", uml_seccomp_config,
" This is insecure and should only be used with a trusted userspace\n\n"
);
+static void __init check_fsgsbase(void)
+{
+ unsigned long auxv = getauxval(AT_HWCAP2);
+
+ os_info("Checking FSGSBASE instructions...");
+ if (auxv & HWCAP2_FSGSBASE) {
+ host_has_fsgsbase = 1;
+ os_info("OK\n");
+ } else {
+ host_has_fsgsbase = 0;
+ os_info("disabled\n");
+ }
+}
+
void __init os_early_checks(void)
{
int pid;
@@ -488,6 +506,9 @@ void __init os_early_checks(void)
using_seccomp = 0;
check_ptrace();
+ /* probe fsgsbase instruction */
+ check_fsgsbase();
+
pid = start_ptraced_child();
if (init_pid_registers(pid))
fatal("Failed to initialize default registers");
diff --git a/arch/x86/um/nommu/do_syscall_64.c b/arch/x86/um/nommu/do_syscall_64.c
index 292d7c578622..9bc630995df9 100644
--- a/arch/x86/um/nommu/do_syscall_64.c
+++ b/arch/x86/um/nommu/do_syscall_64.c
@@ -2,10 +2,38 @@
#include <linux/kernel.h>
#include <linux/ptrace.h>
+#include <asm/fsgsbase.h>
+#include <asm/prctl.h>
#include <kern_util.h>
#include <asm/syscall.h>
#include <os.h>
+static int os_x86_arch_prctl(int pid, int option, unsigned long *arg2)
+{
+ if (!host_has_fsgsbase)
+ return os_arch_prctl(pid, option, arg2);
+
+ switch (option) {
+ case ARCH_SET_FS:
+ wrfsbase(*arg2);
+ break;
+ case ARCH_SET_GS:
+ wrgsbase(*arg2);
+ break;
+ case ARCH_GET_FS:
+ *arg2 = rdfsbase();
+ break;
+ case ARCH_GET_GS:
+ *arg2 = rdgsbase();
+ break;
+ default:
+ pr_warn("%s: unsupported option: 0x%x", __func__, option);
+ break;
+ }
+
+ return 0;
+}
+
__visible void do_syscall_64(struct pt_regs *regs)
{
int syscall;
@@ -13,6 +41,9 @@ __visible void do_syscall_64(struct pt_regs *regs)
syscall = PT_SYSCALL_NR(regs->regs.gp);
UPT_SYSCALL_NR(&regs->regs) = syscall;
+ /* set fs register to the original host one */
+ os_x86_arch_prctl(0, ARCH_SET_FS, (void *)host_fs);
+
if (likely(syscall < NR_syscalls)) {
unsigned long ret;
@@ -29,4 +60,10 @@ __visible void do_syscall_64(struct pt_regs *regs)
/* handle tasks and signals at the end */
interrupt_end();
+
+ /* restore back fs register to userspace configured one */
+ os_x86_arch_prctl(0, ARCH_SET_FS,
+ (void *)(current->thread.regs.regs.gp[FS_BASE
+ / sizeof(unsigned long)]));
+
}
diff --git a/arch/x86/um/nommu/syscalls_64.c b/arch/x86/um/nommu/syscalls_64.c
index d56027ebc651..19d23686fc5b 100644
--- a/arch/x86/um/nommu/syscalls_64.c
+++ b/arch/x86/um/nommu/syscalls_64.c
@@ -13,8 +13,70 @@
#include <asm/prctl.h> /* XXX This should get the constants from libc */
#include <registers.h>
#include <os.h>
+#include <asm/thread_info.h>
+#include <asm/mman.h>
#include "syscalls.h"
+/*
+ * The guest libc can change FS, which confuses the host libc.
+ * In fact, changing FS directly is not supported (check
+ * man arch_prctl). So, whenever we make a host syscall,
+ * we should be changing FS to the original FS (not the
+ * one set by the guest libc). This original FS is stored
+ * in host_fs.
+ */
+long long host_fs = -1;
+
+long arch_prctl(struct task_struct *task, int option,
+ unsigned long __user *arg2)
+{
+ long ret = -EINVAL;
+ unsigned long *ptr = arg2, tmp;
+
+ switch (option) {
+ case ARCH_SET_FS:
+ if (host_fs == -1)
+ os_arch_prctl(0, ARCH_GET_FS, (void *)&host_fs);
+ ret = 0;
+ break;
+ case ARCH_SET_GS:
+ ret = 0;
+ break;
+ case ARCH_GET_FS:
+ case ARCH_GET_GS:
+ ptr = &tmp;
+ break;
+ }
+
+ ret = os_arch_prctl(0, option, ptr);
+ if (ret)
+ return ret;
+
+ switch (option) {
+ case ARCH_SET_FS:
+ current->thread.regs.regs.gp[FS_BASE / sizeof(unsigned long)] =
+ (unsigned long) arg2;
+ break;
+ case ARCH_SET_GS:
+ current->thread.regs.regs.gp[GS_BASE / sizeof(unsigned long)] =
+ (unsigned long) arg2;
+ break;
+ case ARCH_GET_FS:
+ ret = put_user(current->thread.regs.regs.gp[FS_BASE / sizeof(unsigned long)], arg2);
+ break;
+ case ARCH_GET_GS:
+ ret = put_user(current->thread.regs.regs.gp[GS_BASE / sizeof(unsigned long)], arg2);
+ break;
+ }
+
+ return ret;
+}
+
+SYSCALL_DEFINE2(arch_prctl, int, option, unsigned long, arg2)
+{
+ return arch_prctl(current, option, (unsigned long __user *) arg2);
+}
+
void arch_set_stack_to_current(void)
{
current_top_of_stack = task_top_of_stack(current);
@@ -48,3 +110,12 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
return ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
}
+
+static int __init um_nommu_setup_hostfs(void)
+{
+ /* initialize the host_fs value at boottime */
+ os_arch_prctl(0, ARCH_GET_FS, (void *)&host_fs);
+
+ return 0;
+}
+arch_initcall(um_nommu_setup_hostfs);
--
2.43.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v13 08/13] x86/um/vdso: nommu: vdso memory update
2025-11-08 8:05 [PATCH v13 00/13] nommu UML Hajime Tazaki
` (6 preceding siblings ...)
2025-11-08 8:05 ` [PATCH v13 07/13] um: nommu: configure fs register on host syscall invocation Hajime Tazaki
@ 2025-11-08 8:05 ` Hajime Tazaki
2025-11-08 8:05 ` [PATCH v13 09/13] x86/um: nommu: signal handling Hajime Tazaki
` (5 subsequent siblings)
13 siblings, 0 replies; 20+ messages in thread
From: Hajime Tazaki @ 2025-11-08 8:05 UTC (permalink / raw)
To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett, linux-kernel
In !MMU mode, the address of the vdso is directly accessible from
userspace. This commit implements the entry point by pointing it at the
address of the allocated vdso page. It also configures the memory
permission of the vdso page so that it is executable.
Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
---
arch/x86/um/vdso/vma.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/arch/x86/um/vdso/vma.c b/arch/x86/um/vdso/vma.c
index 51a2b9f2eca9..0799b3fe7521 100644
--- a/arch/x86/um/vdso/vma.c
+++ b/arch/x86/um/vdso/vma.c
@@ -9,6 +9,7 @@
#include <asm/page.h>
#include <asm/elf.h>
#include <linux/init.h>
+#include <os.h>
unsigned long um_vdso_addr;
static struct page *um_vdso;
@@ -20,18 +21,29 @@ static int __init init_vdso(void)
{
BUG_ON(vdso_end - vdso_start > PAGE_SIZE);
- um_vdso_addr = task_size - PAGE_SIZE;
-
um_vdso = alloc_page(GFP_KERNEL);
if (!um_vdso)
panic("Cannot allocate vdso\n");
copy_page(page_address(um_vdso), vdso_start);
+#ifdef CONFIG_MMU
+ um_vdso_addr = task_size - PAGE_SIZE;
+#else
+ /* this is fine with NOMMU as everything is accessible */
+ um_vdso_addr = (unsigned long)page_address(um_vdso);
+ os_protect_memory((void *)um_vdso_addr, vdso_end - vdso_start, 1, 0, 1);
+#endif
+
+ pr_info("vdso_start=%lx um_vdso_addr=%lx pg_um_vdso=%lx",
+ (unsigned long)vdso_start, um_vdso_addr,
+ (unsigned long)page_address(um_vdso));
+
return 0;
}
subsys_initcall(init_vdso);
+#ifdef CONFIG_MMU
int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
{
struct vm_area_struct *vma;
@@ -53,3 +65,4 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
return IS_ERR(vma) ? PTR_ERR(vma) : 0;
}
+#endif
--
2.43.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v13 09/13] x86/um: nommu: signal handling
2025-11-08 8:05 [PATCH v13 00/13] nommu UML Hajime Tazaki
` (7 preceding siblings ...)
2025-11-08 8:05 ` [PATCH v13 08/13] x86/um/vdso: nommu: vdso memory update Hajime Tazaki
@ 2025-11-08 8:05 ` Hajime Tazaki
2025-11-08 8:05 ` [PATCH v13 10/13] um: change machine name for uname output Hajime Tazaki
` (4 subsequent siblings)
13 siblings, 0 replies; 20+ messages in thread
From: Hajime Tazaki @ 2025-11-08 8:05 UTC (permalink / raw)
To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett, linux-kernel
This commit updates the behavior of signal handling in the !MMU
environment. It adds alignment code for the signal frame, as the frame
is used in userspace as-is.
Floating-point registers are saved on entry to, and restored on exit
from, the syscall routine so that signal handlers can read and write the
contents of the registers.
It also adds a follow-up routine for SIGSEGV: signal delivery runs in
the same stack frame, so we have to avoid an endless SIGSEGV loop.
Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
---
arch/um/include/shared/kern_util.h | 4 +
arch/um/nommu/Makefile | 2 +-
arch/um/nommu/os-Linux/signal.c | 8 +
arch/um/nommu/trap.c | 201 ++++++++++++++++++++++++++
arch/um/os-Linux/signal.c | 3 +-
arch/x86/um/nommu/do_syscall_64.c | 6 +
arch/x86/um/nommu/os-Linux/mcontext.c | 11 ++
arch/x86/um/shared/sysdep/mcontext.h | 1 +
arch/x86/um/shared/sysdep/ptrace.h | 2 +-
9 files changed, 235 insertions(+), 3 deletions(-)
create mode 100644 arch/um/nommu/trap.c
diff --git a/arch/um/include/shared/kern_util.h b/arch/um/include/shared/kern_util.h
index 7798f16a4677..46c8d6336ca1 100644
--- a/arch/um/include/shared/kern_util.h
+++ b/arch/um/include/shared/kern_util.h
@@ -70,4 +70,8 @@ void um_idle_sleep(void);
void kasan_map_memory(void *start, size_t len);
+#ifndef CONFIG_MMU
+extern void nommu_relay_signal(void *ptr);
+#endif
+
#endif
diff --git a/arch/um/nommu/Makefile b/arch/um/nommu/Makefile
index baab7c2f57c2..096221590cfd 100644
--- a/arch/um/nommu/Makefile
+++ b/arch/um/nommu/Makefile
@@ -1,3 +1,3 @@
# SPDX-License-Identifier: GPL-2.0
-obj-y := os-Linux/
+obj-y := trap.o os-Linux/
diff --git a/arch/um/nommu/os-Linux/signal.c b/arch/um/nommu/os-Linux/signal.c
index 19043b9652e2..6febb178dcda 100644
--- a/arch/um/nommu/os-Linux/signal.c
+++ b/arch/um/nommu/os-Linux/signal.c
@@ -5,6 +5,7 @@
#include <os.h>
#include <sysdep/mcontext.h>
#include <sys/ucontext.h>
+#include <as-layout.h>
void sigsys_handler(int sig, struct siginfo *si,
struct uml_pt_regs *regs, void *ptr)
@@ -14,3 +15,10 @@ void sigsys_handler(int sig, struct siginfo *si,
/* hook syscall via SIGSYS */
set_mc_sigsys_hook(mc);
}
+
+void nommu_relay_signal(void *ptr)
+{
+ mcontext_t *mc = (mcontext_t *) ptr;
+
+ set_mc_relay_signal(mc);
+}
diff --git a/arch/um/nommu/trap.c b/arch/um/nommu/trap.c
new file mode 100644
index 000000000000..430297517455
--- /dev/null
+++ b/arch/um/nommu/trap.c
@@ -0,0 +1,201 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/mm.h>
+#include <linux/sched/signal.h>
+#include <linux/hardirq.h>
+#include <linux/module.h>
+#include <linux/uaccess.h>
+#include <linux/sched/debug.h>
+#include <asm/current.h>
+#include <asm/tlbflush.h>
+#include <arch.h>
+#include <as-layout.h>
+#include <kern_util.h>
+#include <os.h>
+#include <skas.h>
+
+/*
+ * Note this is constrained to return 0, -EFAULT, -EACCES, -ENOMEM by
+ * segv().
+ */
+int handle_page_fault(unsigned long address, unsigned long ip,
+ int is_write, int is_user, int *code_out)
+{
+ /* !MMU has no pagefault */
+ return -EFAULT;
+}
+
+static void show_segv_info(struct uml_pt_regs *regs)
+{
+ struct task_struct *tsk = current;
+ struct faultinfo *fi = UPT_FAULTINFO(regs);
+
+ if (!unhandled_signal(tsk, SIGSEGV))
+ return;
+
+ pr_warn_ratelimited("%s%s[%d]: segfault at %lx ip %p sp %p error %x",
+ task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
+ tsk->comm, task_pid_nr(tsk), FAULT_ADDRESS(*fi),
+ (void *)UPT_IP(regs), (void *)UPT_SP(regs),
+ fi->error_code);
+}
+
+static void bad_segv(struct faultinfo fi, unsigned long ip)
+{
+ current->thread.arch.faultinfo = fi;
+ force_sig_fault(SIGSEGV, SEGV_ACCERR, (void __user *) FAULT_ADDRESS(fi));
+}
+
+void fatal_sigsegv(void)
+{
+ force_fatal_sig(SIGSEGV);
+ do_signal(&current->thread.regs);
+ /*
+ * This is to tell gcc that we're not returning - do_signal
+ * can, in general, return, but in this case, it's not, since
+ * we just got a fatal SIGSEGV queued.
+ */
+ os_dump_core();
+}
+
+/**
+ * segv_handler() - the SIGSEGV handler
+ * @sig: the signal number
+ * @unused_si: the signal info struct; unused in this handler
+ * @regs: the ptrace register information
+ *
+ * The handler first extracts the faultinfo from the UML ptrace regs struct.
+ * If the userfault did not happen in an UML userspace process, bad_segv is called.
+ * Otherwise the signal did happen in a cloned userspace process, handle it.
+ */
+void segv_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs,
+ void *mc)
+{
+ struct faultinfo *fi = UPT_FAULTINFO(regs);
+
+ /* !MMU specific part; detection of userspace */
+ /* mark is_user=1 when the IP is from userspace code. */
+ if (UPT_IP(regs) > uml_reserved && UPT_IP(regs) < high_physmem)
+ regs->is_user = 1;
+
+ if (UPT_IS_USER(regs) && !SEGV_IS_FIXABLE(fi)) {
+ show_segv_info(regs);
+ bad_segv(*fi, UPT_IP(regs));
+ return;
+ }
+ segv(*fi, UPT_IP(regs), UPT_IS_USER(regs), regs, mc);
+
+ /* !MMU specific part; relay the signal to userspace */
+ relay_signal(sig, unused_si, regs, mc);
+}
+
+/*
+ * We give a *copy* of the faultinfo in the regs to segv.
+ * This must be done, since nesting SEGVs could overwrite
+ * the info in the regs. A pointer to the info then would
+ * give us bad data!
+ */
+unsigned long segv(struct faultinfo fi, unsigned long ip, int is_user,
+ struct uml_pt_regs *regs, void *mc)
+{
+ int si_code;
+ int err;
+ int is_write = FAULT_WRITE(fi);
+ unsigned long address = FAULT_ADDRESS(fi);
+
+ if (!is_user && regs)
+ current->thread.segv_regs = container_of(regs, struct pt_regs, regs);
+
+ if (current->mm == NULL) {
+ show_regs(container_of(regs, struct pt_regs, regs));
+ panic("Segfault with no mm");
+ } else if (!is_user && address > PAGE_SIZE && address < TASK_SIZE) {
+ show_regs(container_of(regs, struct pt_regs, regs));
+ panic("Kernel tried to access user memory at addr 0x%lx, ip 0x%lx",
+ address, ip);
+ }
+
+ if (SEGV_IS_FIXABLE(&fi))
+ err = handle_page_fault(address, ip, is_write, is_user,
+ &si_code);
+ else {
+ err = -EFAULT;
+ /*
+ * A thread accessed NULL, we get a fault, but CR2 is invalid.
+ * This code is used in __do_copy_from_user() of TT mode.
+ * XXX tt mode is gone, so maybe this isn't needed any more
+ */
+ address = 0;
+ }
+
+ if (!err)
+ goto out;
+ else if (!is_user && arch_fixup(ip, regs))
+ goto out;
+
+ if (!is_user) {
+ show_regs(container_of(regs, struct pt_regs, regs));
+ panic("Kernel mode fault at addr 0x%lx, ip 0x%lx",
+ address, ip);
+ }
+
+ show_segv_info(regs);
+
+ if (err == -EACCES) {
+ current->thread.arch.faultinfo = fi;
+ force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)address);
+ } else {
+ WARN_ON_ONCE(err != -EFAULT);
+ current->thread.arch.faultinfo = fi;
+ force_sig_fault(SIGSEGV, si_code, (void __user *) address);
+ }
+
+out:
+ if (regs)
+ current->thread.segv_regs = NULL;
+
+ return 0;
+}
+
+void relay_signal(int sig, struct siginfo *si, struct uml_pt_regs *regs,
+ void *mc)
+{
+ int code, err;
+
+ /* !MMU specific part; detection of userspace */
+ /* mark is_user=1 when the IP is from userspace code. */
+ if (UPT_IP(regs) > uml_reserved && UPT_IP(regs) < high_physmem)
+ regs->is_user = 1;
+
+ if (!UPT_IS_USER(regs)) {
+ if (sig == SIGBUS)
+ pr_err("Bus error - the host /dev/shm or /tmp mount likely just ran out of space\n");
+ panic("Kernel mode signal %d", sig);
+ }
+ /* if is_user==1, set return to userspace sig handler to relay signal */
+ nommu_relay_signal(mc);
+
+ arch_examine_signal(sig, regs);
+
+ /* Is the signal layout for the signal known?
+ * Signal data must be scrubbed to prevent information leaks.
+ */
+ code = si->si_code;
+ err = si->si_errno;
+ if ((err == 0) && (siginfo_layout(sig, code) == SIL_FAULT)) {
+ struct faultinfo *fi = UPT_FAULTINFO(regs);
+
+ current->thread.arch.faultinfo = *fi;
+ force_sig_fault(sig, code, (void __user *)FAULT_ADDRESS(*fi));
+ } else {
+ pr_err("Attempted to relay unknown signal %d (si_code = %d) with errno %d\n",
+ sig, code, err);
+ force_sig(sig);
+ }
+}
+
+void winch(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs,
+ void *mc)
+{
+ do_IRQ(WINCH_IRQ, regs);
+}
diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c
index 2f6795cd884c..28754f56c42b 100644
--- a/arch/um/os-Linux/signal.c
+++ b/arch/um/os-Linux/signal.c
@@ -41,9 +41,10 @@ static void sig_handler_common(int sig, struct siginfo *si, mcontext_t *mc)
int save_errno = errno;
r.is_user = 0;
+ if (mc)
+ get_regs_from_mc(&r, mc);
if (sig == SIGSEGV) {
/* For segfaults, we want the data from the sigcontext. */
- get_regs_from_mc(&r, mc);
GET_FAULTINFO_FROM_MC(r.faultinfo, mc);
}
diff --git a/arch/x86/um/nommu/do_syscall_64.c b/arch/x86/um/nommu/do_syscall_64.c
index 9bc630995df9..cf5a347ee9b1 100644
--- a/arch/x86/um/nommu/do_syscall_64.c
+++ b/arch/x86/um/nommu/do_syscall_64.c
@@ -44,6 +44,9 @@ __visible void do_syscall_64(struct pt_regs *regs)
/* set fs register to the original host one */
os_x86_arch_prctl(0, ARCH_SET_FS, (void *)host_fs);
+ /* save fp registers */
+ asm volatile("fxsaveq %0" : "=m"(*(struct _xstate *)regs->regs.fp));
+
if (likely(syscall < NR_syscalls)) {
unsigned long ret;
@@ -61,6 +64,9 @@ __visible void do_syscall_64(struct pt_regs *regs)
/* handle tasks and signals at the end */
interrupt_end();
+ /* restore fp registers */
+ asm volatile("fxrstorq %0" : : "m"((current->thread.regs.regs.fp)));
+
/* restore back fs register to userspace configured one */
os_x86_arch_prctl(0, ARCH_SET_FS,
(void *)(current->thread.regs.regs.gp[FS_BASE
diff --git a/arch/x86/um/nommu/os-Linux/mcontext.c b/arch/x86/um/nommu/os-Linux/mcontext.c
index b62a6195096f..afa20f1e235a 100644
--- a/arch/x86/um/nommu/os-Linux/mcontext.c
+++ b/arch/x86/um/nommu/os-Linux/mcontext.c
@@ -4,10 +4,21 @@
#include <asm/ptrace.h>
#include <sysdep/ptrace.h>
#include <sysdep/mcontext.h>
+#include <os.h>
+#include "../syscalls.h"
extern long __kernel_vsyscall(int64_t a0, int64_t a1, int64_t a2, int64_t a3,
int64_t a4, int64_t a5, int64_t a6);
+void set_mc_relay_signal(mcontext_t *mc)
+{
+ /* configure stack and userspace returning routine as
+ * instruction pointer
+ */
+ mc->gregs[REG_RSP] = (unsigned long) current_top_of_stack;
+ mc->gregs[REG_RIP] = (unsigned long) userspace;
+}
+
void set_mc_sigsys_hook(mcontext_t *mc)
{
mc->gregs[REG_RCX] = mc->gregs[REG_RIP];
diff --git a/arch/x86/um/shared/sysdep/mcontext.h b/arch/x86/um/shared/sysdep/mcontext.h
index 9a0d6087f357..82a5f38b350f 100644
--- a/arch/x86/um/shared/sysdep/mcontext.h
+++ b/arch/x86/um/shared/sysdep/mcontext.h
@@ -19,6 +19,7 @@ extern int set_stub_state(struct uml_pt_regs *regs, struct stub_data *data,
#ifndef CONFIG_MMU
extern void set_mc_sigsys_hook(mcontext_t *mc);
+extern void set_mc_relay_signal(mcontext_t *mc);
#endif
#ifdef __i386__
diff --git a/arch/x86/um/shared/sysdep/ptrace.h b/arch/x86/um/shared/sysdep/ptrace.h
index 572ea2d79131..6ed6bb1ca50e 100644
--- a/arch/x86/um/shared/sysdep/ptrace.h
+++ b/arch/x86/um/shared/sysdep/ptrace.h
@@ -53,7 +53,7 @@ struct uml_pt_regs {
int is_user;
/* Dynamically sized FP registers (holds an XSTATE) */
- unsigned long fp[];
+ unsigned long fp[] __attribute__((aligned(16)));
};
#define EMPTY_UML_PT_REGS { }
--
2.43.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v13 10/13] um: change machine name for uname output
2025-11-08 8:05 [PATCH v13 00/13] nommu UML Hajime Tazaki
` (8 preceding siblings ...)
2025-11-08 8:05 ` [PATCH v13 09/13] x86/um: nommu: signal handling Hajime Tazaki
@ 2025-11-08 8:05 ` Hajime Tazaki
2025-11-08 8:05 ` [PATCH v13 11/13] um: nommu: disable SMP on nommu UML Hajime Tazaki
` (3 subsequent siblings)
13 siblings, 0 replies; 20+ messages in thread
From: Hajime Tazaki @ 2025-11-08 8:05 UTC (permalink / raw)
To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett, linux-kernel
This commit tries to display MMU/!MMU mode from the output of uname(2)
so that users can distinguish which mode of UML is running right now.
Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
---
arch/um/Makefile | 6 ++++++
arch/um/os-Linux/util.c | 3 ++-
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/um/Makefile b/arch/um/Makefile
index 5371c9a1b11e..9bc8fc149514 100644
--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -153,6 +153,12 @@ export CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) $(LD_FLAGS_CMDLINE) $(CC_FLAGS_
CLEAN_FILES += linux x.i gmon.out
MRPROPER_FILES += $(HOST_DIR)/include/generated
+ifeq ($(CONFIG_MMU),y)
+UTS_MACHINE := "um"
+else
+UTS_MACHINE := "um\(nommu\)"
+endif
+
archclean:
@find . \( -name '*.bb' -o -name '*.bbg' -o -name '*.da' \
-o -name '*.gcov' \) -type f -print | xargs rm -f
diff --git a/arch/um/os-Linux/util.c b/arch/um/os-Linux/util.c
index e3ad71a0d13c..5fb26f5dfcb6 100644
--- a/arch/um/os-Linux/util.c
+++ b/arch/um/os-Linux/util.c
@@ -64,7 +64,8 @@ void setup_machinename(char *machine_out)
}
# endif
#endif
- strcpy(machine_out, host.machine);
+ strcat(machine_out, "/");
+ strcat(machine_out, host.machine);
}
void setup_hostinfo(char *buf, int len)
--
2.43.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v13 11/13] um: nommu: disable SMP on nommu UML
2025-11-08 8:05 [PATCH v13 00/13] nommu UML Hajime Tazaki
` (9 preceding siblings ...)
2025-11-08 8:05 ` [PATCH v13 10/13] um: change machine name for uname output Hajime Tazaki
@ 2025-11-08 8:05 ` Hajime Tazaki
2025-11-08 8:05 ` [PATCH v13 12/13] um: nommu: add documentation of " Hajime Tazaki
` (2 subsequent siblings)
13 siblings, 0 replies; 20+ messages in thread
From: Hajime Tazaki @ 2025-11-08 8:05 UTC (permalink / raw)
To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett, linux-kernel
CONFIG_SMP doesn't work with nommu UML because the handling of the host
fs register conflicts with thread-local storage (more specifically,
with the variable signals_enabled).
Thus this commit disables the CONFIG option and the TLS variables.
Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
---
arch/um/os-Linux/internal.h | 8 ++++++++
arch/x86/um/Kconfig | 2 +-
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/um/os-Linux/internal.h b/arch/um/os-Linux/internal.h
index bac9fcc8c14c..25cb5cc931c1 100644
--- a/arch/um/os-Linux/internal.h
+++ b/arch/um/os-Linux/internal.h
@@ -6,6 +6,14 @@
#include <stub-data.h>
#include <signal.h>
+/* NOMMU doesn't work with the thread-local storage used in CONFIG_SMP,
+ * because the fs register is switched (via the host_fs variable) on
+ * every user/kernel context change. Disable TLS until NOMMU supports SMP.
+ */
+#ifndef CONFIG_MMU
+#define __thread
+#endif
+
/*
* elf_aux.c
*/
diff --git a/arch/x86/um/Kconfig b/arch/x86/um/Kconfig
index bdd7c8e39b01..f12e2e4e0a12 100644
--- a/arch/x86/um/Kconfig
+++ b/arch/x86/um/Kconfig
@@ -12,7 +12,7 @@ config UML_X86
select ARCH_USE_QUEUED_SPINLOCKS
select DCACHE_WORD_ACCESS
select HAVE_EFFICIENT_UNALIGNED_ACCESS
- select UML_SUBARCH_SUPPORTS_SMP if X86_CX8
+ select UML_SUBARCH_SUPPORTS_SMP if X86_CX8 && MMU
config 64BIT
bool "64-bit kernel" if "$(SUBARCH)" = "x86"
--
2.43.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v13 12/13] um: nommu: add documentation of nommu UML
2025-11-08 8:05 [PATCH v13 00/13] nommu UML Hajime Tazaki
` (10 preceding siblings ...)
2025-11-08 8:05 ` [PATCH v13 11/13] um: nommu: disable SMP on nommu UML Hajime Tazaki
@ 2025-11-08 8:05 ` Hajime Tazaki
2025-11-08 8:05 ` [PATCH v13 13/13] um: nommu: plug nommu code into build system Hajime Tazaki
2025-11-10 9:14 ` [PATCH v13 00/13] nommu UML Christoph Hellwig
13 siblings, 0 replies; 20+ messages in thread
From: Hajime Tazaki @ 2025-11-08 8:05 UTC (permalink / raw)
To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett, linux-kernel
This commit adds an initial documentation for !MMU mode of UML.
Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
---
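The syscall flow described in the document below (seccomp filter, then
SIGSYS, then a handler that services the call) can be sketched on a plain
Linux host. This is only an illustration under stated assumptions, not
the code in this series: it traps a single syscall by number instead of
filtering on the instruction pointer, it skips the usual architecture
check, and the handler fakes a return value instead of dispatching through
sys_call_table[]. FAKE_PPID, sigsys_handler() and install_getppid_hook()
are hypothetical names; x86_64 with glibc is assumed.

```c
#define _GNU_SOURCE
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <ucontext.h>
#include <unistd.h>

#define FAKE_PPID 4242	/* hypothetical value returned by the hook */

/* SIGSYS handler: the trapped syscall was not executed by the host. */
static void sigsys_handler(int sig, siginfo_t *si, void *ctx)
{
	ucontext_t *uc = ctx;

	(void)sig;
	/* Provide the "syscall result" by writing RAX in the saved context. */
	if (si->si_syscall == SYS_getppid)
		uc->uc_mcontext.gregs[REG_RAX] = FAKE_PPID;
}

/* Install the handler and a filter that traps getppid(2) only. */
int install_getppid_hook(void)
{
	struct sock_filter insns[] = {
		/* Load the syscall number. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		/* getppid: fall through to TRAP; anything else: skip to ALLOW. */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getppid, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = sizeof(insns) / sizeof(insns[0]),
		.filter = insns,
	};
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigsys_handler;
	sa.sa_flags = SA_SIGINFO;
	if (sigaction(SIGSYS, &sa, NULL))
		return -1;

	/* Required to install a filter without CAP_SYS_ADMIN. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -1;
	if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog))
		return -1;
	return 0;
}
```

After install_getppid_hook() succeeds, syscall(SYS_getppid) in the same
process is answered by the handler rather than by the host kernel. A
seccomp filter cannot be removed once installed, so this is best tried in
a throwaway process.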
Documentation/virt/uml/nommu-uml.rst | 180 +++++++++++++++++++++++++++
MAINTAINERS | 1 +
2 files changed, 181 insertions(+)
create mode 100644 Documentation/virt/uml/nommu-uml.rst
diff --git a/Documentation/virt/uml/nommu-uml.rst b/Documentation/virt/uml/nommu-uml.rst
new file mode 100644
index 000000000000..f049bbc697d1
--- /dev/null
+++ b/Documentation/virt/uml/nommu-uml.rst
@@ -0,0 +1,180 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+UML has been built with CONFIG_MMU since day 0. This patchset
+introduces a nommu mode for UML, approaching it from a different angle
+than the Linux Kernel Library did.
+
+.. contents:: :local:
+
+What is it for ?
+================
+
+- Alleviate syscall hook overhead implemented with ptrace(2)
+- Exercise nommu code over UML (and over KUnit)
+- Reduce dependency on host facilities
+
+
+How it works ?
+==============
+
+To illustrate how this feature works, the steps below show how syscalls
+are handled in the nommu UML environment.
+
+- boot the kernel and install a seccomp filter that traps ``syscall``
+ instructions issued from userspace memory, identified by the address
+ of the instruction pointer
+- (userspace starts)
+- calls ``vfork``/``execve`` syscalls
+- ``SIGSYS`` signal raised, handler calls syscall entry point ``__kernel_vsyscall``
+- call handler function in ``sys_call_table[]`` and follow how UML syscall
+ works.
+- return to userspace
+
+
+What are the differences from MMU-full UML ?
+============================================
+
+The current nommu implementation adds 3 different functions which
+MMU-full UML doesn't have:
+
+- kernel address space is directly accessible from userspace
+ - so, ``uaccess()`` always returns 1
+ - generic implementation of memcpy/strcpy/futex is also used
+- alternate syscall entrypoint without ptrace
+- alternate syscall hook
+ - hook syscall by seccomp filter
+
+These modifications allow us to use unmodified userspace binaries
+with nommu UML.
+
+
+History
+=======
+
+This feature was originally introduced by Ricardo Koller at Open
+Source Summit NA 2020. It was later integrated with the syscall
+translation functionality, along with a cleanup of the original code.
+
+Building and running
+====================
+
+::
+
+ make ARCH=um x86_64_nommu_defconfig
+ make ARCH=um
+
+will build UML with ``CONFIG_MMU=n`` applied.
+
+KUnit tests can be run with the following command::
+
+ ./tools/testing/kunit/kunit.py run --kconfig_add CONFIG_MMU=n
+
+To run a typical Linux distribution, we need nommu-aware userspace.
+We can use a stock version of Alpine Linux with nommu-built version of
+busybox and musl-libc.
+
+
+Preparing root filesystem
+=========================
+
+nommu UML requires a standard library that is aware of nommu
+kernels. We have tested custom-built musl-libc and busybox,
+both of which have built-in support for nommu kernels.
+
+There are no available Linux distributions for nommu under x86_64
+architecture, so we need to prepare our own image for the root
+filesystem. We use Alpine Linux as a base distribution and replace
+busybox and musl-libc on top of that. The following steps
+prepare the filesystem for the quick start::
+
+ container_id=$(docker create ghcr.io/thehajime/alpine:3.20.3-um-nommu)
+ docker start $container_id
+ docker wait $container_id
+ docker export $container_id > alpine.tar
+ docker rm $container_id
+
+ mnt=$(mktemp -d)
+ dd if=/dev/zero of=alpine.ext4 bs=1 count=0 seek=1G
+ sudo chmod og+wr "alpine.ext4"
+ yes 2>/dev/null | mkfs.ext4 "alpine.ext4" || true
+ sudo mount "alpine.ext4" $mnt
+ sudo tar -xf alpine.tar -C $mnt
+ sudo umount $mnt
+
+This will create a file image, ``alpine.ext4``, which contains nommu
+builds of busybox and musl on the Alpine Linux root filesystem. The
+file can be passed via the ``ubd0=`` argument on the UML command line::
+
+ ./vmlinux ubd0=./alpine.ext4 rw mem=1024m loglevel=8 init=/sbin/init
+
+We plan to upstream apk packages for busybox and musl so that we can
+follow the proper procedure to set up the root filesystem.
+
+
+Quick start with docker
+=======================
+
+There is a docker image that you can quickly start with a simple step::
+
+ docker run -it -v /dev/shm:/dev/shm --rm ghcr.io/thehajime/alpine:3.20.3-um-nommu
+
+This will launch a UML instance with a pre-configured root filesystem.
+
+Benchmark
+=========
+
+The below shows an example of performance measurement conducted with
+lmbench and (self-crafted) getpid benchmark (with v6.17-rc5 uml/next
+tree).
+
+.. csv-table:: lmbench (usec)
+ :header: ,native,um,um-mmu(s),um-nommu(s)
+
+ select-10 ,0.5319,36.1214,24.2795,2.9174
+ select-100 ,1.6019,34.6049,28.8865,3.8080
+ select-1000 ,12.2588,43.6838,48.7438,12.7872
+ syscall ,0.1644,35.0321,53.2119,2.5981
+ read ,0.3055,31.5509,45.8538,2.7068
+ write ,0.2512,31.3609,29.2636,2.6948
+ stat ,1.8894,43.8477,49.6121,3.1908
+ open/close ,3.2973,77.5123,68.9431,6.2575
+ fork+sh ,1110.3000,7359.5000,4618.6667,439.4615
+ fork+execve ,510.8182,2834.0000,2461.1667,139.7848
+
+.. csv-table:: do_getpid bench (nsec)
+ :header: ,native,um,um-mmu(s),um-nommu(s)
+
+ getpid , 161 , 34477 , 26242 , 2599
+
+(um-nommu(s) is nommu UML with the seccomp syscall hook; um-mmu(s) is
+MMU UML in SECCOMP mode)
+
+Limitations
+===========
+
+generic nommu limitations
+-------------------------
+Since this port is a nommu kernel, the implementation inherits the
+characteristics of other nommu kernels (riscv, arm, etc.), described
+below.
+
+- vfork(2) should be used instead of fork(2)
+- ELF loader only loads PIE (position independent executable) binaries
+- all processes share a single address space
+- mmap(2) offers only a subset of its functionality (e.g., MAP_FIXED
+ is unsupported)
+
+Thus, the options for userspace programs are limited. We have tested
+Alpine Linux with musl-libc, which supports nommu kernels.
+
+supported architecture
+----------------------
+The current implementation of nommu UML only works on the x86_64 SUBARCH.
+We have not tested it in a 32-bit environment.
+
+
+Further readings about NOMMU UML
+================================
+
+- NOMMU UML (original code by Ricardo Koller)
+ - https://static.sched.com/hosted_files/ossna2020/ec/kollerr_linux_um_nommu.pdf
diff --git a/MAINTAINERS b/MAINTAINERS
index 3da2c26a796b..2f227f56d04e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -26764,6 +26764,7 @@ USER-MODE LINUX (UML)
M: Richard Weinberger <richard@nod.at>
M: Anton Ivanov <anton.ivanov@cambridgegreys.com>
M: Johannes Berg <johannes@sipsolutions.net>
+M: Hajime Tazaki <thehajime@gmail.com>
L: linux-um@lists.infradead.org
S: Maintained
W: http://user-mode-linux.sourceforge.net
--
2.43.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v13 13/13] um: nommu: plug nommu code into build system
2025-11-08 8:05 [PATCH v13 00/13] nommu UML Hajime Tazaki
` (11 preceding siblings ...)
2025-11-08 8:05 ` [PATCH v13 12/13] um: nommu: add documentation of " Hajime Tazaki
@ 2025-11-08 8:05 ` Hajime Tazaki
2025-11-10 9:14 ` [PATCH v13 00/13] nommu UML Christoph Hellwig
13 siblings, 0 replies; 20+ messages in thread
From: Hajime Tazaki @ 2025-11-08 8:05 UTC (permalink / raw)
To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett, linux-kernel
Add the nommu kernel to the um build. A defconfig is also provided.
Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
---
arch/um/Kconfig | 14 ++++++-
arch/um/configs/x86_64_nommu_defconfig | 54 ++++++++++++++++++++++++++
2 files changed, 66 insertions(+), 2 deletions(-)
create mode 100644 arch/um/configs/x86_64_nommu_defconfig
diff --git a/arch/um/Kconfig b/arch/um/Kconfig
index 097c6a6265ef..4907fd2db512 100644
--- a/arch/um/Kconfig
+++ b/arch/um/Kconfig
@@ -34,16 +34,19 @@ config UML
select ARCH_SUPPORTS_LTO_CLANG_THIN
select TRACE_IRQFLAGS_SUPPORT
select TTY # Needed for line.c
- select HAVE_ARCH_VMAP_STACK
+ select HAVE_ARCH_VMAP_STACK if MMU
select HAVE_RUST
select ARCH_HAS_UBSAN
select HAVE_ARCH_TRACEHOOK
select HAVE_SYSCALL_TRACEPOINTS
select THREAD_INFO_IN_TASK
select SPARSE_IRQ
+ select UACCESS_MEMCPY if !MMU
+ select GENERIC_STRNLEN_USER if !MMU
+ select GENERIC_STRNCPY_FROM_USER if !MMU
config MMU
- bool
+ bool "MMU-based Paged Memory Management Support" if 64BIT
default y
config UML_DMA_EMULATION
@@ -225,8 +228,15 @@ config MAGIC_SYSRQ
The keys are documented in <file:Documentation/admin-guide/sysrq.rst>. Don't say Y
unless you really know what this hack does.
+config ARCH_FORCE_MAX_ORDER
+ int "Order of maximal physically contiguous allocations" if EXPERT
+ default "10" if MMU
+ default "16" if !MMU
+
config KERNEL_STACK_ORDER
int "Kernel stack size order"
+ default 3 if !MMU
+ range 3 10 if !MMU
default 2 if 64BIT
range 2 10 if 64BIT
default 1 if !64BIT
diff --git a/arch/um/configs/x86_64_nommu_defconfig b/arch/um/configs/x86_64_nommu_defconfig
new file mode 100644
index 000000000000..02cb87091c9f
--- /dev/null
+++ b/arch/um/configs/x86_64_nommu_defconfig
@@ -0,0 +1,54 @@
+CONFIG_SYSVIPC=y
+CONFIG_POSIX_MQUEUE=y
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_BSD_PROCESS_ACCT=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_LOG_BUF_SHIFT=14
+CONFIG_CGROUPS=y
+CONFIG_BLK_CGROUP=y
+CONFIG_CGROUP_SCHED=y
+CONFIG_CGROUP_DEVICE=y
+CONFIG_CGROUP_CPUACCT=y
+# CONFIG_PID_NS is not set
+CONFIG_CC_OPTIMIZE_FOR_SIZE=y
+# CONFIG_MMU is not set
+CONFIG_HOSTFS=y
+CONFIG_MAGIC_SYSRQ=y
+CONFIG_SSL=y
+CONFIG_NULL_CHAN=y
+CONFIG_PORT_CHAN=y
+CONFIG_PTY_CHAN=y
+CONFIG_TTY_CHAN=y
+CONFIG_CON_CHAN="pts"
+CONFIG_SSL_CHAN="pts"
+CONFIG_MODULES=y
+CONFIG_MODULE_UNLOAD=y
+CONFIG_IOSCHED_BFQ=m
+CONFIG_BINFMT_MISC=m
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_INET=y
+CONFIG_DEVTMPFS=y
+CONFIG_DEVTMPFS_MOUNT=y
+CONFIG_BLK_DEV_UBD=y
+CONFIG_BLK_DEV_LOOP=m
+CONFIG_BLK_DEV_NBD=m
+CONFIG_DUMMY=m
+CONFIG_TUN=m
+CONFIG_PPP=m
+CONFIG_SLIP=m
+CONFIG_LEGACY_PTY_COUNT=32
+CONFIG_UML_RANDOM=y
+CONFIG_EXT4_FS=y
+CONFIG_QUOTA=y
+CONFIG_AUTOFS_FS=m
+CONFIG_ISO9660_FS=m
+CONFIG_JOLIET=y
+CONFIG_NLS=y
+CONFIG_DEBUG_KERNEL=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
+CONFIG_FRAME_WARN=1024
+CONFIG_IPV6=y
--
2.43.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v13 00/13] nommu UML
2025-11-08 8:05 [PATCH v13 00/13] nommu UML Hajime Tazaki
` (12 preceding siblings ...)
2025-11-08 8:05 ` [PATCH v13 13/13] um: nommu: plug nommu code into build system Hajime Tazaki
@ 2025-11-10 9:14 ` Christoph Hellwig
2025-11-10 12:18 ` Hajime Tazaki
13 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2025-11-10 9:14 UTC (permalink / raw)
To: Hajime Tazaki; +Cc: linux-um, ricarkol, Liam.Howlett, linux-kernel
On Sat, Nov 08, 2025 at 05:05:35PM +0900, Hajime Tazaki wrote:
> This patchset is another spin of nommu mode addition to UML. It would
> be nice to hear about your opinions on that.
I've not seen any explanation of the use case and/or benefits anywhere
in this cover letter or the patches. Without that it's usually pretty
hard to get maintainers and reviewers excited.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v13 00/13] nommu UML
2025-11-10 9:14 ` [PATCH v13 00/13] nommu UML Christoph Hellwig
@ 2025-11-10 12:18 ` Hajime Tazaki
2025-11-11 8:01 ` Johannes Berg
0 siblings, 1 reply; 20+ messages in thread
From: Hajime Tazaki @ 2025-11-10 12:18 UTC (permalink / raw)
To: hch; +Cc: linux-um, ricarkol, Liam.Howlett, linux-kernel
Hello,
On Mon, 10 Nov 2025 18:14:26 +0900,
Christoph Hellwig wrote:
>
> On Sat, Nov 08, 2025 at 05:05:35PM +0900, Hajime Tazaki wrote:
> > This patchset is another spin of nommu mode addition to UML. It would
> > be nice to hear about your opinions on that.
>
> I've not seen any explanation of the use case and/or benefits anywhere
> in this cover letter or the patches. Without that it's usually pretty
> hard to get maintainers and reviewers excited.
Thank you for the comment. I tried to include this explanation in the
documentation patch [12/13], from which I copied the text below.
What is it for ?
================
- Alleviate syscall hook overhead implemented with ptrace(2)
- Exercise nommu code over UML (and over KUnit)
- Reduce dependency on host facilities
The first item is about speed, the second is about more testing, and
the last is about more extensibility in the future.
Early versions of this patchset included this information as well as
the whole documentation, but I removed it as the versions grew. I can
restore it to the cover letter if it helps.
-- Hajime
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v13 00/13] nommu UML
2025-11-10 12:18 ` Hajime Tazaki
@ 2025-11-11 8:01 ` Johannes Berg
2025-11-12 8:52 ` Hajime Tazaki
0 siblings, 1 reply; 20+ messages in thread
From: Johannes Berg @ 2025-11-11 8:01 UTC (permalink / raw)
To: Hajime Tazaki, hch; +Cc: linux-um, ricarkol, Liam.Howlett, linux-kernel
On Mon, 2025-11-10 at 21:18 +0900, Hajime Tazaki wrote:
>
> What is it for ?
> ================
>
> - Alleviate syscall hook overhead implemented with ptrace(2)
> - Exercise nommu code over UML (and over KUnit)
> - Less dependency on host facilities
FWIW, in some way, this order of priorities is exactly why this hasn't
been going anywhere, and every time I looked at it I got somewhat
annoyed by what seems to me like choices made to support especially the
first bullet.
I suspect that the first and third bullet are not even really true any
more, since you moved to seccomp (per our request), yet I think design
choices influenced by them persist.
People are definitely interested in the second bullet, mostly for kunit,
and I'd be willing to support them in that to some extent.
However, I'm not yet convinced that all of the complexities presented in
this patchset (such as completely separate seccomp implementation) are
actually necessary in support of _just_ the second bullet. These seem to
me like design choices necessary to support the _first_ bullet [1].
[1] and then I suppose the third, which I'm reading as "doesn't need
seccomp or ptrace", but I'm not really quite sure what you meant
I've thought about what would happen if we stuck to creating a (single)
separate process on the host to execute userspace, and just used
CLONE_VM for it. That way, it's still no-MMU with full memory access,
but there's some implicit isolation between the kernel and userspace
processes which will likely remove complexities around FP/SSE/AVX
handling, may completely remove the need for a separate seccomp
implementation, etc.
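As an illustration of the CLONE_VM idea only, here is a minimal sketch outside of UML, assuming a Linux host with glibc. The names `run_shared_child`, `child_fn` and `shared_word` are hypothetical and nothing here is taken from the patchset; it merely shows that a separate host process created with CLONE_VM operates on the same memory as its parent:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical example state; plain memory owned by the parent. */
static volatile int shared_word;

/* Runs in the child. Because of CLONE_VM the store lands in the
 * parent's address space, not in a private copy. */
static int child_fn(void *arg)
{
	shared_word = *(int *)arg;
	return 0;
}

/*
 * Start a separate host process sharing our address space, wait for it,
 * and return the value it stored. Returns -1 on any error, so callers
 * must pass a non-negative value.
 */
int run_shared_child(int value)
{
	const size_t stack_size = 64 * 1024;
	char *stack;
	int status;
	pid_t pid;

	assert(value >= 0);

	stack = malloc(stack_size);
	if (!stack)
		return -1;

	shared_word = -1;
	/* The stack grows down on x86, so pass the top of the buffer. */
	pid = clone(child_fn, stack + stack_size, CLONE_VM | SIGCHLD, &value);
	if (pid < 0) {
		free(stack);
		return -1;
	}
	if (waitpid(pid, &status, 0) < 0) {
		free(stack);
		return -1;
	}
	free(stack);

	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;
	return shared_word;
}
```

Calling `run_shared_child(42)` should return 42: the child is a distinct host process with its own PID and register state, yet its write is visible to the parent without any copying.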
It would, on the other hand, make it completely non-viable to achieve
the first and third bullets, so given your pursuit of those, on some
level I understand the design right now. I'm yet to be convinced,
however, that those are even worthy goals for (upstream) UML, what use
case would that enable that we really need? Especially considering that
over a longer perspective, NOMMU architectures _are_ on their way out,
and UML will certainly follow once that happens, it won't be the last
remaining NOMMU architecture.
So the only value I see in this is for testing over the next couple of
years, which really doesn't need any sort of significant optimisation or
less reliance on host facilities.
Where do you see this differently?
johannes
* Re: [PATCH v13 00/13] nommu UML
2025-11-11 8:01 ` Johannes Berg
@ 2025-11-12 8:52 ` Hajime Tazaki
2025-11-12 16:36 ` Tiwei Bie
0 siblings, 1 reply; 20+ messages in thread
From: Hajime Tazaki @ 2025-11-12 8:52 UTC (permalink / raw)
To: johannes; +Cc: hch, linux-um, ricarkol, Liam.Howlett, linux-kernel
On Tue, 11 Nov 2025 17:01:25 +0900,
Johannes Berg wrote:
>
> On Mon, 2025-11-10 at 21:18 +0900, Hajime Tazaki wrote:
> >
> > What is it for ?
> > ================
> >
> > - Alleviate syscall hook overhead implemented with ptrace(2)
> > - Exercise nommu code over UML (and over KUnit)
> > - Less dependency on host facilities
>
> FWIW, in some way, this order of priorities is exactly why this hasn't
> been going anywhere, and every time I looked at it I got somewhat
> annoyed by what seems to me like choices made to support especially the
> first bullet.
Over the past versions, I've been emphasizing that the 2nd bullet (testing)
is the primary use case, as I saw several actual cases from mm folks,
https://lists.infradead.org/pipermail/maple-tree/2024-November/003775.html
https://lore.kernel.org/all/cb1cf0be-871d-4982-9a1b-5fdd54deec8d@lucifer.local/
and I think this is not limited to mm code.
The other 2 bullets are additional benefits which we observed from a
comment and from our own experience.
https://lore.kernel.org/all/20241122121826.GA26024@lst.de/
[2] https://static.sched.com/hosted_files/ossna2020/ec/kollerr_linux_um_nommu.pdf
But those are not the primary goal, so I'm not pushing this aspect
with use cases.
> I suspect that the first and third bullet are not even really true any
> more, since you moved to seccomp (per our request), yet I think design
> choices influenced by them persist.
This observation is not true; the first bullet still holds even
with seccomp. Please look at the benchmark results in patch
[12/13], quoted below.
Summary: most of the tests show that um-nommu+seccomp is 4x to 20x faster
than um-mmu+seccomp (and ptrace).
.. csv-table:: lmbench (usec)
   :header: ,native,um,um-mmu(s),um-nommu(s)

   select-10 ,0.5319,36.1214,24.2795,2.9174
   select-100 ,1.6019,34.6049,28.8865,3.8080
   select-1000 ,12.2588,43.6838,48.7438,12.7872
   syscall ,0.1644,35.0321,53.2119,2.5981
   read ,0.3055,31.5509,45.8538,2.7068
   write ,0.2512,31.3609,29.2636,2.6948
   stat ,1.8894,43.8477,49.6121,3.1908
   open/close ,3.2973,77.5123,68.9431,6.2575
   fork+sh ,1110.3000,7359.5000,4618.6667,439.4615
   fork+execve ,510.8182,2834.0000,2461.1667,139.7848

.. csv-table:: do_getpid bench (nsec)
   :header: ,native,um,um-mmu(s),um-nommu(s)

   getpid , 161 , 34477 , 26242 , 2599
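The speedup range stated in the summary can be checked against the quoted numbers. The following standalone C sketch (the struct and function names are made up for illustration, and only the um-mmu(s) and um-nommu(s) columns are copied from the tables) divides one column by the other for each row:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One row of the tables above: um-mmu(s) and um-nommu(s) columns only. */
struct bench_row {
	const char *name;
	double um_mmu_seccomp;
	double um_nommu_seccomp;
};

static const struct bench_row rows[] = {
	{ "select-10",     24.2795,    2.9174 },
	{ "select-100",    28.8865,    3.8080 },
	{ "select-1000",   48.7438,   12.7872 },
	{ "syscall",       53.2119,    2.5981 },
	{ "read",          45.8538,    2.7068 },
	{ "write",         29.2636,    2.6948 },
	{ "stat",          49.6121,    3.1908 },
	{ "open/close",    68.9431,    6.2575 },
	{ "fork+sh",     4618.6667,  439.4615 },
	{ "fork+execve", 2461.1667,  139.7848 },
	{ "getpid",     26242.0,    2599.0 },  /* nsec; the ratio is unitless */
};

#define NROWS (sizeof(rows) / sizeof(rows[0]))

/* How many times faster um-nommu(s) is than um-mmu(s) for one row. */
double speedup(const struct bench_row *row)
{
	assert(row->um_nommu_seccomp > 0.0);
	return row->um_mmu_seccomp / row->um_nommu_seccomp;
}

/* Smallest speedup over all rows. */
double min_speedup(void)
{
	double best = speedup(&rows[0]);

	for (size_t i = 1; i < NROWS; i++)
		if (speedup(&rows[i]) < best)
			best = speedup(&rows[i]);
	return best;
}

/* Largest speedup over all rows. */
double max_speedup(void)
{
	double best = speedup(&rows[0]);

	for (size_t i = 1; i < NROWS; i++)
		if (speedup(&rows[i]) > best)
			best = speedup(&rows[i]);
	return best;
}

/* Print one "name: ratio" line per benchmark. */
void print_speedups(void)
{
	for (size_t i = 0; i < NROWS; i++)
		printf("%-12s %5.1fx\n", rows[i].name, speedup(&rows[i]));
}
```

With these inputs the smallest ratio is about 3.8 (select-1000) and the largest about 20.5 (syscall), which matches the stated range.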
The mention of ptrace(2) in the 1st bullet is somewhat misleading now. It
could be rephrased as "a separate process handling userspace" instead of
"ptrace".
# When I started this patchset, the seccomp patch wasn't upstream yet, so
saying ptrace(2) wasn't that wrong.
> People are definitely interested in the second bullet, mostly for kunit,
> and I'd be willing to support them in that to some extent.
so (again) the 2nd bullet is the primary use case at this stage.
> However, I'm not yet convinced that all of the complexities presented in
> this patchset (such as completely separate seccomp implementation) are
> actually necessary in support of _just_ the second bullet. These seem to
> me like design choices necessary to support the _first_ bullet [1].
A separate seccomp implementation is indeed needed because of the design
choice we made: using a single process to host the (um) userspace. I
think there is no reason to unify the seccomp parts, because the
signal handlers and the filter installation do different jobs.
I don't see why you regard this as _complexity_, as functionally the two
seccomp implementations don't interfere with each other. We have prepared
separate sub-directories for nommu to avoid unnecessary if/else
clauses in .c/.h files. We haven't seen any functional regressions
since the RFC version (which was based on the 6.12 kernel).
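For readers unfamiliar with the mechanism, the general shape of an in-process seccomp syscall hook (a filter plus a SIGSYS handler in the same process) can be sketched as follows. This is not code from the patchset: `install_getppid_trap`, `last_trapped_nr` and `FAKE_PPID` are hypothetical names, only getppid(2) is trapped, and the return-value emulation is assumed to apply on x86_64 with glibc only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <ucontext.h>
#include <unistd.h>

/* Hypothetical marker value handed back instead of the real parent PID. */
#define FAKE_PPID 4242

/* Number of the last trapped syscall, or -1 if nothing was trapped yet. */
volatile sig_atomic_t last_trapped_nr = -1;

/* SIGSYS handler: runs in the same process that issued the syscall. */
static void sigsys_handler(int sig, siginfo_t *info, void *uctx)
{
	(void)sig;
	last_trapped_nr = info->si_syscall;
#if defined(__x86_64__) && defined(REG_RAX)
	/* Emulate the call: the saved RAX becomes its return value. */
	((ucontext_t *)uctx)->uc_mcontext.gregs[REG_RAX] = FAKE_PPID;
#else
	(void)uctx;
#endif
}

/* Install the handler and a filter that traps only getppid(2). */
int install_getppid_trap(void)
{
	/* A production filter would also check seccomp_data.arch. */
	struct sock_filter filter[] = {
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getppid, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = sizeof(filter) / sizeof(filter[0]),
		.filter = filter,
	};
	struct sigaction sa;

	assert(sizeof(filter) / sizeof(filter[0]) <= BPF_MAXINSNS);

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigsys_handler;
	sa.sa_flags = SA_SIGINFO;
	if (sigaction(SIGSYS, &sa, NULL) < 0)
		return -1;
	/* Required for an unprivileged process to install a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0)
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

After `install_getppid_trap()` succeeds, a call to `getppid()` is not executed by the host kernel; the handler records it in `last_trapped_nr` and, on x86_64, supplies the result. No second process or cross-process synchronization is involved, which is the property the single-process design relies on.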
> [1] and then I suppose the third, which I'm reading as "doesn't need
> seccomp or ptrace", but I'm not really quite sure what you meant
>
> I've thought about what would happen if we stuck to creating a (single)
> separate process on the host to execute userspace, and just used
> CLONE_VM for it. That way, it's still no-MMU with full memory access,
> but there's some implicit isolation between the kernel and userspace
> processes which will likely remove complexities around FP/SSE/AVX
> handling, may completely remove the need for a separate seccomp
> implementation, etc.
This would be doable I think, but we went a different way, because
using separate host processes (with ptrace/seccomp) is slow and adds
complexity through the synchronization between processes, which we think
would not be easy to maintain in the future.
This was a natural process for us (not sure about the maintainers): when
adding a new functionality, we considered several options to implement it
and took the one that is faster, simpler, and cheaper to maintain.
Avoiding separate processes is probably the core of the design
choice we made for nommu UML. I'm not strongly pushing the benefits
of the 1st/3rd bullets, but I thought describing the characteristics of
what _this_ patchset can do would be useful; hence they are in the document.
Additionally, if the design choice we made introduced any breakage in
existing code, or a maintenance burden, I would understand your concern
about the complexity, but I don't think this is the case.
> It would, on the other hand, make it completely non-viable to achieve
> the first and third bullets, so given your pursuit of those, on some
> level I understand the design right now. I'm yet to be convinced,
> however, that those are even worthy goals for (upstream) UML, what use
> case would that enable that we really need?
The use cases for those are inherited from the original implementation,
[2] above, which runs UML in containers with less host dependency
and with speedups. But again, this is not the primary goal at this stage.
If you think that the document should not describe potential
benefits/use cases that are not related to the primary goal of the
functionality, I'd agree to remove those descriptions.
> Especially considering that
> over a longer perspective, NOMMU architectures _are_ on their way out,
> and UML will certainly follow once that happens, it won't be the last
> remaining NOMMU architecture.
I'm aware of the nommu removal discussion, but I also saw
opinions against this direction. This patchset is still
useful even now.
> So the only value I see in this is for testing over the next couple of
> years, which really doesn't need any sort of significant optimisation or
> less reliance on host facilities.
I agree with the former, but not the latter.
- there is value with a real use case,
- there are different ways to implement it, but this patchset went with the
one with potential (additional) benefits,
- without breaking the existing (MMU) UML code.
With that, we're proposing this patchset.
> Where do you see this differently?
Thanks for the careful questions.
I hope my answers address your concerns.
I also wish to understand the maintainers' concerns about the single
process design of nommu for um userspace; the codebase is still
young and so may have unexpected influence on other code. But this is exactly
the reason why I also added myself to MAINTAINERS, in order to take care
of this patchset even though it is small (1.3k LoC).
-- Hajime
* Re: [PATCH v13 00/13] nommu UML
2025-11-12 8:52 ` Hajime Tazaki
@ 2025-11-12 16:36 ` Tiwei Bie
2025-11-14 6:47 ` Hajime Tazaki
0 siblings, 1 reply; 20+ messages in thread
From: Tiwei Bie @ 2025-11-12 16:36 UTC (permalink / raw)
To: thehajime
Cc: Liam.Howlett, hch, johannes, linux-kernel, linux-um, ricarkol,
tiwei.bie
On Wed, 12 Nov 2025 17:52:56 +0900, Hajime Tazaki wrote:
[...]
> > However, I'm not yet convinced that all of the complexities presented in
> > this patchset (such as completely separate seccomp implementation) are
> > actually necessary in support of _just_ the second bullet. These seem to
> > me like design choices necessary to support the _first_ bullet [1].
>
> A separate seccomp implementation is indeed needed because of the design
> choice we made: using a single process to host the (um) userspace. I
> think there is no reason to unify the seccomp parts, because the
> signal handlers and the filter installation do different jobs.
>
> I don't see why you regard this as _complexity_, as functionally the two
> seccomp implementations don't interfere with each other. We have prepared
> separate sub-directories for nommu to avoid unnecessary if/else
> clauses in .c/.h files.
I have the same concern about the complexity introduced by this
patch set. The new processing paths it introduces (such as the
separate handling for FP/SSE/AVX, FS, signal, syscall, ...) add a
lot of unnecessary complexity. I think Johannes's suggestion is
a great idea.
> We haven't seen any functional regressions
> since the RFC version (which was based on the 6.12 kernel).
I took a quick look at the code. It appears that patch 02/13 will
break the mmu build when UML_TIME_TRAVEL_SUPPORT is enabled.
Regards,
Tiwei
* Re: [PATCH v13 00/13] nommu UML
2025-11-12 16:36 ` Tiwei Bie
@ 2025-11-14 6:47 ` Hajime Tazaki
0 siblings, 0 replies; 20+ messages in thread
From: Hajime Tazaki @ 2025-11-14 6:47 UTC (permalink / raw)
To: tiwei.bie; +Cc: Liam.Howlett, hch, johannes, linux-kernel, linux-um, ricarkol
On Thu, 13 Nov 2025 01:36:51 +0900,
Tiwei Bie wrote:
> > We haven't seen any functional regressions
> > since the RFC version (which was based on the 6.12 kernel).
>
> I took a quick look at the code. It appears that patch 02/13 will
> break the mmu build when UML_TIME_TRAVEL_SUPPORT is enabled.
Thanks, that was my mistake when moving the chunk.
I will fix it and add it to my local tests.
-- Hajime