linux-um.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 00/13] nommu UML
@ 2025-01-20  6:00 Hajime Tazaki
  2025-01-20  6:00 ` [PATCH v7 01/13] x86/um: clean up elf specific definitions Hajime Tazaki
                   ` (14 more replies)
  0 siblings, 15 replies; 22+ messages in thread
From: Hajime Tazaki @ 2025-01-20  6:00 UTC (permalink / raw)
  To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett

This patchset is another spin of nommu mode addition to UML.  It doesn't
change a lot since the last version (v5), but contain clean ups.  It would
be nice to hear about your opinions on that.

There are still several limitations/issues which we already found;
here is the list of those issues.

- memory mapped by loadable modules are not distinguished from
  userspace memory.

-- Hajime

v7:
- properly handle FP register upon signal delivery [10/13]
- update benchmark result with new FP register handling [12/13]
- fix arch_has_single_step() for !MMU case [07/13]
- revert stack alignment as it is in uml/fixes tree [10/13]

v6:
- rebase to the latest uml/next tree
- more clean up on mmu/nommu for signal handling [10/13]
- rename functions of mcontext routines [06,10/13]
- added Acked-by tag for binfmt_elf_fdpic [02/13]
- https://lore.kernel.org/linux-um/cover.1736853925.git.thehajime@gmail.com/

v5:
- clean up stack manipulation code [05,06,07,10/13]
- https://lore.kernel.org/linux-um/cover.1733998168.git.thehajime@gmail.com/

v4:
- add arch/um/nommu, arch/x86/um/nommu to contain !MMU specific codes
- remove zpoline patch
- drop binfmt_elf_fdpic patch
- reduce ifndef CONFIG_MMU if possible
- split to elf header cleanup patch [01/13]
- fix kernel test robot warnings [06/13]
- fix coding styles [07/13]
- move task_top_of_stack definition [05/13]
- https://lore.kernel.org/linux-um/cover.1733652929.git.thehajime@gmail.com/

v3:
- https://lore.kernel.org/linux-um/cover.1733199769.git.thehajime@gmail.com/
- add seccomp-based syscall hook in addition to zpoline [06/13]
- remove RFC, add a line to MAINTAINERS file
- fix kernel test robot warnings [02/13,08/13,10/13]
- add base-commit tag to cover letter
- pull the latest uml/next
- clean up SIGSEGV handling [10/13]
- detect fsgsbase availability with elf aux vector [08/13]
- simplify vdso code with macros [09/13]

RFC v2:
- https://lore.kernel.org/linux-um/cover.1731290567.git.thehajime@gmail.com/
- base branch is now uml/linux.git instead of torvalds/linux.git.
- reorganize the patch series to clean up
- fixed various coding styles issues
- clean up exec code path [07/13]
- fixed the crash/SIGSEGV case on userspace programs [10/13]
- add seccomp filter to limit syscall caller address [06/13]
- detect fsgsbase availability with sigsetjmp/siglongjmp [08/13]
- removes unrelated changes
- removes unneeded ifndef CONFIG_MMU
- convert UML_CONFIG_MMU to CONFIG_MMU as using uml/linux.git
- proposed a patch of maple-tree issue (resolving a limitation in RFC v1)
  https://lore.kernel.org/linux-mm/20241108222834.3625217-1-thehajime@gmail.com/

RFC:
- https://lore.kernel.org/linux-um/cover.1729770373.git.thehajime@gmail.com/

Hajime Tazaki (13):
  x86/um: clean up elf specific definitions
  x86/um: nommu: elf loader for fdpic
  um: decouple MMU specific code from the common part
  um: nommu: memory handling
  x86/um: nommu: syscall handling
  um: nommu: seccomp syscalls hook
  x86/um: nommu: process/thread handling
  um: nommu: configure fs register on host syscall invocation
  x86/um/vdso: nommu: vdso memory update
  x86/um: nommu: signal handling
  um: change machine name for uname output
  um: nommu: add documentation of nommu UML
  um: nommu: plug nommu code into build system

 Documentation/virt/uml/nommu-uml.rst    | 177 ++++++++++++++++++++++
 MAINTAINERS                             |   1 +
 arch/um/Kconfig                         |  14 +-
 arch/um/Makefile                        |  10 ++
 arch/um/configs/x86_64_nommu_defconfig  |  64 ++++++++
 arch/um/include/asm/Kbuild              |   1 +
 arch/um/include/asm/futex.h             |   4 +
 arch/um/include/asm/mmu.h               |   8 +
 arch/um/include/asm/mmu_context.h       |   2 +
 arch/um/include/asm/ptrace-generic.h    |   8 +-
 arch/um/include/asm/uaccess.h           |   7 +-
 arch/um/include/shared/kern_util.h      |  12 ++
 arch/um/include/shared/os.h             |  16 ++
 arch/um/kernel/Makefile                 |   5 +-
 arch/um/kernel/mem-pgtable.c            |  55 +++++++
 arch/um/kernel/mem.c                    |  39 +----
 arch/um/kernel/process.c                |  25 ++++
 arch/um/kernel/skas/process.c           |  27 ----
 arch/um/kernel/um_arch.c                |   3 +
 arch/um/nommu/Makefile                  |   3 +
 arch/um/nommu/os-Linux/Makefile         |   7 +
 arch/um/nommu/os-Linux/signal.c         |  28 ++++
 arch/um/nommu/trap.c                    | 188 ++++++++++++++++++++++++
 arch/um/os-Linux/Makefile               |   8 +-
 arch/um/os-Linux/internal.h             |   5 +
 arch/um/os-Linux/mem.c                  |   4 +
 arch/um/os-Linux/process.c              | 149 ++++++++++++++++++-
 arch/um/os-Linux/seccomp.c              |  87 +++++++++++
 arch/um/os-Linux/signal.c               |  31 +++-
 arch/um/os-Linux/skas/process.c         | 132 -----------------
 arch/um/os-Linux/start_up.c             |  20 +++
 arch/um/os-Linux/util.c                 |   3 +-
 arch/x86/um/Makefile                    |   7 +-
 arch/x86/um/asm/elf.h                   |   8 +-
 arch/x86/um/asm/module.h                |  24 ---
 arch/x86/um/nommu/Makefile              |   8 +
 arch/x86/um/nommu/do_syscall_64.c       |  80 ++++++++++
 arch/x86/um/nommu/entry_64.S            | 113 ++++++++++++++
 arch/x86/um/nommu/os-Linux/Makefile     |   6 +
 arch/x86/um/nommu/os-Linux/mcontext.c   |  24 +++
 arch/x86/um/nommu/syscalls.h            |  16 ++
 arch/x86/um/nommu/syscalls_64.c         | 115 +++++++++++++++
 arch/x86/um/shared/sysdep/mcontext.h    |   4 +
 arch/x86/um/shared/sysdep/ptrace.h      |   2 +-
 arch/x86/um/shared/sysdep/syscalls_64.h |   6 +
 arch/x86/um/vdso/vma.c                  |  17 ++-
 fs/Kconfig.binfmt                       |   2 +-
 47 files changed, 1329 insertions(+), 246 deletions(-)
 create mode 100644 Documentation/virt/uml/nommu-uml.rst
 create mode 100644 arch/um/configs/x86_64_nommu_defconfig
 create mode 100644 arch/um/kernel/mem-pgtable.c
 create mode 100644 arch/um/nommu/Makefile
 create mode 100644 arch/um/nommu/os-Linux/Makefile
 create mode 100644 arch/um/nommu/os-Linux/signal.c
 create mode 100644 arch/um/nommu/trap.c
 create mode 100644 arch/um/os-Linux/seccomp.c
 delete mode 100644 arch/x86/um/asm/module.h
 create mode 100644 arch/x86/um/nommu/Makefile
 create mode 100644 arch/x86/um/nommu/do_syscall_64.c
 create mode 100644 arch/x86/um/nommu/entry_64.S
 create mode 100644 arch/x86/um/nommu/os-Linux/Makefile
 create mode 100644 arch/x86/um/nommu/os-Linux/mcontext.c
 create mode 100644 arch/x86/um/nommu/syscalls.h
 create mode 100644 arch/x86/um/nommu/syscalls_64.c


base-commit: 2d2b61ae38bd91217ea7cc5bc700a2b9e75b3937
-- 
2.43.0



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v7 01/13] x86/um: clean up elf specific definitions
  2025-01-20  6:00 [PATCH v7 00/13] nommu UML Hajime Tazaki
@ 2025-01-20  6:00 ` Hajime Tazaki
  2025-01-20  6:00 ` [PATCH v7 02/13] x86/um: nommu: elf loader for fdpic Hajime Tazaki
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Hajime Tazaki @ 2025-01-20  6:00 UTC (permalink / raw)
  To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett

The file arch/x86/um/asm/module.h is equivalent to the definition of
asm-generic.  Thus this commit cleans up to use it.

Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
---
 arch/um/include/asm/Kbuild |  1 +
 arch/x86/um/asm/module.h   | 24 ------------------------
 2 files changed, 1 insertion(+), 24 deletions(-)
 delete mode 100644 arch/x86/um/asm/module.h

diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index 428f2c5158c2..04ab3b653a48 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -13,6 +13,7 @@ generic-y += irq_work.h
 generic-y += kdebug.h
 generic-y += mcs_spinlock.h
 generic-y += mmiowb.h
+generic-y += module.h
 generic-y += module.lds.h
 generic-y += param.h
 generic-y += parport.h
diff --git a/arch/x86/um/asm/module.h b/arch/x86/um/asm/module.h
deleted file mode 100644
index a3b061d66082..000000000000
--- a/arch/x86/um/asm/module.h
+++ /dev/null
@@ -1,24 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __UM_MODULE_H
-#define __UM_MODULE_H
-
-/* UML is simple */
-struct mod_arch_specific
-{
-};
-
-#ifdef CONFIG_X86_32
-
-#define Elf_Shdr Elf32_Shdr
-#define Elf_Sym Elf32_Sym
-#define Elf_Ehdr Elf32_Ehdr
-
-#else
-
-#define Elf_Shdr Elf64_Shdr
-#define Elf_Sym Elf64_Sym
-#define Elf_Ehdr Elf64_Ehdr
-
-#endif
-
-#endif
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 02/13] x86/um: nommu: elf loader for fdpic
  2025-01-20  6:00 [PATCH v7 00/13] nommu UML Hajime Tazaki
  2025-01-20  6:00 ` [PATCH v7 01/13] x86/um: clean up elf specific definitions Hajime Tazaki
@ 2025-01-20  6:00 ` Hajime Tazaki
  2025-01-20  6:00 ` [PATCH v7 03/13] um: decouple MMU specific code from the common part Hajime Tazaki
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Hajime Tazaki @ 2025-01-20  6:00 UTC (permalink / raw)
  To: linux-um
  Cc: thehajime, ricarkol, Liam.Howlett, Eric Biederman, Kees Cook,
	Alexander Viro, Christian Brauner, Jan Kara, linux-mm,
	linux-fsdevel

As UML supports CONFIG_MMU=n case, it has to use an alternate ELF
loader, FDPIC ELF loader.  In this commit, we added necessary
definitions in the arch, as UML has not been used so far.  It also
updates Kconfig file to use BINFMT_ELF_FDPIC under !MMU environment.

Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-mm@kvack.org
Cc: linux-fsdevel@vger.kernel.org
Acked-by: Kees Cook <kees@kernel.org>
Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
---
 arch/um/include/asm/mmu.h            | 5 +++++
 arch/um/include/asm/ptrace-generic.h | 6 ++++++
 arch/x86/um/asm/elf.h                | 8 ++++++--
 fs/Kconfig.binfmt                    | 2 +-
 4 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/um/include/asm/mmu.h b/arch/um/include/asm/mmu.h
index a3eaca41ff61..01422b761aa0 100644
--- a/arch/um/include/asm/mmu.h
+++ b/arch/um/include/asm/mmu.h
@@ -14,6 +14,11 @@ typedef struct mm_context {
 	/* Address range in need of a TLB sync */
 	unsigned long sync_tlb_range_from;
 	unsigned long sync_tlb_range_to;
+
+#ifdef CONFIG_BINFMT_ELF_FDPIC
+	unsigned long   exec_fdpic_loadmap;
+	unsigned long   interp_fdpic_loadmap;
+#endif
 } mm_context_t;
 
 #endif
diff --git a/arch/um/include/asm/ptrace-generic.h b/arch/um/include/asm/ptrace-generic.h
index 4696f24d1492..4ff844bcb1cd 100644
--- a/arch/um/include/asm/ptrace-generic.h
+++ b/arch/um/include/asm/ptrace-generic.h
@@ -29,6 +29,12 @@ struct pt_regs {
 
 #define PTRACE_OLDSETOPTIONS 21
 
+#ifdef CONFIG_BINFMT_ELF_FDPIC
+#define PTRACE_GETFDPIC		31
+#define PTRACE_GETFDPIC_EXEC	0
+#define PTRACE_GETFDPIC_INTERP	1
+#endif
+
 struct task_struct;
 
 extern long subarch_ptrace(struct task_struct *child, long request,
diff --git a/arch/x86/um/asm/elf.h b/arch/x86/um/asm/elf.h
index 62ed5d68a978..33f69f1eac10 100644
--- a/arch/x86/um/asm/elf.h
+++ b/arch/x86/um/asm/elf.h
@@ -9,6 +9,7 @@
 #include <skas.h>
 
 #define CORE_DUMP_USE_REGSET
+#define ELF_FDPIC_CORE_EFLAGS  0
 
 #ifdef CONFIG_X86_32
 
@@ -190,8 +191,11 @@ extern int arch_setup_additional_pages(struct linux_binprm *bprm,
 
 extern unsigned long um_vdso_addr;
 #define AT_SYSINFO_EHDR 33
-#define ARCH_DLINFO	NEW_AUX_ENT(AT_SYSINFO_EHDR, um_vdso_addr)
-
+#define ARCH_DLINFO						\
+do {								\
+	NEW_AUX_ENT(AT_SYSINFO_EHDR, um_vdso_addr);		\
+	NEW_AUX_ENT(AT_MINSIGSTKSZ, 0);			\
+} while (0)
 #endif
 
 typedef unsigned long elf_greg_t;
diff --git a/fs/Kconfig.binfmt b/fs/Kconfig.binfmt
index bd2f530e5740..419ba0282806 100644
--- a/fs/Kconfig.binfmt
+++ b/fs/Kconfig.binfmt
@@ -58,7 +58,7 @@ config ARCH_USE_GNU_PROPERTY
 config BINFMT_ELF_FDPIC
 	bool "Kernel support for FDPIC ELF binaries"
 	default y if !BINFMT_ELF
-	depends on ARM || ((M68K || RISCV || SUPERH || XTENSA) && !MMU)
+	depends on ARM || ((M68K || RISCV || SUPERH || UML || XTENSA) && !MMU)
 	select ELFCORE
 	help
 	  ELF FDPIC binaries are based on ELF, but allow the individual load
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 03/13] um: decouple MMU specific code from the common part
  2025-01-20  6:00 [PATCH v7 00/13] nommu UML Hajime Tazaki
  2025-01-20  6:00 ` [PATCH v7 01/13] x86/um: clean up elf specific definitions Hajime Tazaki
  2025-01-20  6:00 ` [PATCH v7 02/13] x86/um: nommu: elf loader for fdpic Hajime Tazaki
@ 2025-01-20  6:00 ` Hajime Tazaki
  2025-01-20  6:00 ` [PATCH v7 04/13] um: nommu: memory handling Hajime Tazaki
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Hajime Tazaki @ 2025-01-20  6:00 UTC (permalink / raw)
  To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett

This splits the memory, process related code with common and MMU
specific parts in order to avoid ifdefs in .c file and duplication
between MMU and !MMU.

Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
---
 arch/um/kernel/Makefile         |   5 +-
 arch/um/kernel/mem-pgtable.c    |  55 +++++++++++++
 arch/um/kernel/mem.c            |  36 ---------
 arch/um/kernel/process.c        |  25 ++++++
 arch/um/kernel/skas/process.c   |  27 -------
 arch/um/os-Linux/Makefile       |   3 +-
 arch/um/os-Linux/internal.h     |   5 ++
 arch/um/os-Linux/process.c      | 133 ++++++++++++++++++++++++++++++++
 arch/um/os-Linux/skas/process.c | 132 -------------------------------
 9 files changed, 223 insertions(+), 198 deletions(-)
 create mode 100644 arch/um/kernel/mem-pgtable.c

diff --git a/arch/um/kernel/Makefile b/arch/um/kernel/Makefile
index f8567b933ffa..eec6682812ce 100644
--- a/arch/um/kernel/Makefile
+++ b/arch/um/kernel/Makefile
@@ -16,9 +16,10 @@ extra-y := vmlinux.lds
 
 obj-y = config.o exec.o exitcode.o irq.o ksyms.o mem.o \
 	physmem.o process.o ptrace.o reboot.o sigio.o \
-	signal.o sysrq.o time.o tlb.o trap.o \
-	um_arch.o umid.o maccess.o kmsg_dump.o capflags.o skas/
+	signal.o sysrq.o time.o \
+	um_arch.o umid.o maccess.o kmsg_dump.o capflags.o
 obj-y += load_file.o
+obj-$(CONFIG_MMU) += mem-pgtable.o tlb.o trap.o skas/
 
 obj-$(CONFIG_BLK_DEV_INITRD) += initrd.o
 obj-$(CONFIG_GPROF)	+= gprof_syms.o
diff --git a/arch/um/kernel/mem-pgtable.c b/arch/um/kernel/mem-pgtable.c
new file mode 100644
index 000000000000..549da1d3bff0
--- /dev/null
+++ b/arch/um/kernel/mem-pgtable.c
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2000 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
+ */
+
+#include <linux/stddef.h>
+#include <linux/module.h>
+#include <linux/memblock.h>
+#include <linux/swap.h>
+#include <linux/slab.h>
+#include <asm/page.h>
+#include <asm/pgalloc.h>
+#include <as-layout.h>
+#include <init.h>
+#include <kern.h>
+#include <kern_util.h>
+#include <mem_user.h>
+#include <os.h>
+#include <um_malloc.h>
+
+
+/* Allocate and free page tables. */
+
+pgd_t *pgd_alloc(struct mm_struct *mm)
+{
+	pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL);
+
+	if (pgd) {
+		memset(pgd, 0, USER_PTRS_PER_PGD * sizeof(pgd_t));
+		memcpy(pgd + USER_PTRS_PER_PGD,
+		       swapper_pg_dir + USER_PTRS_PER_PGD,
+		       (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
+	}
+	return pgd;
+}
+
+static const pgprot_t protection_map[16] = {
+	[VM_NONE]					= PAGE_NONE,
+	[VM_READ]					= PAGE_READONLY,
+	[VM_WRITE]					= PAGE_COPY,
+	[VM_WRITE | VM_READ]				= PAGE_COPY,
+	[VM_EXEC]					= PAGE_READONLY,
+	[VM_EXEC | VM_READ]				= PAGE_READONLY,
+	[VM_EXEC | VM_WRITE]				= PAGE_COPY,
+	[VM_EXEC | VM_WRITE | VM_READ]			= PAGE_COPY,
+	[VM_SHARED]					= PAGE_NONE,
+	[VM_SHARED | VM_READ]				= PAGE_READONLY,
+	[VM_SHARED | VM_WRITE]				= PAGE_SHARED,
+	[VM_SHARED | VM_WRITE | VM_READ]		= PAGE_SHARED,
+	[VM_SHARED | VM_EXEC]				= PAGE_READONLY,
+	[VM_SHARED | VM_EXEC | VM_READ]			= PAGE_READONLY,
+	[VM_SHARED | VM_EXEC | VM_WRITE]		= PAGE_SHARED,
+	[VM_SHARED | VM_EXEC | VM_WRITE | VM_READ]	= PAGE_SHARED
+};
+DECLARE_VM_GET_PAGE_PROT
diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index 8a0e74ad00d1..767b9b05d6bb 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -6,7 +6,6 @@
 #include <linux/stddef.h>
 #include <linux/module.h>
 #include <linux/memblock.h>
-#include <linux/mm.h>
 #include <linux/swap.h>
 #include <linux/slab.h>
 #include <asm/page.h>
@@ -203,42 +202,7 @@ void free_initmem(void)
 {
 }
 
-/* Allocate and free page tables. */
-
-pgd_t *pgd_alloc(struct mm_struct *mm)
-{
-	pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL);
-
-	if (pgd) {
-		memset(pgd, 0, USER_PTRS_PER_PGD * sizeof(pgd_t));
-		memcpy(pgd + USER_PTRS_PER_PGD,
-		       swapper_pg_dir + USER_PTRS_PER_PGD,
-		       (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
-	}
-	return pgd;
-}
-
 void *uml_kmalloc(int size, int flags)
 {
 	return kmalloc(size, flags);
 }
-
-static const pgprot_t protection_map[16] = {
-	[VM_NONE]					= PAGE_NONE,
-	[VM_READ]					= PAGE_READONLY,
-	[VM_WRITE]					= PAGE_COPY,
-	[VM_WRITE | VM_READ]				= PAGE_COPY,
-	[VM_EXEC]					= PAGE_READONLY,
-	[VM_EXEC | VM_READ]				= PAGE_READONLY,
-	[VM_EXEC | VM_WRITE]				= PAGE_COPY,
-	[VM_EXEC | VM_WRITE | VM_READ]			= PAGE_COPY,
-	[VM_SHARED]					= PAGE_NONE,
-	[VM_SHARED | VM_READ]				= PAGE_READONLY,
-	[VM_SHARED | VM_WRITE]				= PAGE_SHARED,
-	[VM_SHARED | VM_WRITE | VM_READ]		= PAGE_SHARED,
-	[VM_SHARED | VM_EXEC]				= PAGE_READONLY,
-	[VM_SHARED | VM_EXEC | VM_READ]			= PAGE_READONLY,
-	[VM_SHARED | VM_EXEC | VM_WRITE]		= PAGE_SHARED,
-	[VM_SHARED | VM_EXEC | VM_WRITE | VM_READ]	= PAGE_SHARED
-};
-DECLARE_VM_GET_PAGE_PROT
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index e5a2d4d897e0..2f191daed532 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -25,6 +25,7 @@
 #include <linux/tick.h>
 #include <linux/threads.h>
 #include <linux/resume_user_mode.h>
+#include <linux/start_kernel.h>
 #include <asm/current.h>
 #include <asm/mmu_context.h>
 #include <asm/switch_to.h>
@@ -46,6 +47,8 @@
 struct task_struct *cpu_tasks[NR_CPUS];
 EXPORT_SYMBOL(cpu_tasks);
 
+static char cpu0_irqstack[THREAD_SIZE] __aligned(THREAD_SIZE);
+
 void free_stack(unsigned long stack, int order)
 {
 	free_pages(stack, order);
@@ -287,3 +290,25 @@ unsigned long __get_wchan(struct task_struct *p)
 
 	return 0;
 }
+
+
+static int __init start_kernel_proc(void *unused)
+{
+	block_signals_trace();
+
+	start_kernel();
+	return 0;
+}
+
+int __init start_uml(void)
+{
+	stack_protections((unsigned long) &cpu0_irqstack);
+	set_sigstack(cpu0_irqstack, THREAD_SIZE);
+
+	init_new_thread_signals();
+
+	init_task.thread.request.thread.proc = start_kernel_proc;
+	init_task.thread.request.thread.arg = NULL;
+	return start_idle_thread(task_stack_page(&init_task),
+				 &init_task.thread.switch_buf);
+}
diff --git a/arch/um/kernel/skas/process.c b/arch/um/kernel/skas/process.c
index 05dcdc057af9..5247121d3419 100644
--- a/arch/um/kernel/skas/process.c
+++ b/arch/um/kernel/skas/process.c
@@ -16,33 +16,6 @@
 #include <skas.h>
 #include <kern_util.h>
 
-extern void start_kernel(void);
-
-static int __init start_kernel_proc(void *unused)
-{
-	block_signals_trace();
-
-	start_kernel();
-	return 0;
-}
-
-extern int userspace_pid[];
-
-static char cpu0_irqstack[THREAD_SIZE] __aligned(THREAD_SIZE);
-
-int __init start_uml(void)
-{
-	stack_protections((unsigned long) &cpu0_irqstack);
-	set_sigstack(cpu0_irqstack, THREAD_SIZE);
-
-	init_new_thread_signals();
-
-	init_task.thread.request.thread.proc = start_kernel_proc;
-	init_task.thread.request.thread.arg = NULL;
-	return start_idle_thread(task_stack_page(&init_task),
-				 &init_task.thread.switch_buf);
-}
-
 unsigned long current_stub_stack(void)
 {
 	if (current->mm == NULL)
diff --git a/arch/um/os-Linux/Makefile b/arch/um/os-Linux/Makefile
index 049dfa5bc9c6..331564888400 100644
--- a/arch/um/os-Linux/Makefile
+++ b/arch/um/os-Linux/Makefile
@@ -8,7 +8,8 @@ KCOV_INSTRUMENT                := n
 
 obj-y = execvp.o file.o helper.o irq.o main.o mem.o process.o \
 	registers.o sigio.o signal.o start_up.o time.o tty.o \
-	umid.o user_syms.o util.o drivers/ skas/
+	umid.o user_syms.o util.o drivers/
+obj-$(CONFIG_MMU) += skas/
 
 CFLAGS_signal.o += -Wframe-larger-than=4096
 
diff --git a/arch/um/os-Linux/internal.h b/arch/um/os-Linux/internal.h
index 317fca190c2b..bcbf64ce8cd1 100644
--- a/arch/um/os-Linux/internal.h
+++ b/arch/um/os-Linux/internal.h
@@ -2,6 +2,11 @@
 #ifndef __UM_OS_LINUX_INTERNAL_H
 #define __UM_OS_LINUX_INTERNAL_H
 
+/*
+ * process.c
+ */
+extern int userspace_pid[];
+
 /*
  * elf_aux.c
  */
diff --git a/arch/um/os-Linux/process.c b/arch/um/os-Linux/process.c
index 9f086f939420..6df378639880 100644
--- a/arch/um/os-Linux/process.c
+++ b/arch/um/os-Linux/process.c
@@ -6,6 +6,7 @@
 
 #include <stdio.h>
 #include <stdlib.h>
+#include <stdbool.h>
 #include <unistd.h>
 #include <errno.h>
 #include <signal.h>
@@ -15,9 +16,15 @@
 #include <sys/prctl.h>
 #include <sys/wait.h>
 #include <asm/unistd.h>
+#include <linux/threads.h>
 #include <init.h>
 #include <longjmp.h>
 #include <os.h>
+#include <as-layout.h>
+#include <kern_util.h>
+
+int userspace_pid[NR_CPUS];
+int unscheduled_userspace_iterations;
 
 void os_alarm_process(int pid)
 {
@@ -209,3 +216,129 @@ void os_set_pdeathsig(void)
 {
 	prctl(PR_SET_PDEATHSIG, SIGKILL);
 }
+
+int is_skas_winch(int pid, int fd, void *data)
+{
+	return pid == getpgrp();
+}
+
+void new_thread(void *stack, jmp_buf *buf, void (*handler)(void))
+{
+	(*buf)[0].JB_IP = (unsigned long) handler;
+	(*buf)[0].JB_SP = (unsigned long) stack + UM_THREAD_SIZE -
+		sizeof(void *);
+}
+
+#define INIT_JMP_NEW_THREAD 0
+#define INIT_JMP_CALLBACK 1
+#define INIT_JMP_HALT 2
+#define INIT_JMP_REBOOT 3
+
+void switch_threads(jmp_buf *me, jmp_buf *you)
+{
+	unscheduled_userspace_iterations = 0;
+
+	if (UML_SETJMP(me) == 0)
+		UML_LONGJMP(you, 1);
+}
+
+static jmp_buf initial_jmpbuf;
+
+/* XXX Make these percpu */
+static void (*cb_proc)(void *arg);
+static void *cb_arg;
+static jmp_buf *cb_back;
+
+int start_idle_thread(void *stack, jmp_buf *switch_buf)
+{
+	int n;
+
+	set_handler(SIGWINCH);
+
+	/*
+	 * Can't use UML_SETJMP or UML_LONGJMP here because they save
+	 * and restore signals, with the possible side-effect of
+	 * trying to handle any signals which came when they were
+	 * blocked, which can't be done on this stack.
+	 * Signals must be blocked when jumping back here and restored
+	 * after returning to the jumper.
+	 */
+	n = setjmp(initial_jmpbuf);
+	switch (n) {
+	case INIT_JMP_NEW_THREAD:
+		(*switch_buf)[0].JB_IP = (unsigned long) uml_finishsetup;
+		(*switch_buf)[0].JB_SP = (unsigned long) stack +
+			UM_THREAD_SIZE - sizeof(void *);
+		break;
+	case INIT_JMP_CALLBACK:
+		(*cb_proc)(cb_arg);
+		longjmp(*cb_back, 1);
+		break;
+	case INIT_JMP_HALT:
+		kmalloc_ok = 0;
+		return 0;
+	case INIT_JMP_REBOOT:
+		kmalloc_ok = 0;
+		return 1;
+	default:
+		printk(UM_KERN_ERR "Bad sigsetjmp return in %s - %d\n",
+		       __func__, n);
+		fatal_sigsegv();
+	}
+	longjmp(*switch_buf, 1);
+
+	/* unreachable */
+	printk(UM_KERN_ERR "impossible long jump!");
+	fatal_sigsegv();
+	return 0;
+}
+
+void initial_thread_cb_skas(void (*proc)(void *), void *arg)
+{
+	jmp_buf here;
+
+	cb_proc = proc;
+	cb_arg = arg;
+	cb_back = &here;
+
+	block_signals_trace();
+	if (UML_SETJMP(&here) == 0)
+		UML_LONGJMP(&initial_jmpbuf, INIT_JMP_CALLBACK);
+	unblock_signals_trace();
+
+	cb_proc = NULL;
+	cb_arg = NULL;
+	cb_back = NULL;
+}
+
+void halt_skas(void)
+{
+	block_signals_trace();
+	UML_LONGJMP(&initial_jmpbuf, INIT_JMP_HALT);
+}
+
+static bool noreboot;
+
+static int __init noreboot_cmd_param(char *str, int *add)
+{
+	*add = 0;
+	noreboot = true;
+	return 0;
+}
+
+__uml_setup("noreboot", noreboot_cmd_param,
+"noreboot\n"
+"    Rather than rebooting, exit always, akin to QEMU's -no-reboot option.\n"
+"    This is useful if you're using CONFIG_PANIC_TIMEOUT in order to catch\n"
+"    crashes in CI\n");
+
+void reboot_skas(void)
+{
+	block_signals_trace();
+	UML_LONGJMP(&initial_jmpbuf, noreboot ? INIT_JMP_HALT : INIT_JMP_REBOOT);
+}
+
+void __switch_mm(struct mm_id *mm_idp)
+{
+	userspace_pid[0] = mm_idp->pid;
+}
diff --git a/arch/um/os-Linux/skas/process.c b/arch/um/os-Linux/skas/process.c
index f683cfc9e51a..555eb4905f70 100644
--- a/arch/um/os-Linux/skas/process.c
+++ b/arch/um/os-Linux/skas/process.c
@@ -5,7 +5,6 @@
  */
 
 #include <stdlib.h>
-#include <stdbool.h>
 #include <unistd.h>
 #include <sched.h>
 #include <errno.h>
@@ -16,7 +15,6 @@
 #include <sys/wait.h>
 #include <sys/stat.h>
 #include <asm/unistd.h>
-#include <as-layout.h>
 #include <init.h>
 #include <kern_util.h>
 #include <mem.h>
@@ -25,15 +23,9 @@
 #include <registers.h>
 #include <skas.h>
 #include <sysdep/stub.h>
-#include <linux/threads.h>
 #include <timetravel.h>
 #include "../internal.h"
 
-int is_skas_winch(int pid, int fd, void *data)
-{
-	return pid == getpgrp();
-}
-
 static const char *ptrace_reg_name(int idx)
 {
 #define R(n) case HOST_##n: return #n
@@ -305,8 +297,6 @@ static int __init init_stub_exe_fd(void)
 }
 __initcall(init_stub_exe_fd);
 
-int userspace_pid[NR_CPUS];
-
 /**
  * start_userspace() - prepare a new userspace process
  * @stub_stack:	pointer to the stub stack.
@@ -388,7 +378,6 @@ int start_userspace(unsigned long stub_stack)
 	return err;
 }
 
-int unscheduled_userspace_iterations;
 extern unsigned long tt_extra_sched_jiffies;
 
 void userspace(struct uml_pt_regs *regs)
@@ -550,124 +539,3 @@ void userspace(struct uml_pt_regs *regs)
 		}
 	}
 }
-
-void new_thread(void *stack, jmp_buf *buf, void (*handler)(void))
-{
-	(*buf)[0].JB_IP = (unsigned long) handler;
-	(*buf)[0].JB_SP = (unsigned long) stack + UM_THREAD_SIZE -
-		sizeof(void *);
-}
-
-#define INIT_JMP_NEW_THREAD 0
-#define INIT_JMP_CALLBACK 1
-#define INIT_JMP_HALT 2
-#define INIT_JMP_REBOOT 3
-
-void switch_threads(jmp_buf *me, jmp_buf *you)
-{
-	unscheduled_userspace_iterations = 0;
-
-	if (UML_SETJMP(me) == 0)
-		UML_LONGJMP(you, 1);
-}
-
-static jmp_buf initial_jmpbuf;
-
-/* XXX Make these percpu */
-static void (*cb_proc)(void *arg);
-static void *cb_arg;
-static jmp_buf *cb_back;
-
-int start_idle_thread(void *stack, jmp_buf *switch_buf)
-{
-	int n;
-
-	set_handler(SIGWINCH);
-
-	/*
-	 * Can't use UML_SETJMP or UML_LONGJMP here because they save
-	 * and restore signals, with the possible side-effect of
-	 * trying to handle any signals which came when they were
-	 * blocked, which can't be done on this stack.
-	 * Signals must be blocked when jumping back here and restored
-	 * after returning to the jumper.
-	 */
-	n = setjmp(initial_jmpbuf);
-	switch (n) {
-	case INIT_JMP_NEW_THREAD:
-		(*switch_buf)[0].JB_IP = (unsigned long) uml_finishsetup;
-		(*switch_buf)[0].JB_SP = (unsigned long) stack +
-			UM_THREAD_SIZE - sizeof(void *);
-		break;
-	case INIT_JMP_CALLBACK:
-		(*cb_proc)(cb_arg);
-		longjmp(*cb_back, 1);
-		break;
-	case INIT_JMP_HALT:
-		kmalloc_ok = 0;
-		return 0;
-	case INIT_JMP_REBOOT:
-		kmalloc_ok = 0;
-		return 1;
-	default:
-		printk(UM_KERN_ERR "Bad sigsetjmp return in %s - %d\n",
-		       __func__, n);
-		fatal_sigsegv();
-	}
-	longjmp(*switch_buf, 1);
-
-	/* unreachable */
-	printk(UM_KERN_ERR "impossible long jump!");
-	fatal_sigsegv();
-	return 0;
-}
-
-void initial_thread_cb_skas(void (*proc)(void *), void *arg)
-{
-	jmp_buf here;
-
-	cb_proc = proc;
-	cb_arg = arg;
-	cb_back = &here;
-
-	block_signals_trace();
-	if (UML_SETJMP(&here) == 0)
-		UML_LONGJMP(&initial_jmpbuf, INIT_JMP_CALLBACK);
-	unblock_signals_trace();
-
-	cb_proc = NULL;
-	cb_arg = NULL;
-	cb_back = NULL;
-}
-
-void halt_skas(void)
-{
-	block_signals_trace();
-	UML_LONGJMP(&initial_jmpbuf, INIT_JMP_HALT);
-}
-
-static bool noreboot;
-
-static int __init noreboot_cmd_param(char *str, int *add)
-{
-	*add = 0;
-	noreboot = true;
-	return 0;
-}
-
-__uml_setup("noreboot", noreboot_cmd_param,
-"noreboot\n"
-"    Rather than rebooting, exit always, akin to QEMU's -no-reboot option.\n"
-"    This is useful if you're using CONFIG_PANIC_TIMEOUT in order to catch\n"
-"    crashes in CI\n");
-
-void reboot_skas(void)
-{
-	block_signals_trace();
-	UML_LONGJMP(&initial_jmpbuf, noreboot ? INIT_JMP_HALT : INIT_JMP_REBOOT);
-}
-
-void __switch_mm(struct mm_id *mm_idp)
-{
-	userspace_pid[0] = mm_idp->pid;
-}
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 04/13] um: nommu: memory handling
  2025-01-20  6:00 [PATCH v7 00/13] nommu UML Hajime Tazaki
                   ` (2 preceding siblings ...)
  2025-01-20  6:00 ` [PATCH v7 03/13] um: decouple MMU specific code from the common part Hajime Tazaki
@ 2025-01-20  6:00 ` Hajime Tazaki
  2025-01-20  6:00 ` [PATCH v7 05/13] x86/um: nommu: syscall handling Hajime Tazaki
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Hajime Tazaki @ 2025-01-20  6:00 UTC (permalink / raw)
  To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett

This commit adds memory operations on UML under !MMU environment.

Some part of the original UML code relying on CONFIG_MMU are excluded
from compilation when !CONFIG_MMU.  Additionally, generic functions such as
uaccess, futex, memcpy/strnlen/strncpy can be used as user- and
kernel-space share the address space in !CONFIG_MMU mode.

Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
---
 arch/um/Makefile                  | 4 ++++
 arch/um/include/asm/futex.h       | 4 ++++
 arch/um/include/asm/mmu.h         | 3 +++
 arch/um/include/asm/mmu_context.h | 2 ++
 arch/um/include/asm/uaccess.h     | 7 ++++---
 arch/um/kernel/mem.c              | 3 ++-
 arch/um/os-Linux/mem.c            | 4 ++++
 arch/um/os-Linux/process.c        | 4 ++--
 8 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/arch/um/Makefile b/arch/um/Makefile
index 1d36a613aad8..fcf4bb915a31 100644
--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -46,6 +46,10 @@ ARCH_INCLUDE	:= -I$(srctree)/$(SHARED_HEADERS)
 ARCH_INCLUDE	+= -I$(srctree)/$(HOST_DIR)/um/shared
 KBUILD_CPPFLAGS += -I$(srctree)/$(HOST_DIR)/um
 
+ifneq ($(CONFIG_MMU),y)
+core-y += $(ARCH_DIR)/nommu/
+endif
+
 # -Dvmap=kernel_vmap prevents anything from referencing the libpcap.o symbol so
 # named - it's a common symbol in libpcap, so we get a binary which crashes.
 #
diff --git a/arch/um/include/asm/futex.h b/arch/um/include/asm/futex.h
index 780aa6bfc050..785fd6649aa2 100644
--- a/arch/um/include/asm/futex.h
+++ b/arch/um/include/asm/futex.h
@@ -7,8 +7,12 @@
 #include <asm/errno.h>
 
 
+#ifdef CONFIG_MMU
 int arch_futex_atomic_op_inuser(int op, u32 oparg, int *oval, u32 __user *uaddr);
 int futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 			      u32 oldval, u32 newval);
+#else
+#include <asm-generic/futex.h>
+#endif
 
 #endif
diff --git a/arch/um/include/asm/mmu.h b/arch/um/include/asm/mmu.h
index 01422b761aa0..d4087f9499e2 100644
--- a/arch/um/include/asm/mmu.h
+++ b/arch/um/include/asm/mmu.h
@@ -15,10 +15,13 @@ typedef struct mm_context {
 	unsigned long sync_tlb_range_from;
 	unsigned long sync_tlb_range_to;
 
+#ifndef CONFIG_MMU
+	unsigned long   end_brk;
 #ifdef CONFIG_BINFMT_ELF_FDPIC
 	unsigned long   exec_fdpic_loadmap;
 	unsigned long   interp_fdpic_loadmap;
 #endif
+#endif /* !CONFIG_MMU */
 } mm_context_t;
 
 #endif
diff --git a/arch/um/include/asm/mmu_context.h b/arch/um/include/asm/mmu_context.h
index 23dcc914d44e..033a70166066 100644
--- a/arch/um/include/asm/mmu_context.h
+++ b/arch/um/include/asm/mmu_context.h
@@ -36,11 +36,13 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 	}
 }
 
+#ifdef CONFIG_MMU
 #define init_new_context init_new_context
 extern int init_new_context(struct task_struct *task, struct mm_struct *mm);
 
 #define destroy_context destroy_context
 extern void destroy_context(struct mm_struct *mm);
+#endif
 
 #include <asm-generic/mmu_context.h>
 
diff --git a/arch/um/include/asm/uaccess.h b/arch/um/include/asm/uaccess.h
index 1d4b6bbc1b65..9bfee12cb6b7 100644
--- a/arch/um/include/asm/uaccess.h
+++ b/arch/um/include/asm/uaccess.h
@@ -22,6 +22,7 @@
 #define __addr_range_nowrap(addr, size) \
 	((unsigned long) (addr) <= ((unsigned long) (addr) + (size)))
 
+#ifdef CONFIG_MMU
 extern unsigned long raw_copy_from_user(void *to, const void __user *from, unsigned long n);
 extern unsigned long raw_copy_to_user(void __user *to, const void *from, unsigned long n);
 extern unsigned long __clear_user(void __user *mem, unsigned long len);
@@ -33,9 +34,6 @@ static inline int __access_ok(const void __user *ptr, unsigned long size);
 
 #define INLINE_COPY_FROM_USER
 #define INLINE_COPY_TO_USER
-
-#include <asm-generic/uaccess.h>
-
 static inline int __access_ok(const void __user *ptr, unsigned long size)
 {
 	unsigned long addr = (unsigned long)ptr;
@@ -43,6 +41,9 @@ static inline int __access_ok(const void __user *ptr, unsigned long size)
 		(__under_task_size(addr, size) ||
 		 __access_ok_vsyscall(addr, size));
 }
+#endif
+
+#include <asm-generic/uaccess.h>
 
 /* no pagefaults for kernel addresses in um */
 #define __get_kernel_nofault(dst, src, type, err_label)			\
diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index 767b9b05d6bb..beab40bc9f30 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -62,7 +62,8 @@ void __init mem_init(void)
 	 * to be turned on.
 	 */
 	brk_end = (unsigned long) UML_ROUND_UP(sbrk(0));
-	map_memory(brk_end, __pa(brk_end), uml_reserved - brk_end, 1, 1, 0);
+	map_memory(brk_end, __pa(brk_end), uml_reserved - brk_end, 1, 1,
+		   !IS_ENABLED(CONFIG_MMU));
 	memblock_free((void *)brk_end, uml_reserved - brk_end);
 	uml_reserved = brk_end;
 
diff --git a/arch/um/os-Linux/mem.c b/arch/um/os-Linux/mem.c
index 72f302f4d197..4f5d9a94f8e2 100644
--- a/arch/um/os-Linux/mem.c
+++ b/arch/um/os-Linux/mem.c
@@ -213,6 +213,10 @@ int __init create_mem_file(unsigned long long len)
 {
 	int err, fd;
 
+	/* NOMMU kernel uses -1 as a fd for further use (e.g., mmap) */
+	if (!IS_ENABLED(CONFIG_MMU))
+		return -1;
+
 	fd = create_tmp_file(len);
 
 	err = os_set_exec_close(fd);
diff --git a/arch/um/os-Linux/process.c b/arch/um/os-Linux/process.c
index 6df378639880..e2dc00fc84c0 100644
--- a/arch/um/os-Linux/process.c
+++ b/arch/um/os-Linux/process.c
@@ -70,8 +70,8 @@ int os_map_memory(void *virt, int fd, unsigned long long off, unsigned long len,
 	prot = (r ? PROT_READ : 0) | (w ? PROT_WRITE : 0) |
 		(x ? PROT_EXEC : 0);
 
-	loc = mmap64((void *) virt, len, prot, MAP_SHARED | MAP_FIXED,
-		     fd, off);
+	loc = mmap64((void *) virt, len, prot, MAP_SHARED | MAP_FIXED |
+		     (!IS_ENABLED(CONFIG_MMU) ? MAP_ANONYMOUS : 0), fd, off);
 	if (loc == MAP_FAILED)
 		return -errno;
 	return 0;
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 05/13] x86/um: nommu: syscall handling
  2025-01-20  6:00 [PATCH v7 00/13] nommu UML Hajime Tazaki
                   ` (3 preceding siblings ...)
  2025-01-20  6:00 ` [PATCH v7 04/13] um: nommu: memory handling Hajime Tazaki
@ 2025-01-20  6:00 ` Hajime Tazaki
  2025-01-20  6:00 ` [PATCH v7 06/13] um: nommu: seccomp syscalls hook Hajime Tazaki
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Hajime Tazaki @ 2025-01-20  6:00 UTC (permalink / raw)
  To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett

This commit introduces an entry point of syscall interface for !MMU
mode. It uses an entry function, __kernel_vsyscall, a kernel-wide global
symbol accessible from any locations.

Although it isn't in the scope of this commit, it can be also exposed
via vdso image which is directly accessible from userspace. A standard
library (i.e., libc) can utilize this entry point to implement syscall
wrapper; we can also use this by hooking syscall for unmodified userspace
applications/libraries, which will be implemented in the subsequent
commit.

This only supports 64-bit mode of x86 architecture.

Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
---
 arch/x86/um/Makefile                    |  4 ++
 arch/x86/um/nommu/Makefile              |  8 +++
 arch/x86/um/nommu/do_syscall_64.c       | 37 ++++++++++
 arch/x86/um/nommu/entry_64.S            | 91 +++++++++++++++++++++++++
 arch/x86/um/nommu/syscalls.h            | 16 +++++
 arch/x86/um/shared/sysdep/syscalls_64.h |  6 ++
 6 files changed, 162 insertions(+)
 create mode 100644 arch/x86/um/nommu/Makefile
 create mode 100644 arch/x86/um/nommu/do_syscall_64.c
 create mode 100644 arch/x86/um/nommu/entry_64.S
 create mode 100644 arch/x86/um/nommu/syscalls.h

diff --git a/arch/x86/um/Makefile b/arch/x86/um/Makefile
index b42c31cd2390..227af2a987e2 100644
--- a/arch/x86/um/Makefile
+++ b/arch/x86/um/Makefile
@@ -32,6 +32,10 @@ obj-y += syscalls_64.o vdso/
 subarch-y = ../lib/csum-partial_64.o ../lib/memcpy_64.o \
 	../lib/memmove_64.o ../lib/memset_64.o
 
+ifneq ($(CONFIG_MMU),y)
+obj-y += nommu/
+endif
+
 endif
 
 subarch-$(CONFIG_MODULES) += ../kernel/module.o
diff --git a/arch/x86/um/nommu/Makefile b/arch/x86/um/nommu/Makefile
new file mode 100644
index 000000000000..d72c63afffa5
--- /dev/null
+++ b/arch/x86/um/nommu/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0
+ifeq ($(CONFIG_X86_32),y)
+	BITS := 32
+else
+	BITS := 64
+endif
+
+obj-y = do_syscall_$(BITS).o entry_$(BITS).o
diff --git a/arch/x86/um/nommu/do_syscall_64.c b/arch/x86/um/nommu/do_syscall_64.c
new file mode 100644
index 000000000000..5d0fa83e7fdc
--- /dev/null
+++ b/arch/x86/um/nommu/do_syscall_64.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/kernel.h>
+#include <linux/ptrace.h>
+#include <kern_util.h>
+#include <sysdep/syscalls.h>
+#include <os.h>
+
+__visible void do_syscall_64(struct pt_regs *regs)
+{
+	int syscall;
+
+	syscall = PT_SYSCALL_NR(regs->regs.gp);
+	UPT_SYSCALL_NR(&regs->regs) = syscall;
+
+	pr_debug("syscall(%d) (current=%lx) (fn=%lx)\n",
+		 syscall, (unsigned long)current,
+		 (unsigned long)sys_call_table[syscall]);
+
+	if (likely(syscall < NR_syscalls)) {
+		PT_REGS_SET_SYSCALL_RETURN(regs,
+				EXECUTE_SYSCALL(syscall, regs));
+	}
+
+	pr_debug("syscall(%d) --> %lx\n", syscall,
+		regs->regs.gp[HOST_AX]);
+
+	PT_REGS_SYSCALL_RET(regs) = regs->regs.gp[HOST_AX];
+
+	/* execve succeeded */
+	if (syscall == __NR_execve && regs->regs.gp[HOST_AX] == 0)
+		userspace(&current->thread.regs.regs);
+
+	/* force do_signal() --> is_syscall() */
+	set_thread_flag(TIF_SIGPENDING);
+	interrupt_end();
+}
diff --git a/arch/x86/um/nommu/entry_64.S b/arch/x86/um/nommu/entry_64.S
new file mode 100644
index 000000000000..e9bfc7b93c84
--- /dev/null
+++ b/arch/x86/um/nommu/entry_64.S
@@ -0,0 +1,91 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <asm/errno.h>
+
+#include <linux/linkage.h>
+#include <asm/percpu.h>
+#include <asm/desc.h>
+
+#include "../entry/calling.h"
+
+#ifdef CONFIG_SMP
+#error need to stash these variables somewhere else
+#endif
+
+#define UM_GLOBAL_VAR(x) .data; .align 8; .globl x; x:; .long 0
+
+UM_GLOBAL_VAR(current_top_of_stack)
+UM_GLOBAL_VAR(current_ptregs)
+
+.code64
+.section .entry.text, "ax"
+
+.align 8
+#undef ENTRY
+#define ENTRY(x) .text; .globl x; .type x,%function; x:
+#undef END
+#define END(x)   .size x, . - x
+
+/*
+ * %rcx has the return address (we set it before entering __kernel_vsyscall).
+ *
+ * Registers on entry:
+ * rax  system call number
+ * rcx  return address
+ * rdi  arg0
+ * rsi  arg1
+ * rdx  arg2
+ * r10  arg3
+ * r8   arg4
+ * r9   arg5
+ *
+ * (note: we are allowed to mess with r11: r11 is callee-clobbered
+ * register in C ABI)
+ */
+ENTRY(__kernel_vsyscall)
+
+	movq	%rsp, %r11
+
+	/* Point rsp to the top of the ptregs array, so we can
+           just fill it with a bunch of push'es. */
+	movq	current_ptregs, %rsp
+
+	/* 8 bytes * 20 registers (plus 8 for the push) */
+	addq	$168, %rsp
+
+	/* Construct struct pt_regs on stack */
+	pushq	$0		/* pt_regs->ss (index 20) */
+	pushq   %r11		/* pt_regs->sp */
+	pushfq			/* pt_regs->flags */
+	pushq	$0		/* pt_regs->cs */
+	pushq	%rcx		/* pt_regs->ip */
+	pushq	%rax		/* pt_regs->orig_ax */
+
+	PUSH_AND_CLEAR_REGS rax=$-ENOSYS
+
+	mov %rsp, %rdi
+
+	/*
+	 * Switch to current top of stack, so "current->" points
+	 * to the right task.
+	 */
+	movq	current_top_of_stack, %rsp
+
+	call	do_syscall_64
+
+	movq	current_ptregs, %rsp
+
+	POP_REGS
+
+	addq	$8, %rsp	/* skip orig_ax */
+	popq	%rcx		/* pt_regs->ip */
+	addq	$8, %rsp	/* skip cs */
+	addq	$8, %rsp	/* skip flags */
+	popq	%rsp
+
+	/*
+	* not return w/ ret but w/ jmp as the stack is already popped before
+	* entering __kernel_vsyscall
+	*/
+	jmp	*%rcx
+
+END(__kernel_vsyscall)
diff --git a/arch/x86/um/nommu/syscalls.h b/arch/x86/um/nommu/syscalls.h
new file mode 100644
index 000000000000..a2433756b1fc
--- /dev/null
+++ b/arch/x86/um/nommu/syscalls.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __UM_NOMMU_SYSCALLS_H
+#define __UM_NOMMU_SYSCALLS_H
+
+
+#define task_top_of_stack(task) \
+({									\
+	unsigned long __ptr = (unsigned long)task->stack;	\
+	__ptr += THREAD_SIZE;			\
+	__ptr;					\
+})
+
+extern long current_top_of_stack;
+extern long current_ptregs;
+
+#endif
diff --git a/arch/x86/um/shared/sysdep/syscalls_64.h b/arch/x86/um/shared/sysdep/syscalls_64.h
index b6b997225841..ffd80ee3b9dc 100644
--- a/arch/x86/um/shared/sysdep/syscalls_64.h
+++ b/arch/x86/um/shared/sysdep/syscalls_64.h
@@ -25,4 +25,10 @@ extern syscall_handler_t *sys_call_table[];
 extern syscall_handler_t sys_modify_ldt;
 extern syscall_handler_t sys_arch_prctl;
 
+#ifndef CONFIG_MMU
+extern void do_syscall_64(struct pt_regs *regs);
+extern long __kernel_vsyscall(int64_t a0, int64_t a1, int64_t a2, int64_t a3,
+			      int64_t a4, int64_t a5, int64_t a6);
+#endif
+
 #endif
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 06/13] um: nommu: seccomp syscalls hook
  2025-01-20  6:00 [PATCH v7 00/13] nommu UML Hajime Tazaki
                   ` (4 preceding siblings ...)
  2025-01-20  6:00 ` [PATCH v7 05/13] x86/um: nommu: syscall handling Hajime Tazaki
@ 2025-01-20  6:00 ` Hajime Tazaki
  2025-01-20  6:00 ` [PATCH v7 07/13] x86/um: nommu: process/thread handling Hajime Tazaki
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Hajime Tazaki @ 2025-01-20  6:00 UTC (permalink / raw)
  To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett, Kenichi Yasukata

This commit adds syscall hook with seccomp.

Using seccomp raises SIGSYS to UML process, which is captured in the
(UML) kernel, then jumps to the syscall entry point, __kernel_vsyscall,
to hook the original syscall instructions.

The SIGSYS signal is raised upon the execution from uml_reserved and
high_physmem, which locates userspace memory.

Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: Kenichi Yasukata <kenichi.yasukata@gmail.com>
---
 arch/um/include/shared/kern_util.h    |  8 +++
 arch/um/include/shared/os.h           | 10 +++
 arch/um/kernel/um_arch.c              |  3 +
 arch/um/nommu/Makefile                |  3 +
 arch/um/nommu/os-Linux/Makefile       |  7 +++
 arch/um/nommu/os-Linux/signal.c       | 15 +++++
 arch/um/os-Linux/Makefile             |  5 ++
 arch/um/os-Linux/seccomp.c            | 87 +++++++++++++++++++++++++++
 arch/um/os-Linux/signal.c             |  6 ++
 arch/x86/um/nommu/Makefile            |  2 +-
 arch/x86/um/nommu/os-Linux/Makefile   |  6 ++
 arch/x86/um/nommu/os-Linux/mcontext.c | 13 ++++
 arch/x86/um/shared/sysdep/mcontext.h  |  3 +
 13 files changed, 167 insertions(+), 1 deletion(-)
 create mode 100644 arch/um/nommu/Makefile
 create mode 100644 arch/um/nommu/os-Linux/Makefile
 create mode 100644 arch/um/nommu/os-Linux/signal.c
 create mode 100644 arch/um/os-Linux/seccomp.c
 create mode 100644 arch/x86/um/nommu/os-Linux/Makefile
 create mode 100644 arch/x86/um/nommu/os-Linux/mcontext.c

diff --git a/arch/um/include/shared/kern_util.h b/arch/um/include/shared/kern_util.h
index f21dc8517538..1dd3d02c4e39 100644
--- a/arch/um/include/shared/kern_util.h
+++ b/arch/um/include/shared/kern_util.h
@@ -67,4 +67,12 @@ void um_idle_sleep(void);
 
 void kasan_map_memory(void *start, size_t len);
 
+#ifdef CONFIG_MMU
+static inline void arch_sigsys_handler(int sig, struct siginfo *si, void *mc)
+{
+}
+#else
+extern void arch_sigsys_handler(int sig, struct siginfo *si, void *mc);
+#endif
+
 #endif
diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index 5babad8c5f75..d29b967a009e 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -334,4 +334,14 @@ extern void um_trace_signals_off(void);
 /* time-travel */
 extern void deliver_time_travel_irqs(void);
 
+/* seccomp.c */
+#ifdef CONFIG_MMU
+static inline int os_setup_seccomp(void)
+{
+	return 0;
+}
+#else
+extern int os_setup_seccomp(void);
+#endif
+
 #endif
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index 79ea97d4797e..dc7efccbcb63 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -432,6 +432,9 @@ void __init setup_arch(char **cmdline_p)
 		add_bootloader_randomness(rng_seed, sizeof(rng_seed));
 		memzero_explicit(rng_seed, sizeof(rng_seed));
 	}
+
+	/* install seccomp filter */
+	os_setup_seccomp();
 }
 
 void __init arch_cpu_finalize_init(void)
diff --git a/arch/um/nommu/Makefile b/arch/um/nommu/Makefile
new file mode 100644
index 000000000000..baab7c2f57c2
--- /dev/null
+++ b/arch/um/nommu/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-y := os-Linux/
diff --git a/arch/um/nommu/os-Linux/Makefile b/arch/um/nommu/os-Linux/Makefile
new file mode 100644
index 000000000000..68833c576437
--- /dev/null
+++ b/arch/um/nommu/os-Linux/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-y := signal.o
+USER_OBJS := $(obj-y)
+
+include $(srctree)/arch/um/scripts/Makefile.rules
+USER_CFLAGS+=-I$(srctree)/arch/um/os-Linux
diff --git a/arch/um/nommu/os-Linux/signal.c b/arch/um/nommu/os-Linux/signal.c
new file mode 100644
index 000000000000..1674962e509d
--- /dev/null
+++ b/arch/um/nommu/os-Linux/signal.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <signal.h>
+#include <kern_util.h>
+#include <os.h>
+#include <sysdep/mcontext.h>
+#include <sys/ucontext.h>
+
+void arch_sigsys_handler(int sig, struct siginfo *si, void *ptr)
+{
+	mcontext_t *mc = (mcontext_t *) ptr;
+
+	/* hook syscall via SIGSYS */
+	set_mc_sigsys_hook(mc);
+}
diff --git a/arch/um/os-Linux/Makefile b/arch/um/os-Linux/Makefile
index 331564888400..6bfdfec8cbcc 100644
--- a/arch/um/os-Linux/Makefile
+++ b/arch/um/os-Linux/Makefile
@@ -21,4 +21,9 @@ USER_OBJS := $(user-objs-y) elf_aux.o execvp.o file.o helper.o irq.o \
 	main.o mem.o process.o registers.o sigio.o signal.o start_up.o time.o \
 	tty.o umid.o util.o
 
+ifneq ($(CONFIG_MMU),y)
+obj-y += seccomp.o
+USER_OBJS += seccomp.o
+endif
+
 include $(srctree)/arch/um/scripts/Makefile.rules
diff --git a/arch/um/os-Linux/seccomp.c b/arch/um/os-Linux/seccomp.c
new file mode 100644
index 000000000000..d1cfa6e3d632
--- /dev/null
+++ b/arch/um/os-Linux/seccomp.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+#include <signal.h>
+#include <sys/prctl.h>
+#include <sys/syscall.h>   /* For SYS_xxx definitions */
+#include <init.h>
+#include <as-layout.h>
+#include <os.h>
+#include <linux/filter.h>
+#include <linux/seccomp.h>
+
+int __init os_setup_seccomp(void)
+{
+	int err;
+	unsigned long __userspace_start = uml_reserved,
+		__userspace_end = high_physmem;
+
+	struct sock_filter filter[] = {
+		/* if (IP_high > __userspace_end) allow; */
+		BPF_STMT(BPF_LD + BPF_W + BPF_ABS,
+			 offsetof(struct seccomp_data, instruction_pointer) + 4),
+		BPF_JUMP(BPF_JMP + BPF_JGT + BPF_K, __userspace_end >> 32,
+			 /*true-skip=*/0, /*false-skip=*/1),
+		BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_ALLOW),
+
+		/* if (IP_high == __userspace_end && IP_low >= __userspace_end) allow; */
+		BPF_STMT(BPF_LD + BPF_W + BPF_ABS,
+			 offsetof(struct seccomp_data, instruction_pointer) + 4),
+		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, __userspace_end >> 32,
+			 /*true-skip=*/0, /*false-skip=*/3),
+		BPF_STMT(BPF_LD + BPF_W + BPF_ABS,
+			 offsetof(struct seccomp_data, instruction_pointer)),
+		BPF_JUMP(BPF_JMP + BPF_JGE + BPF_K, __userspace_end,
+			 /*true-skip=*/0, /*false-skip=*/1),
+		BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_ALLOW),
+
+		/* if (IP_high < __userspace_start) allow; */
+		BPF_STMT(BPF_LD + BPF_W + BPF_ABS,
+			 offsetof(struct seccomp_data, instruction_pointer) + 4),
+		BPF_JUMP(BPF_JMP + BPF_JGE + BPF_K, __userspace_start >> 32,
+			 /*true-skip=*/1, /*false-skip=*/0),
+		BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_ALLOW),
+
+		/* if (IP_high == __userspace_start && IP_low < __userspace_start) allow; */
+		BPF_STMT(BPF_LD + BPF_W + BPF_ABS,
+			 offsetof(struct seccomp_data, instruction_pointer) + 4),
+		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, __userspace_start >> 32,
+			 /*true-skip=*/0, /*false-skip=*/3),
+		BPF_STMT(BPF_LD + BPF_W + BPF_ABS,
+			 offsetof(struct seccomp_data, instruction_pointer)),
+		BPF_JUMP(BPF_JMP + BPF_JGE + BPF_K, __userspace_start,
+			 /*true-skip=*/1, /*false-skip=*/0),
+		BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_ALLOW),
+
+		/* other address; trap  */
+		BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_TRAP),
+	};
+	struct sock_fprog prog = {
+		.len = ARRAY_SIZE(filter),
+		.filter = filter,
+	};
+
+	err = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	if (err)
+		os_warn("PR_SET_NO_NEW_PRIVS (err=%d, ernro=%d)\n",
+		       err, errno);
+
+	err = syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER,
+		      SECCOMP_FILTER_FLAG_TSYNC, &prog);
+	if (err) {
+		os_warn("SECCOMP_SET_MODE_FILTER (err=%d, ernro=%d)\n",
+		       err, errno);
+		exit(1);
+	}
+
+	set_handler(SIGSYS);
+
+	os_info("seccomp: setup filter syscalls in the range: 0x%lx-0x%lx\n",
+		__userspace_start, __userspace_end);
+
+	return 0;
+}
+
diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c
index 9ea7269ffb77..97efdabe1e45 100644
--- a/arch/um/os-Linux/signal.c
+++ b/arch/um/os-Linux/signal.c
@@ -172,12 +172,18 @@ void register_pm_wake_signal(void)
 	set_handler(SIGUSR1);
 }
 
+static void sigsys_handler(int sig, struct siginfo *unused_si, mcontext_t *mc)
+{
+	arch_sigsys_handler(sig, unused_si, mc);
+}
+
 static void (*handlers[_NSIG])(int sig, struct siginfo *si, mcontext_t *mc) = {
 	[SIGSEGV] = sig_handler,
 	[SIGBUS] = sig_handler,
 	[SIGILL] = sig_handler,
 	[SIGFPE] = sig_handler,
 	[SIGTRAP] = sig_handler,
+	[SIGSYS] = sigsys_handler,
 
 	[SIGIO] = sig_handler,
 	[SIGWINCH] = sig_handler,
diff --git a/arch/x86/um/nommu/Makefile b/arch/x86/um/nommu/Makefile
index d72c63afffa5..ebe47d4836f4 100644
--- a/arch/x86/um/nommu/Makefile
+++ b/arch/x86/um/nommu/Makefile
@@ -5,4 +5,4 @@ else
 	BITS := 64
 endif
 
-obj-y = do_syscall_$(BITS).o entry_$(BITS).o
+obj-y = do_syscall_$(BITS).o entry_$(BITS).o os-Linux/
diff --git a/arch/x86/um/nommu/os-Linux/Makefile b/arch/x86/um/nommu/os-Linux/Makefile
new file mode 100644
index 000000000000..4571e403a6ff
--- /dev/null
+++ b/arch/x86/um/nommu/os-Linux/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-y = mcontext.o
+USER_OBJS := mcontext.o
+
+include $(srctree)/arch/um/scripts/Makefile.rules
diff --git a/arch/x86/um/nommu/os-Linux/mcontext.c b/arch/x86/um/nommu/os-Linux/mcontext.c
new file mode 100644
index 000000000000..c4ef877d5ea0
--- /dev/null
+++ b/arch/x86/um/nommu/os-Linux/mcontext.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <sys/ucontext.h>
+#define __FRAME_OFFSETS
+#include <asm/ptrace.h>
+#include <sysdep/ptrace.h>
+#include <sysdep/mcontext.h>
+#include <sysdep/syscalls.h>
+
+void set_mc_sigsys_hook(mcontext_t *mc)
+{
+	mc->gregs[REG_RCX] = mc->gregs[REG_RIP];
+	mc->gregs[REG_RIP] = (unsigned long) __kernel_vsyscall;
+}
diff --git a/arch/x86/um/shared/sysdep/mcontext.h b/arch/x86/um/shared/sysdep/mcontext.h
index b724c54da316..15e9ff287021 100644
--- a/arch/x86/um/shared/sysdep/mcontext.h
+++ b/arch/x86/um/shared/sysdep/mcontext.h
@@ -7,6 +7,9 @@
 #define __SYS_SIGCONTEXT_X86_H
 
 extern void get_regs_from_mc(struct uml_pt_regs *, mcontext_t *);
+#ifndef CONFIG_MMU
+extern void set_mc_sigsys_hook(mcontext_t *mc);
+#endif
 
 #ifdef __i386__
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 07/13] x86/um: nommu: process/thread handling
  2025-01-20  6:00 [PATCH v7 00/13] nommu UML Hajime Tazaki
                   ` (5 preceding siblings ...)
  2025-01-20  6:00 ` [PATCH v7 06/13] um: nommu: seccomp syscalls hook Hajime Tazaki
@ 2025-01-20  6:00 ` Hajime Tazaki
  2025-01-20  6:00 ` [PATCH v7 08/13] um: nommu: configure fs register on host syscall invocation Hajime Tazaki
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Hajime Tazaki @ 2025-01-20  6:00 UTC (permalink / raw)
  To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett

Since ptrace facility isn't used under !MMU of UML, there is different
code path to invoke processes/threads; there are no external process
used, and need to properly configure some of registers (fs segment
register for TLS, etc) on every context switch, etc.

Signals aren't delivered in non-ptrace syscall entry/leave so, we also
need to handle pending signal by ourselves.

ptrace related syscalls are not tested yet so, marked
arch_has_single_step() unsupported in !MMU environment.

Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
---
 arch/um/include/asm/ptrace-generic.h |  2 +-
 arch/um/os-Linux/process.c           |  6 ++++
 arch/x86/um/Makefile                 |  3 +-
 arch/x86/um/nommu/Makefile           |  2 +-
 arch/x86/um/nommu/entry_64.S         | 22 ++++++++++++++
 arch/x86/um/nommu/syscalls_64.c      | 44 ++++++++++++++++++++++++++++
 6 files changed, 76 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/um/nommu/syscalls_64.c

diff --git a/arch/um/include/asm/ptrace-generic.h b/arch/um/include/asm/ptrace-generic.h
index 4ff844bcb1cd..a9778c9a59a3 100644
--- a/arch/um/include/asm/ptrace-generic.h
+++ b/arch/um/include/asm/ptrace-generic.h
@@ -14,7 +14,7 @@ struct pt_regs {
 	struct uml_pt_regs regs;
 };
 
-#define arch_has_single_step()	(1)
+#define arch_has_single_step()	(IS_ENABLED(CONFIG_MMU))
 
 #define EMPTY_REGS { .regs = EMPTY_UML_PT_REGS }
 
diff --git a/arch/um/os-Linux/process.c b/arch/um/os-Linux/process.c
index e2dc00fc84c0..49aa8e92205e 100644
--- a/arch/um/os-Linux/process.c
+++ b/arch/um/os-Linux/process.c
@@ -28,6 +28,9 @@ int unscheduled_userspace_iterations;
 
 void os_alarm_process(int pid)
 {
+	if (pid <= 0)
+		return;
+
 	kill(pid, SIGALRM);
 }
 
@@ -45,6 +48,9 @@ void os_kill_process(int pid, int reap_child)
 
 void os_kill_ptraced_process(int pid, int reap_child)
 {
+	if (pid <= 0)
+		return;
+
 	kill(pid, SIGKILL);
 	ptrace(PTRACE_KILL, pid);
 	ptrace(PTRACE_CONT, pid);
diff --git a/arch/x86/um/Makefile b/arch/x86/um/Makefile
index 227af2a987e2..53c9ebb3c41c 100644
--- a/arch/x86/um/Makefile
+++ b/arch/x86/um/Makefile
@@ -27,7 +27,8 @@ subarch-y += ../kernel/sys_ia32.o
 
 else
 
-obj-y += syscalls_64.o vdso/
+obj-y += vdso/
+obj-$(CONFIG_MMU) += syscalls_64.o
 
 subarch-y = ../lib/csum-partial_64.o ../lib/memcpy_64.o \
 	../lib/memmove_64.o ../lib/memset_64.o
diff --git a/arch/x86/um/nommu/Makefile b/arch/x86/um/nommu/Makefile
index ebe47d4836f4..4018d9e0aba0 100644
--- a/arch/x86/um/nommu/Makefile
+++ b/arch/x86/um/nommu/Makefile
@@ -5,4 +5,4 @@ else
 	BITS := 64
 endif
 
-obj-y = do_syscall_$(BITS).o entry_$(BITS).o os-Linux/
+obj-y = do_syscall_$(BITS).o entry_$(BITS).o syscalls_$(BITS).o os-Linux/
diff --git a/arch/x86/um/nommu/entry_64.S b/arch/x86/um/nommu/entry_64.S
index e9bfc7b93c84..950447dfa66b 100644
--- a/arch/x86/um/nommu/entry_64.S
+++ b/arch/x86/um/nommu/entry_64.S
@@ -89,3 +89,25 @@ ENTRY(__kernel_vsyscall)
 	jmp	*%rcx
 
 END(__kernel_vsyscall)
+
+// void userspace(struct uml_pt_regs *regs)
+ENTRY(userspace)
+
+	/* align the stack for x86_64 ABI */
+	and     $-0x10, %rsp
+	/* Handle any immediate reschedules or signals */
+	call	interrupt_end
+
+	movq	current_ptregs, %rsp
+
+	POP_REGS
+
+	addq	$8, %rsp	/* skip orig_ax */
+	popq	%r11		/* pt_regs->ip */
+	addq	$8, %rsp	/* skip cs */
+	addq	$8, %rsp	/* skip flags */
+	popq	%rsp
+
+	jmp	*%r11
+
+END(userspace)
diff --git a/arch/x86/um/nommu/syscalls_64.c b/arch/x86/um/nommu/syscalls_64.c
new file mode 100644
index 000000000000..c78c442aed1d
--- /dev/null
+++ b/arch/x86/um/nommu/syscalls_64.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2003 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
+ * Copyright 2003 PathScale, Inc.
+ *
+ * Licensed under the GPL
+ */
+
+#include <linux/sched.h>
+#include <linux/sched/mm.h>
+#include <linux/syscalls.h>
+#include <linux/uaccess.h>
+#include <asm/prctl.h> /* XXX This should get the constants from libc */
+#include <registers.h>
+#include <os.h>
+#include "syscalls.h"
+
+void arch_switch_to(struct task_struct *to)
+{
+	/*
+	 * In !CONFIG_MMU, it doesn't ptrace thus,
+	 * The FS_BASE/GS_BASE registers are saved here.
+	 */
+	current_top_of_stack = task_top_of_stack(to);
+	current_ptregs = (long)task_pt_regs(to);
+
+	if ((to->thread.regs.regs.gp[FS_BASE / sizeof(unsigned long)] == 0) ||
+	    (to->mm == NULL))
+		return;
+
+	/* this changes the FS on every context switch */
+	arch_prctl(to, ARCH_SET_FS,
+		   (void __user *) to->thread.regs.regs.gp[FS_BASE / sizeof(unsigned long)]);
+}
+
+SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
+		unsigned long, prot, unsigned long, flags,
+		unsigned long, fd, unsigned long, off)
+{
+	if (off & ~PAGE_MASK)
+		return -EINVAL;
+
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
+}
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 08/13] um: nommu: configure fs register on host syscall invocation
  2025-01-20  6:00 [PATCH v7 00/13] nommu UML Hajime Tazaki
                   ` (6 preceding siblings ...)
  2025-01-20  6:00 ` [PATCH v7 07/13] x86/um: nommu: process/thread handling Hajime Tazaki
@ 2025-01-20  6:00 ` Hajime Tazaki
  2025-01-20  6:00 ` [PATCH v7 09/13] x86/um/vdso: nommu: vdso memory update Hajime Tazaki
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Hajime Tazaki @ 2025-01-20  6:00 UTC (permalink / raw)
  To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett

As userspace on UML/!MMU also need to configure %fs register when it is
running to correctly access thread structure, host syscalls implemented
in os-Linux drivers may be puzzled when they are called.  Thus it has to
configure %fs register via arch_prctl(SET_FS) on every host syscalls.

Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
---
 arch/um/include/shared/os.h       |  6 +++
 arch/um/os-Linux/process.c        |  6 +++
 arch/um/os-Linux/start_up.c       | 20 +++++++++
 arch/x86/um/nommu/do_syscall_64.c | 37 ++++++++++++++++
 arch/x86/um/nommu/syscalls_64.c   | 71 +++++++++++++++++++++++++++++++
 5 files changed, 140 insertions(+)

diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index d29b967a009e..abb6f03414c5 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -190,6 +190,7 @@ extern void check_host_supports_tls(int *supports_tls, int *tls_min);
 extern void get_host_cpu_features(
 	void (*flags_helper_func)(char *line),
 	void (*cache_helper_func)(char *line));
+extern int host_has_fsgsbase;
 
 /* mem.c */
 extern int create_mem_file(unsigned long long len);
@@ -214,6 +215,11 @@ extern int os_unmap_memory(void *addr, int len);
 extern int os_drop_memory(void *addr, int length);
 extern int can_drop_memory(void);
 extern int os_mincore(void *addr, unsigned long len);
+extern int os_arch_prctl(int pid, int option, unsigned long *arg);
+#ifndef CONFIG_MMU
+extern long long host_fs;
+#endif
+
 
 void os_set_pdeathsig(void);
 
diff --git a/arch/um/os-Linux/process.c b/arch/um/os-Linux/process.c
index 49aa8e92205e..9acb865e2399 100644
--- a/arch/um/os-Linux/process.c
+++ b/arch/um/os-Linux/process.c
@@ -16,6 +16,7 @@
 #include <sys/prctl.h>
 #include <sys/wait.h>
 #include <asm/unistd.h>
+#include <sys/syscall.h>   /* For SYS_xxx definitions */
 #include <linux/threads.h>
 #include <init.h>
 #include <longjmp.h>
@@ -206,6 +207,11 @@ int os_mincore(void *addr, unsigned long len)
 	return ret;
 }
 
+int os_arch_prctl(int pid, int option, unsigned long *arg2)
+{
+	return syscall(SYS_arch_prctl, option, arg2);
+}
+
 void init_new_thread_signals(void)
 {
 	set_handler(SIGSEGV);
diff --git a/arch/um/os-Linux/start_up.c b/arch/um/os-Linux/start_up.c
index 93fc82c01aba..dbab091892b3 100644
--- a/arch/um/os-Linux/start_up.c
+++ b/arch/um/os-Linux/start_up.c
@@ -19,6 +19,8 @@
 #include <sys/resource.h>
 #include <asm/ldt.h>
 #include <asm/unistd.h>
+#include <sys/auxv.h>
+#include <asm/hwcap2.h>
 #include <init.h>
 #include <os.h>
 #include <kern_util.h>
@@ -28,6 +30,8 @@
 #include <skas.h>
 #include "internal.h"
 
+int host_has_fsgsbase;
+
 static void ptrace_child(void)
 {
 	int ret;
@@ -278,6 +282,19 @@ void  __init get_host_cpu_features(
 	}
 }
 
+static void __init check_fsgsbase(void)
+{
+	unsigned long auxv = getauxval(AT_HWCAP2);
+
+	os_info("Checking FSGSBASE instructions...");
+	if (auxv & HWCAP2_FSGSBASE) {
+		host_has_fsgsbase = 1;
+		os_info("OK\n");
+	} else {
+		host_has_fsgsbase = 0;
+		os_info("disabled\n");
+	}
+}
 
 void __init os_early_checks(void)
 {
@@ -293,6 +310,9 @@ void __init os_early_checks(void)
 	 */
 	check_tmpexec();
 
+	/* probe fsgsbase instruction */
+	check_fsgsbase();
+
 	pid = start_ptraced_child();
 	if (init_pid_registers(pid))
 		fatal("Failed to initialize default registers");
diff --git a/arch/x86/um/nommu/do_syscall_64.c b/arch/x86/um/nommu/do_syscall_64.c
index 5d0fa83e7fdc..796beb0089fc 100644
--- a/arch/x86/um/nommu/do_syscall_64.c
+++ b/arch/x86/um/nommu/do_syscall_64.c
@@ -2,10 +2,38 @@
 
 #include <linux/kernel.h>
 #include <linux/ptrace.h>
+#include <asm/fsgsbase.h>
+#include <asm/prctl.h>
 #include <kern_util.h>
 #include <sysdep/syscalls.h>
 #include <os.h>
 
+static int os_x86_arch_prctl(int pid, int option, unsigned long *arg2)
+{
+	if (!host_has_fsgsbase)
+		return os_arch_prctl(pid, option, arg2);
+
+	switch (option) {
+	case ARCH_SET_FS:
+		wrfsbase(*arg2);
+		break;
+	case ARCH_SET_GS:
+		wrgsbase(*arg2);
+		break;
+	case ARCH_GET_FS:
+		*arg2 = rdfsbase();
+		break;
+	case ARCH_GET_GS:
+		*arg2 = rdgsbase();
+		break;
+	default:
+		pr_warn("%s: unsupported option: 0x%x", __func__, option);
+		break;
+	}
+
+	return 0;
+}
+
 __visible void do_syscall_64(struct pt_regs *regs)
 {
 	int syscall;
@@ -17,6 +45,9 @@ __visible void do_syscall_64(struct pt_regs *regs)
 		 syscall, (unsigned long)current,
 		 (unsigned long)sys_call_table[syscall]);
 
+	/* set fs register to the original host one */
+	os_x86_arch_prctl(0, ARCH_SET_FS, (void *)host_fs);
+
 	if (likely(syscall < NR_syscalls)) {
 		PT_REGS_SET_SYSCALL_RETURN(regs,
 				EXECUTE_SYSCALL(syscall, regs));
@@ -34,4 +65,10 @@ __visible void do_syscall_64(struct pt_regs *regs)
 	/* force do_signal() --> is_syscall() */
 	set_thread_flag(TIF_SIGPENDING);
 	interrupt_end();
+
+	/* restore back fs register to userspace configured one */
+	os_x86_arch_prctl(0, ARCH_SET_FS,
+		      (void *)(current->thread.regs.regs.gp[FS_BASE
+						     / sizeof(unsigned long)]));
+
 }
diff --git a/arch/x86/um/nommu/syscalls_64.c b/arch/x86/um/nommu/syscalls_64.c
index c78c442aed1d..5bb6d55b4bb5 100644
--- a/arch/x86/um/nommu/syscalls_64.c
+++ b/arch/x86/um/nommu/syscalls_64.c
@@ -13,8 +13,70 @@
 #include <asm/prctl.h> /* XXX This should get the constants from libc */
 #include <registers.h>
 #include <os.h>
+#include <asm/thread_info.h>
+#include <asm/mman.h>
 #include "syscalls.h"
 
+/*
+ * The guest libc can change FS, which confuses the host libc.
+ * In fact, changing FS directly is not supported (check
+ * man arch_prctl). So, whenever we make a host syscall,
+ * we should be changing FS to the original FS (not the
+ * one set by the guest libc). This original FS is stored
+ * in host_fs.
+ */
+long long host_fs = -1;
+
+long arch_prctl(struct task_struct *task, int option,
+		unsigned long __user *arg2)
+{
+	long ret = -EINVAL;
+	unsigned long *ptr = arg2, tmp;
+
+	switch (option) {
+	case ARCH_SET_FS:
+		if (host_fs == -1)
+			os_arch_prctl(0, ARCH_GET_FS, (void *)&host_fs);
+		ret = 0;
+		break;
+	case ARCH_SET_GS:
+		ret = 0;
+		break;
+	case ARCH_GET_FS:
+	case ARCH_GET_GS:
+		ptr = &tmp;
+		break;
+	}
+
+	ret = os_arch_prctl(0, option, ptr);
+	if (ret)
+		return ret;
+
+	switch (option) {
+	case ARCH_SET_FS:
+		current->thread.regs.regs.gp[FS_BASE / sizeof(unsigned long)] =
+			(unsigned long) arg2;
+		break;
+	case ARCH_SET_GS:
+		current->thread.regs.regs.gp[GS_BASE / sizeof(unsigned long)] =
+			(unsigned long) arg2;
+		break;
+	case ARCH_GET_FS:
+		ret = put_user(current->thread.regs.regs.gp[FS_BASE / sizeof(unsigned long)], arg2);
+		break;
+	case ARCH_GET_GS:
+		ret = put_user(current->thread.regs.regs.gp[GS_BASE / sizeof(unsigned long)], arg2);
+		break;
+	}
+
+	return ret;
+}
+
+SYSCALL_DEFINE2(arch_prctl, int, option, unsigned long, arg2)
+{
+	return arch_prctl(current, option, (unsigned long __user *) arg2);
+}
+
 void arch_switch_to(struct task_struct *to)
 {
 	/*
@@ -42,3 +104,12 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
 
 	return ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
 }
+
+static int __init um_nommu_setup_hostfs(void)
+{
+	/* initialize the host_fs value at boottime */
+	os_arch_prctl(0, ARCH_GET_FS, (void *)&host_fs);
+
+	return 0;
+}
+arch_initcall(um_nommu_setup_hostfs);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 09/13] x86/um/vdso: nommu: vdso memory update
  2025-01-20  6:00 [PATCH v7 00/13] nommu UML Hajime Tazaki
                   ` (7 preceding siblings ...)
  2025-01-20  6:00 ` [PATCH v7 08/13] um: nommu: configure fs register on host syscall invocation Hajime Tazaki
@ 2025-01-20  6:00 ` Hajime Tazaki
  2025-01-20  6:00 ` [PATCH v7 10/13] x86/um: nommu: signal handling Hajime Tazaki
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Hajime Tazaki @ 2025-01-20  6:00 UTC (permalink / raw)
  To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett

On !MMU mode, the address of vdso is accessible from userspace.  This
commit implements the entry point by pointing a block of page address.

This commit also add memory permission configuration of vdso page to be
executable.

Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
---
 arch/x86/um/vdso/vma.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/um/vdso/vma.c b/arch/x86/um/vdso/vma.c
index f238f7b33cdd..2dc38cd77f23 100644
--- a/arch/x86/um/vdso/vma.c
+++ b/arch/x86/um/vdso/vma.c
@@ -9,6 +9,7 @@
 #include <asm/page.h>
 #include <asm/elf.h>
 #include <linux/init.h>
+#include <os.h>
 
 static unsigned int __read_mostly vdso_enabled = 1;
 unsigned long um_vdso_addr;
@@ -24,8 +25,6 @@ static int __init init_vdso(void)
 
 	BUG_ON(vdso_end - vdso_start > PAGE_SIZE);
 
-	um_vdso_addr = task_size - PAGE_SIZE;
-
 	vdsop = kmalloc(sizeof(struct page *), GFP_KERNEL);
 	if (!vdsop)
 		goto oom;
@@ -40,6 +39,18 @@ static int __init init_vdso(void)
 	copy_page(page_address(um_vdso), vdso_start);
 	*vdsop = um_vdso;
 
+#ifdef CONFIG_MMU
+	um_vdso_addr = task_size - PAGE_SIZE;
+#else
+	/* this is fine with NOMMU as everything is accessible */
+	um_vdso_addr = (unsigned long)page_address(um_vdso);
+	os_protect_memory((void *)um_vdso_addr, vdso_end - vdso_start, 1, 0, 1);
+#endif
+
+	pr_info("vdso_start=%lx um_vdso_addr=%lx pg_um_vdso=%lx",
+	       (unsigned long)vdso_start, um_vdso_addr,
+	       (unsigned long)page_address(um_vdso));
+
 	return 0;
 
 oom:
@@ -50,6 +61,7 @@ static int __init init_vdso(void)
 }
 subsys_initcall(init_vdso);
 
+#ifdef CONFIG_MMU
 int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 {
 	struct vm_area_struct *vma;
@@ -74,3 +86,4 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 
 	return IS_ERR(vma) ? PTR_ERR(vma) : 0;
 }
+#endif
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 10/13] x86/um: nommu: signal handling
  2025-01-20  6:00 [PATCH v7 00/13] nommu UML Hajime Tazaki
                   ` (8 preceding siblings ...)
  2025-01-20  6:00 ` [PATCH v7 09/13] x86/um/vdso: nommu: vdso memory update Hajime Tazaki
@ 2025-01-20  6:00 ` Hajime Tazaki
  2025-06-19 11:43   ` Benjamin Berg
  2025-01-20  6:00 ` [PATCH v7 11/13] um: change machine name for uname output Hajime Tazaki
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 22+ messages in thread
From: Hajime Tazaki @ 2025-01-20  6:00 UTC (permalink / raw)
  To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett

This commit updates the behavior of signal handling under !MMU
environment. It adds the alignment code for signal frame as the frame
is used in userspace as-is.

floating point register is carefully handling upon entry/leave of
syscall routine so that signal handlers can read/write the contents of
the register.

It also adds the follow up routine for SIGSEGV as a signal delivery runs
in the same stack frame while we have to avoid endless SIGSEGV.

Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
---
 arch/um/include/shared/kern_util.h    |   4 +
 arch/um/nommu/Makefile                |   2 +-
 arch/um/nommu/os-Linux/signal.c       |  13 ++
 arch/um/nommu/trap.c                  | 188 ++++++++++++++++++++++++++
 arch/um/os-Linux/signal.c             |  25 +++-
 arch/x86/um/nommu/do_syscall_64.c     |   6 +
 arch/x86/um/nommu/os-Linux/mcontext.c |  11 ++
 arch/x86/um/shared/sysdep/mcontext.h  |   1 +
 arch/x86/um/shared/sysdep/ptrace.h    |   2 +-
 9 files changed, 243 insertions(+), 9 deletions(-)
 create mode 100644 arch/um/nommu/trap.c

diff --git a/arch/um/include/shared/kern_util.h b/arch/um/include/shared/kern_util.h
index 1dd3d02c4e39..3569ca1603a8 100644
--- a/arch/um/include/shared/kern_util.h
+++ b/arch/um/include/shared/kern_util.h
@@ -71,8 +71,12 @@ void kasan_map_memory(void *start, size_t len);
 static inline void arch_sigsys_handler(int sig, struct siginfo *si, void *mc)
 {
 }
+static inline void arch_sigsegv_handler(int sig, struct siginfo *unused_si, void *mc)
+{
+}
 #else
 extern void arch_sigsys_handler(int sig, struct siginfo *si, void *mc);
+extern void arch_sigsegv_handler(int sig, struct siginfo *si, void *mc);
 #endif
 
 #endif
diff --git a/arch/um/nommu/Makefile b/arch/um/nommu/Makefile
index baab7c2f57c2..096221590cfd 100644
--- a/arch/um/nommu/Makefile
+++ b/arch/um/nommu/Makefile
@@ -1,3 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0
 
-obj-y := os-Linux/
+obj-y := trap.o os-Linux/
diff --git a/arch/um/nommu/os-Linux/signal.c b/arch/um/nommu/os-Linux/signal.c
index 1674962e509d..b9e3070f2c36 100644
--- a/arch/um/nommu/os-Linux/signal.c
+++ b/arch/um/nommu/os-Linux/signal.c
@@ -5,6 +5,7 @@
 #include <os.h>
 #include <sysdep/mcontext.h>
 #include <sys/ucontext.h>
+#include <as-layout.h>
 
 void arch_sigsys_handler(int sig, struct siginfo *si, void *ptr)
 {
@@ -13,3 +14,15 @@ void arch_sigsys_handler(int sig, struct siginfo *si, void *ptr)
 	/* hook syscall via SIGSYS */
 	set_mc_sigsys_hook(mc);
 }
+
+void arch_sigsegv_handler(int sig, struct siginfo *si, void *ptr)
+{
+	mcontext_t *mc = (mcontext_t *) ptr;
+
+	/* !MMU specific part; detection of userspace */
+	if (mc->gregs[REG_RIP] > uml_reserved &&
+	    mc->gregs[REG_RIP] < high_physmem) {
+		/* !MMU: force handle signals after rt_sigreturn() */
+		set_mc_userspace_relay_signal(mc);
+	}
+}
diff --git a/arch/um/nommu/trap.c b/arch/um/nommu/trap.c
new file mode 100644
index 000000000000..cda71ecb1115
--- /dev/null
+++ b/arch/um/nommu/trap.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/mm.h>
+#include <linux/sched/signal.h>
+#include <linux/hardirq.h>
+#include <linux/module.h>
+#include <linux/uaccess.h>
+#include <linux/sched/debug.h>
+#include <asm/current.h>
+#include <asm/tlbflush.h>
+#include <arch.h>
+#include <as-layout.h>
+#include <kern_util.h>
+#include <os.h>
+#include <skas.h>
+
+/*
+ * Note this is constrained to return 0, -EFAULT, -EACCES, -ENOMEM by
+ * segv().
+ */
+int handle_page_fault(unsigned long address, unsigned long ip,
+		      int is_write, int is_user, int *code_out)
+{
+	/* !MMU has no pagefault */
+	return -EFAULT;
+}
+
+static void show_segv_info(struct uml_pt_regs *regs)
+{
+	struct task_struct *tsk = current;
+	struct faultinfo *fi = UPT_FAULTINFO(regs);
+
+	if (!unhandled_signal(tsk, SIGSEGV))
+		return;
+
+	pr_warn_ratelimited("%s%s[%d]: segfault at %lx ip %p sp %p error %x",
+			    task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
+			    tsk->comm, task_pid_nr(tsk), FAULT_ADDRESS(*fi),
+			    (void *)UPT_IP(regs), (void *)UPT_SP(regs),
+			    fi->error_code);
+}
+
+static void bad_segv(struct faultinfo fi, unsigned long ip)
+{
+	current->thread.arch.faultinfo = fi;
+	force_sig_fault(SIGSEGV, SEGV_ACCERR, (void __user *) FAULT_ADDRESS(fi));
+}
+
+void fatal_sigsegv(void)
+{
+	force_fatal_sig(SIGSEGV);
+	do_signal(&current->thread.regs);
+	/*
+	 * This is to tell gcc that we're not returning - do_signal
+	 * can, in general, return, but in this case, it's not, since
+	 * we just got a fatal SIGSEGV queued.
+	 */
+	os_dump_core();
+}
+
+/**
+ * segv_handler() - the SIGSEGV handler
+ * @sig:	the signal number
+ * @unused_si:	the signal info struct; unused in this handler
+ * @regs:	the ptrace register information
+ *
+ * The handler first extracts the faultinfo from the UML ptrace regs struct.
+ * If the userfault did not happen in an UML userspace process, bad_segv is called.
+ * Otherwise the signal did happen in a cloned userspace process, handle it.
+ */
+void segv_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
+{
+	struct faultinfo *fi = UPT_FAULTINFO(regs);
+
+	/* !MMU specific part; detection of userspace */
+	/* mark is_user=1 when the IP is from userspace code. */
+	if (UPT_IP(regs) > uml_reserved && UPT_IP(regs) < high_physmem)
+		regs->is_user = 1;
+
+	if (UPT_IS_USER(regs) && !SEGV_IS_FIXABLE(fi)) {
+		show_segv_info(regs);
+		bad_segv(*fi, UPT_IP(regs));
+		return;
+	}
+	segv(*fi, UPT_IP(regs), UPT_IS_USER(regs), regs);
+}
+
+/*
+ * We give a *copy* of the faultinfo in the regs to segv.
+ * This must be done, since nesting SEGVs could overwrite
+ * the info in the regs. A pointer to the info then would
+ * give us bad data!
+ */
+unsigned long segv(struct faultinfo fi, unsigned long ip, int is_user,
+		   struct uml_pt_regs *regs)
+{
+	int si_code;
+	int err;
+	int is_write = FAULT_WRITE(fi);
+	unsigned long address = FAULT_ADDRESS(fi);
+
+	if (!is_user && regs)
+		current->thread.segv_regs = container_of(regs, struct pt_regs, regs);
+
+	if (current->mm == NULL) {
+		show_regs(container_of(regs, struct pt_regs, regs));
+		panic("Segfault with no mm");
+	} else if (!is_user && address > PAGE_SIZE && address < TASK_SIZE) {
+		show_regs(container_of(regs, struct pt_regs, regs));
+		panic("Kernel tried to access user memory at addr 0x%lx, ip 0x%lx",
+		       address, ip);
+	}
+
+	if (SEGV_IS_FIXABLE(&fi))
+		err = handle_page_fault(address, ip, is_write, is_user,
+					&si_code);
+	else {
+		err = -EFAULT;
+		/*
+		 * A thread accessed NULL, we get a fault, but CR2 is invalid.
+		 * This code is used in __do_copy_from_user() of TT mode.
+		 * XXX tt mode is gone, so maybe this isn't needed any more
+		 */
+		address = 0;
+	}
+
+	if (!err)
+		goto out;
+	else if (!is_user && arch_fixup(ip, regs))
+		goto out;
+
+	if (!is_user) {
+		show_regs(container_of(regs, struct pt_regs, regs));
+		panic("Kernel mode fault at addr 0x%lx, ip 0x%lx",
+		      address, ip);
+	}
+
+	show_segv_info(regs);
+
+	if (err == -EACCES) {
+		current->thread.arch.faultinfo = fi;
+		force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)address);
+	} else {
+		WARN_ON_ONCE(err != -EFAULT);
+		current->thread.arch.faultinfo = fi;
+		force_sig_fault(SIGSEGV, si_code, (void __user *) address);
+	}
+
+out:
+	if (regs)
+		current->thread.segv_regs = NULL;
+
+	return 0;
+}
+
+void relay_signal(int sig, struct siginfo *si, struct uml_pt_regs *regs)
+{
+	int code, err;
+
+	if (!UPT_IS_USER(regs)) {
+		if (sig == SIGBUS)
+			pr_err("Bus error - the host /dev/shm or /tmp mount likely just ran out of space\n");
+		panic("Kernel mode signal %d", sig);
+	}
+
+	arch_examine_signal(sig, regs);
+
+	/* Is the signal layout for the signal known?
+	 * Signal data must be scrubbed to prevent information leaks.
+	 */
+	code = si->si_code;
+	err = si->si_errno;
+	if ((err == 0) && (siginfo_layout(sig, code) == SIL_FAULT)) {
+		struct faultinfo *fi = UPT_FAULTINFO(regs);
+
+		current->thread.arch.faultinfo = *fi;
+		force_sig_fault(sig, code, (void __user *)FAULT_ADDRESS(*fi));
+	} else {
+		pr_err("Attempted to relay unknown signal %d (si_code = %d) with errno %d\n",
+		       sig, code, err);
+		force_sig(sig);
+	}
+}
+
+void winch(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
+{
+	do_IRQ(WINCH_IRQ, regs);
+}
diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c
index 97efdabe1e45..3997d827bf4c 100644
--- a/arch/um/os-Linux/signal.c
+++ b/arch/um/os-Linux/signal.c
@@ -37,12 +37,6 @@ static void sig_handler_common(int sig, struct siginfo *si, mcontext_t *mc)
 	int save_errno = errno;
 
 	r.is_user = 0;
-	if (sig == SIGSEGV) {
-		/* For segfaults, we want the data from the sigcontext. */
-		get_regs_from_mc(&r, mc);
-		GET_FAULTINFO_FROM_MC(r.faultinfo, mc);
-	}
-
 	/* enable signals if sig isn't IRQ signal */
 	if ((sig != SIGIO) && (sig != SIGWINCH))
 		unblock_signals_trace();
@@ -177,8 +171,25 @@ static void sigsys_handler(int sig, struct siginfo *unused_si, mcontext_t *mc)
 	arch_sigsys_handler(sig, unused_si, mc);
 }
 
+static void sigsegv_handler(int sig, struct siginfo *unused_si, mcontext_t *mc)
+{
+	struct uml_pt_regs r;
+	int save_errno = errno;
+
+	r.is_user = 0;
+
+	/* For segfaults, we want the data from the sigcontext. */
+	get_regs_from_mc(&r, mc);
+	GET_FAULTINFO_FROM_MC(r.faultinfo, mc);
+
+	segv_handler(sig, unused_si, &r);
+	arch_sigsegv_handler(sig, unused_si, mc);
+
+	errno = save_errno;
+}
+
 static void (*handlers[_NSIG])(int sig, struct siginfo *si, mcontext_t *mc) = {
-	[SIGSEGV] = sig_handler,
+	[SIGSEGV] = sigsegv_handler,
 	[SIGBUS] = sig_handler,
 	[SIGILL] = sig_handler,
 	[SIGFPE] = sig_handler,
diff --git a/arch/x86/um/nommu/do_syscall_64.c b/arch/x86/um/nommu/do_syscall_64.c
index 796beb0089fc..48b3d29e2db1 100644
--- a/arch/x86/um/nommu/do_syscall_64.c
+++ b/arch/x86/um/nommu/do_syscall_64.c
@@ -48,6 +48,9 @@ __visible void do_syscall_64(struct pt_regs *regs)
 	/* set fs register to the original host one */
 	os_x86_arch_prctl(0, ARCH_SET_FS, (void *)host_fs);
 
+	/* save fp registers */
+	asm volatile("fxsaveq %0" : "=m"(*(struct _xstate *)regs->regs.fp));
+
 	if (likely(syscall < NR_syscalls)) {
 		PT_REGS_SET_SYSCALL_RETURN(regs,
 				EXECUTE_SYSCALL(syscall, regs));
@@ -66,6 +69,9 @@ __visible void do_syscall_64(struct pt_regs *regs)
 	set_thread_flag(TIF_SIGPENDING);
 	interrupt_end();
 
+	/* restore fp registers */
+	asm volatile("fxrstorq %0" : : "m"((current->thread.regs.regs.fp)));
+
 	/* restore back fs register to userspace configured one */
 	os_x86_arch_prctl(0, ARCH_SET_FS,
 		      (void *)(current->thread.regs.regs.gp[FS_BASE
diff --git a/arch/x86/um/nommu/os-Linux/mcontext.c b/arch/x86/um/nommu/os-Linux/mcontext.c
index c4ef877d5ea0..955e7d9f4765 100644
--- a/arch/x86/um/nommu/os-Linux/mcontext.c
+++ b/arch/x86/um/nommu/os-Linux/mcontext.c
@@ -6,6 +6,17 @@
 #include <sysdep/mcontext.h>
 #include <sysdep/syscalls.h>
 
+static void __userspace_relay_signal(void)
+{
+	/* XXX: dummy syscall */
+	__asm__ volatile("call *%0" : : "r"(__kernel_vsyscall), "a"(39) :);
+}
+
+void set_mc_userspace_relay_signal(mcontext_t *mc)
+{
+	mc->gregs[REG_RIP] = (unsigned long) __userspace_relay_signal;
+}
+
 void set_mc_sigsys_hook(mcontext_t *mc)
 {
 	mc->gregs[REG_RCX] = mc->gregs[REG_RIP];
diff --git a/arch/x86/um/shared/sysdep/mcontext.h b/arch/x86/um/shared/sysdep/mcontext.h
index 15e9ff287021..0efe5ddf95af 100644
--- a/arch/x86/um/shared/sysdep/mcontext.h
+++ b/arch/x86/um/shared/sysdep/mcontext.h
@@ -9,6 +9,7 @@
 extern void get_regs_from_mc(struct uml_pt_regs *, mcontext_t *);
 #ifndef CONFIG_MMU
 extern void set_mc_sigsys_hook(mcontext_t *mc);
+extern void set_mc_userspace_relay_signal(mcontext_t *mc);
 #endif
 
 #ifdef __i386__
diff --git a/arch/x86/um/shared/sysdep/ptrace.h b/arch/x86/um/shared/sysdep/ptrace.h
index 8f7476ff6e95..7d553d9f05be 100644
--- a/arch/x86/um/shared/sysdep/ptrace.h
+++ b/arch/x86/um/shared/sysdep/ptrace.h
@@ -65,7 +65,7 @@ struct uml_pt_regs {
 	int is_user;
 
 	/* Dynamically sized FP registers (holds an XSTATE) */
-	unsigned long fp[];
+	unsigned long fp[] __attribute__((aligned(16)));
 };
 
 #define EMPTY_UML_PT_REGS { }
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 11/13] um: change machine name for uname output
  2025-01-20  6:00 [PATCH v7 00/13] nommu UML Hajime Tazaki
                   ` (9 preceding siblings ...)
  2025-01-20  6:00 ` [PATCH v7 10/13] x86/um: nommu: signal handling Hajime Tazaki
@ 2025-01-20  6:00 ` Hajime Tazaki
  2025-01-20  6:00 ` [PATCH v7 12/13] um: nommu: add documentation of nommu UML Hajime Tazaki
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Hajime Tazaki @ 2025-01-20  6:00 UTC (permalink / raw)
  To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett

This commit tries to display MMU/!MMU mode from the output of uname(2)
so that users can distinguish which mode of UML is running right now.

Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
---
 arch/um/Makefile        | 6 ++++++
 arch/um/os-Linux/util.c | 3 ++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/um/Makefile b/arch/um/Makefile
index fcf4bb915a31..722a3bd03f24 100644
--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -155,6 +155,12 @@ export CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) $(LD_FLAGS_CMDLINE) $(CC_FLAGS_
 CLEAN_FILES += linux x.i gmon.out
 MRPROPER_FILES += $(HOST_DIR)/include/generated
 
+ifeq ($(CONFIG_MMU),y)
+UTS_MACHINE := "um"
+else
+UTS_MACHINE := "um\(nommu\)"
+endif
+
 archclean:
 	@find . \( -name '*.bb' -o -name '*.bbg' -o -name '*.da' \
 		-o -name '*.gcov' \) -type f -print | xargs rm -f
diff --git a/arch/um/os-Linux/util.c b/arch/um/os-Linux/util.c
index 4193e04d7e4a..20421e9f0f77 100644
--- a/arch/um/os-Linux/util.c
+++ b/arch/um/os-Linux/util.c
@@ -65,7 +65,8 @@ void setup_machinename(char *machine_out)
 	}
 # endif
 #endif
-	strcpy(machine_out, host.machine);
+	strcat(machine_out, "/");
+	strcat(machine_out, host.machine);
 }
 
 void setup_hostinfo(char *buf, int len)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 12/13] um: nommu: add documentation of nommu UML
  2025-01-20  6:00 [PATCH v7 00/13] nommu UML Hajime Tazaki
                   ` (10 preceding siblings ...)
  2025-01-20  6:00 ` [PATCH v7 11/13] um: change machine name for uname output Hajime Tazaki
@ 2025-01-20  6:00 ` Hajime Tazaki
  2025-01-20  6:00 ` [PATCH v7 13/13] um: nommu: plug nommu code into build system Hajime Tazaki
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Hajime Tazaki @ 2025-01-20  6:00 UTC (permalink / raw)
  To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett

This commit adds an initial documentation for !MMU mode of UML.

Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
---
 Documentation/virt/uml/nommu-uml.rst | 177 +++++++++++++++++++++++++++
 MAINTAINERS                          |   1 +
 2 files changed, 178 insertions(+)
 create mode 100644 Documentation/virt/uml/nommu-uml.rst

diff --git a/Documentation/virt/uml/nommu-uml.rst b/Documentation/virt/uml/nommu-uml.rst
new file mode 100644
index 000000000000..2034ba4d397a
--- /dev/null
+++ b/Documentation/virt/uml/nommu-uml.rst
@@ -0,0 +1,177 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+UML has been built with CONFIG_MMU since day 0.  The patchset
+introduces the nommu mode on UML in a different angle from what Linux
+Kernel Library tried.
+
+.. contents:: :local:
+
+What is it for ?
+================
+
+- Alleviate syscall hook overhead implemented with ptrace(2)
+- To exercises nommu code over UML (and over KUnit)
+- Less dependency to host facilities
+
+
+How it works ?
+==============
+
+To illustrate how this feature works, the below shows how syscalls are
+called under nommu/UML environment.
+
+- boot kernel, install seccomp filter if ``syscall`` instructions are
+  called from userspace memory based on the address of instruction
+  pointer
+- (userspace starts)
+- calls ``vfork``/``execve`` syscalls
+- ``SIGSYS`` signal raised, handler calls syscall entry point ``__kernel_vsyscall``
+- call handler function in ``sys_call_table[]`` and follow how UML syscall
+  works.
+- return to userspace
+
+
+What are the differences from MMU-full UML ?
+============================================
+
+The current nommu implementation adds 3 different functions which
+MMU-full UML doesn't have:
+
+- kernel address space can directly be accessible from userspace
+  - so, ``uaccess()`` always returns 1
+  - generic implementation of memcpy/strcpy/futex is also used
+- alternate syscall entrypoint without ptrace
+- alternate syscall hook
+  - hook syscall by seccomp filter
+
+With those modifications, it allows us to use unmodified userspace
+binaries with nommu UML.
+
+
+History
+=======
+
+This feature was originally introduced by Ricardo Koller at Open
+Source Summit NA 2020, then integrated with the syscall translation
+functionality with the clean up to the original code.
+
+Building and run
+================
+
+::
+
+   make ARCH=um x86_64_nommu_defconfig
+   make ARCH=um
+
+will build UML with ``CONFIG_MMU=n`` applied.
+
+Kunit tests can run with the following command::
+
+   ./tools/testing/kunit/kunit.py run --kconfig_add CONFIG_MMU=n
+
+To run a typical Linux distribution, we need nommu-aware userspace.
+We can use a stock version of Alpine Linux with nommu-built version of
+busybox and musl-libc.
+
+
+Preparing root filesystem
+=========================
+
+nommu UML requires to use a specific standard library which is aware
+of nommu kernel.  We have tested custom-build musl-libc and busybox,
+both of which have built-in support for nommu kernels.
+
+There are no available Linux distributions for nommu under x86_64
+architecture, so we need to prepare our own image for the root
+filesystem.  We use Alpine Linux as a base distribution and replace
+busybox and musl-libc on top of that.  The following are the step to
+prepare the filesystem for the quick start::
+
+     container_id=$(docker create ghcr.io/thehajime/alpine:3.20.3-um-nommu)
+     docker start $container_id
+     docker wait $container_id
+     docker export $container_id > alpine.tar
+     docker rm $container_id
+
+     mnt=$(mktemp -d)
+     dd if=/dev/zero of=alpine.ext4 bs=1 count=0 seek=1G
+     sudo chmod og+wr "alpine.ext4"
+     yes 2>/dev/null | mkfs.ext4 "alpine.ext4" || true
+     sudo mount "alpine.ext4" $mnt
+     sudo tar -xf alpine.tar -C $mnt
+     sudo umount $mnt
+
+This will create a file image, ``alpine.ext4``, which contains busybox
+and musl with nommu build on the Alpine Linux root filesystem.  The
+file can be specified to the argument ``ubd0=`` to the UML command line::
+
+  ./vmlinux ubd0=./alpine.ext4 rw mem=1024m loglevel=8 init=/sbin/init
+
+We plan to upstream apk packages for busybox and musl so that we can
+follow the proper procedure to set up the root filesystem.
+
+
+Quick start with docker
+=======================
+
+There is a docker image that you can quickly start with a simple step::
+
+  docker run -it -v /dev/shm:/dev/shm --rm ghcr.io/thehajime/alpine:3.20.3-um-nommu
+
+This will launch a UML instance with an pre-configured root filesystem.
+
+Benchmark
+=========
+
+The below shows an example of performance measurement conducted with
+lmbench and (self-crafted) getpid benchmark (with v6.13-rc5 uml/next
+tree).
+
+.. csv-table:: lmbench (usec)
+  :header: ,native,um,um-nommu(s)
+
+  select-10    ,0.5569,27.0149,2.9772
+  select-100   ,2.3964,26.9242,3.8947
+  select-1000  ,20.8101,39.6842,12.8161
+  syscall      ,0.1735,25.7706,2.6997
+  read         ,0.3488,25.7922,2.7923
+  write        ,0.2861,26.5560,2.7961
+  stat         ,1.9171,36.2893,3.2678
+  open/close   ,3.8475,62.2847,6.3909
+  fork+sh      ,1159.0000,5230.3333,409.1786
+  fork+execve  ,535.3000,2075.8333,135.4074
+
+.. csv-table:: do_getpid bench (nsec)
+  :header: ,native,um,um-nommu(s)
+
+  getpid, 172 , 24979 , 2691
+
+Limitations
+===========
+
+generic nommu limitations
+-------------------------
+Since this port is a kernel of nommu architecture so, the
+implementation inherits the characteristics of other nommu kernels
+(riscv, arm, etc), described below.
+
+- vfork(2) should be used instead of fork(2)
+- ELF loader only loads PIE (position independent executable) binaries
+- processes share the address space among others
+- mmap(2) offers a subset of functionalities (e.g., unsupported
+  MMAP_FIXED)
+
+Thus, we have limited options to userspace programs.  We have tested
+Alpine Linux with musl-libc, which has a support nommu kernel.
+
+supported architecture
+----------------------
+The current implementation of nommu UML only works on x86_64 SUBARCH.
+We have not tested with 32-bit environment.
+
+
+Further readings about NOMMU UML
+================================
+
+- NOMMU UML (original code by Ricardo Koller)
+ - https://static.sched.com/hosted_files/ossna2020/ec/kollerr_linux_um_nommu.pdf
diff --git a/MAINTAINERS b/MAINTAINERS
index 910305c11e8a..fc5473fd528b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -24470,6 +24470,7 @@ USER-MODE LINUX (UML)
 M:	Richard Weinberger <richard@nod.at>
 M:	Anton Ivanov <anton.ivanov@cambridgegreys.com>
 M:	Johannes Berg <johannes@sipsolutions.net>
+M:	Hajime Tazaki <thehajime@gmail.com>
 L:	linux-um@lists.infradead.org
 S:	Maintained
 W:	http://user-mode-linux.sourceforge.net
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v7 13/13] um: nommu: plug nommu code into build system
  2025-01-20  6:00 [PATCH v7 00/13] nommu UML Hajime Tazaki
                   ` (11 preceding siblings ...)
  2025-01-20  6:00 ` [PATCH v7 12/13] um: nommu: add documentation of nommu UML Hajime Tazaki
@ 2025-01-20  6:00 ` Hajime Tazaki
  2025-02-04 12:26 ` [PATCH v7 00/13] nommu UML Hajime Tazaki
  2025-04-25 13:49 ` Lorenzo Stoakes
  14 siblings, 0 replies; 22+ messages in thread
From: Hajime Tazaki @ 2025-01-20  6:00 UTC (permalink / raw)
  To: linux-um; +Cc: thehajime, ricarkol, Liam.Howlett

Add nommu kernel for um build.  defconfig is also provided.

Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
---
 arch/um/Kconfig                        | 14 +++++-
 arch/um/configs/x86_64_nommu_defconfig | 64 ++++++++++++++++++++++++++
 2 files changed, 76 insertions(+), 2 deletions(-)
 create mode 100644 arch/um/configs/x86_64_nommu_defconfig

diff --git a/arch/um/Kconfig b/arch/um/Kconfig
index 18051b1cfce0..2fc5a91c90a7 100644
--- a/arch/um/Kconfig
+++ b/arch/um/Kconfig
@@ -30,14 +30,17 @@ config UML
 	select ARCH_SUPPORTS_LTO_CLANG_THIN
 	select TRACE_IRQFLAGS_SUPPORT
 	select TTY # Needed for line.c
-	select HAVE_ARCH_VMAP_STACK
+	select HAVE_ARCH_VMAP_STACK if MMU
 	select HAVE_RUST
 	select ARCH_HAS_UBSAN
 	select HAVE_ARCH_TRACEHOOK
 	select THREAD_INFO_IN_TASK
+	select UACCESS_MEMCPY if !MMU
+	select GENERIC_STRNLEN_USER if !MMU
+	select GENERIC_STRNCPY_FROM_USER if !MMU
 
 config MMU
-	bool
+	bool "MMU-based Paged Memory Management Support" if 64BIT
 	default y
 
 config UML_DMA_EMULATION
@@ -190,8 +193,15 @@ config MAGIC_SYSRQ
 	  The keys are documented in <file:Documentation/admin-guide/sysrq.rst>. Don't say Y
 	  unless you really know what this hack does.
 
+config ARCH_FORCE_MAX_ORDER
+	int "Order of maximal physically contiguous allocations" if EXPERT
+	default "10" if MMU
+	default "16" if !MMU
+
 config KERNEL_STACK_ORDER
 	int "Kernel stack size order"
+	default 3 if !MMU
+	range 3 10 if !MMU
 	default 2 if 64BIT
 	range 2 10 if 64BIT
 	default 1 if !64BIT
diff --git a/arch/um/configs/x86_64_nommu_defconfig b/arch/um/configs/x86_64_nommu_defconfig
new file mode 100644
index 000000000000..c2e0fb546987
--- /dev/null
+++ b/arch/um/configs/x86_64_nommu_defconfig
@@ -0,0 +1,64 @@
+CONFIG_SYSVIPC=y
+CONFIG_POSIX_MQUEUE=y
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_BSD_PROCESS_ACCT=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_LOG_BUF_SHIFT=14
+CONFIG_CGROUPS=y
+CONFIG_BLK_CGROUP=y
+CONFIG_CGROUP_SCHED=y
+CONFIG_CGROUP_DEVICE=y
+CONFIG_CGROUP_CPUACCT=y
+# CONFIG_PID_NS is not set
+CONFIG_CC_OPTIMIZE_FOR_SIZE=y
+# CONFIG_MMU is not set
+CONFIG_HOSTFS=y
+CONFIG_MAGIC_SYSRQ=y
+CONFIG_SSL=y
+CONFIG_NULL_CHAN=y
+CONFIG_PORT_CHAN=y
+CONFIG_PTY_CHAN=y
+CONFIG_TTY_CHAN=y
+CONFIG_CON_CHAN="pts"
+CONFIG_SSL_CHAN="pts"
+CONFIG_UML_SOUND=m
+CONFIG_UML_NET=y
+CONFIG_UML_NET_ETHERTAP=y
+CONFIG_UML_NET_TUNTAP=y
+CONFIG_UML_NET_SLIP=y
+CONFIG_UML_NET_DAEMON=y
+CONFIG_UML_NET_MCAST=y
+CONFIG_UML_NET_SLIRP=y
+CONFIG_MODULES=y
+CONFIG_MODULE_UNLOAD=y
+CONFIG_IOSCHED_BFQ=m
+CONFIG_BINFMT_MISC=m
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_INET=y
+CONFIG_DEVTMPFS=y
+CONFIG_DEVTMPFS_MOUNT=y
+CONFIG_BLK_DEV_UBD=y
+CONFIG_BLK_DEV_LOOP=m
+CONFIG_BLK_DEV_NBD=m
+CONFIG_DUMMY=m
+CONFIG_TUN=m
+CONFIG_PPP=m
+CONFIG_SLIP=m
+CONFIG_LEGACY_PTY_COUNT=32
+CONFIG_UML_RANDOM=y
+CONFIG_SOUND=m
+CONFIG_EXT4_FS=y
+CONFIG_REISERFS_FS=y
+CONFIG_QUOTA=y
+CONFIG_AUTOFS_FS=m
+CONFIG_ISO9660_FS=m
+CONFIG_JOLIET=y
+CONFIG_NLS=y
+CONFIG_DEBUG_KERNEL=y
+CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
+CONFIG_FRAME_WARN=1024
+CONFIG_IPV6=y
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 00/13] nommu UML
  2025-01-20  6:00 [PATCH v7 00/13] nommu UML Hajime Tazaki
                   ` (12 preceding siblings ...)
  2025-01-20  6:00 ` [PATCH v7 13/13] um: nommu: plug nommu code into build system Hajime Tazaki
@ 2025-02-04 12:26 ` Hajime Tazaki
  2025-04-25 13:49 ` Lorenzo Stoakes
  14 siblings, 0 replies; 22+ messages in thread
From: Hajime Tazaki @ 2025-02-04 12:26 UTC (permalink / raw)
  To: richard, anton.ivanov, johannes, linux-um; +Cc: ricarkol, Liam.Howlett


Hello Richard, Anton,
Cc: Johannes,

On Mon, 20 Jan 2025 15:00:02 +0900,
Hajime Tazaki wrote:
> 
> This patchset is another spin of nommu mode addition to UML.  It doesn't
> change a lot since the last version (v5), but contain clean ups.  It would
> be nice to hear about your opinions on that.

Hope you're doing good.
I'm writing this email to ask you for a review to this patchset.

I have been addressed comments received so far, and trying to shape
the patchset as a minimal size so that we can avoid confusion by the
new feature.

But still wish to see how I can improve the patches so, it'll be nice
to borrow your eyes to look at the patchset in your spare time.
Johannes gave me a bunch of feedbacks but I would also like to see
other maintainers of UML.

Thank you in advance and looking forward to see your inputs.

-- Hajime



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 00/13] nommu UML
  2025-01-20  6:00 [PATCH v7 00/13] nommu UML Hajime Tazaki
                   ` (13 preceding siblings ...)
  2025-02-04 12:26 ` [PATCH v7 00/13] nommu UML Hajime Tazaki
@ 2025-04-25 13:49 ` Lorenzo Stoakes
  2025-04-27  3:49   ` Hajime Tazaki
  14 siblings, 1 reply; 22+ messages in thread
From: Lorenzo Stoakes @ 2025-04-25 13:49 UTC (permalink / raw)
  To: Hajime Tazaki
  Cc: linux-um, ricarkol, Liam.Howlett, Richard Weinberger,
	Anton Ivanov, Johannes Berg, linux-mm, linux-kernel

+cc current UML maintainers as per- current MAINTAINERS file, mm + kernel
lists.

Hi,

It seemed this series died, which is a pity, i'd be very useful to have
this functionality to aid in easily testing nommu in mm code.

I know that I pushed back a little (or rather - wondering about the status
of nommu in general) back in v2, however with it confirmed that nommu is
required, this series becomes really quite important.

This would need rebasing of course for a v8, but I wonder if there is any
appetite for it or why this didn't go anywhere?

In any case, if you are still interested in this Hajime (thanks so much for
all your hard work on it!), just saying there are definitely users out
there.

Thanks, Lorenzo

On Mon, Jan 20, 2025 at 03:00:02PM +0900, Hajime Tazaki wrote:
> This patchset is another spin of nommu mode addition to UML.  It doesn't
> change a lot since the last version (v5), but contain clean ups.  It would
> be nice to hear about your opinions on that.
>
> There are still several limitations/issues which we already found;
> here is the list of those issues.
>
> - memory mapped by loadable modules are not distinguished from
>   userspace memory.
>
> -- Hajime
>
> v7:
> - properly handle FP register upon signal delivery [10/13]
> - update benchmark result with new FP register handling [12/13]
> - fix arch_has_single_step() for !MMU case [07/13]
> - revert stack alignment as it is in uml/fixes tree [10/13]
>
> v6:
> - rebase to the latest uml/next tree
> - more clean up on mmu/nommu for signal handling [10/13]
> - rename functions of mcontext routines [06,10/13]
> - added Acked-by tag for binfmt_elf_fdpic [02/13]
> - https://lore.kernel.org/linux-um/cover.1736853925.git.thehajime@gmail.com/
>
> v5:
> - clean up stack manipulation code [05,06,07,10/13]
> - https://lore.kernel.org/linux-um/cover.1733998168.git.thehajime@gmail.com/
>
> v4:
> - add arch/um/nommu, arch/x86/um/nommu to contain !MMU specific codes
> - remove zpoline patch
> - drop binfmt_elf_fdpic patch
> - reduce ifndef CONFIG_MMU if possible
> - split to elf header cleanup patch [01/13]
> - fix kernel test robot warnings [06/13]
> - fix coding styles [07/13]
> - move task_top_of_stack definition [05/13]
> - https://lore.kernel.org/linux-um/cover.1733652929.git.thehajime@gmail.com/
>
> v3:
> - https://lore.kernel.org/linux-um/cover.1733199769.git.thehajime@gmail.com/
> - add seccomp-based syscall hook in addition to zpoline [06/13]
> - remove RFC, add a line to MAINTAINERS file
> - fix kernel test robot warnings [02/13,08/13,10/13]
> - add base-commit tag to cover letter
> - pull the latest uml/next
> - clean up SIGSEGV handling [10/13]
> - detect fsgsbase availability with elf aux vector [08/13]
> - simplify vdso code with macros [09/13]
>
> RFC v2:
> - https://lore.kernel.org/linux-um/cover.1731290567.git.thehajime@gmail.com/
> - base branch is now uml/linux.git instead of torvalds/linux.git.
> - reorganize the patch series to clean up
> - fixed various coding styles issues
> - clean up exec code path [07/13]
> - fixed the crash/SIGSEGV case on userspace programs [10/13]
> - add seccomp filter to limit syscall caller address [06/13]
> - detect fsgsbase availability with sigsetjmp/siglongjmp [08/13]
> - removes unrelated changes
> - removes unneeded ifndef CONFIG_MMU
> - convert UML_CONFIG_MMU to CONFIG_MMU as using uml/linux.git
> - proposed a patch of maple-tree issue (resolving a limitation in RFC v1)
>   https://lore.kernel.org/linux-mm/20241108222834.3625217-1-thehajime@gmail.com/
>
> RFC:
> - https://lore.kernel.org/linux-um/cover.1729770373.git.thehajime@gmail.com/
>
> Hajime Tazaki (13):
>   x86/um: clean up elf specific definitions
>   x86/um: nommu: elf loader for fdpic
>   um: decouple MMU specific code from the common part
>   um: nommu: memory handling
>   x86/um: nommu: syscall handling
>   um: nommu: seccomp syscalls hook
>   x86/um: nommu: process/thread handling
>   um: nommu: configure fs register on host syscall invocation
>   x86/um/vdso: nommu: vdso memory update
>   x86/um: nommu: signal handling
>   um: change machine name for uname output
>   um: nommu: add documentation of nommu UML
>   um: nommu: plug nommu code into build system
>
>  Documentation/virt/uml/nommu-uml.rst    | 177 ++++++++++++++++++++++
>  MAINTAINERS                             |   1 +
>  arch/um/Kconfig                         |  14 +-
>  arch/um/Makefile                        |  10 ++
>  arch/um/configs/x86_64_nommu_defconfig  |  64 ++++++++
>  arch/um/include/asm/Kbuild              |   1 +
>  arch/um/include/asm/futex.h             |   4 +
>  arch/um/include/asm/mmu.h               |   8 +
>  arch/um/include/asm/mmu_context.h       |   2 +
>  arch/um/include/asm/ptrace-generic.h    |   8 +-
>  arch/um/include/asm/uaccess.h           |   7 +-
>  arch/um/include/shared/kern_util.h      |  12 ++
>  arch/um/include/shared/os.h             |  16 ++
>  arch/um/kernel/Makefile                 |   5 +-
>  arch/um/kernel/mem-pgtable.c            |  55 +++++++
>  arch/um/kernel/mem.c                    |  39 +----
>  arch/um/kernel/process.c                |  25 ++++
>  arch/um/kernel/skas/process.c           |  27 ----
>  arch/um/kernel/um_arch.c                |   3 +
>  arch/um/nommu/Makefile                  |   3 +
>  arch/um/nommu/os-Linux/Makefile         |   7 +
>  arch/um/nommu/os-Linux/signal.c         |  28 ++++
>  arch/um/nommu/trap.c                    | 188 ++++++++++++++++++++++++
>  arch/um/os-Linux/Makefile               |   8 +-
>  arch/um/os-Linux/internal.h             |   5 +
>  arch/um/os-Linux/mem.c                  |   4 +
>  arch/um/os-Linux/process.c              | 149 ++++++++++++++++++-
>  arch/um/os-Linux/seccomp.c              |  87 +++++++++++
>  arch/um/os-Linux/signal.c               |  31 +++-
>  arch/um/os-Linux/skas/process.c         | 132 -----------------
>  arch/um/os-Linux/start_up.c             |  20 +++
>  arch/um/os-Linux/util.c                 |   3 +-
>  arch/x86/um/Makefile                    |   7 +-
>  arch/x86/um/asm/elf.h                   |   8 +-
>  arch/x86/um/asm/module.h                |  24 ---
>  arch/x86/um/nommu/Makefile              |   8 +
>  arch/x86/um/nommu/do_syscall_64.c       |  80 ++++++++++
>  arch/x86/um/nommu/entry_64.S            | 113 ++++++++++++++
>  arch/x86/um/nommu/os-Linux/Makefile     |   6 +
>  arch/x86/um/nommu/os-Linux/mcontext.c   |  24 +++
>  arch/x86/um/nommu/syscalls.h            |  16 ++
>  arch/x86/um/nommu/syscalls_64.c         | 115 +++++++++++++++
>  arch/x86/um/shared/sysdep/mcontext.h    |   4 +
>  arch/x86/um/shared/sysdep/ptrace.h      |   2 +-
>  arch/x86/um/shared/sysdep/syscalls_64.h |   6 +
>  arch/x86/um/vdso/vma.c                  |  17 ++-
>  fs/Kconfig.binfmt                       |   2 +-
>  47 files changed, 1329 insertions(+), 246 deletions(-)
>  create mode 100644 Documentation/virt/uml/nommu-uml.rst
>  create mode 100644 arch/um/configs/x86_64_nommu_defconfig
>  create mode 100644 arch/um/kernel/mem-pgtable.c
>  create mode 100644 arch/um/nommu/Makefile
>  create mode 100644 arch/um/nommu/os-Linux/Makefile
>  create mode 100644 arch/um/nommu/os-Linux/signal.c
>  create mode 100644 arch/um/nommu/trap.c
>  create mode 100644 arch/um/os-Linux/seccomp.c
>  delete mode 100644 arch/x86/um/asm/module.h
>  create mode 100644 arch/x86/um/nommu/Makefile
>  create mode 100644 arch/x86/um/nommu/do_syscall_64.c
>  create mode 100644 arch/x86/um/nommu/entry_64.S
>  create mode 100644 arch/x86/um/nommu/os-Linux/Makefile
>  create mode 100644 arch/x86/um/nommu/os-Linux/mcontext.c
>  create mode 100644 arch/x86/um/nommu/syscalls.h
>  create mode 100644 arch/x86/um/nommu/syscalls_64.c
>
>
> base-commit: 2d2b61ae38bd91217ea7cc5bc700a2b9e75b3937
> --
> 2.43.0
>
>
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 00/13] nommu UML
  2025-04-25 13:49 ` Lorenzo Stoakes
@ 2025-04-27  3:49   ` Hajime Tazaki
  2025-04-29 13:23     ` Lorenzo Stoakes
  0 siblings, 1 reply; 22+ messages in thread
From: Hajime Tazaki @ 2025-04-27  3:49 UTC (permalink / raw)
  To: lorenzo.stoakes
  Cc: linux-um, ricarkol, Liam.Howlett, richard, anton.ivanov, johannes,
	linux-mm, linux-kernel


Hello Lorenzo,

On Fri, 25 Apr 2025 22:49:31 +0900,
Lorenzo Stoakes wrote:

> It seemed this series died, which is a pity, i'd be very useful to have
> this functionality to aid in easily testing nommu in mm code.
> 
> I know that I pushed back a little (or rather - wondering about the status
> of nommu in general) back in v2, however with it confirmed that nommu is
> required, this series becomes really quite important.
> 
> This would need rebasing of course for a v8, but I wonder if there is any
> appetite for it or why this didn't go anywhere?
> 
> In any case, if you are still interested in this Hajime (thanks so much for
> all your hard work on it!), just saying there are definitely users out
> there.

I've been waiting for opinions from the um maintainers to the v7
patchset, and am ready to send v8 series with rebasing the current
uml/next branch.

I believe the v7 series is still in the patchwork queue so, I hope
it'll be reviewed soon.

# thanks for the information that this feature is useful to test mm
  code.

-- Hajime


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 00/13] nommu UML
  2025-04-27  3:49   ` Hajime Tazaki
@ 2025-04-29 13:23     ` Lorenzo Stoakes
  2025-04-30  9:51       ` Johannes Berg
  0 siblings, 1 reply; 22+ messages in thread
From: Lorenzo Stoakes @ 2025-04-29 13:23 UTC (permalink / raw)
  To: Hajime Tazaki
  Cc: linux-um, ricarkol, Liam.Howlett, richard, anton.ivanov, johannes,
	linux-mm, linux-kernel

On Sun, Apr 27, 2025 at 12:49:36PM +0900, Hajime Tazaki wrote:
>
> Hello Lorenzo,
>
> On Fri, 25 Apr 2025 22:49:31 +0900,
> Lorenzo Stoakes wrote:
>
> > It seemed this series died, which is a pity, i'd be very useful to have
> > this functionality to aid in easily testing nommu in mm code.
> >
> > I know that I pushed back a little (or rather - wondering about the status
> > of nommu in general) back in v2, however with it confirmed that nommu is
> > required, this series becomes really quite important.
> >
> > This would need rebasing of course for a v8, but I wonder if there is any
> > appetite for it or why this didn't go anywhere?
> >
> > In any case, if you are still interested in this Hajime (thanks so much for
> > all your hard work on it!), just saying there are definitely users out
> > there.
>
> I've been waiting for opinions from the um maintainers to the v7
> patchset, and am ready to send v8 series with rebasing the current
> uml/next branch.
>
> I believe the v7 series is still in the patchwork queue so, I hope
> it'll be reviewed soon.
>
> # thanks for the information that this feature is useful to test mm
>   code.

Thanks, appreciate the response. I would say send a v8 that's rebased to
make life easier for maintainers :) if you already have it ready to go that
is!

I think at this stage that makes more sense, as gives the maintainers a
chance to run locally + account for any recent changes etc.

Thank you again for all your hard work on this! This could be super useful
for us over in mm :)

>
> -- Hajime

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 00/13] nommu UML
  2025-04-29 13:23     ` Lorenzo Stoakes
@ 2025-04-30  9:51       ` Johannes Berg
  2025-05-01 10:40         ` Lorenzo Stoakes
  0 siblings, 1 reply; 22+ messages in thread
From: Johannes Berg @ 2025-04-30  9:51 UTC (permalink / raw)
  To: Lorenzo Stoakes, Hajime Tazaki
  Cc: linux-um, ricarkol, Liam.Howlett, richard, anton.ivanov, linux-mm,
	linux-kernel

On Tue, 2025-04-29 at 14:23 +0100, Lorenzo Stoakes wrote:
> Thanks, appreciate the response. I would say send a v8 that's rebased to
> make life easier for maintainers :) if you already have it ready to go that
> is!

There probably haven't been that many patches since, but I guess why not
:)

I was sort of hoping to get the seccomp patches from Benjamin - that
have been waiting far longer - in first to see what overlap if any there
might be, but I think he and Hajime have already concluded there wasn't
all that much overlap anyway.

But it's good to hear that it'd be useful to you.

johannes


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 00/13] nommu UML
  2025-04-30  9:51       ` Johannes Berg
@ 2025-05-01 10:40         ` Lorenzo Stoakes
  0 siblings, 0 replies; 22+ messages in thread
From: Lorenzo Stoakes @ 2025-05-01 10:40 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Hajime Tazaki, linux-um, ricarkol, Liam.Howlett, richard,
	anton.ivanov, linux-mm, linux-kernel

On Wed, Apr 30, 2025 at 11:51:47AM +0200, Johannes Berg wrote:
> On Tue, 2025-04-29 at 14:23 +0100, Lorenzo Stoakes wrote:
> > Thanks, appreciate the response. I would say send a v8 that's rebased to
> > make life easier for maintainers :) if you already have it ready to go that
> > is!
>
> There probably haven't been that many patches since, but I guess why not
> :)

Appreciate you having a look again! :>)

Perhaps I'm being selfish :P but it at least needs a rebase, perhaps just a
trivial one, as b4 shazam refused to play ball when I tried.

>
> I was sort of hoping to get the seccomp patches from Benjamin - that
> have been waiting far longer - in first to see what overlap if any there
> might be, but I think he and Hajime have already concluded there wasn't
> all that much overlap anyway.

Ah thanks for the insight, hopefully not too much friction on that.

>
> But it's good to hear that it'd be useful to you.

Yeah, I (for one) at least struggle with a workable nommu setup, I had a
buildroot for m68k-nommu but it broke in various unpleasant ways, so having
something really straightforward would be hugely helpful.

>
> johannes

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 10/13] x86/um: nommu: signal handling
  2025-01-20  6:00 ` [PATCH v7 10/13] x86/um: nommu: signal handling Hajime Tazaki
@ 2025-06-19 11:43   ` Benjamin Berg
  2025-06-19 12:55     ` Hajime Tazaki
  0 siblings, 1 reply; 22+ messages in thread
From: Benjamin Berg @ 2025-06-19 11:43 UTC (permalink / raw)
  To: Hajime Tazaki, linux-um; +Cc: ricarkol, Liam.Howlett

Hi,

I am not sure that fxsaveq/fxrstorq is guaranteed to be sufficient. I
don't think that catches some of the newer AVX registers for example.

At the same time, dealing with all that properly could be rather
annoying. I don't really have a solution for that, normal UM does not
need to deal with many of those details …

Benjamin 

On Mon, 2025-01-20 at 15:00 +0900, Hajime Tazaki wrote:
> This commit updates the behavior of signal handling under !MMU
> environment. It adds the alignment code for signal frame as the frame
> is used in userspace as-is.
> 
> floating point register is carefully handling upon entry/leave of
> syscall routine so that signal handlers can read/write the contents of
> the register.
> 
> It also adds the follow up routine for SIGSEGV as a signal delivery runs
> in the same stack frame while we have to avoid endless SIGSEGV.
> 
> Signed-off-by: Hajime Tazaki <thehajime@gmail.com>
> ---
>  arch/um/include/shared/kern_util.h    |   4 +
>  arch/um/nommu/Makefile                |   2 +-
>  arch/um/nommu/os-Linux/signal.c       |  13 ++
>  arch/um/nommu/trap.c                  | 188 ++++++++++++++++++++++++++
>  arch/um/os-Linux/signal.c             |  25 +++-
>  arch/x86/um/nommu/do_syscall_64.c     |   6 +
>  arch/x86/um/nommu/os-Linux/mcontext.c |  11 ++
>  arch/x86/um/shared/sysdep/mcontext.h  |   1 +
>  arch/x86/um/shared/sysdep/ptrace.h    |   2 +-
>  9 files changed, 243 insertions(+), 9 deletions(-)
>  create mode 100644 arch/um/nommu/trap.c
> 
> diff --git a/arch/um/include/shared/kern_util.h b/arch/um/include/shared/kern_util.h
> index 1dd3d02c4e39..3569ca1603a8 100644
> --- a/arch/um/include/shared/kern_util.h
> +++ b/arch/um/include/shared/kern_util.h
> @@ -71,8 +71,12 @@ void kasan_map_memory(void *start, size_t len);
>  static inline void arch_sigsys_handler(int sig, struct siginfo *si, void *mc)
>  {
>  }
> +static inline void arch_sigsegv_handler(int sig, struct siginfo *unused_si, void *mc)
> +{
> +}
>  #else
>  extern void arch_sigsys_handler(int sig, struct siginfo *si, void *mc);
> +extern void arch_sigsegv_handler(int sig, struct siginfo *si, void *mc);
>  #endif
>  
>  #endif
> diff --git a/arch/um/nommu/Makefile b/arch/um/nommu/Makefile
> index baab7c2f57c2..096221590cfd 100644
> --- a/arch/um/nommu/Makefile
> +++ b/arch/um/nommu/Makefile
> @@ -1,3 +1,3 @@
>  # SPDX-License-Identifier: GPL-2.0
>  
> -obj-y := os-Linux/
> +obj-y := trap.o os-Linux/
> diff --git a/arch/um/nommu/os-Linux/signal.c b/arch/um/nommu/os-Linux/signal.c
> index 1674962e509d..b9e3070f2c36 100644
> --- a/arch/um/nommu/os-Linux/signal.c
> +++ b/arch/um/nommu/os-Linux/signal.c
> @@ -5,6 +5,7 @@
>  #include <os.h>
>  #include <sysdep/mcontext.h>
>  #include <sys/ucontext.h>
> +#include <as-layout.h>
>  
>  void arch_sigsys_handler(int sig, struct siginfo *si, void *ptr)
>  {
> @@ -13,3 +14,15 @@ void arch_sigsys_handler(int sig, struct siginfo *si, void *ptr)
>  	/* hook syscall via SIGSYS */
>  	set_mc_sigsys_hook(mc);
>  }
> +
> +void arch_sigsegv_handler(int sig, struct siginfo *si, void *ptr)
> +{
> +	mcontext_t *mc = (mcontext_t *) ptr;
> +
> +	/* !MMU specific part; detection of userspace */
> +	if (mc->gregs[REG_RIP] > uml_reserved &&
> +	    mc->gregs[REG_RIP] < high_physmem) {
> +		/* !MMU: force handle signals after rt_sigreturn() */
> +		set_mc_userspace_relay_signal(mc);
> +	}
> +}
> diff --git a/arch/um/nommu/trap.c b/arch/um/nommu/trap.c
> new file mode 100644
> index 000000000000..cda71ecb1115
> --- /dev/null
> +++ b/arch/um/nommu/trap.c
> @@ -0,0 +1,188 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/mm.h>
> +#include <linux/sched/signal.h>
> +#include <linux/hardirq.h>
> +#include <linux/module.h>
> +#include <linux/uaccess.h>
> +#include <linux/sched/debug.h>
> +#include <asm/current.h>
> +#include <asm/tlbflush.h>
> +#include <arch.h>
> +#include <as-layout.h>
> +#include <kern_util.h>
> +#include <os.h>
> +#include <skas.h>
> +
> +/*
> + * Note this is constrained to return 0, -EFAULT, -EACCES, -ENOMEM by
> + * segv().
> + */
> +int handle_page_fault(unsigned long address, unsigned long ip,
> +		      int is_write, int is_user, int *code_out)
> +{
> +	/* !MMU has no pagefault */
> +	return -EFAULT;
> +}
> +
> +static void show_segv_info(struct uml_pt_regs *regs)
> +{
> +	struct task_struct *tsk = current;
> +	struct faultinfo *fi = UPT_FAULTINFO(regs);
> +
> +	if (!unhandled_signal(tsk, SIGSEGV))
> +		return;
> +
> +	pr_warn_ratelimited("%s%s[%d]: segfault at %lx ip %p sp %p error %x",
> +			    task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
> +			    tsk->comm, task_pid_nr(tsk), FAULT_ADDRESS(*fi),
> +			    (void *)UPT_IP(regs), (void *)UPT_SP(regs),
> +			    fi->error_code);
> +}
> +
> +static void bad_segv(struct faultinfo fi, unsigned long ip)
> +{
> +	current->thread.arch.faultinfo = fi;
> +	force_sig_fault(SIGSEGV, SEGV_ACCERR, (void __user *) FAULT_ADDRESS(fi));
> +}
> +
> +void fatal_sigsegv(void)
> +{
> +	force_fatal_sig(SIGSEGV);
> +	do_signal(&current->thread.regs);
> +	/*
> +	 * This is to tell gcc that we're not returning - do_signal
> +	 * can, in general, return, but in this case, it's not, since
> +	 * we just got a fatal SIGSEGV queued.
> +	 */
> +	os_dump_core();
> +}
> +
> +/**
> + * segv_handler() - the SIGSEGV handler
> + * @sig:	the signal number
> + * @unused_si:	the signal info struct; unused in this handler
> + * @regs:	the ptrace register information
> + *
> + * The handler first extracts the faultinfo from the UML ptrace regs struct.
> + * If the userfault did not happen in an UML userspace process, bad_segv is called.
> + * Otherwise the signal did happen in a cloned userspace process, handle it.
> + */
> +void segv_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
> +{
> +	struct faultinfo *fi = UPT_FAULTINFO(regs);
> +
> +	/* !MMU specific part; detection of userspace */
> +	/* mark is_user=1 when the IP is from userspace code. */
> +	if (UPT_IP(regs) > uml_reserved && UPT_IP(regs) < high_physmem)
> +		regs->is_user = 1;
> +
> +	if (UPT_IS_USER(regs) && !SEGV_IS_FIXABLE(fi)) {
> +		show_segv_info(regs);
> +		bad_segv(*fi, UPT_IP(regs));
> +		return;
> +	}
> +	segv(*fi, UPT_IP(regs), UPT_IS_USER(regs), regs);
> +}
> +
> +/*
> + * We give a *copy* of the faultinfo in the regs to segv.
> + * This must be done, since nesting SEGVs could overwrite
> + * the info in the regs. A pointer to the info then would
> + * give us bad data!
> + */
> +unsigned long segv(struct faultinfo fi, unsigned long ip, int is_user,
> +		   struct uml_pt_regs *regs)
> +{
> +	int si_code;
> +	int err;
> +	int is_write = FAULT_WRITE(fi);
> +	unsigned long address = FAULT_ADDRESS(fi);
> +
> +	if (!is_user && regs)
> +		current->thread.segv_regs = container_of(regs, struct pt_regs, regs);
> +
> +	if (current->mm == NULL) {
> +		show_regs(container_of(regs, struct pt_regs, regs));
> +		panic("Segfault with no mm");
> +	} else if (!is_user && address > PAGE_SIZE && address < TASK_SIZE) {
> +		show_regs(container_of(regs, struct pt_regs, regs));
> +		panic("Kernel tried to access user memory at addr 0x%lx, ip 0x%lx",
> +		       address, ip);
> +	}
> +
> +	if (SEGV_IS_FIXABLE(&fi))
> +		err = handle_page_fault(address, ip, is_write, is_user,
> +					&si_code);
> +	else {
> +		err = -EFAULT;
> +		/*
> +		 * A thread accessed NULL, we get a fault, but CR2 is invalid.
> +		 * This code is used in __do_copy_from_user() of TT mode.
> +		 * XXX tt mode is gone, so maybe this isn't needed any more
> +		 */
> +		address = 0;
> +	}
> +
> +	if (!err)
> +		goto out;
> +	else if (!is_user && arch_fixup(ip, regs))
> +		goto out;
> +
> +	if (!is_user) {
> +		show_regs(container_of(regs, struct pt_regs, regs));
> +		panic("Kernel mode fault at addr 0x%lx, ip 0x%lx",
> +		      address, ip);
> +	}
> +
> +	show_segv_info(regs);
> +
> +	if (err == -EACCES) {
> +		current->thread.arch.faultinfo = fi;
> +		force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)address);
> +	} else {
> +		WARN_ON_ONCE(err != -EFAULT);
> +		current->thread.arch.faultinfo = fi;
> +		force_sig_fault(SIGSEGV, si_code, (void __user *) address);
> +	}
> +
> +out:
> +	if (regs)
> +		current->thread.segv_regs = NULL;
> +
> +	return 0;
> +}
> +
> +void relay_signal(int sig, struct siginfo *si, struct uml_pt_regs *regs)
> +{
> +	int code, err;
> +
> +	if (!UPT_IS_USER(regs)) {
> +		if (sig == SIGBUS)
> +			pr_err("Bus error - the host /dev/shm or /tmp mount likely just ran out of space\n");
> +		panic("Kernel mode signal %d", sig);
> +	}
> +
> +	arch_examine_signal(sig, regs);
> +
> +	/* Is the signal layout for the signal known?
> +	 * Signal data must be scrubbed to prevent information leaks.
> +	 */
> +	code = si->si_code;
> +	err = si->si_errno;
> +	if ((err == 0) && (siginfo_layout(sig, code) == SIL_FAULT)) {
> +		struct faultinfo *fi = UPT_FAULTINFO(regs);
> +
> +		current->thread.arch.faultinfo = *fi;
> +		force_sig_fault(sig, code, (void __user *)FAULT_ADDRESS(*fi));
> +	} else {
> +		pr_err("Attempted to relay unknown signal %d (si_code = %d) with errno %d\n",
> +		       sig, code, err);
> +		force_sig(sig);
> +	}
> +}
> +
> +void winch(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
> +{
> +	do_IRQ(WINCH_IRQ, regs);
> +}
> diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c
> index 97efdabe1e45..3997d827bf4c 100644
> --- a/arch/um/os-Linux/signal.c
> +++ b/arch/um/os-Linux/signal.c
> @@ -37,12 +37,6 @@ static void sig_handler_common(int sig, struct siginfo *si, mcontext_t *mc)
>  	int save_errno = errno;
>  
>  	r.is_user = 0;
> -	if (sig == SIGSEGV) {
> -		/* For segfaults, we want the data from the sigcontext. */
> -		get_regs_from_mc(&r, mc);
> -		GET_FAULTINFO_FROM_MC(r.faultinfo, mc);
> -	}
> -
>  	/* enable signals if sig isn't IRQ signal */
>  	if ((sig != SIGIO) && (sig != SIGWINCH))
>  		unblock_signals_trace();
> @@ -177,8 +171,25 @@ static void sigsys_handler(int sig, struct siginfo *unused_si, mcontext_t *mc)
>  	arch_sigsys_handler(sig, unused_si, mc);
>  }
>  
> +static void sigsegv_handler(int sig, struct siginfo *unused_si, mcontext_t *mc)
> +{
> +	struct uml_pt_regs r;
> +	int save_errno = errno;
> +
> +	r.is_user = 0;
> +
> +	/* For segfaults, we want the data from the sigcontext. */
> +	get_regs_from_mc(&r, mc);
> +	GET_FAULTINFO_FROM_MC(r.faultinfo, mc);
> +
> +	segv_handler(sig, unused_si, &r);
> +	arch_sigsegv_handler(sig, unused_si, mc);
> +
> +	errno = save_errno;
> +}
> +
>  static void (*handlers[_NSIG])(int sig, struct siginfo *si, mcontext_t *mc) = {
> -	[SIGSEGV] = sig_handler,
> +	[SIGSEGV] = sigsegv_handler,
>  	[SIGBUS] = sig_handler,
>  	[SIGILL] = sig_handler,
>  	[SIGFPE] = sig_handler,
> diff --git a/arch/x86/um/nommu/do_syscall_64.c b/arch/x86/um/nommu/do_syscall_64.c
> index 796beb0089fc..48b3d29e2db1 100644
> --- a/arch/x86/um/nommu/do_syscall_64.c
> +++ b/arch/x86/um/nommu/do_syscall_64.c
> @@ -48,6 +48,9 @@ __visible void do_syscall_64(struct pt_regs *regs)
>  	/* set fs register to the original host one */
>  	os_x86_arch_prctl(0, ARCH_SET_FS, (void *)host_fs);
>  
> +	/* save fp registers */
> +	asm volatile("fxsaveq %0" : "=m"(*(struct _xstate *)regs->regs.fp));
> +
>  	if (likely(syscall < NR_syscalls)) {
>  		PT_REGS_SET_SYSCALL_RETURN(regs,
>  				EXECUTE_SYSCALL(syscall, regs));
> @@ -66,6 +69,9 @@ __visible void do_syscall_64(struct pt_regs *regs)
>  	set_thread_flag(TIF_SIGPENDING);
>  	interrupt_end();
>  
> +	/* restore fp registers */
> +	asm volatile("fxrstorq %0" : : "m"((current->thread.regs.regs.fp)));
> +
>  	/* restore back fs register to userspace configured one */
>  	os_x86_arch_prctl(0, ARCH_SET_FS,
>  		      (void *)(current->thread.regs.regs.gp[FS_BASE
> diff --git a/arch/x86/um/nommu/os-Linux/mcontext.c b/arch/x86/um/nommu/os-Linux/mcontext.c
> index c4ef877d5ea0..955e7d9f4765 100644
> --- a/arch/x86/um/nommu/os-Linux/mcontext.c
> +++ b/arch/x86/um/nommu/os-Linux/mcontext.c
> @@ -6,6 +6,17 @@
>  #include <sysdep/mcontext.h>
>  #include <sysdep/syscalls.h>
>  
> +static void __userspace_relay_signal(void)
> +{
> +	/* XXX: dummy syscall */
> +	__asm__ volatile("call *%0" : : "r"(__kernel_vsyscall), "a"(39) :);
> +}
> +
> +void set_mc_userspace_relay_signal(mcontext_t *mc)
> +{
> +	mc->gregs[REG_RIP] = (unsigned long) __userspace_relay_signal;
> +}
> +
>  void set_mc_sigsys_hook(mcontext_t *mc)
>  {
>  	mc->gregs[REG_RCX] = mc->gregs[REG_RIP];
> diff --git a/arch/x86/um/shared/sysdep/mcontext.h b/arch/x86/um/shared/sysdep/mcontext.h
> index 15e9ff287021..0efe5ddf95af 100644
> --- a/arch/x86/um/shared/sysdep/mcontext.h
> +++ b/arch/x86/um/shared/sysdep/mcontext.h
> @@ -9,6 +9,7 @@
>  extern void get_regs_from_mc(struct uml_pt_regs *, mcontext_t *);
>  #ifndef CONFIG_MMU
>  extern void set_mc_sigsys_hook(mcontext_t *mc);
> +extern void set_mc_userspace_relay_signal(mcontext_t *mc);
>  #endif
>  
>  #ifdef __i386__
> diff --git a/arch/x86/um/shared/sysdep/ptrace.h b/arch/x86/um/shared/sysdep/ptrace.h
> index 8f7476ff6e95..7d553d9f05be 100644
> --- a/arch/x86/um/shared/sysdep/ptrace.h
> +++ b/arch/x86/um/shared/sysdep/ptrace.h
> @@ -65,7 +65,7 @@ struct uml_pt_regs {
>  	int is_user;
>  
>  	/* Dynamically sized FP registers (holds an XSTATE) */
> -	unsigned long fp[];
> +	unsigned long fp[] __attribute__((aligned(16)));
>  };
>  
>  #define EMPTY_UML_PT_REGS { }



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v7 10/13] x86/um: nommu: signal handling
  2025-06-19 11:43   ` Benjamin Berg
@ 2025-06-19 12:55     ` Hajime Tazaki
  0 siblings, 0 replies; 22+ messages in thread
From: Hajime Tazaki @ 2025-06-19 12:55 UTC (permalink / raw)
  To: benjamin; +Cc: linux-um, ricarkol, Liam.Howlett


Thanks,
# this is a thread of my older series, v7 (current one is v9), tho
your comment is still valid so, thanks !

On Thu, 19 Jun 2025 20:43:06 +0900,
Benjamin Berg wrote:
> 
> Hi,
> 
> I am not sure that fxsaveq/fxrstorq is guaranteed to be sufficient. I
> don't think that catches some of the newer AVX registers for example.

I've added fxsaveq/fxrstorq parts with your suggestion,
test-signal-restore.c.  I think I may not know all of userspace
situations with register usage so, my current approach is to address
each issue when we faced real issue with a concrete test case.

> At the same time, dealing with all that properly could be rather
> annoying. I don't really have a solution for that, normal UM does not
> need to deal with many of those details …

thanks for the input.
I'm also/always trying to study a lot from existing code base.

-- Hajime



^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2025-06-19 15:48 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-20  6:00 [PATCH v7 00/13] nommu UML Hajime Tazaki
2025-01-20  6:00 ` [PATCH v7 01/13] x86/um: clean up elf specific definitions Hajime Tazaki
2025-01-20  6:00 ` [PATCH v7 02/13] x86/um: nommu: elf loader for fdpic Hajime Tazaki
2025-01-20  6:00 ` [PATCH v7 03/13] um: decouple MMU specific code from the common part Hajime Tazaki
2025-01-20  6:00 ` [PATCH v7 04/13] um: nommu: memory handling Hajime Tazaki
2025-01-20  6:00 ` [PATCH v7 05/13] x86/um: nommu: syscall handling Hajime Tazaki
2025-01-20  6:00 ` [PATCH v7 06/13] um: nommu: seccomp syscalls hook Hajime Tazaki
2025-01-20  6:00 ` [PATCH v7 07/13] x86/um: nommu: process/thread handling Hajime Tazaki
2025-01-20  6:00 ` [PATCH v7 08/13] um: nommu: configure fs register on host syscall invocation Hajime Tazaki
2025-01-20  6:00 ` [PATCH v7 09/13] x86/um/vdso: nommu: vdso memory update Hajime Tazaki
2025-01-20  6:00 ` [PATCH v7 10/13] x86/um: nommu: signal handling Hajime Tazaki
2025-06-19 11:43   ` Benjamin Berg
2025-06-19 12:55     ` Hajime Tazaki
2025-01-20  6:00 ` [PATCH v7 11/13] um: change machine name for uname output Hajime Tazaki
2025-01-20  6:00 ` [PATCH v7 12/13] um: nommu: add documentation of nommu UML Hajime Tazaki
2025-01-20  6:00 ` [PATCH v7 13/13] um: nommu: plug nommu code into build system Hajime Tazaki
2025-02-04 12:26 ` [PATCH v7 00/13] nommu UML Hajime Tazaki
2025-04-25 13:49 ` Lorenzo Stoakes
2025-04-27  3:49   ` Hajime Tazaki
2025-04-29 13:23     ` Lorenzo Stoakes
2025-04-30  9:51       ` Johannes Berg
2025-05-01 10:40         ` Lorenzo Stoakes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).