* [PATCH 0/2] Remove incorrect host mincore call and add rodata handling
@ 2025-02-10 16:09 Benjamin Berg
2025-02-10 16:09 ` [PATCH 1/2] um: mark rodata read-only and implement _nofault accesses Benjamin Berg
2025-02-10 16:09 ` [PATCH 2/2] um: remove copy_from_kernel_nofault_allowed Benjamin Berg
0 siblings, 2 replies; 7+ messages in thread
From: Benjamin Berg @ 2025-02-10 16:09 UTC (permalink / raw)
To: linux-um; +Cc: Johannes Berg, Benjamin Berg
From: Benjamin Berg <benjamin.berg@intel.com>
Hi,
using mincore() to check whether a page is owned by UML is not correct
as it returns whether the page is resident in memory and not whether
something has been mapped at the address. This means that UML could get
spurious failures in *_nofault functions like copy_from_kernel_nofault).
If this happens, it can create hard to debug issues. For us the problem
showed up because hostfs was randomly throwing ENOENT errors when
loading libraries (including ld.so). This can be traced back to
dentry_path_raw failing as it uses an optimization that requires the
use of copy_from_kernel_nofault internally (see the documentation of
prepend_name and prepend_copy).
Pull in the existing RODATA patch from Johannes (with light editing) and
then remove the call to mincore() as it is incorrect.
Benjamin
Benjamin Berg (1):
um: remove copy_from_kernel_nofault_allowed
Johannes Berg (1):
um: mark rodata read-only and implement _nofault accesses
arch/um/Kconfig | 1 +
arch/um/include/asm/processor-generic.h | 2 +
arch/um/include/asm/uaccess.h | 20 +++++++---
arch/um/include/shared/arch.h | 2 +
arch/um/include/shared/as-layout.h | 2 +-
arch/um/include/shared/irq_user.h | 3 +-
arch/um/include/shared/kern_util.h | 12 ++++--
arch/um/include/shared/os.h | 1 -
arch/um/kernel/Makefile | 2 +-
arch/um/kernel/irq.c | 3 +-
arch/um/kernel/maccess.c | 19 ---------
arch/um/kernel/mem.c | 10 +++++
arch/um/kernel/trap.c | 28 ++++++++++---
arch/um/os-Linux/process.c | 51 ------------------------
arch/um/os-Linux/signal.c | 4 +-
arch/um/os-Linux/skas/process.c | 8 ++--
arch/x86/um/os-Linux/mcontext.c | 12 ++++++
arch/x86/um/shared/sysdep/faultinfo_32.h | 12 ++++++
arch/x86/um/shared/sysdep/faultinfo_64.h | 12 ++++++
19 files changed, 109 insertions(+), 95 deletions(-)
delete mode 100644 arch/um/kernel/maccess.c
--
2.48.1
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] um: mark rodata read-only and implement _nofault accesses
2025-02-10 16:09 [PATCH 0/2] Remove incorrect host mincore call and add rodata handling Benjamin Berg
@ 2025-02-10 16:09 ` Benjamin Berg
2025-04-02 22:12 ` Nathan Chancellor
2025-02-10 16:09 ` [PATCH 2/2] um: remove copy_from_kernel_nofault_allowed Benjamin Berg
1 sibling, 1 reply; 7+ messages in thread
From: Benjamin Berg @ 2025-02-10 16:09 UTC (permalink / raw)
To: linux-um; +Cc: Johannes Berg, Benjamin Berg
From: Johannes Berg <johannes.berg@intel.com>
Mark read-only data actually read-only (simple mprotect), and
to be able to test it also implement _nofault accesses. This
works by setting up a new "segv_continue" pointer in current,
and then when we hit a segfault we change the signal return
context so that we continue at that address. The code using
this sets it up so that it jumps to a label and then aborts
the access that way, returning -EFAULT.
It's possible to optimize the ___backtrack_faulted() thing by
using asm goto (compiler version dependent) and/or gcc's (not
sure if clang has it) &&label extension, but at least in one
attempt I made the && caused the compiler to not load -EFAULT
into the register in case of jumping to the &&label from the
fault handler. So leave it like this for now.
Co-developed-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
---
arch/um/Kconfig | 1 +
arch/um/include/asm/processor-generic.h | 2 ++
arch/um/include/asm/uaccess.h | 20 ++++++++++++-----
arch/um/include/shared/arch.h | 2 ++
arch/um/include/shared/as-layout.h | 2 +-
arch/um/include/shared/irq_user.h | 3 ++-
arch/um/include/shared/kern_util.h | 12 ++++++----
arch/um/kernel/irq.c | 3 ++-
arch/um/kernel/mem.c | 10 +++++++++
arch/um/kernel/trap.c | 28 +++++++++++++++++++-----
arch/um/os-Linux/signal.c | 4 ++--
arch/um/os-Linux/skas/process.c | 8 +++----
arch/x86/um/os-Linux/mcontext.c | 12 ++++++++++
arch/x86/um/shared/sysdep/faultinfo_32.h | 12 ++++++++++
arch/x86/um/shared/sysdep/faultinfo_64.h | 12 ++++++++++
15 files changed, 108 insertions(+), 23 deletions(-)
diff --git a/arch/um/Kconfig b/arch/um/Kconfig
index 18051b1cfce0..79509c7f39de 100644
--- a/arch/um/Kconfig
+++ b/arch/um/Kconfig
@@ -12,6 +12,7 @@ config UML
select ARCH_HAS_KCOV
select ARCH_HAS_STRNCPY_FROM_USER
select ARCH_HAS_STRNLEN_USER
+ select ARCH_HAS_STRICT_KERNEL_RWX
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_KASAN if X86_64
select HAVE_ARCH_KASAN_VMALLOC if HAVE_ARCH_KASAN
diff --git a/arch/um/include/asm/processor-generic.h b/arch/um/include/asm/processor-generic.h
index 5d6356eafffe..8a789c17acd8 100644
--- a/arch/um/include/asm/processor-generic.h
+++ b/arch/um/include/asm/processor-generic.h
@@ -31,6 +31,8 @@ struct thread_struct {
} thread;
} request;
+ void *segv_continue;
+
/* Contains variable sized FP registers */
struct pt_regs regs;
};
diff --git a/arch/um/include/asm/uaccess.h b/arch/um/include/asm/uaccess.h
index 1d4b6bbc1b65..3a08f9029a3f 100644
--- a/arch/um/include/asm/uaccess.h
+++ b/arch/um/include/asm/uaccess.h
@@ -9,6 +9,7 @@
#include <asm/elf.h>
#include <linux/unaligned.h>
+#include <sysdep/faultinfo.h>
#define __under_task_size(addr, size) \
(((unsigned long) (addr) < TASK_SIZE) && \
@@ -44,19 +45,28 @@ static inline int __access_ok(const void __user *ptr, unsigned long size)
__access_ok_vsyscall(addr, size));
}
-/* no pagefaults for kernel addresses in um */
#define __get_kernel_nofault(dst, src, type, err_label) \
do { \
- *((type *)dst) = get_unaligned((type *)(src)); \
- if (0) /* make sure the label looks used to the compiler */ \
+ int __faulted; \
+ \
+ ___backtrack_faulted(__faulted); \
+ if (__faulted) { \
+ *((type *)dst) = (type) 0; \
goto err_label; \
+ } \
+ *((type *)dst) = get_unaligned((type *)(src)); \
+ current->thread.segv_continue = NULL; \
} while (0)
#define __put_kernel_nofault(dst, src, type, err_label) \
do { \
- put_unaligned(*((type *)src), (type *)(dst)); \
- if (0) /* make sure the label looks used to the compiler */ \
+ int __faulted; \
+ \
+ ___backtrack_faulted(__faulted); \
+ if (__faulted) \
goto err_label; \
+ put_unaligned(*((type *)src), (type *)(dst)); \
+ current->thread.segv_continue = NULL; \
} while (0)
#endif
diff --git a/arch/um/include/shared/arch.h b/arch/um/include/shared/arch.h
index 880ee42a3329..cc398a21ad96 100644
--- a/arch/um/include/shared/arch.h
+++ b/arch/um/include/shared/arch.h
@@ -12,4 +12,6 @@ extern void arch_check_bugs(void);
extern int arch_fixup(unsigned long address, struct uml_pt_regs *regs);
extern void arch_examine_signal(int sig, struct uml_pt_regs *regs);
+void mc_set_rip(void *_mc, void *target);
+
#endif
diff --git a/arch/um/include/shared/as-layout.h b/arch/um/include/shared/as-layout.h
index ea65f151bf48..4f44dcce8a7c 100644
--- a/arch/um/include/shared/as-layout.h
+++ b/arch/um/include/shared/as-layout.h
@@ -50,7 +50,7 @@ extern int linux_main(int argc, char **argv, char **envp);
extern void uml_finishsetup(void);
struct siginfo;
-extern void (*sig_info[])(int, struct siginfo *si, struct uml_pt_regs *);
+extern void (*sig_info[])(int, struct siginfo *si, struct uml_pt_regs *, void *);
#endif
diff --git a/arch/um/include/shared/irq_user.h b/arch/um/include/shared/irq_user.h
index da0f6eea30d0..88835b52ae2b 100644
--- a/arch/um/include/shared/irq_user.h
+++ b/arch/um/include/shared/irq_user.h
@@ -15,7 +15,8 @@ enum um_irq_type {
};
struct siginfo;
-extern void sigio_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs);
+extern void sigio_handler(int sig, struct siginfo *unused_si,
+ struct uml_pt_regs *regs, void *mc);
void sigio_run_timetravel_handlers(void);
extern void free_irq_by_fd(int fd);
extern void deactivate_fd(int fd, int irqnum);
diff --git a/arch/um/include/shared/kern_util.h b/arch/um/include/shared/kern_util.h
index f21dc8517538..00ca3e12fd9a 100644
--- a/arch/um/include/shared/kern_util.h
+++ b/arch/um/include/shared/kern_util.h
@@ -24,10 +24,12 @@ extern void free_stack(unsigned long stack, int order);
struct pt_regs;
extern void do_signal(struct pt_regs *regs);
extern void interrupt_end(void);
-extern void relay_signal(int sig, struct siginfo *si, struct uml_pt_regs *regs);
+extern void relay_signal(int sig, struct siginfo *si, struct uml_pt_regs *regs,
+ void *mc);
extern unsigned long segv(struct faultinfo fi, unsigned long ip,
- int is_user, struct uml_pt_regs *regs);
+ int is_user, struct uml_pt_regs *regs,
+ void *mc);
extern int handle_page_fault(unsigned long address, unsigned long ip,
int is_write, int is_user, int *code_out);
@@ -59,8 +61,10 @@ extern unsigned long from_irq_stack(int nested);
extern int singlestepping(void);
-extern void segv_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs);
-extern void winch(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs);
+extern void segv_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs,
+ void *mc);
+extern void winch(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs,
+ void *mc);
extern void fatal_sigsegv(void) __attribute__ ((noreturn));
void um_idle_sleep(void);
diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
index 338450741aac..185664215814 100644
--- a/arch/um/kernel/irq.c
+++ b/arch/um/kernel/irq.c
@@ -236,7 +236,8 @@ static void _sigio_handler(struct uml_pt_regs *regs,
free_irqs();
}
-void sigio_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
+void sigio_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs,
+ void *mc)
{
preempt_disable();
_sigio_handler(regs, irqs_suspended);
diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index befed230aac2..05ffceb555b4 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -9,6 +9,8 @@
#include <linux/mm.h>
#include <linux/swap.h>
#include <linux/slab.h>
+#include <linux/init.h>
+#include <asm/sections.h>
#include <asm/page.h>
#include <asm/pgalloc.h>
#include <as-layout.h>
@@ -241,3 +243,11 @@ static const pgprot_t protection_map[16] = {
[VM_SHARED | VM_EXEC | VM_WRITE | VM_READ] = PAGE_SHARED
};
DECLARE_VM_GET_PAGE_PROT
+
+void mark_rodata_ro(void)
+{
+ unsigned long rodata_start = PFN_ALIGN(__start_rodata);
+ unsigned long rodata_end = PFN_ALIGN(__end_rodata);
+
+ os_protect_memory((void *)rodata_start, rodata_end - rodata_start, 1, 0, 0);
+}
diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
index cdaee3e94273..ce073150dc20 100644
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -16,6 +16,7 @@
#include <kern_util.h>
#include <os.h>
#include <skas.h>
+#include <arch.h>
/*
* Note this is constrained to return 0, -EFAULT, -EACCES, -ENOMEM by
@@ -175,12 +176,14 @@ void fatal_sigsegv(void)
* @sig: the signal number
* @unused_si: the signal info struct; unused in this handler
* @regs: the ptrace register information
+ * @mc: the mcontext of the signal
*
* The handler first extracts the faultinfo from the UML ptrace regs struct.
* If the userfault did not happen in an UML userspace process, bad_segv is called.
* Otherwise the signal did happen in a cloned userspace process, handle it.
*/
-void segv_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
+void segv_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs,
+ void *mc)
{
struct faultinfo * fi = UPT_FAULTINFO(regs);
@@ -189,7 +192,7 @@ void segv_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
bad_segv(*fi, UPT_IP(regs));
return;
}
- segv(*fi, UPT_IP(regs), UPT_IS_USER(regs), regs);
+ segv(*fi, UPT_IP(regs), UPT_IS_USER(regs), regs, mc);
}
/*
@@ -199,7 +202,7 @@ void segv_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
* give us bad data!
*/
unsigned long segv(struct faultinfo fi, unsigned long ip, int is_user,
- struct uml_pt_regs *regs)
+ struct uml_pt_regs *regs, void *mc)
{
int si_code;
int err;
@@ -223,6 +226,19 @@ unsigned long segv(struct faultinfo fi, unsigned long ip, int is_user,
goto out;
}
else if (current->mm == NULL) {
+ if (current->pagefault_disabled) {
+ if (!mc) {
+ show_regs(container_of(regs, struct pt_regs, regs));
+ panic("Segfault with pagefaults disabled but no mcontext");
+ }
+ if (!current->thread.segv_continue) {
+ show_regs(container_of(regs, struct pt_regs, regs));
+ panic("Segfault without recovery target");
+ }
+ mc_set_rip(mc, current->thread.segv_continue);
+ current->thread.segv_continue = NULL;
+ goto out;
+ }
show_regs(container_of(regs, struct pt_regs, regs));
panic("Segfault with no mm");
}
@@ -274,7 +290,8 @@ unsigned long segv(struct faultinfo fi, unsigned long ip, int is_user,
return 0;
}
-void relay_signal(int sig, struct siginfo *si, struct uml_pt_regs *regs)
+void relay_signal(int sig, struct siginfo *si, struct uml_pt_regs *regs,
+ void *mc)
{
int code, err;
if (!UPT_IS_USER(regs)) {
@@ -302,7 +319,8 @@ void relay_signal(int sig, struct siginfo *si, struct uml_pt_regs *regs)
}
}
-void winch(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
+void winch(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs,
+ void *mc)
{
do_IRQ(WINCH_IRQ, regs);
}
diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c
index 9ea7269ffb77..e71e5b4878d1 100644
--- a/arch/um/os-Linux/signal.c
+++ b/arch/um/os-Linux/signal.c
@@ -21,7 +21,7 @@
#include <sys/ucontext.h>
#include <timetravel.h>
-void (*sig_info[NSIG])(int, struct siginfo *, struct uml_pt_regs *) = {
+void (*sig_info[NSIG])(int, struct siginfo *, struct uml_pt_regs *, void *mc) = {
[SIGTRAP] = relay_signal,
[SIGFPE] = relay_signal,
[SIGILL] = relay_signal,
@@ -47,7 +47,7 @@ static void sig_handler_common(int sig, struct siginfo *si, mcontext_t *mc)
if ((sig != SIGIO) && (sig != SIGWINCH))
unblock_signals_trace();
- (*sig_info[sig])(sig, si, &r);
+ (*sig_info[sig])(sig, si, &r, mc);
errno = save_errno;
}
diff --git a/arch/um/os-Linux/skas/process.c b/arch/um/os-Linux/skas/process.c
index f683cfc9e51a..2bd3d796bc55 100644
--- a/arch/um/os-Linux/skas/process.c
+++ b/arch/um/os-Linux/skas/process.c
@@ -166,7 +166,7 @@ static void get_skas_faultinfo(int pid, struct faultinfo *fi)
static void handle_segv(int pid, struct uml_pt_regs *regs)
{
get_skas_faultinfo(pid, ®s->faultinfo);
- segv(regs->faultinfo, 0, 1, NULL);
+ segv(regs->faultinfo, 0, 1, NULL, NULL);
}
static void handle_trap(int pid, struct uml_pt_regs *regs)
@@ -515,7 +515,7 @@ void userspace(struct uml_pt_regs *regs)
get_skas_faultinfo(pid,
®s->faultinfo);
(*sig_info[SIGSEGV])(SIGSEGV, (struct siginfo *)&si,
- regs);
+ regs, NULL);
}
else handle_segv(pid, regs);
break;
@@ -523,7 +523,7 @@ void userspace(struct uml_pt_regs *regs)
handle_trap(pid, regs);
break;
case SIGTRAP:
- relay_signal(SIGTRAP, (struct siginfo *)&si, regs);
+ relay_signal(SIGTRAP, (struct siginfo *)&si, regs, NULL);
break;
case SIGALRM:
break;
@@ -533,7 +533,7 @@ void userspace(struct uml_pt_regs *regs)
case SIGFPE:
case SIGWINCH:
block_signals_trace();
- (*sig_info[sig])(sig, (struct siginfo *)&si, regs);
+ (*sig_info[sig])(sig, (struct siginfo *)&si, regs, NULL);
unblock_signals_trace();
break;
default:
diff --git a/arch/x86/um/os-Linux/mcontext.c b/arch/x86/um/os-Linux/mcontext.c
index e80ab7d28117..d2f3a595b4ef 100644
--- a/arch/x86/um/os-Linux/mcontext.c
+++ b/arch/x86/um/os-Linux/mcontext.c
@@ -4,6 +4,7 @@
#include <asm/ptrace.h>
#include <sysdep/ptrace.h>
#include <sysdep/mcontext.h>
+#include <arch.h>
void get_regs_from_mc(struct uml_pt_regs *regs, mcontext_t *mc)
{
@@ -31,3 +32,14 @@ void get_regs_from_mc(struct uml_pt_regs *regs, mcontext_t *mc)
regs->gp[CS / sizeof(unsigned long)] |= 3;
#endif
}
+
+void mc_set_rip(void *_mc, void *target)
+{
+ mcontext_t *mc = _mc;
+
+#ifdef __i386__
+ mc->gregs[REG_EIP] = (unsigned long)target;
+#else
+ mc->gregs[REG_RIP] = (unsigned long)target;
+#endif
+}
diff --git a/arch/x86/um/shared/sysdep/faultinfo_32.h b/arch/x86/um/shared/sysdep/faultinfo_32.h
index b6f2437ec29c..ab5c8e47049c 100644
--- a/arch/x86/um/shared/sysdep/faultinfo_32.h
+++ b/arch/x86/um/shared/sysdep/faultinfo_32.h
@@ -29,4 +29,16 @@ struct faultinfo {
#define PTRACE_FULL_FAULTINFO 0
+#define ___backtrack_faulted(_faulted) \
+ asm volatile ( \
+ "mov $0, %0\n" \
+ "movl $__get_kernel_nofault_faulted_%=,%1\n" \
+ "jmp _end_%=\n" \
+ "__get_kernel_nofault_faulted_%=:\n" \
+ "mov $1, %0;" \
+ "_end_%=:" \
+ : "=r" (_faulted), \
+ "=m" (current->thread.segv_continue) :: \
+ )
+
#endif
diff --git a/arch/x86/um/shared/sysdep/faultinfo_64.h b/arch/x86/um/shared/sysdep/faultinfo_64.h
index ee88f88974ea..26fb4835d3e9 100644
--- a/arch/x86/um/shared/sysdep/faultinfo_64.h
+++ b/arch/x86/um/shared/sysdep/faultinfo_64.h
@@ -29,4 +29,16 @@ struct faultinfo {
#define PTRACE_FULL_FAULTINFO 1
+#define ___backtrack_faulted(_faulted) \
+ asm volatile ( \
+ "mov $0, %0\n" \
+ "movq $__get_kernel_nofault_faulted_%=,%1\n" \
+ "jmp _end_%=\n" \
+ "__get_kernel_nofault_faulted_%=:\n" \
+ "mov $1, %0;" \
+ "_end_%=:" \
+ : "=r" (_faulted), \
+ "=m" (current->thread.segv_continue) :: \
+ )
+
#endif
--
2.48.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] um: remove copy_from_kernel_nofault_allowed
2025-02-10 16:09 [PATCH 0/2] Remove incorrect host mincore call and add rodata handling Benjamin Berg
2025-02-10 16:09 ` [PATCH 1/2] um: mark rodata read-only and implement _nofault accesses Benjamin Berg
@ 2025-02-10 16:09 ` Benjamin Berg
1 sibling, 0 replies; 7+ messages in thread
From: Benjamin Berg @ 2025-02-10 16:09 UTC (permalink / raw)
To: linux-um; +Cc: Johannes Berg, Benjamin Berg
From: Benjamin Berg <benjamin.berg@intel.com>
There is no need to override the default version of this function
anymore as UML now has proper _nofault memory access functions.
Doing this also fixes the fact that the implementation was incorrect as
using mincore() will incorrectly flag pages as inaccessible if they were
swapped out by the host.
Fixes: f75b1b1bedfb ("um: Implement probe_kernel_read()")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
---
arch/um/include/shared/os.h | 1 -
arch/um/kernel/Makefile | 2 +-
arch/um/kernel/maccess.c | 19 --------------
arch/um/os-Linux/process.c | 51 -------------------------------------
4 files changed, 1 insertion(+), 72 deletions(-)
delete mode 100644 arch/um/kernel/maccess.c
diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index 5babad8c5f75..bc02767f0639 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -213,7 +213,6 @@ extern int os_protect_memory(void *addr, unsigned long len,
extern int os_unmap_memory(void *addr, int len);
extern int os_drop_memory(void *addr, int length);
extern int can_drop_memory(void);
-extern int os_mincore(void *addr, unsigned long len);
void os_set_pdeathsig(void);
diff --git a/arch/um/kernel/Makefile b/arch/um/kernel/Makefile
index f8567b933ffa..4df1cd0d2017 100644
--- a/arch/um/kernel/Makefile
+++ b/arch/um/kernel/Makefile
@@ -17,7 +17,7 @@ extra-y := vmlinux.lds
obj-y = config.o exec.o exitcode.o irq.o ksyms.o mem.o \
physmem.o process.o ptrace.o reboot.o sigio.o \
signal.o sysrq.o time.o tlb.o trap.o \
- um_arch.o umid.o maccess.o kmsg_dump.o capflags.o skas/
+ um_arch.o umid.o kmsg_dump.o capflags.o skas/
obj-y += load_file.o
obj-$(CONFIG_BLK_DEV_INITRD) += initrd.o
diff --git a/arch/um/kernel/maccess.c b/arch/um/kernel/maccess.c
deleted file mode 100644
index 8ccd56813f68..000000000000
--- a/arch/um/kernel/maccess.c
+++ /dev/null
@@ -1,19 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Copyright (C) 2013 Richard Weinberger <richrd@nod.at>
- */
-
-#include <linux/uaccess.h>
-#include <linux/kernel.h>
-#include <os.h>
-
-bool copy_from_kernel_nofault_allowed(const void *src, size_t size)
-{
- void *psrc = (void *)rounddown((unsigned long)src, PAGE_SIZE);
-
- if ((unsigned long)src < PAGE_SIZE || size <= 0)
- return false;
- if (os_mincore(psrc, size + src - psrc) <= 0)
- return false;
- return true;
-}
diff --git a/arch/um/os-Linux/process.c b/arch/um/os-Linux/process.c
index 9f086f939420..184566edeee9 100644
--- a/arch/um/os-Linux/process.c
+++ b/arch/um/os-Linux/process.c
@@ -142,57 +142,6 @@ int __init can_drop_memory(void)
return ok;
}
-static int os_page_mincore(void *addr)
-{
- char vec[2];
- int ret;
-
- ret = mincore(addr, UM_KERN_PAGE_SIZE, vec);
- if (ret < 0) {
- if (errno == ENOMEM || errno == EINVAL)
- return 0;
- else
- return -errno;
- }
-
- return vec[0] & 1;
-}
-
-int os_mincore(void *addr, unsigned long len)
-{
- char *vec;
- int ret, i;
-
- if (len <= UM_KERN_PAGE_SIZE)
- return os_page_mincore(addr);
-
- vec = calloc(1, (len + UM_KERN_PAGE_SIZE - 1) / UM_KERN_PAGE_SIZE);
- if (!vec)
- return -ENOMEM;
-
- ret = mincore(addr, UM_KERN_PAGE_SIZE, vec);
- if (ret < 0) {
- if (errno == ENOMEM || errno == EINVAL)
- ret = 0;
- else
- ret = -errno;
-
- goto out;
- }
-
- for (i = 0; i < ((len + UM_KERN_PAGE_SIZE - 1) / UM_KERN_PAGE_SIZE); i++) {
- if (!(vec[i] & 1)) {
- ret = 0;
- goto out;
- }
- }
-
- ret = 1;
-out:
- free(vec);
- return ret;
-}
-
void init_new_thread_signals(void)
{
set_handler(SIGSEGV);
--
2.48.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] um: mark rodata read-only and implement _nofault accesses
2025-02-10 16:09 ` [PATCH 1/2] um: mark rodata read-only and implement _nofault accesses Benjamin Berg
@ 2025-04-02 22:12 ` Nathan Chancellor
2025-04-03 6:20 ` Benjamin Berg
0 siblings, 1 reply; 7+ messages in thread
From: Nathan Chancellor @ 2025-04-02 22:12 UTC (permalink / raw)
To: Benjamin Berg; +Cc: linux-um, Johannes Berg, Benjamin Berg, llvm
Hi Benjamin and Johannes,
On Mon, Feb 10, 2025 at 05:09:25PM +0100, Benjamin Berg wrote:
> From: Johannes Berg <johannes.berg@intel.com>
>
> Mark read-only data actually read-only (simple mprotect), and
> to be able to test it also implement _nofault accesses. This
> works by setting up a new "segv_continue" pointer in current,
> and then when we hit a segfault we change the signal return
> context so that we continue at that address. The code using
> this sets it up so that it jumps to a label and then aborts
> the access that way, returning -EFAULT.
>
> It's possible to optimize the ___backtrack_faulted() thing by
> using asm goto (compiler version dependent) and/or gcc's (not
> sure if clang has it) &&label extension, but at least in one
> attempt I made the && caused the compiler to not load -EFAULT
> into the register in case of jumping to the &&label from the
> fault handler. So leave it like this for now.
>
> Co-developed-by: Benjamin Berg <benjamin.berg@intel.com>
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
> ---
> arch/um/Kconfig | 1 +
> arch/um/include/asm/processor-generic.h | 2 ++
> arch/um/include/asm/uaccess.h | 20 ++++++++++++-----
> arch/um/include/shared/arch.h | 2 ++
> arch/um/include/shared/as-layout.h | 2 +-
> arch/um/include/shared/irq_user.h | 3 ++-
> arch/um/include/shared/kern_util.h | 12 ++++++----
> arch/um/kernel/irq.c | 3 ++-
> arch/um/kernel/mem.c | 10 +++++++++
> arch/um/kernel/trap.c | 28 +++++++++++++++++++-----
> arch/um/os-Linux/signal.c | 4 ++--
> arch/um/os-Linux/skas/process.c | 8 +++----
> arch/x86/um/os-Linux/mcontext.c | 12 ++++++++++
> arch/x86/um/shared/sysdep/faultinfo_32.h | 12 ++++++++++
> arch/x86/um/shared/sysdep/faultinfo_64.h | 12 ++++++++++
> 15 files changed, 108 insertions(+), 23 deletions(-)
>
> diff --git a/arch/um/Kconfig b/arch/um/Kconfig
> index 18051b1cfce0..79509c7f39de 100644
> --- a/arch/um/Kconfig
> +++ b/arch/um/Kconfig
> @@ -12,6 +12,7 @@ config UML
> select ARCH_HAS_KCOV
> select ARCH_HAS_STRNCPY_FROM_USER
> select ARCH_HAS_STRNLEN_USER
> + select ARCH_HAS_STRICT_KERNEL_RWX
> select HAVE_ARCH_AUDITSYSCALL
> select HAVE_ARCH_KASAN if X86_64
> select HAVE_ARCH_KASAN_VMALLOC if HAVE_ARCH_KASAN
> diff --git a/arch/um/include/asm/processor-generic.h b/arch/um/include/asm/processor-generic.h
> index 5d6356eafffe..8a789c17acd8 100644
> --- a/arch/um/include/asm/processor-generic.h
> +++ b/arch/um/include/asm/processor-generic.h
> @@ -31,6 +31,8 @@ struct thread_struct {
> } thread;
> } request;
>
> + void *segv_continue;
> +
> /* Contains variable sized FP registers */
> struct pt_regs regs;
> };
> diff --git a/arch/um/include/asm/uaccess.h b/arch/um/include/asm/uaccess.h
> index 1d4b6bbc1b65..3a08f9029a3f 100644
> --- a/arch/um/include/asm/uaccess.h
> +++ b/arch/um/include/asm/uaccess.h
> @@ -9,6 +9,7 @@
>
> #include <asm/elf.h>
> #include <linux/unaligned.h>
> +#include <sysdep/faultinfo.h>
>
> #define __under_task_size(addr, size) \
> (((unsigned long) (addr) < TASK_SIZE) && \
> @@ -44,19 +45,28 @@ static inline int __access_ok(const void __user *ptr, unsigned long size)
> __access_ok_vsyscall(addr, size));
> }
>
> -/* no pagefaults for kernel addresses in um */
> #define __get_kernel_nofault(dst, src, type, err_label) \
> do { \
> - *((type *)dst) = get_unaligned((type *)(src)); \
> - if (0) /* make sure the label looks used to the compiler */ \
> + int __faulted; \
> + \
> + ___backtrack_faulted(__faulted); \
> + if (__faulted) { \
> + *((type *)dst) = (type) 0; \
> goto err_label; \
> + } \
> + *((type *)dst) = get_unaligned((type *)(src)); \
> + current->thread.segv_continue = NULL; \
> } while (0)
>
> #define __put_kernel_nofault(dst, src, type, err_label) \
> do { \
> - put_unaligned(*((type *)src), (type *)(dst)); \
> - if (0) /* make sure the label looks used to the compiler */ \
> + int __faulted; \
> + \
> + ___backtrack_faulted(__faulted); \
> + if (__faulted) \
> goto err_label; \
> + put_unaligned(*((type *)src), (type *)(dst)); \
> + current->thread.segv_continue = NULL; \
> } while (0)
>
> #endif
...
> diff --git a/arch/x86/um/shared/sysdep/faultinfo_64.h b/arch/x86/um/shared/sysdep/faultinfo_64.h
> index ee88f88974ea..26fb4835d3e9 100644
> --- a/arch/x86/um/shared/sysdep/faultinfo_64.h
> +++ b/arch/x86/um/shared/sysdep/faultinfo_64.h
> @@ -29,4 +29,16 @@ struct faultinfo {
>
> #define PTRACE_FULL_FAULTINFO 1
>
> +#define ___backtrack_faulted(_faulted) \
> + asm volatile ( \
> + "mov $0, %0\n" \
> + "movq $__get_kernel_nofault_faulted_%=,%1\n" \
> + "jmp _end_%=\n" \
> + "__get_kernel_nofault_faulted_%=:\n" \
> + "mov $1, %0;" \
> + "_end_%=:" \
> + : "=r" (_faulted), \
> + "=m" (current->thread.segv_continue) :: \
> + )
> +
> #endif
I bisected a crash that our CI sees with this change as
commit d1d7f01f7cd3 ("um: mark rodata read-only and implement _nofault
accesses"). Unfortunately, I only see this with clang, I do not see it
with GCC 14.2.0 but I am not sure where I would start trying to figure
out what is going on here.
$ make -skj"$(nproc)" ARCH=um LLVM=1 mrproper defconfig vmlinux
# Get rootfs via
# curl -LSs https://github.com/ClangBuiltLinux/boot-utils/releases/download/20241120-044434/x86_64-rootfs.ext4.zst | zstd -d >rootfs.ext4
# if necessary
$ ./vmlinux ubd0=$PWD/rootfs.ext4
Core dump limits :
soft - NONE
hard - NONE
Checking that ptrace can change system call numbers...OK
Checking syscall emulation for ptrace...OK
Checking environment variables for a tempdir...none found
Checking if /dev/shm is on tmpfs...OK
Checking PROT_EXEC mmap in /dev/shm...OK
Linux version 6.14.0-rc5-00002-gd1d7f01f7cd3 (nathan@n3-xlarge-x86) (ClangBuiltLinux clang version 20.1.2 (https://github.com/llvm/llvm-project.git 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247), ClangBuiltLinux LLD 20.1.2 (https://github.com/llvm/llvm-project.git 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)) #1 Wed Apr 2 15:07:50 MST 2025
...
EXT4-fs (ubda): orphan cleanup on readonly fs
EXT4-fs (ubda): mounted filesystem 267f8e03-3e3c-4aec-979b-f4d9b964b716 ro with ordered data mode. Quota mode: none.
VFS: Mounted root (ext4 filesystem) readonly on device 98:0.
devtmpfs: mounted
Run /sbin/init as init process
Modules linked in:
Pid: 24, comm: mount Not tainted 6.14.0-rc5-00002-gd1d7f01f7cd3
RIP: 0033:_end_0+0x38/0x45
RSP: 00000000648bfca0 EFLAGS: 00010206
RAX: 0000000000000000 RBX: 00000000609c7ff8 RCX: 00000000606350b8
RDX: 00000000648bfc5e RSI: 0000000000001000 RDI: 0000000060c0c000
RBP: 00000000648bfcc0 R08: 0000000000000000 R09: 0000000060c0c780
R10: 0000000000000000 R11: 0000000000000206 R12: 00000000648bfd60
R13: 0000000060c0c900 R14: 0000000060c0c9f8 R15: 0000000000000007
Kernel panic - not syncing: Kernel mode fault at addr 0x790, ip 0x600c268a
CPU: 0 UID: 0 PID: 24 Comm: mount Not tainted 6.14.0-rc5-00002-gd1d7f01f7cd3 #1
Stack:
600c25e9 00000007 609c7ff8 608c4780
648bfcf0 601533ca 601533ab 648bfd60
648bfdb0 608c4780 648bfd10 60152ebe
Call Trace:
[<600c25e9>] ? copy_from_kernel_nofault+0x0/0x64
[<601533ca>] prepend_copy+0x1f/0x51
[<601533ab>] ? prepend_copy+0x0/0x51
[<60152ebe>] prepend+0x60/0x67
[<60153379>] prepend_name+0x1f/0x51
[<6015335a>] ? prepend_name+0x0/0x51
[<60152bad>] prepend_path+0xb9/0x1d8
[<6003cb83>] ? um_set_signals+0x0/0x3b
[<60152e17>] d_path+0xce/0x115
[<601867f5>] proc_pid_readlink+0xa1/0x17d
[<6012b182>] vfs_readlink+0xec/0xee
[<60138920>] ? touch_atime+0x0/0x162
[<60120341>] do_readlinkat+0xbe/0x13a
[<6011f91b>] sys_readlinkat+0x10/0x14
[<6002caad>] handle_syscall+0x7b/0xaa
[<6002ca32>] ? handle_syscall+0x0/0xaa
[<6003f687>] userspace+0x289/0x4a5
[<6002a8e3>] fork_handler+0x56/0x5d
Cheers,
Nathan
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] um: mark rodata read-only and implement _nofault accesses
2025-04-02 22:12 ` Nathan Chancellor
@ 2025-04-03 6:20 ` Benjamin Berg
2025-04-03 19:19 ` Nathan Chancellor
0 siblings, 1 reply; 7+ messages in thread
From: Benjamin Berg @ 2025-04-03 6:20 UTC (permalink / raw)
To: Nathan Chancellor; +Cc: linux-um, Johannes Berg, llvm
Hi Nathan,
oops, that is a little logic bug in the code. When we are coming from
userspace, we may be running as part of a user task and have an mm. And
then the new "current->pagefault_disabled" logic is skipped. So when
prepend_copy is doing its nofault copy to evade a lock it is crashing
instead of failing gracefully and retrying.
In segv_handler, can you try moving that into a separate "else if"
block just above the "current->mm == NULL" check?
Something like the patch I copied below.
Benjamin
diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
index cbe924a0fa87..8a2e68d07de6 100644
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -330,20 +330,20 @@ unsigned long segv(struct faultinfo fi, unsigned long ip, int is_user,
panic("Failed to sync kernel TLBs: %d", err);
goto out;
}
- else if (current->mm == NULL) {
- if (current->pagefault_disabled) {
- if (!mc) {
- show_regs(container_of(regs, struct pt_regs, regs));
- panic("Segfault with pagefaults disabled but no mcontext");
- }
- if (!current->thread.segv_continue) {
- show_regs(container_of(regs, struct pt_regs, regs));
- panic("Segfault without recovery target");
- }
- mc_set_rip(mc, current->thread.segv_continue);
- current->thread.segv_continue = NULL;
- goto out;
+ else if (current->pagefault_disabled) {
+ if (!mc) {
+ show_regs(container_of(regs, struct pt_regs, regs));
+ panic("Segfault with pagefaults disabled but no mcontext");
}
+ if (!current->thread.segv_continue) {
+ show_regs(container_of(regs, struct pt_regs, regs));
+ panic("Segfault without recovery target");
+ }
+ mc_set_rip(mc, current->thread.segv_continue);
+ current->thread.segv_continue = NULL;
+ goto out;
+ }
+ else if (current->mm == NULL) {
show_regs(container_of(regs, struct pt_regs, regs));
panic("Segfault with no mm");
}
On Wed, 2025-04-02 at 15:12 -0700, Nathan Chancellor wrote:
> Hi Benjamin and Johannes,
>
> On Mon, Feb 10, 2025 at 05:09:25PM +0100, Benjamin Berg wrote:
> > From: Johannes Berg <johannes.berg@intel.com>
> >
> > Mark read-only data actually read-only (simple mprotect), and
> > to be able to test it also implement _nofault accesses. This
> > works by setting up a new "segv_continue" pointer in current,
> > and then when we hit a segfault we change the signal return
> > context so that we continue at that address. The code using
> > this sets it up so that it jumps to a label and then aborts
> > the access that way, returning -EFAULT.
> >
> > It's possible to optimize the ___backtrack_faulted() thing by
> > using asm goto (compiler version dependent) and/or gcc's (not
> > sure if clang has it) &&label extension, but at least in one
> > attempt I made the && caused the compiler to not load -EFAULT
> > into the register in case of jumping to the &&label from the
> > fault handler. So leave it like this for now.
> >
> > Co-developed-by: Benjamin Berg <benjamin.berg@intel.com>
> > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> > Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
> > ---
> > arch/um/Kconfig | 1 +
> > arch/um/include/asm/processor-generic.h | 2 ++
> > arch/um/include/asm/uaccess.h | 20 ++++++++++++-----
> > arch/um/include/shared/arch.h | 2 ++
> > arch/um/include/shared/as-layout.h | 2 +-
> > arch/um/include/shared/irq_user.h | 3 ++-
> > arch/um/include/shared/kern_util.h | 12 ++++++----
> > arch/um/kernel/irq.c | 3 ++-
> > arch/um/kernel/mem.c | 10 +++++++++
> > arch/um/kernel/trap.c | 28 +++++++++++++++++++-----
> > arch/um/os-Linux/signal.c | 4 ++--
> > arch/um/os-Linux/skas/process.c | 8 +++----
> > arch/x86/um/os-Linux/mcontext.c | 12 ++++++++++
> > arch/x86/um/shared/sysdep/faultinfo_32.h | 12 ++++++++++
> > arch/x86/um/shared/sysdep/faultinfo_64.h | 12 ++++++++++
> > 15 files changed, 108 insertions(+), 23 deletions(-)
> >
> > diff --git a/arch/um/Kconfig b/arch/um/Kconfig
> > index 18051b1cfce0..79509c7f39de 100644
> > --- a/arch/um/Kconfig
> > +++ b/arch/um/Kconfig
> > @@ -12,6 +12,7 @@ config UML
> > select ARCH_HAS_KCOV
> > select ARCH_HAS_STRNCPY_FROM_USER
> > select ARCH_HAS_STRNLEN_USER
> > + select ARCH_HAS_STRICT_KERNEL_RWX
> > select HAVE_ARCH_AUDITSYSCALL
> > select HAVE_ARCH_KASAN if X86_64
> > select HAVE_ARCH_KASAN_VMALLOC if HAVE_ARCH_KASAN
> > diff --git a/arch/um/include/asm/processor-generic.h b/arch/um/include/asm/processor-generic.h
> > index 5d6356eafffe..8a789c17acd8 100644
> > --- a/arch/um/include/asm/processor-generic.h
> > +++ b/arch/um/include/asm/processor-generic.h
> > @@ -31,6 +31,8 @@ struct thread_struct {
> > } thread;
> > } request;
> >
> > + void *segv_continue;
> > +
> > /* Contains variable sized FP registers */
> > struct pt_regs regs;
> > };
> > diff --git a/arch/um/include/asm/uaccess.h b/arch/um/include/asm/uaccess.h
> > index 1d4b6bbc1b65..3a08f9029a3f 100644
> > --- a/arch/um/include/asm/uaccess.h
> > +++ b/arch/um/include/asm/uaccess.h
> > @@ -9,6 +9,7 @@
> >
> > #include <asm/elf.h>
> > #include <linux/unaligned.h>
> > +#include <sysdep/faultinfo.h>
> >
> > #define __under_task_size(addr, size) \
> > (((unsigned long) (addr) < TASK_SIZE) && \
> > @@ -44,19 +45,28 @@ static inline int __access_ok(const void __user *ptr, unsigned long size)
> > __access_ok_vsyscall(addr, size));
> > }
> >
> > -/* no pagefaults for kernel addresses in um */
> > #define __get_kernel_nofault(dst, src, type, err_label) \
> > do { \
> > - *((type *)dst) = get_unaligned((type *)(src)); \
> > - if (0) /* make sure the label looks used to the compiler */ \
> > + int __faulted; \
> > + \
> > + ___backtrack_faulted(__faulted); \
> > + if (__faulted) { \
> > + *((type *)dst) = (type) 0; \
> > goto err_label; \
> > + } \
> > + *((type *)dst) = get_unaligned((type *)(src)); \
> > + current->thread.segv_continue = NULL; \
> > } while (0)
> >
> > #define __put_kernel_nofault(dst, src, type, err_label) \
> > do { \
> > - put_unaligned(*((type *)src), (type *)(dst)); \
> > - if (0) /* make sure the label looks used to the compiler */ \
> > + int __faulted; \
> > + \
> > + ___backtrack_faulted(__faulted); \
> > + if (__faulted) \
> > goto err_label; \
> > + put_unaligned(*((type *)src), (type *)(dst)); \
> > + current->thread.segv_continue = NULL; \
> > } while (0)
> >
> > #endif
> ...
> > diff --git a/arch/x86/um/shared/sysdep/faultinfo_64.h b/arch/x86/um/shared/sysdep/faultinfo_64.h
> > index ee88f88974ea..26fb4835d3e9 100644
> > --- a/arch/x86/um/shared/sysdep/faultinfo_64.h
> > +++ b/arch/x86/um/shared/sysdep/faultinfo_64.h
> > @@ -29,4 +29,16 @@ struct faultinfo {
> >
> > #define PTRACE_FULL_FAULTINFO 1
> >
> > +#define ___backtrack_faulted(_faulted) \
> > + asm volatile ( \
> > + "mov $0, %0\n" \
> > + "movq $__get_kernel_nofault_faulted_%=,%1\n" \
> > + "jmp _end_%=\n" \
> > + "__get_kernel_nofault_faulted_%=:\n" \
> > + "mov $1, %0;" \
> > + "_end_%=:" \
> > + : "=r" (_faulted), \
> > + "=m" (current->thread.segv_continue) :: \
> > + )
> > +
> > #endif
>
> I bisected a crash that our CI sees with this change as
> commit d1d7f01f7cd3 ("um: mark rodata read-only and implement _nofault
> accesses"). Unfortunately, I only see this with clang, I do not see it
> with GCC 14.2.0 but I am not sure where I would start trying to figure
> out what is going on here.
>
> $ make -skj"$(nproc)" ARCH=um LLVM=1 mrproper defconfig vmlinux
>
> # Get rootfs via
> # curl -LSs https://github.com/ClangBuiltLinux/boot-utils/releases/download/20241120-044434/x86_64-rootfs.ext4.zst | zstd -d >rootfs.ext4
> # if necessary
> $ ./vmlinux ubd0=$PWD/rootfs.ext4
> Core dump limits :
> soft - NONE
> hard - NONE
> Checking that ptrace can change system call numbers...OK
> Checking syscall emulation for ptrace...OK
> Checking environment variables for a tempdir...none found
> Checking if /dev/shm is on tmpfs...OK
> Checking PROT_EXEC mmap in /dev/shm...OK
> Linux version 6.14.0-rc5-00002-gd1d7f01f7cd3 (nathan@n3-xlarge-x86) (ClangBuiltLinux clang version 20.1.2 (https://github.com/llvm/llvm-project.git 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247), ClangBuiltLinux LLD 20.1.2 (https://github.com/llvm/llvm-project.git 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)) #1 Wed Apr 2 15:07:50 MST 2025
> ...
> EXT4-fs (ubda): orphan cleanup on readonly fs
> EXT4-fs (ubda): mounted filesystem 267f8e03-3e3c-4aec-979b-f4d9b964b716 ro with ordered data mode. Quota mode: none.
> VFS: Mounted root (ext4 filesystem) readonly on device 98:0.
> devtmpfs: mounted
> Run /sbin/init as init process
>
> Modules linked in:
> Pid: 24, comm: mount Not tainted 6.14.0-rc5-00002-gd1d7f01f7cd3
> RIP: 0033:_end_0+0x38/0x45
> RSP: 00000000648bfca0 EFLAGS: 00010206
> RAX: 0000000000000000 RBX: 00000000609c7ff8 RCX: 00000000606350b8
> RDX: 00000000648bfc5e RSI: 0000000000001000 RDI: 0000000060c0c000
> RBP: 00000000648bfcc0 R08: 0000000000000000 R09: 0000000060c0c780
> R10: 0000000000000000 R11: 0000000000000206 R12: 00000000648bfd60
> R13: 0000000060c0c900 R14: 0000000060c0c9f8 R15: 0000000000000007
> Kernel panic - not syncing: Kernel mode fault at addr 0x790, ip 0x600c268a
> CPU: 0 UID: 0 PID: 24 Comm: mount Not tainted 6.14.0-rc5-00002-gd1d7f01f7cd3 #1
> Stack:
> 600c25e9 00000007 609c7ff8 608c4780
> 648bfcf0 601533ca 601533ab 648bfd60
> 648bfdb0 608c4780 648bfd10 60152ebe
> Call Trace:
> [<600c25e9>] ? copy_from_kernel_nofault+0x0/0x64
> [<601533ca>] prepend_copy+0x1f/0x51
> [<601533ab>] ? prepend_copy+0x0/0x51
> [<60152ebe>] prepend+0x60/0x67
> [<60153379>] prepend_name+0x1f/0x51
> [<6015335a>] ? prepend_name+0x0/0x51
> [<60152bad>] prepend_path+0xb9/0x1d8
> [<6003cb83>] ? um_set_signals+0x0/0x3b
> [<60152e17>] d_path+0xce/0x115
> [<601867f5>] proc_pid_readlink+0xa1/0x17d
> [<6012b182>] vfs_readlink+0xec/0xee
> [<60138920>] ? touch_atime+0x0/0x162
> [<60120341>] do_readlinkat+0xbe/0x13a
> [<6011f91b>] sys_readlinkat+0x10/0x14
> [<6002caad>] handle_syscall+0x7b/0xaa
> [<6002ca32>] ? handle_syscall+0x0/0xaa
> [<6003f687>] userspace+0x289/0x4a5
> [<6002a8e3>] fork_handler+0x56/0x5d
>
> Cheers,
> Nathan
>
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] um: mark rodata read-only and implement _nofault accesses
2025-04-03 6:20 ` Benjamin Berg
@ 2025-04-03 19:19 ` Nathan Chancellor
2025-04-03 20:47 ` Johannes Berg
0 siblings, 1 reply; 7+ messages in thread
From: Nathan Chancellor @ 2025-04-03 19:19 UTC (permalink / raw)
To: Benjamin Berg; +Cc: linux-um, Johannes Berg, llvm
On Thu, Apr 03, 2025 at 08:20:23AM +0200, Benjamin Berg wrote:
> Hi Nathan,
>
> oops, that is a little logic bug in the code. When we are coming from
> userspace, we may be running as part of a user task and have an mm. And
> then the new "current->pagefault_disabled" logic is skipped. So when
> prepend_copy is doing its nofault copy to evade a lock it is crashing
> instead of failing gracefully and retrying.
>
> In segv_handler, can you try moving that into a separate "else if"
> block just above the "current->mm == NULL" check?
>
> Something like the patch I copied below.
Thanks, I applied that change, which shows a slightly different crash
message now:
Modules linked in:
Pid: 24, comm: mount Not tainted 6.14.0-rc5-00002-gd1d7f01f7cd3-dirty
RIP: 0033:_end_0+0x38/0x45
RSP: 00000000648bfca0 EFLAGS: 00010206
RAX: 0000000000000000 RBX: 00000000609c7ff8 RCX: 00000000606350b8
RDX: 00000000648bfc5e RSI: 0000000000001000 RDI: 0000000060c0c000
RBP: 00000000648bfcc0 R08: 0000000000000000 R09: 0000000060c0c780
R10: 0000000000000000 R11: 0000000000000206 R12: 00000000648bfd60
R13: 0000000060c0c900 R14: 0000000060c0c9f8 R15: 0000000000000007
Kernel panic - not syncing: Segfault without recovery target
CPU: 0 UID: 0 PID: 24 Comm: mount Not tainted 6.14.0-rc5-00002-gd1d7f01f7cd3-dirty #1
Stack:
600c25f9 00000007 609c7ff8 608c4780
648bfcf0 601533da 601533bb 648bfd60
648bfdb0 608c4780 648bfd10 60152ece
Call Trace:
[<600c25f9>] ? copy_from_kernel_nofault+0x0/0x64
[<601533da>] prepend_copy+0x1f/0x51
[<601533bb>] ? prepend_copy+0x0/0x51
[<60152ece>] prepend+0x60/0x67
[<60153389>] prepend_name+0x1f/0x51
[<6015336a>] ? prepend_name+0x0/0x51
[<60152bbd>] prepend_path+0xb9/0x1d8
[<6003cb91>] ? um_set_signals+0x0/0x3b
[<60152e27>] d_path+0xce/0x115
[<60186805>] proc_pid_readlink+0xa1/0x17d
[<6012b192>] vfs_readlink+0xec/0xee
[<60138930>] ? touch_atime+0x0/0x162
[<60120351>] do_readlinkat+0xbe/0x13a
[<6011f92b>] sys_readlinkat+0x10/0x14
[<6002cabb>] handle_syscall+0x7b/0xaa
[<6002ca40>] ? handle_syscall+0x0/0xaa
[<6003f695>] userspace+0x289/0x4a5
[<6002a8e3>] fork_handler+0x56/0x5d
> diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
> index cbe924a0fa87..8a2e68d07de6 100644
> --- a/arch/um/kernel/trap.c
> +++ b/arch/um/kernel/trap.c
> @@ -330,20 +330,20 @@ unsigned long segv(struct faultinfo fi, unsigned long ip, int is_user,
> panic("Failed to sync kernel TLBs: %d", err);
> goto out;
> }
> - else if (current->mm == NULL) {
> - if (current->pagefault_disabled) {
> - if (!mc) {
> - show_regs(container_of(regs, struct pt_regs, regs));
> - panic("Segfault with pagefaults disabled but no mcontext");
> - }
> - if (!current->thread.segv_continue) {
> - show_regs(container_of(regs, struct pt_regs, regs));
> - panic("Segfault without recovery target");
> - }
> - mc_set_rip(mc, current->thread.segv_continue);
> - current->thread.segv_continue = NULL;
> - goto out;
> + else if (current->pagefault_disabled) {
> + if (!mc) {
> + show_regs(container_of(regs, struct pt_regs, regs));
> + panic("Segfault with pagefaults disabled but no mcontext");
> }
> + if (!current->thread.segv_continue) {
> + show_regs(container_of(regs, struct pt_regs, regs));
> + panic("Segfault without recovery target");
> + }
> + mc_set_rip(mc, current->thread.segv_continue);
> + current->thread.segv_continue = NULL;
> + goto out;
> + }
> + else if (current->mm == NULL) {
> show_regs(container_of(regs, struct pt_regs, regs));
> panic("Segfault with no mm");
> }
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] um: mark rodata read-only and implement _nofault accesses
2025-04-03 19:19 ` Nathan Chancellor
@ 2025-04-03 20:47 ` Johannes Berg
0 siblings, 0 replies; 7+ messages in thread
From: Johannes Berg @ 2025-04-03 20:47 UTC (permalink / raw)
To: Nathan Chancellor, Benjamin Berg; +Cc: linux-um, llvm
On Thu, 2025-04-03 at 12:19 -0700, Nathan Chancellor wrote:
>
> Thanks, I applied that change, which shows a slightly different crash
> message now:
Pretty sure it's all just a bug in my inline assembly, and clang
allocates registers differently:
#define ___backtrack_faulted(_faulted) \
asm volatile ( \
"mov $0, %0\n" \
"movq $__get_kernel_nofault_faulted_%=,%1\n" \
"jmp _end_%=\n" \
"__get_kernel_nofault_faulted_%=:\n" \
"mov $1, %0;" \
"_end_%=:" \
: "=r" (_faulted), \
"=m" (current->thread.segv_continue) :: \
)
It _looks_ as though both %0 and %1 are output only, but clang compiles
it to:
51: 48 83 fb 08 cmp $0x8,%rbx
55: 72 44 jb 9b <_end_0+0x2a>
57: 48 8b 01 mov (%rcx),%rax
// start inline assembly ---vvv--- //
5a: b8 00 00 00 00 mov $0x0,%eax
5f: 48 c7 80 90 07 00 00 movq $0x0,0x790(%rax) // crash
66: 00 00 00 00
66: R_X86_64_32S .text+0x6c
6a: eb 05 jmp 71 <_end_0>
000000000000006c <__get_kernel_nofault_faulted_0>:
6c: b8 01 00 00 00 mov $0x1,%eax
// end inline assembly ---^^^--- //
0000000000000071 <_end_0>:
71: 85 c0 test %eax,%eax
73: 75 56 jne cb <_end_1+0x10>
which clearly cannot work? I must be missing something. Switching the
first two instructions fixes it, of course, but right now I can't see
what I forgot in terms of constraints to make the compiler not do that.
Probably trivial to someone more familiar with inline assembly.
Modifying the _faulted to be +r instead of =r also fixes it.
johannes
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-04-03 20:55 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-10 16:09 [PATCH 0/2] Remove incorrect host mincore call and add rodata handling Benjamin Berg
2025-02-10 16:09 ` [PATCH 1/2] um: mark rodata read-only and implement _nofault accesses Benjamin Berg
2025-04-02 22:12 ` Nathan Chancellor
2025-04-03 6:20 ` Benjamin Berg
2025-04-03 19:19 ` Nathan Chancellor
2025-04-03 20:47 ` Johannes Berg
2025-02-10 16:09 ` [PATCH 2/2] um: remove copy_from_kernel_nofault_allowed Benjamin Berg
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).