* cacheflush completely broken, suspecting PAN+LPAE
@ 2024-11-11 22:38 Michał Pecio
2024-11-12 1:15 ` Linus Walleij
2024-11-12 10:21 ` Russell King (Oracle)
0 siblings, 2 replies; 10+ messages in thread
From: Michał Pecio @ 2024-11-11 22:38 UTC (permalink / raw)
To: linux-arm-kernel; +Cc: Linus Walleij
Hi,
I installed v6.11.5 on Tegra K1 (Cortex-A15) with tegra_defconfig +
CONFIG_ARM_LPAE + a few drivers + minor patches for driver issues.
gdb segfaults on startup, strace shows this:
openat(AT_FDCWD, "/usr/lib/guile/3.0/ccache/ice-9/psyntax-pp.go", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 11
_llseek(11, 0, [477309], SEEK_END) = 0
mmap2(NULL, 477309, PROT_READ, MAP_PRIVATE, 11, 0) = 0xb378a000
close(11) = 0
mprotect(0xb37da000, 43512, PROT_READ|PROT_WRITE) = 0
mmap2(NULL, 262144, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb374a000
cacheflush(0xb374a000, 0xb374b000, 0) = -1 EFAULT (Bad address)
cacheflush(0xb374a000, 0xb374b000, 0) = -1 EFAULT (Bad address)
cacheflush(0xb374a000, 0xb374b000, 0) = -1 EFAULT (Bad address)
futex(0xb6f0bb14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
cacheflush(0xb374a000, 0xb374b000, 0) = -1 EFAULT (Bad address)
cacheflush(0xb374a000, 0xb374b000, 0) = -1 EFAULT (Bad address)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x4} ---
guile is apparently a scripting language with a JIT compiler, disabling
JIT resolves the crash, so cacheflush is a big suspect at this point.
No apparent valid reason for those failures and no recent changes to
cacheflush handling that I see, so I searched for commits touching LPAE
(perhaps the most uncommon part of my system) and I quickly found the
ARM_PAN feature, which claims to monkey with page tables. Hmm...
Disabled ARM_PAN, cacheflush returns 0 now and gdb crashes no more.
So I guess it looks like there is a problem with this feature, perhaps
a missing "permit user access" somewhere?
Thanks,
Michal
^ permalink raw reply [flat|nested] 10+ messages in thread

* Re: cacheflush completely broken, suspecting PAN+LPAE
  2024-11-11 22:38 cacheflush completely broken, suspecting PAN+LPAE Michał Pecio
@ 2024-11-12 1:15 ` Linus Walleij
  2024-11-12 6:41 ` Arnd Bergmann
  2024-11-12 9:32 ` Michał Pecio
  2024-11-12 10:21 ` Russell King (Oracle)
  1 sibling, 2 replies; 10+ messages in thread
From: Linus Walleij @ 2024-11-12 1:15 UTC (permalink / raw)
To: Michał Pecio
Cc: linux-arm-kernel, Catalin Marinas, Linux kernel regressions list, Kees Cook

Hi Michal,

On Mon, Nov 11, 2024 at 11:38 PM Michał Pecio <michal.pecio@gmail.com> wrote:

> I installed v6.11.5 on Tegra K1 (Cortex-A15) with tegra_defconfig +
> CONFIG_ARM_LPAE + a few drivers + minor patches for driver issues.
>
> gdb segfaults on startup, strace shows this:
>
> openat(AT_FDCWD, "/usr/lib/guile/3.0/ccache/ice-9/psyntax-pp.go", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 11
> _llseek(11, 0, [477309], SEEK_END) = 0
> mmap2(NULL, 477309, PROT_READ, MAP_PRIVATE, 11, 0) = 0xb378a000
> close(11) = 0
> mprotect(0xb37da000, 43512, PROT_READ|PROT_WRITE) = 0
> mmap2(NULL, 262144, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb374a000
> cacheflush(0xb374a000, 0xb374b000, 0) = -1 EFAULT (Bad address)
> cacheflush(0xb374a000, 0xb374b000, 0) = -1 EFAULT (Bad address)
> cacheflush(0xb374a000, 0xb374b000, 0) = -1 EFAULT (Bad address)
> futex(0xb6f0bb14, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> cacheflush(0xb374a000, 0xb374b000, 0) = -1 EFAULT (Bad address)
> cacheflush(0xb374a000, 0xb374b000, 0) = -1 EFAULT (Bad address)
> --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x4} ---
>
> guile is apparently a scripting language with a JIT compiler, disabling
> JIT resolves the crash, so cacheflush is a big suspect at this point.
>
> No apparent valid reason for those failures and no recent changes to
> cacheflush handling that I see, so I searched for commits touching LPAE
> (perhaps the most uncommon part of my system) and I quickly found the
> ARM_PAN feature, which claims to monkey with page tables. Hmm...
>
> Disabled ARM_PAN, cacheflush returns 0 now and gdb crashes no more.
>
> So I guess it looks like there is a problem with this feature, perhaps
> a missing "permit user access" somewhere?

We are trying to locate the issue, which I think is the same as this
but not sure:
https://bugzilla.kernel.org/show_bug.cgi?id=219247

I have been trying to replicate it on a Chromebook but didn't get so
far yet because the installation is pretty idiomatic :/ also there it
only appears in a single Qt program and is not as predictable as here.

But. It appears the code is issuing cacheflush() which I guess ends
up in arm_syscall() here:

	case NR(cacheflush):
		return do_cache_op(regs->ARM_r0, regs->ARM_r1, regs->ARM_r2);

To here:

static inline int
do_cache_op(unsigned long start, unsigned long end, int flags)
{
	if (end < start || flags)
		return -EINVAL;

	if (!access_ok((void __user *)start, end - start))
		return -EFAULT;

	return __do_cache_op(start, end);
}

Here userspace access should be fine because we have entered a
syscall from userspace. I tried to emulate the situation with this
program:

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

#define NR_cacheflush 0xf0002

/* libgcc */
extern void __clear_cache(void *, void *);

int main (int argc, char **argv) {
	void *addr;
	int ret;

	printf("Test()\n");
	addr = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC,
		    MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED) {
		printf("mmap() failed\n");
		exit(1);
	}

The libgcc version is what guile uses.

But it all just works fine for me.

I added prints in the cacheflush trap:

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 480e307501bb..400650519bd1 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -592,11 +592,14 @@ __do_cache_op(unsigned long start, unsigned long end)
 static inline int
 do_cache_op(unsigned long start, unsigned long end, int flags)
 {
+	pr_info("%s(%08lx-%08lx)\n", __func__, start, end);
 	if (end < start || flags)
 		return -EINVAL;

-	if (!access_ok((void __user *)start, end - start))
+	if (!access_ok((void __user *)start, end - start)) {
+		pr_err("ACCESS NOT OK\n");
 		return -EFAULT;
+	}

 	return __do_cache_op(start, end);
 }

And they all work fine with the test program.

This is on v6.12-rc1.

Does this test program work also on your set-up?

Yours,
Linus Walleij

^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: cacheflush completely broken, suspecting PAN+LPAE
  2024-11-12 1:15 ` Linus Walleij
@ 2024-11-12 6:41 ` Arnd Bergmann
  2024-11-12 9:46 ` Michał Pecio
  2024-11-12 9:32 ` Michał Pecio
  1 sibling, 1 reply; 10+ messages in thread
From: Arnd Bergmann @ 2024-11-12 6:41 UTC (permalink / raw)
To: Linus Walleij, Michał Pecio
Cc: linux-arm-kernel, Catalin Marinas, Linux kernel regressions list, Kees Cook

On Tue, Nov 12, 2024, at 02:15, Linus Walleij wrote:
> On Mon, Nov 11, 2024 at 11:38 PM Michał Pecio <michal.pecio@gmail.com> wrote:
>> cacheflush(0xb374a000, 0xb374b000, 0) = -1 EFAULT (Bad address)
...
>> So I guess it looks like there is a problem with this feature, perhaps
>> a missing "permit user access" somewhere?
>
> static inline int
> do_cache_op(unsigned long start, unsigned long end, int flags)
> {
>	if (end < start || flags)
>		return -EINVAL;
>
>	if (!access_ok((void __user *)start, end - start))
>		return -EFAULT;
>
>	return __do_cache_op(start, end);
> }

I would guess that the problem is not the access_ok() but
the actual access in v7_coherent_user_range() that does
not appear to call uaccess_save_and_enable() or its assembler
equivalent around the lines

 USER(	mcr	p15, 0, r12, c7, c11, 1	)
 ...
 USER(	mcr	p15, 0, r12, c7, c5, 1	)

> diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> index 480e307501bb..400650519bd1 100644
> --- a/arch/arm/kernel/traps.c
> +++ b/arch/arm/kernel/traps.c
> @@ -592,11 +592,14 @@ __do_cache_op(unsigned long start, unsigned long end)
>  static inline int
>  do_cache_op(unsigned long start, unsigned long end, int flags)
>  {
> +	pr_info("%s(%08lx-%08lx)\n", __func__, start, end);
>  	if (end < start || flags)
>  		return -EINVAL;
>
> -	if (!access_ok((void __user *)start, end - start))
> +	if (!access_ok((void __user *)start, end - start)) {
> +		pr_err("ACCESS NOT OK\n");
>  		return -EFAULT;
> +	}
>
>  	return __do_cache_op(start, end);
>  }

This does not catch -EFAULT being returned from the USER()
trap in __do_cache_op(), so I would expect it to trigger
but also not to flush the caches.

It's unclear to me if this problem is specific to the TTBR0
PAN variant, or if it can also happen on any variant of the
CPU_SW_DOMAIN_PAN. It seems unlikely that CPU_SW_DOMAIN_PAN
has been broken for this long without anyone noticing, but
I also don't see why it doesn't trap in the cache flush
when the TTBR0 version does.

      Arnd

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: cacheflush completely broken, suspecting PAN+LPAE
  2024-11-12 6:41 ` Arnd Bergmann
@ 2024-11-12 9:46 ` Michał Pecio
  0 siblings, 0 replies; 10+ messages in thread
From: Michał Pecio @ 2024-11-12 9:46 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Linus Walleij, linux-arm-kernel, Catalin Marinas, Linux kernel regressions list, Kees Cook

Hi,

On Tue, 12 Nov 2024 07:41:12 +0100, Arnd Bergmann wrote:
> I would guess that the problem is not the access_ok() but
> the actual access in v7_coherent_user_range() that does
> not appear to call uaccess_save_and_enable() or its assembler
> equivalent around the lines
>
>  USER(	mcr	p15, 0, r12, c7, c11, 1	)
>  ...
>  USER(	mcr	p15, 0, r12, c7, c5, 1	)
>

Yes, this is what fails and where I got stuck tracing this code because
I'm not exactly an ARM MMU wizard.

What I could tell is that this code hasn't changed much since the 3.10
vendor kernel which works fine with LPAE on the same CPU and userspace,
so I started looking for recent changes in arch/arm and found PAN.

According to comments, this routine is meant to return an error when it
catches a pagefault, so that's what I suppose happens here and that's
why PAN immediately caught my attention.

Disabling PAN fixes cacheflush for me.

> It's unclear to me if this problem is specific to the TTBR0
> PAN variant, or if it can also happen on any variant of the
> CPU_SW_DOMAIN_PAN. It seems unlikely that CPU_SW_DOMAIN_PAN
> has been broken for this long without anyone noticing, but
> I also don't see why it doesn't trap in the cache flush
> when the TTBR0 version does.

I don't know, but I booted the first kernel image I made for this
machine, which lacks LPAE and a few other things and barely works,
but it runs gdb without issues (and cacheflush returns 0).

--- config.good 2024-11-11 20:22:16.604586266 +0100
+++ config.bad 2024-11-11 20:22:10.641511948 +0100
+CONFIG_ARCH_DMA_ADDR_T_64BIT=y
+CONFIG_ARCH_HAS_PTE_SPECIAL=y
+CONFIG_ARCH_SUPPORTS_HUGETLBFS=y
+CONFIG_ARM_LPAE=y
+CONFIG_CFG80211_WEXT=y
-CONFIG_CPU_SW_DOMAIN_PAN=y
+CONFIG_CPU_TTBR0_PAN=y
+CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
+CONFIG_HAVE_GUP_FAST=y
+CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
+CONFIG_MMU_GATHER_TABLE_FREE=y
-CONFIG_PGTABLE_LEVELS=2
+CONFIG_PGTABLE_LEVELS=3
+CONFIG_PHYS_ADDR_T_64BIT=y
+CONFIG_SWIOTLB=y
+CONFIG_WEXT_CORE=y
+CONFIG_WEXT_PROC=y
+CONFIG_ZONE_DMA=y

Regards,
Michal

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: cacheflush completely broken, suspecting PAN+LPAE
  2024-11-12 1:15 ` Linus Walleij
  2024-11-12 6:41 ` Arnd Bergmann
@ 2024-11-12 9:32 ` Michał Pecio
  2024-11-12 10:16 ` Michał Pecio
  1 sibling, 1 reply; 10+ messages in thread
From: Michał Pecio @ 2024-11-12 9:32 UTC (permalink / raw)
To: Linus Walleij
Cc: linux-arm-kernel, Catalin Marinas, Linux kernel regressions list, Kees Cook

Hi Linus,

On Tue, 12 Nov 2024 02:15:19 +0100, Linus Walleij wrote:
> We are trying to locate the issue, which I think is the same as this
> but not sure:
> https://bugzilla.kernel.org/show_bug.cgi?id=219247

You can verify by asking the reporter to run the crashing program under
strace. If SIGSEGV follows a failed cacheflush, it's my bug most likely.

A straightforward repro of this bug:

gdb
GUILE_JIT_THRESHOLD=0 gdb
GUILE_JIT_THRESHOLD=-1 gdb

Expected outcome: segfault, segfault, shows command prompt.

> I have been trying to replicate it on a Chromebook but didn't get so
> far yet because the installation is pretty idiomatic :/ also there it
> only appears in a single Qt program and is not as predictable as here.

My bug also appears in a single program ;)

This system works fine, but any JIT is broken by this kind of bug.
The failure may be random if the caches resynchronize by a fluke,
but with gdb it was every time so far.

> But. It appears the code is issuing cacheflush() which I guess ends
> up in arm_syscall() here:
>
>	case NR(cacheflush):
>		return do_cache_op(regs->ARM_r0, regs->ARM_r1, regs->ARM_r2);
>
> To here:
>
> static inline int
> do_cache_op(unsigned long start, unsigned long end, int flags)
> {
>	if (end < start || flags)
>		return -EINVAL;
>
>	if (!access_ok((void __user *)start, end - start))
>		return -EFAULT;
>
>	return __do_cache_op(start, end);
> }

Yep. I added printks here and it is particularly the call to
flush_icache_range() from __do_cache_op() which returns -EFAULT.

> Here userspace access should be fine because we have entered a
> syscall from userspace. I tried to emulate the situation with this
> program:
>
> #include <stdlib.h>
> #include <stdio.h>
> #include <errno.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <sys/mman.h>
>
> #define NR_cacheflush 0xf0002
>
> /* libgcc */
> extern void __clear_cache(void *, void *);
>
> int main (int argc, char **argv) {
>	void *addr;
>	int ret;
>
>	printf("Test()\n");
>	addr = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC,
>		    MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
>	if (addr == MAP_FAILED) {
>		printf("mmap() failed\n");
>		exit(1);
>	}

This seems incomplete, there is no __clear_cache().
But if you add it at the end then yes, it should fail.
Confirm it with strace.

> I added prints in the cacheflush trap:
>
> diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> index 480e307501bb..400650519bd1 100644
> --- a/arch/arm/kernel/traps.c
> +++ b/arch/arm/kernel/traps.c
> @@ -592,11 +592,14 @@ __do_cache_op(unsigned long start, unsigned long end)
>  static inline int
>  do_cache_op(unsigned long start, unsigned long end, int flags)
>  {
> +	pr_info("%s(%08lx-%08lx)\n", __func__, start, end);
>  	if (end < start || flags)
>  		return -EINVAL;
>
> -	if (!access_ok((void __user *)start, end - start))
> +	if (!access_ok((void __user *)start, end - start)) {
> +		pr_err("ACCESS NOT OK\n");
>  		return -EFAULT;
> +	}
>
>  	return __do_cache_op(start, end);
>  }

You also need to check what __do_cache_op() returns.

Regards,
Michal

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: cacheflush completely broken, suspecting PAN+LPAE
  2024-11-12 9:32 ` Michał Pecio
@ 2024-11-12 10:16 ` Michał Pecio
  0 siblings, 0 replies; 10+ messages in thread
From: Michał Pecio @ 2024-11-12 10:16 UTC (permalink / raw)
To: Linus Walleij
Cc: linux-arm-kernel, Catalin Marinas, Linux kernel regressions list, Kees Cook

Regarding test programs, I also wrote and tried this one yesterday.
It's based on a similar demo released by ARM, but much simplified.

It both triggers the bug and confirms the necessity of cacheflush in
JIT compilers on my CPU when it works normally (prints: 1, 1, 2).

On the buggy kernel, it usually segfaults on the first attempt to call
*code, but sometimes both __clear_cache() appear to take effect despite
the syscall returning EFAULT (according to strace), not sure why.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int f1() { return 1; }
int f2() { return 2; }

int main() {
	puts("start");
	char *code = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE | PROT_EXEC,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int x;

	memcpy(code, f1, 0x100);
	__builtin___clear_cache(code, code + 0x100);
	x = ((int(*)())code)();
	printf("%x\n", x);

	memcpy(code, f2, 0x100);
	x = ((int(*)())code)();
	printf("%x\n", x);

	__builtin___clear_cache(code, code + 0x100);
	x = ((int(*)())code)();
	printf("%x\n", x);
}

^ permalink raw reply [flat|nested] 10+ messages in thread
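The test programs in this thread rely on strace to reveal the failing syscall, because __clear_cache() returns void. A minimal sketch of a helper that surfaces the error code directly might look like the following. The name flush_icache_checked is hypothetical, the fallback syscall number 0x0f0002 is the value from the NR_cacheflush define quoted earlier in the thread, and on anything other than 32-bit ARM Linux the helper simply falls back to __builtin___clear_cache(), which cannot report failure.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for the syscall() prototype under strict -std modes */
#endif
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>

/*
 * Hypothetical helper, not taken from the thread: flush the instruction
 * cache for [start, start + len) and return 0 on success or a positive
 * errno value (for example EFAULT) on failure.
 */
static int flush_icache_checked(void *start, size_t len)
{
	char *begin = start;
	char *end = begin + len;

#if defined(__linux__) && defined(__arm__)
#ifndef __ARM_NR_cacheflush
#define __ARM_NR_cacheflush 0x0f0002 /* assumption: EABI value, as quoted in the thread */
#endif
	/* Third argument is the flags word, which the kernel requires to be 0. */
	if (syscall(__ARM_NR_cacheflush, (long)begin, (long)end, 0L) != 0)
		return errno;
	return 0;
#else
	/* Other targets: the compiler builtin gives no status to check. */
	__builtin___clear_cache(begin, end);
	return 0;
#endif
}
```

Calling it right after each memcpy() into the executable mapping and printing strerror() of a non-zero result should show "Bad address" on an affected kernel without needing strace.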
* Re: cacheflush completely broken, suspecting PAN+LPAE
  2024-11-11 22:38 cacheflush completely broken, suspecting PAN+LPAE Michał Pecio
  2024-11-12 1:15 ` Linus Walleij
@ 2024-11-12 10:21 ` Russell King (Oracle)
  2024-11-12 10:45 ` Michał Pecio
  2024-11-12 17:10 ` Michał Pecio
  1 sibling, 2 replies; 10+ messages in thread
From: Russell King (Oracle) @ 2024-11-12 10:21 UTC (permalink / raw)
To: Michał Pecio; +Cc: linux-arm-kernel, Linus Walleij

On Mon, Nov 11, 2024 at 11:38:17PM +0100, Michał Pecio wrote:
> Hi,
> So I guess it looks like there is a problem with this feature, perhaps
> a missing "permit user access" somewhere?

That's exactly the reason - user access needs to be enabled before
calling flush_icache_user_range() so that the cache operation
instructions don't fault. The patch below should fix this.

Please ensure that you copy me with ARM related bugs in future.

Thanks for finding the issue.

8<===
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Subject: [PATCH] ARM: fix cacheflush with PAN

It seems that the cacheflush syscall got broken when PAN was
implemented. User access was not enabled around the cache maintenance
instructions, causing them to fault.

Fixes: a5e090acbf54 ("ARM: software-based priviledged-no-access support")
Reported-by: From: Michał Pecio <michal.pecio@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 arch/arm/kernel/traps.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 54dcdcde3f77..6518771c1496 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -574,6 +574,7 @@ static int bad_syscall(int n, struct pt_regs *regs)
 static inline int
 __do_cache_op(unsigned long start, unsigned long end)
 {
+	unsigned int ua_flags;
 	int ret;

 	do {
@@ -582,7 +583,9 @@ __do_cache_op(unsigned long start, unsigned long end)
 		if (fatal_signal_pending(current))
 			return 0;

+		ua_flags = uaccess_save_and_enable();
 		ret = flush_icache_user_range(start, start + chunk);
+		uaccess_restore(ua_flags);
 		if (ret)
 			return ret;

--
2.30.2

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: cacheflush completely broken, suspecting PAN+LPAE
  2024-11-12 10:21 ` Russell King (Oracle)
@ 2024-11-12 10:45 ` Michał Pecio
  2024-11-12 13:58 ` Linus Walleij
  2024-11-12 17:10 ` Michał Pecio
  1 sibling, 1 reply; 10+ messages in thread
From: Michał Pecio @ 2024-11-12 10:45 UTC (permalink / raw)
To: Russell King (Oracle); +Cc: linux-arm-kernel, Linus Walleij

Hi,

On Tue, 12 Nov 2024 10:21:36 +0000, Russell King (Oracle) wrote:
> On Mon, Nov 11, 2024 at 11:38:17PM +0100, Michał Pecio wrote:
> > Hi,
> > So I guess it looks like there is a problem with this feature,
> > perhaps a missing "permit user access" somewhere?
>
> That's exactly the reason - user access needs to be enabled before
> calling flush_icache_user_range() so that the cache operation
> instructions don't fault. The patch below should fix this.

Thanks, I will test it later today.

By the way, do you know why it wasn't broken without LPAE? It looks
like either those specific coprocessor operations somehow bypass the
protection, or maybe all of PAN is a big, expensive no-op...

> Please ensure that you copy me with ARM related bugs in future.

OK, will do.

Regards,
Michal

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: cacheflush completely broken, suspecting PAN+LPAE
  2024-11-12 10:45 ` Michał Pecio
@ 2024-11-12 13:58 ` Linus Walleij
  0 siblings, 0 replies; 10+ messages in thread
From: Linus Walleij @ 2024-11-12 13:58 UTC (permalink / raw)
To: Michał Pecio; +Cc: Russell King (Oracle), linux-arm-kernel

On Tue, Nov 12, 2024 at 11:46 AM Michał Pecio <michal.pecio@gmail.com> wrote:

> By the way, do you know why it wasn't broken without LPAE? It looks
> like either those specific coprocessor operations somehow bypass the
> protection, or maybe all of PAN is a big, expensive no-op...

PAN is supposed to stop the kernel from reading or writing into
userspace memory and that works.

Nobody really said we can't flush caches for some random userspace
memory, and so, that works. It's what we call a grey area.

But with the TTBR0 thing for PAN on LPAE we completely disable the
page walks on userspace pages from the kernel (unless explicitly
allowed) and that means this now also blocks cacheflush, because
we can't flush caches for a piece of memory we don't even map.

Russell's patch looks like it fixes the issue, I was gonna write
something similar but he quickly beat me to it, so test that!

Yours,
Linus Walleij

^ permalink raw reply [flat|nested] 10+ messages in thread
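The behaviour described in this message can be illustrated with a small userspace toy model. This is only a sketch and not kernel code. Every toy_ name is hypothetical, and a single boolean stands in for "user page walks are currently permitted", which is a deliberate simplification of what the TTBR0-based PAN does. It shows why a cache operation on a user range returns -EFAULT when nothing has enabled user access, and why bracketing the call with a save-and-enable / restore pair makes it succeed.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for "kernel may currently walk user page tables". */
static bool toy_user_access_enabled = false;

/* Enable user access and hand back the previous state for later restore. */
static unsigned int toy_uaccess_save_and_enable(void)
{
	unsigned int old = toy_user_access_enabled ? 1u : 0u;

	toy_user_access_enabled = true;
	return old;
}

/* Put user access back to whatever it was before the bracket. */
static void toy_uaccess_restore(unsigned int flags)
{
	toy_user_access_enabled = (flags != 0);
}

/* Models a cache maintenance op on a user address: faults if unmapped. */
static int toy_flush_user_range(void)
{
	return toy_user_access_enabled ? 0 : -EFAULT;
}

/* Shape of the path before the fix: no bracket around the flush. */
static int toy_cache_op_unpatched(void)
{
	return toy_flush_user_range();
}

/* Shape of the path after the fix: flush inside the enable/restore pair. */
static int toy_cache_op_patched(void)
{
	unsigned int ua_flags = toy_uaccess_save_and_enable();
	int ret = toy_flush_user_range();

	toy_uaccess_restore(ua_flags);
	return ret;
}
```

In this model the unpatched path always returns -EFAULT while the patched path returns 0 and leaves user access disabled again afterwards, which mirrors the strace output before and after the patch.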
* Re: cacheflush completely broken, suspecting PAN+LPAE
  2024-11-12 10:21 ` Russell King (Oracle)
  2024-11-12 10:45 ` Michał Pecio
@ 2024-11-12 17:10 ` Michał Pecio
  1 sibling, 0 replies; 10+ messages in thread
From: Michał Pecio @ 2024-11-12 17:10 UTC (permalink / raw)
To: Russell King (Oracle); +Cc: linux-arm-kernel, Linus Walleij

On Tue, 12 Nov 2024 10:21:36 +0000, Russell King (Oracle) wrote:
> From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
> Subject: [PATCH] ARM: fix cacheflush with PAN
>
> It seems that the cacheflush syscall got broken when PAN was
> implemented. User access was not enabled around the cache maintenance
> instructions, causing them to fault.
>
> Fixes: a5e090acbf54 ("ARM: software-based priviledged-no-access support")

For the record, non-LPAE seems to be OK, so for me it was 7af5b901e847.

> Reported-by: From: Michał Pecio <michal.pecio@gmail.com>

"From:" perhaps not strictly necessary ;)

Tested-by: Michał Pecio <michal.pecio@gmail.com>

That was it. With this patch gdb starts normally, does 295 cacheflush
calls and all of them succeed, with ARM_LPAE and ARM_PAN enabled.

> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
>  arch/arm/kernel/traps.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> index 54dcdcde3f77..6518771c1496 100644
> --- a/arch/arm/kernel/traps.c
> +++ b/arch/arm/kernel/traps.c
> @@ -574,6 +574,7 @@ static int bad_syscall(int n, struct pt_regs *regs)
>  static inline int
>  __do_cache_op(unsigned long start, unsigned long end)
>  {
> +	unsigned int ua_flags;
>  	int ret;
>
>  	do {
> @@ -582,7 +583,9 @@ __do_cache_op(unsigned long start, unsigned long end)
>  		if (fatal_signal_pending(current))
>  			return 0;
>
> +		ua_flags = uaccess_save_and_enable();
>  		ret = flush_icache_user_range(start, start + chunk);
> +		uaccess_restore(ua_flags);
>  		if (ret)
>  			return ret;
>
> --
> 2.30.2

^ permalink raw reply [flat|nested] 10+ messages in thread