* Re: [PATCH v5 18/23] bpf: Use vmalloc special flag
From: John Paul Adrian Glaubitz @ 2025-08-12 16:43 UTC (permalink / raw)
To: Nadav Amit, Peter Zijlstra, Borislav Petkov, Andy Lutomirski,
Ingo Molnar
Cc: linux-kernel, x86, hpa, Thomas Gleixner, Nadav Amit, Dave Hansen,
linux_dti, linux-integrity, linux-security-module, akpm,
kernel-hardening, linux-mm, will.deacon, ard.biesheuvel, kristen,
deneen.t.dock, Rick Edgecombe, Daniel Borkmann,
Alexei Starovoitov, sparclinux, Sam James, Andreas Larsson,
Anthony Yznaga
Hi,
On Thu, 2019-04-25 at 17:11 -0700, Nadav Amit wrote:
> From: Rick Edgecombe <rick.p.edgecombe@intel.com>
>
> Use new flag VM_FLUSH_RESET_PERMS for handling freeing of special
> permissioned memory in vmalloc and remove places where memory was set RW
> before freeing which is no longer needed. Don't track if the memory is RO
> anymore because it is now tracked in vmalloc.
>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> include/linux/filter.h | 17 +++--------------
> kernel/bpf/core.c | 1 -
> 2 files changed, 3 insertions(+), 15 deletions(-)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 14ec3bdad9a9..7d3abde3f183 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -20,6 +20,7 @@
> #include <linux/set_memory.h>
> #include <linux/kallsyms.h>
> #include <linux/if_vlan.h>
> +#include <linux/vmalloc.h>
>
> #include <net/sch_generic.h>
>
> @@ -503,7 +504,6 @@ struct bpf_prog {
> u16 pages; /* Number of allocated pages */
> u16 jited:1, /* Is our filter JIT'ed? */
> jit_requested:1,/* archs need to JIT the prog */
> - undo_set_mem:1, /* Passed set_memory_ro() checkpoint */
> gpl_compatible:1, /* Is filter GPL compatible? */
> cb_access:1, /* Is control block accessed? */
> dst_needed:1, /* Do we need dst entry? */
> @@ -733,27 +733,17 @@ bpf_ctx_narrow_access_ok(u32 off, u32 size, u32 size_default)
>
> static inline void bpf_prog_lock_ro(struct bpf_prog *fp)
> {
> - fp->undo_set_mem = 1;
> + set_vm_flush_reset_perms(fp);
> set_memory_ro((unsigned long)fp, fp->pages);
> }
>
> -static inline void bpf_prog_unlock_ro(struct bpf_prog *fp)
> -{
> - if (fp->undo_set_mem)
> - set_memory_rw((unsigned long)fp, fp->pages);
> -}
> -
> static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
> {
> + set_vm_flush_reset_perms(hdr);
> set_memory_ro((unsigned long)hdr, hdr->pages);
> set_memory_x((unsigned long)hdr, hdr->pages);
> }
>
> -static inline void bpf_jit_binary_unlock_ro(struct bpf_binary_header *hdr)
> -{
> - set_memory_rw((unsigned long)hdr, hdr->pages);
> -}
> -
> static inline struct bpf_binary_header *
> bpf_jit_binary_hdr(const struct bpf_prog *fp)
> {
> @@ -789,7 +779,6 @@ void __bpf_prog_free(struct bpf_prog *fp);
>
> static inline void bpf_prog_unlock_free(struct bpf_prog *fp)
> {
> - bpf_prog_unlock_ro(fp);
> __bpf_prog_free(fp);
> }
>
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index ff09d32a8a1b..c605397c79f0 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -848,7 +848,6 @@ void __weak bpf_jit_free(struct bpf_prog *fp)
> if (fp->jited) {
> struct bpf_binary_header *hdr = bpf_jit_binary_hdr(fp);
>
> - bpf_jit_binary_unlock_ro(hdr);
> bpf_jit_binary_free(hdr);
>
> WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(fp));
> --
> 2.17.1
There are issues with the TLB management on sparc64 (primarily sun4u) that were introduced
by this patch. A typical backtrace after a crash looks like this:
[ 122.085803] Unable to handle kernel NULL pointer dereference
[ 122.160227] tsk->{mm,active_mm}->context = 000000000000009d
[ 122.233502] tsk->{mm,active_mm}->pgd = fff0000231d14000
[ 122.302118] \|/ ____ \|/
[ 122.302118] "@'/ .. \`@"
[ 122.302118] /_| \__/ |_\
[ 122.302118] \__U_/
[ 122.495420] systemd(1): Oops [#1]
[ 122.538874] CPU: 0 PID: 1 Comm: systemd Not tainted 5.2.0-3-sparc64 #1 Debian 5.2.17-1
[ 122.642957] TSTATE: 0000004411001601 TPC: 000000000061cd94 TNPC: 000000000061cd98 Y: 00000000 Not tainted
[ 122.772207] TPC: <vfs_getattr_nosec+0x34/0xc0>
[ 122.830529] g0: 0000000000000000 g1: 00000000000007ff g2: 0000000000000000 g3: 00000000000007df
[ 122.944902] g4: fff00002381771c0 g5: 0000000000000003 g6: fff0000238178000 g7: 0000000000000000
[ 123.059275] o0: fff000023817be18 o1: 0000000000000000 o2: 0000000000000000 o3: fff000023817be18
[ 123.173658] o4: 0000000000000000 o5: 0000000000000000 sp: fff000023817b341 ret_pc: 000000000061cd7c
[ 123.292611] RPC: <vfs_getattr_nosec+0x1c/0xc0>
[ 123.350933] l0: 0000010000204010 l1: fff0000101600e28 l2: e4e45b5b8ae44628 l3: 0000000000000000
[ 123.465311] l4: 0000000000000000 l5: 0000000000000000 l6: 0000000000000000 l7: fff0000100bff140
[ 123.579692] i0: fff000023817bd50 i1: fff000023817be18 i2: 0000000000000001 i3: 0000000000000900
[ 123.694060] i4: 0000000000000000 i5: fff00002320c1210 i6: fff000023817b3f1 i7: 000000000061ce48
[ 123.808439] I7: <vfs_getattr+0x28/0x40>
[ 123.858759] Call Trace:
[ 123.890785] [000000000061ce48] vfs_getattr+0x28/0x40
[ 123.957123] [000000000061cf64] vfs_statx+0x84/0xc0
[ 124.021173] [000000000061d918] sys_statx+0x38/0x60
[ 124.085226] [0000000000406154] linux_sparc_syscall+0x34/0x44
[ 124.160708] Disabling lock debugging due to kernel taint
[ 124.230481] Caller[000000000061ce48]: vfs_getattr+0x28/0x40
[ 124.303680] Caller[000000000061cf64]: vfs_statx+0x84/0xc0
[ 124.374593] Caller[000000000061d918]: sys_statx+0x38/0x60
[ 124.445503] Caller[0000000000406154]: linux_sparc_syscall+0x34/0x44
[ 124.527857] Caller[fff00001013fde40]: 0xfff00001013fde40
[ 124.597621] Instruction DUMP:
[ 124.597623] c2264000
[ 124.636505] 861027df
[ 124.667386] c45f6028
[ 124.698267] <c458a050>
[ 124.729148] 8408a401
[ 124.760031] 83789403
[ 124.790910] c2264000
[ 124.821801] c207600c
[ 124.852675] 80886800
[ 124.883556]
[ 124.954015] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[ 125.054721] Press Stop-A (L1-A) from sun keyboard or send break
[ 125.054721] twice on console to return to the boot prom
[ 125.201103] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
I suspect that the main issue is to be found in the following patch which introduced VM_FLUSH_RESET_PERMS
which may not work as expected on sun4u SPARC systems:
commit 868b104d7379e28013e9d48bdd2db25e0bdcf751
Author: Rick Edgecombe <rick.p.edgecombe@intel.com>
Date: Thu Apr 25 17:11:36 2019 -0700
mm/vmalloc: Add flag for freeing of special permsissions
Add a new flag VM_FLUSH_RESET_PERMS, for enabling vfree operations to
immediately clear executable TLB entries before freeing pages, and handle
resetting permissions on the directmap. This flag is useful for any kind
of memory with elevated permissions, or where there can be related
permissions changes on the directmap. Today this is RO+X and RO memory.
Although this enables directly vfreeing non-writeable memory now,
non-writable memory cannot be freed in an interrupt because the allocation
itself is used as a node on deferred free list. So when RO memory needs to
be freed in an interrupt the code doing the vfree needs to have its own
work queue, as was the case before the deferred vfree list was added to
vmalloc.
For architectures with set_direct_map_ implementations this whole operation
can be done with one TLB flush when centralized like this. For others with
directmap permissions, currently only arm64, a backup method using
set_memory functions is used to reset the directmap. When arm64 adds
set_direct_map_ functions, this backup can be removed.
When the TLB is flushed to both remove TLB entries for the vmalloc range
mapping and the direct map permissions, the lazy purge operation could be
done to try to save a TLB flush later. However today vm_unmap_aliases
could flush a TLB range that does not include the directmap. So a helper
is added with extra parameters that can allow both the vmalloc address and
the direct mapping to be flushed during this operation. The behavior of the
normal vm_unmap_aliases function is unchanged.
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <akpm@linux-foundation.org>
Cc: <ard.biesheuvel@linaro.org>
Cc: <deneen.t.dock@intel.com>
Cc: <kernel-hardening@lists.openwall.com>
Cc: <kristen@linux.intel.com>
Cc: <linux_dti@icloud.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190426001143.4983-17-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The crash always happens when support for transparent huge pages is enabled (CONFIG_TRANSPARENT_HUGEPAGE=y
and CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y), in particular on sun4u machines. The more modern sun4v machines
are much less affected, although I cannot rule out that the crashes sometimes seen on those machines are
related to this bug.
With THP enabled, the crash can be delayed by either reverting d563d678aa0b or, for example, by this crude hack:
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6dbcdceecae1..128118593b48 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2948,8 +2948,8 @@ static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
}
free_purged_blocks(&purge_list);
- if (!__purge_vmap_area_lazy(start, end, false) && flush)
- flush_tlb_kernel_range(start, end);
+ // if (!__purge_vmap_area_lazy(start, end, false) && flush)
+ // flush_tlb_kernel_range(start, end);
mutex_unlock(&vmap_purge_lock);
}
Please see also the discussion in [1].
Thanks,
Adrian
[1] https://lore.kernel.org/all/35f5ec4eda8a7dbeeb7df9ec0be5c0b062c509f7.camel@physik.fu-berlin.de/
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
* Re: [PATCH v5 18/23] bpf: Use vmalloc special flag
From: Edgecombe, Rick P @ 2025-08-12 18:03 UTC (permalink / raw)
To: glaubitz@physik.fu-berlin.de, peterz@infradead.org,
mingo@redhat.com, luto@kernel.org, namit@vmware.com, bp@alien8.de
Cc: ard.biesheuvel@linaro.org, sam@gentoo.org, andreas@gaisler.com,
nadav.amit@gmail.com, dave.hansen@linux.intel.com,
anthony.yznaga@oracle.com, akpm@linux-foundation.org,
linux-kernel@vger.kernel.org, will.deacon@arm.com,
linux_dti@icloud.com, deneen.t.dock@intel.com, linux-mm@kvack.org,
tglx@linutronix.de, linux-security-module@vger.kernel.org,
sparclinux@vger.kernel.org, hpa@zytor.com,
linux-integrity@vger.kernel.org, daniel@iogearbox.net,
kernel-hardening@lists.openwall.com, ast@kernel.org,
x86@kernel.org, kristen@linux.intel.com
On Tue, 2025-08-12 at 18:43 +0200, John Paul Adrian Glaubitz wrote:
> I suspect that the main issue is to be found in the following patch which introduced VM_FLUSH_RESET_PERMS
> which may not work as expected on sun4u SPARC systems:
I think the problem we found with VM_FLUSH_RESET_PERMS was that the sparc64
kernel TLB flush implementation was broken. Since VM_FLUSH_RESET_PERMS caused
kernel TLB flushes to happen sooner, it just showed up sooner. [0]
This other issue seems to be about userspace memory. So I wonder if these are
two separate issues? Bisecting to the original VM_FLUSH_RESET_PERMS would have
had the known sparc kernel range TLB flush issue. So to bisect the other issue
you might need to apply this [1].
[0] https://marc.info/?l=linux-sparc&m=155915694304118&w=2
[1] https://lore.kernel.org/all/57385AAB-C9A1-46AD-B743-445D4ECCA902@jrtc27.com/
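To make the failure mode described above concrete, here is a minimal user-space sketch in C. It is a toy model and not kernel code: `toy_tlb`, `flush_range_ok` and `flush_range_broken` are hypothetical names, and the "only the first page gets invalidated" bug is invented purely for illustration; it is not claimed to be the actual sparc64 defect. The sketch only shows how an incomplete range flush leaves a stale translation behind, which stays harmless until the underlying page is reused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES    8
#define TOY_PAGE_SHIFT 13                       /* assumption: 8 KiB pages */
#define TOY_PAGE_SIZE  (1ULL << TOY_PAGE_SHIFT)

struct tlb_entry { bool valid; uint64_t vpn; uint64_t pfn; };
struct toy_tlb   { struct tlb_entry e[TLB_ENTRIES]; };

/* Cache a translation vpn -> pfn in a direct-mapped toy TLB. */
static void tlb_insert(struct toy_tlb *t, uint64_t vpn, uint64_t pfn)
{
    struct tlb_entry *slot = &t->e[vpn % TLB_ENTRIES];

    slot->valid = true;
    slot->vpn = vpn;
    slot->pfn = pfn;
}

/* Return true and fill *pfn if a (possibly stale) translation is cached. */
static bool tlb_lookup(const struct toy_tlb *t, uint64_t vpn, uint64_t *pfn)
{
    const struct tlb_entry *slot = &t->e[vpn % TLB_ENTRIES];

    if (!slot->valid || slot->vpn != vpn)
        return false;
    *pfn = slot->pfn;
    return true;
}

static void tlb_invalidate_page(struct toy_tlb *t, uint64_t vpn)
{
    struct tlb_entry *slot = &t->e[vpn % TLB_ENTRIES];

    if (slot->valid && slot->vpn == vpn)
        slot->valid = false;
}

/* Correct range flush: invalidate every page in [start, end). */
static void flush_range_ok(struct toy_tlb *t, uint64_t start, uint64_t end)
{
    assert(start <= end);
    for (uint64_t va = start; va < end; va += TOY_PAGE_SIZE)
        tlb_invalidate_page(t, va >> TOY_PAGE_SHIFT);
}

/* Hypothetical broken flush: only the first page of the range is dropped. */
static void flush_range_broken(struct toy_tlb *t, uint64_t start, uint64_t end)
{
    assert(start <= end);
    if (start < end)
        tlb_invalidate_page(t, start >> TOY_PAGE_SHIFT);
}
```

After `flush_range_broken()` over a two-page range, a lookup of the second page still succeeds and returns the old frame number. In this model, anything that triggers range flushes earlier makes such a leftover entry observable earlier.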
* Re: [PATCH v5 18/23] bpf: Use vmalloc special flag
From: John Paul Adrian Glaubitz @ 2025-08-12 18:37 UTC (permalink / raw)
To: Edgecombe, Rick P, peterz@infradead.org, mingo@redhat.com,
luto@kernel.org, bp@alien8.de
Cc: sam@gentoo.org, andreas@gaisler.com, nadav.amit@gmail.com,
dave.hansen@linux.intel.com, anthony.yznaga@oracle.com,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
will.deacon@arm.com, linux_dti@icloud.com,
deneen.t.dock@intel.com, linux-mm@kvack.org, tglx@linutronix.de,
linux-security-module@vger.kernel.org, sparclinux@vger.kernel.org,
hpa@zytor.com, linux-integrity@vger.kernel.org,
daniel@iogearbox.net, kernel-hardening@lists.openwall.com,
ast@kernel.org, x86@kernel.org, kristen@linux.intel.com
Hi,
On Tue, 2025-08-12 at 18:03 +0000, Edgecombe, Rick P wrote:
> On Tue, 2025-08-12 at 18:43 +0200, John Paul Adrian Glaubitz wrote:
> > I suspect that the main issue is to be found in the following patch which introduced VM_FLUSH_RESET_PERMS
> > which may not work as expected on sun4u SPARC systems:
>
> I think the problem we found with VM_FLUSH_RESET_PERMS was that the sparc64
> kernel TLB flush implementation was broken. Since VM_FLUSH_RESET_PERMS caused
> kernel TLB flushes to happen sooner, it just showed up sooner. [0]
>
> This other issue seems to be about userspace memory. So I wonder if these are
> two separate issues? Bisecting to the original VM_FLUSH_RESET_PERMS would have
> had the known sparc kernel range TLB flush issue. So to bisect the other issue
> you might need to apply this [1].
That could be true. I knew about the patch in [1] but I didn't think of applying it.
FWIW, the crashes we're seeing on recent kernel versions look like this:
[ 40.992851] \|/ ____ \|/
[ 40.992851] "@'/ .. \`@"
[ 40.992851] /_| \__/ |_\
[ 40.992851] \__U_/
[ 41.186220] (udev-worker)(88): Kernel illegal instruction [#1]
[ 41.262910] CPU: 0 UID: 0 PID: 88 Comm: (udev-worker) Tainted: G W 6.12.0+ #25
[ 41.376151] Tainted: [W]=WARN
[ 41.415025] TSTATE: 0000004411001607 TPC: 00000000101c21c0 TNPC: 00000000101c21c4 Y: 00000000 Tainted: G W
[ 41.563717] TPC: <ehci_init_driver+0x0/0x160 [ehci_hcd]>
[ 41.633584] g0: 00000000012005b8 g1: 00000000100a1800 g2: 0000000010206000 g3: 00000000101de000
[ 41.747962] g4: fff000000a5af380 g5: 0000000000000000 g6: fff000000aac8000 g7: 0000000000000e7b
[ 41.862338] o0: 0000000010060118 o1: 000000001020a000 o2: fff000000aa30ce0 o3: 0000000000000e7a
[ 41.976728] o4: 00000000ff000000 o5: 00ff000000000000 sp: fff000000aacb091 ret_pc: 00000000101de028
[ 42.095768] RPC: <ehci_pci_init+0x28/0x2000 [ehci_pci]>
[ 42.164394] l0: 0000000000000000 l1: 0000000100043fff l2: ffffffffff800000 l3: 0000000000800000
[ 42.278768] l4: fff00000001c8008 l5: 0000000000000000 l6: 00000000013358e0 l7: 0000000001002800
[ 42.393143] i0: ffffffffffffffed i1: 00000000004db8d8 i2: 0000000000000000 i3: fff000000aa304e0
[ 42.507517] i4: 0000000001127250 i5: 0000000010060000 i6: fff000000aacb141 i7: 0000000000427d90
[ 42.621893] I7: <do_one_initcall+0x30/0x200>
[ 42.677931] Call Trace:
[ 42.709953] [<0000000000427d90>] do_one_initcall+0x30/0x200
[ 42.783158] [<00000000004db908>] do_init_module+0x48/0x240
[ 42.855214] [<00000000004dd82c>] load_module+0x19cc/0x1f20
[ 42.927270] [<00000000004ddf8c>] init_module_from_file+0x6c/0xa0
[ 43.006189] [<00000000004de1e4>] sys_finit_module+0x1c4/0x2c0
[ 43.081677] [<0000000000406174>] linux_sparc_syscall+0x34/0x44
[ 43.158307] Disabling lock debugging due to kernel taint
[ 43.228077] Caller[0000000000427d90]: do_one_initcall+0x30/0x200
[ 43.306995] Caller[00000000004db908]: do_init_module+0x48/0x240
[ 43.384772] Caller[00000000004dd82c]: load_module+0x19cc/0x1f20
[ 43.462544] Caller[00000000004ddf8c]: init_module_from_file+0x6c/0xa0
[ 43.547184] Caller[00000000004de1e4]: sys_finit_module+0x1c4/0x2c0
[ 43.628389] Caller[0000000000406174]: linux_sparc_syscall+0x34/0x44
[ 43.710741] Caller[fff000010480e2fc]: 0xfff000010480e2fc
[ 43.780508] Instruction DUMP:
[ 43.780511] 00000000
[ 43.819394] 00000000
[ 43.850273] 00000000
[ 43.881153] <00000000>
[ 43.912036] 00000000
[ 43.942917] 00000000
[ 43.973797] 00000000
[ 44.004678] 00000000
[ 44.035561] 00000000
[ 44.066443]
Do you have any suggestion what to bisect?
Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
* Re: [PATCH v5 18/23] bpf: Use vmalloc special flag
From: Edgecombe, Rick P @ 2025-08-12 18:49 UTC (permalink / raw)
To: glaubitz@physik.fu-berlin.de, peterz@infradead.org,
mingo@redhat.com, luto@kernel.org, bp@alien8.de
Cc: sam@gentoo.org, andreas@gaisler.com, nadav.amit@gmail.com,
anthony.yznaga@oracle.com, dave.hansen@linux.intel.com,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
linux_dti@icloud.com, will.deacon@arm.com,
deneen.t.dock@intel.com, linux-mm@kvack.org, tglx@linutronix.de,
linux-security-module@vger.kernel.org, sparclinux@vger.kernel.org,
hpa@zytor.com, linux-integrity@vger.kernel.org,
daniel@iogearbox.net, kernel-hardening@lists.openwall.com,
ast@kernel.org, x86@kernel.org, kristen@linux.intel.com
On Tue, 2025-08-12 at 20:37 +0200, John Paul Adrian Glaubitz wrote:
> That could be true. I knew about the patch in [1] but I didn't think of applying it.
>
> FWIW, the crashes we're seeing on recent kernel versions look like this:
>
> [ 40.992851] \|/ ____ \|/
> [ 40.992851] "@'/ .. \`@"
> [ 40.992851] /_| \__/ |_\
> [ 40.992851] \__U_/
> [ 41.186220] (udev-worker)(88): Kernel illegal instruction [#1]
Possibly re-using some stale executable TLB entry for a VA whose page now has
other data in it.
> [ 41.262910] CPU: 0 UID: 0 PID: 88 Comm: (udev-worker) Tainted: G W 6.12.0+ #25
> [ 41.376151] Tainted: [W]=WARN
> [ 41.415025] TSTATE: 0000004411001607 TPC: 00000000101c21c0 TNPC: 00000000101c21c4 Y: 00000000 Tainted: G W
> [ 41.563717] TPC: <ehci_init_driver+0x0/0x160 [ehci_hcd]>
> [ 41.633584] g0: 00000000012005b8 g1: 00000000100a1800 g2: 0000000010206000 g3: 00000000101de000
> [ 41.747962] g4: fff000000a5af380 g5: 0000000000000000 g6: fff000000aac8000 g7: 0000000000000e7b
> [ 41.862338] o0: 0000000010060118 o1: 000000001020a000 o2: fff000000aa30ce0 o3: 0000000000000e7a
> [ 41.976728] o4: 00000000ff000000 o5: 00ff000000000000 sp: fff000000aacb091 ret_pc: 00000000101de028
> [ 42.095768] RPC: <ehci_pci_init+0x28/0x2000 [ehci_pci]>
> [ 42.164394] l0: 0000000000000000 l1: 0000000100043fff l2: ffffffffff800000 l3: 0000000000800000
> [ 42.278768] l4: fff00000001c8008 l5: 0000000000000000 l6: 00000000013358e0 l7: 0000000001002800
> [ 42.393143] i0: ffffffffffffffed i1: 00000000004db8d8 i2: 0000000000000000 i3: fff000000aa304e0
> [ 42.507517] i4: 0000000001127250 i5: 0000000010060000 i6: fff000000aacb141 i7: 0000000000427d90
> [ 42.621893] I7: <do_one_initcall+0x30/0x200>
> [ 42.677931] Call Trace:
> [ 42.709953] [<0000000000427d90>] do_one_initcall+0x30/0x200
> [ 42.783158] [<00000000004db908>] do_init_module+0x48/0x240
> [ 42.855214] [<00000000004dd82c>] load_module+0x19cc/0x1f20
> [ 42.927270] [<00000000004ddf8c>] init_module_from_file+0x6c/0xa0
> [ 43.006189] [<00000000004de1e4>] sys_finit_module+0x1c4/0x2c0
> [ 43.081677] [<0000000000406174>] linux_sparc_syscall+0x34/0x44
> [ 43.158307] Disabling lock debugging due to kernel taint
> [ 43.228077] Caller[0000000000427d90]: do_one_initcall+0x30/0x200
> [ 43.306995] Caller[00000000004db908]: do_init_module+0x48/0x240
> [ 43.384772] Caller[00000000004dd82c]: load_module+0x19cc/0x1f20
> [ 43.462544] Caller[00000000004ddf8c]: init_module_from_file+0x6c/0xa0
> [ 43.547184] Caller[00000000004de1e4]: sys_finit_module+0x1c4/0x2c0
> [ 43.628389] Caller[0000000000406174]: linux_sparc_syscall+0x34/0x44
> [ 43.710741] Caller[fff000010480e2fc]: 0xfff000010480e2fc
> [ 43.780508] Instruction DUMP:
> [ 43.780511] 00000000
> [ 43.819394] 00000000
> [ 43.850273] 00000000
> [ 43.881153] <00000000>
> [ 43.912036] 00000000
> [ 43.942917] 00000000
> [ 43.973797] 00000000
> [ 44.004678] 00000000
> [ 44.035561] 00000000
> [ 44.066443]
>
> Do you have any suggestion what to bisect?
This does look related to the kernel range TLB flush. Not sure how it's related to
userspace huge pages. Perhaps the userspace range TLB flush has issues too? Or
the TLB flush asm needs to be fixed in this other sparc variant?
So far two issues were found with that patch and they were both rare
architectures with broken kernel TLB flushes. Kernel TLB flushes can actually
not be required for a long time, so probably the bug normally looked like
unexplained crashes after days. The VM_FLUSH_RESET_PERMS just made them show up
earlier in a bisectable way.
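As a rough illustration of why the flag changes the timing, here is a small user-space sketch in C. It is a toy model under stated assumptions: `toy_vmalloc`, `toy_vfree` and the `lazy_max` threshold are hypothetical and do not mirror the real mm/vmalloc.c logic. Without the flag, freed pages accumulate and a flush only happens once a threshold is crossed; with the flag, every free flushes immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vmalloc {
    size_t lazy_pages;  /* pages freed but not yet flushed from the TLB */
    size_t lazy_max;    /* hypothetical threshold that triggers a purge */
    size_t flushes;     /* number of kernel range TLB flushes performed */
};

/* Free 'pages' pages; flush eagerly if the toy "reset perms" flag is set. */
static void toy_vfree(struct toy_vmalloc *vm, size_t pages,
                      bool flush_reset_perms)
{
    assert(pages > 0);

    if (flush_reset_perms) {
        /* Eager path: flush now, which also drains the lazy backlog. */
        vm->flushes++;
        vm->lazy_pages = 0;
        return;
    }

    /* Lazy path: defer the flush until enough pages have piled up. */
    vm->lazy_pages += pages;
    if (vm->lazy_pages > vm->lazy_max) {
        vm->flushes++;
        vm->lazy_pages = 0;
    }
}

/* Count how many frees it takes before the first TLB flush happens. */
static size_t frees_until_first_flush(size_t lazy_max, size_t pages_per_free,
                                      bool flush_reset_perms)
{
    struct toy_vmalloc vm = { .lazy_pages = 0, .lazy_max = lazy_max,
                              .flushes = 0 };
    size_t n = 0;

    while (vm.flushes == 0) {
        toy_vfree(&vm, pages_per_free, flush_reset_perms);
        n++;
    }
    return n;
}
```

With a threshold of 1024 pages and 4 pages per free, the lazy path in this model needs 257 frees before the first flush, while the eager path flushes on the very first free. A broken flush routine would therefore be exercised almost immediately instead of only after a long run.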
* Re: [PATCH v5 18/23] bpf: Use vmalloc special flag
From: John Paul Adrian Glaubitz @ 2025-08-12 18:59 UTC (permalink / raw)
To: Edgecombe, Rick P, peterz@infradead.org, mingo@redhat.com,
luto@kernel.org, bp@alien8.de
Cc: sam@gentoo.org, andreas@gaisler.com, nadav.amit@gmail.com,
anthony.yznaga@oracle.com, dave.hansen@linux.intel.com,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
linux_dti@icloud.com, will.deacon@arm.com, linux-mm@kvack.org,
tglx@linutronix.de, linux-security-module@vger.kernel.org,
sparclinux@vger.kernel.org, hpa@zytor.com,
linux-integrity@vger.kernel.org, daniel@iogearbox.net,
kernel-hardening@lists.openwall.com, ast@kernel.org,
x86@kernel.org, kristen@linux.intel.com
On Tue, 2025-08-12 at 18:49 +0000, Edgecombe, Rick P wrote:
> On Tue, 2025-08-12 at 20:37 +0200, John Paul Adrian Glaubitz wrote:
> > That could be true. I knew about the patch in [1] but I didn't think of applying it.
> >
> > FWIW, the crashes we're seeing on recent kernel versions look like this:
> >
> > [ 40.992851] \|/ ____ \|/
> > [ 40.992851] "@'/ .. \`@"
> > [ 40.992851] /_| \__/ |_\
> > [ 40.992851] \__U_/
> > [ 41.186220] (udev-worker)(88): Kernel illegal instruction [#1]
>
> Possibly re-using some stale executable TLB entry for a VA whose page now has
> other data in it.
Makes sense given the memory is actually zero'd out.
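The all-zero instruction dump fits the "Kernel illegal instruction" message: in the SPARC V9 encoding, a word with op = 0 (bits 31:30) and op2 = 0 (bits 24:22) is ILLTRAP, so 0x00000000 decodes as `illtrap 0`. A minimal sketch of that check in C follows; the helper name `sparc_is_illtrap` is hypothetical and not taken from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if 'insn' is a SPARC V9 ILLTRAP instruction.
 * ILLTRAP is a format-2 instruction with op == 0 and op2 == 0;
 * the low 22 bits are an ignored constant.
 */
static bool sparc_is_illtrap(uint32_t insn)
{
    uint32_t op  = insn >> 30;          /* bits 31:30 */
    uint32_t op2 = (insn >> 22) & 0x7;  /* bits 24:22 */

    return op == 0 && op2 == 0;
}
```

For comparison, 0xc2264000 from the earlier 5.2 oops has op = 3 (a load/store format), so it is not an ILLTRAP word.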
> > [ 41.262910] CPU: 0 UID: 0 PID: 88 Comm: (udev-worker) Tainted: G W 6.12.0+ #25
> > [ 41.376151] Tainted: [W]=WARN
> > [ 41.415025] TSTATE: 0000004411001607 TPC: 00000000101c21c0 TNPC: 00000000101c21c4 Y: 00000000 Tainted: G W
> > [ 41.563717] TPC: <ehci_init_driver+0x0/0x160 [ehci_hcd]>
> > [ 41.633584] g0: 00000000012005b8 g1: 00000000100a1800 g2: 0000000010206000 g3: 00000000101de000
> > [ 41.747962] g4: fff000000a5af380 g5: 0000000000000000 g6: fff000000aac8000 g7: 0000000000000e7b
> > [ 41.862338] o0: 0000000010060118 o1: 000000001020a000 o2: fff000000aa30ce0 o3: 0000000000000e7a
> > [ 41.976728] o4: 00000000ff000000 o5: 00ff000000000000 sp: fff000000aacb091 ret_pc: 00000000101de028
> > [ 42.095768] RPC: <ehci_pci_init+0x28/0x2000 [ehci_pci]>
> > [ 42.164394] l0: 0000000000000000 l1: 0000000100043fff l2: ffffffffff800000 l3: 0000000000800000
> > [ 42.278768] l4: fff00000001c8008 l5: 0000000000000000 l6: 00000000013358e0 l7: 0000000001002800
> > [ 42.393143] i0: ffffffffffffffed i1: 00000000004db8d8 i2: 0000000000000000 i3: fff000000aa304e0
> > [ 42.507517] i4: 0000000001127250 i5: 0000000010060000 i6: fff000000aacb141 i7: 0000000000427d90
> > [ 42.621893] I7: <do_one_initcall+0x30/0x200>
> > [ 42.677931] Call Trace:
> > [ 42.709953] [<0000000000427d90>] do_one_initcall+0x30/0x200
> > [ 42.783158] [<00000000004db908>] do_init_module+0x48/0x240
> > [ 42.855214] [<00000000004dd82c>] load_module+0x19cc/0x1f20
> > [ 42.927270] [<00000000004ddf8c>] init_module_from_file+0x6c/0xa0
> > [ 43.006189] [<00000000004de1e4>] sys_finit_module+0x1c4/0x2c0
> > [ 43.081677] [<0000000000406174>] linux_sparc_syscall+0x34/0x44
> > [ 43.158307] Disabling lock debugging due to kernel taint
> > [ 43.228077] Caller[0000000000427d90]: do_one_initcall+0x30/0x200
> > [ 43.306995] Caller[00000000004db908]: do_init_module+0x48/0x240
> > [ 43.384772] Caller[00000000004dd82c]: load_module+0x19cc/0x1f20
> > [ 43.462544] Caller[00000000004ddf8c]: init_module_from_file+0x6c/0xa0
> > [ 43.547184] Caller[00000000004de1e4]: sys_finit_module+0x1c4/0x2c0
> > [ 43.628389] Caller[0000000000406174]: linux_sparc_syscall+0x34/0x44
> > [ 43.710741] Caller[fff000010480e2fc]: 0xfff000010480e2fc
> > [ 43.780508] Instruction DUMP:
> > [ 43.780511] 00000000
> > [ 43.819394] 00000000
> > [ 43.850273] 00000000
> > [ 43.881153] <00000000>
> > [ 43.912036] 00000000
> > [ 43.942917] 00000000
> > [ 43.973797] 00000000
> > [ 44.004678] 00000000
> > [ 44.035561] 00000000
> > [ 44.066443]
> >
> > Do you have any suggestion what to bisect?
>
> This does look related to the kernel range TLB flush. Not sure how it's related to
> userspace huge pages. Perhaps the userspace range TLB flush has issues too? Or
> the TLB flush asm needs to be fixed in this other sparc variant?
The patch you previously linked actually fixed this particular SPARC variant which
is sun4u, i.e. the non-hypervisor variant with sun4v being the hypervisor one.
I was already thinking that the fix in d3c976c14ad8 was possibly incomplete.
> So far two issues were found with that patch and they were both rare
> architectures with broken kernel TLB flushes. Kernel TLB flushes can actually
> not be required for a long time, so probably the bug normally looked like
> unexplained crashes after days. The VM_FLUSH_RESET_PERMS just made them show up
> earlier in a bisectable way.
Yeah, I think that could actually be the case.
I wonder whether I can revert both d3c976c14ad8 and a74ad5e660a9 on a current
tree and see if that fixes the bug.
Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913