Linux Trace Kernel


* [PATCHv7 bpf-next 00/29] bpf: tracing_multi link
From: Jiri Olsa @ 2026-06-03 11:05 UTC
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Hengqi Chen, bpf, linux-trace-kernel, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, Menglong Dong,
	Steven Rostedt

hi,
adding tracing_multi link support, which allows fast attachment
of a tracing program to many functions.

RFC: https://lore.kernel.org/bpf/20260203093819.2105105-1-jolsa@kernel.org/
v1: https://lore.kernel.org/bpf/20260220100649.628307-1-jolsa@kernel.org/
v2: https://lore.kernel.org/bpf/20260304222141.497203-1-jolsa@kernel.org/
v3: https://lore.kernel.org/bpf/20260316075138.465430-1-jolsa@kernel.org/
v4: https://lore.kernel.org/bpf/20260324081846.2334094-1-jolsa@kernel.org/
v5: https://lore.kernel.org/bpf/20260417192502.194548-1-jolsa@kernel.org/
v6: https://lore.kernel.org/bpf/20260527113951.46265-1-jolsa@kernel.org/

v7 changes:
- added ftrace_hash_count stub for !CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS case [sashiko]
- selftests fixes [sashiko]
- use hash_ptr in select_trampoline_lock [sashiko]
- changed the check duplicate logic in check_dup_ids [sashiko]
- use sort_r_nonatomic in check_dup_ids [sashiko]
- added BPF_TRACE_FSESSION_MULTI to can_be_sleepable,
  plus added testcase for sleepable fsession
- make bpf_tracing_multi_opts pointer fields const
- add ___migrate_enable to trace_blacklist
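The duplicate-ID rejection listed in these changelogs (v6 "fail early on multiple same IDs provided by user", v7 check_dup_ids using sort_r_nonatomic) boils down to sorting the IDs and comparing neighbours. The userspace sketch below only illustrates that idea; it is not the kernel's check_dup_ids(). The name check_dup_ids_sketch(), the use of qsort() in place of sort_r_nonatomic(), and sorting a scratch copy are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for 32-bit BTF type IDs. */
static int cmp_u32(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical helper: return -EINVAL if any ID occurs twice, 0 otherwise.
 * Sorting places equal IDs next to each other, so one linear pass finds
 * duplicates in O(n log n). A scratch copy is sorted so the caller's
 * array (and any per-ID cookies paired with it) keeps its order.
 */
static int check_dup_ids_sketch(const uint32_t *ids, size_t cnt)
{
	uint32_t *sorted;
	int err = 0;
	size_t i;

	assert(ids != NULL || cnt == 0);
	if (cnt < 2)
		return 0;

	sorted = malloc(cnt * sizeof(*sorted));
	if (!sorted)
		return -ENOMEM;
	memcpy(sorted, ids, cnt * sizeof(*sorted));
	qsort(sorted, cnt, sizeof(*sorted), cmp_u32);

	for (i = 1; i < cnt; i++) {
		if (sorted[i] == sorted[i - 1]) {
			err = -EINVAL;	/* fail early on the first duplicate */
			break;
		}
	}
	free(sorted);
	return err;
}
```

For example, {5, 3, 9, 3} is rejected with -EINVAL while {5, 3, 9, 1} passes.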

v6 changes:
- move ftrace_hash_count declaration under CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS [sashiko]
- fix ftrace_hash_remove check/deref [sashiko]
- disable context access for multi programs by using stub function with no arguments 
  for verification [sashiko]
- add __used to bpf_multi_func and remove its arguments, since direct access is not allowed [sashiko]
- rebased on latest loongarch changes, fix ppc build
- guard update_ftrace_direct_del with ftrace_hash_count on rollback [sashiko]
- fix noreturn attachment condition in bpf_check_attach_btf_id_multi [sashiko]
- fail early on multiple same IDs provided by user [sashiko]
- fix selftests error paths [sashiko]
- add MAX_RESOLVE_DEPTH check to btf_get_type_size [sashiko]
- use btf__pointer_size [sashiko]
- fixed compilation on powerpc [sashiko]
- added verifier fails selftest
- after discussing with Song, it was determined that cleaning up FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER
  is not strictly necessary; keeping the trampoline in the ipmodify_enabled state is acceptable.
  The race condition this introduces remains unlikely, so the concern raised in [1] will not be
  addressed at this time.
  [1] https://lore.kernel.org/bpf/aec7bAbGlnEo3R1g@krava/

v5 changes:
- add dedicated hashes used for detach, so there's no need to allocate
  them on detach [sashiko]
- safely release old trampoline images [sashiko]
- add cond_resched() to a couple of loops [sashiko]
- validate attr->link_create.target_fd [sashiko]
- allow only bpf_get_func_ret() for return value retrieval [sashiko]
- do not allow attachment of fexit/fsession_multi for noreturn functions [sashiko]
- fixed double free/close in libbpf btf cleanup, in separate patch [sashiko]
- make btf_type_is_traceable_func closer to btf_distill_func_proto [sashiko]
- add prog->attach_btf_obj_fd check to collect_func_ids_by_glob,
  to check we don't load module programs for kernel [sashiko]
- make sure program is loaded in bpf_program__attach_tracing_multi [sashiko]
- several selftests fixes [sashiko]
- add attach_type to fdinfo output [Leon Hwang]
- selftests cleanup fixes [Leon Hwang]

v4 changes:
- unlink rollback fix (added ftrace_hash_count) [bot]
- use const for some bpf_link_create_opts tracing_multi members [bot]
- adding missing comment for lockdep keys [bot]
- selftest error path fixes (leaks) and other assorted test fixes [Leon Hwang]
- several compile fixes wrt CONFIG_BPF_SYSCALL and CONFIG_BPF_JIT [kernel test robot]
- make ftrace_hash_clear global, because it's needed in rollback

v3 changes:
- fix module parsing [Leon Hwang]
- use function traceable check from libbpf [Leon Hwang]
- use ptr_to_u64 and fix/update a few comments [ci]
- display cookies as decimal numbers [ci]
- added link_create.flags check [ci]
- fix error path in bpf_trampoline_multi_detach [ci]
- make fentry/fexit.multi not extendable [ci]
- add missing OPTS_VALID to bpf_program__attach_tracing_multi [ci]

v2 changes:
- allocate data.unreg in bpf_trampoline_multi_attach for rollback path [ci]
  and fixed link count setup in rollback path [ci]
- several small assorted fixes [ci]
- added loongarch and powerpc changes for struct bpf_tramp_node change
- added support to attach functions from modules
- added tests for sleepable programs
- added rollback tests

v1 changes:
- added ftrace_hash_count as wrapper for hash_count [Steven]
- added trampoline mutex pool [Andrii]
- reworked 'struct bpf_tramp_node' separation [Andrii]
  - the 'struct bpf_tramp_node' now holds pointer to bpf_link,
    which is similar to what we do for uprobe_multi;
    I understand it's not a fundamental change compared to previous
    version which used bpf_prog pointer instead, but I don't see better
    way of doing this. I'm happy to discuss this further if there's a
    better idea
- reworked 'struct bpf_fsession_link' based on bpf_tramp_node
- made btf__find_by_glob_kind function internal helper [Andrii]
- many small assorted fixes [Andrii,CI]
- added session support [Leon Hwang]
- added cookies support
- added more tests
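The trampoline mutex pool (v1) together with the later switch to hash_ptr() in select_trampoline_lock (v7) can be pictured as a fixed array of locks indexed by a hash of the object's address: the same object always gets the same lock, and memory stays bounded regardless of how many trampolines exist. The userspace sketch below uses pthread mutexes and a multiplicative hash with the 64-bit golden-ratio constant that the kernel's hash_64() also uses; the pool size, the names, and the helpers are assumptions for illustration, not the patchset's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define LOCK_POOL_BITS 5			/* hypothetical: 32 locks */
#define LOCK_POOL_SIZE (1u << LOCK_POOL_BITS)

static pthread_mutex_t lock_pool[LOCK_POOL_SIZE];

/* Multiplicative hash: keep the top 'bits' bits of ptr * golden ratio. */
static unsigned int hash_ptr_sketch(const void *ptr, unsigned int bits)
{
	uint64_t val = (uint64_t)(uintptr_t)ptr;

	assert(bits > 0 && bits <= 32);		/* avoid a 64-bit shift */
	return (unsigned int)((val * 0x61C8864680B583EBull) >> (64 - bits));
}

static void lock_pool_init(void)
{
	for (unsigned int i = 0; i < LOCK_POOL_SIZE; i++)
		pthread_mutex_init(&lock_pool[i], NULL);
}

/* Hypothetical select_lock(): the same object always maps to the same mutex. */
static pthread_mutex_t *select_lock(const void *obj)
{
	return &lock_pool[hash_ptr_sketch(obj, LOCK_POOL_BITS)];
}
```

In this sketch two objects that hash to the same slot serialize against each other; that occasional false sharing is the price of not embedding a mutex in every object.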


Note: I plan to send linkinfo support separately; the patchset is big enough already.

thanks,
jirka


Cc: Hengqi Chen <hengqi.chen@gmail.com>
---
Jiri Olsa (29):
      ftrace: Add ftrace_hash_count function
      ftrace: Add ftrace_hash_remove function
      ftrace: Add add_ftrace_hash_entry function
      bpf: Use mutex lock pool for bpf trampolines
      bpf: Add struct bpf_trampoline_ops object
      bpf: Move trampoline image setup into bpf_trampoline_ops callbacks
      bpf: Add bpf_trampoline_add/remove_prog functions
      bpf: Add struct bpf_tramp_node object
      bpf: Factor fsession link to use struct bpf_tramp_node
      bpf: Add multi tracing attach types
      bpf: Move sleepable verification code to btf_id_allow_sleepable
      bpf: Add bpf_trampoline_multi_attach/detach functions
      bpf: Add support for tracing multi link
      bpf: Add support for tracing_multi link cookies
      bpf: Add support for tracing_multi link session
      bpf: Add support for tracing_multi link fdinfo
      libbpf: Add bpf_object_cleanup_btf function
      libbpf: Add bpf_link_create support for tracing_multi link
      libbpf: Add btf_type_is_traceable_func function
      libbpf: Add support to create tracing multi link
      selftests/bpf: Add tracing multi skel/pattern/ids attach tests
      selftests/bpf: Add tracing multi skel/pattern/ids module attach tests
      selftests/bpf: Add tracing multi intersect tests
      selftests/bpf: Add tracing multi cookies test
      selftests/bpf: Add tracing multi session test
      selftests/bpf: Add tracing multi attach fails test
      selftests/bpf: Add tracing multi verifier fails test
      selftests/bpf: Add tracing multi attach benchmark test
      selftests/bpf: Add tracing multi attach rollback tests

 arch/arm64/net/bpf_jit_comp.c                                      |  58 ++--
 arch/loongarch/net/bpf_jit.c                                       |  52 ++--
 arch/powerpc/net/bpf_jit_comp.c                                    |  54 ++--
 arch/riscv/net/bpf_jit_comp64.c                                    |  52 ++--
 arch/s390/net/bpf_jit_comp.c                                       |  44 +--
 arch/x86/net/bpf_jit_comp.c                                        |  54 ++--
 include/linux/bpf.h                                                | 117 ++++++--
 include/linux/bpf_types.h                                          |   1 +
 include/linux/bpf_verifier.h                                       |   4 +
 include/linux/btf_ids.h                                            |   1 +
 include/linux/ftrace.h                                             |   9 +
 include/linux/trace_events.h                                       |   6 +
 include/uapi/linux/bpf.h                                           |   9 +
 kernel/bpf/bpf_struct_ops.c                                        |  27 +-
 kernel/bpf/fixups.c                                                |   2 +
 kernel/bpf/syscall.c                                               |  83 +++---
 kernel/bpf/trampoline.c                                            | 670 +++++++++++++++++++++++++++++++++----------
 kernel/bpf/verifier.c                                              | 183 ++++++++++--
 kernel/trace/bpf_trace.c                                           | 204 +++++++++++++-
 kernel/trace/ftrace.c                                              |  35 ++-
 net/bpf/bpf_dummy_struct_ops.c                                     |  14 +-
 net/bpf/test_run.c                                                 |   3 +
 tools/include/uapi/linux/bpf.h                                     |  10 +
 tools/lib/bpf/bpf.c                                                |   9 +
 tools/lib/bpf/bpf.h                                                |   5 +
 tools/lib/bpf/libbpf.c                                             | 375 ++++++++++++++++++++++++-
 tools/lib/bpf/libbpf.h                                             |  15 +
 tools/lib/bpf/libbpf.map                                           |   1 +
 tools/lib/bpf/libbpf_internal.h                                    |   1 +
 tools/testing/selftests/bpf/Makefile                               |   9 +-
 tools/testing/selftests/bpf/prog_tests/tracing_multi.c             | 936 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/progs/tracing_multi_attach.c           |  39 +++
 tools/testing/selftests/bpf/progs/tracing_multi_attach_module.c    |  25 ++
 tools/testing/selftests/bpf/progs/tracing_multi_bench.c            |  12 +
 tools/testing/selftests/bpf/progs/tracing_multi_check.c            | 214 ++++++++++++++
 tools/testing/selftests/bpf/progs/tracing_multi_fail.c             |  18 ++
 tools/testing/selftests/bpf/progs/tracing_multi_intersect_attach.c |  41 +++
 tools/testing/selftests/bpf/progs/tracing_multi_rollback.c         |  43 +++
 tools/testing/selftests/bpf/progs/tracing_multi_session_attach.c   |  65 +++++
 tools/testing/selftests/bpf/progs/tracing_multi_verifier.c         |  31 ++
 tools/testing/selftests/bpf/trace_helpers.c                        |   7 +-
 tools/testing/selftests/bpf/trace_helpers.h                        |   1 +
 42 files changed, 3110 insertions(+), 429 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/tracing_multi.c
 create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_attach.c
 create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_attach_module.c
 create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_bench.c
 create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_check.c
 create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_fail.c
 create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_intersect_attach.c
 create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_rollback.c
 create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_session_attach.c
 create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_verifier.c


* Re: [PATCH mm-unstable v18 11/14] mm/khugepaged: Introduce mTHP collapse support
From: David Hildenbrand (Arm) @ 2026-06-03 10:00 UTC
  To: Nico Pache
  Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
	akpm, anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
	hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, liam, ljs, mathieu.desnoyers, matthew.brost, mhiramat,
	mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, vbabka, vishal.moola,
	wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe,
	Usama Arif, usamaarif642
In-Reply-To: <19639b08-5bf1-4974-9635-c458d512fa38@redhat.com>


>  next_order:
> -		if ((BIT(order) - 1) & enabled_orders) {
> -			const u8 next_order = order - 1;
> -			const u16 mid_offset = offset + (nr_ptes / 2);
> -
> -			collapse_mthp_stack_push(cc, &stack_size, mid_offset,
> -						 next_order);
> -			collapse_mthp_stack_push(cc, &stack_size, offset,
> -						 next_order);
> +		if (order > KHUGEPAGED_MIN_MTHP_ORDER &&
> +			(BIT(order) - 1) & enabled_orders) {

Why not use test_bit()?


But wouldn't you want to skip orders that are not enabled and try the next
smaller one in any case before you advance the offset?
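The forward loop being discussed (at each offset, try enabled orders from the largest the offset's alignment allows down to the smallest, skipping disabled ones, before advancing the offset) can be simulated in userspace. This is a model, not khugepaged code: PMD_ORDER = 9 (512 PTEs, as with 4 KiB pages), the bool present-PTE array, the "all PTEs present" threshold (i.e. max_ptes_none = 0), and a collapse that always succeeds once the threshold is met are all simplifying assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define PMD_ORDER 9
#define PMD_NR (1u << PMD_ORDER)

/* Number of present PTEs in [start, start + nr). */
static unsigned int count_present(const bool *present, unsigned int start,
				  unsigned int nr)
{
	unsigned int i, n = 0;

	for (i = start; i < start + nr; i++)
		n += present[i];
	return n;
}

/* Returns the number of PTEs "collapsed" across one PMD range. */
static unsigned int forward_collapse(const bool *present, unsigned long enabled)
{
	unsigned int offset = 0, collapsed = 0;
	int smallest, order;

	if (!enabled)
		return 0;
	/* Only orders 2..PMD_ORDER: no order-0/1 anon collapse. */
	assert(!(enabled & 0x3UL) && !(enabled >> (PMD_ORDER + 1)));
	smallest = __builtin_ctzl(enabled);

	while (offset < PMD_NR) {
		/* An offset aligned to 2^k can start at most an order-k range. */
		int max_order = offset ? __builtin_ctz(offset) : PMD_ORDER;

		for (order = max_order; order >= smallest; order--) {
			unsigned int nr = 1u << order;

			if (!(enabled & (1UL << order)))
				continue;	/* not enabled: try next smaller */
			if (count_present(present, offset, nr) == nr) {
				collapsed += nr;	/* stub: collapse succeeds */
				break;
			}
		}

		if (order >= smallest)
			offset += 1u << order;	/* skip the collapsed range */
		else	/* nothing fit: next offset aligned to the smallest order */
			offset = (offset + (1u << smallest)) & ~((1u << smallest) - 1);
	}
	return collapsed;
}
```

With every PTE present except the first and orders {9, 2} enabled, order 9 fails at offset 0, the disabled orders 8..3 are skipped, order 2 fails on [0, 4), and every later order-2 range collapses, for 508 PTEs in total.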

-- 
Cheers,

David


* Re: [PATCH mm-unstable v18 11/14] mm/khugepaged: Introduce mTHP collapse support
From: David Hildenbrand (Arm) @ 2026-06-03  9:55 UTC
  To: Nico Pache
  Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
	akpm, anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
	hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, liam, ljs, mathieu.desnoyers, matthew.brost, mhiramat,
	mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, vbabka, vishal.moola,
	wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe,
	Usama Arif, usamaarif642
In-Reply-To: <19639b08-5bf1-4974-9635-c458d512fa38@redhat.com>

On 6/2/26 19:23, Nico Pache wrote:
> 
> 
> On 6/1/26 7:15 AM, David Hildenbrand (Arm) wrote:
>>>
>>> So I looked into your items below. It seems logical, and I think it
>>> works the same way; however, your method seems slightly harder to
>>> understand due to all the edge cases and more error-prone to future
>>> changes (the stack holds implicit knowledge of the offset/order that
>>> must now be tracked in the edge cases).
>>>
>>> Given the stack is 24 bytes, I'm not sure if the extra complexity is
>>> worth saving that small amount of memory. Although we would also be
>>> getting rid of (3?) functions, so both approaches have pros and cons.
>>
>> I consider a simple forward loop over the offset ... less complexity compared to
>> a stack structure :)
>>
>>>
>>> I will implement a patch comparing your solution against mine and send
>>> it here, then we can decide which approach is better.
>>
>> Right, throw it over the fence and I'll see how to improve it further.
> 
> OK, here's what the diff looks like on top of my V19.
> 
> You can access the tree here https://gitlab.com/npache/linux/-/commits/mthp-v19?ref_type=heads for easier review.
> 
> So far I have no problem with this approach; it turned out cleaner than I thought. I did some light testing and will put it through the wringer tomorrow.

It's very clean.

Almost too nice to be true ;)

[...]

>  	unsigned int nr_occupied_ptes, nr_ptes, max_ptes_none;
>  	enum scan_result last_result = SCAN_FAIL;
> -	int collapsed = 0, stack_size = 0;
> +	int collapsed = 0;
>  	bool alloc_failed = false;
>  	unsigned long collapse_address;
> -	struct mthp_range range;
> -	u16 offset;
> -	u8 order;
> +	unsigned int offset = 0;
> +	unsigned int order = HPAGE_PMD_ORDER;


In include/linux/huge_mm.h we have

	highest_order()

and

	next_order()

They essentially allow you to get rid of the test_bit() and just jump to the
next enabled order right away.

I assume with only a handful of enabled_orders, that might be much more efficient.
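The jump-to-next-enabled-order idea can be shown with userspace stand-ins for highest_order() and next_order() that mirror the definitions visible in the huge_mm.h hunk below (highest_order() is fls_long(orders) - 1, and next_order() clears the previous bit first). The helper usable_orders(), PMD_ORDER = 9, and the explicit alignment cap are assumptions for illustration.

```c
#include <assert.h>

#define PMD_ORDER 9

/* Like fls_long(orders) - 1: yields -1 when no bit is set. */
static int highest_order(unsigned long orders)
{
	if (!orders)
		return -1;
	return (int)(sizeof(orders) * 8) - 1 - __builtin_clzl(orders);
}

/* Drop the order just tried and jump straight to the next enabled one. */
static int next_order(unsigned long *orders, int prev)
{
	*orders &= ~(1UL << prev);
	return highest_order(*orders);
}

/*
 * Hypothetical helper: write the enabled orders usable at 'offset' into
 * out[] (room for PMD_ORDER + 1 entries), largest first, and return how
 * many there are. Orders above the offset's alignment are masked off
 * up front, so disabled orders are never visited one by one.
 */
static int usable_orders(unsigned long enabled, unsigned int offset, int *out)
{
	int max = offset ? __builtin_ctz(offset) : PMD_ORDER;
	unsigned long orders;
	int order, n = 0;

	assert(offset < (1u << PMD_ORDER));
	/* Keep bits 0..max, i.e. the equivalent of GENMASK(max, 0). */
	orders = enabled & ((2UL << max) - 1);

	for (order = highest_order(orders); orders;
	     order = next_order(&orders, order))
		out[n++] = order;
	return n;
}
```

In this sketch the exhausted value of highest_order() is -1 rather than 0 (because fls_long(0) is 0), so the loop terminates when the mask becomes empty rather than when order reaches zero.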

I tried to optimize it and ended up with the following, which is completely untested.

I think it might make sense to defer that and start with the simple approach you have.

I do wonder, though, about the last hunk below: should we bail out early if
enabled_orders is suddenly 0?



From 0d8ff955b3071f354b7fc9b627820fa374fa99dc Mon Sep 17 00:00:00 2001
From: "David Hildenbrand (Arm)" <david@kernel.org>
Date: Wed, 3 Jun 2026 11:52:44 +0200
Subject: [PATCH] tmp

Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
---
 include/linux/huge_mm.h |   5 ++
 mm/khugepaged.c         | 132 ++++++++++++++++++++++------------------
 2 files changed, 78 insertions(+), 59 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 48496f09909b..099318bc1181 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -205,6 +205,11 @@ static inline int highest_order(unsigned long orders)
 	return fls_long(orders) - 1;
 }
 
+static inline int smallest_order(unsigned long orders)
+{
+	return __ffs(orders);
+}
+
 static inline int next_order(unsigned long *orders, int prev)
 {
 	*orders &= ~BIT(prev);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6de935e76ceb..49be9d1a88cb 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -99,8 +99,6 @@ static DEFINE_READ_MOSTLY_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
 
 static struct kmem_cache *mm_slot_cache __ro_after_init;
 
-#define KHUGEPAGED_MIN_MTHP_ORDER	2
-
 struct collapse_control {
 	bool is_khugepaged;
 
@@ -1454,76 +1452,86 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long s
  */
 static enum scan_result mthp_collapse(struct mm_struct *mm,
 		unsigned long address, int referenced, int unmapped,
-		struct collapse_control *cc, unsigned long enabled_orders)
+		struct collapse_control *cc, const unsigned long enabled_orders)
 {
-	unsigned int nr_occupied_ptes, nr_ptes, max_ptes_none;
 	enum scan_result last_result = SCAN_FAIL;
 	int collapsed = 0;
 	bool alloc_failed = false;
 	unsigned long collapse_address;
 	unsigned int offset = 0;
-	unsigned int order = HPAGE_PMD_ORDER;
 
+	/* We cannot collapse anon folios to order-1 or order-0. */
+	VM_WARN_ON_ONCE(!enabled_orders || (enabled_orders & 0x3));
 
 	while (offset < HPAGE_PMD_NR) {
-		nr_ptes = 1UL << order;
-
-		if (!test_bit(order, &enabled_orders))
-			goto next_order;
-
-		max_ptes_none = collapse_max_ptes_none(cc, NULL, order);
-		nr_occupied_ptes = bitmap_weight_from(cc->mthp_present_ptes, offset,
-						      offset + nr_ptes);
-
-		if (nr_occupied_ptes >= nr_ptes - max_ptes_none) {
-			enum scan_result ret;
-
-			collapse_address = address + offset * PAGE_SIZE;
-			ret = collapse_huge_page(mm, collapse_address, referenced,
-						 unmapped, cc, order);
-
-			switch (ret) {
-			/* Cases where we continue to next collapse candidate */
-			case SCAN_SUCCEED:
-				collapsed += nr_ptes;
-				fallthrough;
-			case SCAN_PTE_MAPPED_HUGEPAGE:
-				goto next_offset;
-			/* Cases where lower orders might still succeed */
-			case SCAN_ALLOC_HUGE_PAGE_FAIL:
-				alloc_failed = true;
-				fallthrough;
-			case SCAN_LACK_REFERENCED_PAGE:
-			case SCAN_EXCEED_NONE_PTE:
-			case SCAN_EXCEED_SWAP_PTE:
-			case SCAN_EXCEED_SHARED_PTE:
-			case SCAN_PAGE_LOCK:
-			case SCAN_PAGE_COUNT:
-			case SCAN_PAGE_NULL:
-			case SCAN_DEL_PAGE_LRU:
-			case SCAN_PTE_NON_PRESENT:
-			case SCAN_PTE_UFFD_WP:
-			case SCAN_PAGE_LAZYFREE:
-				last_result = ret;
-				goto next_order;
-			/* Cases where no further collapse is possible */
-			case SCAN_PMD_MAPPED:
-				fallthrough;
-			default:
-				last_result = ret;
-				goto done;
+		/*
+		 * We can only collapse to a maximum order for a given offset.
+		 * So ignore all orders that do not apply to the current
+		 * offset, then see if any order to collapse to remains.
+		 */
+		unsigned long orders = enabled_orders & GENMASK(offset ? __ffs(offset) : HPAGE_PMD_ORDER, 0);
+		unsigned int order = highest_order(orders);
+
+		while (order) {
+			const unsigned int nr_ptes = 1UL << order;
+			unsigned int nr_occupied_ptes, max_ptes_none;
+
+			max_ptes_none = collapse_max_ptes_none(cc, NULL, order);
+			nr_occupied_ptes = bitmap_weight_from(cc->mthp_present_ptes, offset,
+							      offset + nr_ptes);
+
+			if (nr_occupied_ptes >= nr_ptes - max_ptes_none) {
+				enum scan_result ret;
+
+				collapse_address = address + offset * PAGE_SIZE;
+				ret = collapse_huge_page(mm, collapse_address, referenced,
+							 unmapped, cc, order);
+
+				switch (ret) {
+				/* Cases where we continue to next collapse candidate */
+				case SCAN_SUCCEED:
+					collapsed += nr_ptes;
+					fallthrough;
+				case SCAN_PTE_MAPPED_HUGEPAGE:
+					goto next_offset;
+				/* Cases where lower orders might still succeed */
+				case SCAN_ALLOC_HUGE_PAGE_FAIL:
+					alloc_failed = true;
+					fallthrough;
+				case SCAN_LACK_REFERENCED_PAGE:
+				case SCAN_EXCEED_NONE_PTE:
+				case SCAN_EXCEED_SWAP_PTE:
+				case SCAN_EXCEED_SHARED_PTE:
+				case SCAN_PAGE_LOCK:
+				case SCAN_PAGE_COUNT:
+				case SCAN_PAGE_NULL:
+				case SCAN_DEL_PAGE_LRU:
+				case SCAN_PTE_NON_PRESENT:
+				case SCAN_PTE_UFFD_WP:
+				case SCAN_PAGE_LAZYFREE:
+					last_result = ret;
+					break;
+				/* Cases where no further collapse is possible */
+				case SCAN_PMD_MAPPED:
+					fallthrough;
+				default:
+					last_result = ret;
+					goto done;
+				}
 			}
-		}
 
-next_order:
-		if (order > KHUGEPAGED_MIN_MTHP_ORDER &&
-			(BIT(order) - 1) & enabled_orders) {
-			order = order - 1;
-			continue;
+			order = next_order(&orders, order);
 		}
+
 next_offset:
-		offset += nr_ptes;
-		order = min_t(int, __ffs(offset), HPAGE_PMD_ORDER);
+		/*
+		 * Continue with the next collapse candidate. If we do not
+		 * have an order, skip to the next smallest mTHP we can collapse to.
+		 */
+		if (order)
+			offset += 1UL << order;
+		else
+			offset = ALIGN(offset + 1, 1UL << smallest_order(enabled_orders));
 	}
 done:
 	if (collapsed)
@@ -1567,6 +1575,12 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
 
 	enabled_orders = collapse_allowable_orders(vma, vma->vm_flags, tva_flags);
 
+	if (unlikely(!enabled_orders)) {
+		cc->progress++;
+		result = SCAN_SUCCEED;
+		goto out;
+	}
+
 	/*
 	 * If PMD is the only enabled order, enforce max_ptes_none, otherwise
 	 * scan all pages to populate the bitmap for mTHP collapse.
-- 
2.43.0


-- 
Cheers,

David


* Re: [PATCH v2 07/13] rv: Simply hybrid automata monitors's clock variables
From: Gabriele Monaco @ 2026-06-03  9:27 UTC
  To: Nam Cao
  Cc: Wander Lairson Costa, Steven Rostedt, linux-trace-kernel,
	linux-kernel
In-Reply-To: <ed1719ebe4af8872673af4264fdbf9ad96425b7f.1779956342.git.namcao@linutronix.de>

s/Simply/Simplify/ in the patch title.

On Thu, 2026-05-28 at 10:27 +0200, Nam Cao wrote:
>  /*
> @@ -389,14 +357,14 @@ static inline void ha_setup_timer(struct ha_monitor *ha_mon)
>  static inline void ha_start_timer_jiffy(struct ha_monitor *ha_mon, enum envs env,
>  					u64 expire, u64 time_ns)
>  {
> -	u64 passed = ha_invariant_passed_jiffy(ha_mon, env, expire, time_ns);
> +	u64 passed = ha_invariant_passed_jiffy(ha_mon, env, time_ns);
>  
>  	mod_timer(&ha_mon->timer, get_jiffies_64() + expire - passed);
>  }
>  static inline void ha_start_timer_ns(struct ha_monitor *ha_mon, enum envs env,
>  				     u64 expire, u64 time_ns)
>  {
> -	u64 passed = ha_invariant_passed_ns(ha_mon, env, expire, time_ns);
> +	u64 passed = ha_invariant_passed_ns(ha_mon, env, time_ns);
>  
>  	ha_start_timer_jiffy(ha_mon, ENV_MAX_STORED,
>  			     nsecs_to_jiffies(expire - passed + TICK_NSEC - 1), time_ns);
> @@ -438,7 +406,7 @@ static inline void ha_start_timer_ns(struct ha_monitor *ha_mon, enum envs env,
>  				     u64 expire, u64 time_ns)
>  {
>  	int mode = HRTIMER_MODE_REL_HARD;
> -	u64 passed = ha_invariant_passed_ns(ha_mon, env, expire, time_ns);
> +	u64 passed = ha_invariant_passed_ns(ha_mon, env, time_ns);
>  

You also need to remove expire from ha_invariant_passed_jiffy in the
hrtimer flavour (just set HA_TIMER_HRTIMER in stall and you'll see it
won't compile). Jiffy-granularity monitors with hrtimers are an
unlikely use case, but they are still supported.

Other than that it looks good.

Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>

Thanks,
Gabriele

>  	if (RV_MON_TYPE == RV_MON_PER_CPU)
>  		mode |= HRTIMER_MODE_PINNED;
> diff --git a/kernel/trace/rv/monitors/nomiss/nomiss.c b/kernel/trace/rv/monitors/nomiss/nomiss.c
> index a0b5641a1858..19d0e9aa4d58 100644
> --- a/kernel/trace/rv/monitors/nomiss/nomiss.c
> +++ b/kernel/trace/rv/monitors/nomiss/nomiss.c
> @@ -57,24 +57,12 @@ static inline bool ha_verify_invariants(struct ha_monitor *ha_mon,
>  					enum states next_state, u64 time_ns)
>  {
>  	if (curr_state == ready_nomiss)
> -		return ha_check_invariant_ns(ha_mon, clk_nomiss, time_ns);
> +		return ha_check_invariant_ns(ha_mon, clk_nomiss, time_ns, DEADLINE_NS(ha_mon));
>  	else if (curr_state == running_nomiss)
> -		return ha_check_invariant_ns(ha_mon, clk_nomiss, time_ns);
> +		return ha_check_invariant_ns(ha_mon, clk_nomiss, time_ns, DEADLINE_NS(ha_mon));
>  	return true;
>  }
>  
> -static inline void ha_convert_inv_guard(struct ha_monitor *ha_mon,
> -					enum states curr_state, enum events event,
> -					enum states next_state, u64 time_ns)
> -{
> -	if (curr_state == next_state)
> -		return;
> -	if (curr_state == ready_nomiss)
> -		ha_inv_to_guard(ha_mon, clk_nomiss, DEADLINE_NS(ha_mon), time_ns);
> -	else if (curr_state == running_nomiss)
> -		ha_inv_to_guard(ha_mon, clk_nomiss, DEADLINE_NS(ha_mon), time_ns);
> -}
> -
>  static inline bool ha_verify_guards(struct ha_monitor *ha_mon,
>  				    enum states curr_state, enum events event,
>  				    enum states next_state, u64 time_ns)
> @@ -122,8 +110,6 @@ static bool ha_verify_constraint(struct ha_monitor *ha_mon,
>  	if (!ha_verify_invariants(ha_mon, curr_state, event, next_state, time_ns))
>  		return false;
>  
> -	ha_convert_inv_guard(ha_mon, curr_state, event, next_state, time_ns);
> -
>  	if (!ha_verify_guards(ha_mon, curr_state, event, next_state, time_ns))
>  		return false;
>  
> diff --git a/kernel/trace/rv/monitors/stall/stall.c b/kernel/trace/rv/monitors/stall/stall.c
> index 9ccfda6b0e73..1aa65d7e690d 100644
> --- a/kernel/trace/rv/monitors/stall/stall.c
> +++ b/kernel/trace/rv/monitors/stall/stall.c
> @@ -38,7 +38,7 @@ static inline bool ha_verify_invariants(struct ha_monitor *ha_mon,
>  					enum states next_state, u64 time_ns)
>  {
>  	if (curr_state == enqueued_stall)
> -		return ha_check_invariant_jiffy(ha_mon, clk_stall, time_ns);
> +		return ha_check_invariant_jiffy(ha_mon, clk_stall, time_ns, threshold_jiffies);
>  	return true;
>  }
>  
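The quoted ha_start_timer_ns() converts the remaining time (expire - passed) to jiffies after adding TICK_NSEC - 1, which turns a truncating division into a ceiling so the timer does not fire before the invariant bound. A userspace sketch of that arithmetic, assuming HZ = 250 (TICK_NSEC = 4,000,000 ns) and using plain division as a stand-in for nsecs_to_jiffies(); remaining_ticks() and ns_to_ticks() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define HZ 250ULL				/* assumption */
#define TICK_NSEC (NSEC_PER_SEC / HZ)		/* 4,000,000 ns per tick */

/* Stand-in for nsecs_to_jiffies(): truncating division by the tick length. */
static uint64_t ns_to_ticks(uint64_t ns)
{
	return ns / TICK_NSEC;
}

/*
 * Hypothetical helper: ticks left before an invariant with bound 'expire'
 * (in ns) is violated, given 'passed' ns already elapsed on the clock.
 * Adding TICK_NSEC - 1 rounds up, so the timer fires no earlier than
 * the bound.
 */
static uint64_t remaining_ticks(uint64_t expire, uint64_t passed)
{
	assert(passed <= expire);
	return ns_to_ticks(expire - passed + TICK_NSEC - 1);
}
```

For example, 10 ms remaining is 2.5 ticks at HZ = 250 and is armed as 3 ticks, whereas a plain nsecs_to_jiffies() would give 2 and could fire early.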



* Re: [PATCH v7 07/42] KVM: guest_memfd: Only prepare folios for private pages
From: Suzuki K Poulose @ 2026-06-03  8:58 UTC
  To: Ackerley Tng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
	david, ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, tabba, willy, wyihan, yan.y.zhao, forkloop,
	pratyush, aneesh.kumar, liam, Paolo Bonzini, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka
  Cc: kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <CAEvNRgE1dCVAxJWd_hyFa8N=m9JLfn97ip9tAmvHxspWJ50oGg@mail.gmail.com>

On 02/06/2026 23:41, Ackerley Tng wrote:
> Suzuki K Poulose <suzuki.poulose@arm.com> writes:
> 
>>
>> [...snip...]
>>
>>>> @@ -914,7 +916,8 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct
>>>> kvm_memory_slot *slot,
>>>>            folio_mark_uptodate(folio);
>>>>        }
>>>> -    r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
>>>> +    if (kvm_gmem_is_private_mem(inode, index))
>>>
>>> Don't we need to make sure the entire folio is private? Not just the
>>> page at the index?
>>>       if (kvm_gmem_range_is_private(, index, folio_nr_pages(folio)) ?
> 
> I was planning to fix this when I do huge pages; for now guest_memfd is
> always just PAGE_SIZE, so looking up just the index is fine.
> 
> Is that okay?

That's fine, but it would be good to enforce that here, so that we don't
miss it when we add support for multi-page folios.

> 
>>
>> Or rather, we should go through the individual pages and apply the
>> prepare for ones that are private ?
>>
>> Suzuki
>>
> 
> IIRC the plan was to make kvm_gmem_prepare_folio() idempotent, as in, if
> a page is already private, just skip. Currently sev_gmem_prepare() does
> a pr_debug(), which I guess is technically still idempotent.
> 
> I'm thinking that the information that needs tracking to make
> .gmem_prepare() idempotent should be tracked by arch code.
> 
> Does this work for ARM CCA?

We don't hook into the prepare yet, but have plans to do that. We should
be able to handle the pages that are already private. (In the CCA context,
RMI_GRANULE_DELEGATE_RANGE can skip over pages that are already REALM.) So this
should be fine.

My point is, in a given folio, there may be pages that are shared.
Like you said, this could be dealt with when we support hugepages.

Suzuki
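The two options raised above (require the whole folio to be private, or walk the folio and prepare only its private pages) combined with the idempotent-prepare plan can be sketched with a per-page state array. Everything here is hypothetical: the enum, struct page_info, the prepared flag, and the helper names stand in for whatever tracking guest_memfd and the arch code (e.g. sev_gmem_prepare() or CCA granule delegation) would actually use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum page_state { PAGE_SHARED, PAGE_PRIVATE };	/* hypothetical */

struct page_info {
	enum page_state state;
	bool prepared;		/* hypothetical arch-side tracking */
};

/* Option 1: is every page in [index, index + nr) private? */
static bool range_is_private(const struct page_info *pages, size_t index,
			     size_t nr)
{
	for (size_t i = index; i < index + nr; i++)
		if (pages[i].state != PAGE_PRIVATE)
			return false;
	return true;
}

/* Idempotent prepare: a page that is already prepared is skipped. */
static bool prepare_page(struct page_info *pg)
{
	if (pg->prepared)
		return false;
	pg->prepared = true;	/* e.g. delegate the page to the realm */
	return true;
}

/* Option 2: prepare only the private pages of a folio; returns how many. */
static size_t prepare_private_pages(struct page_info *pages, size_t index,
				    size_t nr)
{
	size_t done = 0;

	assert(pages != NULL);
	for (size_t i = index; i < index + nr; i++)
		if (pages[i].state == PAGE_PRIVATE && prepare_page(&pages[i]))
			done++;
	return done;
}
```

Because prepare_page() is idempotent in this sketch, calling prepare_private_pages() twice on the same mixed folio prepares each private page exactly once and never touches the shared ones.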


> 
>>>
>>> [...snip...]
>>>



* Re: [syzbot] [trace?] KASAN: use-after-free Write in ring_buffer_read_page
From: Alexander Potapenko @ 2026-06-03  8:49 UTC
  To: Aleksandr Nogikh
  Cc: Masami Hiramatsu, Steven Rostedt, syzbot, linux-kernel,
	linux-trace-kernel, mathieu.desnoyers, syzkaller-bugs
In-Reply-To: <CANp29Y55QBfKT=FLpn=trH5Tmxj2P_7H7yhJG_xXCbCdR3Lv_A@mail.gmail.com>

On Wed, Jun 3, 2026 at 8:38 AM Aleksandr Nogikh <nogikh@google.com> wrote:
>
> On Wed, Jun 3, 2026 at 3:34 AM 'Masami Hiramatsu' via syzkaller-bugs
> <syzkaller-bugs@googlegroups.com> wrote:
> >
> > On Tue, 2 Jun 2026 12:28:29 -0400
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > > On Tue, 02 Jun 2026 06:45:31 -0700
> > > syzbot <syzbot+2dd9d02f60775ce5c1fb@syzkaller.appspotmail.com> wrote:
> > >
> > > > syzbot found the following issue on:
> > > >
> > > > HEAD commit:    e7ae89a0c97c Linux 7.1-rc5
> > > > git tree:       upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=16f06e2e580000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=58acee1ac5406016
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=2dd9d02f60775ce5c1fb
> > > > compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
> > > >
> > > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > Looks like the test was doing something really weird to trigger this.
> > > Without a reproducer, it's pretty much impossible to find out what
> > > happened. Maybe AI could do it?
> > >
> >
> > Does "I don't have any reproducer for this issue yet." mean that
> > this is not reproducible even when the exact same sequence from
> > the console output is replayed? If so, might this be a timing-related
> > issue? (e.g. read vs. write event)
>
> Yes, syzbot normally replays the sequence of the last programs executed
> on the crashed VM to find a reproducer, and, in many cases, they no
> longer crash the kernel.
>
> Meanwhile, syzbot's AI bug reproduction functionality has found
> a C reproducer for a KASAN crash in the kernel/trace's ring buffer,
> although with a slightly different stack trace:
> https://syzkaller.appspot.com/ai_job?id=b2620161-1632-4d4e-9314-114a8a5e79ef
>
> Cc Alexander Potapenko

Yes, the bug that the AI reproduced manifests with a different stack:

BUG: KASAN: slab-use-after-free in instrument_copy_to_user include/linux/instrumented.h:129 [inline]
BUG: KASAN: slab-use-after-free in _inline_copy_to_user include/linux/uaccess.h:205 [inline]
BUG: KASAN: slab-use-after-free in _copy_to_user+0x79/0xb0 lib/usercopy.c:26
Read of size 12288 at addr ffff888180423000 by task syz-executor144/5941

CPU: 1 UID: 0 PID: 5941 Comm: syz-executor144 Not tainted syzkaller #1 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 check_region_inline mm/kasan/generic.c:-1 [inline]
 kasan_check_range+0x264/0x2c0 mm/kasan/generic.c:200
 instrument_copy_to_user include/linux/instrumented.h:129 [inline]
 _inline_copy_to_user include/linux/uaccess.h:205 [inline]
 _copy_to_user+0x79/0xb0 lib/usercopy.c:26
 copy_to_user include/linux/uaccess.h:236 [inline]
 tracing_buffers_read+0x4cd/0xd60 kernel/trace/trace.c:7158
 vfs_read+0x20c/0xa70 fs/read_write.c:572
 ksys_read+0x150/0x270 fs/read_write.c:717
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x15f/0x560 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f9facd00cde
Code: 08 0f 85 f5 e2 ff ff 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> 90 41 57 41 56 4d 89 c6 41 55 4d 89 cd 41 54 55 53 48 83 ec 08
RSP: 002b:00007f9fabc9e198 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 00007f9fabca26c0 RCX: 00007f9facd00cde
RDX: 0000000000004000 RSI: 00007f9fabc9e200 RDI: 0000000000000006
RBP: 00007f9fabc9e200 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9facd7ab20
R13: 0000000000000000 R14: 00007ffd6ed73110 R15: 00007ffd6ed731f8
 </TASK>

Quoting the AI itself:

"""
The reproducer successfully triggered a KASAN use-after-free crash in
the tracing subsystem. Although the exact crash signature differs
slightly (a read in `_copy_to_user` called from `tracing_buffers_read`
vs a write in `ring_buffer_read_page` called from
`tracing_buffers_read`), both crashes are use-after-free bugs on ring
buffer pages accessed during `tracing_buffers_read`. The reproduced
crash shows the page being freed by `ring_buffer_subbuf_order_set`
(via `buffer_subbuf_size_write`) while being concurrently accessed by
`tracing_buffers_read`. This confirms the underlying race condition
between reading the trace buffers and modifying the buffer size/order
has been successfully reproduced.
"""

I took a glance at the reports, and the above makes sense: we just
happen to access filp->private_data->spare at different times after it
has been freed.
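To make the read-vs-resize window concrete, here is a deliberately simplified userspace model in C. It is not the kernel code and not a proposed fix: every name (model_buffer, model_reader, model_resize(), model_read()) is hypothetical, and "revalidate the reader's cached page under the same lock the resize path takes" is only one assumed way to close this kind of window.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: a resizable buffer shared by a reader and a resizer. */
struct model_buffer {
	pthread_mutex_t lock;     /* serializes reads against resizes */
	size_t subbuf_size;       /* current sub-buffer size in bytes */
	unsigned long generation; /* bumped on every resize */
};

/* Hypothetical per-open-file reader state (stands in for ->spare). */
struct model_reader {
	char *spare;              /* reader's cached copy page */
	size_t spare_size;        /* size the page was allocated with */
	unsigned long spare_gen;  /* buffer generation at allocation time */
};

/* Resize path: change the geometry under the lock and invalidate caches. */
static void model_resize(struct model_buffer *b, size_t new_size)
{
	pthread_mutex_lock(&b->lock);
	b->subbuf_size = new_size;
	b->generation++;
	pthread_mutex_unlock(&b->lock);
}

/*
 * Read path: revalidate the cached page under the same lock, so it is never
 * used with a size from before a resize. Returns the number of bytes copied,
 * or -1 if the page could not be (re)allocated.
 */
static long model_read(struct model_buffer *b, struct model_reader *r,
		       char *out, size_t count)
{
	long ret;

	pthread_mutex_lock(&b->lock);
	if (r->spare && r->spare_gen != b->generation) {
		free(r->spare);            /* stale: sized for the old geometry */
		r->spare = NULL;
	}
	if (!r->spare) {
		r->spare = calloc(1, b->subbuf_size);
		if (!r->spare) {
			pthread_mutex_unlock(&b->lock);
			return -1;
		}
		r->spare_size = b->subbuf_size;
		r->spare_gen = b->generation;
	}
	assert(r->spare_size == b->subbuf_size);
	if (count > r->spare_size)
		count = r->spare_size;     /* never copy past the live page */
	memcpy(out, r->spare, count);
	ret = (long)count;
	pthread_mutex_unlock(&b->lock);
	return ret;
}
```

For example, read once with a 4096-byte sub-buffer, call model_resize() with 12288, and read again: the second read drops the stale page and copies from a freshly allocated one instead of touching memory that belonged to the old size.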

P.S. Please bear with repro-c; it's still taking its first steps.
The reproducer contains some dead code, and the results are hard to navigate.
At some point we'll probably be able to link AI-generated repros from
the original bugs.

^ permalink raw reply

* Re: [PATCH 1/2] tracing: work around -Wmissing-format-attribute warning
From: Arnd Bergmann @ 2026-06-03  8:41 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Andy Shevchenko, Arnd Bergmann, Steven Rostedt, Masami Hiramatsu,
	Andrew Morton, Petr Mladek, Nathan Chancellor, Dennis Dalessandro,
	Jason Gunthorpe, Leon Romanovsky, Arend van Spriel,
	Miri Korenblit, Mathieu Desnoyers, Sergey Senozhatsky,
	Nick Desaulniers, Bill Wendling, Justin Stitt,
	Vlastimil Babka (SUSE), linux-rdma, linux-kernel, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-trace-kernel, llvm
In-Reply-To: <875x40hz7k.fsf@prevas.dk>

On Wed, Jun 3, 2026, at 09:15, Rasmus Villemoes wrote:
> On Tue, Jun 02 2026, "Arnd Bergmann" <arnd@arndb.de> wrote:
>> On Tue, Jun 2, 2026, at 20:59, Andy Shevchenko wrote:
>>> On Tue, Jun 02, 2026 at 05:07:05PM +0200, Arnd Bergmann wrote:
>
> May I suggest a different approach, that avoids having that extra
> function emitted (which presumably compiles to a single jump
> instruction, but still, with retpoline and CFI and all that it all adds
> up): Keep the declaration of __vsnprintf() in the header without the
> __printf() attribute, but then do
>
> int __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args) 
>    __alias(vsnprintf);
>
> in vsprintf.c. Aside from reusing the same entry point, I could well
> imagine a compiler some day complaining about seeing the printf
> attribute applied in a local extra declaration but not having it in the
> header file.
>
> Presumably it will need its own EXPORT_SYMBOL if any of the intended
> users are modular, and it certainly still needs a comment.

I had tried that earlier but given up because the attributes have to
match exactly.

This definition works with all currently supported versions of gcc,
but may have to change when there is a new version that adds
even more attributes:

int
__printf(3, 0)
__attribute__((nothrow))
__attribute__((nonnull(1)))
__vsnprintf(char *__restrict buf, size_t size,
            const char * __restrict fmt_str, va_list args)
               __alias(vsnprintf);

We'd probably want to also add __nothrow and __nonnull macros
in linux/compiler-attributes.h if we do this.
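A userspace sketch of the aliasing idea, using the raw `alias` attribute that the kernel's __alias() macro expands to. The demo_* names are hypothetical, the alias attribute needs an ELF target (e.g. Linux), and the sketch does not try to reproduce the exact attribute set found necessary for the kernel build; it only shows one function body reachable through an annotated name and an unannotated one, with no wrapper emitted.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Annotated entry point, standing in for vsnprintf() in lib/vsprintf.c. */
__attribute__((format(printf, 3, 0)))
int demo_vsnprintf(char *buf, size_t size, const char *fmt, va_list args)
{
	return vsnprintf(buf, size, fmt, args);
}

/*
 * Second name for the same code, declared without the format attribute
 * (standing in for __vsnprintf()). No extra function body is generated;
 * both symbols refer to the same address.
 */
int demo_vsnprintf_noattr(char *buf, size_t size, const char *fmt, va_list args)
	__attribute__((alias("demo_vsnprintf")));

/* Caller that forwards its own va_list, like the tracing macros do. */
static int demo_forward(char *buf, size_t size, const char *fmt, va_list args)
{
	return demo_vsnprintf_noattr(buf, size, fmt, args);
}

__attribute__((format(printf, 3, 4)))
int demo_format(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = demo_forward(buf, size, fmt, ap);
	va_end(ap);
	assert(ret >= 0);
	return ret;
}
```

Because the alias shares the target's entry point, there is no extra jump (and no extra retpoline or CFI stub) on the call path, which is the overhead the wrapper-function version would add.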

For reference, see below for the alternative idea I had
that avoids adding the __vsnprintf() alias altogether by
passing down the va_format using "%pV".

I don't think I actually got this one right in the end
since I only build-tested it, but I expect it could be done
if someone is able to test and fix all the corner cases
properly.
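`%pV` exists only in the kernel's vsprintf, so this userspace sketch mirrors what the patch relies on instead: the length pass (stage 5, __trace_event_vstr_len()) and the assignment pass (stage 6, __assign_vstr()) each walk the same arguments, so each needs its own va_copy() of `*vaf->va`. The kernel's `%pV` handler takes its own va_copy() before formatting, which is why handing the same va_format pointer to both passes is expected to work. The demo_* names and DEMO_STR_MAX are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_STR_MAX 256  /* hypothetical stand-in for TRACE_EVENT_STR_MAX */

/* Userspace mirror of the kernel's struct va_format. */
struct demo_va_format {
	const char *fmt;
	va_list *va;
};

/* Pass 1 (like __trace_event_vstr_len): size the string, including NUL. */
static int demo_vstr_len(const struct demo_va_format *vf)
{
	va_list ap;
	int len;

	va_copy(ap, *vf->va);      /* leave the caller's va_list untouched */
	len = vsnprintf(NULL, 0, vf->fmt, ap) + 1;
	va_end(ap);
	return len < DEMO_STR_MAX ? len : DEMO_STR_MAX;
}

/* Pass 2 (like __assign_vstr): format into the space reserved by pass 1. */
static void demo_assign_vstr(char *dst, size_t size,
			     const struct demo_va_format *vf)
{
	va_list ap;

	va_copy(ap, *vf->va);      /* second, independent walk of the args */
	vsnprintf(dst, size, vf->fmt, ap);
	va_end(ap);
}

/* Caller building a va_format the way the sample trace event now does. */
__attribute__((format(printf, 3, 4)))
static int demo_record(char *dst, size_t size, const char *fmt, ...)
{
	struct demo_va_format vf = { .fmt = fmt };
	va_list va;
	int len;

	va_start(va, fmt);
	vf.va = &va;
	len = demo_vstr_len(&vf);                   /* pass 1 */
	assert(len > 0 && (size_t)len <= size);     /* reserved space fits */
	demo_assign_vstr(dst, (size_t)len, &vf);    /* pass 2 */
	va_end(va);
	return len;
}
```

Without the va_copy() in each pass, the second walk would start from an already consumed va_list and format garbage.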

       Arnd

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 4715330c7b6b..8e44fc3e60b0 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -956,14 +956,11 @@ perf_trace_buf_submit(void *raw_data, int size, int rctx, u16 type,
  * gcc warns that you can not use a va_list in an inlined
  * function. But lets me make it into a macro :-/
  */
-#define __trace_event_vstr_len(fmt, va)			\
+#define __trace_event_vstr_len(vf)			\
 ({							\
-	va_list __ap;					\
 	int __ret;					\
 							\
-	va_copy(__ap, *(va));				\
-	__ret = __vsnprintf(NULL, 0, fmt, __ap) + 1;	\
-	va_end(__ap);					\
+	__ret = snprintf(NULL, 0, "%pV", vf) + 1;	\
 							\
 	min(__ret, TRACE_EVENT_STR_MAX);		\
 })
diff --git a/samples/trace_events/trace-events-sample.h b/samples/trace_events/trace-events-sample.h
index 1a05fc153353..2f3ee3632e77 100644
--- a/samples/trace_events/trace-events-sample.h
+++ b/samples/trace_events/trace-events-sample.h
@@ -143,20 +143,20 @@
  *         saved string into the "foo" field.
  *
  *   __vstring: This is similar to __string() but instead of taking a
- *         dynamic length, it takes a variable list va_list 'va' variable.
+ *         dynamic length, it takes a variable list va_format 'vaf' variable.
  *         Some event callers already have a message from parameters saved
- *         in a va_list. Passing in the format and the va_list variable
- *         will save just enough on the ring buffer for that string.
- *         Note, the va variable used is a pointer to a va_list, not
- *         to the va_list directly.
+ *         in a va_format. Passing in the va_format variable will save just
+ *	   enough on the ring buffer for that string.
  *
- *           (va_list *va)
+ *           (va_format *vaf)
  *
- *         __vstring(foo, fmt, va)  is similar to:  vsnprintf(foo, fmt, va)
+ *         __vstring(foo, vaf)  is similar to:
+ *
+ *	     vsnprintf(foo, "%pV", vaf)
  *
  *         To assign the string, use the helper macro __assign_vstr().
  *
- *         __assign_vstr(foo, fmt, va);
+ *         __assign_vstr(foo, vaf);
  *
  *         In most cases, the __assign_vstr() macro will take the same
  *         parameters as the __vstring() macro had to declare the string.
@@ -292,9 +292,9 @@ TRACE_EVENT(foo_bar,
 
 	TP_PROTO(const char *foo, int bar, const int *lst,
 		 const char *string, const struct cpumask *mask,
-		 const char *fmt, va_list *va),
+		 struct va_format *vaf),
 
-	TP_ARGS(foo, bar, lst, string, mask, fmt, va),
+	TP_ARGS(foo, bar, lst, string, mask, vaf),
 
 	TP_STRUCT__entry(
 		__array(	char,	foo,    10		)
@@ -303,7 +303,7 @@ TRACE_EVENT(foo_bar,
 		__string(	str,	string			)
 		__bitmask(	cpus,	num_possible_cpus()	)
 		__cpumask(	cpum				)
-		__vstring(	vstr,	fmt,	va		)
+		__vstring(	vstr,	vaf			)
 		__string_len(	lstr,	foo,	bar / 2 < strlen(foo) ? bar / 2 : strlen(foo) )
 	),
 
@@ -314,7 +314,7 @@ TRACE_EVENT(foo_bar,
 		       __length_of(lst) * sizeof(int));
 		__assign_str(str);
 		__assign_str(lstr);
-		__assign_vstr(vstr, fmt, va);
+		__assign_vstr(vstr, vaf);
 		__assign_bitmask(cpus, cpumask_bits(mask), num_possible_cpus());
 		__assign_cpumask(cpum, cpumask_bits(mask));
 	),
diff --git a/include/trace/stages/stage6_event_callback.h b/include/trace/stages/stage6_event_callback.h
index 7d6a6ca6e779..2a4611b20afa 100644
--- a/include/trace/stages/stage6_event_callback.h
+++ b/include/trace/stages/stage6_event_callback.h
@@ -28,7 +28,7 @@
 #define __string_len(item, src, len) __dynamic_array(char, item, -1)
 
 #undef __vstring
-#define __vstring(item, fmt, ap) __dynamic_array(char, item, -1)
+#define __vstring(item, vf) __dynamic_array(char, item, -1)
 
 #undef __assign_str
 #define __assign_str(dst)						\
@@ -41,13 +41,8 @@
 	} while (0)
 
 #undef __assign_vstr
-#define __assign_vstr(dst, fmt, va)					\
-	do {								\
-		va_list __cp_va;					\
-		va_copy(__cp_va, *(va));				\
-		__vsnprintf(__get_str(dst), TRACE_EVENT_STR_MAX, fmt, __cp_va); \
-		va_end(__cp_va);					\
-	} while (0)
+#define __assign_vstr(dst, vf)						\
+	snprintf(__get_str(dst), TRACE_EVENT_STR_MAX, "%pV", vf);
 
 #undef __bitmask
 #define __bitmask(item, nr_bits) __dynamic_array(unsigned long, item, -1)
diff --git a/drivers/infiniband/hw/hfi1/trace_dbg.h b/drivers/infiniband/hw/hfi1/trace_dbg.h
index 05c4f1354269..c96144d516db 100644
--- a/drivers/infiniband/hw/hfi1/trace_dbg.h
+++ b/drivers/infiniband/hw/hfi1/trace_dbg.h
@@ -26,10 +26,10 @@ DECLARE_EVENT_CLASS(hfi1_trace_template,
 		    TP_PROTO(const char *function, struct va_format *vaf),
 		    TP_ARGS(function, vaf),
 		    TP_STRUCT__entry(__string(function, function)
-				     __vstring(msg, vaf->fmt, vaf->va)
+				     __vstring(msg, vaf)
 				     ),
 		    TP_fast_assign(__assign_str(function);
-				   __assign_vstr(msg, vaf->fmt, vaf->va);
+				   __assign_vstr(msg, vaf);
 				   ),
 		    TP_printk("(%s) %s",
 			      __get_str(function),
diff --git a/drivers/net/wireless/ath/ath10k/trace.h b/drivers/net/wireless/ath/ath10k/trace.h
index 68b78ca17eaa..c258ad7de79e 100644
--- a/drivers/net/wireless/ath/ath10k/trace.h
+++ b/drivers/net/wireless/ath/ath10k/trace.h
@@ -52,12 +52,12 @@ DECLARE_EVENT_CLASS(ath10k_log_event,
 	TP_STRUCT__entry(
 		__string(device, dev_name(ar->dev))
 		__string(driver, dev_driver_string(ar->dev))
-		__vstring(msg, vaf->fmt, vaf->va)
+		__vstring(msg, vaf)
 	),
 	TP_fast_assign(
 		__assign_str(device);
 		__assign_str(driver);
-		__assign_vstr(msg, vaf->fmt, vaf->va);
+		__assign_vstr(msg, vaf);
 	),
 	TP_printk(
 		"%s %s %s",
@@ -89,13 +89,13 @@ TRACE_EVENT(ath10k_log_dbg,
 		__string(device, dev_name(ar->dev))
 		__string(driver, dev_driver_string(ar->dev))
 		__field(unsigned int, level)
-		__vstring(msg, vaf->fmt, vaf->va)
+		__vstring(msg, vaf)
 	),
 	TP_fast_assign(
 		__assign_str(device);
 		__assign_str(driver);
 		__entry->level = level;
-		__assign_vstr(msg, vaf->fmt, vaf->va);
+		__assign_vstr(msg, vaf);
 	),
 	TP_printk(
 		"%s %s %s",
diff --git a/drivers/net/wireless/ath/ath11k/trace.h b/drivers/net/wireless/ath/ath11k/trace.h
index 75246b0a82e3..0ac14b72deac 100644
--- a/drivers/net/wireless/ath/ath11k/trace.h
+++ b/drivers/net/wireless/ath/ath11k/trace.h
@@ -127,12 +127,12 @@ DECLARE_EVENT_CLASS(ath11k_log_event,
 	TP_STRUCT__entry(
 		__string(device, dev_name(ab->dev))
 		__string(driver, dev_driver_string(ab->dev))
-		__vstring(msg, vaf->fmt, vaf->va)
+		__vstring(msg, vaf)
 	),
 	TP_fast_assign(
 		__assign_str(device);
 		__assign_str(driver);
-		__assign_vstr(msg, vaf->fmt, vaf->va);
+		__assign_vstr(msg, vaf);
 	),
 	TP_printk(
 		"%s %s %s",
diff --git a/drivers/net/wireless/ath/ath6kl/trace.h b/drivers/net/wireless/ath/ath6kl/trace.h
index 8577aa459c58..d46fe6b675f9 100644
--- a/drivers/net/wireless/ath/ath6kl/trace.h
+++ b/drivers/net/wireless/ath/ath6kl/trace.h
@@ -253,10 +253,10 @@ DECLARE_EVENT_CLASS(ath6kl_log_event,
 	TP_PROTO(struct va_format *vaf),
 	TP_ARGS(vaf),
 	TP_STRUCT__entry(
-		__vstring(msg, vaf->fmt, vaf->va)
+		__vstring(msg, vaf)
 	),
 	TP_fast_assign(
-		__assign_vstr(msg, vaf->fmt, vaf->va);
+		__assign_vstr(msg, vaf);
 	),
 	TP_printk("%s", __get_str(msg))
 );
@@ -281,11 +281,11 @@ TRACE_EVENT(ath6kl_log_dbg,
 	TP_ARGS(level, vaf),
 	TP_STRUCT__entry(
 		__field(unsigned int, level)
-		__vstring(msg, vaf->fmt, vaf->va)
+		__vstring(msg, vaf)
 	),
 	TP_fast_assign(
 		__entry->level = level;
-		__assign_vstr(msg, vaf->fmt, vaf->va);
+		__assign_vstr(msg, vaf);
 	),
 	TP_printk("%s", __get_str(msg))
 );
diff --git a/drivers/net/wireless/ath/trace.h b/drivers/net/wireless/ath/trace.h
index 82aac0a4baff..298a56349ea7 100644
--- a/drivers/net/wireless/ath/trace.h
+++ b/drivers/net/wireless/ath/trace.h
@@ -40,13 +40,13 @@ TRACE_EVENT(ath_log,
 	    TP_STRUCT__entry(
 		    __string(device, wiphy_name(wiphy))
 		    __string(driver, KBUILD_MODNAME)
-		    __vstring(msg, vaf->fmt, vaf->va)
+		    __vstring(msg, vaf)
 	    ),
 
 	    TP_fast_assign(
 		    __assign_str(device);
 		    __assign_str(driver);
-		    __assign_vstr(msg, vaf->fmt, vaf->va);
+		    __assign_vstr(msg, vaf);
 	    ),
 
 	    TP_printk(
diff --git a/drivers/net/wireless/ath/wil6210/trace.h b/drivers/net/wireless/ath/wil6210/trace.h
index 201f44612c31..7eb6ca2b0cb6 100644
--- a/drivers/net/wireless/ath/wil6210/trace.h
+++ b/drivers/net/wireless/ath/wil6210/trace.h
@@ -70,10 +70,10 @@ DECLARE_EVENT_CLASS(wil6210_log_event,
 	TP_PROTO(struct va_format *vaf),
 	TP_ARGS(vaf),
 	TP_STRUCT__entry(
-		__vstring(msg, vaf->fmt, vaf->va)
+		__vstring(msg, vaf)
 	),
 	TP_fast_assign(
-		__assign_vstr(msg, vaf->fmt, vaf->va);
+		__assign_vstr(msg, vaf);
 	),
 	TP_printk("%s", __get_str(msg))
 );
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/tracepoint.h b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/tracepoint.h
index 6c4e00e9ccd1..66b179adb80c 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/tracepoint.h
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/tracepoint.h
@@ -33,11 +33,11 @@ TRACE_EVENT(brcmf_err,
 	TP_ARGS(func, vaf),
 	TP_STRUCT__entry(
 		__string(func, func)
-		__vstring(msg, vaf->fmt, vaf->va)
+		__vstring(msg, vaf)
 	),
 	TP_fast_assign(
 		__assign_str(func);
-		__assign_vstr(msg, vaf->fmt, vaf->va);
+		__assign_vstr(msg, vaf);
 	),
 	TP_printk("%s: %s", __get_str(func), __get_str(msg))
 );
@@ -48,12 +48,12 @@ TRACE_EVENT(brcmf_dbg,
 	TP_STRUCT__entry(
 		__field(u32, level)
 		__string(func, func)
-		__vstring(msg, vaf->fmt, vaf->va)
+		__vstring(msg, vaf)
 	),
 	TP_fast_assign(
 		__entry->level = level;
 		__assign_str(func);
-		__assign_vstr(msg, vaf->fmt, vaf->va);
+		__assign_vstr(msg, vaf);
 	),
 	TP_printk("%s: %s", __get_str(func), __get_str(msg))
 );
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/brcms_trace_brcmsmac_msg.h b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/brcms_trace_brcmsmac_msg.h
index dc296d8bf775..369171af1a30 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/brcms_trace_brcmsmac_msg.h
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/brcms_trace_brcmsmac_msg.h
@@ -28,10 +28,10 @@ DECLARE_EVENT_CLASS(brcms_msg_event,
 	TP_PROTO(struct va_format *vaf),
 	TP_ARGS(vaf),
 	TP_STRUCT__entry(
-		__vstring(msg, vaf->fmt, vaf->va)
+		__vstring(msg, vaf)
 	),
 	TP_fast_assign(
-		__assign_vstr(msg, vaf->fmt, vaf->va);
+		__assign_vstr(msg, vaf);
 	),
 	TP_printk("%s", __get_str(msg))
 );
@@ -62,12 +62,12 @@ TRACE_EVENT(brcms_dbg,
 	TP_STRUCT__entry(
 		__field(u32, level)
 		__string(func, func)
-		__vstring(msg, vaf->fmt, vaf->va)
+		__vstring(msg, vaf)
 	),
 	TP_fast_assign(
 		__entry->level = level;
 		__assign_str(func);
-		__assign_vstr(msg, vaf->fmt, vaf->va);
+		__assign_vstr(msg, vaf);
 	),
 	TP_printk("%s: %s", __get_str(func), __get_str(msg))
 );
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-msg.h b/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-msg.h
index 0db1fa5477af..80cfb9fc8ad8 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-msg.h
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-msg.h
@@ -18,10 +18,10 @@ DECLARE_EVENT_CLASS(iwlwifi_msg_event,
 	TP_PROTO(struct va_format *vaf),
 	TP_ARGS(vaf),
 	TP_STRUCT__entry(
-		__vstring(msg, vaf->fmt, vaf->va)
+		__vstring(msg, vaf)
 	),
 	TP_fast_assign(
-		__assign_vstr(msg, vaf->fmt, vaf->va);
+		__assign_vstr(msg, vaf);
 	),
 	TP_printk("%s", __get_str(msg))
 );
@@ -53,12 +53,12 @@ TRACE_EVENT(iwlwifi_dbg,
 	TP_STRUCT__entry(
 		__field(u32, level)
 		__string(function, function)
-		__vstring(msg, vaf->fmt, vaf->va)
+		__vstring(msg, vaf)
 	),
 	TP_fast_assign(
 		__entry->level = level;
 		__assign_str(function);
-		__assign_vstr(msg, vaf->fmt, vaf->va);
+		__assign_vstr(msg, vaf);
 	),
 	TP_printk("%s", __get_str(msg))
 );
diff --git a/drivers/usb/chipidea/trace.h b/drivers/usb/chipidea/trace.h
index 1875419cd17f..9ec0df074872 100644
--- a/drivers/usb/chipidea/trace.h
+++ b/drivers/usb/chipidea/trace.h
@@ -28,11 +28,11 @@ TRACE_EVENT(ci_log,
 	TP_ARGS(ci, vaf),
 	TP_STRUCT__entry(
 		__string(name, dev_name(ci->dev))
-		__vstring(msg, vaf->fmt, vaf->va)
+		__vstring(msg, vaf)
 	),
 	TP_fast_assign(
 		__assign_str(name);
-		__assign_vstr(msg, vaf->fmt, vaf->va);
+		__assign_vstr(msg, vaf);
 	),
 	TP_printk("%s: %s", __get_str(name), __get_str(msg))
 );
diff --git a/drivers/usb/host/xhci-trace.h b/drivers/usb/host/xhci-trace.h
index 724cba2dbb78..575c02109b4b 100644
--- a/drivers/usb/host/xhci-trace.h
+++ b/drivers/usb/host/xhci-trace.h
@@ -28,9 +28,9 @@
 DECLARE_EVENT_CLASS(xhci_log_msg,
 	TP_PROTO(struct va_format *vaf),
 	TP_ARGS(vaf),
-	TP_STRUCT__entry(__vstring(msg, vaf->fmt, vaf->va)),
+	TP_STRUCT__entry(__vstring(msg, vaf)),
 	TP_fast_assign(
-		__assign_vstr(msg, vaf->fmt, vaf->va);
+		__assign_vstr(msg, vaf);
 	),
 	TP_printk("%s", __get_str(msg))
 );
diff --git a/drivers/usb/mtu3/mtu3_trace.h b/drivers/usb/mtu3/mtu3_trace.h
index 89870175d635..56c9263a99d8 100644
--- a/drivers/usb/mtu3/mtu3_trace.h
+++ b/drivers/usb/mtu3/mtu3_trace.h
@@ -23,11 +23,11 @@ TRACE_EVENT(mtu3_log,
 	TP_ARGS(dev, vaf),
 	TP_STRUCT__entry(
 		__string(name, dev_name(dev))
-		__vstring(msg, vaf->fmt, vaf->va)
+		__vstring(msg, vaf)
 	),
 	TP_fast_assign(
 		__assign_str(name);
-		__assign_vstr(msg, vaf->fmt, vaf->va);
+		__assign_vstr(msg, vaf);
 	),
 	TP_printk("%s: %s", __get_str(name), __get_str(msg))
 );
diff --git a/drivers/usb/musb/musb_trace.h b/drivers/usb/musb/musb_trace.h
index 726e6697d475..7dba44b0496d 100644
--- a/drivers/usb/musb/musb_trace.h
+++ b/drivers/usb/musb/musb_trace.h
@@ -28,11 +28,11 @@ TRACE_EVENT(musb_log,
 	TP_ARGS(musb, vaf),
 	TP_STRUCT__entry(
 		__string(name, dev_name(musb->controller))
-		__vstring(msg, vaf->fmt, vaf->va)
+		__vstring(msg, vaf)
 	),
 	TP_fast_assign(
 		__assign_str(name);
-		__assign_vstr(msg, vaf->fmt, vaf->va);
+		__assign_vstr(msg, vaf);
 	),
 	TP_printk("%s: %s", __get_str(name), __get_str(msg))
 );
diff --git a/include/trace/events/iscsi.h b/include/trace/events/iscsi.h
index 990fd154f586..2e2667658b51 100644
--- a/include/trace/events/iscsi.h
+++ b/include/trace/events/iscsi.h
@@ -26,12 +26,12 @@ DECLARE_EVENT_CLASS(iscsi_log_msg,
 
 	TP_STRUCT__entry(
 		__string(dname, 	dev_name(dev)		)
-		__vstring(msg,		vaf->fmt, vaf->va)
+		__vstring(msg,		vaf)
 	),
 
 	TP_fast_assign(
 		__assign_str(dname);
-		__assign_vstr(msg, vaf->fmt, vaf->va);
+		__assign_vstr(msg, vaf);
 	),
 
 	TP_printk("%s: %s",__get_str(dname),  __get_str(msg)
diff --git a/include/trace/events/qla.h b/include/trace/events/qla.h
index 74a7534b99b6..554ae9a623c6 100644
--- a/include/trace/events/qla.h
+++ b/include/trace/events/qla.h
@@ -17,11 +17,11 @@ DECLARE_EVENT_CLASS(qla_log_event,
 
 	TP_STRUCT__entry(
 		__string(buf, buf)
-		__vstring(msg, vaf->fmt, vaf->va)
+		__vstring(msg, vaf)
 	),
 	TP_fast_assign(
 		__assign_str(buf);
-		__assign_vstr(msg, vaf->fmt, vaf->va);
+		__assign_vstr(msg, vaf);
 	),
 
 	TP_printk("%s %s", __get_str(buf), __get_str(msg))
diff --git a/include/trace/stages/stage1_struct_define.h b/include/trace/stages/stage1_struct_define.h
index 69e0dae453bf..0ae49a935d16 100644
--- a/include/trace/stages/stage1_struct_define.h
+++ b/include/trace/stages/stage1_struct_define.h
@@ -27,7 +27,7 @@
 #define __string_len(item, src, len) __dynamic_array(char, item, -1)
 
 #undef __vstring
-#define __vstring(item, fmt, ap) __dynamic_array(char, item, -1)
+#define __vstring(item, vf) __dynamic_array(char, item, -1)
 
 #undef __bitmask
 #define __bitmask(item, nr_bits) __dynamic_array(char, item, -1)
diff --git a/include/trace/stages/stage2_data_offsets.h b/include/trace/stages/stage2_data_offsets.h
index 8b0cff06d346..5c6dc3092e07 100644
--- a/include/trace/stages/stage2_data_offsets.h
+++ b/include/trace/stages/stage2_data_offsets.h
@@ -33,7 +33,7 @@
 #define __string_len(item, src, len) __dynamic_array(char, item, -1)
 
 #undef __vstring
-#define __vstring(item, fmt, ap) __dynamic_array(char, item, -1)
+#define __vstring(item, vf) __dynamic_array(char, item, -1)
 
 #undef __bitmask
 #define __bitmask(item, nr_bits) __dynamic_array(unsigned long, item, -1)
diff --git a/include/trace/stages/stage4_event_fields.h b/include/trace/stages/stage4_event_fields.h
index b6f679ae21aa..77f74d509760 100644
--- a/include/trace/stages/stage4_event_fields.h
+++ b/include/trace/stages/stage4_event_fields.h
@@ -42,7 +42,7 @@
 #define __string_len(item, src, len) __dynamic_array(char, item, -1)
 
 #undef __vstring
-#define __vstring(item, fmt, ap) __dynamic_array(char, item, -1)
+#define __vstring(item, vf) __dynamic_array(char, item, -1)
 
 #undef __bitmask
 #define __bitmask(item, nr_bits) __dynamic_array(unsigned long, item, -1)
diff --git a/include/trace/stages/stage5_get_offsets.h b/include/trace/stages/stage5_get_offsets.h
index c6a62dfb18ef..1ce5ca15a8ed 100644
--- a/include/trace/stages/stage5_get_offsets.h
+++ b/include/trace/stages/stage5_get_offsets.h
@@ -65,8 +65,8 @@ static inline const char *__string_src(const char *str)
 	__data_offsets->item##_ptr_ = src;
 
 #undef __vstring
-#define __vstring(item, fmt, ap) __dynamic_array(char, item,		\
-		      __trace_event_vstr_len(fmt, ap))
+#define __vstring(item, vf) __dynamic_array(char, item,		\
+		      __trace_event_vstr_len(vf))
 
 #undef __rel_dynamic_array
 #define __rel_dynamic_array(type, item, len)				\
diff --git a/net/batman-adv/trace.h b/net/batman-adv/trace.h
index 7da692ec38e9..ac88789330a3 100644
--- a/net/batman-adv/trace.h
+++ b/net/batman-adv/trace.h
@@ -36,13 +36,13 @@ TRACE_EVENT(batadv_dbg,
 	    TP_STRUCT__entry(
 		    __string(device, bat_priv->mesh_iface->name)
 		    __string(driver, KBUILD_MODNAME)
-		    __vstring(msg, vaf->fmt, vaf->va)
+		    __vstring(msg, vaf)
 	    ),
 
 	    TP_fast_assign(
 		    __assign_str(device);
 		    __assign_str(driver);
-		    __assign_vstr(msg, vaf->fmt, vaf->va);
+		    __assign_vstr(msg, vaf);
 	    ),
 
 	    TP_printk(
diff --git a/net/mac80211/trace_msg.h b/net/mac80211/trace_msg.h
index aea4ce55c5ac..0de50dfa13ed 100644
--- a/net/mac80211/trace_msg.h
+++ b/net/mac80211/trace_msg.h
@@ -22,11 +22,11 @@ DECLARE_EVENT_CLASS(mac80211_msg_event,
 	TP_ARGS(vaf),
 
 	TP_STRUCT__entry(
-		__vstring(msg, vaf->fmt, vaf->va)
+		__vstring(msg, vaf)
 	),
 
 	TP_fast_assign(
-		__assign_vstr(msg, vaf->fmt, vaf->va);
+		__assign_vstr(msg, vaf);
 	),
 
 	TP_printk("%s", __get_str(msg))
diff --git a/samples/trace_events/trace-events-sample.c b/samples/trace_events/trace-events-sample.c
index ecc7db237f2e..07096eadfb7b 100644
--- a/samples/trace_events/trace-events-sample.c
+++ b/samples/trace_events/trace-events-sample.c
@@ -23,6 +23,7 @@ static void do_simple_thread_func(int cnt, const char *fmt, ...)
 {
 	unsigned long bitmask[1] = {0xdeadbeefUL};
 	va_list va;
+	struct va_format vf = { .fmt = fmt };
 	int array[6];
 	int len = cnt % 5;
 	int i;
@@ -35,10 +36,11 @@ static void do_simple_thread_func(int cnt, const char *fmt, ...)
 	array[i] = 0;
 
 	va_start(va, fmt);
+	vf.va = &va;
 
 	/* Silly tracepoints */
 	trace_foo_bar("hello", cnt, array, random_strings[len],
-		      current->cpus_ptr, fmt, &va);
+		      current->cpus_ptr, &vf);
 
 	va_end(va);
 

^ permalink raw reply related

* Re: [PATCH mm-unstable v18 11/14] mm/khugepaged: Introduce mTHP collapse support
From: David Hildenbrand (Arm) @ 2026-06-03  8:05 UTC (permalink / raw)
  To: Lance Yang, Nico Pache
  Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
	akpm, anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
	hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
	liam, ljs, mathieu.desnoyers, matthew.brost, mhiramat, mhocko,
	peterx, pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang,
	rientjes, rostedt, rppt, ryan.roberts, shivankg, sunnanyong,
	surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe
In-Reply-To: <185f5699-3797-4300-8c54-bb99fc2a45e0@linux.dev>

On 6/2/26 17:44, Lance Yang wrote:
> 
> 
> On 2026/6/2 18:58, Nico Pache wrote:
>> On Sun, May 31, 2026 at 1:19 AM Lance Yang <lance.yang@linux.dev> wrote:
>>>
>>>
>>> [...]
>>>
>>> Hmm ... don't we lose the allocation-failure result here?
>>>
>>> Previously collapse_scan_pmd() propagated SCAN_ALLOC_HUGE_PAGE_FAIL from
>>> collapse_huge_page(), so khugepaged would call khugepaged_alloc_sleep()
>>> in khugepaged_do_scan().
>>>
>>> Now if allocation fails and nr_collapsed stays 0, we just return
>>> SCAN_FAIL. So we won't back off via khugepaged_alloc_sleep() anymore?
>>
>> Ok I did the error propagation! I think I handled both of these cases
>> you brought up pretty easily.
> 
> Thanks.
> 
>> However I don't know what to do in the following case: We successfully
>> collapsed some portion of the PMD, but during that process, we also
>> hit an allocation failure. Is it best to back off entirely? or can we
>> treat some forward progress as a sign we can continue trying collapses
>> without sleeping.
>>
>> Basically, do we prioritize SCAN_ALLOC_HUGE_PAGE_FAIL or the
>> successful collapses as the returned value?
> 
> Thinking out loud, forward progress should win here; the allocation
> failure only matters if we made no progress at all?

Agreed; as a first approach, letting forward progress win makes sense.
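The policy agreed on here can be sketched as a small result-combining helper. The enum and function names are hypothetical: the real khugepaged code uses enum scan_result with many more values, and where nr_collapsed and the allocation-failure flag are tracked is up to the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of khugepaged's scan results. */
enum demo_scan_result {
	DEMO_SCAN_FAIL,
	DEMO_SCAN_SUCCEED,
	DEMO_SCAN_ALLOC_HUGE_PAGE_FAIL,
};

/*
 * Combine the outcomes for one PMD range: any successful collapse wins,
 * and an allocation failure is reported only when nothing was collapsed,
 * so the caller backs off (khugepaged_alloc_sleep()) only in that case.
 */
static enum demo_scan_result demo_pmd_result(int nr_collapsed, bool alloc_failed)
{
	assert(nr_collapsed >= 0);
	if (nr_collapsed > 0)
		return DEMO_SCAN_SUCCEED;
	if (alloc_failed)
		return DEMO_SCAN_ALLOC_HUGE_PAGE_FAIL;
	return DEMO_SCAN_FAIL;
}
```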

-- 
Cheers,

David

^ permalink raw reply

* Re: [PATCH 1/2] tracing: work around -Wmissing-format-attribute warning
From: Rasmus Villemoes @ 2026-06-03  7:15 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Andy Shevchenko, Arnd Bergmann, Steven Rostedt, Masami Hiramatsu,
	Andrew Morton, Petr Mladek, Nathan Chancellor, Dennis Dalessandro,
	Jason Gunthorpe, Leon Romanovsky, Arend van Spriel,
	Miri Korenblit, Mathieu Desnoyers, Sergey Senozhatsky,
	Nick Desaulniers, Bill Wendling, Justin Stitt,
	Vlastimil Babka (SUSE), linux-rdma, linux-kernel, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-trace-kernel, llvm
In-Reply-To: <35c1ba62-e74d-4abc-aa73-ccd35968ff89@app.fastmail.com>

On Tue, Jun 02 2026, "Arnd Bergmann" <arnd@arndb.de> wrote:

> On Tue, Jun 2, 2026, at 20:59, Andy Shevchenko wrote:
>> On Tue, Jun 02, 2026 at 05:07:05PM +0200, Arnd Bergmann wrote:
>>>
>>> A number of tracing headers turn off -Wsuggest-attribute=format for
>>> gcc, but they don't turn it off for clang, so the same warning still
>>> happens on new versions of clang that support the format attribute.
>>>
>>> To avoid duplicating the same thing in each tracing header, as well
>>> as changing all of them to also turn it off for clang, add a new
>>> __vsnprintf() helper that is not annotated this way in linux/sprintf.h
>>> but is defined to work the same way as the regular vsprintf.
>>
>> vsprintf()
>
> Fixed now
>
>> Why is the __printf() annotation in the C file and not here?
>> Is this all about headers as the second paragraph in the commit message
>> explains?
>> I would add a comment to explain it here, otherwise we might see false
>> patches to "make things consistent" in a wrong way.
>
> I've tried to come up with a kerneldoc comment now, similar to
> the one for the vsnprintf() function, and added a separate prototype
> in the header. Does this address your concern?
>
>       Arnd
>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 3caf0796f54d..7c696aea2ed3 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -2975,7 +2975,23 @@ int vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
>  }
>  EXPORT_SYMBOL(vsnprintf);
>
> -int __printf(3, 0) __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
> +/**
> + * __vsnprintf - vsnprintf() wrapper without __printf() attribute
> + * @buf: The buffer to place the result into
> + * @size: The size of the buffer, including the trailing null space
> + * @fmt_str: The format string to use
> + * @args: Arguments for the format string
> + *
> + * This has the exact same behavior as vsnprintf() but can be used in call
> + * sites that are missing a __printf() annotation, e.g. because they
> + * get a 'va_format' argument instead of format and varargs.
> + *
> + * For this to work, the attribute is added to the declaration here but
> + * not in the header.
> + */
> +int __printf(3, 0) __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args);
> +
> +int __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
>  {
>  	return vsnprintf(buf, size, fmt_str, args);
>  }

May I suggest a different approach, that avoids having that extra
function emitted (which presumably compiles to a single jump
instruction, but still, with retpoline and CFI and all that it all adds
up): Keep the declaration of __vsnprintf() in the header without the
__printf() attribute, but then do

int __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args) 
   __alias(vsnprintf);

in vsprintf.c. Aside from reusing the same entry point, I could well
imagine a compiler some day complaining about seeing the printf
attribute applied in a local extra declaration but not having it in the
header file.

Presumably it will need its own EXPORT_SYMBOL if any of the intended
users are modular, and it certainly still needs a comment.

Rasmus

^ permalink raw reply

* Re: [PATCH 1/8] scripts/sorttable: Handle RISC-V patchable ftrace entries
From: Chen Pei @ 2026-06-03  7:14 UTC (permalink / raw)
  To: wanghan
  Cc: acme, alex, andybnac, aou, bjorn, catalin.marinas, conor.dooley,
	cp0613, debug, jikos, joe.lawrence, jpoimboe, linux-kernel,
	linux-kselftest, linux-perf-users, linux-riscv,
	linux-trace-kernel, live-patching, mark.rutland, mbenes, mhiramat,
	mingo, namhyung, palmer, peterz, pjw, pmladek, puranjay, rostedt,
	shuah
In-Reply-To: <20260527123530.2593918-2-wanghan@linux.alibaba.com>

On Wed, 27 May 2026 20:35:23 +0800, wanghan@linux.alibaba.com wrote:

> On an affected RISC-V QEMU boot with both CONFIG_FTRACE_SORT_STARTUP_TEST
> and CONFIG_FTRACE_STARTUP_TEST enabled, the sort check still passes
> while ftrace reports zero usable entries and the early selftests fail:
> 
>   [    0.000000] ftrace section at ffffffff8101da98 sorted properly
>   [    0.000000] ftrace: allocating 0 entries in 128 pages
>   [    0.054999] Testing tracer function: .. no entries found ..FAILED!
>   [    0.172407] tracer: function failed selftest, disabling
>   [    0.178186] Failed to init function_graph tracer, init returned -19
> 
> Handle RISC-V like arm64 for the function-range check and allow
> patchable entries up to 8 bytes before the function address.
> 
> With this fix, a RISC-V QEMU smoke boot with ftrace startup tests shows
> the vmlinux ftrace table is populated and dynamic ftrace still works:
> 
>   [    0.000000] ftrace: allocating 46749 entries in 184 pages
>   [    0.051115] Testing tracer function: PASSED
>   [    1.283782] Testing dynamic ftrace: PASSED
>   [    6.275456] Testing tracer function_graph: PASSED
> 
> Fixes: 0ca1724b56af ("riscv: ftrace: select HAVE_BUILDTIME_MCOUNT_SORT")

Oops, sorry for missing that. Thanks for the quick fix!

Reviewed-by: Chen Pei <cp0613@linux.alibaba.com>
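
A minimal sketch of the kind of range check the quoted commit message describes, assuming an mcount site is accepted when it falls inside its function or up to a fixed number of bytes before the function's start. The helper names are hypothetical and this is not the scripts/sorttable.c code; EM_AARCH64 and EM_RISCV come from <elf.h>.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many bytes before a function's start address a
 * patchable ftrace entry may sit for a given ELF machine.
 */
static unsigned int patchable_bytes_before(uint16_t machine)
{
	switch (machine) {
	case EM_AARCH64:
	case EM_RISCV:
		/* arm64 and RISC-V place patchable entries before the function */
		return 8;
	default:
		return 0;
	}
}

/*
 * Accept call site 'loc' for the function [start, end) if it lies inside
 * the function or at most 'before' bytes ahead of it. Written as
 * loc + before >= start so that start - before cannot underflow.
 */
static bool site_in_function(uint64_t loc, uint64_t start, uint64_t end,
			     unsigned int before)
{
	return loc + before >= start && loc < end;
}
```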

* Re: [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM)
From: Gregory Price @ 2026-06-03  7:02 UTC (permalink / raw)
  To: Balbir Singh
  Cc: lsf-pc, linux-kernel, linux-cxl, cgroups, linux-mm,
	linux-trace-kernel, damon, kernel-team, gregkh, rafael, dakr,
	dave, jonathan.cameron, dave.jiang, alison.schofield,
	vishal.l.verma, ira.weiny, dan.j.williams, longman, akpm, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	osalvador, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
	byungchul, ying.huang, apopple, axelrasmussen, yuanchu, weixugc,
	yury.norov, linux, mhiramat, mathieu.desnoyers, tj, hannes,
	mkoutny, jackmanb, sj, baolin.wang, npache, ryan.roberts,
	dev.jain, baohua, lance.yang, muchun.song, xu.xin16,
	chengming.zhou, jannh, linmiaohe, nao.horiguchi, pfalcato,
	rientjes, shakeel.butt, riel, harry.yoo, cl, roman.gushchin,
	chrisl, kasong, shikemeng, nphamcs, bhe, zhengqi.arch,
	terry.bowman
In-Reply-To: <ah-0CyZurn5D1ezY@parvat>

On Wed, Jun 03, 2026 at 03:00:01PM +1000, Balbir Singh wrote:
> On Tue, Jun 02, 2026 at 09:57:48AM +0100, Gregory Price wrote:
> > On Tue, Jun 02, 2026 at 12:16:50PM +1000, Balbir Singh wrote:
> > > 
> > > I was thinking we wouldn't need explicit flags and that allocations would
> > > happen from user space using __GFP_THISNODE to the node or via a nodemask
> > > based on nodes of interest. Is there a reason to add this flag, a system
> > > might have more than one source of N_MEMORY_PRIVATE?
> > > 
> > 
> > There are a few things to unpack here.  I discussed this many times on
> > list and at LSF, but to reiterate.
> > 
> > 1) __GFP_THISNODE is insufficient to enforce isolation and otherwise
> >    not particularly useful.  Additionally, from userland, it's not
> >    something you can actually set.
> 
> I was thinking mbind()/mempolicy() is how we get to it. It already
> accepts a nodemask.
>

First let me say:  I want to enable mbind access to these nodes.

But let me caveat:  I think that needs more time to develop, and
in the meantime, we can enable the /dev/xxx pattern somewhat trivially.

First let me address a few things about mbind/mempolicy and how it
interacts with page_alloc.c. I gave this overview at LSF, but I don't
remember whether I posted it in any of my follow-ups.

1) Fallback lists are filtered by nodemask, the nodemask does not replace
   the fallback list.

Here is how the page allocator fallback lists and nodemasks interact:

   Fallbacks A:  A B 
   Fallbacks B:  B A
   Fallbacks C:  C A B   (Private)
   Fallbacks D:  D B A   (Private)

Let's say you pass:

   alloc_pages_node(C, ..., nodemask(A,C,D))

So we get

  Fallback(C,A,B) & nodemask(A,C,D) -> iterate(C,A)
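
A toy model of rule (1), using plain arrays for fallback lists and an unsigned long bitmask for the nodemask instead of the allocator's real zonelist and nodemask_t types (all names here are hypothetical). It reproduces the example above: filtering C's fallback list by nodemask(A,C,D) yields C then A, and D is never tried.

```c
#include <stddef.h>

/* Hypothetical node ids for the example above. */
enum { NODE_A, NODE_B, NODE_C, NODE_D };

#define NODE_BIT(n) (1UL << (n))

/*
 * Walk the preferred node's fallback list in order and keep only the nodes
 * that are also set in the nodemask. The mask can only remove candidates;
 * it never adds a node that is missing from the fallback list.
 */
static size_t filter_fallback(const int *fallback, size_t nr,
			      unsigned long nodemask, int *out)
{
	size_t n = 0;

	for (size_t i = 0; i < nr; i++)
		if (nodemask & NODE_BIT(fallback[i]))
			out[n++] = fallback[i];
	return n;
}
```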

If we wanted to change this behavior, realistically we'd be looking for
a way to add specific nodes to certain fallback lists - rather than
modify the nodemask interaction in some way.

I think this is out of scope for the first iteration - so supporting
anything other than mbind() from the start is just pointless.

The only feasible mempolicy you can apply is single-node bind, so
realistically you can only support mbind.

2) Full mempolicy support doesn't really make sense

   task mempolicy PROBABLY should never really touch private nodes,
   while VMA policy certainly can.  Assuming we're able to support
   multi-private-node masks, none of the non-bind mempolicies even
   make sense for most private nodes (interleave? weighted interleave?)

   I haven't worked through all the implications of a task policy having
   a private node attached, but the longer I think about it, the less it
   makes sense to just support this outright.

3) Introducing mbind support is not just a simple nodemask on a VMA;
   it also implies migration, cgroup/cpuset, and UAPI interactions.

   a) migration:

      mbind/mempolicy can and will engage migration when it is called
      with certain flags.  Migration has subtle LRU interactions, but
      the patch set I have at least allows this to work.

   b) cgroup/cpuset:

      cpuset.mems rebinding will cause private nodes to be quietly
      rebound to non-private nodes within a nodemask.

   c) Because of (a) and (b), we really want MPOL_F_STATIC to be required
      for mbind to be applied to a private node, so that it is never
      forcefully remapped.

      That's a UAPI semantic change specific to private nodes that we
      should really take time to consider.

4) File VMA interactions don't entirely make sense with mbind

   In theory you might want:

   fd = open("somefile", ...);
   mem = mmap(fd, ...);
   mbind(mem, ..., private_node);
   for page in mem:
      mem[page_off] /* fault file into private memory */

   In reality: This does not work the way you want.

   I went digging, and we need a few mild extensions to allow
   migration on mbind to work for pagecache pages, and the fault
   path does not always respect the VMA mempolicy.

   You also start getting into the question of "what happens when
   the node is out of memory and you don't have reclaim support?".
   The OOM implications jump out at you pretty aggressively.

   Moreover other tasks can force the page cache pages to be moved
   as well.  So the programming model here just kind of sucks.

   Works great for anon memory though :]
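
A runnable userspace sketch of the anonymous-memory case, assuming a Linux host with CONFIG_NUMA and node 0 online, with node 0 standing in for a private node. It binds a fresh anonymous mapping with the raw mbind(2) syscall (MPOL_BIND, whose uapi value is 2) and then touches the first page, so the fault allocates under the VMA policy. Container seccomp profiles may refuse mbind, so the sketch returns the error rather than treating it as fatal; bind_and_fault() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define DEMO_MPOL_BIND 2	/* value of MPOL_BIND in <linux/mempolicy.h> */

/*
 * Map anonymous memory, bind it to 'node', then write to it so the page is
 * allocated at fault time under that VMA policy. Returns 0 on success,
 * -errno if mbind() was refused (the page is then faulted under the default
 * policy), or -ENOMEM if the mapping itself failed.
 */
static int bind_and_fault(int node, size_t len)
{
	unsigned long mask = 1UL << node;
	int ret = 0;
	char *mem;

	mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -ENOMEM;

	/* maxnode: size of the node mask in bits */
	if (syscall(SYS_mbind, mem, len, DEMO_MPOL_BIND, &mask,
		    sizeof(mask) * 8, 0) != 0)
		ret = -errno;

	((volatile char *)mem)[0] = 0x5a;	/* first touch triggers the fault */
	munmap(mem, len);
	return ret;
}
```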

For all these reasons, I think mbind/mempolicy support for private
nodes needs to be brought in with follow-up work, not introduced as
part of the baseline set.

> > 
> >    for node in possible_nodes:
> >        alloc_pages_node(private_node, __GFP_THISNODE)
> > 
> >    In fact it's the opposite semantic of what we want.
> >    THISNODE says: "Do not fall back to OTHER nodes".
> > 
> 
> That's why we need to control the fallback nodes carefully for
> N_MEMORY_PRIVATE
>

My point is that __GFP_THISNODE is not actually useful.

If we go by nodemask, submitting a single-node nodemask is the
equivalent of an empty fallback list.

If we gate access to a private node by __GFP_THISNODE... this is the
same as just providing a single-node nodemask (putting aside the OOM
implications for a moment).
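
Under the same kind of toy model (arrays for fallback lists, a bitmask for the nodemask; hypothetical names, not the real allocator), the equivalence can be checked directly: restricting the walk to the preferred node, as __GFP_THISNODE does, yields the same single candidate as passing a single-node nodemask.

```c
#include <stdbool.h>
#include <stddef.h>

#define NODE_BIT(n) (1UL << (n))

/*
 * Toy candidate selection: with 'thisnode' set, only the preferred node
 * (fallback[0]) is considered, mimicking __GFP_THISNODE; the nodemask then
 * filters whatever is left.
 */
static size_t candidates(const int *fallback, size_t nr,
			 unsigned long nodemask, bool thisnode, int *out)
{
	size_t n = 0;

	if (thisnode && nr > 1)
		nr = 1;
	for (size_t i = 0; i < nr; i++)
		if (nodemask & NODE_BIT(fallback[i]))
			out[n++] = fallback[i];
	return n;
}
```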

And it doesn't even buy you any new filtering ability against existing
nodemask iterators that may already utilize __GFP_THISNODE, e.g.:

   for node in online_nodes:
       alloc_pages_node(node, __GFP_THISNODE, ...)
       /* Alloc per-node resources */

   This pattern is undesirable, but completely valid.

So overloading/requiring __GFP_THISNODE is just not useful.

I will follow up soon with a new version that limits the private node
interface to just nodemask and fallback list controls.

I need to test a few more things related to removing normal nodes from
private node fallbacks before I feel comfortable shipping without
__GFP_PRIVATE.

> >    The semantic we want is "Do not allow allocations from private
> >    nodes UNLESS we specifically request" (__GFP_PRIVATE).
> > 
> >    __GFP_THISNODE does not actually buy you anything here, AND it's
> >    worse, in the scenario where a private node makes its way into the
> >    preferred slot (via possible_nodes or some other nodemask), the
> >    allocator cannot fall back to a node it can access.
> > 
> >    __GFP_THISNODE cannot be overloaded to do anything useful here.
> 
> Let me clarify, I meant to say, let's use a nodemask for allocation
> and __GFP_THISNODE gets us to the node we desire, if that is the only
> node. My earlier comment might not have been clear.
>

My point was that __GFP_THISNODE is pointless and reduces to providing a
single node nodemask anyway.

The contention over __GFP_PRIVATE is a bit ideological - do we want:

  1) A hard guarantee that allocations to a private node are controlled
     (__GFP_PRIVATE implies the caller knows what it's doing)

  or

  2) A soft guarantee (fallback list isolation only), and needing to
     deal with undesired behavior that's "not technically a bug"
     associated with existing users of global nodemasks (possible,
     online, etc).

I am arguing for #1 - the community has argued for #2 and "fixing
existing nodemask users".  I think we can ship #2 and pivot to #1 if we
find fixing existing users is infeasible or too much of a maintenance
burden.
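
The difference between the two options can be sketched as an eligibility check. DEMO_GFP_PRIVATE is a hypothetical stand-in for the proposed __GFP_PRIVATE, and the loop models an existing "for each online node" user rather than any real kernel iterator.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the proposed __GFP_PRIVATE flag. */
#define DEMO_GFP_PRIVATE 0x1u

struct demo_node {
	int id;
	bool is_private;
};

/*
 * Option 1 (hard guarantee): a private node is eligible only when the
 * caller explicitly opts in. Option 2 (soft guarantee): isolation comes
 * only from fallback lists, so a caller that names a private node directly,
 * e.g. while walking a global nodemask, still gets it.
 */
static bool node_eligible(const struct demo_node *node, unsigned int gfp,
			  bool hard_guarantee)
{
	if (!node->is_private || !hard_guarantee)
		return true;
	return gfp & DEMO_GFP_PRIVATE;
}

/* Count the nodes a "for each online node" style loop would allocate from. */
static int count_eligible(const struct demo_node *nodes, int nr,
			  unsigned int gfp, bool hard_guarantee)
{
	int count = 0;

	for (int i = 0; i < nr; i++)
		if (node_eligible(&nodes[i], gfp, hard_guarantee))
			count++;
	return count;
}
```

With the hard guarantee, the private node is skipped unless the flag is passed; with the soft guarantee, the existing loop silently picks it up, which is the "not technically a bug" behaviour described above.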

> 
> Why not use mbind() APIs? Do we want to gate allocation/privileges
> via a /dev?
>

We want to eventually enable it, but we really need to treat these
extensions as a separate step from the base so that the UAPI
implications are given proper scrutiny.

In the short term, /dev/xxx and driver-local/service-local control
of a node is still very useful.

For example, for my compressed memory work, I have found that if it is
implemented as a swap backend, the kernel can manage the node without
any UAPI implications at all :].

A driver managing memory on a private node could do the same.

~Gregory

* Re: [syzbot] [trace?] KASAN: use-after-free Write in ring_buffer_read_page
From: Aleksandr Nogikh @ 2026-06-03  6:38 UTC (permalink / raw)
  To: Masami Hiramatsu, Alexander Potapenko
  Cc: Steven Rostedt, syzbot, linux-kernel, linux-trace-kernel,
	mathieu.desnoyers, syzkaller-bugs
In-Reply-To: <20260603103445.236f260a3c5eafe140055761@kernel.org>

On Wed, Jun 3, 2026 at 3:34 AM 'Masami Hiramatsu' via syzkaller-bugs
<syzkaller-bugs@googlegroups.com> wrote:
>
> On Tue, 2 Jun 2026 12:28:29 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > On Tue, 02 Jun 2026 06:45:31 -0700
> > syzbot <syzbot+2dd9d02f60775ce5c1fb@syzkaller.appspotmail.com> wrote:
> >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit:    e7ae89a0c97c Linux 7.1-rc5
> > > git tree:       upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=16f06e2e580000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=58acee1ac5406016
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=2dd9d02f60775ce5c1fb
> > > compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Looks like the test was doing something really weird to trigger this.
> > Without a reproducer, it's pretty much impossible to find out what
> > happened. Maybe AI could do it?
> >
>
> Does "I don't have any reproducer for this issue yet" mean that
> this is not reproducible even when the exact same sequence from
> the console output is replayed? If so, might this be a timing-related
> issue? (e.g. read vs. write-event)

Yes, syzbot normally replays the sequence of the last programs executed
on the crashed VM to find a reproducer, and in many cases they no
longer crash the kernel.

Meanwhile, syzbot's AI bug reproduction functionality has found
a C reproducer for a KASAN crash in the kernel/trace's ring buffer,
although with a slightly different stack trace:
https://syzkaller.appspot.com/ai_job?id=b2620161-1632-4d4e-9314-114a8a5e79ef

Cc Alexander Potapenko

>
> Thanks,
>
> --
> Masami Hiramatsu (Google) <mhiramat@kernel.org>
>

* [PATCH 2/2] tracing/synthetic: Free type string on error path
From: Yu Peng @ 2026-06-03  6:25 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers
  Cc: linux-trace-kernel, linux-kernel, Yu Peng
In-Reply-To: <20260603062533.1096320-1-pengyu@kylinos.cn>

parse_synth_field() builds a "__data_loc ..." type string before
assigning it to field->type. If the seq_buf check fails, the common
cleanup cannot free the temporary string. Free it before leaving.

Signed-off-by: Yu Peng <pengyu@kylinos.cn>
---
 kernel/trace/trace_events_synth.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index cdd5b93328358..dc15658a887cb 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -839,8 +839,10 @@ static struct synth_field *parse_synth_field(int argc, char **argv,
 			seq_buf_puts(&s, "__data_loc ");
 			seq_buf_puts(&s, field->type);
 
-			if (WARN_ON_ONCE(!seq_buf_buffer_left(&s)))
+			if (WARN_ON_ONCE(!seq_buf_buffer_left(&s))) {
+				kfree(type);
 				goto free;
+			}
 			s.buffer[s.len] = '\0';
 
 			kfree(field->type);
-- 
2.43.0
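
A userspace sketch of the leak this patch closes, with hypothetical demo_* names and an allocation counter so the test can check for leaks. snprintf() truncation stands in, roughly, for the seq_buf_buffer_left() check, and the early return stands in for "goto free".

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct demo_field {
	char *type;
};

/* Hypothetical live-allocation counter so the sketch can check for leaks. */
static int demo_live;

static char *demo_alloc(size_t n)
{
	char *p = malloc(n);

	if (p)
		demo_live++;
	return p;
}

static void demo_free(char *p)
{
	if (p) {
		demo_live--;
		free(p);
	}
}

/*
 * Rewrite field->type as "__data_loc <type>" using a temporary buffer of
 * 'cap' bytes. Until the temporary is assigned to field->type, the caller's
 * common cleanup (which frees only the field) cannot see it, so the error
 * path must free it here first.
 */
static int demo_prefix_data_loc(struct demo_field *field, size_t cap)
{
	char *type = demo_alloc(cap);
	int len;

	if (!type)
		return -1;
	len = snprintf(type, cap, "__data_loc %s", field->type);
	if (len < 0 || (size_t)len >= cap) {
		demo_free(type);	/* the fix: don't leak the temporary */
		return -1;
	}
	demo_free(field->type);
	field->type = type;
	return 0;
}
```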


* [PATCH 1/2] tracing/synthetic: Free pending field on error path
From: Yu Peng @ 2026-06-03  6:25 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers
  Cc: linux-trace-kernel, linux-kernel, Yu Peng

Some __create_synth_event() error paths run after parse_synth_field() 
succeeds but before the field is stored in fields[]. The common cleanup 
then misses the field. Free it before freeing argv.

Signed-off-by: Yu Peng <pengyu@kylinos.cn>
---
 kernel/trace/trace_events_synth.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index e6871230bde96..cdd5b93328358 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -1446,13 +1446,13 @@ static int __create_synth_event(const char *name, const char *raw_fields)
 			if (cmd_version > 1 && n_fields_this_loop >= 1) {
 				synth_err(SYNTH_ERR_INVALID_CMD, errpos(field_str));
 				ret = -EINVAL;
-				goto err_free_arg;
+				goto err_free_field;
 			}
 
 			if (n_fields == SYNTH_FIELDS_MAX) {
 				synth_err(SYNTH_ERR_TOO_MANY_FIELDS, 0);
 				ret = -EINVAL;
-				goto err_free_arg;
+				goto err_free_field;
 			}
 			fields[n_fields++] = field;
 
@@ -1491,6 +1491,8 @@ static int __create_synth_event(const char *name, const char *raw_fields)
 	kfree(saved_fields);
 
 	return ret;
+ err_free_field:
+	free_synth_field(field);
  err_free_arg:
 	argv_free(argv);
  err:
-- 
2.43.0
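
A simplified userspace sketch of the cleanup ladder after this patch, with hypothetical ladder_* names and an allocation counter. It folds the kernel's separate argv and fields[] cleanup into one path, but keeps the key point: the new label frees the pending field and then falls through into the existing cleanup.

```c
#include <stdlib.h>

/* Hypothetical live-allocation counter so the sketch can check for leaks. */
static int ladder_live;

static void *ladder_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		ladder_live++;
	return p;
}

static void ladder_free(void *p)
{
	if (p) {
		ladder_live--;
		free(p);
	}
}

#define LADDER_FIELDS_MAX 2

/*
 * Parse n_in fields and fail once there are more than LADDER_FIELDS_MAX.
 * A field that is allocated but not yet stored in fields[] is freed by the
 * extra label, which then falls through into the existing cleanup.
 */
static int ladder_create_event(int n_in)
{
	void *fields[LADDER_FIELDS_MAX];
	void *argv, *field = NULL;
	int n_fields = 0, i;

	argv = ladder_alloc(16);
	if (!argv)
		return -1;

	for (i = 0; i < n_in; i++) {
		field = ladder_alloc(32);
		if (!field)
			goto err_free_arg;
		if (n_fields == LADDER_FIELDS_MAX)
			goto err_free_field;	/* "goto err_free_arg" here leaked field */
		fields[n_fields++] = field;
	}

	/* Sketch only: a real event would take ownership of fields[]. */
	for (i = 0; i < n_fields; i++)
		ladder_free(fields[i]);
	ladder_free(argv);
	return 0;

err_free_field:
	ladder_free(field);	/* not in fields[] yet, so the loop below misses it */
err_free_arg:
	for (i = 0; i < n_fields; i++)
		ladder_free(fields[i]);
	ladder_free(argv);
	return -1;
}
```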


* Re: [PATCH 1/2] tracing: work around -Wmissing-format-attribute warning
From: Andy Shevchenko @ 2026-06-03  5:46 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Arnd Bergmann, Steven Rostedt, Andrew Morton, Petr Mladek,
	Nathan Chancellor, Arnd Bergmann, Dennis Dalessandro,
	Jason Gunthorpe, Leon Romanovsky, Arend van Spriel,
	Miri Korenblit, Mathieu Desnoyers, Rasmus Villemoes,
	Sergey Senozhatsky, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Vlastimil Babka, linux-rdma, linux-kernel, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-trace-kernel, llvm
In-Reply-To: <20260603105842.1e0ef8cb4a55cb776d6a4971@kernel.org>

On Wed, Jun 03, 2026 at 10:58:42AM +0900, Masami Hiramatsu wrote:
> On Tue,  2 Jun 2026 17:07:05 +0200
> Arnd Bergmann <arnd@kernel.org> wrote:

...

> I think this is a slightly confusing name. What about vsnprintf_nocheck()?

What check? If you want to be more precise: vsnprintf_no_printf_attr() or
vsnprintf_no_format_check(). But neither seems like a good choice to me either.
(I have a slight preference for the latter, no_format_check.)

-- 
With Best Regards,
Andy Shevchenko



* Re: [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM)
From: Balbir Singh @ 2026-06-03  5:00 UTC (permalink / raw)
  To: Gregory Price
  Cc: lsf-pc, linux-kernel, linux-cxl, cgroups, linux-mm,
	linux-trace-kernel, damon, kernel-team, gregkh, rafael, dakr,
	dave, jonathan.cameron, dave.jiang, alison.schofield,
	vishal.l.verma, ira.weiny, dan.j.williams, longman, akpm, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	osalvador, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
	byungchul, ying.huang, apopple, axelrasmussen, yuanchu, weixugc,
	yury.norov, linux, mhiramat, mathieu.desnoyers, tj, hannes,
	mkoutny, jackmanb, sj, baolin.wang, npache, ryan.roberts,
	dev.jain, baohua, lance.yang, muchun.song, xu.xin16,
	chengming.zhou, jannh, linmiaohe, nao.horiguchi, pfalcato,
	rientjes, shakeel.butt, riel, harry.yoo, cl, roman.gushchin,
	chrisl, kasong, shikemeng, nphamcs, bhe, zhengqi.arch,
	terry.bowman
In-Reply-To: <ah6bDNxlB1zBUnzN@gourry-fedora-PF4VCD3F>

On Tue, Jun 02, 2026 at 09:57:48AM +0100, Gregory Price wrote:
> On Tue, Jun 02, 2026 at 12:16:50PM +1000, Balbir Singh wrote:
> > On Sun, May 24, 2026 at 09:50:06PM -0400, Gregory Price wrote:
> > > 
> > > I'm debating on whether to include OPS_MEMPOLICY in the initial version
> > > if only because it's not intuitive how it interacts with pagecache. That
> > > needs more time to bake.
> > >
> > 
> > It makes sense to look at it and then decide if it makes sense.
> >
> 
> I am thinking I will ship without any OPS flags at all for now and then
> introduce the ops as a separate series.
> 
> > > alloc_pages_node() is the kernel interface
> > 
> > I was thinking we wouldn't need explicit flags and that allocations would
> > happen from user space using __GFP_THISNODE to the node or via a nodemask
> > based on nodes of interest. Is there a reason to add this flag, a system
> > might have more than one source of N_MEMORY_PRIVATE?
> > 
> 
> There are a few things to unpack here.  I discussed this many times on
> list and at LSF, but to reiterate.
> 
> 1) __GFP_THISNODE is insufficient to enforce isolation and otherwise
>    not particularly useful.  Additionally, from userland, it's not
>    something you can actually set.

I was thinking mbind()/mempolicy() is how we get to it. It already
accepts a nodemask.

> 
>    for node in possible_nodes:
>        alloc_pages_node(private_node, __GFP_THISNODE)
> 
>    In fact it's the opposite semantic of what we want.
>    THISNODE says: "Do not fall back to OTHER nodes".
> 

That's why we need to control the fallback nodes carefully for
N_MEMORY_PRIVATE

>    The semantic we want is "Do not allow allocations from private
>    nodes UNLESS we specifically request" (__GFP_PRIVATE).
> 
>    __GFP_THISNODE does not actually buy you anything here, AND it's
>    worse, in the scenario where a private node makes its way into the
>    preferred slot (via possible_nodes or some other nodemask), the
>    allocator cannot fall back to a node it can access.
> 
>    __GFP_THISNODE cannot be overloaded to do anything useful here.

Let me clarify, I meant to say, let's use a nodemask for allocation
and __GFP_THISNODE gets us to the node we desire, if that is the only
node. My earlier comment might not have been clear.

> 
> 2) We're trying not to expose *ANY* userland APIs for this, at all.
> 
>    The ultimate goal here should be one of two things:
> 
>    1) fd = open(/dev/xxx, ...);
>       mem = mmap(fd, ...);
>       mem[0] = 0xDEADBEEF; /* Fault device page into page table */
> 
>       In this case, the driver is responsible for doing the
>       alloc_pages_node() call.
> 
>    or
> 
>    2) mem = mmap(NULL, ..., ANON);
>       mbind(mem, ..., private_node);
>       mem[0] = 0xDEADBEEF; /* Fault device page into page table */
> 
>       In this case mempolicy.c is responsible for doing the
>       alloc_pages_node() call via the _mpol() alloc variants.
> 
> Additional OPS flags (reclaim, compaction, whatever) would
> (optionally) allow mm/ to operate on the device memory with, for
> example, mmu_notifier callbacks to tell the device to invalidate
> whatever it's caching about that page.
> 
> This would all be relatively transparent to userland; all userland
> "knows" is that it's getting memory from a device (/dev/xxx) or from a
> node it somehow otherwise knows is hosting device memory.
> 

Why not use mbind() APIs? Do we want to gate allocation/privileges
via a /dev?

Balbir

* Re: [PATCH v8 2/6] mm/memory-failure: surface unhandlable kernel pages as -ENOTRECOVERABLE
From: Miaohe Lin @ 2026-06-03  2:33 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Breno Leitao
  Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest,
	linux-trace-kernel, kernel-team, Lance Yang, Andrew Morton,
	Lorenzo Stoakes, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, Naoya Horiguchi,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Liam R. Howlett
In-Reply-To: <21732071-14a1-486a-951c-34de97b7c757@kernel.org>

On 2026/6/2 17:41, David Hildenbrand (Arm) wrote:
> On 6/2/26 05:08, Miaohe Lin wrote:
>> On 2026/6/1 21:22, David Hildenbrand (Arm) wrote:
>>> On 6/1/26 14:28, Miaohe Lin wrote:
>>>>
>>>> Thanks for your patch.
>>>>
>>>>
>>>> Once shake_page finds a lightweight range-based way to shrink slab, slab pages could be freed
>>>> into buddy and the PageSlab test above should then be removed. Maybe add a TODO or XXX here?
>>>>
>>>>
>>>> I'm not sure but is it safe or a common way to test PageReserved, PageSlab,
>>>> PageTable and PageLargeKmalloc without extra page refcnt?
>>>
>>> Checking typed pages in a racy fashion is fine (PageSlab, PageTable,
>>> PageLargeKmalloc).
>>
>> Got it. Thanks.
>>
>>> Checking PageReserved in a racy fashion is fine as well. TESTPAGEFLAG() will
>>> allow checking it on compound pages.
>>
>> It seems PageReserved is not intended to be set on compound pages. I see there are PF_NO_COMPOUND
>> in its definition: PAGEFLAG(Reserved, reserved, PF_NO_COMPOUND).
>>
>>>
>>> For PageLargeKmalloc, we would want to check the head page, though. The page
>>> type is only stored for the head page.
>>
>> Maybe we should check the head page for PageSlab and PageTable too? alloc_slab_page only
>> sets PageSlab on the head page and __pagetable_ctor uses __folio_set_pgtable to set PageTable
>> on the folio.
>>
>>>
>>> So maybe we want to lookup the compound head (if any) and perform the type
>>> checks against that?
>>
>> Maybe we should or we might miss some pages that could have been handled. And
>> if compound head is required, should we hold an extra page refcnt to guard against
>> possible folio split race?
> 
> Races are fine. We might miss some pages, but that can happen on races either way.
> 
> 
> I'd just do something like
> 
> if (PageReserved(page))
> 	return true;
> 
> head = compound_head(page);

Suppose the folio is split just after compound_head(), and @head is then freed into the buddy allocator
and re-allocated as a slab page while @page is still in the buddy. We would panic in this scenario
because @head is PageSlab, even though we were supposed to handle @page successfully. Or am I
missing something?

Thanks.

> return PageSlab(head) || ...;
> 	
> 


* Re: [PATCH v2 1/8] scripts/sorttable: Handle RISC-V patchable ftrace entries
From: Shuai Xue @ 2026-06-03  2:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Wang Han, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexandre Ghiti, Masami Hiramatsu, Mark Rutland, Catalin Marinas,
	Chen Pei, Andy Chiu, Björn Töpel, Deepak Gupta,
	Puranjay Mohan, Conor Dooley, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, oliver.yang, zhuo.song, jkchen, linux-riscv,
	linux-kernel, linux-trace-kernel, live-patching, linux-kselftest,
	linux-perf-users
In-Reply-To: <20260601095746.70c01d24@fedora>



On 6/1/26 9:57 PM, Steven Rostedt wrote:
> On Mon, 1 Jun 2026 14:17:08 +0800
> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
> 
>>> diff --git a/scripts/sorttable.c b/scripts/sorttable.c
>>> index e8ed11c680c6..4c10e85bb5af 100644
>>> --- a/scripts/sorttable.c
>>> +++ b/scripts/sorttable.c
>>> @@ -891,17 +891,21 @@ static int do_file(char const *const fname, void *addr)
>>>    	table_sort_t custom_sort = NULL;
>>>    
>>>    	switch (elf_map_machine(ehdr)) {
>>> -	case EM_AARCH64:
>>>    #ifdef MCOUNT_SORT_ENABLED
>>> +	case EM_AARCH64:
>>>    		sort_reloc = true;
>>>    		rela_type = 0x403;
>>> -		/* arm64 uses patchable function entry placing before function */
>>> +		/* fallthrough */
>>> +	case EM_RISCV:
>>> +		/* arm64 and RISC-V place patchable entries before the function */
>>>    		before_func = 8;
>>
>> Nit: The shared comment now sits under `case EM_RISCV:` but the two
>> lines above it (sort_reloc / rela_type = 0x403) are strictly
>> arm64-only — they configure the RELA-based weak-function fixup that
>> RISC-V does not need. On a quick read it is easy to wonder if RISC-V
>> is implicitly inheriting that path. Splitting the comments would
>> help, e.g.:
>>
>>          case EM_AARCH64:
>>              /* arm64 needs RELA-based weak-function fixup */
>>              sort_reloc = true;
>>              rela_type = 0x403;
>>              /* fallthrough */
>>          case EM_RISCV:
>>              /* arm64 and RISC-V place patchable entries before the function */
>>              before_func = 8;
> 
> Makes sense.
> 
> Care to send a v3?
> 
> -- Steve

Hi, Steve,

It's a purely cosmetic comment change, not worth a respin on its own. The
rest of the feedback on this series (the frame-record metadata contract
in patch 2 and the dead state->regs field / Call Trace output change in
patch 6) is what actually warrants a new version.

Just to get the routing straight: are you planning to pick this one up
through the tracing tree on its own?

It feels like a good candidate for that -- it's an independent fix for a
regression (Fixes: 0ca1724b56af) that breaks *all* RISC-V dynamic
ftrace, not just livepatch, so it shouldn't have to wait on the rest of
the livepatch series.

Thanks.
Shuai



* Re: [PATCH 1/2] tracing: work around -Wmissing-format-attribute warning
From: Masami Hiramatsu @ 2026-06-03  1:58 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Steven Rostedt, Andrew Morton, Petr Mladek, Nathan Chancellor,
	Arnd Bergmann, Dennis Dalessandro, Jason Gunthorpe,
	Leon Romanovsky, Arend van Spriel, Miri Korenblit,
	Mathieu Desnoyers, Andy Shevchenko, Rasmus Villemoes,
	Sergey Senozhatsky, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Vlastimil Babka, linux-rdma, linux-kernel, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-trace-kernel, llvm
In-Reply-To: <20260602150904.2258624-1-arnd@kernel.org>

On Tue,  2 Jun 2026 17:07:05 +0200
Arnd Bergmann <arnd@kernel.org> wrote:

> diff --git a/include/linux/sprintf.h b/include/linux/sprintf.h
> index f06f7b785091..036a247b7c1e 100644
> --- a/include/linux/sprintf.h
> +++ b/include/linux/sprintf.h
> @@ -12,6 +12,7 @@ __printf(2, 3) int sprintf(char *buf, const char * fmt, ...);
>  __printf(2, 0) int vsprintf(char *buf, const char *, va_list);
>  __printf(3, 4) int snprintf(char *buf, size_t size, const char *fmt, ...);
>  __printf(3, 0) int vsnprintf(char *buf, size_t size, const char *fmt, va_list args);
> +int __vsnprintf(char *buf, size_t size, const char *fmt, va_list args);
>  __printf(3, 4) int scnprintf(char *buf, size_t size, const char *fmt, ...);
>  __printf(3, 0) int vscnprintf(char *buf, size_t size, const char *fmt, va_list args);
>  __printf(2, 3) __malloc char *kasprintf(gfp_t gfp, const char *fmt, ...);
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index d49338c44014..4715330c7b6b 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -962,7 +962,7 @@ perf_trace_buf_submit(void *raw_data, int size, int rctx, u16 type,
>  	int __ret;					\
>  							\
>  	va_copy(__ap, *(va));				\
> -	__ret = vsnprintf(NULL, 0, fmt, __ap) + 1;	\
> +	__ret = __vsnprintf(NULL, 0, fmt, __ap) + 1;	\
>  	va_end(__ap);					\
>  							\
>  	min(__ret, TRACE_EVENT_STR_MAX);		\

I think this is a slightly confusing name. What about vsnprintf_nocheck()?

Thanks,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>
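
For reference, a userspace sketch of what the quoted macro computes: the size of the formatted string plus its NUL, capped at a maximum. DEMO_STR_MAX is a hypothetical stand-in for TRACE_EVENT_STR_MAX. The helper forwards a non-literal format string to vsnprintf() without a format attribute of its own, which is the kind of call site the warning discussed in this thread concerns.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical cap standing in for TRACE_EVENT_STR_MAX. */
#define DEMO_STR_MAX 512

/*
 * Bytes needed for the formatted string plus its NUL, capped at
 * DEMO_STR_MAX. Formatting into a NULL buffer of size 0 only measures.
 * va_copy() keeps the caller's va_list intact for the real write.
 */
static int demo_str_size(const char *fmt, va_list *va)
{
	va_list ap;
	int ret;

	va_copy(ap, *va);
	ret = vsnprintf(NULL, 0, fmt, ap) + 1;
	va_end(ap);
	return ret < DEMO_STR_MAX ? ret : DEMO_STR_MAX;
}

__attribute__((format(printf, 1, 2)))
static int demo_measure(const char *fmt, ...)
{
	va_list va;
	int n;

	va_start(va, fmt);
	n = demo_str_size(fmt, &va);
	va_end(va);
	return n;
}
```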

* Re: [PATCH v2 8/8] selftests/livepatch: Add RISC-V syscall wrapper prefix
From: Shuai Xue @ 2026-06-03  1:54 UTC (permalink / raw)
  To: Wang Han, Paul Walmsley, Palmer Dabbelt, Albert Ou
  Cc: Steven Rostedt, Alexandre Ghiti, Masami Hiramatsu, Mark Rutland,
	Catalin Marinas, Chen Pei, Andy Chiu, Björn Töpel,
	Deepak Gupta, Puranjay Mohan, Conor Dooley, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, oliver.yang, zhuo.song, jkchen, linux-riscv,
	linux-kernel, linux-trace-kernel, live-patching, linux-kselftest,
	linux-perf-users
In-Reply-To: <20260528082310.1994388-9-wanghan@linux.alibaba.com>



On 5/28/26 4:23 PM, Wang Han wrote:
> The syscall livepatch selftest resolves and patches a syscall wrapper
> symbol. To use that test for RISC-V livepatch validation, add the
> RISC-V FN_PREFIX definition for ARCH_HAS_SYSCALL_WRAPPER.
> 
> Without this macro, the syscall livepatch selftest cannot resolve the
> RISC-V target symbol, and the syscall-related livepatch test fails on
> RISC-V.
> 
> Signed-off-by: Wang Han <wanghan@linux.alibaba.com>
> ---
>   .../testing/selftests/livepatch/test_modules/test_klp_syscall.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c b/tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c
> index dd802783ea84..275e4b10cf59 100644
> --- a/tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c
> +++ b/tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c
> @@ -18,6 +18,8 @@
>   #define FN_PREFIX __s390x_
>   #elif defined(__aarch64__)
>   #define FN_PREFIX __arm64_
> +#elif defined(__riscv)
> +#define FN_PREFIX __riscv_
>   #else
>   /* powerpc does not select ARCH_HAS_SYSCALL_WRAPPER */
>   #define FN_PREFIX

Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>

Thanks.
Shuai

* Re: [PATCH v2 7/8] riscv: Kconfig: enable HAVE_RELIABLE_STACKTRACE and HAVE_LIVEPATCH
From: Shuai Xue @ 2026-06-03  1:49 UTC (permalink / raw)
  To: Wang Han, Paul Walmsley, Palmer Dabbelt, Albert Ou
  Cc: Steven Rostedt, Alexandre Ghiti, Masami Hiramatsu, Mark Rutland,
	Catalin Marinas, Chen Pei, Andy Chiu, Björn Töpel,
	Deepak Gupta, Puranjay Mohan, Conor Dooley, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, oliver.yang, zhuo.song, jkchen, linux-riscv,
	linux-kernel, linux-trace-kernel, live-patching, linux-kselftest,
	linux-perf-users
In-Reply-To: <20260528082310.1994388-8-wanghan@linux.alibaba.com>



On 5/28/26 4:23 PM, Wang Han wrote:
> Now that the metadata frame records, the kunwind state machine and
> arch_stack_walk_reliable() are all in place, advertise the capability
> to the rest of the kernel:
> 
>    * select HAVE_RELIABLE_STACKTRACE under FRAME_POINTER && 64BIT, so
>      only the configurations that actually have the metadata records
>      and the FP-based reliable walker enable it.


The 64BIT gate is conservative scoping rather than a hard technical
requirement: the metadata frame record, kunwind state machine and
arch_stack_walk_reliable() all build on RV32 too (and the
call_on_irq_stack change in patch 2/8 actually fixes a latent RV32
issue). However, the syscall livepatch selftest and module relocation
path have only been exercised on RV64 (QEMU virt SMP=2/4/8). The
64BIT gate can be dropped in a follow-up once RV32 has equivalent
coverage.

Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>

Thanks.
Shuai
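
The FP-based walk that this Kconfig change relies on can be sketched in userspace as follows. It assumes the RISC-V layout in which each {fp, ra} record sits immediately below the frame pointer, uses hypothetical names, and leaves out what the real kunwind walker also handles (pt_regs metadata records, stack transitions, and return addresses rewritten by function graph tracing or kretprobes).

```c
#include <stddef.h>
#include <stdint.h>

/* RISC-V frame record: saved caller fp and return address, just below fp. */
struct demo_frame_record {
	uintptr_t fp;
	uintptr_t ra;
};

/*
 * Collect return addresses by following frame records inside the stack
 * area [low, high). Returns the number stored in out[], or -1 if the walk
 * looks unreliable: a misaligned or out-of-bounds fp, a saved fp that does
 * not move toward the stack base, or more frames than fit in out[].
 * A saved fp of 0 ends the walk in this sketch.
 */
static int demo_walk_frames(uintptr_t fp, uintptr_t low, uintptr_t high,
			    uintptr_t *out, int max)
{
	int n = 0;

	while (fp) {
		const struct demo_frame_record *rec;

		if (fp % sizeof(uintptr_t) || fp < low + sizeof(*rec) || fp > high)
			return -1;
		rec = (const struct demo_frame_record *)fp - 1;
		if (n == max)
			return -1;
		out[n++] = rec->ra;
		if (rec->fp && rec->fp <= fp)
			return -1;	/* records must be consumed monotonically */
		fp = rec->fp;
	}
	return n;
}
```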

* Re: [PATCH v2 6/8] riscv: stacktrace: switch to frame-pointer based unwinder
From: Shuai Xue @ 2026-06-03  1:35 UTC (permalink / raw)
  To: Wang Han, Paul Walmsley, Palmer Dabbelt, Albert Ou
  Cc: Steven Rostedt, Alexandre Ghiti, Masami Hiramatsu, Mark Rutland,
	Catalin Marinas, Chen Pei, Andy Chiu, Björn Töpel,
	Deepak Gupta, Puranjay Mohan, Conor Dooley, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, oliver.yang, zhuo.song, jkchen, linux-riscv,
	linux-kernel, linux-trace-kernel, live-patching, linux-kselftest,
	linux-perf-users
In-Reply-To: <20260528082310.1994388-7-wanghan@linux.alibaba.com>



On 5/28/26 4:23 PM, Wang Han wrote:
> Replace the open-coded frame-pointer walker in arch_stack_walk() with a
> robust kunwind state machine, modelled on arch/arm64/kernel/stacktrace.c
> and retargeted to the RISC-V {fp, ra} frame record convention. The new
> walker tracks stack bounds, consumes frame records monotonically,
> understands the metadata pt_regs records added in the previous frame
> record metadata patch, and recovers return addresses replaced by
> function graph tracing and kretprobes.
> 
> This commit introduces arch_stack_walk_reliable() but does not yet
> select HAVE_RELIABLE_STACKTRACE; that is done in a follow-up Kconfig
> patch so this commit can be reviewed and bisected as a pure unwinder
> replacement. Until that Kconfig change lands, livepatch is not yet
> enabled and arch_stack_walk_reliable() has no in-tree caller.
> 
> Three related callers are updated to keep the same frame-record
> assumptions everywhere:
> 
>    * Function graph tracing: the old RISC-V unwinder matched function
>      graph return-stack entries by the saved return-address slot. That
>      was consistent with the static mcount path, but not with the dynamic
>      ftrace path where the parent slot is ftrace_regs::ra. Use the
>      architectural frame pointer as the function graph return-address
>      cookie, matching the kunwind walker.
> 
>    * Perf callchains: route kernel callchain collection through
>      arch_stack_walk() so perf sees the same frame-pointer unwind
>      behaviour as dump_stack() and the upcoming livepatch path.
> 
>    * dump_backtrace() / __get_wchan() / show_stack(): these now go
>      through arch_stack_walk(); the explicit "Call Trace:" header is
>      moved into dump_backtrace() to preserve the original output.
> 
> The non-frame-pointer fallback walker is kept untouched for
> !CONFIG_FRAME_POINTER builds.
> 
> Signed-off-by: Wang Han <wanghan@linux.alibaba.com>
> ---
>   arch/riscv/kernel/ftrace.c         |   6 +-
>   arch/riscv/kernel/perf_callchain.c |   2 +-
>   arch/riscv/kernel/stacktrace.c     | 560 ++++++++++++++++++++++++-----
>   3 files changed, 472 insertions(+), 96 deletions(-)
> 
> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> index b430edfb83f4..5d55199a9230 100644
> --- a/arch/riscv/kernel/ftrace.c
> +++ b/arch/riscv/kernel/ftrace.c
> @@ -242,7 +242,8 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
>   	 */
>   	old = *parent;
>   
> -	if (!function_graph_enter(old, self_addr, frame_pointer, parent))
> +	if (!function_graph_enter(old, self_addr, frame_pointer,
> +				  (void *)frame_pointer))
>   		*parent = return_hooker;
>   }
>   
> @@ -264,7 +265,8 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
>   	 */
>   	old = *parent;
>   
> -	if (!function_graph_enter_regs(old, ip, frame_pointer, parent, fregs))
> +	if (!function_graph_enter_regs(old, ip, frame_pointer,
> +				       (void *)frame_pointer, fregs))
>   		*parent = return_hooker;
>   }
>   #endif /* CONFIG_DYNAMIC_FTRACE */
> diff --git a/arch/riscv/kernel/perf_callchain.c b/arch/riscv/kernel/perf_callchain.c
> index b465bc9eb870..436af96ea59c 100644
> --- a/arch/riscv/kernel/perf_callchain.c
> +++ b/arch/riscv/kernel/perf_callchain.c
> @@ -44,5 +44,5 @@ void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
>   		return;
>   	}
>   
> -	walk_stackframe(NULL, regs, fill_callchain, entry);
> +	arch_stack_walk(fill_callchain, entry, NULL, regs);
>   }
> diff --git a/arch/riscv/kernel/stacktrace.c b/arch/riscv/kernel/stacktrace.c
> index 2692d3a06afa..0d76320b3a29 100644
> --- a/arch/riscv/kernel/stacktrace.c
> +++ b/arch/riscv/kernel/stacktrace.c
> @@ -11,98 +11,16 @@
>   #include <linux/sched/task_stack.h>
>   #include <linux/stacktrace.h>
>   #include <linux/ftrace.h>
> +#include <linux/kprobes.h>
> +#include <linux/llist.h>
>   
>   #include <asm/stacktrace.h>
>   
> -#ifdef CONFIG_FRAME_POINTER
> -
>   /*
> - * This disables KASAN checking when reading a value from another task's stack,
> - * since the other task could be running on another CPU and could have poisoned
> - * the stack in the meantime.
> + * Non-frame-pointer fallback unwinder.
> + * Only compiled when CONFIG_FRAME_POINTER is not enabled.
>    */
> -#define READ_ONCE_TASK_STACK(task, x)			\
> -({							\
> -	unsigned long val;				\
> -	unsigned long addr = x;				\
> -	if ((task) == current)				\
> -		val = READ_ONCE(addr);			\
> -	else						\
> -		val = READ_ONCE_NOCHECK(addr);		\
> -	val;						\
> -})
> -
> -extern asmlinkage void handle_exception(void);
> -extern unsigned long ret_from_exception_end;
> -
> -static inline int fp_is_valid(unsigned long fp, unsigned long sp)
> -{
> -	unsigned long low, high;
> -
> -	low = sp + sizeof(struct stackframe);
> -	high = ALIGN(sp, THREAD_SIZE);
> -
> -	return !(fp < low || fp > high || fp & 0x07);
> -}
> -
> -void notrace walk_stackframe(struct task_struct *task, struct pt_regs *regs,
> -			     bool (*fn)(void *, unsigned long), void *arg)
> -{
> -	unsigned long fp, sp, pc;
> -	int graph_idx = 0;
> -	int level = 0;
> -
> -	if (regs) {
> -		fp = frame_pointer(regs);
> -		sp = user_stack_pointer(regs);
> -		pc = instruction_pointer(regs);
> -	} else if (task == NULL || task == current) {
> -		fp = (unsigned long)__builtin_frame_address(0);
> -		sp = current_stack_pointer;
> -		pc = (unsigned long)walk_stackframe;
> -		level = -1;
> -	} else {
> -		/* task blocked in __switch_to */
> -		fp = task->thread.s[0];
> -		sp = task->thread.sp;
> -		pc = task->thread.ra;
> -	}
> -
> -	for (;;) {
> -		struct stackframe *frame;
> -
> -		if (unlikely(!__kernel_text_address(pc) || (level++ >= 0 && !fn(arg, pc))))
> -			break;
> -
> -		if (unlikely(!fp_is_valid(fp, sp)))
> -			break;
> -
> -		/* Unwind stack frame */
> -		frame = (struct stackframe *)fp - 1;
> -		sp = fp;
> -		if (regs && (regs->epc == pc) && fp_is_valid(frame->ra, sp)) {
> -			/* We hit function where ra is not saved on the stack */
> -			fp = frame->ra;
> -			pc = regs->ra;
> -		} else {
> -			fp = READ_ONCE_TASK_STACK(task, frame->fp);
> -			pc = READ_ONCE_TASK_STACK(task, frame->ra);
> -			pc = ftrace_graph_ret_addr(task, &graph_idx, pc,
> -						   &frame->ra);
> -			if (pc >= (unsigned long)handle_exception &&
> -			    pc < (unsigned long)&ret_from_exception_end) {
> -				if (unlikely(!fn(arg, pc)))
> -					break;
> -
> -				pc = ((struct pt_regs *)sp)->epc;
> -				fp = ((struct pt_regs *)sp)->s0;
> -			}
> -		}
> -
> -	}
> -}
> -
> -#else /* !CONFIG_FRAME_POINTER */
> +#ifndef CONFIG_FRAME_POINTER
>   
>   void notrace walk_stackframe(struct task_struct *task,
>   	struct pt_regs *regs, bool (*fn)(void *, unsigned long), void *arg)
> @@ -133,7 +51,12 @@ void notrace walk_stackframe(struct task_struct *task,
>   	}
>   }
>   
> -#endif /* CONFIG_FRAME_POINTER */
> +#endif /* !CONFIG_FRAME_POINTER */
> +
> +/*
> + * Common trace helpers.
> + * These are used by both the FP (kunwind) and non-FP (walk_stackframe) paths.
> + */
>   
>   static bool print_trace_address(void *arg, unsigned long pc)
>   {
> @@ -146,12 +69,12 @@ static bool print_trace_address(void *arg, unsigned long pc)
>   noinline void dump_backtrace(struct pt_regs *regs, struct task_struct *task,
>   		    const char *loglvl)
>   {
> -	walk_stackframe(task, regs, print_trace_address, (void *)loglvl);
> +	printk("%sCall Trace:\n", loglvl);
> +	arch_stack_walk(print_trace_address, (void *)loglvl, task, regs);
>   }
>   
>   void show_stack(struct task_struct *task, unsigned long *sp, const char *loglvl)
>   {
> -	pr_cont("%sCall Trace:\n", loglvl);
>   	dump_backtrace(NULL, task, loglvl);
>   }
>   
> @@ -171,17 +94,468 @@ unsigned long __get_wchan(struct task_struct *task)
>   
>   	if (!try_get_task_stack(task))
>   		return 0;
> -	walk_stackframe(task, NULL, save_wchan, &pc);
> +	arch_stack_walk(save_wchan, &pc, task, NULL);
>   	put_task_stack(task);
>   	return pc;
>   }
>   
> -noinline noinstr void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> -		     struct task_struct *task, struct pt_regs *regs)
> +/*
> + * Frame-pointer-based kernel unwind infrastructure.
> + * Only compiled when CONFIG_FRAME_POINTER is enabled.
> + *
> + * See: arch/arm64/kernel/stacktrace.c for the reference implementation.
> + */
> +#ifdef CONFIG_FRAME_POINTER
> +
> +/*
> + * Per-cpu stacks are only accessible when unwinding the current task in a
> + * non-preemptible context.
> + */
> +#define STACKINFO_CPU(task, name)				\
> +	({							\
> +		(((task) == current) && !preemptible())		\
> +			? stackinfo_get_##name()		\
> +			: stackinfo_get_unknown();		\
> +	})
> +
> +enum kunwind_source {
> +	KUNWIND_SOURCE_UNKNOWN,
> +	KUNWIND_SOURCE_FRAME,
> +	KUNWIND_SOURCE_CALLER,
> +	KUNWIND_SOURCE_TASK,
> +	KUNWIND_SOURCE_REGS_PC,
> +};
> +
> +union unwind_flags {
> +	unsigned long	all;
> +	struct {
> +		unsigned long	fgraph : 1,
> +				kretprobe : 1;
> +	};
> +};
> +
> +/*
> + * Kernel unwind state
> + *
> + * @common:    Common unwind state.
> + * @task:      The task being unwound.
> + * @graph_idx: Used by ftrace_graph_ret_addr() for optimized stack unwinding.
> + * @kr_cur:    When KRETPROBES is selected, holds the kretprobe instance
> + *             associated with the most recently encountered replacement ra
> + *             value.
> + */
> +struct kunwind_state {
> +	struct unwind_state common;
> +	struct task_struct *task;
> +	int graph_idx;
> +#ifdef CONFIG_KRETPROBES
> +	struct llist_node *kr_cur;
> +#endif
> +	enum kunwind_source source;
> +	union unwind_flags flags;
> +	struct pt_regs *regs;
> +};
> +
> +static __always_inline void
> +kunwind_init(struct kunwind_state *state,
> +	     struct task_struct *task)
> +{
> +	unwind_init_common(&state->common);
> +	state->task = task;
> +	state->source = KUNWIND_SOURCE_UNKNOWN;
> +	state->flags.all = 0;
> +	state->regs = NULL;
> +}
> +
> +/*
> + * Start an unwind from a pt_regs.
> + *
> + * The unwind will begin at the PC within the regs.
> + *
> + * The regs must be on a stack currently owned by the calling task.
> + */
> +static __always_inline void
> +kunwind_init_from_regs(struct kunwind_state *state,
> +		       struct pt_regs *regs)
> +{
> +	kunwind_init(state, current);
> +
> +	state->regs = regs;
> +	state->common.fp = frame_pointer(regs);
> +	state->common.pc = instruction_pointer(regs);
> +	state->source = KUNWIND_SOURCE_REGS_PC;
> +}
> +
> +/*
> + * Start an unwind from a caller.
> + *
> + * The unwind will begin at the caller of whichever function this is inlined
> + * into.
> + *
> + * The function which invokes this must be noinline.
> + */
> +static __always_inline void
> +kunwind_init_from_caller(struct kunwind_state *state)
> +{
> +	unsigned long fp = (unsigned long)__builtin_frame_address(0);
> +	struct frame_record *record = (struct frame_record *)fp - 1;
> +
> +	kunwind_init(state, current);
> +
> +	state->common.fp = READ_ONCE(record->fp);
> +	state->common.pc = READ_ONCE(record->ra);
> +	state->source = KUNWIND_SOURCE_CALLER;
> +}
> +
> +/*
> + * Start an unwind from a blocked task.
> + *
> + * The unwind will begin at the blocked task's saved PC (i.e. the caller of
> + * __switch_to).
> + *
> + * The caller should ensure the task is blocked in __switch_to for the
> + * duration of the unwind, or the unwind will be bogus. It is never valid to
> + * call this for the current task.
> + */
> +static __always_inline void
> +kunwind_init_from_task(struct kunwind_state *state,
> +		       struct task_struct *task)
> +{
> +	kunwind_init(state, task);
> +
> +	state->common.fp = task->thread.s[0];
> +	state->common.pc = task->thread.ra;
> +	state->source = KUNWIND_SOURCE_TASK;
> +}
> +
> +static __always_inline int
> +kunwind_recover_return_address(struct kunwind_state *state)
> +{
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +	if (state->task->ret_stack &&
> +	    state->common.pc == (unsigned long)return_to_handler) {
> +		unsigned long orig_pc;
> +
> +		orig_pc = ftrace_graph_ret_addr(state->task, &state->graph_idx,
> +						state->common.pc,
> +						(void *)state->common.fp);
> +		if (state->common.pc == orig_pc) {
> +			WARN_ON_ONCE(state->task == current);
> +			return -EINVAL;
> +		}
> +		state->common.pc = orig_pc;
> +		state->flags.fgraph = 1;
> +	}
> +#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> +
> +#ifdef CONFIG_KRETPROBES
> +	if (is_kretprobe_trampoline(state->common.pc)) {
> +		unsigned long orig_pc;
> +
> +		orig_pc = kretprobe_find_ret_addr(state->task,
> +						  (void *)state->common.fp,
> +						  &state->kr_cur);
> +		if (!orig_pc)
> +			return -EINVAL;
> +		state->common.pc = orig_pc;
> +		state->flags.kretprobe = 1;
> +	}
> +#endif /* CONFIG_KRETPROBES */
> +
> +	return 0;
> +}
> +
> +/*
> + * When we reach an exception boundary marked by a metadata frame record,
> + * extract pt_regs from the stack and continue unwinding from the saved
> + * context (epc and s0/fp).
> + *
> + * On RISC-V, fp points above the metadata record, so the record's
> + * frame_record portion is at fp - sizeof(struct frame_record).
> + */
> +static __always_inline int
> +kunwind_next_regs_pc(struct kunwind_state *state)
> +{
> +	struct stack_info *info;
> +	unsigned long fp = state->common.fp;
> +	struct pt_regs *regs;
> +
> +	regs = container_of((unsigned long *)(fp - sizeof(struct frame_record)),
> +			    struct pt_regs, stackframe.record.fp);
> +
> +	info = unwind_find_stack(&state->common, (unsigned long)regs,
> +				 sizeof(*regs));
> +	if (!info)
> +		return -EINVAL;
> +
> +	unwind_consume_stack(&state->common, info, (unsigned long)regs,
> +			     sizeof(*regs));
> +
> +	state->regs = regs;
> +	state->common.pc = regs->epc;
> +	state->common.fp = frame_pointer(regs);
> +	state->regs = NULL;

state->regs is a dead field, and kunwind_next_regs_pc() clears it in a
way that contradicts both arm64 and your own init_from_regs.

struct kunwind_state has a `struct pt_regs *regs`, but I can't find any
reader of it anywhere in the file — arch_kunwind_consume_entry() and
arch_reliable_kunwind_consume_entry() only ever read common.pc and
source. It is written in three places:

     kunwind_init():            state->regs = NULL;
     kunwind_init_from_regs():  state->regs = regs;     /* not cleared */
     kunwind_next_regs_pc():    state->regs = regs;
                                state->common.pc = regs->epc;
                                state->common.fp = frame_pointer(regs);
                                state->regs = NULL;      /* cleared! */

Two things:

   (a) The field has no consumer, so it's currently dead.

   (b) In kunwind_next_regs_pc() you set state->regs = regs and then
       immediately reset it to NULL two lines later. The arm64 reference
       does *not* clear it there, and your own kunwind_init_from_regs()
       leaves it set. So the three REGS_PC producers disagree on whether
       ->regs is valid.

It's harmless today only because nothing reads ->regs. But the moment
someone adds a consumer (e.g. to expose the pt_regs at an exception
boundary for a reliable dump), the stray `state->regs = NULL;` in
kunwind_next_regs_pc() becomes a silent bug.

Please either:
   - drop the field and all three writes, if it's genuinely unused, or
   - keep it and remove the `state->regs = NULL;` in
     kunwind_next_regs_pc() so it matches arm64 and init_from_regs.
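For illustration only, option (b) amounts to something like the
following user-space sketch. The types are hypothetical, simplified
stand-ins (the real pt_regs and frame_record_meta differ, and the
unwind_find_stack()/unwind_consume_stack() checks are omitted), and
next_regs_pc() is a made-up name. The only point is that ->regs is
published and left set, the same way kunwind_init_from_regs() does.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the types used by the patch. */
struct frame_record {
	unsigned long fp;
	unsigned long ra;
};

struct frame_record_meta {
	struct frame_record record;
	unsigned long type;
};

struct pt_regs {
	unsigned long epc;
	unsigned long s0;		/* frame pointer when the exception hit */
	struct frame_record_meta stackframe;
};

struct kunwind_state {
	unsigned long fp;
	unsigned long pc;
	struct pt_regs *regs;		/* set when the frame came from pt_regs */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Option (b): record the pt_regs and leave ->regs set, as
 * kunwind_init_from_regs() does. Stack-bound checks are omitted.
 */
static void next_regs_pc(struct kunwind_state *state)
{
	/* fp points just above the record embedded in pt_regs. */
	unsigned long base = state->fp - sizeof(struct frame_record);
	struct pt_regs *regs = container_of((unsigned long *)base,
					    struct pt_regs,
					    stackframe.record.fp);

	state->regs = regs;
	state->pc = regs->epc;
	state->fp = regs->s0;
	/* No trailing "state->regs = NULL;", so a future consumer can use it. */
}
```

Whether ->regs should be cleared again on the next ordinary
frame-record step depends on what such a consumer needs; the sketch
does not decide that.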

> +	state->source = KUNWIND_SOURCE_REGS_PC;
> +	return 0;
> +}
> +
> +/*
> + * Handle a metadata frame record embedded in pt_regs.
> + *
> + * On RISC-V, fp points above the record (fp = metadata + 16), so the
> + * frame_record_meta starts at fp - sizeof(struct frame_record).
> + *
> + * FRAME_META_TYPE_FINAL: This is the outermost exception entry
> + *   (user -> kernel). Unwinding terminates successfully.
> + * FRAME_META_TYPE_PT_REGS: This is a nested exception entry
> + *   (kernel -> kernel). Continue unwinding from the saved context.
> + */
> +static __always_inline int
> +kunwind_next_frame_record_meta(struct kunwind_state *state)
> +{
> +	struct task_struct *tsk = state->task;
> +	unsigned long fp = state->common.fp;
> +	unsigned long meta_base = fp - sizeof(struct frame_record);
> +	struct frame_record_meta *meta;
> +	struct stack_info *info;
> +
> +	info = unwind_find_stack(&state->common, meta_base, sizeof(*meta));
> +	if (!info)
> +		return -EINVAL;
> +
> +	meta = (struct frame_record_meta *)meta_base;
> +	switch (READ_ONCE(meta->type)) {
> +	case FRAME_META_TYPE_FINAL:
> +		if (meta == &task_pt_regs(tsk)->stackframe)
> +			return -ENOENT;
> +		WARN_ON_ONCE(tsk == current);
> +		return -EINVAL;
> +	case FRAME_META_TYPE_PT_REGS:
> +		return kunwind_next_regs_pc(state);
> +	default:
> +		WARN_ON_ONCE(tsk == current);
> +		return -EINVAL;
> +	}
> +}
> +
> +/*
> + * Unwind from one frame record to the next.
> + *
> + * On RISC-V, the frame record sits at fp - sizeof(struct frame_record),
> + * immediately below the address pointed to by fp/s0. This applies to both
> + * normal frame records and metadata frame records (embedded in pt_regs).
> + *
> + * A metadata record is identified by both fp and ra being zero in the
> + * frame_record portion, with a type value following at fp + 16.
> + */
> +static __always_inline int
> +kunwind_next_frame_record(struct kunwind_state *state)
> +{
> +	unsigned long fp = state->common.fp;
> +	struct frame_record *record;
> +	struct stack_info *info;
> +	unsigned long new_fp, new_pc;
> +	unsigned long record_base;
> +
> +	if (fp & 0x7)
> +		return -EINVAL;
> +
> +	record_base = fp - sizeof(*record);
> +
> +	info = unwind_find_stack(&state->common, record_base, sizeof(*record));
> +	if (!info)
> +		return -EINVAL;
> +
> +	record = (struct frame_record *)record_base;
> +	new_fp = READ_ONCE(record->fp);
> +	new_pc = READ_ONCE(record->ra);
> +
> +	if (!new_fp && !new_pc)
> +		return kunwind_next_frame_record_meta(state);
> +
> +	unwind_consume_stack(&state->common, info, record_base,
> +			     sizeof(*record));
> +
> +	state->common.fp = new_fp;
> +	state->common.pc = new_pc;
> +	state->source = KUNWIND_SOURCE_FRAME;
> +
> +	return 0;
> +}
> +
> +/*
> + * Unwind from one frame record (A) to the next frame record (B).
> + *
> + * We terminate early if the location of B indicates a malformed chain of frame
> + * records (e.g. a cycle), determined based on the location and fp value of A
> + * and the location (but not the fp value) of B.
> + */
> +static __always_inline int
> +kunwind_next(struct kunwind_state *state)
> +{
> +	int err;
> +
> +	state->flags.all = 0;
> +
> +	switch (state->source) {
> +	case KUNWIND_SOURCE_FRAME:
> +	case KUNWIND_SOURCE_CALLER:
> +	case KUNWIND_SOURCE_TASK:
> +	case KUNWIND_SOURCE_REGS_PC:
> +		err = kunwind_next_frame_record(state);
> +		break;
> +	default:
> +		err = -EINVAL;
> +	}
> +
> +	if (err)
> +		return err;
> +
> +	return kunwind_recover_return_address(state);
> +}
> +
> +typedef bool (*kunwind_consume_fn)(const struct kunwind_state *state, void *cookie);
> +
> +static __always_inline int
> +do_kunwind(struct kunwind_state *state, kunwind_consume_fn consume_state,
> +	   void *cookie)
> +{
> +	int ret;
> +
> +	ret = kunwind_recover_return_address(state);
> +	if (ret)
> +		return ret;
> +
> +	while (1) {
> +		if (!consume_state(state, cookie))
> +			return -EINVAL;
> +		ret = kunwind_next(state);
> +		if (ret == -ENOENT)
> +			return 0;
> +		if (ret < 0)
> +			return ret;
> +	}
> +}
> +
> +static __always_inline int
> +kunwind_stack_walk(kunwind_consume_fn consume_state,
> +		   void *cookie, struct task_struct *task,
> +		   struct pt_regs *regs)
> +{
> +	struct task_struct *tsk = task ?: current;
> +	struct stack_info stacks[] = {
> +		stackinfo_get_task(tsk),
> +		STACKINFO_CPU(tsk, irq),
> +#ifdef CONFIG_VMAP_STACK
> +		STACKINFO_CPU(tsk, overflow),
> +#endif
> +	};
> +	struct kunwind_state state = {
> +		.common = {
> +			.stacks = stacks,
> +			.nr_stacks = ARRAY_SIZE(stacks),
> +		},
> +	};
> +
> +	if (regs) {
> +		if (tsk != current)
> +			return -EINVAL;
> +		kunwind_init_from_regs(&state, regs);
> +	} else if (tsk == current) {
> +		kunwind_init_from_caller(&state);
> +	} else {
> +		kunwind_init_from_task(&state, tsk);
> +	}
> +
> +	return do_kunwind(&state, consume_state, cookie);
> +}
> +
> +struct kunwind_consume_entry_data {
> +	stack_trace_consume_fn consume_entry;
> +	void *cookie;
> +};
> +
> +static __always_inline bool
> +arch_kunwind_consume_entry(const struct kunwind_state *state, void *cookie)
> +{
> +	struct kunwind_consume_entry_data *data = cookie;
> +
> +	return data->consume_entry(data->cookie, state->common.pc);
> +}
> +
> +static __always_inline bool
> +arch_reliable_kunwind_consume_entry(const struct kunwind_state *state, void *cookie)
> +{
> +	/*
> +	 * At an exception boundary we can reliably consume the saved PC. We do
> +	 * not know whether the LR was live when the exception was taken, and

Nit: s/LR/ra/ here. RISC-V has no register named LR; the equivalent is
the return-address register ra (x1). You already localized this
correctly in the kr_cur docstring ("replacement ra value"), so this
comment is just an oversight.
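To make the fp/ra terminology concrete, here is a rough user-space
sketch of the walk that the comments in this patch describe: each frame
record sits at fp - sizeof(struct frame_record) and holds {fp, ra}, and
an all-zero record marks a metadata record. walk_frames() and its
[low, high) bounds are hypothetical; the real walker goes through
unwind_find_stack()/unwind_consume_stack() and also recovers
ftrace/kretprobe return addresses, which this omits. The test assumes
a 64-bit build.

```c
/* Hypothetical user-space model of a RISC-V {fp, ra} frame record. */
struct frame_record {
	unsigned long fp;	/* caller's s0 */
	unsigned long ra;	/* return address into the caller */
};

/*
 * Walk records within [low, high), storing each saved ra in pcs[].
 * Stops at a misaligned or out-of-bounds fp, at an all-zero (metadata)
 * record, or when the chain stops moving towards higher addresses.
 */
static int walk_frames(unsigned long fp, unsigned long low, unsigned long high,
		       unsigned long *pcs, int max)
{
	int n = 0;

	while (n < max) {
		const struct frame_record *rec;

		if ((fp & 0x7) || fp < low + sizeof(*rec) || fp > high)
			break;

		/* The record lives immediately below the address in fp. */
		rec = (const struct frame_record *)(fp - sizeof(*rec));

		if (!rec->fp && !rec->ra)	/* metadata record: stop */
			break;

		pcs[n++] = rec->ra;

		if (rec->fp <= fp)		/* no forward progress */
			break;
		fp = rec->fp;
	}
	return n;
}
```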

Thanks.
Shuai



* Re: [syzbot] [trace?] KASAN: use-after-free Write in ring_buffer_read_page
From: Masami Hiramatsu @ 2026-06-03  1:34 UTC
  To: Steven Rostedt
  Cc: syzbot, linux-kernel, linux-trace-kernel, mathieu.desnoyers,
	mhiramat, syzkaller-bugs
In-Reply-To: <20260602122829.4a91864f@gandalf.local.home>

On Tue, 2 Jun 2026 12:28:29 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 02 Jun 2026 06:45:31 -0700
> syzbot <syzbot+2dd9d02f60775ce5c1fb@syzkaller.appspotmail.com> wrote:
> 
> > syzbot found the following issue on:
> > 
> > HEAD commit:    e7ae89a0c97c Linux 7.1-rc5
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=16f06e2e580000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=58acee1ac5406016
> > dashboard link: https://syzkaller.appspot.com/bug?extid=2dd9d02f60775ce5c1fb
> > compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
> > 
> > Unfortunately, I don't have any reproducer for this issue yet.
> 
> Looks like the test was doing something really weird to trigger this.
> Without a reproducer, it's pretty much impossible to find out what
> happened. Maybe AI could do it?
> 

Does "I don't have any reproducer for this issue yet." mean that this is
not reproducible even when running exactly the same sequence as in the
console output? If so, might this be a timing-related issue (e.g. read
vs. write-event)?

Thanks,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCH v2 5/8] riscv: stacktrace: introduce stack-bound tracking helpers
From: Shuai Xue @ 2026-06-03  1:23 UTC
  To: Wang Han, Paul Walmsley, Palmer Dabbelt, Albert Ou
  Cc: Steven Rostedt, Alexandre Ghiti, Masami Hiramatsu, Mark Rutland,
	Catalin Marinas, Chen Pei, Andy Chiu, Björn Töpel,
	Deepak Gupta, Puranjay Mohan, Conor Dooley, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, oliver.yang, zhuo.song, jkchen, linux-riscv,
	linux-kernel, linux-trace-kernel, live-patching, linux-kselftest,
	linux-perf-users
In-Reply-To: <20260528082310.1994388-6-wanghan@linux.alibaba.com>



On 5/28/26 4:23 PM, Wang Han wrote:
> A reliable unwinder needs to validate that every frame record it reads
> is fully contained in a known kernel stack, and it needs to refuse to
> walk back into a stack it has already left. Add the building blocks
> for that:
> 
>    * struct stack_info / struct unwind_state in a new
>      asm/stacktrace/common.h, modelled on the arm64 reference
>      implementation.
>    * stackinfo_get_irq() / stackinfo_get_task() / stackinfo_get_overflow()
>      plus the corresponding on_*_stack() predicates in asm/stacktrace.h,
>      so callers can ask "is this object on stack X?" by stack kind
>      rather than open-coded address arithmetic.
>    * unwind_init_common(), unwind_find_stack() and
>      unwind_consume_stack() helpers that enforce the
>      forward-progress-only invariant required for reliability.
> 
> No existing user is wired up to these helpers in this commit; the
> unwinder switch comes in a follow-up. The header changes leave
> on_thread_stack() with the same semantics as before, just expressed in
> terms of the new helpers.
> 
> Signed-off-by: Wang Han <wanghan@linux.alibaba.com>
> ---
>   arch/riscv/include/asm/stacktrace.h        |  65 ++++++++-
>   arch/riscv/include/asm/stacktrace/common.h | 159 +++++++++++++++++++++
>   2 files changed, 222 insertions(+), 2 deletions(-)
>   create mode 100644 arch/riscv/include/asm/stacktrace/common.h
> 
> diff --git a/arch/riscv/include/asm/stacktrace.h b/arch/riscv/include/asm/stacktrace.h
> index b1495a7e06ce..bc87c4940379 100644
> --- a/arch/riscv/include/asm/stacktrace.h
> +++ b/arch/riscv/include/asm/stacktrace.h
> @@ -3,8 +3,13 @@
>   #ifndef _ASM_RISCV_STACKTRACE_H
>   #define _ASM_RISCV_STACKTRACE_H
>   
> +#include <linux/percpu.h>
>   #include <linux/sched.h>
> +#include <linux/sched/task_stack.h>
> +
> +#include <asm/irq_stack.h>
>   #include <asm/ptrace.h>
> +#include <asm/stacktrace/common.h>
>   
>   struct stackframe {
>   	unsigned long fp;
> @@ -16,14 +21,70 @@ extern void notrace walk_stackframe(struct task_struct *task, struct pt_regs *re
>   extern void dump_backtrace(struct pt_regs *regs, struct task_struct *task,
>   			   const char *loglvl);
>   
> -static inline bool on_thread_stack(void)
> +/*
> + * IRQ stack accessors
> + */
> +static inline struct stack_info stackinfo_get_irq(void)
> +{
> +	unsigned long low = (unsigned long)raw_cpu_read(irq_stack_ptr);
> +	unsigned long high = low + IRQ_STACK_SIZE;
> +
> +	return (struct stack_info) {
> +		.low = low,
> +		.high = high,
> +	};
> +}
> +
> +static inline bool on_irq_stack(unsigned long sp, unsigned long size)
> +{
> +	struct stack_info info = stackinfo_get_irq();
> +
> +	return stackinfo_on_stack(&info, sp, size);
> +}
> +
> +/*
> + * Task stack accessors
> + */
> +static inline struct stack_info stackinfo_get_task(const struct task_struct *tsk)
>   {
> -	return !(((unsigned long)(current->stack) ^ current_stack_pointer) & ~(THREAD_SIZE - 1));
> +	unsigned long low = (unsigned long)task_stack_page(tsk);
> +	unsigned long high = low + THREAD_SIZE;
> +
> +	return (struct stack_info) {
> +		.low = low,
> +		.high = high,
> +	};
> +}
> +
> +static inline bool on_task_stack(const struct task_struct *tsk,
> +				 unsigned long sp, unsigned long size)
> +{
> +	struct stack_info info = stackinfo_get_task(tsk);
> +
> +	return stackinfo_on_stack(&info, sp, size);
>   }
>   
> +/*
> + * Cast is necessary since current->stack is an opaque ptr.
> + */
> +#define on_thread_stack()	(on_task_stack(current, current_stack_pointer, 1))
>   
> +/*
> + * Overflow stack accessors
> + */
>   #ifdef CONFIG_VMAP_STACK
>   DECLARE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)], overflow_stack);
> +
> +static inline struct stack_info stackinfo_get_overflow(void)
> +{
> +	unsigned long low = (unsigned long)raw_cpu_ptr(overflow_stack);
> +	unsigned long high = low + OVERFLOW_STACK_SIZE;
> +
> +	return (struct stack_info) {
> +		.low = low,
> +		.high = high,
> +	};
> +}
>   #endif /* CONFIG_VMAP_STACK */
>   
>   #endif /* _ASM_RISCV_STACKTRACE_H */
> diff --git a/arch/riscv/include/asm/stacktrace/common.h b/arch/riscv/include/asm/stacktrace/common.h
> new file mode 100644
> index 000000000000..87d6d40672f3
> --- /dev/null
> +++ b/arch/riscv/include/asm/stacktrace/common.h
> @@ -0,0 +1,159 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * RISC-V common stack unwinder types and helpers.
> + *
> + * See: arch/arm64/include/asm/stacktrace/common.h for the reference
> + * implementation.
> + *
> + * Copyright (C) 2024

Nit: The new common.h carries "Copyright (C) 2024", but this is a 2026
submission.
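For reference, the forward-progress-only invariant from the commit
message can be modelled in a few lines of user-space C. This is a
simplified, hypothetical sketch (on_stack(), find_stack() and
consume_stack() are made-up names and the struct layouts are reduced),
not the code in common.h: consuming a record moves the current stack's
low bound above it, and switching stacks retires the old one so it can
never be found again.

```c
#include <stdbool.h>
#include <stddef.h>

struct stack_info {
	unsigned long low;	/* lowest address still usable (0 = retired) */
	unsigned long high;	/* one past the top of the stack */
};

struct unwind_state {
	struct stack_info *stack;	/* stack currently being unwound */
	struct stack_info *stacks;
	int nr_stacks;
};

static bool on_stack(const struct stack_info *info, unsigned long sp,
		     unsigned long size)
{
	if (!info->low)
		return false;
	/* Also reject sp + size wrapping around. */
	return sp >= info->low && sp + size >= sp && sp + size <= info->high;
}

static struct stack_info *find_stack(struct unwind_state *state,
				     unsigned long sp, unsigned long size)
{
	if (state->stack && on_stack(state->stack, sp, size))
		return state->stack;
	for (int i = 0; i < state->nr_stacks; i++)
		if (on_stack(&state->stacks[i], sp, size))
			return &state->stacks[i];
	return NULL;
}

static void consume_stack(struct unwind_state *state, struct stack_info *info,
			  unsigned long sp, unsigned long size)
{
	/* Leaving a stack retires it, so we can never unwind back onto it. */
	if (state->stack && state->stack != info)
		state->stack->low = state->stack->high = 0;
	state->stack = info;
	/* Later records must live strictly above this one. */
	info->low = sp + size;
}
```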

Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>

Thanks.
Shuai



* Re: [PATCH v2] tracing: fix CFI violation in probestub test
From: Masami Hiramatsu @ 2026-06-03  1:20 UTC
  To: kernel test robot
  Cc: Eva Kurchatova, mhiramat, rostedt, oe-kbuild-all,
	linux-trace-kernel, linux-kernel, mathieu.desnoyers, peterz,
	jpoimboe, samitolvanen
In-Reply-To: <202606022312.7cKiQBmg-lkp@intel.com>

On Tue, 2 Jun 2026 23:40:51 +0200
kernel test robot <lkp@intel.com> wrote:

> Hi Eva,
> 
> kernel test robot noticed the following build errors:
> 
> [auto build test ERROR on trace/for-next]
> [also build test ERROR on linus/master v6.16-rc1 next-20260602]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Eva-Kurchatova/tracing-fix-CFI-violation-in-probestub-test/20260602-222302
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace for-next
> patch link:    https://lore.kernel.org/r/20260602135425.542073-1-eva.kurchatova%40virtuozzo.com
> patch subject: [PATCH v2] tracing: fix CFI violation in probestub test
> config: x86_64-rhel-9.4-kselftests (https://download.01.org/0day-ci/archive/20260602/202606022312.7cKiQBmg-lkp@intel.com/config)
> compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260602/202606022312.7cKiQBmg-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202606022312.7cKiQBmg-lkp@intel.com/
> 
> All errors (new ones prefixed by >>):
> 
>    In file included from drivers/dma-buf/sync_trace.h:10,
>                     from drivers/dma-buf/sw_sync.c:18:
>    include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
>      403 |         CFI_NOSEAL(__probestub_##_name);                                \
>          |         ^~~~~~~~~~

Hmm, it seems that this macro is not defined in this build
configuration. Maybe we need:

#include <linux/cfi.h>

instead of <asm/cfi.h>?
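The failure mode is easy to see in isolation: if nothing defines
CFI_NOSEAL(), the compiler parses the invocation as an implicit-int
function declaration, which matches both errors above. A minimal
sketch of the arch-plus-generic-fallback pattern follows, assuming (as
suggested above) that <linux/cfi.h> supplies an empty fallback when
the arch header does not define the macro. DEMO_ARCH_CFI and
__probestub_demo are hypothetical names for this illustration only,
and the "arch" definition is made up, not the real x86 one.

```c
/*
 * Hypothetical "arch" header: only some configurations define
 * CFI_NOSEAL(). DEMO_ARCH_CFI is a made-up switch for this sketch.
 */
#ifdef DEMO_ARCH_CFI
#define CFI_NOSEAL(fn)	static void (*const __noseal_##fn)(void *, int) \
				__attribute__((used)) = fn
#endif

/* Hypothetical "generic" header: no-op fallback for everyone else. */
#ifndef CFI_NOSEAL
#define CFI_NOSEAL(fn)
#endif

/* Stand-in for the probe stub emitted by __DEFINE_TRACE_EXT(). */
void __probestub_demo(void *data, int arg)
{
	(void)data;
	(void)arg;
}

/*
 * Without either definition above, this line would be parsed as an
 * implicit-int declaration, as in the build errors reported here.
 */
CFI_NOSEAL(__probestub_demo);
```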

Thanks,

>    include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |         ^~~~~~~~~~~~~~~~~~
>    include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/../../drivers/dma-buf/sync_trace.h:12:1: note: in expansion of macro 'TRACE_EVENT'
>       12 | TRACE_EVENT(sync_timeline,
>          | ^~~~~~~~~~~
>    include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
>      403 |         CFI_NOSEAL(__probestub_##_name);                                \
>          |         ^~~~~~~~~~
>    include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |         ^~~~~~~~~~~~~~~~~~
>    include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/../../drivers/dma-buf/sync_trace.h:12:1: note: in expansion of macro 'TRACE_EVENT'
>       12 | TRACE_EVENT(sync_timeline,
>          | ^~~~~~~~~~~
> >> include/trace/../../drivers/dma-buf/sync_trace.h:13:25: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
>       13 |         TP_PROTO(struct sync_timeline *timeline),
>          |                         ^~~~~~~~~~~~~
>    include/linux/tracepoint.h:394:48: note: in definition of macro '__DEFINE_TRACE_EXT'
>      394 |         void __probestub_##_name(void *__data, proto)                   \
>          |                                                ^~~~~
>    include/linux/tracepoint.h:424:41: note: in expansion of macro 'PARAMS'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |                                         ^~~~~~
>    include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/define_trace.h:28:28: note: in expansion of macro 'PARAMS'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |                            ^~~~~~
>    include/trace/../../drivers/dma-buf/sync_trace.h:12:1: note: in expansion of macro 'TRACE_EVENT'
>       12 | TRACE_EVENT(sync_timeline,
>          | ^~~~~~~~~~~
>    include/trace/../../drivers/dma-buf/sync_trace.h:13:9: note: in expansion of macro 'TP_PROTO'
>       13 |         TP_PROTO(struct sync_timeline *timeline),
>          |         ^~~~~~~~
> --
>    In file included from include/trace/events/lock.h:9,
>                     from kernel/locking/mutex.c:35:
>    include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
>      403 |         CFI_NOSEAL(__probestub_##_name);                                \
>          |         ^~~~~~~~~~
>    include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |         ^~~~~~~~~~~~~~~~~~
>    include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/events/lock.h:24:1: note: in expansion of macro 'TRACE_EVENT'
>       24 | TRACE_EVENT(lock_acquire,
>          | ^~~~~~~~~~~
>    include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
>      403 |         CFI_NOSEAL(__probestub_##_name);                                \
>          |         ^~~~~~~~~~
>    include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |         ^~~~~~~~~~~~~~~~~~
>    include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/events/lock.h:24:1: note: in expansion of macro 'TRACE_EVENT'
>       24 | TRACE_EVENT(lock_acquire,
>          | ^~~~~~~~~~~
> >> include/trace/events/lock.h:28:24: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
>       28 |                 struct lockdep_map *next_lock, unsigned long ip),
>          |                        ^~~~~~~~~~~
>    include/linux/tracepoint.h:394:48: note: in definition of macro '__DEFINE_TRACE_EXT'
>      394 |         void __probestub_##_name(void *__data, proto)                   \
>          |                                                ^~~~~
>    include/linux/tracepoint.h:424:41: note: in expansion of macro 'PARAMS'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |                                         ^~~~~~
>    include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/define_trace.h:28:28: note: in expansion of macro 'PARAMS'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |                            ^~~~~~
>    include/trace/events/lock.h:24:1: note: in expansion of macro 'TRACE_EVENT'
>       24 | TRACE_EVENT(lock_acquire,
>          | ^~~~~~~~~~~
>    include/trace/events/lock.h:26:9: note: in expansion of macro 'TP_PROTO'
>       26 |         TP_PROTO(struct lockdep_map *lock, unsigned int subclass,
>          |         ^~~~~~~~
>    include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
>      403 |         CFI_NOSEAL(__probestub_##_name);                                \
>          |         ^~~~~~~~~~
>    include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |         ^~~~~~~~~~~~~~~~~~
>    include/trace/define_trace.h:61:9: note: in expansion of macro 'DEFINE_TRACE'
>       61 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/events/lock.h:69:1: note: in expansion of macro 'DEFINE_EVENT'
>       69 | DEFINE_EVENT(lock, lock_release,
>          | ^~~~~~~~~~~~
>    include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
>      403 |         CFI_NOSEAL(__probestub_##_name);                                \
>          |         ^~~~~~~~~~
>    include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |         ^~~~~~~~~~~~~~~~~~
>    include/trace/define_trace.h:61:9: note: in expansion of macro 'DEFINE_TRACE'
>       61 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/events/lock.h:69:1: note: in expansion of macro 'DEFINE_EVENT'
>       69 | DEFINE_EVENT(lock, lock_release,
>          | ^~~~~~~~~~~~
>    include/trace/events/lock.h:71:25: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
>       71 |         TP_PROTO(struct lockdep_map *lock, unsigned long ip),
>          |                         ^~~~~~~~~~~
>    include/linux/tracepoint.h:394:48: note: in definition of macro '__DEFINE_TRACE_EXT'
>      394 |         void __probestub_##_name(void *__data, proto)                   \
>          |                                                ^~~~~
>    include/linux/tracepoint.h:424:41: note: in expansion of macro 'PARAMS'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |                                         ^~~~~~
>    include/trace/define_trace.h:61:9: note: in expansion of macro 'DEFINE_TRACE'
>       61 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/define_trace.h:61:28: note: in expansion of macro 'PARAMS'
>       61 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |                            ^~~~~~
>    include/trace/events/lock.h:69:1: note: in expansion of macro 'DEFINE_EVENT'
>       69 | DEFINE_EVENT(lock, lock_release,
>          | ^~~~~~~~~~~~
>    include/trace/events/lock.h:71:9: note: in expansion of macro 'TP_PROTO'
>       71 |         TP_PROTO(struct lockdep_map *lock, unsigned long ip),
>          |         ^~~~~~~~
>    include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
>      403 |         CFI_NOSEAL(__probestub_##_name);                                \
>          |         ^~~~~~~~~~
>    include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |         ^~~~~~~~~~~~~~~~~~
>    include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/events/lock.h:95:1: note: in expansion of macro 'TRACE_EVENT'
>       95 | TRACE_EVENT(contention_begin,
>          | ^~~~~~~~~~~
>    include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
>      403 |         CFI_NOSEAL(__probestub_##_name);                                \
>          |         ^~~~~~~~~~
>    include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |         ^~~~~~~~~~~~~~~~~~
>    include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/events/lock.h:95:1: note: in expansion of macro 'TRACE_EVENT'
>       95 | TRACE_EVENT(contention_begin,
>          | ^~~~~~~~~~~
>    include/linux/tracepoint.h:380:24: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
>      380 |                 struct tracepoint_func *it_func_ptr;                    \
>          |                        ^~~~~~~~~~~~~~~
>    include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |         ^~~~~~~~~~~~~~~~~~
>    include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/events/lock.h:95:1: note: in expansion of macro 'TRACE_EVENT'
>       95 | TRACE_EVENT(contention_begin,
> 
> 
> vim +13 include/trace/../../drivers/dma-buf/sync_trace.h
> 
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  11  
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  12  TRACE_EVENT(sync_timeline,
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28 @13  	TP_PROTO(struct sync_timeline *timeline),
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  14  
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  15  	TP_ARGS(timeline),
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  16  
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  17  	TP_STRUCT__entry(
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  18  			__string(name, timeline->name)
> 5c1401f83a16b7e drivers/staging/android/trace/sync.h Gustavo Padovan         2016-05-31  19  			__field(u32, value)
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  20  	),
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  21  
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  22  	TP_fast_assign(
> 2c92ca849fcc6ee drivers/dma-buf/sync_trace.h         Steven Rostedt (Google  2024-05-16  23) 			__assign_str(name);
> 5c1401f83a16b7e drivers/staging/android/trace/sync.h Gustavo Padovan         2016-05-31  24  			__entry->value = timeline->value;
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  25  	),
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  26  
> 5c1401f83a16b7e drivers/staging/android/trace/sync.h Gustavo Padovan         2016-05-31  27  	TP_printk("name=%s value=%d", __get_str(name), __entry->value)
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  28  );
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  29  
> 
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>
