Linux Trace Kernel
* Re: [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM)
From: Balbir Singh @ 2026-06-03  5:00 UTC
  To: Gregory Price
  Cc: lsf-pc, linux-kernel, linux-cxl, cgroups, linux-mm,
	linux-trace-kernel, damon, kernel-team, gregkh, rafael, dakr,
	dave, jonathan.cameron, dave.jiang, alison.schofield,
	vishal.l.verma, ira.weiny, dan.j.williams, longman, akpm, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	osalvador, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
	byungchul, ying.huang, apopple, axelrasmussen, yuanchu, weixugc,
	yury.norov, linux, mhiramat, mathieu.desnoyers, tj, hannes,
	mkoutny, jackmanb, sj, baolin.wang, npache, ryan.roberts,
	dev.jain, baohua, lance.yang, muchun.song, xu.xin16,
	chengming.zhou, jannh, linmiaohe, nao.horiguchi, pfalcato,
	rientjes, shakeel.butt, riel, harry.yoo, cl, roman.gushchin,
	chrisl, kasong, shikemeng, nphamcs, bhe, zhengqi.arch,
	terry.bowman
In-Reply-To: <ah6bDNxlB1zBUnzN@gourry-fedora-PF4VCD3F>

On Tue, Jun 02, 2026 at 09:57:48AM +0100, Gregory Price wrote:
> On Tue, Jun 02, 2026 at 12:16:50PM +1000, Balbir Singh wrote:
> > On Sun, May 24, 2026 at 09:50:06PM -0400, Gregory Price wrote:
> > > 
> > > I'm debating on whether to include OPS_MEMPOLICY in the initial version
> > > if only because it's not intuitive how it interacts with pagecache. That
> > > needs more time to bake.
> > >
> > 
> > It makes sense to look at it and then decide if it makes sense.
> >
> 
> I am thinking I will ship without any OPS flags at all for now and then
> have the introduction of ops as a separate series.
> 
> > > alloc_pages_node() is the kernel interface
> > 
> > I was thinking we wouldn't need explicit flags and that allocations would
> > happen from user space using __GFP_THISNODE to the node or via a nodemask
> > based on nodes of interest. Is there a reason to add this flag, a system
> > might have more than one source of N_MEMORY_PRIVATE?
> > 
> 
> There's a few things to unpack here.  I discussed this many times on
> list and at LSF, but to reiterate.
> 
> 1) __GFP_THISNODE is insufficient to enforce isolation and otherwise
>    not particularly useful.  Additionally, from userland, it's not
>    something you can actually set.

I was thinking mbind()/mempolicy() is how we get to it. It already
accepts a nodemask.

> 
>    for node in possible_nodes:
>        alloc_pages_node(private_node, __GFP_THISNODE)
> 
>    In fact it's the opposite semantic of what we want.
>    THISNODE says: "Do not fall back to OTHER nodes".
> 

That's why we need to control the fallback nodes carefully for
N_MEMORY_PRIVATE.

>    The semantic we want is "Do not allow allocations from private
>    nodes UNLESS we specifically request" (__GFP_PRIVATE).
> 
>    __GFP_THISNODE does not actually buy you anything here, AND it's
>    worse, in the scenario where a private node makes its way into the
>    preferred slot (via possible_nodes or some other nodemask), the
>    allocator cannot fall back to a node it can access.
> 
>    __GFP_THISNODE cannot be overloaded to do anything useful here.

Let me clarify: I meant that we use a nodemask for allocation, and
__GFP_THISNODE gets us to the node we want when that is the only node
in the mask. My earlier comment might not have been clear.

> 
> 2) We're trying not to expose *ANY* userland APIs for this, at all.
> 
>    The ultimate goal here should be one of two things:
> 
>    1) fd = open(/dev/xxx, ...);
>       mem = mmap(fd, ...);
>       mem[0] = 0xDEADBEEF; /* Fault device page into page table */
> 
>       In this case, the driver is responsible for doing the
>       alloc_pages_node() call.
> 
>    or
> 
>    2) mem = mmap(NULL, ..., ANON);
>       mbind(mem, ..., private_node);
>       mem[0] = 0xDEADBEEF; /* Fault device page into page table */
> 
>       in this case mempolicy.c is responsible for doing the
>       alloc_pages_node() call via the _mpol() alloc variants.
> 
> Additional OPT flags (reclaim, compaction, whatever), would
> (optionally) allow mm/ to operate on the device memory with, for
> example, mmu_notifier callbacks to tell the device to invalidate
> whatever it's caching about that page.
> 
> This would all be relatively transparent to userland; all userland
> "knows" is that it's getting memory from a device (/dev/xxx) or a
> node it's otherwise aware of hosting device memory somehow.
> 

Why not use the mbind() APIs? Do we want to gate allocation/privileges
via a /dev node?

Balbir


* Re: [PATCH v8 2/6] mm/memory-failure: surface unhandlable kernel pages as -ENOTRECOVERABLE
From: Miaohe Lin @ 2026-06-03  2:33 UTC
  To: David Hildenbrand (Arm), Breno Leitao
  Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest,
	linux-trace-kernel, kernel-team, Lance Yang, Andrew Morton,
	Lorenzo Stoakes, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, Naoya Horiguchi,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Liam R. Howlett
In-Reply-To: <21732071-14a1-486a-951c-34de97b7c757@kernel.org>

On 2026/6/2 17:41, David Hildenbrand (Arm) wrote:
> On 6/2/26 05:08, Miaohe Lin wrote:
>> On 2026/6/1 21:22, David Hildenbrand (Arm) wrote:
>>> On 6/1/26 14:28, Miaohe Lin wrote:
>>>>
>>>> Thanks for your patch.
>>>>
>>>>
>>>> Once shake_page finds a lightweight range-based way to shrink slab, slab pages could be freed
>>>> into buddy and above PageSlab test should be removed then. Maybe add a TODO or XXX here?
>>>>
>>>>
>>>> I'm not sure but is it safe or a common way to test PageReserved, PageSlab,
>>>> PageTable and PageLargeKmalloc without extra page refcnt?
>>>
>>> Checking typed pages in a racy fashion is fine (PageSlab, PageTable,
>>> PageLargeKmalloc).
>>
>> Got it. Thanks.
>>
>>> Checking PageReserved in a racy fashion is fine as well. TESTPAGEFLAG() will
>>> allow checking it on compound pages.
>>
>> It seems PageReserved is not intended to be set on compound pages. I see there are PF_NO_COMPOUND
>> in its definition: PAGEFLAG(Reserved, reserved, PF_NO_COMPOUND).
>>
>>>
>>> For PageLargeKmalloc, we would want to check the head page, though. The page
>>> type is only stored for the head page.
>>
>> Maybe we should check the head page for PageSlab and PageTable too? alloc_slab_page only
>> set PageSlab on the head page and __pagetable_ctor uses __folio_set_pgtable to set PageTable
>> on folio.
>>
>>>
>>> So maybe we want to lookup the compound head (if any) and perform the type
>>> checks against that?
>>
>> Maybe we should or we might miss some pages that could have been handled. And
>> if compound head is required, should we hold an extra page refcnt to guard against
>> possible folio split race?
> 
> Races are fine. We might miss some pages, but that can happen on races either way.
> 
> 
> I'd just do something like
> 
> if (PageReserved(page))
> 	return true;
> 
> head = compound_head(page);

What if @head is split just after compound_head(), and then @head is freed into buddy and re-allocated
as a slab page while @page is still in the buddy? We would panic in this scenario because @head is
PageSlab, even though we were supposed to handle @page successfully. Or am I missing something?

Thanks.

> return PageSlab(head) || ...;
> 	
> 



* Re: [PATCH v2 1/8] scripts/sorttable: Handle RISC-V patchable ftrace entries
From: Shuai Xue @ 2026-06-03  2:10 UTC
  To: Steven Rostedt
  Cc: Wang Han, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexandre Ghiti, Masami Hiramatsu, Mark Rutland, Catalin Marinas,
	Chen Pei, Andy Chiu, Björn Töpel, Deepak Gupta,
	Puranjay Mohan, Conor Dooley, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence, Shuah Khan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, oliver.yang, zhuo.song, jkchen, linux-riscv,
	linux-kernel, linux-trace-kernel, live-patching, linux-kselftest,
	linux-perf-users
In-Reply-To: <20260601095746.70c01d24@fedora>



On 6/1/26 9:57 PM, Steven Rostedt wrote:
> On Mon, 1 Jun 2026 14:17:08 +0800
> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
> 
>>> diff --git a/scripts/sorttable.c b/scripts/sorttable.c
>>> index e8ed11c680c6..4c10e85bb5af 100644
>>> --- a/scripts/sorttable.c
>>> +++ b/scripts/sorttable.c
>>> @@ -891,17 +891,21 @@ static int do_file(char const *const fname, void *addr)
>>>    	table_sort_t custom_sort = NULL;
>>>    
>>>    	switch (elf_map_machine(ehdr)) {
>>> -	case EM_AARCH64:
>>>    #ifdef MCOUNT_SORT_ENABLED
>>> +	case EM_AARCH64:
>>>    		sort_reloc = true;
>>>    		rela_type = 0x403;
>>> -		/* arm64 uses patchable function entry placing before function */
>>> +		/* fallthrough */
>>> +	case EM_RISCV:
>>> +		/* arm64 and RISC-V place patchable entries before the function */
>>>    		before_func = 8;
>>
>> Nit: The shared comment now sits under `case EM_RISCV:` but the two
>> lines above it (sort_reloc / rela_type = 0x403) are strictly
>> arm64-only — they configure the RELA-based weak-function fixup that
>> RISC-V does not need. On a quick read it is easy to wonder if RISC-V
>> is implicitly inheriting that path. Splitting the comments would
>> help, e.g.:
>>
>>          case EM_AARCH64:
>>              /* arm64 needs RELA-based weak-function fixup */
>>              sort_reloc = true;
>>              rela_type = 0x403;
>>              /* fallthrough */
>>          case EM_RISCV:
>>              /* arm64 and RISC-V place patchable entries before the function */
>>              before_func = 8;
> 
> Makes sense.
> 
> Care to send a v3?
> 
> -- Steve

Hi, Steve,

It's a purely cosmetic comment change, not worth a respin on its own. The
rest of the feedback on this series (the frame-record metadata contract
in patch 2, and the dead state->regs field / Call Trace output change in
patch 6) is what actually warrants a new version.
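
For reference, here is a standalone userspace sketch of the fallthrough
the nit is about. It is not the sorttable code itself: the demo_* names
and the struct are made up for illustration, and the MCOUNT_SORT_ENABLED
guard is left out. It only shows that the RISC-V case picks up
before_func and never the arm64 RELA settings:

```c
#include <stdbool.h>

/* Stand-ins for the ELF machine IDs; the values match <elf.h>. */
enum {
	DEMO_EM_X86_64  = 62,
	DEMO_EM_AARCH64 = 183,
	DEMO_EM_RISCV   = 243,
};

/* Hypothetical container for the three settings touched by the switch. */
struct demo_sort_cfg {
	bool sort_reloc;
	unsigned int rela_type;
	unsigned int before_func;
};

static struct demo_sort_cfg demo_select_cfg(int machine)
{
	struct demo_sort_cfg cfg = { false, 0, 0 };

	switch (machine) {
	case DEMO_EM_AARCH64:
		/* arm64 needs RELA-based weak-function fixup */
		cfg.sort_reloc = true;
		cfg.rela_type = 0x403;
		/* fallthrough */
	case DEMO_EM_RISCV:
		/* arm64 and RISC-V place patchable entries before the function */
		cfg.before_func = 8;
		break;
	default:
		break;
	}

	return cfg;
}
```

Entering at the RISC-V label skips the two arm64 assignments, so the
only shared state is before_func.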

Just to get the routing straight: are you planning to pick this one up
through the tracing tree on its own?

It feels like a good candidate for that -- it's an independent fix for a
regression (Fixes: 0ca1724b56af) that breaks *all* RISC-V dynamic
ftrace, not just livepatch, so it shouldn't have to wait on the rest of
the livepatch series.

Thanks.
Shuai




* Re: [PATCH 1/2] tracing: work around -Wmissing-format-attribute warning
From: Masami Hiramatsu @ 2026-06-03  1:58 UTC
  To: Arnd Bergmann
  Cc: Steven Rostedt, Andrew Morton, Petr Mladek, Nathan Chancellor,
	Arnd Bergmann, Dennis Dalessandro, Jason Gunthorpe,
	Leon Romanovsky, Arend van Spriel, Miri Korenblit,
	Mathieu Desnoyers, Andy Shevchenko, Rasmus Villemoes,
	Sergey Senozhatsky, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Vlastimil Babka, linux-rdma, linux-kernel, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-trace-kernel, llvm
In-Reply-To: <20260602150904.2258624-1-arnd@kernel.org>

On Tue,  2 Jun 2026 17:07:05 +0200
Arnd Bergmann <arnd@kernel.org> wrote:

> diff --git a/include/linux/sprintf.h b/include/linux/sprintf.h
> index f06f7b785091..036a247b7c1e 100644
> --- a/include/linux/sprintf.h
> +++ b/include/linux/sprintf.h
> @@ -12,6 +12,7 @@ __printf(2, 3) int sprintf(char *buf, const char * fmt, ...);
>  __printf(2, 0) int vsprintf(char *buf, const char *, va_list);
>  __printf(3, 4) int snprintf(char *buf, size_t size, const char *fmt, ...);
>  __printf(3, 0) int vsnprintf(char *buf, size_t size, const char *fmt, va_list args);
> +int __vsnprintf(char *buf, size_t size, const char *fmt, va_list args);
>  __printf(3, 4) int scnprintf(char *buf, size_t size, const char *fmt, ...);
>  __printf(3, 0) int vscnprintf(char *buf, size_t size, const char *fmt, va_list args);
>  __printf(2, 3) __malloc char *kasprintf(gfp_t gfp, const char *fmt, ...);
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index d49338c44014..4715330c7b6b 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -962,7 +962,7 @@ perf_trace_buf_submit(void *raw_data, int size, int rctx, u16 type,
>  	int __ret;					\
>  							\
>  	va_copy(__ap, *(va));				\
> -	__ret = vsnprintf(NULL, 0, fmt, __ap) + 1;	\
> +	__ret = __vsnprintf(NULL, 0, fmt, __ap) + 1;	\
>  	va_end(__ap);					\
>  							\
>  	min(__ret, TRACE_EVENT_STR_MAX);		\

I think this is a slightly confusing name. What about vsnprintf_nocheck()?
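
For context, the sizing idiom that the macro relies on can be sketched
in userspace C as below. This is only an illustration under my own
assumptions: demo_fmt_len() and demo_len() are hypothetical helpers, and
the code calls libc vsnprintf() rather than the kernel one.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper mirroring the quoted macro body: measure the
 * formatted length with a NULL buffer, add one for the terminating NUL,
 * and cap the result at max.
 */
static int demo_fmt_len(int max, const char *fmt, va_list *va)
{
	va_list ap;
	int ret;

	assert(fmt != NULL && max > 0);

	va_copy(ap, *va);	/* leave the caller's va_list untouched */
	ret = vsnprintf(NULL, 0, fmt, ap) + 1;
	va_end(ap);

	return ret < max ? ret : max;
}

/* Variadic front end so the helper can be called with a literal format. */
__attribute__((format(printf, 2, 3)))
static int demo_len(int max, const char *fmt, ...)
{
	va_list va;
	int ret;

	va_start(va, fmt);
	ret = demo_fmt_len(max, fmt, &va);
	va_end(va);

	return ret;
}
```

Here the format check happens at the variadic front end, while the
inner helper only receives a va_list and has nothing to check against.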

Thanks,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCH v2 8/8] selftests/livepatch: Add RISC-V syscall wrapper prefix
From: Shuai Xue @ 2026-06-03  1:54 UTC
  To: Wang Han, Paul Walmsley, Palmer Dabbelt, Albert Ou
  Cc: Steven Rostedt, Alexandre Ghiti, Masami Hiramatsu, Mark Rutland,
	Catalin Marinas, Chen Pei, Andy Chiu, Björn Töpel,
	Deepak Gupta, Puranjay Mohan, Conor Dooley, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, oliver.yang, zhuo.song, jkchen, linux-riscv,
	linux-kernel, linux-trace-kernel, live-patching, linux-kselftest,
	linux-perf-users
In-Reply-To: <20260528082310.1994388-9-wanghan@linux.alibaba.com>



On 5/28/26 4:23 PM, Wang Han wrote:
> The syscall livepatch selftest resolves and patches a syscall wrapper
> symbol. To use that test for RISC-V livepatch validation, add the
> RISC-V FN_PREFIX definition for ARCH_HAS_SYSCALL_WRAPPER.
> 
> Without this macro, the syscall livepatch selftest cannot resolve the
> RISC-V target symbol, and the syscall-related livepatch test fails on
> RISC-V.
> 
> Signed-off-by: Wang Han <wanghan@linux.alibaba.com>
> ---
>   .../testing/selftests/livepatch/test_modules/test_klp_syscall.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c b/tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c
> index dd802783ea84..275e4b10cf59 100644
> --- a/tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c
> +++ b/tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c
> @@ -18,6 +18,8 @@
>   #define FN_PREFIX __s390x_
>   #elif defined(__aarch64__)
>   #define FN_PREFIX __arm64_
> +#elif defined(__riscv)
> +#define FN_PREFIX __riscv_
>   #else
>   /* powerpc does not select ARCH_HAS_SYSCALL_WRAPPER */
>   #define FN_PREFIX
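
A small userspace sketch of how such a prefix typically ends up in the
resolved symbol name, assuming the test builds the name by stringifying
FN_PREFIX and concatenating it with the syscall name. The DEMO_* macros
and demo_target_symbol() are stand-ins for illustration, not the
selftest's own definitions:

```c
/* Two-level stringify so the macro argument is expanded before '#'. */
#define DEMO_STRINGIFY_1(x)	#x
#define DEMO_STRINGIFY(x)	DEMO_STRINGIFY_1(x)

/* Stand-in for the FN_PREFIX value added by this patch. */
#define DEMO_FN_PREFIX		__riscv_

/* Adjacent string literals are concatenated at compile time. */
static const char *demo_target_symbol(void)
{
	return DEMO_STRINGIFY(DEMO_FN_PREFIX) "sys_getpid";
}
```

With the prefix defined this way the lookup target becomes
"__riscv_sys_getpid"; with the wrong prefix the name would not match
the wrapper symbol.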

Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>

Thanks.
Shuai


* Re: [PATCH v2 7/8] riscv: Kconfig: enable HAVE_RELIABLE_STACKTRACE and HAVE_LIVEPATCH
From: Shuai Xue @ 2026-06-03  1:49 UTC
  To: Wang Han, Paul Walmsley, Palmer Dabbelt, Albert Ou
  Cc: Steven Rostedt, Alexandre Ghiti, Masami Hiramatsu, Mark Rutland,
	Catalin Marinas, Chen Pei, Andy Chiu, Björn Töpel,
	Deepak Gupta, Puranjay Mohan, Conor Dooley, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, oliver.yang, zhuo.song, jkchen, linux-riscv,
	linux-kernel, linux-trace-kernel, live-patching, linux-kselftest,
	linux-perf-users
In-Reply-To: <20260528082310.1994388-8-wanghan@linux.alibaba.com>



On 5/28/26 4:23 PM, Wang Han wrote:
> Now that the metadata frame records, the kunwind state machine and
> arch_stack_walk_reliable() are all in place, advertise the capability
> to the rest of the kernel:
> 
>    * select HAVE_RELIABLE_STACKTRACE under FRAME_POINTER && 64BIT, so
>      only the configurations that actually have the metadata records
>      and the FP-based reliable walker enable it.


The 64BIT gate is conservative scoping rather than a hard technical
requirement: the metadata frame record, kunwind state machine and
arch_stack_walk_reliable() all build on RV32 too (and the
call_on_irq_stack change in patch 2/8 actually fixes a latent RV32
issue). However, the syscall livepatch selftest and module relocation
path have only been exercised on RV64 (QEMU virt SMP=2/4/8). The
64BIT gate can be dropped in a follow-up once RV32 has equivalent
coverage.

Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>

Thanks.
Shuai


* Re: [PATCH v2 6/8] riscv: stacktrace: switch to frame-pointer based unwinder
From: Shuai Xue @ 2026-06-03  1:35 UTC
  To: Wang Han, Paul Walmsley, Palmer Dabbelt, Albert Ou
  Cc: Steven Rostedt, Alexandre Ghiti, Masami Hiramatsu, Mark Rutland,
	Catalin Marinas, Chen Pei, Andy Chiu, Björn Töpel,
	Deepak Gupta, Puranjay Mohan, Conor Dooley, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, oliver.yang, zhuo.song, jkchen, linux-riscv,
	linux-kernel, linux-trace-kernel, live-patching, linux-kselftest,
	linux-perf-users
In-Reply-To: <20260528082310.1994388-7-wanghan@linux.alibaba.com>



On 5/28/26 4:23 PM, Wang Han wrote:
> Replace the open-coded frame-pointer walker in arch_stack_walk() with a
> robust kunwind state machine, modelled on arch/arm64/kernel/stacktrace.c
> and retargeted to the RISC-V {fp, ra} frame record convention. The new
> walker tracks stack bounds, consumes frame records monotonically,
> understands the metadata pt_regs records added in the previous frame
> record metadata patch, and recovers return addresses replaced by
> function graph tracing and kretprobes.
> 
> This commit introduces arch_stack_walk_reliable() but does not yet
> select HAVE_RELIABLE_STACKTRACE; that is done in a follow-up Kconfig
> patch so this commit can be reviewed and bisected as a pure unwinder
> replacement. Until that Kconfig change lands, livepatch is not yet
> enabled and arch_stack_walk_reliable() has no in-tree caller.
> 
> Three related callers are updated to keep the same frame-record
> assumptions everywhere:
> 
>    * Function graph tracing: the old RISC-V unwinder matched function
>      graph return-stack entries by the saved return-address slot. That
>      was consistent with the static mcount path, but not with the dynamic
>      ftrace path where the parent slot is ftrace_regs::ra. Use the
>      architectural frame pointer as the function graph return-address
>      cookie, matching the kunwind walker.
> 
>    * Perf callchains: route kernel callchain collection through
>      arch_stack_walk() so perf sees the same frame-pointer unwind
>      behaviour as dump_stack() and the upcoming livepatch path.
> 
>    * dump_backtrace() / __get_wchan() / show_stack(): these now go
>      through arch_stack_walk(); the explicit "Call Trace:" header is
>      moved into dump_backtrace() to preserve the original output.
> 
> The non-frame-pointer fallback walker is kept untouched for
> !CONFIG_FRAME_POINTER builds.
> 
> Signed-off-by: Wang Han <wanghan@linux.alibaba.com>
> ---
>   arch/riscv/kernel/ftrace.c         |   6 +-
>   arch/riscv/kernel/perf_callchain.c |   2 +-
>   arch/riscv/kernel/stacktrace.c     | 560 ++++++++++++++++++++++++-----
>   3 files changed, 472 insertions(+), 96 deletions(-)
> 
> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> index b430edfb83f4..5d55199a9230 100644
> --- a/arch/riscv/kernel/ftrace.c
> +++ b/arch/riscv/kernel/ftrace.c
> @@ -242,7 +242,8 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
>   	 */
>   	old = *parent;
>   
> -	if (!function_graph_enter(old, self_addr, frame_pointer, parent))
> +	if (!function_graph_enter(old, self_addr, frame_pointer,
> +				  (void *)frame_pointer))
>   		*parent = return_hooker;
>   }
>   
> @@ -264,7 +265,8 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
>   	 */
>   	old = *parent;
>   
> -	if (!function_graph_enter_regs(old, ip, frame_pointer, parent, fregs))
> +	if (!function_graph_enter_regs(old, ip, frame_pointer,
> +				       (void *)frame_pointer, fregs))
>   		*parent = return_hooker;
>   }
>   #endif /* CONFIG_DYNAMIC_FTRACE */
> diff --git a/arch/riscv/kernel/perf_callchain.c b/arch/riscv/kernel/perf_callchain.c
> index b465bc9eb870..436af96ea59c 100644
> --- a/arch/riscv/kernel/perf_callchain.c
> +++ b/arch/riscv/kernel/perf_callchain.c
> @@ -44,5 +44,5 @@ void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
>   		return;
>   	}
>   
> -	walk_stackframe(NULL, regs, fill_callchain, entry);
> +	arch_stack_walk(fill_callchain, entry, NULL, regs);
>   }
> diff --git a/arch/riscv/kernel/stacktrace.c b/arch/riscv/kernel/stacktrace.c
> index 2692d3a06afa..0d76320b3a29 100644
> --- a/arch/riscv/kernel/stacktrace.c
> +++ b/arch/riscv/kernel/stacktrace.c
> @@ -11,98 +11,16 @@
>   #include <linux/sched/task_stack.h>
>   #include <linux/stacktrace.h>
>   #include <linux/ftrace.h>
> +#include <linux/kprobes.h>
> +#include <linux/llist.h>
>   
>   #include <asm/stacktrace.h>
>   
> -#ifdef CONFIG_FRAME_POINTER
> -
>   /*
> - * This disables KASAN checking when reading a value from another task's stack,
> - * since the other task could be running on another CPU and could have poisoned
> - * the stack in the meantime.
> + * Non-frame-pointer fallback unwinder.
> + * Only compiled when CONFIG_FRAME_POINTER is not enabled.
>    */
> -#define READ_ONCE_TASK_STACK(task, x)			\
> -({							\
> -	unsigned long val;				\
> -	unsigned long addr = x;				\
> -	if ((task) == current)				\
> -		val = READ_ONCE(addr);			\
> -	else						\
> -		val = READ_ONCE_NOCHECK(addr);		\
> -	val;						\
> -})
> -
> -extern asmlinkage void handle_exception(void);
> -extern unsigned long ret_from_exception_end;
> -
> -static inline int fp_is_valid(unsigned long fp, unsigned long sp)
> -{
> -	unsigned long low, high;
> -
> -	low = sp + sizeof(struct stackframe);
> -	high = ALIGN(sp, THREAD_SIZE);
> -
> -	return !(fp < low || fp > high || fp & 0x07);
> -}
> -
> -void notrace walk_stackframe(struct task_struct *task, struct pt_regs *regs,
> -			     bool (*fn)(void *, unsigned long), void *arg)
> -{
> -	unsigned long fp, sp, pc;
> -	int graph_idx = 0;
> -	int level = 0;
> -
> -	if (regs) {
> -		fp = frame_pointer(regs);
> -		sp = user_stack_pointer(regs);
> -		pc = instruction_pointer(regs);
> -	} else if (task == NULL || task == current) {
> -		fp = (unsigned long)__builtin_frame_address(0);
> -		sp = current_stack_pointer;
> -		pc = (unsigned long)walk_stackframe;
> -		level = -1;
> -	} else {
> -		/* task blocked in __switch_to */
> -		fp = task->thread.s[0];
> -		sp = task->thread.sp;
> -		pc = task->thread.ra;
> -	}
> -
> -	for (;;) {
> -		struct stackframe *frame;
> -
> -		if (unlikely(!__kernel_text_address(pc) || (level++ >= 0 && !fn(arg, pc))))
> -			break;
> -
> -		if (unlikely(!fp_is_valid(fp, sp)))
> -			break;
> -
> -		/* Unwind stack frame */
> -		frame = (struct stackframe *)fp - 1;
> -		sp = fp;
> -		if (regs && (regs->epc == pc) && fp_is_valid(frame->ra, sp)) {
> -			/* We hit function where ra is not saved on the stack */
> -			fp = frame->ra;
> -			pc = regs->ra;
> -		} else {
> -			fp = READ_ONCE_TASK_STACK(task, frame->fp);
> -			pc = READ_ONCE_TASK_STACK(task, frame->ra);
> -			pc = ftrace_graph_ret_addr(task, &graph_idx, pc,
> -						   &frame->ra);
> -			if (pc >= (unsigned long)handle_exception &&
> -			    pc < (unsigned long)&ret_from_exception_end) {
> -				if (unlikely(!fn(arg, pc)))
> -					break;
> -
> -				pc = ((struct pt_regs *)sp)->epc;
> -				fp = ((struct pt_regs *)sp)->s0;
> -			}
> -		}
> -
> -	}
> -}
> -
> -#else /* !CONFIG_FRAME_POINTER */
> +#ifndef CONFIG_FRAME_POINTER
>   
>   void notrace walk_stackframe(struct task_struct *task,
>   	struct pt_regs *regs, bool (*fn)(void *, unsigned long), void *arg)
> @@ -133,7 +51,12 @@ void notrace walk_stackframe(struct task_struct *task,
>   	}
>   }
>   
> -#endif /* CONFIG_FRAME_POINTER */
> +#endif /* !CONFIG_FRAME_POINTER */
> +
> +/*
> + * Common trace helpers.
> + * These are used by both the FP (kunwind) and non-FP (walk_stackframe) paths.
> + */
>   
>   static bool print_trace_address(void *arg, unsigned long pc)
>   {
> @@ -146,12 +69,12 @@ static bool print_trace_address(void *arg, unsigned long pc)
>   noinline void dump_backtrace(struct pt_regs *regs, struct task_struct *task,
>   		    const char *loglvl)
>   {
> -	walk_stackframe(task, regs, print_trace_address, (void *)loglvl);
> +	printk("%sCall Trace:\n", loglvl);
> +	arch_stack_walk(print_trace_address, (void *)loglvl, task, regs);
>   }
>   
>   void show_stack(struct task_struct *task, unsigned long *sp, const char *loglvl)
>   {
> -	pr_cont("%sCall Trace:\n", loglvl);
>   	dump_backtrace(NULL, task, loglvl);
>   }
>   
> @@ -171,17 +94,468 @@ unsigned long __get_wchan(struct task_struct *task)
>   
>   	if (!try_get_task_stack(task))
>   		return 0;
> -	walk_stackframe(task, NULL, save_wchan, &pc);
> +	arch_stack_walk(save_wchan, &pc, task, NULL);
>   	put_task_stack(task);
>   	return pc;
>   }
>   
> -noinline noinstr void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> -		     struct task_struct *task, struct pt_regs *regs)
> +/*
> + * Frame-pointer-based kernel unwind infrastructure.
> + * Only compiled when CONFIG_FRAME_POINTER is enabled.
> + *
> + * See: arch/arm64/kernel/stacktrace.c for the reference implementation.
> + */
> +#ifdef CONFIG_FRAME_POINTER
> +
> +/*
> + * Per-cpu stacks are only accessible when unwinding the current task in a
> + * non-preemptible context.
> + */
> +#define STACKINFO_CPU(task, name)				\
> +	({							\
> +		(((task) == current) && !preemptible())		\
> +			? stackinfo_get_##name()		\
> +			: stackinfo_get_unknown();		\
> +	})
> +
> +enum kunwind_source {
> +	KUNWIND_SOURCE_UNKNOWN,
> +	KUNWIND_SOURCE_FRAME,
> +	KUNWIND_SOURCE_CALLER,
> +	KUNWIND_SOURCE_TASK,
> +	KUNWIND_SOURCE_REGS_PC,
> +};
> +
> +union unwind_flags {
> +	unsigned long	all;
> +	struct {
> +		unsigned long	fgraph : 1,
> +				kretprobe : 1;
> +	};
> +};
> +
> +/*
> + * Kernel unwind state
> + *
> + * @common:    Common unwind state.
> + * @task:      The task being unwound.
> + * @graph_idx: Used by ftrace_graph_ret_addr() for optimized stack unwinding.
> + * @kr_cur:    When KRETPROBES is selected, holds the kretprobe instance
> + *             associated with the most recently encountered replacement ra
> + *             value.
> + */
> +struct kunwind_state {
> +	struct unwind_state common;
> +	struct task_struct *task;
> +	int graph_idx;
> +#ifdef CONFIG_KRETPROBES
> +	struct llist_node *kr_cur;
> +#endif
> +	enum kunwind_source source;
> +	union unwind_flags flags;
> +	struct pt_regs *regs;
> +};
> +
> +static __always_inline void
> +kunwind_init(struct kunwind_state *state,
> +	     struct task_struct *task)
> +{
> +	unwind_init_common(&state->common);
> +	state->task = task;
> +	state->source = KUNWIND_SOURCE_UNKNOWN;
> +	state->flags.all = 0;
> +	state->regs = NULL;
> +}
> +
> +/*
> + * Start an unwind from a pt_regs.
> + *
> + * The unwind will begin at the PC within the regs.
> + *
> + * The regs must be on a stack currently owned by the calling task.
> + */
> +static __always_inline void
> +kunwind_init_from_regs(struct kunwind_state *state,
> +		       struct pt_regs *regs)
> +{
> +	kunwind_init(state, current);
> +
> +	state->regs = regs;
> +	state->common.fp = frame_pointer(regs);
> +	state->common.pc = instruction_pointer(regs);
> +	state->source = KUNWIND_SOURCE_REGS_PC;
> +}
> +
> +/*
> + * Start an unwind from a caller.
> + *
> + * The unwind will begin at the caller of whichever function this is inlined
> + * into.
> + *
> + * The function which invokes this must be noinline.
> + */
> +static __always_inline void
> +kunwind_init_from_caller(struct kunwind_state *state)
> +{
> +	unsigned long fp = (unsigned long)__builtin_frame_address(0);
> +	struct frame_record *record = (struct frame_record *)fp - 1;
> +
> +	kunwind_init(state, current);
> +
> +	state->common.fp = READ_ONCE(record->fp);
> +	state->common.pc = READ_ONCE(record->ra);
> +	state->source = KUNWIND_SOURCE_CALLER;
> +}
> +
> +/*
> + * Start an unwind from a blocked task.
> + *
> + * The unwind will begin at the blocked task's saved PC (i.e. the caller of
> + * __switch_to).
> + *
> + * The caller should ensure the task is blocked in __switch_to for the
> + * duration of the unwind, or the unwind will be bogus. It is never valid to
> + * call this for the current task.
> + */
> +static __always_inline void
> +kunwind_init_from_task(struct kunwind_state *state,
> +		       struct task_struct *task)
> +{
> +	kunwind_init(state, task);
> +
> +	state->common.fp = task->thread.s[0];
> +	state->common.pc = task->thread.ra;
> +	state->source = KUNWIND_SOURCE_TASK;
> +}
> +
> +static __always_inline int
> +kunwind_recover_return_address(struct kunwind_state *state)
> +{
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +	if (state->task->ret_stack &&
> +	    state->common.pc == (unsigned long)return_to_handler) {
> +		unsigned long orig_pc;
> +
> +		orig_pc = ftrace_graph_ret_addr(state->task, &state->graph_idx,
> +						state->common.pc,
> +						(void *)state->common.fp);
> +		if (state->common.pc == orig_pc) {
> +			WARN_ON_ONCE(state->task == current);
> +			return -EINVAL;
> +		}
> +		state->common.pc = orig_pc;
> +		state->flags.fgraph = 1;
> +	}
> +#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> +
> +#ifdef CONFIG_KRETPROBES
> +	if (is_kretprobe_trampoline(state->common.pc)) {
> +		unsigned long orig_pc;
> +
> +		orig_pc = kretprobe_find_ret_addr(state->task,
> +						  (void *)state->common.fp,
> +						  &state->kr_cur);
> +		if (!orig_pc)
> +			return -EINVAL;
> +		state->common.pc = orig_pc;
> +		state->flags.kretprobe = 1;
> +	}
> +#endif /* CONFIG_KRETPROBES */
> +
> +	return 0;
> +}
> +
> +/*
> + * When we reach an exception boundary marked by a metadata frame record,
> + * extract pt_regs from the stack and continue unwinding from the saved
> + * context (epc and s0/fp).
> + *
> + * On RISC-V, fp points above the metadata record, so the record's
> + * frame_record portion is at fp - sizeof(struct frame_record).
> + */
> +static __always_inline int
> +kunwind_next_regs_pc(struct kunwind_state *state)
> +{
> +	struct stack_info *info;
> +	unsigned long fp = state->common.fp;
> +	struct pt_regs *regs;
> +
> +	regs = container_of((unsigned long *)(fp - sizeof(struct frame_record)),
> +			    struct pt_regs, stackframe.record.fp);
> +
> +	info = unwind_find_stack(&state->common, (unsigned long)regs,
> +				 sizeof(*regs));
> +	if (!info)
> +		return -EINVAL;
> +
> +	unwind_consume_stack(&state->common, info, (unsigned long)regs,
> +			     sizeof(*regs));
> +
> +	state->regs = regs;
> +	state->common.pc = regs->epc;
> +	state->common.fp = frame_pointer(regs);
> +	state->regs = NULL;

state->regs is a dead field, and kunwind_next_regs_pc() clears it in a
way that contradicts both arm64 and your own kunwind_init_from_regs().

struct kunwind_state has a `struct pt_regs *regs`, but I can't find any
reader of it anywhere in the file — arch_kunwind_consume_entry() and
arch_reliable_kunwind_consume_entry() only ever read common.pc and
source. It is written in three places:

     kunwind_init():           state->regs = NULL;
     kunwind_init_from_regs():  state->regs = regs;     /* not cleared */
     kunwind_next_regs_pc():    state->regs = regs;
                                state->common.pc = regs->epc;
                                state->common.fp = frame_pointer(regs);
                                state->regs = NULL;      /* cleared! */

Two things:

   (a) The field has no consumer, so it's currently dead.

   (b) In kunwind_next_regs_pc() you set state->regs = regs and then
       reset it to NULL a few lines later, before returning. The arm64
       reference does *not* clear it there, and your own
       kunwind_init_from_regs() leaves it set. So the two REGS_PC
       producers disagree on whether ->regs is valid.

It's harmless today only because nothing reads ->regs. But the moment
someone adds a consumer (e.g. to expose the pt_regs at an exception
boundary for a reliable dump), the stray `state->regs = NULL;` in
kunwind_next_regs_pc() becomes a silent bug.

Please either:
   - drop the field and all three writes, if it's genuinely unused, or
   - keep it and remove the `state->regs = NULL;` in
     kunwind_next_regs_pc() so it matches arm64 and init_from_regs.
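
To illustrate the second option, here is a minimal userspace sketch. All
*_sketch names are hypothetical stand-ins, not the types from the patch,
and the consumer at the end does not exist today; it only shows what a
future reader of ->regs would rely on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the kernel types in the patch. */
struct pt_regs_sketch {
	unsigned long epc;	/* saved exception PC */
	unsigned long s0;	/* saved frame pointer */
};

enum kunwind_source_sketch {
	SRC_UNKNOWN,
	SRC_REGS_PC,
};

struct kunwind_state_sketch {
	unsigned long pc;
	unsigned long fp;
	enum kunwind_source_sketch source;
	const struct pt_regs_sketch *regs;	/* meant to be valid while source == SRC_REGS_PC */
};

/* Mirrors kunwind_init_from_regs(): ->regs stays set. */
static void sketch_init_from_regs(struct kunwind_state_sketch *state,
				  const struct pt_regs_sketch *regs)
{
	state->regs = regs;
	state->pc = regs->epc;
	state->fp = regs->s0;
	state->source = SRC_REGS_PC;
}

/* Mirrors kunwind_next_regs_pc() with the trailing "= NULL" removed. */
static int sketch_next_regs_pc(struct kunwind_state_sketch *state,
			       const struct pt_regs_sketch *regs)
{
	if (!regs)
		return -1;

	state->regs = regs;
	state->pc = regs->epc;
	state->fp = regs->s0;
	state->source = SRC_REGS_PC;
	return 0;
}

/* A would-be consumer: returns the pt_regs at an exception boundary. */
static const struct pt_regs_sketch *
sketch_boundary_regs(const struct kunwind_state_sketch *state)
{
	if (state->source != SRC_REGS_PC)
		return NULL;
	assert(state->regs != NULL);	/* both producers must agree */
	return state->regs;
}
```

With the stray clear left in, the assert in the consumer would fire for
any state produced by the next_regs_pc path but not for one produced by
init_from_regs, which is the inconsistency described above.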

> +	state->source = KUNWIND_SOURCE_REGS_PC;
> +	return 0;
> +}
> +
> +/*
> + * Handle a metadata frame record embedded in pt_regs.
> + *
> + * On RISC-V, fp points above the record (fp = metadata + 16), so the
> + * frame_record_meta starts at fp - sizeof(struct frame_record).
> + *
> + * FRAME_META_TYPE_FINAL: This is the outermost exception entry
> + *   (user -> kernel). Unwinding terminates successfully.
> + * FRAME_META_TYPE_PT_REGS: This is a nested exception entry
> + *   (kernel -> kernel). Continue unwinding from the saved context.
> + */
> +static __always_inline int
> +kunwind_next_frame_record_meta(struct kunwind_state *state)
> +{
> +	struct task_struct *tsk = state->task;
> +	unsigned long fp = state->common.fp;
> +	unsigned long meta_base = fp - sizeof(struct frame_record);
> +	struct frame_record_meta *meta;
> +	struct stack_info *info;
> +
> +	info = unwind_find_stack(&state->common, meta_base, sizeof(*meta));
> +	if (!info)
> +		return -EINVAL;
> +
> +	meta = (struct frame_record_meta *)meta_base;
> +	switch (READ_ONCE(meta->type)) {
> +	case FRAME_META_TYPE_FINAL:
> +		if (meta == &task_pt_regs(tsk)->stackframe)
> +			return -ENOENT;
> +		WARN_ON_ONCE(tsk == current);
> +		return -EINVAL;
> +	case FRAME_META_TYPE_PT_REGS:
> +		return kunwind_next_regs_pc(state);
> +	default:
> +		WARN_ON_ONCE(tsk == current);
> +		return -EINVAL;
> +	}
> +}
> +
> +/*
> + * Unwind from one frame record to the next.
> + *
> + * On RISC-V, the frame record sits at fp - sizeof(struct frame_record),
> + * immediately below the address pointed to by fp/s0. This applies to both
> + * normal frame records and metadata frame records (embedded in pt_regs).
> + *
> + * A metadata record is identified by both fp and ra being zero in the
> + * frame_record portion, with a type value following at fp + 16.
> + */
> +static __always_inline int
> +kunwind_next_frame_record(struct kunwind_state *state)
> +{
> +	unsigned long fp = state->common.fp;
> +	struct frame_record *record;
> +	struct stack_info *info;
> +	unsigned long new_fp, new_pc;
> +	unsigned long record_base;
> +
> +	if (fp & 0x7)
> +		return -EINVAL;
> +
> +	record_base = fp - sizeof(*record);
> +
> +	info = unwind_find_stack(&state->common, record_base, sizeof(*record));
> +	if (!info)
> +		return -EINVAL;
> +
> +	record = (struct frame_record *)record_base;
> +	new_fp = READ_ONCE(record->fp);
> +	new_pc = READ_ONCE(record->ra);
> +
> +	if (!new_fp && !new_pc)
> +		return kunwind_next_frame_record_meta(state);
> +
> +	unwind_consume_stack(&state->common, info, record_base,
> +			     sizeof(*record));
> +
> +	state->common.fp = new_fp;
> +	state->common.pc = new_pc;
> +	state->source = KUNWIND_SOURCE_FRAME;
> +
> +	return 0;
> +}
> +
> +/*
> + * Unwind from one frame record (A) to the next frame record (B).
> + *
> + * We terminate early if the location of B indicates a malformed chain of frame
> + * records (e.g. a cycle), determined based on the location and fp value of A
> + * and the location (but not the fp value) of B.
> + */
> +static __always_inline int
> +kunwind_next(struct kunwind_state *state)
> +{
> +	int err;
> +
> +	state->flags.all = 0;
> +
> +	switch (state->source) {
> +	case KUNWIND_SOURCE_FRAME:
> +	case KUNWIND_SOURCE_CALLER:
> +	case KUNWIND_SOURCE_TASK:
> +	case KUNWIND_SOURCE_REGS_PC:
> +		err = kunwind_next_frame_record(state);
> +		break;
> +	default:
> +		err = -EINVAL;
> +	}
> +
> +	if (err)
> +		return err;
> +
> +	return kunwind_recover_return_address(state);
> +}
> +
> +typedef bool (*kunwind_consume_fn)(const struct kunwind_state *state, void *cookie);
> +
> +static __always_inline int
> +do_kunwind(struct kunwind_state *state, kunwind_consume_fn consume_state,
> +	   void *cookie)
> +{
> +	int ret;
> +
> +	ret = kunwind_recover_return_address(state);
> +	if (ret)
> +		return ret;
> +
> +	while (1) {
> +		if (!consume_state(state, cookie))
> +			return -EINVAL;
> +		ret = kunwind_next(state);
> +		if (ret == -ENOENT)
> +			return 0;
> +		if (ret < 0)
> +			return ret;
> +	}
> +}
> +
> +static __always_inline int
> +kunwind_stack_walk(kunwind_consume_fn consume_state,
> +		   void *cookie, struct task_struct *task,
> +		   struct pt_regs *regs)
> +{
> +	struct task_struct *tsk = task ?: current;
> +	struct stack_info stacks[] = {
> +		stackinfo_get_task(tsk),
> +		STACKINFO_CPU(tsk, irq),
> +#ifdef CONFIG_VMAP_STACK
> +		STACKINFO_CPU(tsk, overflow),
> +#endif
> +	};
> +	struct kunwind_state state = {
> +		.common = {
> +			.stacks = stacks,
> +			.nr_stacks = ARRAY_SIZE(stacks),
> +		},
> +	};
> +
> +	if (regs) {
> +		if (tsk != current)
> +			return -EINVAL;
> +		kunwind_init_from_regs(&state, regs);
> +	} else if (tsk == current) {
> +		kunwind_init_from_caller(&state);
> +	} else {
> +		kunwind_init_from_task(&state, tsk);
> +	}
> +
> +	return do_kunwind(&state, consume_state, cookie);
> +}
> +
> +struct kunwind_consume_entry_data {
> +	stack_trace_consume_fn consume_entry;
> +	void *cookie;
> +};
> +
> +static __always_inline bool
> +arch_kunwind_consume_entry(const struct kunwind_state *state, void *cookie)
> +{
> +	struct kunwind_consume_entry_data *data = cookie;
> +
> +	return data->consume_entry(data->cookie, state->common.pc);
> +}
> +
> +static __always_inline bool
> +arch_reliable_kunwind_consume_entry(const struct kunwind_state *state, void *cookie)
> +{
> +	/*
> +	 * At an exception boundary we can reliably consume the saved PC. We do
> +	 * not know whether the LR was live when the exception was taken, and

Nit: s/LR/ra/ here. RISC-V has no link register; the equivalent is the
return-address register ra (x1). You already localized this correctly in
the kr_cur docstring ("replacement ra value"), so this comment is just an
oversight.

Thanks.
Shuai


^ permalink raw reply

* Re: [syzbot] [trace?] KASAN: use-after-free Write in ring_buffer_read_page
From: Masami Hiramatsu @ 2026-06-03  1:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: syzbot, linux-kernel, linux-trace-kernel, mathieu.desnoyers,
	mhiramat, syzkaller-bugs
In-Reply-To: <20260602122829.4a91864f@gandalf.local.home>

On Tue, 2 Jun 2026 12:28:29 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 02 Jun 2026 06:45:31 -0700
> syzbot <syzbot+2dd9d02f60775ce5c1fb@syzkaller.appspotmail.com> wrote:
> 
> > syzbot found the following issue on:
> > 
> > HEAD commit:    e7ae89a0c97c Linux 7.1-rc5
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=16f06e2e580000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=58acee1ac5406016
> > dashboard link: https://syzkaller.appspot.com/bug?extid=2dd9d02f60775ce5c1fb
> > compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
> > 
> > Unfortunately, I don't have any reproducer for this issue yet.
> 
> Looks like the test was doing something really weird to trigger this.
> Without a reproducer, it's pretty much impossible to find out what
> happened. Maybe AI could do it?
> 

Does "I don't have any reproducer for this issue yet." mean that the
issue does not reproduce even when exactly the same sequence from the
console output is run again? If so, might this be a timing-related issue
(e.g. read vs. write-event)?

Thanks,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH v2 5/8] riscv: stacktrace: introduce stack-bound tracking helpers
From: Shuai Xue @ 2026-06-03  1:23 UTC (permalink / raw)
  To: Wang Han, Paul Walmsley, Palmer Dabbelt, Albert Ou
  Cc: Steven Rostedt, Alexandre Ghiti, Masami Hiramatsu, Mark Rutland,
	Catalin Marinas, Chen Pei, Andy Chiu, Björn Töpel,
	Deepak Gupta, Puranjay Mohan, Conor Dooley, Josh Poimboeuf,
	Jiri Kosina, Miroslav Benes, Petr Mladek, Joe Lawrence,
	Shuah Khan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, oliver.yang, zhuo.song, jkchen, linux-riscv,
	linux-kernel, linux-trace-kernel, live-patching, linux-kselftest,
	linux-perf-users
In-Reply-To: <20260528082310.1994388-6-wanghan@linux.alibaba.com>



On 5/28/26 4:23 PM, Wang Han wrote:
> A reliable unwinder needs to validate that every frame record it reads
> is fully contained in a known kernel stack, and it needs to refuse to
> walk back into a stack it has already left. Add the building blocks
> for that:
> 
>    * struct stack_info / struct unwind_state in a new
>      asm/stacktrace/common.h, modelled on the arm64 reference
>      implementation.
>    * stackinfo_get_irq() / stackinfo_get_task() / stackinfo_get_overflow()
>      plus the corresponding on_*_stack() predicates in asm/stacktrace.h,
>      so callers can ask "is this object on stack X?" by stack kind
>      rather than open-coded address arithmetic.
>    * unwind_init_common(), unwind_find_stack() and
>      unwind_consume_stack() helpers that enforce the
>      forward-progress-only invariant required for reliability.
> 
> No existing user is wired up to these helpers in this commit; the
> unwinder switch comes in a follow-up. The header changes leave
> on_thread_stack() with the same semantics as before, just expressed in
> terms of the new helpers.
> 
> Signed-off-by: Wang Han <wanghan@linux.alibaba.com>
> ---
>   arch/riscv/include/asm/stacktrace.h        |  65 ++++++++-
>   arch/riscv/include/asm/stacktrace/common.h | 159 +++++++++++++++++++++
>   2 files changed, 222 insertions(+), 2 deletions(-)
>   create mode 100644 arch/riscv/include/asm/stacktrace/common.h
> 
> diff --git a/arch/riscv/include/asm/stacktrace.h b/arch/riscv/include/asm/stacktrace.h
> index b1495a7e06ce..bc87c4940379 100644
> --- a/arch/riscv/include/asm/stacktrace.h
> +++ b/arch/riscv/include/asm/stacktrace.h
> @@ -3,8 +3,13 @@
>   #ifndef _ASM_RISCV_STACKTRACE_H
>   #define _ASM_RISCV_STACKTRACE_H
>   
> +#include <linux/percpu.h>
>   #include <linux/sched.h>
> +#include <linux/sched/task_stack.h>
> +
> +#include <asm/irq_stack.h>
>   #include <asm/ptrace.h>
> +#include <asm/stacktrace/common.h>
>   
>   struct stackframe {
>   	unsigned long fp;
> @@ -16,14 +21,70 @@ extern void notrace walk_stackframe(struct task_struct *task, struct pt_regs *re
>   extern void dump_backtrace(struct pt_regs *regs, struct task_struct *task,
>   			   const char *loglvl);
>   
> -static inline bool on_thread_stack(void)
> +/*
> + * IRQ stack accessors
> + */
> +static inline struct stack_info stackinfo_get_irq(void)
> +{
> +	unsigned long low = (unsigned long)raw_cpu_read(irq_stack_ptr);
> +	unsigned long high = low + IRQ_STACK_SIZE;
> +
> +	return (struct stack_info) {
> +		.low = low,
> +		.high = high,
> +	};
> +}
> +
> +static inline bool on_irq_stack(unsigned long sp, unsigned long size)
> +{
> +	struct stack_info info = stackinfo_get_irq();
> +
> +	return stackinfo_on_stack(&info, sp, size);
> +}
> +
> +/*
> + * Task stack accessors
> + */
> +static inline struct stack_info stackinfo_get_task(const struct task_struct *tsk)
>   {
> -	return !(((unsigned long)(current->stack) ^ current_stack_pointer) & ~(THREAD_SIZE - 1));
> +	unsigned long low = (unsigned long)task_stack_page(tsk);
> +	unsigned long high = low + THREAD_SIZE;
> +
> +	return (struct stack_info) {
> +		.low = low,
> +		.high = high,
> +	};
> +}
> +
> +static inline bool on_task_stack(const struct task_struct *tsk,
> +				 unsigned long sp, unsigned long size)
> +{
> +	struct stack_info info = stackinfo_get_task(tsk);
> +
> +	return stackinfo_on_stack(&info, sp, size);
>   }
>   
> +/*
> + * Cast is necessary since current->stack is an opaque ptr.
> + */
> +#define on_thread_stack()	(on_task_stack(current, current_stack_pointer, 1))
>   
> +/*
> + * Overflow stack accessors
> + */
>   #ifdef CONFIG_VMAP_STACK
>   DECLARE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)], overflow_stack);
> +
> +static inline struct stack_info stackinfo_get_overflow(void)
> +{
> +	unsigned long low = (unsigned long)raw_cpu_ptr(overflow_stack);
> +	unsigned long high = low + OVERFLOW_STACK_SIZE;
> +
> +	return (struct stack_info) {
> +		.low = low,
> +		.high = high,
> +	};
> +}
>   #endif /* CONFIG_VMAP_STACK */
>   
>   #endif /* _ASM_RISCV_STACKTRACE_H */
> diff --git a/arch/riscv/include/asm/stacktrace/common.h b/arch/riscv/include/asm/stacktrace/common.h
> new file mode 100644
> index 000000000000..87d6d40672f3
> --- /dev/null
> +++ b/arch/riscv/include/asm/stacktrace/common.h
> @@ -0,0 +1,159 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * RISC-V common stack unwinder types and helpers.
> + *
> + * See: arch/arm64/include/asm/stacktrace/common.h for the reference
> + * implementation.
> + *
> + * Copyright (C) 2024

Nit: The new common.h carries "Copyright (C) 2024", but this is a 2026
submission.

Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>

Thanks.
Shuai


^ permalink raw reply

* Re: [PATCH v2] tracing: fix CFI violation in probestub test
From: Masami Hiramatsu @ 2026-06-03  1:20 UTC (permalink / raw)
  To: kernel test robot
  Cc: Eva Kurchatova, mhiramat, rostedt, oe-kbuild-all,
	linux-trace-kernel, linux-kernel, mathieu.desnoyers, peterz,
	jpoimboe, samitolvanen
In-Reply-To: <202606022312.7cKiQBmg-lkp@intel.com>

On Tue, 2 Jun 2026 23:40:51 +0200
kernel test robot <lkp@intel.com> wrote:

> Hi Eva,
> 
> kernel test robot noticed the following build errors:
> 
> [auto build test ERROR on trace/for-next]
> [also build test ERROR on linus/master v6.16-rc1 next-20260602]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Eva-Kurchatova/tracing-fix-CFI-violation-in-probestub-test/20260602-222302
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace for-next
> patch link:    https://lore.kernel.org/r/20260602135425.542073-1-eva.kurchatova%40virtuozzo.com
> patch subject: [PATCH v2] tracing: fix CFI violation in probestub test
> config: x86_64-rhel-9.4-kselftests (https://download.01.org/0day-ci/archive/20260602/202606022312.7cKiQBmg-lkp@intel.com/config)
> compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260602/202606022312.7cKiQBmg-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202606022312.7cKiQBmg-lkp@intel.com/
> 
> All errors (new ones prefixed by >>):
> 
>    In file included from drivers/dma-buf/sync_trace.h:10,
>                     from drivers/dma-buf/sw_sync.c:18:
>    include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
>      403 |         CFI_NOSEAL(__probestub_##_name);                                \
>          |         ^~~~~~~~~~

Hmm, it seems that this macro is not defined in this build
configuration? Maybe we need:

#include <linux/cfi.h>

instead of asm/cfi.h?
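
The failure mode can be reproduced outside the kernel. This is only a
sketch: the no-op fallback is an assumption about what a generic header
would need to provide when the architecture does not define the macro,
and probestub_demo is a hypothetical stand-in for a generated
__probestub_<name>() function.

```c
#include <assert.h>

/*
 * Assumption: a no-op fallback for builds where the architecture header
 * does not define the macro. Without any definition, the line
 * "CFI_NOSEAL(probestub_demo);" at file scope is parsed as an old-style
 * declaration of a function named CFI_NOSEAL with an implicit int
 * return type, which matches the diagnostics in the report.
 */
#ifndef CFI_NOSEAL
#define CFI_NOSEAL(x)
#endif

/* Counts calls so the stub's behaviour can be observed. */
static int probestub_demo_calls;

/* Hypothetical stand-in for the generated probe stub. */
void probestub_demo(void *data)
{
	(void)data;
	probestub_demo_calls++;
}
CFI_NOSEAL(probestub_demo);
```

Removing the #ifndef block from the sketch should give the same
"type defaults to 'int'" error as in the robot's log.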

Thanks,

>    include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |         ^~~~~~~~~~~~~~~~~~
>    include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/../../drivers/dma-buf/sync_trace.h:12:1: note: in expansion of macro 'TRACE_EVENT'
>       12 | TRACE_EVENT(sync_timeline,
>          | ^~~~~~~~~~~
>    include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
>      403 |         CFI_NOSEAL(__probestub_##_name);                                \
>          |         ^~~~~~~~~~
>    include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |         ^~~~~~~~~~~~~~~~~~
>    include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/../../drivers/dma-buf/sync_trace.h:12:1: note: in expansion of macro 'TRACE_EVENT'
>       12 | TRACE_EVENT(sync_timeline,
>          | ^~~~~~~~~~~
> >> include/trace/../../drivers/dma-buf/sync_trace.h:13:25: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
>       13 |         TP_PROTO(struct sync_timeline *timeline),
>          |                         ^~~~~~~~~~~~~
>    include/linux/tracepoint.h:394:48: note: in definition of macro '__DEFINE_TRACE_EXT'
>      394 |         void __probestub_##_name(void *__data, proto)                   \
>          |                                                ^~~~~
>    include/linux/tracepoint.h:424:41: note: in expansion of macro 'PARAMS'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |                                         ^~~~~~
>    include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/define_trace.h:28:28: note: in expansion of macro 'PARAMS'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |                            ^~~~~~
>    include/trace/../../drivers/dma-buf/sync_trace.h:12:1: note: in expansion of macro 'TRACE_EVENT'
>       12 | TRACE_EVENT(sync_timeline,
>          | ^~~~~~~~~~~
>    include/trace/../../drivers/dma-buf/sync_trace.h:13:9: note: in expansion of macro 'TP_PROTO'
>       13 |         TP_PROTO(struct sync_timeline *timeline),
>          |         ^~~~~~~~
> --
>    In file included from include/trace/events/lock.h:9,
>                     from kernel/locking/mutex.c:35:
>    include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
>      403 |         CFI_NOSEAL(__probestub_##_name);                                \
>          |         ^~~~~~~~~~
>    include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |         ^~~~~~~~~~~~~~~~~~
>    include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/events/lock.h:24:1: note: in expansion of macro 'TRACE_EVENT'
>       24 | TRACE_EVENT(lock_acquire,
>          | ^~~~~~~~~~~
>    include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
>      403 |         CFI_NOSEAL(__probestub_##_name);                                \
>          |         ^~~~~~~~~~
>    include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |         ^~~~~~~~~~~~~~~~~~
>    include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/events/lock.h:24:1: note: in expansion of macro 'TRACE_EVENT'
>       24 | TRACE_EVENT(lock_acquire,
>          | ^~~~~~~~~~~
> >> include/trace/events/lock.h:28:24: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
>       28 |                 struct lockdep_map *next_lock, unsigned long ip),
>          |                        ^~~~~~~~~~~
>    include/linux/tracepoint.h:394:48: note: in definition of macro '__DEFINE_TRACE_EXT'
>      394 |         void __probestub_##_name(void *__data, proto)                   \
>          |                                                ^~~~~
>    include/linux/tracepoint.h:424:41: note: in expansion of macro 'PARAMS'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |                                         ^~~~~~
>    include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/define_trace.h:28:28: note: in expansion of macro 'PARAMS'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |                            ^~~~~~
>    include/trace/events/lock.h:24:1: note: in expansion of macro 'TRACE_EVENT'
>       24 | TRACE_EVENT(lock_acquire,
>          | ^~~~~~~~~~~
>    include/trace/events/lock.h:26:9: note: in expansion of macro 'TP_PROTO'
>       26 |         TP_PROTO(struct lockdep_map *lock, unsigned int subclass,
>          |         ^~~~~~~~
>    include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
>      403 |         CFI_NOSEAL(__probestub_##_name);                                \
>          |         ^~~~~~~~~~
>    include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |         ^~~~~~~~~~~~~~~~~~
>    include/trace/define_trace.h:61:9: note: in expansion of macro 'DEFINE_TRACE'
>       61 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/events/lock.h:69:1: note: in expansion of macro 'DEFINE_EVENT'
>       69 | DEFINE_EVENT(lock, lock_release,
>          | ^~~~~~~~~~~~
>    include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
>      403 |         CFI_NOSEAL(__probestub_##_name);                                \
>          |         ^~~~~~~~~~
>    include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |         ^~~~~~~~~~~~~~~~~~
>    include/trace/define_trace.h:61:9: note: in expansion of macro 'DEFINE_TRACE'
>       61 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/events/lock.h:69:1: note: in expansion of macro 'DEFINE_EVENT'
>       69 | DEFINE_EVENT(lock, lock_release,
>          | ^~~~~~~~~~~~
>    include/trace/events/lock.h:71:25: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
>       71 |         TP_PROTO(struct lockdep_map *lock, unsigned long ip),
>          |                         ^~~~~~~~~~~
>    include/linux/tracepoint.h:394:48: note: in definition of macro '__DEFINE_TRACE_EXT'
>      394 |         void __probestub_##_name(void *__data, proto)                   \
>          |                                                ^~~~~
>    include/linux/tracepoint.h:424:41: note: in expansion of macro 'PARAMS'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |                                         ^~~~~~
>    include/trace/define_trace.h:61:9: note: in expansion of macro 'DEFINE_TRACE'
>       61 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/define_trace.h:61:28: note: in expansion of macro 'PARAMS'
>       61 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |                            ^~~~~~
>    include/trace/events/lock.h:69:1: note: in expansion of macro 'DEFINE_EVENT'
>       69 | DEFINE_EVENT(lock, lock_release,
>          | ^~~~~~~~~~~~
>    include/trace/events/lock.h:71:9: note: in expansion of macro 'TP_PROTO'
>       71 |         TP_PROTO(struct lockdep_map *lock, unsigned long ip),
>          |         ^~~~~~~~
>    include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
>      403 |         CFI_NOSEAL(__probestub_##_name);                                \
>          |         ^~~~~~~~~~
>    include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |         ^~~~~~~~~~~~~~~~~~
>    include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/events/lock.h:95:1: note: in expansion of macro 'TRACE_EVENT'
>       95 | TRACE_EVENT(contention_begin,
>          | ^~~~~~~~~~~
>    include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
>      403 |         CFI_NOSEAL(__probestub_##_name);                                \
>          |         ^~~~~~~~~~
>    include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |         ^~~~~~~~~~~~~~~~~~
>    include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/events/lock.h:95:1: note: in expansion of macro 'TRACE_EVENT'
>       95 | TRACE_EVENT(contention_begin,
>          | ^~~~~~~~~~~
>    include/linux/tracepoint.h:380:24: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
>      380 |                 struct tracepoint_func *it_func_ptr;                    \
>          |                        ^~~~~~~~~~~~~~~
>    include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
>      424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
>          |         ^~~~~~~~~~~~~~~~~~
>    include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
>       28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
>          |         ^~~~~~~~~~~~
>    include/trace/events/lock.h:95:1: note: in expansion of macro 'TRACE_EVENT'
>       95 | TRACE_EVENT(contention_begin,
> 
> 
> vim +13 include/trace/../../drivers/dma-buf/sync_trace.h
> 
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  11  
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  12  TRACE_EVENT(sync_timeline,
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28 @13  	TP_PROTO(struct sync_timeline *timeline),
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  14  
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  15  	TP_ARGS(timeline),
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  16  
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  17  	TP_STRUCT__entry(
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  18  			__string(name, timeline->name)
> 5c1401f83a16b7e drivers/staging/android/trace/sync.h Gustavo Padovan         2016-05-31  19  			__field(u32, value)
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  20  	),
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  21  
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  22  	TP_fast_assign(
> 2c92ca849fcc6ee drivers/dma-buf/sync_trace.h         Steven Rostedt (Google  2024-05-16  23) 			__assign_str(name);
> 5c1401f83a16b7e drivers/staging/android/trace/sync.h Gustavo Padovan         2016-05-31  24  			__entry->value = timeline->value;
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  25  	),
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  26  
> 5c1401f83a16b7e drivers/staging/android/trace/sync.h Gustavo Padovan         2016-05-31  27  	TP_printk("name=%s value=%d", __get_str(name), __entry->value)
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  28  );
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  29  
> 
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH] tracing/events: Expand ring buffer for in-kernel event enables
From: manjunath.b.patil @ 2026-06-02 23:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel
In-Reply-To: <20260602090025.06ae761f@gandalf.local.home>


On 6/2/26 6:00 AM, Steven Rostedt wrote:
> On Mon,  1 Jun 2026 16:24:43 -0700
> Manjunath Patil <manjunath.b.patil@oracle.com> wrote:
> 
>> Ftrace keeps trace arrays at a boot-minimum ring-buffer size until
>> tracing is used. Tracefs event-enable paths already call
>> tracing_update_buffers() before enabling events, but the exported
>> in-kernel helpers trace_set_clr_event() and trace_array_set_clr_event()
>> directly enable events through __ftrace_set_clr_event().
>>
>> This can leave events enabled by in-kernel users recording into the tiny
>> boot-minimum buffer instead of the configured default-sized buffer. Any
>> caller that enables events through these exported helpers observes
>> different buffer-expansion behavior than a userspace tracefs event enable.
>>
>> Expand the relevant trace array before enabling events through the
>> exported in-kernel helpers, matching the tracefs event-enable behavior.
>> Disabling events remains unchanged.
> 
> The above explains everything correctly, but you left out what needs this?
> 
> Internal code should not be using the main ring buffer except for
> debugging, in which case you can use trace_printk(), which will cause the
> tracing buffers to be expanded by default.
> 
> Other areas of the kernel should create their own trace array which will be
> created expanded by default too.
> 
> -- Steve


Thanks Steve, that makes sense.

The concrete case I was trying to address is not an upstream in-tree
user. It is a downstream UEK RDS compatibility path where the legacy
rds_rt_debug_bitmap module parameter is mapped to RDS tracepoints by
calling trace_set_clr_event() against the global trace array during
module init. That can leave the default RDS tracepoints recording into
the boot-minimum buffer unless userspace expands tracing first.

Looking at mainline again, trace-array users are already created 
expanded by default, and the remaining global-buffer use is 
debugging-style. So I agree this generic tracing/events change does not 
have a good upstream justification.

Please drop this patch. We will handle the downstream RDS compatibility 
case downstream, or move it to an explicit userspace/trace-array setup.

Thanks,
Manjunath


^ permalink raw reply

* Re: [PATCH v7 07/42] KVM: guest_memfd: Only prepare folios for private pages
From: Ackerley Tng @ 2026-06-02 22:41 UTC (permalink / raw)
  To: Suzuki K Poulose, aik, andrew.jones, binbin.wu, brauner,
	chao.p.peng, david, ira.weiny, jmattson, jthoughton, michael.roth,
	oupton, pankaj.gupta, qperret, rick.p.edgecombe, rientjes,
	shivankg, steven.price, tabba, willy, wyihan, yan.y.zhao,
	forkloop, pratyush, aneesh.kumar, liam, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
	Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka
  Cc: kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <144bbb9f-39a2-4c90-8903-51521e022da0@arm.com>

Suzuki K Poulose <suzuki.poulose@arm.com> writes:

>
> [...snip...]
>
>>> @@ -914,7 +916,8 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct
>>> kvm_memory_slot *slot,
>>>           folio_mark_uptodate(folio);
>>>       }
>>> -    r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
>>> +    if (kvm_gmem_is_private_mem(inode, index))
>>
>> Don't we need to make sure the entire folio is private ? Not just the
>> page at the index ?
>>      if (kvm_gmem_range_is_private(, index, folio_nr_pages(folio)) ?

I was planning to fix this when I do huge pages. For now guest_memfd is
always just PAGE_SIZE, so just looking up the index is fine.

Is that okay?

>
> Or rather, we should go through the individual pages and apply the
> prepare for ones that are private ?
>
> Suzuki
>

IIRC the plan was to make kvm_gmem_prepare_folio() idempotent, as in, if
a page is already private, just skip. Currently sev_gmem_prepare() does
a pr_debug(), which I guess is technically still idempotent.

I'm thinking that the information that needs tracking to make
.gmem_prepare() idempotent should be tracked by arch code.

Does this work for ARM CCA?
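
To make the two options in this subthread concrete, here is a toy
userspace model, not kernel code. All toy_* names and the fixed-size
arrays are hypothetical stand-ins; the real state would live in
guest_memfd attributes and in arch code. It shows a whole-range
"is private" check next to a per-page prepare that skips shared pages
and pages that were already prepared, so calling it again is harmless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 8

/* Hypothetical per-file state: one flag pair per page index. */
struct toy_gmem {
	bool is_private[TOY_NR_PAGES];   /* stand-in for the memory attribute */
	bool prepared[TOY_NR_PAGES];     /* stand-in for arch-tracked state */
	unsigned int arch_prepare_calls; /* counts real "arch" work done */
};

/* True only if every page in [index, index + nr) is private. */
static bool toy_range_is_private(const struct toy_gmem *g, size_t index,
				 size_t nr)
{
	assert(nr > 0);
	for (size_t i = index; i < index + nr; i++) {
		if (i >= TOY_NR_PAGES || !g->is_private[i])
			return false;
	}
	return true;
}

/* Idempotent: shared or already-prepared pages are skipped. */
static int toy_prepare_page(struct toy_gmem *g, size_t index)
{
	if (index >= TOY_NR_PAGES)
		return -1;
	if (!g->is_private[index] || g->prepared[index])
		return 0;
	g->arch_prepare_calls++;
	g->prepared[index] = true;
	return 0;
}

/* Walk the folio page by page and prepare only what needs it. */
static int toy_prepare_folio(struct toy_gmem *g, size_t index, size_t nr)
{
	for (size_t i = index; i < index + nr; i++) {
		int r = toy_prepare_page(g, i);

		if (r)
			return r;
	}
	return 0;
}
```

With PAGE_SIZE-only folios nr is always 1, so the range check and the
single-index check agree; the difference only shows up once a folio can
mix private and shared pages.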

>>
>> [...snip...]
>>


* Re: [PATCH v2] tracing: fix CFI violation in probestub test
From: kernel test robot @ 2026-06-02 21:40 UTC (permalink / raw)
  To: Eva Kurchatova, mhiramat, rostedt
  Cc: oe-kbuild-all, linux-trace-kernel, linux-kernel,
	mathieu.desnoyers, peterz, jpoimboe, samitolvanen, eva.kurchatova
In-Reply-To: <20260602135425.542073-1-eva.kurchatova@virtuozzo.com>

Hi Eva,

kernel test robot noticed the following build errors:

[auto build test ERROR on trace/for-next]
[also build test ERROR on linus/master v6.16-rc1 next-20260602]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Eva-Kurchatova/tracing-fix-CFI-violation-in-probestub-test/20260602-222302
base:   https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace for-next
patch link:    https://lore.kernel.org/r/20260602135425.542073-1-eva.kurchatova%40virtuozzo.com
patch subject: [PATCH v2] tracing: fix CFI violation in probestub test
config: x86_64-rhel-9.4-kselftests (https://download.01.org/0day-ci/archive/20260602/202606022312.7cKiQBmg-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260602/202606022312.7cKiQBmg-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606022312.7cKiQBmg-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from drivers/dma-buf/sync_trace.h:10,
                    from drivers/dma-buf/sw_sync.c:18:
   include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
     403 |         CFI_NOSEAL(__probestub_##_name);                                \
         |         ^~~~~~~~~~
   include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
     424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
         |         ^~~~~~~~~~~~~~~~~~
   include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
      28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
         |         ^~~~~~~~~~~~
   include/trace/../../drivers/dma-buf/sync_trace.h:12:1: note: in expansion of macro 'TRACE_EVENT'
      12 | TRACE_EVENT(sync_timeline,
         | ^~~~~~~~~~~
   include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
     403 |         CFI_NOSEAL(__probestub_##_name);                                \
         |         ^~~~~~~~~~
   include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
     424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
         |         ^~~~~~~~~~~~~~~~~~
   include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
      28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
         |         ^~~~~~~~~~~~
   include/trace/../../drivers/dma-buf/sync_trace.h:12:1: note: in expansion of macro 'TRACE_EVENT'
      12 | TRACE_EVENT(sync_timeline,
         | ^~~~~~~~~~~
>> include/trace/../../drivers/dma-buf/sync_trace.h:13:25: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
      13 |         TP_PROTO(struct sync_timeline *timeline),
         |                         ^~~~~~~~~~~~~
   include/linux/tracepoint.h:394:48: note: in definition of macro '__DEFINE_TRACE_EXT'
     394 |         void __probestub_##_name(void *__data, proto)                   \
         |                                                ^~~~~
   include/linux/tracepoint.h:424:41: note: in expansion of macro 'PARAMS'
     424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
         |                                         ^~~~~~
   include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
      28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
         |         ^~~~~~~~~~~~
   include/trace/define_trace.h:28:28: note: in expansion of macro 'PARAMS'
      28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                            ^~~~~~
   include/trace/../../drivers/dma-buf/sync_trace.h:12:1: note: in expansion of macro 'TRACE_EVENT'
      12 | TRACE_EVENT(sync_timeline,
         | ^~~~~~~~~~~
   include/trace/../../drivers/dma-buf/sync_trace.h:13:9: note: in expansion of macro 'TP_PROTO'
      13 |         TP_PROTO(struct sync_timeline *timeline),
         |         ^~~~~~~~
--
   In file included from include/trace/events/lock.h:9,
                    from kernel/locking/mutex.c:35:
   include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
     403 |         CFI_NOSEAL(__probestub_##_name);                                \
         |         ^~~~~~~~~~
   include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
     424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
         |         ^~~~~~~~~~~~~~~~~~
   include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
      28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
         |         ^~~~~~~~~~~~
   include/trace/events/lock.h:24:1: note: in expansion of macro 'TRACE_EVENT'
      24 | TRACE_EVENT(lock_acquire,
         | ^~~~~~~~~~~
   include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
     403 |         CFI_NOSEAL(__probestub_##_name);                                \
         |         ^~~~~~~~~~
   include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
     424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
         |         ^~~~~~~~~~~~~~~~~~
   include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
      28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
         |         ^~~~~~~~~~~~
   include/trace/events/lock.h:24:1: note: in expansion of macro 'TRACE_EVENT'
      24 | TRACE_EVENT(lock_acquire,
         | ^~~~~~~~~~~
>> include/trace/events/lock.h:28:24: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
      28 |                 struct lockdep_map *next_lock, unsigned long ip),
         |                        ^~~~~~~~~~~
   include/linux/tracepoint.h:394:48: note: in definition of macro '__DEFINE_TRACE_EXT'
     394 |         void __probestub_##_name(void *__data, proto)                   \
         |                                                ^~~~~
   include/linux/tracepoint.h:424:41: note: in expansion of macro 'PARAMS'
     424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
         |                                         ^~~~~~
   include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
      28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
         |         ^~~~~~~~~~~~
   include/trace/define_trace.h:28:28: note: in expansion of macro 'PARAMS'
      28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                            ^~~~~~
   include/trace/events/lock.h:24:1: note: in expansion of macro 'TRACE_EVENT'
      24 | TRACE_EVENT(lock_acquire,
         | ^~~~~~~~~~~
   include/trace/events/lock.h:26:9: note: in expansion of macro 'TP_PROTO'
      26 |         TP_PROTO(struct lockdep_map *lock, unsigned int subclass,
         |         ^~~~~~~~
   include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
     403 |         CFI_NOSEAL(__probestub_##_name);                                \
         |         ^~~~~~~~~~
   include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
     424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
         |         ^~~~~~~~~~~~~~~~~~
   include/trace/define_trace.h:61:9: note: in expansion of macro 'DEFINE_TRACE'
      61 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
         |         ^~~~~~~~~~~~
   include/trace/events/lock.h:69:1: note: in expansion of macro 'DEFINE_EVENT'
      69 | DEFINE_EVENT(lock, lock_release,
         | ^~~~~~~~~~~~
   include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
     403 |         CFI_NOSEAL(__probestub_##_name);                                \
         |         ^~~~~~~~~~
   include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
     424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
         |         ^~~~~~~~~~~~~~~~~~
   include/trace/define_trace.h:61:9: note: in expansion of macro 'DEFINE_TRACE'
      61 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
         |         ^~~~~~~~~~~~
   include/trace/events/lock.h:69:1: note: in expansion of macro 'DEFINE_EVENT'
      69 | DEFINE_EVENT(lock, lock_release,
         | ^~~~~~~~~~~~
   include/trace/events/lock.h:71:25: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
      71 |         TP_PROTO(struct lockdep_map *lock, unsigned long ip),
         |                         ^~~~~~~~~~~
   include/linux/tracepoint.h:394:48: note: in definition of macro '__DEFINE_TRACE_EXT'
     394 |         void __probestub_##_name(void *__data, proto)                   \
         |                                                ^~~~~
   include/linux/tracepoint.h:424:41: note: in expansion of macro 'PARAMS'
     424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
         |                                         ^~~~~~
   include/trace/define_trace.h:61:9: note: in expansion of macro 'DEFINE_TRACE'
      61 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
         |         ^~~~~~~~~~~~
   include/trace/define_trace.h:61:28: note: in expansion of macro 'PARAMS'
      61 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                            ^~~~~~
   include/trace/events/lock.h:69:1: note: in expansion of macro 'DEFINE_EVENT'
      69 | DEFINE_EVENT(lock, lock_release,
         | ^~~~~~~~~~~~
   include/trace/events/lock.h:71:9: note: in expansion of macro 'TP_PROTO'
      71 |         TP_PROTO(struct lockdep_map *lock, unsigned long ip),
         |         ^~~~~~~~
   include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
     403 |         CFI_NOSEAL(__probestub_##_name);                                \
         |         ^~~~~~~~~~
   include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
     424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
         |         ^~~~~~~~~~~~~~~~~~
   include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
      28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
         |         ^~~~~~~~~~~~
   include/trace/events/lock.h:95:1: note: in expansion of macro 'TRACE_EVENT'
      95 | TRACE_EVENT(contention_begin,
         | ^~~~~~~~~~~
   include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
     403 |         CFI_NOSEAL(__probestub_##_name);                                \
         |         ^~~~~~~~~~
   include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
     424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
         |         ^~~~~~~~~~~~~~~~~~
   include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
      28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
         |         ^~~~~~~~~~~~
   include/trace/events/lock.h:95:1: note: in expansion of macro 'TRACE_EVENT'
      95 | TRACE_EVENT(contention_begin,
         | ^~~~~~~~~~~
   include/linux/tracepoint.h:380:24: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
     380 |                 struct tracepoint_func *it_func_ptr;                    \
         |                        ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
     424 |         __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
         |         ^~~~~~~~~~~~~~~~~~
   include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
      28 |         DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
         |         ^~~~~~~~~~~~
   include/trace/events/lock.h:95:1: note: in expansion of macro 'TRACE_EVENT'
      95 | TRACE_EVENT(contention_begin,


vim +13 include/trace/../../drivers/dma-buf/sync_trace.h

b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  11  
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  12  TRACE_EVENT(sync_timeline,
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28 @13  	TP_PROTO(struct sync_timeline *timeline),
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  14  
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  15  	TP_ARGS(timeline),
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  16  
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  17  	TP_STRUCT__entry(
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  18  			__string(name, timeline->name)
5c1401f83a16b7e drivers/staging/android/trace/sync.h Gustavo Padovan         2016-05-31  19  			__field(u32, value)
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  20  	),
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  21  
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  22  	TP_fast_assign(
2c92ca849fcc6ee drivers/dma-buf/sync_trace.h         Steven Rostedt (Google  2024-05-16  23) 			__assign_str(name);
5c1401f83a16b7e drivers/staging/android/trace/sync.h Gustavo Padovan         2016-05-31  24  			__entry->value = timeline->value;
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  25  	),
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  26  
5c1401f83a16b7e drivers/staging/android/trace/sync.h Gustavo Padovan         2016-05-31  27  	TP_printk("name=%s value=%d", __get_str(name), __entry->value)
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  28  );
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling            2013-02-28  29  

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v7 34/42] KVM: selftests: Test conversion with elevated page refcount
From: Askar Safin @ 2026-06-02 21:26 UTC (permalink / raw)
  To: devnull+ackerleytng.google.com
  Cc: ackerleytng, aik, akpm, andrew.jones, aneesh.kumar, axelrasmussen,
	baohua, bhe, binbin.wu, bp, brauner, chao.p.peng, chrisl, corbet,
	dave.hansen, david, forkloop, hpa, ira.weiny, jgg, jmattson,
	jthoughton, kas, kasong, kvm, liam, linux-coco, linux-doc,
	linux-kernel, linux-kselftest, linux-mm, linux-trace-kernel,
	mathieu.desnoyers, mhiramat, michael.roth, mingo, nphamcs, oupton,
	pankaj.gupta, pbonzini, pratyush, qi.zheng, qperret,
	rick.p.edgecombe, rientjes, rostedt, seanjc, shakeel.butt,
	shikemeng, shivankg, shuah, skhan, steven.price, suzuki.poulose,
	tabba, tglx, vannapurve, vbabka, weixugc, willy, wyihan, x86,
	yan.y.zhao, youngjun.park, yuanchu
In-Reply-To: <20260522-gmem-inplace-conversion-v7-34-2f0fae496530@google.com>

Ackerley Tng via B4 Relay <devnull+ackerleytng.google.com@kernel.org>:
> This test uses vmsplice to increment the refcount of a specific page

I recently submitted a patch, which makes vmsplice equivalent to
preadv2/pwritev2, and it was accepted to next.

For now it is just an experiment, it is possible it will be reverted.

https://lore.kernel.org/all/20260601-aufweichen-dissens-ausrechnen-0d9b84728113@brauner/

-- 
Askar Safin


* Re: [PATCH 1/2] tracing: work around -Wmissing-format-attribute warning
From: Andy Shevchenko @ 2026-06-02 21:05 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Arnd Bergmann, Steven Rostedt, Masami Hiramatsu, Andrew Morton,
	Petr Mladek, Nathan Chancellor, Dennis Dalessandro,
	Jason Gunthorpe, Leon Romanovsky, Arend van Spriel,
	Miri Korenblit, Mathieu Desnoyers, Rasmus Villemoes,
	Sergey Senozhatsky, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Vlastimil Babka (SUSE), linux-rdma, linux-kernel, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-trace-kernel, llvm
In-Reply-To: <35c1ba62-e74d-4abc-aa73-ccd35968ff89@app.fastmail.com>

On Tue, Jun 02, 2026 at 10:32:04PM +0200, Arnd Bergmann wrote:
> On Tue, Jun 2, 2026, at 20:59, Andy Shevchenko wrote:
> > On Tue, Jun 02, 2026 at 05:07:05PM +0200, Arnd Bergmann wrote:

...

> > Why the __printf() annotation is in the C file and not here?
> > Is this all about headers as the second paragraph in the commit message 
> > explains?
> > I would add a comment to explain it here, otherwise we might see false 
> > patches to "make things consistent" in a wrong way.
> 
> I've tried to come up with a kerneldoc comment now, similar to
> the one for the vsnprintf() function, and added a separate prototype
> in the header. Does this address your concern?

Yes, see one nit, though.

> -int __printf(3, 0) __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
> +/**
> + * __vsnprintf - vsnprintf() wrapper without __printf() attribute
> + * @buf: The buffer to place the result into
> + * @size: The size of the buffer, including the trailing null space
> + * @fmt_str: The format string to use
> + * @args: Arguments for the format string
> + *
> + * This has the exact same behavior as vsnprintf() but can be used in call
> + * sites that are missing a __printf() annotation, e.g. because they
> + * get a 'va_format' argument instead of format and varargs.
> + *
> + * For this to work, the attribute is added to the declaration here but
> + * not in the header.

+ *
+ * Return: ...

> + */
> +int __printf(3, 0) __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args);

> +
> +int __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)

Something slipped here...

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH v7 07/42] KVM: guest_memfd: Only prepare folios for private pages
From: Ackerley Tng @ 2026-06-02 20:46 UTC (permalink / raw)
  To: Suzuki K Poulose, aik, andrew.jones, binbin.wu, brauner,
	chao.p.peng, david, ira.weiny, jmattson, jthoughton, michael.roth,
	oupton, pankaj.gupta, qperret, rick.p.edgecombe, rientjes,
	shivankg, steven.price, tabba, willy, wyihan, yan.y.zhao,
	forkloop, pratyush, aneesh.kumar, liam, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
	Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka
  Cc: kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <d01cf1ec-b85d-4af6-9810-8107c0e2a4ec@arm.com>

Suzuki K Poulose <suzuki.poulose@arm.com> writes:

> On 23/05/2026 01:17, Ackerley Tng via B4 Relay wrote:
>> From: Ackerley Tng <ackerleytng@google.com>
>>
>> All-shared guest_memfd used to be only supported for non-CoCo VMs where
>> preparation doesn't apply. INIT_SHARED is about to be supported for
>> non-CoCo VMs in a later patch in this series.
>
> nit: s/non-CoCo/CoCo ?
>

Yes, thanks!

>>
>> In addition, KVM_SET_MEMORY_ATTRIBUTES2 is about to be supported in
>> guest_memfd in a later patch in this series.
>>
>> This means that the kvm fault handler may now call kvm_gmem_get_pfn() on a
>> shared folio for a CoCo VM where preparation applies.
>>
>> Add a check to make sure that preparation is only performed for private
>> folios.
>>
>> Preparation will be undone on freeing (see kvm_gmem_free_folio()) and on
>> conversion to shared.
>>
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>
> nit: Missing Co-Developed-by: ?
>

IIRC this should have been

Suggested-by: Michael Roth <michael.roth@amd.com>

IIRC Michael suggested this on one of the guest_memfd calls, Michael
please let me know if you remember otherwise!

>>
>> [...snip...]
>>


* Re: [PATCH 1/2] tracing: work around -Wmissing-format-attribute warning
From: Arnd Bergmann @ 2026-06-02 20:32 UTC (permalink / raw)
  To: Andy Shevchenko, Arnd Bergmann
  Cc: Steven Rostedt, Masami Hiramatsu, Andrew Morton, Petr Mladek,
	Nathan Chancellor, Dennis Dalessandro, Jason Gunthorpe,
	Leon Romanovsky, Arend van Spriel, Miri Korenblit,
	Mathieu Desnoyers, Rasmus Villemoes, Sergey Senozhatsky,
	Nick Desaulniers, Bill Wendling, Justin Stitt,
	Vlastimil Babka (SUSE), linux-rdma, linux-kernel, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-trace-kernel, llvm
In-Reply-To: <ah8n-Nk305S5hRwN@ashevche-desk.local>

On Tue, Jun 2, 2026, at 20:59, Andy Shevchenko wrote:
> On Tue, Jun 02, 2026 at 05:07:05PM +0200, Arnd Bergmann wrote:
>> 
>> A number of tracing headers turn off -Wsuggest-attribute=format for
>> gcc, but they don't turn it off for clang, so the same warning still
>> happens on new versions of clang that support the format attribute.
>> 
>> To avoid duplicating the same thing in each tracing header, as well
>> as changing all of them to also turn it off for clang, add a new
>> __vsnprintf() helper that is not annotated this way in linux/sprintf.h
>> but is defined to work the same way as the regular vsprintf.
>
> vsprintf()

Fixed now

> Why the __printf() annotation is in the C file and not here?
> Is this all about headers as the second paragraph in the commit message 
> explains?
> I would add a comment to explain it here, otherwise we might see false 
> patches to "make things consistent" in a wrong way.

I've tried to come up with a kerneldoc comment now, similar to
the one for the vsnprintf() function, and added a separate prototype
in the header. Does this address your concern?

      Arnd

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 3caf0796f54d..7c696aea2ed3 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -2975,7 +2975,23 @@ int vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
 }
 EXPORT_SYMBOL(vsnprintf);
 
-int __printf(3, 0) __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
+/**
+ * __vsnprintf - vsnprintf() wrapper without __printf() attribute
+ * @buf: The buffer to place the result into
+ * @size: The size of the buffer, including the trailing null space
+ * @fmt_str: The format string to use
+ * @args: Arguments for the format string
+ *
+ * This has the exact same behavior as vsnprintf() but can be used in call
+ * sites that are missing a __printf() annotation, e.g. because they
+ * get a 'va_format' argument instead of format and varargs.
+ *
+ * For this to work, the attribute is added to the declaration here but
+ * not in the header.
+ */
+int __printf(3, 0) __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args);
+
+int __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
 {
 	return vsnprintf(buf, size, fmt_str, args);
 }
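
For readers following the "attribute in the C file, not the header"
point, a minimal userspace sketch of the same declaration pattern is
below. demo_vsnprintf() and demo_format() are hypothetical names, and
the attribute is spelled out in GCC/Clang syntax instead of the kernel's
__printf() macro. Note that in a single file the two declarations merge,
so this only illustrates the syntax; the effect described in the patch
relies on callers seeing just the header prototype.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* What the "header" would carry: a plain prototype, no format attribute. */
int demo_vsnprintf(char *buf, size_t size, const char *fmt, va_list args);

/*
 * What the "C file" adds: a redeclaration with the format attribute, so
 * the definition below can forward to vsnprintf() without the compiler
 * suggesting that it is missing a format annotation.
 */
__attribute__((format(printf, 3, 0)))
int demo_vsnprintf(char *buf, size_t size, const char *fmt, va_list args);

int demo_vsnprintf(char *buf, size_t size, const char *fmt, va_list args)
{
	return vsnprintf(buf, size, fmt, args);
}

/* Variadic helper that hands its arguments to the wrapper. */
static int demo_format(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	assert(buf != NULL && size > 0);
	va_start(args, fmt);
	n = demo_vsnprintf(buf, size, fmt, args);
	va_end(args);
	return n;
}
```

The return value follows vsnprintf(): the number of characters that
would have been written, excluding the trailing NUL.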


* Re: [PATCH] tracing: Reject tracefs buffer size values that overflow bytes
From: Steven Rostedt @ 2026-06-02 20:03 UTC (permalink / raw)
  To: Sam Moelius
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel
In-Reply-To: <20260602184335.1554470-1-sam.moelius@trailofbits.com>

On Tue,  2 Jun 2026 18:43:34 +0000
Sam Moelius <sam.moelius@trailofbits.com> wrote:

> From: Samuel Moelius <sam.moelius@trailofbits.com>
> 
> `tracing_entries_write()` accepts a `buffer_size_kb` value as
> `unsigned long`, checks only for zero, then shifts left by 10. On
> 64-bit, writing `18014398509481984` KB wraps the byte count to zero
> and the ring buffer resize path accepts it as a tiny buffer instead
> of rejecting an impossible huge size.
> 
> The fix also adds the same pre-scale overflow check to
> `buffer_subbuf_size_write()`.

Honestly, enter stupid values, get stupid results.

I don't think this is necessary. Nothing breaks but the person may get
confused by being confused by the confusing entries they make.

-- Steve


* Re: [PATCH 1/2] tracing: work around -Wmissing-format-attribute warning
From: Andy Shevchenko @ 2026-06-02 18:59 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Steven Rostedt, Masami Hiramatsu, Andrew Morton, Petr Mladek,
	Nathan Chancellor, Arnd Bergmann, Dennis Dalessandro,
	Jason Gunthorpe, Leon Romanovsky, Arend van Spriel,
	Miri Korenblit, Mathieu Desnoyers, Rasmus Villemoes,
	Sergey Senozhatsky, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Vlastimil Babka, linux-rdma, linux-kernel, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-trace-kernel, llvm
In-Reply-To: <20260602150904.2258624-1-arnd@kernel.org>

On Tue, Jun 02, 2026 at 05:07:05PM +0200, Arnd Bergmann wrote:
> 
> A number of tracing headers turn off -Wsuggest-attribute=format for
> gcc, but they don't turn it off for clang, so the same warning still
> happens on new versions of clang that support the format attribute.
> 
> To avoid duplicating the same thing in each tracing header, as well
> as changing all of them to also turn it off for clang, add a new
> __vsnprintf() helper that is not annotated this way in linux/sprintf.h
> but is defined to work the same way as the regular vsprintf.

vsprintf()

> Aside from tracing, the same thing can be used in va_format(),
> which is part of lib/vsprintf.c itself.

...

> --- a/include/linux/sprintf.h
> +++ b/include/linux/sprintf.h
> @@ -12,6 +12,7 @@ __printf(2, 3) int sprintf(char *buf, const char * fmt, ...);
>  __printf(2, 0) int vsprintf(char *buf, const char *, va_list);
>  __printf(3, 4) int snprintf(char *buf, size_t size, const char *fmt, ...);
>  __printf(3, 0) int vsnprintf(char *buf, size_t size, const char *fmt, va_list args);
> +int __vsnprintf(char *buf, size_t size, const char *fmt, va_list args);

Why the __printf() annotation is in the C file and not here?
Is this all about headers as the second paragraph in the commit message explains?
I would add a comment to explain it here, otherwise we might see false patches to
"make things consistent" in a wrong way.

>  __printf(3, 4) int scnprintf(char *buf, size_t size, const char *fmt, ...);
>  __printf(3, 0) int vscnprintf(char *buf, size_t size, const char *fmt, va_list args);
>  __printf(2, 3) __malloc char *kasprintf(gfp_t gfp, const char *fmt, ...);

-- 
With Best Regards,
Andy Shevchenko




* [PATCH] tracing: Reject tracefs buffer size values that overflow bytes
From: Sam Moelius @ 2026-06-02 18:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Samuel Moelius, Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel

From: Samuel Moelius <sam.moelius@trailofbits.com>

`tracing_entries_write()` accepts a `buffer_size_kb` value as
`unsigned long`, checks only for zero, then shifts left by 10. On
64-bit, writing `18014398509481984` KB wraps the byte count to zero
and the ring buffer resize path accepts it as a tiny buffer instead
of rejecting an impossible huge size.

The fix also adds the same pre-scale overflow check to
`buffer_subbuf_size_write()`.

Assisted-by: Codex:gpt-5.5-cyber-preview
Signed-off-by: Samuel Moelius <sam.moelius@trailofbits.com>
---
 kernel/trace/trace.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6eb4d3097a4d..79da29c3d525 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5735,7 +5735,7 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
 		return ret;
 
 	/* must have at least 1 entry */
-	if (!val)
+	if (!val || val > ULONG_MAX >> 10)
 		return -EINVAL;
 
 	/* value is in KB */
@@ -8206,6 +8206,9 @@ buffer_subbuf_size_write(struct file *filp, const char __user *ubuf,
 	if (ret)
 		return ret;
 
+	if (!val || val > ULONG_MAX / 1024)
+		return -EINVAL;
+
 	val *= 1024; /* value passed in is in KB */
 
 	pages = DIV_ROUND_UP(val, PAGE_SIZE);
-- 
2.43.0



* Re: [PATCH mm-unstable v18 11/14] mm/khugepaged: Introduce mTHP collapse support
From: Nico Pache @ 2026-06-02 17:26 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
	akpm, anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
	hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, liam, ljs, mathieu.desnoyers, matthew.brost, mhiramat,
	mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, vbabka, vishal.moola,
	wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe,
	Usama Arif, usamaarif642
In-Reply-To: <19639b08-5bf1-4974-9635-c458d512fa38@redhat.com>

On Tue, Jun 2, 2026 at 11:22 AM Nico Pache <npache@redhat.com> wrote:
>
>
>
> On 6/1/26 7:15 AM, David Hildenbrand (Arm) wrote:
> >>>
> >>> Reading this, it is unclear why exactly do we need the stack.
> >>
> >> So I looked into your items below. It seems logical, and I think it
> >> works the same way; however, your method seems slightly harder to
> >> understand due to all the edge cases and more error-prone to future
> >> changes (the stack holds implicit knowledge of the offset/order that
> >> must now be tracked in the edge cases).
> >>
> >> Given the stack is 24 bytes, I'm not sure if the extra complexity is
> >> worth saving that small amount of memory. Although we would also be
> >> getting rid of (3?) functions, so both approaches have pros and cons.
> >
> > I consider a simple forward loop over the offset ... less complexity compared to
> > a stack structure :)
> >
> >>
> >> I will implement a patch comparing your solution against mine and send
> >> it here, then we can decide which approach is better.
> >
> > Right, throw it over the fence and I'll see how to improve it further.
>
> Ok heres what the diff looks like on top of my V19.
>
> you can access the tree here https://gitlab.com/npache/linux/-/commits/mthp-v19?ref_type=heads for easier review.
>
> So far I have no problem with this approach it appeared cleaner than i thought. Did some light testing. Gonna throw it more through the ringer tomorrow.

Not sure why this didn't send with the proper encoding; I guess my email
is still a little screwed up.

>
>
> From 9496c5d17eba7f6d04820d78c7c6f1592a58888a Mon Sep 17 00:00:00 2001
> From: Nico Pache <npache@redhat.com>
> Date: Tue, 2 Jun 2026 10:26:18 -0600
> Subject: [PATCH] convert from stack to forward loop
>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
>  mm/khugepaged.c | 96 ++++++++-----------------------------------------
>  1 file changed, 15 insertions(+), 81 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 498eba009751..6de935e76ceb 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -100,28 +100,6 @@ static DEFINE_READ_MOSTLY_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
>  static struct kmem_cache *mm_slot_cache __ro_after_init;
>
>  #define KHUGEPAGED_MIN_MTHP_ORDER      2
> -/*
> - * mthp_collapse() does an iterative DFS over a binary tree, from
> - * HPAGE_PMD_ORDER down to KHUGEPAGED_MIN_MTHP_ORDER. The max stack
> - * size needed for a DFS on a binary tree is height + 1, where
> - * height = HPAGE_PMD_ORDER - KHUGEPAGED_MIN_MTHP_ORDER.
> - *
> - * ilog2 is used in place of HPAGE_PMD_ORDER because some architectures
> - * (e.g. ppc64le) do not define HPAGE_PMD_ORDER until after build time.
> - */
> -#define MTHP_STACK_SIZE        (ilog2(MAX_PTRS_PER_PTE) - KHUGEPAGED_MIN_MTHP_ORDER + 1)
> -
> -/*
> - * Defines a range of PTE entries in a PTE page table which are being
> - * considered for mTHP collapse.
> - *
> - * @offset: the offset of the first PTE entry in a PMD range.
> - * @order: the order of the PTE entries being considered for collapse.
> - */
> -struct mthp_range {
> -       u16 offset;
> -       u8 order;
> -};
>
>  struct collapse_control {
>         bool is_khugepaged;
> @@ -137,7 +115,6 @@ struct collapse_control {
>
>         /* Each bit represents a single occupied (!none/zero) page. */
>         DECLARE_BITMAP(mthp_present_ptes, MAX_PTRS_PER_PTE);
> -       struct mthp_range mthp_bitmap_stack[MTHP_STACK_SIZE];
>  };
>
>  /**
> @@ -1458,50 +1435,14 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long s
>         return result;
>  }
>
> -static void collapse_mthp_stack_push(struct collapse_control *cc, int *stack_size,
> -                                    u16 offset, u8 order)
> -{
> -       const int size = *stack_size;
> -       struct mthp_range *stack = &cc->mthp_bitmap_stack[size];
> -
> -       VM_WARN_ON_ONCE(size >= MTHP_STACK_SIZE);
> -       stack->order = order;
> -       stack->offset = offset;
> -       (*stack_size)++;
> -}
> -
> -static struct mthp_range collapse_mthp_stack_pop(struct collapse_control *cc,
> -                                                int *stack_size)
> -{
> -       const int size = *stack_size;
> -
> -       VM_WARN_ON_ONCE(size <= 0);
> -       (*stack_size)--;
> -       return cc->mthp_bitmap_stack[size - 1];
> -}
> -
>  /*
>   * mthp_collapse() consumes the bitmap that is generated during
>   * collapse_scan_pmd() to determine what regions and mTHP orders fit best.
>   *
>   * Each bit in cc->mthp_present_ptes represents a single occupied (!none/zero)
> - * page. A stack structure cc->mthp_bitmap_stack is used to check different
> - * regions of the bitmap for collapse eligibility. The stack maintains a pair
> - * of variables (offset, order), indicating the number of PTEs from the start
> - * of the PMD, and the order of the potential collapse candidate respectively.
> - * We start at the PMD order and check if it is eligible for collapse; if not,
> - * we add two entries to the stack at a lower order to represent the left and
> - * right halves of the PTE page table we are examining.
> - *
> - *                         offset       mid_offset
> - *                         |         |
> - *                         |         |
> - *                         v         v
> - *      --------------------------------------
> - *      |       cc->mthp_present_ptes         |
> - *      --------------------------------------
> - *                         <-------><------->
> - *                          order-1  order-1
> + * page. We start at the PMD order and check if it is eligible for collapse;
> + * if not, we check the left and right halves of the PTE page table we are
> + * examining at a lower order.
>   *
>   * For each of these, we determine how many PTE entries are occupied in the
>   * range of PTE entries we propose to collapse, then we compare this to a
> @@ -1517,26 +1458,20 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
>  {
>         unsigned int nr_occupied_ptes, nr_ptes, max_ptes_none;
>         enum scan_result last_result = SCAN_FAIL;
> -       int collapsed = 0, stack_size = 0;
> +       int collapsed = 0;
>         bool alloc_failed = false;
>         unsigned long collapse_address;
> -       struct mthp_range range;
> -       u16 offset;
> -       u8 order;
> +       unsigned int offset = 0;
> +       unsigned int order = HPAGE_PMD_ORDER;
>
> -       collapse_mthp_stack_push(cc, &stack_size, 0, HPAGE_PMD_ORDER);
>
> -       while (stack_size) {
> -               range = collapse_mthp_stack_pop(cc, &stack_size);
> -               order = range.order;
> -               offset = range.offset;
> +       while (offset < HPAGE_PMD_NR) {
>                 nr_ptes = 1UL << order;
>
>                 if (!test_bit(order, &enabled_orders))
>                         goto next_order;
>
>                 max_ptes_none = collapse_max_ptes_none(cc, NULL, order);
> -
>                 nr_occupied_ptes = bitmap_weight_from(cc->mthp_present_ptes, offset,
>                                                       offset + nr_ptes);
>
> @@ -1553,7 +1488,7 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
>                                 collapsed += nr_ptes;
>                                 fallthrough;
>                         case SCAN_PTE_MAPPED_HUGEPAGE:
> -                               continue;
> +                               goto next_offset;
>                         /* Cases where lower orders might still succeed */
>                         case SCAN_ALLOC_HUGE_PAGE_FAIL:
>                                 alloc_failed = true;
> @@ -1581,15 +1516,14 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
>                 }
>
>  next_order:
> -               if ((BIT(order) - 1) & enabled_orders) {
> -                       const u8 next_order = order - 1;
> -                       const u16 mid_offset = offset + (nr_ptes / 2);
> -
> -                       collapse_mthp_stack_push(cc, &stack_size, mid_offset,
> -                                                next_order);
> -                       collapse_mthp_stack_push(cc, &stack_size, offset,
> -                                                next_order);
> +               if (order > KHUGEPAGED_MIN_MTHP_ORDER &&
> +                       (BIT(order) - 1) & enabled_orders) {
> +                       order = order - 1;
> +                       continue;
>                 }
> +next_offset:
> +               offset += nr_ptes;
> +               order = min_t(int, __ffs(offset), HPAGE_PMD_ORDER);
>         }
>  done:
>         if (collapsed)
> --
> 2.54.0
>
>
>
> >
> > [...]
> >
> >>>> +     bitmap_zero(cc->mthp_bitmap, MAX_PTRS_PER_PTE);
> >>>>       memset(cc->node_load, 0, sizeof(cc->node_load));
> >>>>       nodes_clear(cc->alloc_nmask);
> >>>> +
> >>>> +     enabled_orders = collapse_allowable_orders(vma, vma->vm_flags, tva_flags);
> >>>> +
> >>>> +     /*
> >>>> +      * If PMD is the only enabled order, enforce max_ptes_none, otherwise
> >>>> +      * scan all pages to populate the bitmap for mTHP collapse.
> >>>> +      */
> >>>
> >>> You should note here, that we re-verify in mthp_collapse().
> >>>
> >>> But the question is, whether we should relocate the check completely into
> >>> mthp_collapse(), instead of conditionally duplicating it.
> >>>
> >>> What speaks against always populating the bitmap and making the decision in
> >>> mthp_collapse()?
> >>>
> >>> Sure, we might scan a page table a bit longer, but the code gets clearer ... and
> >>> I am not sure if scanning some more page table entries is really that critical here.
> >>
> >> Someone asked me to preserve the legacy behavior (PMD only). Although
> >> rather trivial, if you set max_ptes_none=0 for example, we'd still
> >> have to do 511 iterations for no reason if PMD collapse is the only
> >> enabled order rather than bailing immediately.
> >>
> >> I'm ok with dropping it, but I think its the correct approach (despite
> >> the extra complexity). @Usama Arif brought up this point here
> >> https://lore.kernel.org/all/f8f7bb71-ca31-46ee-a62d-7ddfd83e0ead@gmail.com/
> >
> > We talk about regressions, but I am not sure if we care about scanning speed
> > within a page table that much?
> >
> > After all, we locked it and already read some entries.
> >
> > Having the same check at two places to optimize for PMD order might right now
> > feel like a good optimization, but likely an irrelevant one in a near future?
> >
> > Anyhow, won't push back, as long as we document why we are special casing things
> > here.
> >
>



* Re: [PATCH mm-unstable v18 11/14] mm/khugepaged: Introduce mTHP collapse support
From: Nico Pache @ 2026-06-02 17:23 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
	akpm, anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
	hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, liam, ljs, mathieu.desnoyers, matthew.brost, mhiramat,
	mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, vbabka, vishal.moola,
	wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe,
	Usama Arif, usamaarif642
In-Reply-To: <d3c2b00c-6810-434a-b837-0707b0a11611@kernel.org>



On 6/1/26 7:15 AM, David Hildenbrand (Arm) wrote:
>>>
>>> Reading this, it is unclear why exactly do we need the stack.
>>
>> So I looked into your items below. It seems logical, and I think it
>> works the same way; however, your method seems slightly harder to
>> understand due to all the edge cases and more error-prone to future
>> changes (the stack holds implicit knowledge of the offset/order that
>> must now be tracked in the edge cases).
>>
>> Given the stack is 24 bytes, I'm not sure if the extra complexity is
>> worth saving that small amount of memory. Although we would also be
>> getting rid of (3?) functions, so both approaches have pros and cons.
> 
> I consider a simple forward loop over the offset ... less complexity compared to
> a stack structure :)
> 
>>
>> I will implement a patch comparing your solution against mine and send
>> it here, then we can decide which approach is better.
> 
> Right, throw it over the fence and I'll see how to improve it further.

OK, here's what the diff looks like on top of my v19.

you can access the tree here https://gitlab.com/npache/linux/-/commits/mthp-v19?ref_type=heads for easier review.

So far I have no problem with this approach; it turned out cleaner than I thought. I did some light testing and will put it through the wringer tomorrow.


From 9496c5d17eba7f6d04820d78c7c6f1592a58888a Mon Sep 17 00:00:00 2001
From: Nico Pache <npache@redhat.com>
Date: Tue, 2 Jun 2026 10:26:18 -0600
Subject: [PATCH] convert from stack to forward loop

Signed-off-by: Nico Pache <npache@redhat.com>
---
 mm/khugepaged.c | 96 ++++++++-----------------------------------------
 1 file changed, 15 insertions(+), 81 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 498eba009751..6de935e76ceb 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -100,28 +100,6 @@ static DEFINE_READ_MOSTLY_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
 static struct kmem_cache *mm_slot_cache __ro_after_init;
 
 #define KHUGEPAGED_MIN_MTHP_ORDER	2
-/*
- * mthp_collapse() does an iterative DFS over a binary tree, from
- * HPAGE_PMD_ORDER down to KHUGEPAGED_MIN_MTHP_ORDER. The max stack
- * size needed for a DFS on a binary tree is height + 1, where
- * height = HPAGE_PMD_ORDER - KHUGEPAGED_MIN_MTHP_ORDER.
- *
- * ilog2 is used in place of HPAGE_PMD_ORDER because some architectures
- * (e.g. ppc64le) do not define HPAGE_PMD_ORDER until after build time.
- */
-#define MTHP_STACK_SIZE	(ilog2(MAX_PTRS_PER_PTE) - KHUGEPAGED_MIN_MTHP_ORDER + 1)
-
-/*
- * Defines a range of PTE entries in a PTE page table which are being
- * considered for mTHP collapse.
- *
- * @offset: the offset of the first PTE entry in a PMD range.
- * @order: the order of the PTE entries being considered for collapse.
- */
-struct mthp_range {
-	u16 offset;
-	u8 order;
-};
 
 struct collapse_control {
 	bool is_khugepaged;
@@ -137,7 +115,6 @@ struct collapse_control {
 
 	/* Each bit represents a single occupied (!none/zero) page. */
 	DECLARE_BITMAP(mthp_present_ptes, MAX_PTRS_PER_PTE);
-	struct mthp_range mthp_bitmap_stack[MTHP_STACK_SIZE];
 };
 
 /**
@@ -1458,50 +1435,14 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long s
 	return result;
 }
 
-static void collapse_mthp_stack_push(struct collapse_control *cc, int *stack_size,
-				     u16 offset, u8 order)
-{
-	const int size = *stack_size;
-	struct mthp_range *stack = &cc->mthp_bitmap_stack[size];
-
-	VM_WARN_ON_ONCE(size >= MTHP_STACK_SIZE);
-	stack->order = order;
-	stack->offset = offset;
-	(*stack_size)++;
-}
-
-static struct mthp_range collapse_mthp_stack_pop(struct collapse_control *cc,
-						 int *stack_size)
-{
-	const int size = *stack_size;
-
-	VM_WARN_ON_ONCE(size <= 0);
-	(*stack_size)--;
-	return cc->mthp_bitmap_stack[size - 1];
-}
-
 /*
  * mthp_collapse() consumes the bitmap that is generated during
  * collapse_scan_pmd() to determine what regions and mTHP orders fit best.
  *
  * Each bit in cc->mthp_present_ptes represents a single occupied (!none/zero)
- * page. A stack structure cc->mthp_bitmap_stack is used to check different
- * regions of the bitmap for collapse eligibility. The stack maintains a pair
- * of variables (offset, order), indicating the number of PTEs from the start
- * of the PMD, and the order of the potential collapse candidate respectively.
- * We start at the PMD order and check if it is eligible for collapse; if not,
- * we add two entries to the stack at a lower order to represent the left and
- * right halves of the PTE page table we are examining.
- *
- *                         offset       mid_offset
- *                         |         |
- *                         |         |
- *                         v         v
- *      --------------------------------------
- *      |       cc->mthp_present_ptes         |
- *      --------------------------------------
- *                         <-------><------->
- *                          order-1  order-1
+ * page. We start at the PMD order and check if it is eligible for collapse;
+ * if not, we check the left and right halves of the PTE page table we are
+ * examining at a lower order.
  *
  * For each of these, we determine how many PTE entries are occupied in the
  * range of PTE entries we propose to collapse, then we compare this to a
@@ -1517,26 +1458,20 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
 {
 	unsigned int nr_occupied_ptes, nr_ptes, max_ptes_none;
 	enum scan_result last_result = SCAN_FAIL;
-	int collapsed = 0, stack_size = 0;
+	int collapsed = 0;
 	bool alloc_failed = false;
 	unsigned long collapse_address;
-	struct mthp_range range;
-	u16 offset;
-	u8 order;
+	unsigned int offset = 0;
+	unsigned int order = HPAGE_PMD_ORDER;
 
-	collapse_mthp_stack_push(cc, &stack_size, 0, HPAGE_PMD_ORDER);
 
-	while (stack_size) {
-		range = collapse_mthp_stack_pop(cc, &stack_size);
-		order = range.order;
-		offset = range.offset;
+	while (offset < HPAGE_PMD_NR) {
 		nr_ptes = 1UL << order;
 
 		if (!test_bit(order, &enabled_orders))
 			goto next_order;
 
 		max_ptes_none = collapse_max_ptes_none(cc, NULL, order);
-
 		nr_occupied_ptes = bitmap_weight_from(cc->mthp_present_ptes, offset,
 						      offset + nr_ptes);
 
@@ -1553,7 +1488,7 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
 				collapsed += nr_ptes;
 				fallthrough;
 			case SCAN_PTE_MAPPED_HUGEPAGE:
-				continue;
+				goto next_offset;
 			/* Cases where lower orders might still succeed */
 			case SCAN_ALLOC_HUGE_PAGE_FAIL:
 				alloc_failed = true;
@@ -1581,15 +1516,14 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
 		}
 
 next_order:
-		if ((BIT(order) - 1) & enabled_orders) {
-			const u8 next_order = order - 1;
-			const u16 mid_offset = offset + (nr_ptes / 2);
-
-			collapse_mthp_stack_push(cc, &stack_size, mid_offset,
-						 next_order);
-			collapse_mthp_stack_push(cc, &stack_size, offset,
-						 next_order);
+		if (order > KHUGEPAGED_MIN_MTHP_ORDER &&
+			(BIT(order) - 1) & enabled_orders) {
+			order = order - 1;
+			continue;
 		}
+next_offset:
+		offset += nr_ptes;
+		order = min_t(int, __ffs(offset), HPAGE_PMD_ORDER);
 	}
 done:
 	if (collapsed)
-- 
2.54.0



> 
> [...]
> 
>>>> +     bitmap_zero(cc->mthp_bitmap, MAX_PTRS_PER_PTE);
>>>>       memset(cc->node_load, 0, sizeof(cc->node_load));
>>>>       nodes_clear(cc->alloc_nmask);
>>>> +
>>>> +     enabled_orders = collapse_allowable_orders(vma, vma->vm_flags, tva_flags);
>>>> +
>>>> +     /*
>>>> +      * If PMD is the only enabled order, enforce max_ptes_none, otherwise
>>>> +      * scan all pages to populate the bitmap for mTHP collapse.
>>>> +      */
>>>
>>> You should note here, that we re-verify in mthp_collapse().
>>>
>>> But the question is, whether we should relocate the check completely into
>>> mthp_collapse(), instead of conditionally duplicating it.
>>>
>>> What speaks against always populating the bitmap and making the decision in
>>> mthp_collapse()?
>>>
>>> Sure, we might scan a page table a bit longer, but the code gets clearer ... and
>>> I am not sure if scanning some more page table entries is really that critical here.
>>
>> Someone asked me to preserve the legacy behavior (PMD only). Although
>> rather trivial, if you set max_ptes_none=0 for example, we'd still
>> have to do 511 iterations for no reason if PMD collapse is the only
>> enabled order rather than bailing immediately.
>>
>> I'm ok with dropping it, but I think its the correct approach (despite
>> the extra complexity). @Usama Arif brought up this point here
>> https://lore.kernel.org/all/f8f7bb71-ca31-46ee-a62d-7ddfd83e0ead@gmail.com/
> 
> We talk about regressions, but I am not sure if we care about scanning speed
> within a page table that much?
> 
> After all, we locked it and already read some entries.
> 
> Having the same check at two places to optimize for PMD order might right now
> feel like a good optimization, but likely an irrelevant one in a near future?
> 
> Anyhow, won't push back, as long as we document why we are special casing things
> here.
> 



* [PATCH 08/13] ring-buffer: Add ring_buffer_read_remote_meta_page()
From: Vincent Donnefort @ 2026-06-02 17:11 UTC (permalink / raw)
  To: rostedt, mhiramat, mathieu.desnoyers, linux-trace-kernel
  Cc: kernel-team, linux-kernel, Vincent Donnefort
In-Reply-To: <20260602171146.2238998-1-vdonnefort@google.com>

In preparation for the introduction of a panic handler for trace
remotes, add ring_buffer_read_remote_meta_page(). It is similar to
ring_buffer_poll_remote(), but it doesn't try to wake up readers and,
in the !RING_BUFFER_ALL_CPUS case, uses panic-friendly locks.

While at it, update trace_remote_has_cpu() to use this new function
instead of ring_buffer_poll_remote(), avoiding unnecessary wakeups when
verifying if a CPU buffer is active.

Finally, the distracted engineer who wrote ring_buffer_poll_remote()
forgot to document it. Add a kerneldoc for that function too.

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 994f52b34344..6e008a548063 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -298,6 +298,7 @@ struct ring_buffer_remote {
 	void				*priv;
 };
 
+int ring_buffer_read_remote_meta_page(struct trace_buffer *buffer, int cpu);
 int ring_buffer_poll_remote(struct trace_buffer *buffer, int cpu);
 
 struct trace_buffer *
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 7b07d2004cc6..88ac346c65ec 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -6623,6 +6623,59 @@ bool ring_buffer_empty_cpu(struct trace_buffer *buffer, int cpu)
 }
 EXPORT_SYMBOL_GPL(ring_buffer_empty_cpu);
 
+/**
+ * ring_buffer_read_remote_meta_page - read the meta page of a remote ring buffer
+ * @buffer: The ring buffer
+ * @cpu: The CPU buffer to read (or RING_BUFFER_ALL_CPUS)
+ *
+ * Returns:
+ *  0 on success, or -EINVAL if the CPU is not in the buffer's cpumask.
+ */
+int ring_buffer_read_remote_meta_page(struct trace_buffer *buffer, int cpu)
+{
+	struct ring_buffer_per_cpu *cpu_buffer;
+
+	if (cpu != RING_BUFFER_ALL_CPUS) {
+		unsigned long flags;
+		bool dolock;
+
+		if (!cpumask_test_cpu(cpu, buffer->cpumask))
+			return -EINVAL;
+
+		cpu_buffer = buffer->buffers[cpu];
+
+		local_irq_save(flags);
+		dolock = rb_reader_lock(cpu_buffer);
+		rb_read_remote_meta_page(cpu_buffer);
+		rb_reader_unlock(cpu_buffer, dolock);
+		local_irq_restore(flags);
+		return 0;
+	}
+
+	guard(cpus_read_lock)();
+
+	for_each_buffer_cpu(buffer, cpu) {
+		cpu_buffer = buffer->buffers[cpu];
+
+		guard(raw_spinlock)(&cpu_buffer->reader_lock);
+		rb_read_remote_meta_page(cpu_buffer);
+	}
+
+	return 0;
+}
+
+/**
+ * ring_buffer_poll_remote - poll a remote ring buffer for new data
+ * @buffer: The ring buffer
+ * @cpu: The CPU buffer to poll (or RING_BUFFER_ALL_CPUS)
+ *
+ * This function polls the specified remote CPU buffer (or all of them)
+ * by reading its meta page to update the local reader's view. If new
+ * entries are detected, it triggers wakeups for any waiting readers.
+ *
+ * Returns:
+ *  0 on success, or -EINVAL if the CPU is not in the buffer's cpumask.
+ */
 int ring_buffer_poll_remote(struct trace_buffer *buffer, int cpu)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
diff --git a/kernel/trace/trace_remote.c b/kernel/trace/trace_remote.c
index 1bf0ba159c92..e708dcd7d258 100644
--- a/kernel/trace/trace_remote.c
+++ b/kernel/trace/trace_remote.c
@@ -291,7 +291,7 @@ static bool trace_remote_has_cpu(struct trace_remote *remote, int cpu)
 	if (cpu == RING_BUFFER_ALL_CPUS)
 		return true;
 
-	return ring_buffer_poll_remote(remote->trace_buffer, cpu) == 0;
+	return ring_buffer_read_remote_meta_page(remote->trace_buffer, cpu) == 0;
 }
 
 static void __free_ring_buffer_iter(struct trace_remote_iterator *iter, int cpu)
-- 
2.54.0.1032.g2f8565e1d1-goog



* [PATCH 06/13] tracing/remotes: selftests: Add a test for the printk tracefs file
From: Vincent Donnefort @ 2026-06-02 17:11 UTC (permalink / raw)
  To: rostedt, mhiramat, mathieu.desnoyers, linux-trace-kernel
  Cc: kernel-team, linux-kernel, Vincent Donnefort
In-Reply-To: <20260602171146.2238998-1-vdonnefort@google.com>

Exercise the newly introduced printk tracefs file that turns on and off
the dmesg redirection.

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>

diff --git a/tools/testing/selftests/ftrace/test.d/remotes/functions b/tools/testing/selftests/ftrace/test.d/remotes/functions
index 05224fac3653..8dd9c961977b 100644
--- a/tools/testing/selftests/ftrace/test.d/remotes/functions
+++ b/tools/testing/selftests/ftrace/test.d/remotes/functions
@@ -8,6 +8,7 @@ setup_remote()
 
 	cd remotes/$name/
 	echo 0 > tracing_on
+	echo 0 > printk
 	clear_trace
 	echo 7 > buffer_size_kb
 	echo 0 > events/enable
diff --git a/tools/testing/selftests/ftrace/test.d/remotes/hypervisor/printk.tc b/tools/testing/selftests/ftrace/test.d/remotes/hypervisor/printk.tc
new file mode 100644
index 000000000000..aca7a2bfe293
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/remotes/hypervisor/printk.tc
@@ -0,0 +1,11 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test trace remote printk, the dmesg redirection
+# requires: remotes/hypervisor/write_event
+
+SOURCE_REMOTE_TEST=1
+. $TEST_DIR/remotes/printk.tc
+
+set -e
+setup_remote "hypervisor"
+test_printk
diff --git a/tools/testing/selftests/ftrace/test.d/remotes/printk.tc b/tools/testing/selftests/ftrace/test.d/remotes/printk.tc
new file mode 100644
index 000000000000..80eaf13e240d
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/remotes/printk.tc
@@ -0,0 +1,72 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test trace remote printk, the dmesg redirection
+# requires: remotes/test
+
+. $TEST_DIR/remotes/functions
+
+test_printk()
+{
+    echo 0 > tracing_on
+    assert_unloaded
+
+    #
+    # Test printk on/off when tracing is disabled
+    #
+    echo 1 > printk
+    test $(cat printk) -eq 1
+    assert_loaded
+
+    echo 0 > printk
+    test $(cat printk) -eq 0
+    assert_unloaded
+
+    #
+    # Test events are logged to dmesg
+    #
+    dmesg -c > /dev/null
+
+    echo 1 > tracing_on
+    assert_loaded
+    echo 1 > printk
+    test $(cat printk) -eq 1
+
+    nr_events=128
+    for i in $(seq 1 $nr_events); do
+        echo $i > write_event
+    done
+
+    sleep 1
+    output=$(mktemp $TMPDIR/remote_test.XXXXXX)
+    dmesg | grep "selftest id=" | sed 's/^[^]]*] //'> $output
+
+    check_trace 1 $nr_events $output
+
+    rm $output
+
+    #
+    # Disable printk and test that events were not consumed by dmesg
+    #
+    echo 0 > printk
+    test $(cat printk) -eq 0
+
+    start_id=$(($nr_events + 1))
+    end_id=$(($start_id + $nr_events))
+
+    for i in $(seq $start_id $end_id); do
+        echo $i > write_event
+    done
+
+    sleep 1
+
+    output=$(dump_trace_pipe)
+    check_trace $start_id $end_id $output
+    rm $output
+}
+
+if [ -z "$SOURCE_REMOTE_TEST" ]; then
+    set -e
+
+    setup_remote_test
+    test_printk
+fi
-- 
2.54.0.1032.g2f8565e1d1-goog



* [PATCH 05/13] tracing/remotes: Add printk tracefs file
From: Vincent Donnefort @ 2026-06-02 17:11 UTC (permalink / raw)
  To: rostedt, mhiramat, mathieu.desnoyers, linux-trace-kernel
  Cc: kernel-team, linux-kernel, Vincent Donnefort
In-Reply-To: <20260602171146.2238998-1-vdonnefort@google.com>

When enabled, the printk tracefs file redirects all events to dmesg.
This is similar to tp_printk.

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>

diff --git a/kernel/trace/trace_remote.c b/kernel/trace/trace_remote.c
index 21583fae1bd9..1bf0ba159c92 100644
--- a/kernel/trace/trace_remote.c
+++ b/kernel/trace/trace_remote.c
@@ -21,6 +21,7 @@
 enum tri_type {
 	TRI_CONSUMING,
 	TRI_NONCONSUMING,
+	TRI_PRINTK,
 };
 
 struct trace_remote_iterator {
@@ -42,6 +43,7 @@ struct trace_remote {
 	void				*priv;
 	struct trace_buffer		*trace_buffer;
 	struct trace_buffer_desc	*trace_buffer_desc;
+	struct trace_remote_iterator	*printk;
 	struct dentry			*dentry;
 	struct eventfs_inode		*eventfs_root;
 	struct eventfs_inode		*eventfs_subdir;
@@ -335,6 +337,8 @@ static int __alloc_ring_buffer_iter(struct trace_remote_iterator *iter, int cpu)
 	return 0;
 }
 
+static void trace_remote_do_printk(struct trace_remote *remote);
+
 static void __poll_remote(struct work_struct *work)
 {
 	struct delayed_work *dwork = to_delayed_work(work);
@@ -342,6 +346,7 @@ static void __poll_remote(struct work_struct *work)
 
 	remote = container_of(dwork, struct trace_remote, poll_work);
 	ring_buffer_poll_remote(remote->trace_buffer, RING_BUFFER_ALL_CPUS);
+	trace_remote_do_printk(remote);
 
 	schedule_delayed_work(dwork, msecs_to_jiffies(remote->poll_ms));
 }
@@ -351,6 +356,8 @@ static void trace_remote_inc_poll(struct trace_remote *remote)
 	/* poll_cnt <= nr_readers, inherits its overflow protection */
 	if (!remote->poll_cnt++) {
 		ring_buffer_poll_remote(remote->trace_buffer, RING_BUFFER_ALL_CPUS);
+		trace_remote_do_printk(remote);
+
 		schedule_delayed_work(&remote->poll_work, msecs_to_jiffies(remote->poll_ms));
 	}
 }
@@ -393,6 +400,14 @@ static struct trace_remote_iterator
 		trace_seq_init(&iter->seq);
 
 		switch (type) {
+		case TRI_PRINTK:
+			/* only one printk iter allowed */
+			if (WARN_ON_ONCE(remote->printk)) {
+				ret = -EBUSY;
+				break;
+			}
+			smp_store_release(&remote->printk, iter);
+			fallthrough;
 		case TRI_CONSUMING:
 			trace_remote_inc_poll(remote);
 			break;
@@ -427,6 +442,11 @@ static void trace_remote_iter_free(struct trace_remote_iterator *iter)
 	lockdep_assert_held(&remote->lock);
 
 	switch (iter->type) {
+	case TRI_PRINTK:
+		WARN_ON_ONCE(remote->printk != iter);
+		smp_store_release(&remote->printk, NULL);
+		flush_delayed_work(&remote->poll_work);
+		fallthrough;
 	case TRI_CONSUMING:
 		trace_remote_dec_poll(remote);
 		break;
@@ -504,6 +524,7 @@ __peek_event(struct trace_remote_iterator *iter, int cpu, u64 *ts, unsigned long
 	struct ring_buffer_iter *rb_iter;
 
 	switch (iter->type) {
+	case TRI_PRINTK:
 	case TRI_CONSUMING:
 		return ring_buffer_peek(iter->remote->trace_buffer, cpu, ts, lost_events);
 	case TRI_NONCONSUMING:
@@ -571,6 +592,7 @@ static void trace_remote_iter_move(struct trace_remote_iterator *iter)
 	struct trace_buffer *trace_buffer = iter->remote->trace_buffer;
 
 	switch (iter->type) {
+	case TRI_PRINTK:
 	case TRI_CONSUMING:
 		ring_buffer_consume(trace_buffer, iter->evt_cpu, NULL, NULL);
 		break;
@@ -814,6 +836,80 @@ static const struct file_operations trace_fops = {
 	.release	= trace_release,
 };
 
+static void trace_remote_do_printk(struct trace_remote *remote)
+{
+	struct trace_remote_iterator *iter = smp_load_acquire(&remote->printk);
+
+	if (!iter)
+		return;
+
+	trace_remote_iter_read_start(iter);
+
+	while (trace_remote_iter_read_event(iter)) {
+		trace_seq_init(&iter->seq);
+
+		trace_remote_iter_print_event(iter);
+		if (!pr_emerg("%s", iter->seq.buffer))
+			break;
+
+		trace_remote_iter_move(iter);
+	}
+
+	trace_remote_iter_read_finished(iter);
+}
+
+static int trace_remote_enable_printk(struct trace_remote *remote, bool enable)
+{
+	struct trace_remote_iterator *iter = remote->printk;
+
+	lockdep_assert_held(&remote->lock);
+
+	if (enable == !!iter)
+		return 0;
+
+	if (enable) {
+		iter = trace_remote_iter(remote, RING_BUFFER_ALL_CPUS, TRI_PRINTK);
+		if (IS_ERR(iter))
+			return PTR_ERR(iter);
+	} else {
+		trace_remote_iter_free(remote->printk);
+		/* trace_remote_iter_free has reset remote->printk */
+	}
+
+	return 0;
+}
+
+static ssize_t
+printk_write(struct file *filp, const char __user *ubuf, size_t cnt, loff_t *ppos)
+{
+	struct seq_file *seq = filp->private_data;
+	struct trace_remote *remote = seq->private;
+	bool val;
+	int ret;
+
+	ret = kstrtobool_from_user(ubuf, cnt, &val);
+	if (ret)
+		return ret;
+
+	guard(mutex)(&remote->lock);
+
+	ret = trace_remote_enable_printk(remote, val);
+	if (ret)
+		return ret;
+
+	return cnt;
+}
+
+static int printk_show(struct seq_file *s, void *unused)
+{
+	struct trace_remote *remote = s->private;
+
+	seq_printf(s, "%d\n", !!remote->printk);
+
+	return 0;
+}
+DEFINE_SHOW_STORE_ATTRIBUTE(printk);
+
 static struct dentry *tracefs_root;
 static DEFINE_MUTEX(tracefs_lock);
 static u64 tracefs_root_count;
@@ -858,6 +954,10 @@ static int trace_remote_init_tracefs(const char *name, struct trace_remote *remo
 	if (!d)
 		goto err;
 
+	d = trace_create_file("printk", TRACEFS_MODE_WRITE, remote_d, remote, &printk_fops);
+	if (!d)
+		goto err;
+
 	percpu_d = tracefs_create_dir("per_cpu", remote_d);
 	if (!percpu_d) {
 		pr_err("Failed to create tracefs dir "TRACEFS_DIR"%s/per_cpu/\n", name);
-- 
2.54.0.1032.g2f8565e1d1-goog



