* Re: [PATCH v2] tracing: fix CFI violation in probestub test
From: Masami Hiramatsu @ 2026-06-03 1:20 UTC (permalink / raw)
To: kernel test robot
Cc: Eva Kurchatova, mhiramat, rostedt, oe-kbuild-all,
linux-trace-kernel, linux-kernel, mathieu.desnoyers, peterz,
jpoimboe, samitolvanen
In-Reply-To: <202606022312.7cKiQBmg-lkp@intel.com>
On Tue, 2 Jun 2026 23:40:51 +0200
kernel test robot <lkp@intel.com> wrote:
> Hi Eva,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on trace/for-next]
> [also build test ERROR on linus/master v6.16-rc1 next-20260602]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Eva-Kurchatova/tracing-fix-CFI-violation-in-probestub-test/20260602-222302
> base: https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace for-next
> patch link: https://lore.kernel.org/r/20260602135425.542073-1-eva.kurchatova%40virtuozzo.com
> patch subject: [PATCH v2] tracing: fix CFI violation in probestub test
> config: x86_64-rhel-9.4-kselftests (https://download.01.org/0day-ci/archive/20260602/202606022312.7cKiQBmg-lkp@intel.com/config)
> compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260602/202606022312.7cKiQBmg-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202606022312.7cKiQBmg-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> In file included from drivers/dma-buf/sync_trace.h:10,
> from drivers/dma-buf/sw_sync.c:18:
> include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
> 403 | CFI_NOSEAL(__probestub_##_name); \
> | ^~~~~~~~~~
Hmm, it seems that this macro is not defined in this build
configuration? Maybe we need:
#include <linux/cfi.h>
instead of asm/cfi.h?
Thanks,
> include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
> 424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
> | ^~~~~~~~~~~~~~~~~~
> include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
> 28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
> | ^~~~~~~~~~~~
> include/trace/../../drivers/dma-buf/sync_trace.h:12:1: note: in expansion of macro 'TRACE_EVENT'
> 12 | TRACE_EVENT(sync_timeline,
> | ^~~~~~~~~~~
> include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
> 403 | CFI_NOSEAL(__probestub_##_name); \
> | ^~~~~~~~~~
> include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
> 424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
> | ^~~~~~~~~~~~~~~~~~
> include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
> 28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
> | ^~~~~~~~~~~~
> include/trace/../../drivers/dma-buf/sync_trace.h:12:1: note: in expansion of macro 'TRACE_EVENT'
> 12 | TRACE_EVENT(sync_timeline,
> | ^~~~~~~~~~~
> >> include/trace/../../drivers/dma-buf/sync_trace.h:13:25: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
> 13 | TP_PROTO(struct sync_timeline *timeline),
> | ^~~~~~~~~~~~~
> include/linux/tracepoint.h:394:48: note: in definition of macro '__DEFINE_TRACE_EXT'
> 394 | void __probestub_##_name(void *__data, proto) \
> | ^~~~~
> include/linux/tracepoint.h:424:41: note: in expansion of macro 'PARAMS'
> 424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
> | ^~~~~~
> include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
> 28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
> | ^~~~~~~~~~~~
> include/trace/define_trace.h:28:28: note: in expansion of macro 'PARAMS'
> 28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
> | ^~~~~~
> include/trace/../../drivers/dma-buf/sync_trace.h:12:1: note: in expansion of macro 'TRACE_EVENT'
> 12 | TRACE_EVENT(sync_timeline,
> | ^~~~~~~~~~~
> include/trace/../../drivers/dma-buf/sync_trace.h:13:9: note: in expansion of macro 'TP_PROTO'
> 13 | TP_PROTO(struct sync_timeline *timeline),
> | ^~~~~~~~
> --
> In file included from include/trace/events/lock.h:9,
> from kernel/locking/mutex.c:35:
> include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
> 403 | CFI_NOSEAL(__probestub_##_name); \
> | ^~~~~~~~~~
> include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
> 424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
> | ^~~~~~~~~~~~~~~~~~
> include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
> 28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
> | ^~~~~~~~~~~~
> include/trace/events/lock.h:24:1: note: in expansion of macro 'TRACE_EVENT'
> 24 | TRACE_EVENT(lock_acquire,
> | ^~~~~~~~~~~
> include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
> 403 | CFI_NOSEAL(__probestub_##_name); \
> | ^~~~~~~~~~
> include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
> 424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
> | ^~~~~~~~~~~~~~~~~~
> include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
> 28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
> | ^~~~~~~~~~~~
> include/trace/events/lock.h:24:1: note: in expansion of macro 'TRACE_EVENT'
> 24 | TRACE_EVENT(lock_acquire,
> | ^~~~~~~~~~~
> >> include/trace/events/lock.h:28:24: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
> 28 | struct lockdep_map *next_lock, unsigned long ip),
> | ^~~~~~~~~~~
> include/linux/tracepoint.h:394:48: note: in definition of macro '__DEFINE_TRACE_EXT'
> 394 | void __probestub_##_name(void *__data, proto) \
> | ^~~~~
> include/linux/tracepoint.h:424:41: note: in expansion of macro 'PARAMS'
> 424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
> | ^~~~~~
> include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
> 28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
> | ^~~~~~~~~~~~
> include/trace/define_trace.h:28:28: note: in expansion of macro 'PARAMS'
> 28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
> | ^~~~~~
> include/trace/events/lock.h:24:1: note: in expansion of macro 'TRACE_EVENT'
> 24 | TRACE_EVENT(lock_acquire,
> | ^~~~~~~~~~~
> include/trace/events/lock.h:26:9: note: in expansion of macro 'TP_PROTO'
> 26 | TP_PROTO(struct lockdep_map *lock, unsigned int subclass,
> | ^~~~~~~~
> include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
> 403 | CFI_NOSEAL(__probestub_##_name); \
> | ^~~~~~~~~~
> include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
> 424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
> | ^~~~~~~~~~~~~~~~~~
> include/trace/define_trace.h:61:9: note: in expansion of macro 'DEFINE_TRACE'
> 61 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
> | ^~~~~~~~~~~~
> include/trace/events/lock.h:69:1: note: in expansion of macro 'DEFINE_EVENT'
> 69 | DEFINE_EVENT(lock, lock_release,
> | ^~~~~~~~~~~~
> include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
> 403 | CFI_NOSEAL(__probestub_##_name); \
> | ^~~~~~~~~~
> include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
> 424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
> | ^~~~~~~~~~~~~~~~~~
> include/trace/define_trace.h:61:9: note: in expansion of macro 'DEFINE_TRACE'
> 61 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
> | ^~~~~~~~~~~~
> include/trace/events/lock.h:69:1: note: in expansion of macro 'DEFINE_EVENT'
> 69 | DEFINE_EVENT(lock, lock_release,
> | ^~~~~~~~~~~~
> include/trace/events/lock.h:71:25: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
> 71 | TP_PROTO(struct lockdep_map *lock, unsigned long ip),
> | ^~~~~~~~~~~
> include/linux/tracepoint.h:394:48: note: in definition of macro '__DEFINE_TRACE_EXT'
> 394 | void __probestub_##_name(void *__data, proto) \
> | ^~~~~
> include/linux/tracepoint.h:424:41: note: in expansion of macro 'PARAMS'
> 424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
> | ^~~~~~
> include/trace/define_trace.h:61:9: note: in expansion of macro 'DEFINE_TRACE'
> 61 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
> | ^~~~~~~~~~~~
> include/trace/define_trace.h:61:28: note: in expansion of macro 'PARAMS'
> 61 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
> | ^~~~~~
> include/trace/events/lock.h:69:1: note: in expansion of macro 'DEFINE_EVENT'
> 69 | DEFINE_EVENT(lock, lock_release,
> | ^~~~~~~~~~~~
> include/trace/events/lock.h:71:9: note: in expansion of macro 'TP_PROTO'
> 71 | TP_PROTO(struct lockdep_map *lock, unsigned long ip),
> | ^~~~~~~~
> include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
> 403 | CFI_NOSEAL(__probestub_##_name); \
> | ^~~~~~~~~~
> include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
> 424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
> | ^~~~~~~~~~~~~~~~~~
> include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
> 28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
> | ^~~~~~~~~~~~
> include/trace/events/lock.h:95:1: note: in expansion of macro 'TRACE_EVENT'
> 95 | TRACE_EVENT(contention_begin,
> | ^~~~~~~~~~~
> include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
> 403 | CFI_NOSEAL(__probestub_##_name); \
> | ^~~~~~~~~~
> include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
> 424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
> | ^~~~~~~~~~~~~~~~~~
> include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
> 28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
> | ^~~~~~~~~~~~
> include/trace/events/lock.h:95:1: note: in expansion of macro 'TRACE_EVENT'
> 95 | TRACE_EVENT(contention_begin,
> | ^~~~~~~~~~~
> include/linux/tracepoint.h:380:24: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
> 380 | struct tracepoint_func *it_func_ptr; \
> | ^~~~~~~~~~~~~~~
> include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
> 424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
> | ^~~~~~~~~~~~~~~~~~
> include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
> 28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
> | ^~~~~~~~~~~~
> include/trace/events/lock.h:95:1: note: in expansion of macro 'TRACE_EVENT'
> 95 | TRACE_EVENT(contention_begin,
>
>
> vim +13 include/trace/../../drivers/dma-buf/sync_trace.h
>
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 11
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 12 TRACE_EVENT(sync_timeline,
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 @13 TP_PROTO(struct sync_timeline *timeline),
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 14
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 15 TP_ARGS(timeline),
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 16
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 17 TP_STRUCT__entry(
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 18 __string(name, timeline->name)
> 5c1401f83a16b7e drivers/staging/android/trace/sync.h Gustavo Padovan 2016-05-31 19 __field(u32, value)
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 20 ),
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 21
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 22 TP_fast_assign(
> 2c92ca849fcc6ee drivers/dma-buf/sync_trace.h Steven Rostedt (Google 2024-05-16 23) __assign_str(name);
> 5c1401f83a16b7e drivers/staging/android/trace/sync.h Gustavo Padovan 2016-05-31 24 __entry->value = timeline->value;
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 25 ),
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 26
> 5c1401f83a16b7e drivers/staging/android/trace/sync.h Gustavo Padovan 2016-05-31 27 TP_printk("name=%s value=%d", __get_str(name), __entry->value)
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 28 );
> b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 29
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply
* Re: [PATCH] tracing/events: Expand ring buffer for in-kernel event enables
From: manjunath.b.patil @ 2026-06-02 23:53 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel
In-Reply-To: <20260602090025.06ae761f@gandalf.local.home>
On 6/2/26 6:00 AM, Steven Rostedt wrote:
> On Mon, 1 Jun 2026 16:24:43 -0700
> Manjunath Patil <manjunath.b.patil@oracle.com> wrote:
>
>> Ftrace keeps trace arrays at a boot-minimum ring-buffer size until
>> tracing is used. Tracefs event-enable paths already call
>> tracing_update_buffers() before enabling events, but the exported
>> in-kernel helpers trace_set_clr_event() and trace_array_set_clr_event()
>> directly enable events through __ftrace_set_clr_event().
>>
>> This can leave events enabled by in-kernel users recording into the tiny
>> boot-minimum buffer instead of the configured default-sized buffer. Any
>> caller that enables events through these exported helpers observes
>> different buffer-expansion behavior than a userspace tracefs event enable.
>>
>> Expand the relevant trace array before enabling events through the
>> exported in-kernel helpers, matching the tracefs event-enable behavior.
>> Disabling events remains unchanged.
>
> The above explains everything correctly, but you left out what needs this?
>
> Internal code should not be using the main ring buffer except for
> debugging, in which case you can use trace_printk(), which will cause the
> tracing buffers to be expanded by default.
>
> Other areas of the kernel should create their own trace array which will be
> created expanded by default too.
>
> -- Steve
Thanks Steve, that makes sense.
The concrete case I was trying to address is not an upstream in-tree
user. It is a downstream UEK RDS compatibility path where the legacy
rds_rt_debug_bitmap module parameter is mapped to RDS
tracepoints by calling trace_set_clr_event() against the global trace
array during module init. That can leave the default RDS tracepoints
recording into the boot-minimum buffer unless userspace expands tracing
first.
Looking at mainline again, trace-array users are already created
expanded by default, and the remaining global-buffer use is
debugging-style. So I agree this generic tracing/events change does not
have a good upstream justification.
Please drop this patch. We will handle the downstream RDS compatibility
case downstream, or move it to an explicit userspace/trace-array setup.
Thanks,
Manjunath
^ permalink raw reply
* Re: [PATCH v7 07/42] KVM: guest_memfd: Only prepare folios for private pages
From: Ackerley Tng @ 2026-06-02 22:41 UTC (permalink / raw)
To: Suzuki K Poulose, aik, andrew.jones, binbin.wu, brauner,
chao.p.peng, david, ira.weiny, jmattson, jthoughton, michael.roth,
oupton, pankaj.gupta, qperret, rick.p.edgecombe, rientjes,
shivankg, steven.price, tabba, willy, wyihan, yan.y.zhao,
forkloop, pratyush, aneesh.kumar, liam, Paolo Bonzini,
Sean Christopherson, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka
Cc: kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco
In-Reply-To: <144bbb9f-39a2-4c90-8903-51521e022da0@arm.com>
Suzuki K Poulose <suzuki.poulose@arm.com> writes:
>
> [...snip...]
>
>>> @@ -914,7 +916,8 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct
>>> kvm_memory_slot *slot,
>>> folio_mark_uptodate(folio);
>>> }
>>> - r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
>>> + if (kvm_gmem_is_private_mem(inode, index))
>>
>> Don't we need to make sure the entire folio is private ? Not just the
>> page at the index ?
>> if (kvm_gmem_range_is_private(, index, folio_nr_pages(folio)) ?
I was thinking to fix this when I do huge pages, for now guest_memfd is
always just PAGE_SIZE, so just looking up index is fine.
Is that okay?
>
> Or rather, we should go through the individual pages and apply the
> prepare for ones that are private ?
>
> Suzuki
>
IIRC the plan was to make kvm_gmem_prepare_folio() idempotent, as in, if
a page is already private, just skip. Currently sev_gmem_prepare() does
a pr_debug(), which I guess is technically still idempotent.
I'm thinking that the information tha needs tracking to make
.gmem_prepare() idempotent should be tracked by arch code.
Does this work for ARM CCA?
>>
>> [...snip...]
>>
^ permalink raw reply
* Re: [PATCH v2] tracing: fix CFI violation in probestub test
From: kernel test robot @ 2026-06-02 21:40 UTC (permalink / raw)
To: Eva Kurchatova, mhiramat, rostedt
Cc: oe-kbuild-all, linux-trace-kernel, linux-kernel,
mathieu.desnoyers, peterz, jpoimboe, samitolvanen, eva.kurchatova
In-Reply-To: <20260602135425.542073-1-eva.kurchatova@virtuozzo.com>
Hi Eva,
kernel test robot noticed the following build errors:
[auto build test ERROR on trace/for-next]
[also build test ERROR on linus/master v6.16-rc1 next-20260602]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Eva-Kurchatova/tracing-fix-CFI-violation-in-probestub-test/20260602-222302
base: https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace for-next
patch link: https://lore.kernel.org/r/20260602135425.542073-1-eva.kurchatova%40virtuozzo.com
patch subject: [PATCH v2] tracing: fix CFI violation in probestub test
config: x86_64-rhel-9.4-kselftests (https://download.01.org/0day-ci/archive/20260602/202606022312.7cKiQBmg-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260602/202606022312.7cKiQBmg-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606022312.7cKiQBmg-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from drivers/dma-buf/sync_trace.h:10,
from drivers/dma-buf/sw_sync.c:18:
include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
403 | CFI_NOSEAL(__probestub_##_name); \
| ^~~~~~~~~~
include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
| ^~~~~~~~~~~~~~~~~~
include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
| ^~~~~~~~~~~~
include/trace/../../drivers/dma-buf/sync_trace.h:12:1: note: in expansion of macro 'TRACE_EVENT'
12 | TRACE_EVENT(sync_timeline,
| ^~~~~~~~~~~
include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
403 | CFI_NOSEAL(__probestub_##_name); \
| ^~~~~~~~~~
include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
| ^~~~~~~~~~~~~~~~~~
include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
| ^~~~~~~~~~~~
include/trace/../../drivers/dma-buf/sync_trace.h:12:1: note: in expansion of macro 'TRACE_EVENT'
12 | TRACE_EVENT(sync_timeline,
| ^~~~~~~~~~~
>> include/trace/../../drivers/dma-buf/sync_trace.h:13:25: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
13 | TP_PROTO(struct sync_timeline *timeline),
| ^~~~~~~~~~~~~
include/linux/tracepoint.h:394:48: note: in definition of macro '__DEFINE_TRACE_EXT'
394 | void __probestub_##_name(void *__data, proto) \
| ^~~~~
include/linux/tracepoint.h:424:41: note: in expansion of macro 'PARAMS'
424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
| ^~~~~~
include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
| ^~~~~~~~~~~~
include/trace/define_trace.h:28:28: note: in expansion of macro 'PARAMS'
28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
| ^~~~~~
include/trace/../../drivers/dma-buf/sync_trace.h:12:1: note: in expansion of macro 'TRACE_EVENT'
12 | TRACE_EVENT(sync_timeline,
| ^~~~~~~~~~~
include/trace/../../drivers/dma-buf/sync_trace.h:13:9: note: in expansion of macro 'TP_PROTO'
13 | TP_PROTO(struct sync_timeline *timeline),
| ^~~~~~~~
--
In file included from include/trace/events/lock.h:9,
from kernel/locking/mutex.c:35:
include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
403 | CFI_NOSEAL(__probestub_##_name); \
| ^~~~~~~~~~
include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
| ^~~~~~~~~~~~~~~~~~
include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
| ^~~~~~~~~~~~
include/trace/events/lock.h:24:1: note: in expansion of macro 'TRACE_EVENT'
24 | TRACE_EVENT(lock_acquire,
| ^~~~~~~~~~~
include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
403 | CFI_NOSEAL(__probestub_##_name); \
| ^~~~~~~~~~
include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
| ^~~~~~~~~~~~~~~~~~
include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
| ^~~~~~~~~~~~
include/trace/events/lock.h:24:1: note: in expansion of macro 'TRACE_EVENT'
24 | TRACE_EVENT(lock_acquire,
| ^~~~~~~~~~~
>> include/trace/events/lock.h:28:24: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
28 | struct lockdep_map *next_lock, unsigned long ip),
| ^~~~~~~~~~~
include/linux/tracepoint.h:394:48: note: in definition of macro '__DEFINE_TRACE_EXT'
394 | void __probestub_##_name(void *__data, proto) \
| ^~~~~
include/linux/tracepoint.h:424:41: note: in expansion of macro 'PARAMS'
424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
| ^~~~~~
include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
| ^~~~~~~~~~~~
include/trace/define_trace.h:28:28: note: in expansion of macro 'PARAMS'
28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
| ^~~~~~
include/trace/events/lock.h:24:1: note: in expansion of macro 'TRACE_EVENT'
24 | TRACE_EVENT(lock_acquire,
| ^~~~~~~~~~~
include/trace/events/lock.h:26:9: note: in expansion of macro 'TP_PROTO'
26 | TP_PROTO(struct lockdep_map *lock, unsigned int subclass,
| ^~~~~~~~
include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
403 | CFI_NOSEAL(__probestub_##_name); \
| ^~~~~~~~~~
include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
| ^~~~~~~~~~~~~~~~~~
include/trace/define_trace.h:61:9: note: in expansion of macro 'DEFINE_TRACE'
61 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
| ^~~~~~~~~~~~
include/trace/events/lock.h:69:1: note: in expansion of macro 'DEFINE_EVENT'
69 | DEFINE_EVENT(lock, lock_release,
| ^~~~~~~~~~~~
include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
403 | CFI_NOSEAL(__probestub_##_name); \
| ^~~~~~~~~~
include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
| ^~~~~~~~~~~~~~~~~~
include/trace/define_trace.h:61:9: note: in expansion of macro 'DEFINE_TRACE'
61 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
| ^~~~~~~~~~~~
include/trace/events/lock.h:69:1: note: in expansion of macro 'DEFINE_EVENT'
69 | DEFINE_EVENT(lock, lock_release,
| ^~~~~~~~~~~~
include/trace/events/lock.h:71:25: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
71 | TP_PROTO(struct lockdep_map *lock, unsigned long ip),
| ^~~~~~~~~~~
include/linux/tracepoint.h:394:48: note: in definition of macro '__DEFINE_TRACE_EXT'
394 | void __probestub_##_name(void *__data, proto) \
| ^~~~~
include/linux/tracepoint.h:424:41: note: in expansion of macro 'PARAMS'
424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
| ^~~~~~
include/trace/define_trace.h:61:9: note: in expansion of macro 'DEFINE_TRACE'
61 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
| ^~~~~~~~~~~~
include/trace/define_trace.h:61:28: note: in expansion of macro 'PARAMS'
61 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
| ^~~~~~
include/trace/events/lock.h:69:1: note: in expansion of macro 'DEFINE_EVENT'
69 | DEFINE_EVENT(lock, lock_release,
| ^~~~~~~~~~~~
include/trace/events/lock.h:71:9: note: in expansion of macro 'TP_PROTO'
71 | TP_PROTO(struct lockdep_map *lock, unsigned long ip),
| ^~~~~~~~
include/linux/tracepoint.h:403:9: warning: data definition has no type or storage class
403 | CFI_NOSEAL(__probestub_##_name); \
| ^~~~~~~~~~
include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
| ^~~~~~~~~~~~~~~~~~
include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
| ^~~~~~~~~~~~
include/trace/events/lock.h:95:1: note: in expansion of macro 'TRACE_EVENT'
95 | TRACE_EVENT(contention_begin,
| ^~~~~~~~~~~
include/linux/tracepoint.h:403:9: error: type defaults to 'int' in declaration of 'CFI_NOSEAL' [-Wimplicit-int]
403 | CFI_NOSEAL(__probestub_##_name); \
| ^~~~~~~~~~
include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
| ^~~~~~~~~~~~~~~~~~
include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
| ^~~~~~~~~~~~
include/trace/events/lock.h:95:1: note: in expansion of macro 'TRACE_EVENT'
95 | TRACE_EVENT(contention_begin,
| ^~~~~~~~~~~
include/linux/tracepoint.h:380:24: error: parameter names (without types) in function declaration [-Wdeclaration-missing-parameter-type]
380 | struct tracepoint_func *it_func_ptr; \
| ^~~~~~~~~~~~~~~
include/linux/tracepoint.h:424:9: note: in expansion of macro '__DEFINE_TRACE_EXT'
424 | __DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
| ^~~~~~~~~~~~~~~~~~
include/trace/define_trace.h:28:9: note: in expansion of macro 'DEFINE_TRACE'
28 | DEFINE_TRACE(name, PARAMS(proto), PARAMS(args))
| ^~~~~~~~~~~~
include/trace/events/lock.h:95:1: note: in expansion of macro 'TRACE_EVENT'
95 | TRACE_EVENT(contention_begin,
vim +13 include/trace/../../drivers/dma-buf/sync_trace.h
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 11
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 12 TRACE_EVENT(sync_timeline,
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 @13 TP_PROTO(struct sync_timeline *timeline),
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 14
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 15 TP_ARGS(timeline),
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 16
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 17 TP_STRUCT__entry(
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 18 __string(name, timeline->name)
5c1401f83a16b7e drivers/staging/android/trace/sync.h Gustavo Padovan 2016-05-31 19 __field(u32, value)
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 20 ),
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 21
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 22 TP_fast_assign(
2c92ca849fcc6ee drivers/dma-buf/sync_trace.h Steven Rostedt (Google 2024-05-16 23) __assign_str(name);
5c1401f83a16b7e drivers/staging/android/trace/sync.h Gustavo Padovan 2016-05-31 24 __entry->value = timeline->value;
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 25 ),
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 26
5c1401f83a16b7e drivers/staging/android/trace/sync.h Gustavo Padovan 2016-05-31 27 TP_printk("name=%s value=%d", __get_str(name), __entry->value)
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 28 );
b699a644f82110e drivers/staging/android/trace/sync.h Erik Gilling 2013-02-28 29
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply
* Re: [PATCH v7 34/42] KVM: selftests: Test conversion with elevated page refcount
From: Askar Safin @ 2026-06-02 21:26 UTC (permalink / raw)
To: devnull+ackerleytng.google.com
Cc: ackerleytng, aik, akpm, andrew.jones, aneesh.kumar, axelrasmussen,
baohua, bhe, binbin.wu, bp, brauner, chao.p.peng, chrisl, corbet,
dave.hansen, david, forkloop, hpa, ira.weiny, jgg, jmattson,
jthoughton, kas, kasong, kvm, liam, linux-coco, linux-doc,
linux-kernel, linux-kselftest, linux-mm, linux-trace-kernel,
mathieu.desnoyers, mhiramat, michael.roth, mingo, nphamcs, oupton,
pankaj.gupta, pbonzini, pratyush, qi.zheng, qperret,
rick.p.edgecombe, rientjes, rostedt, seanjc, shakeel.butt,
shikemeng, shivankg, shuah, skhan, steven.price, suzuki.poulose,
tabba, tglx, vannapurve, vbabka, weixugc, willy, wyihan, x86,
yan.y.zhao, youngjun.park, yuanchu
In-Reply-To: <20260522-gmem-inplace-conversion-v7-34-2f0fae496530@google.com>
Ackerley Tng via B4 Relay <devnull+ackerleytng.google.com@kernel.org>:
> This test uses vmsplice to increment the refcount of a specific page
I recently submitted a patch, which makes vmsplice equivalent to
preadv2/pwritev2, and it was accepted to next.
For now it is just an experiment, it is possible it will be reverted.
https://lore.kernel.org/all/20260601-aufweichen-dissens-ausrechnen-0d9b84728113@brauner/
--
Askar Safin
^ permalink raw reply
* Re: [PATCH 1/2] tracing: work around -Wmissing-format-attribute warning
From: Andy Shevchenko @ 2026-06-02 21:05 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Arnd Bergmann, Steven Rostedt, Masami Hiramatsu, Andrew Morton,
Petr Mladek, Nathan Chancellor, Dennis Dalessandro,
Jason Gunthorpe, Leon Romanovsky, Arend van Spriel,
Miri Korenblit, Mathieu Desnoyers, Rasmus Villemoes,
Sergey Senozhatsky, Nick Desaulniers, Bill Wendling, Justin Stitt,
Vlastimil Babka (SUSE), linux-rdma, linux-kernel, linux-wireless,
brcm80211, brcm80211-dev-list.pdl, linux-trace-kernel, llvm
In-Reply-To: <35c1ba62-e74d-4abc-aa73-ccd35968ff89@app.fastmail.com>
On Tue, Jun 02, 2026 at 10:32:04PM +0200, Arnd Bergmann wrote:
> On Tue, Jun 2, 2026, at 20:59, Andy Shevchenko wrote:
> > On Tue, Jun 02, 2026 at 05:07:05PM +0200, Arnd Bergmann wrote:
...
> > Why the __printf() annotation is in the C file and not here?
> > Is this all about headers as the second paragraph in the commit message
> > explains?
> > I would add a comment to explain it here, otherwise we might see false
> > patches to "make things consistent" in a wrong way.
>
> I've tried to come up with a kerneldoc comment now, similar to
> the one for the vsnprintf() function, and added a separate prototype
> in the header. Does this address your concern?
Yes, see one nit, though.
> -int __printf(3, 0) __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
> +/**
> + * __vsnprintf - vsnprintf() wrapper without __printf() attribute
> + * @buf: The buffer to place the result into
> + * @size: The size of the buffer, including the trailing null space
> + * @fmt_str: The format string to use
> + * @args: Arguments for the format string
> + *
> + * This has the exact same behavior as vsnprintf() but can be used in call
> + * sites that are missing a __printf() annotation, e.g. because they
> + * get a 'va_format' argument instead of format and varargs.
> + *
> + * For this to work, the attribute is added to the declaration here but
> + * not in the header.
+ *
+ * Return: ...
> + */
> +int __printf(3, 0) __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args);
> +
> +int __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
Something slipped here...
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* Re: [PATCH v7 07/42] KVM: guest_memfd: Only prepare folios for private pages
From: Ackerley Tng @ 2026-06-02 20:46 UTC (permalink / raw)
To: Suzuki K Poulose, aik, andrew.jones, binbin.wu, brauner,
chao.p.peng, david, ira.weiny, jmattson, jthoughton, michael.roth,
oupton, pankaj.gupta, qperret, rick.p.edgecombe, rientjes,
shivankg, steven.price, tabba, willy, wyihan, yan.y.zhao,
forkloop, pratyush, aneesh.kumar, liam, Paolo Bonzini,
Sean Christopherson, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka
Cc: kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco
In-Reply-To: <d01cf1ec-b85d-4af6-9810-8107c0e2a4ec@arm.com>
Suzuki K Poulose <suzuki.poulose@arm.com> writes:
> On 23/05/2026 01:17, Ackerley Tng via B4 Relay wrote:
>> From: Ackerley Tng <ackerleytng@google.com>
>>
>> All-shared guest_memfd used to be only supported for non-CoCo VMs where
>> preparation doesn't apply. INIT_SHARED is about to be supported for
>> non-CoCo VMs in a later patch in this series.
>
> nit: s/non-CoCo/CoCo ?
>
Yes, thanks!
>>
>> In addition, KVM_SET_MEMORY_ATTRIBUTES2 is about to be supported in
>> guest_memfd in a later patch in this series.
>>
>> This means that the kvm fault handler may now call kvm_gmem_get_pfn() on a
>> shared folio for a CoCo VM where preparation applies.
>>
>> Add a check to make sure that preparation is only performed for private
>> folios.
>>
>> Preparation will be undone on freeing (see kvm_gmem_free_folio()) and on
>> conversion to shared.
>>
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>
> nit: Missing Co-Developed-by: ?
>
IIRC this should have been
Suggested-by: Michael Roth <michael.roth@amd.com>
IIRC Michael suggested this on one of the guest_memfd calls, Michael
please let me know if you remember otherwise!
>>
>> [...snip...]
>>
^ permalink raw reply
* Re: [PATCH 1/2] tracing: work around -Wmissing-format-attribute warning
From: Arnd Bergmann @ 2026-06-02 20:32 UTC (permalink / raw)
To: Andy Shevchenko, Arnd Bergmann
Cc: Steven Rostedt, Masami Hiramatsu, Andrew Morton, Petr Mladek,
Nathan Chancellor, Dennis Dalessandro, Jason Gunthorpe,
Leon Romanovsky, Arend van Spriel, Miri Korenblit,
Mathieu Desnoyers, Rasmus Villemoes, Sergey Senozhatsky,
Nick Desaulniers, Bill Wendling, Justin Stitt,
Vlastimil Babka (SUSE), linux-rdma, linux-kernel, linux-wireless,
brcm80211, brcm80211-dev-list.pdl, linux-trace-kernel, llvm
In-Reply-To: <ah8n-Nk305S5hRwN@ashevche-desk.local>
On Tue, Jun 2, 2026, at 20:59, Andy Shevchenko wrote:
> On Tue, Jun 02, 2026 at 05:07:05PM +0200, Arnd Bergmann wrote:
>>
>> A number of tracing headers turn off -Wsuggest-attribute=format for
>> gcc, but they don't turn it off for clang, so the same warning still
>> happens on new versions of clang that support the format attribute.
>>
>> To avoid duplicating the same thing in each tracing header, as well
>> as changing all of them to also turn it off for clang, add a new
>> __vsnprintf() helper that is not annotated this way in linux/sprintf.h
>> but is defined to work the same way as the regular vsprintf.
>
> vsprintf()
Fixed now
> Why the __printf() annotation is in the C file and not here?
> Is this all about headers as the second paragraph in the commit message
> explains?
> I would add a comment to explain it here, otherwise we might see false
> patches to "make things consistent" in a wrong way.
I've tried to come up with a kerneldoc comment now, similar to
the one for the vsnprintf() function, and added a separate prototype
in the header. Does this address your concern?
Arnd
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 3caf0796f54d..7c696aea2ed3 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -2975,7 +2975,23 @@ int vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
}
EXPORT_SYMBOL(vsnprintf);
-int __printf(3, 0) __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
+/**
+ * __vsnprintf - vsnprintf() wrapper without __printf() attribute
+ * @buf: The buffer to place the result into
+ * @size: The size of the buffer, including the trailing null space
+ * @fmt_str: The format string to use
+ * @args: Arguments for the format string
+ *
+ * This has the exact same behavior as vsnprintf() but can be used in call
+ * sites that are missing a __printf() annotation, e.g. because they
+ * get a 'va_format' argument instead of format and varargs.
+ *
+ * For this to work, the attribute is added to the declaration here but
+ * not in the header.
+ */
+int __printf(3, 0) __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args);
+
+int __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
{
return vsnprintf(buf, size, fmt_str, args);
}
^ permalink raw reply related
* Re: [PATCH] tracing: Reject tracefs buffer size values that overflow bytes
From: Steven Rostedt @ 2026-06-02 20:03 UTC (permalink / raw)
To: Sam Moelius
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel
In-Reply-To: <20260602184335.1554470-1-sam.moelius@trailofbits.com>
On Tue, 2 Jun 2026 18:43:34 +0000
Sam Moelius <sam.moelius@trailofbits.com> wrote:
> From: Samuel Moelius <sam.moelius@trailofbits.com>
>
> `tracing_entries_write()` accepts a `buffer_size_kb` value as
> `unsigned long`, checks only for zero, then shifts left by 10. On
> 64-bit, writing `18014398509481984` KB wraps the byte count to zero
> and the ring buffer resize path accepts it as a tiny buffer instead
> of rejecting an impossible huge size.
>
> The fix also adds the same pre-scale overflow check to
> `buffer_subbuf_size_write()`.
Honestly, enter stupid values, get stupid results.
I don't think this is necessary. Nothing breaks but the person may get
confused by being confused by the confusing entries they make.
-- Steve
^ permalink raw reply
* Re: [PATCH 1/2] tracing: work around -Wmissing-format-attribute warning
From: Andy Shevchenko @ 2026-06-02 18:59 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Steven Rostedt, Masami Hiramatsu, Andrew Morton, Petr Mladek,
Nathan Chancellor, Arnd Bergmann, Dennis Dalessandro,
Jason Gunthorpe, Leon Romanovsky, Arend van Spriel,
Miri Korenblit, Mathieu Desnoyers, Rasmus Villemoes,
Sergey Senozhatsky, Nick Desaulniers, Bill Wendling, Justin Stitt,
Vlastimil Babka, linux-rdma, linux-kernel, linux-wireless,
brcm80211, brcm80211-dev-list.pdl, linux-trace-kernel, llvm
In-Reply-To: <20260602150904.2258624-1-arnd@kernel.org>
On Tue, Jun 02, 2026 at 05:07:05PM +0200, Arnd Bergmann wrote:
>
> A number of tracing headers turn off -Wsuggest-attribute=format for
> gcc, but they don't turn it off for clang, so the same warning still
> happens on new versions of clang that support the format attribute.
>
> To avoid duplicating the same thing in each tracing header, as well
> as changing all of them to also turn it off for clang, add a new
> __vsnprintf() helper that is not annotated this way in linux/sprintf.h
> but is defined to work the same way as the regular vsprintf.
vsprintf()
> Aside from tracing, the same thing can be used in va_format(),
> which is part of lib/vsprintf.c itself.
...
> --- a/include/linux/sprintf.h
> +++ b/include/linux/sprintf.h
> @@ -12,6 +12,7 @@ __printf(2, 3) int sprintf(char *buf, const char * fmt, ...);
> __printf(2, 0) int vsprintf(char *buf, const char *, va_list);
> __printf(3, 4) int snprintf(char *buf, size_t size, const char *fmt, ...);
> __printf(3, 0) int vsnprintf(char *buf, size_t size, const char *fmt, va_list args);
> +int __vsnprintf(char *buf, size_t size, const char *fmt, va_list args);
Why the __printf() annotation is in the C file and not here?
Is this all about headers as the second paragraph in the commit message explains?
I would add a comment to explain it here, otherwise we might see false patches to
"make things consistent" in a wrong way.
> __printf(3, 4) int scnprintf(char *buf, size_t size, const char *fmt, ...);
> __printf(3, 0) int vscnprintf(char *buf, size_t size, const char *fmt, va_list args);
> __printf(2, 3) __malloc char *kasprintf(gfp_t gfp, const char *fmt, ...);
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* [PATCH] tracing: Reject tracefs buffer size values that overflow bytes
From: Sam Moelius @ 2026-06-02 18:43 UTC (permalink / raw)
To: Steven Rostedt
Cc: Samuel Moelius, Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel
From: Samuel Moelius <sam.moelius@trailofbits.com>
`tracing_entries_write()` accepts a `buffer_size_kb` value as
`unsigned long`, checks only for zero, then shifts left by 10. On
64-bit, writing `18014398509481984` KB wraps the byte count to zero
and the ring buffer resize path accepts it as a tiny buffer instead
of rejecting an impossible huge size.
The fix also adds the same pre-scale overflow check to
`buffer_subbuf_size_write()`.
Assisted-by: Codex:gpt-5.5-cyber-preview
Signed-off-by: Samuel Moelius <sam.moelius@trailofbits.com>
---
kernel/trace/trace.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6eb4d3097a4d..79da29c3d525 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5735,7 +5735,7 @@ tracing_entries_write(struct file *filp, const char __user *ubuf,
return ret;
/* must have at least 1 entry */
- if (!val)
+ if (!val || val > ULONG_MAX >> 10)
return -EINVAL;
/* value is in KB */
@@ -8206,6 +8206,9 @@ buffer_subbuf_size_write(struct file *filp, const char __user *ubuf,
if (ret)
return ret;
+ if (!val || val > ULONG_MAX / 1024)
+ return -EINVAL;
+
val *= 1024; /* value passed in is in KB */
pages = DIV_ROUND_UP(val, PAGE_SIZE);
--
2.43.0
^ permalink raw reply related
* Re: [PATCH mm-unstable v18 11/14] mm/khugepaged: Introduce mTHP collapse support
From: Nico Pache @ 2026-06-02 17:26 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
akpm, anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
lance.yang, liam, ljs, mathieu.desnoyers, matthew.brost, mhiramat,
mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, vbabka, vishal.moola,
wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe,
Usama Arif, usamaarif642
In-Reply-To: <19639b08-5bf1-4974-9635-c458d512fa38@redhat.com>
On Tue, Jun 2, 2026 at 11:22 AM Nico Pache <npache@redhat.com> wrote:
>
>
>
> On 6/1/26 7:15 AM, David Hildenbrand (Arm) wrote:
> >>>
> >>> Reading this, it is unclear why exactly do we need the stack.
> >>
> >> So I looked into your items below. It seems logical, and I think it
> >> works the same way; however, your method seems slightly harder to
> >> understand due to all the edge cases and more error-prone to future
> >> changes (the stack holds implicit knowledge of the offset/order that
> >> must now be tracked in the edge cases).
> >>
> >> Given the stack is 24 bytes, I'm not sure if the extra complexity is
> >> worth saving that small amount of memory. Although we would also be
> >> getting rid of (3?) functions, so both approaches have pros and cons.
> >
> > I consider a simple forward loop over the offset ... less complexity compared to
> > a stack structure :)
> >
> >>
> >> I will implement a patch comparing your solution against mine and send
> >> it here, then we can decide which approach is better.
> >
> > Right, throw it over the fence and I'll see how to improve it further.
>
> Ok heres what the diff looks like on top of my V19.
>
> you can access the tree here https://gitlab.com/npache/linux/-/commits/mthp-v19?ref_type=heads for easier review.
>
> So far I have no problem with this approach it appeared cleaner than i thought. Did some light testing. Gonna throw it more through the ringer tomorrow.
not sure why this didnt send with the proper encoding I guess my email
is still a little screwed up
>
>
> From 9496c5d17eba7f6d04820d78c7c6f1592a58888a Mon Sep 17 00:00:00 2001
> From: Nico Pache <npache@redhat.com>
> Date: Tue, 2 Jun 2026 10:26:18 -0600
> Subject: [PATCH] convert from stack to forward loop
>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
> mm/khugepaged.c | 96 ++++++++-----------------------------------------
> 1 file changed, 15 insertions(+), 81 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 498eba009751..6de935e76ceb 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -100,28 +100,6 @@ static DEFINE_READ_MOSTLY_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
> static struct kmem_cache *mm_slot_cache __ro_after_init;
>
> #define KHUGEPAGED_MIN_MTHP_ORDER 2
> -/*
> - * mthp_collapse() does an iterative DFS over a binary tree, from
> - * HPAGE_PMD_ORDER down to KHUGEPAGED_MIN_MTHP_ORDER. The max stack
> - * size needed for a DFS on a binary tree is height + 1, where
> - * height = HPAGE_PMD_ORDER - KHUGEPAGED_MIN_MTHP_ORDER.
> - *
> - * ilog2 is used in place of HPAGE_PMD_ORDER because some architectures
> - * (e.g. ppc64le) do not define HPAGE_PMD_ORDER until after build time.
> - */
> -#define MTHP_STACK_SIZE (ilog2(MAX_PTRS_PER_PTE) - KHUGEPAGED_MIN_MTHP_ORDER + 1)
> -
> -/*
> - * Defines a range of PTE entries in a PTE page table which are being
> - * considered for mTHP collapse.
> - *
> - * @offset: the offset of the first PTE entry in a PMD range.
> - * @order: the order of the PTE entries being considered for collapse.
> - */
> -struct mthp_range {
> - u16 offset;
> - u8 order;
> -};
>
> struct collapse_control {
> bool is_khugepaged;
> @@ -137,7 +115,6 @@ struct collapse_control {
>
> /* Each bit represents a single occupied (!none/zero) page. */
> DECLARE_BITMAP(mthp_present_ptes, MAX_PTRS_PER_PTE);
> - struct mthp_range mthp_bitmap_stack[MTHP_STACK_SIZE];
> };
>
> /**
> @@ -1458,50 +1435,14 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long s
> return result;
> }
>
> -static void collapse_mthp_stack_push(struct collapse_control *cc, int *stack_size,
> - u16 offset, u8 order)
> -{
> - const int size = *stack_size;
> - struct mthp_range *stack = &cc->mthp_bitmap_stack[size];
> -
> - VM_WARN_ON_ONCE(size >= MTHP_STACK_SIZE);
> - stack->order = order;
> - stack->offset = offset;
> - (*stack_size)++;
> -}
> -
> -static struct mthp_range collapse_mthp_stack_pop(struct collapse_control *cc,
> - int *stack_size)
> -{
> - const int size = *stack_size;
> -
> - VM_WARN_ON_ONCE(size <= 0);
> - (*stack_size)--;
> - return cc->mthp_bitmap_stack[size - 1];
> -}
> -
> /*
> * mthp_collapse() consumes the bitmap that is generated during
> * collapse_scan_pmd() to determine what regions and mTHP orders fit best.
> *
> * Each bit in cc->mthp_present_ptes represents a single occupied (!none/zero)
> - * page. A stack structure cc->mthp_bitmap_stack is used to check different
> - * regions of the bitmap for collapse eligibility. The stack maintains a pair
> - * of variables (offset, order), indicating the number of PTEs from the start
> - * of the PMD, and the order of the potential collapse candidate respectively.
> - * We start at the PMD order and check if it is eligible for collapse; if not,
> - * we add two entries to the stack at a lower order to represent the left and
> - * right halves of the PTE page table we are examining.
> - *
> - * offset mid_offset
> - * | |
> - * | |
> - * v v
> - * --------------------------------------
> - * | cc->mthp_present_ptes |
> - * --------------------------------------
> - * <-------><------->
> - * order-1 order-1
> + * page. We start at the PMD order and check if it is eligible for collapse;
> + * if not, we check the left and right halves of the PTE page table we are
> + * examining at a lower order.
> *
> * For each of these, we determine how many PTE entries are occupied in the
> * range of PTE entries we propose to collapse, then we compare this to a
> @@ -1517,26 +1458,20 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
> {
> unsigned int nr_occupied_ptes, nr_ptes, max_ptes_none;
> enum scan_result last_result = SCAN_FAIL;
> - int collapsed = 0, stack_size = 0;
> + int collapsed = 0;
> bool alloc_failed = false;
> unsigned long collapse_address;
> - struct mthp_range range;
> - u16 offset;
> - u8 order;
> + unsigned int offset = 0;
> + unsigned int order = HPAGE_PMD_ORDER;
>
> - collapse_mthp_stack_push(cc, &stack_size, 0, HPAGE_PMD_ORDER);
>
> - while (stack_size) {
> - range = collapse_mthp_stack_pop(cc, &stack_size);
> - order = range.order;
> - offset = range.offset;
> + while (offset < HPAGE_PMD_NR) {
> nr_ptes = 1UL << order;
>
> if (!test_bit(order, &enabled_orders))
> goto next_order;
>
> max_ptes_none = collapse_max_ptes_none(cc, NULL, order);
> -
> nr_occupied_ptes = bitmap_weight_from(cc->mthp_present_ptes, offset,
> offset + nr_ptes);
>
> @@ -1553,7 +1488,7 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
> collapsed += nr_ptes;
> fallthrough;
> case SCAN_PTE_MAPPED_HUGEPAGE:
> - continue;
> + goto next_offset;
> /* Cases where lower orders might still succeed */
> case SCAN_ALLOC_HUGE_PAGE_FAIL:
> alloc_failed = true;
> @@ -1581,15 +1516,14 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
> }
>
> next_order:
> - if ((BIT(order) - 1) & enabled_orders) {
> - const u8 next_order = order - 1;
> - const u16 mid_offset = offset + (nr_ptes / 2);
> -
> - collapse_mthp_stack_push(cc, &stack_size, mid_offset,
> - next_order);
> - collapse_mthp_stack_push(cc, &stack_size, offset,
> - next_order);
> + if (order > KHUGEPAGED_MIN_MTHP_ORDER &&
> + (BIT(order) - 1) & enabled_orders) {
> + order = order - 1;
> + continue;
> }
> +next_offset:
> + offset += nr_ptes;
> + order = min_t(int, __ffs(offset), HPAGE_PMD_ORDER);
> }
> done:
> if (collapsed)
> --
> 2.54.0
>
>
>
> >
> > [...]
> >
> >>>> + bitmap_zero(cc->mthp_bitmap, MAX_PTRS_PER_PTE);
> >>>> memset(cc->node_load, 0, sizeof(cc->node_load));
> >>>> nodes_clear(cc->alloc_nmask);
> >>>> +
> >>>> + enabled_orders = collapse_allowable_orders(vma, vma->vm_flags, tva_flags);
> >>>> +
> >>>> + /*
> >>>> + * If PMD is the only enabled order, enforce max_ptes_none, otherwise
> >>>> + * scan all pages to populate the bitmap for mTHP collapse.
> >>>> + */
> >>>
> >>> You should note here, that we re-verify in mthp_collapse().
> >>>
> >>> But the question is, whether we should relocate the check completely into
> >>> mthp_collapse(), instead of conditionally duplicating it.
> >>>
> >>> What speaks against always populating the bitmap and making the decision in
> >>> mthp_collapse()?
> >>>
> >>> Sure, we might scan a page table a bit longer, but the code gets clearer ... and
> >>> I am not sure if scanning some more page table entries is really that critical here.
> >>
> >> Someone asked me to preserve the legacy behavior (PMD only). Although
> >> rather trivial, if you set max_ptes_none=0 for example, we'd still
> >> have to do 511 iterations for no reason if PMD collapse is the only
> >> enabled order rather than bailing immediately.
> >>
> >> I'm ok with dropping it, but I think its the correct approach (despite
> >> the extra complexity). @Usama Arif brought up this point here
> >> https://lore.kernel.org/all/f8f7bb71-ca31-46ee-a62d-7ddfd83e0ead@gmail.com/
> >
> > We talk about regressions, but I am not sure if we care about scanning speed
> > within a page table that much?
> >
> > After all, we locked it and already read some entries.
> >
> > Having the same check at two places to optimize for PMD order might right now
> > feel like a good optimization, but likely an irrelevant one in a near future?
> >
> > Anyhow, won't push back, as long as we document why we are special casing things
> > here.
> >
>
^ permalink raw reply
* Re: [PATCH mm-unstable v18 11/14] mm/khugepaged: Introduce mTHP collapse support
From: Nico Pache @ 2026-06-02 17:23 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
akpm, anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
lance.yang, liam, ljs, mathieu.desnoyers, matthew.brost, mhiramat,
mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, vbabka, vishal.moola,
wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe,
Usama Arif, usamaarif642
In-Reply-To: <d3c2b00c-6810-434a-b837-0707b0a11611@kernel.org>
On 6/1/26 7:15 AM, David Hildenbrand (Arm) wrote:
>>>
>>> Reading this, it is unclear why exactly do we need the stack.
>>
>> So I looked into your items below. It seems logical, and I think it
>> works the same way; however, your method seems slightly harder to
>> understand due to all the edge cases and more error-prone to future
>> changes (the stack holds implicit knowledge of the offset/order that
>> must now be tracked in the edge cases).
>>
>> Given the stack is 24 bytes, I'm not sure if the extra complexity is
>> worth saving that small amount of memory. Although we would also be
>> getting rid of (3?) functions, so both approaches have pros and cons.
>
> I consider a simple forward loop over the offset ... less complexity compared to
> a stack structure :)
>
>>
>> I will implement a patch comparing your solution against mine and send
>> it here, then we can decide which approach is better.
>
> Right, throw it over the fence and I'll see how to improve it further.
Ok heres what the diff looks like on top of my V19.
you can access the tree here https://gitlab.com/npache/linux/-/commits/mthp-v19?ref_type=heads for easier review.
So far I have no problem with this approach it appeared cleaner than i thought. Did some light testing. Gonna throw it more through the ringer tomorrow.
From 9496c5d17eba7f6d04820d78c7c6f1592a58888a Mon Sep 17 00:00:00 2001
From: Nico Pache <npache@redhat.com>
Date: Tue, 2 Jun 2026 10:26:18 -0600
Subject: [PATCH] convert from stack to forward loop
Signed-off-by: Nico Pache <npache@redhat.com>
---
mm/khugepaged.c | 96 ++++++++-----------------------------------------
1 file changed, 15 insertions(+), 81 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 498eba009751..6de935e76ceb 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -100,28 +100,6 @@ static DEFINE_READ_MOSTLY_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
static struct kmem_cache *mm_slot_cache __ro_after_init;
#define KHUGEPAGED_MIN_MTHP_ORDER 2
-/*
- * mthp_collapse() does an iterative DFS over a binary tree, from
- * HPAGE_PMD_ORDER down to KHUGEPAGED_MIN_MTHP_ORDER. The max stack
- * size needed for a DFS on a binary tree is height + 1, where
- * height = HPAGE_PMD_ORDER - KHUGEPAGED_MIN_MTHP_ORDER.
- *
- * ilog2 is used in place of HPAGE_PMD_ORDER because some architectures
- * (e.g. ppc64le) do not define HPAGE_PMD_ORDER until after build time.
- */
-#define MTHP_STACK_SIZE (ilog2(MAX_PTRS_PER_PTE) - KHUGEPAGED_MIN_MTHP_ORDER + 1)
-
-/*
- * Defines a range of PTE entries in a PTE page table which are being
- * considered for mTHP collapse.
- *
- * @offset: the offset of the first PTE entry in a PMD range.
- * @order: the order of the PTE entries being considered for collapse.
- */
-struct mthp_range {
- u16 offset;
- u8 order;
-};
struct collapse_control {
bool is_khugepaged;
@@ -137,7 +115,6 @@ struct collapse_control {
/* Each bit represents a single occupied (!none/zero) page. */
DECLARE_BITMAP(mthp_present_ptes, MAX_PTRS_PER_PTE);
- struct mthp_range mthp_bitmap_stack[MTHP_STACK_SIZE];
};
/**
@@ -1458,50 +1435,14 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long s
return result;
}
-static void collapse_mthp_stack_push(struct collapse_control *cc, int *stack_size,
- u16 offset, u8 order)
-{
- const int size = *stack_size;
- struct mthp_range *stack = &cc->mthp_bitmap_stack[size];
-
- VM_WARN_ON_ONCE(size >= MTHP_STACK_SIZE);
- stack->order = order;
- stack->offset = offset;
- (*stack_size)++;
-}
-
-static struct mthp_range collapse_mthp_stack_pop(struct collapse_control *cc,
- int *stack_size)
-{
- const int size = *stack_size;
-
- VM_WARN_ON_ONCE(size <= 0);
- (*stack_size)--;
- return cc->mthp_bitmap_stack[size - 1];
-}
-
/*
* mthp_collapse() consumes the bitmap that is generated during
* collapse_scan_pmd() to determine what regions and mTHP orders fit best.
*
* Each bit in cc->mthp_present_ptes represents a single occupied (!none/zero)
- * page. A stack structure cc->mthp_bitmap_stack is used to check different
- * regions of the bitmap for collapse eligibility. The stack maintains a pair
- * of variables (offset, order), indicating the number of PTEs from the start
- * of the PMD, and the order of the potential collapse candidate respectively.
- * We start at the PMD order and check if it is eligible for collapse; if not,
- * we add two entries to the stack at a lower order to represent the left and
- * right halves of the PTE page table we are examining.
- *
- * offset mid_offset
- * | |
- * | |
- * v v
- * --------------------------------------
- * | cc->mthp_present_ptes |
- * --------------------------------------
- * <-------><------->
- * order-1 order-1
+ * page. We start at the PMD order and check if it is eligible for collapse;
+ * if not, we check the left and right halves of the PTE page table we are
+ * examining at a lower order.
*
* For each of these, we determine how many PTE entries are occupied in the
* range of PTE entries we propose to collapse, then we compare this to a
@@ -1517,26 +1458,20 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
{
unsigned int nr_occupied_ptes, nr_ptes, max_ptes_none;
enum scan_result last_result = SCAN_FAIL;
- int collapsed = 0, stack_size = 0;
+ int collapsed = 0;
bool alloc_failed = false;
unsigned long collapse_address;
- struct mthp_range range;
- u16 offset;
- u8 order;
+ unsigned int offset = 0;
+ unsigned int order = HPAGE_PMD_ORDER;
- collapse_mthp_stack_push(cc, &stack_size, 0, HPAGE_PMD_ORDER);
- while (stack_size) {
- range = collapse_mthp_stack_pop(cc, &stack_size);
- order = range.order;
- offset = range.offset;
+ while (offset < HPAGE_PMD_NR) {
nr_ptes = 1UL << order;
if (!test_bit(order, &enabled_orders))
goto next_order;
max_ptes_none = collapse_max_ptes_none(cc, NULL, order);
-
nr_occupied_ptes = bitmap_weight_from(cc->mthp_present_ptes, offset,
offset + nr_ptes);
@@ -1553,7 +1488,7 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
collapsed += nr_ptes;
fallthrough;
case SCAN_PTE_MAPPED_HUGEPAGE:
- continue;
+ goto next_offset;
/* Cases where lower orders might still succeed */
case SCAN_ALLOC_HUGE_PAGE_FAIL:
alloc_failed = true;
@@ -1581,15 +1516,14 @@ static enum scan_result mthp_collapse(struct mm_struct *mm,
}
next_order:
- if ((BIT(order) - 1) & enabled_orders) {
- const u8 next_order = order - 1;
- const u16 mid_offset = offset + (nr_ptes / 2);
-
- collapse_mthp_stack_push(cc, &stack_size, mid_offset,
- next_order);
- collapse_mthp_stack_push(cc, &stack_size, offset,
- next_order);
+ if (order > KHUGEPAGED_MIN_MTHP_ORDER &&
+ (BIT(order) - 1) & enabled_orders) {
+ order = order - 1;
+ continue;
}
+next_offset:
+ offset += nr_ptes;
+ order = min_t(int, __ffs(offset), HPAGE_PMD_ORDER);
}
done:
if (collapsed)
--
2.54.0
>
> [...]
>
>>>> + bitmap_zero(cc->mthp_bitmap, MAX_PTRS_PER_PTE);
>>>> memset(cc->node_load, 0, sizeof(cc->node_load));
>>>> nodes_clear(cc->alloc_nmask);
>>>> +
>>>> + enabled_orders = collapse_allowable_orders(vma, vma->vm_flags, tva_flags);
>>>> +
>>>> + /*
>>>> + * If PMD is the only enabled order, enforce max_ptes_none, otherwise
>>>> + * scan all pages to populate the bitmap for mTHP collapse.
>>>> + */
>>>
>>> You should note here, that we re-verify in mthp_collapse().
>>>
>>> But the question is, whether we should relocate the check completely into
>>> mthp_collapse(), instead of conditionally duplicating it.
>>>
>>> What speaks against always populating the bitmap and making the decision in
>>> mthp_collapse()?
>>>
>>> Sure, we might scan a page table a bit longer, but the code gets clearer ... and
>>> I am not sure if scanning some more page table entries is really that critical here.
>>
>> Someone asked me to preserve the legacy behavior (PMD only). Although
>> rather trivial, if you set max_ptes_none=0 for example, we'd still
>> have to do 511 iterations for no reason if PMD collapse is the only
>> enabled order rather than bailing immediately.
>>
>> I'm ok with dropping it, but I think its the correct approach (despite
>> the extra complexity). @Usama Arif brought up this point here
>> https://lore.kernel.org/all/f8f7bb71-ca31-46ee-a62d-7ddfd83e0ead@gmail.com/
>
> We talk about regressions, but I am not sure if we care about scanning speed
> within a page table that much?
>
> After all, we locked it and already read some entries.
>
> Having the same check at two places to optimize for PMD order might right now
> feel like a good optimization, but likely an irrelevant one in a near future?
>
> Anyhow, won't push back, as long as we document why we are special casing things
> here.
>
^ permalink raw reply related
* [PATCH 13/13] Documentation/kernel-parameters: Add trace_remote
From: Vincent Donnefort @ 2026-06-02 17:11 UTC (permalink / raw)
To: rostedt, mhiramat, mathieu.desnoyers, linux-trace-kernel
Cc: kernel-team, linux-kernel, Vincent Donnefort
In-Reply-To: <20260602171146.2238998-1-vdonnefort@google.com>
The trace_remote parameter allows to configure a trace remote on
registration. The syntax is similar to trace_instance.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 97007f4f69d4..dec8fd1f5ff9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -7792,6 +7792,22 @@ Kernel parameters
See also Documentation/trace/ftrace.rst "trace options"
section.
+ trace_remote=[remote-info]
+ [FTRACE] Configure a trace remote instance at boot.
+ Format: <name>[^option1[^option2...]][,event1[,event2...]]
+
+ Supported options:
+ dump_on_oops - Enable panic load to dump the trace
+ buffer on oops/panic.
+ printk - Redirect tracing output to printk.
+ buf_size=<size> - Set trace buffer size (e.g. 2M).
+
+ Events are a comma-separated list of events to enable.
+ If events are specified, tracing is automatically enabled.
+
+ Multiple remotes can be configured by specifying this
+ parameter multiple times.
+
trace_trigger=[trigger-list]
[FTRACE] Add an event trigger on specific events.
Set a trigger on top of a specific event, with an optional
--
2.54.0.1032.g2f8565e1d1-goog
^ permalink raw reply related
* [PATCH 12/13] tracing/remotes: Add trace_remote cmdline options
From: Vincent Donnefort @ 2026-06-02 17:11 UTC (permalink / raw)
To: rostedt, mhiramat, mathieu.desnoyers, linux-trace-kernel
Cc: kernel-team, linux-kernel, Vincent Donnefort
In-Reply-To: <20260602171146.2238998-1-vdonnefort@google.com>
Following the same format as trace_instance, add a cmdline to configure
a trace remote on registration:
trace_remote=<remote>^<opt1>^<opt2>,<evt1>,<evt2>
Enabling events automatically turns on tracing.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/kernel/trace/trace_remote.c b/kernel/trace/trace_remote.c
index 3164f327d4d8..0b44636b6eea 100644
--- a/kernel/trace/trace_remote.c
+++ b/kernel/trace/trace_remote.c
@@ -13,6 +13,8 @@
#include <linux/trace_seq.h>
#include <linux/types.h>
+#include <asm/setup.h>
+
#include "trace.h"
#define TRACEFS_DIR "remotes"
@@ -1057,6 +1059,124 @@ static int dump_on_oops_show(struct seq_file *s, void *unused)
}
DEFINE_SHOW_STORE_ATTRIBUTE(dump_on_oops);
+static char trace_remote_cmdline[COMMAND_LINE_SIZE];
+
+static int __init set_trace_remote_cmdline(char *str)
+{
+ static int idx;
+
+ if (!str)
+ return 0;
+
+ strscpy(trace_remote_cmdline + idx, str, COMMAND_LINE_SIZE - idx);
+ idx += strlen(str);
+ trace_remote_cmdline[idx++] = '\t';
+ return 1;
+}
+__setup("trace_remote=", set_trace_remote_cmdline);
+
+static void trace_remote_apply_cmdline_opts(struct trace_remote *remote, char *cmdline)
+{
+ bool printk_on = false;
+ char *opt;
+ int ret;
+
+ while ((opt = strsep(&cmdline, "^"))) {
+ if (!*opt)
+ continue;
+
+ if (!strcmp(opt, "dump_on_oops")) {
+ remote->panic_on = true;
+ } else if (!strcmp(opt, "printk")) {
+ printk_on = true;
+ } else if (!strncmp(opt, "buf_size=", 9)) {
+ /* buf_size can only be applied if the buffer is unloaded */
+ if (!WARN_ON(trace_remote_loaded(remote)))
+ remote->trace_buffer_size = memparse(opt + 9, NULL);
+ } else {
+ pr_warn("Unknown trace remote option '%s'\n", opt);
+ }
+ }
+
+ if (printk_on) {
+ ret = trace_remote_enable_printk(remote, true);
+ if (ret)
+ pr_warn("Failed to enable trace remote printk (%d)\n", ret);
+ }
+}
+
+static struct remote_event *
+trace_remote_find_event_by_name(struct trace_remote *remote, const char *name);
+
+static int
+trace_remote_enable_event(struct trace_remote *remote, struct remote_event *evt, bool enable);
+
+static void trace_remote_apply_cmdline_events(struct trace_remote *remote, char *cmdline)
+{
+ bool tracing_on = false;
+ char *token;
+ int ret;
+
+ while ((token = strsep(&cmdline, ","))) {
+ struct remote_event *evt;
+
+ if (!*token)
+ continue;
+
+ evt = trace_remote_find_event_by_name(remote, token);
+ if (!evt) {
+ pr_warn("trace remote event '%s' not found\n", token);
+ continue;
+ }
+
+ ret = trace_remote_enable_event(remote, evt, true);
+ if (ret)
+ pr_warn("Failed to enable trace remote event '%s' (%d)\n", token, ret);
+ else
+ tracing_on = true;
+ }
+
+ if (tracing_on) {
+ ret = trace_remote_enable_tracing(remote);
+ if (ret)
+ pr_warn("Failed to enable trace remote tracing (%d)\n", ret);
+ }
+}
+
+static void trace_remote_apply_cmdline(const char *name, struct trace_remote *remote)
+{
+ char *cmdline __free(kfree) = NULL;
+ char *events_cmdline = NULL;
+ char *opts_cmdline = NULL;
+ char *curr, *next;
+
+ if (!trace_remote_cmdline[0])
+ return;
+
+ cmdline = kstrdup(trace_remote_cmdline, GFP_KERNEL);
+ if (!cmdline)
+ return;
+
+ next = cmdline;
+ while ((curr = strsep(&next, "\t"))) {
+ char *token = strsep(&curr, ",");
+ char *rname = strsep(&token, "^");
+
+ if (strcmp(rname, name) == 0) {
+ opts_cmdline = token;
+ events_cmdline = curr;
+ break;
+ }
+ }
+
+ guard(mutex)(&remote->lock);
+
+ if (opts_cmdline)
+ trace_remote_apply_cmdline_opts(remote, opts_cmdline);
+ if (events_cmdline)
+ trace_remote_apply_cmdline_events(remote, events_cmdline);
+}
+
static struct dentry *tracefs_root;
static DEFINE_MUTEX(tracefs_lock);
static u64 tracefs_root_count;
@@ -1228,6 +1348,7 @@ int trace_remote_register(const char *name, struct trace_remote_callbacks *cbs,
goto err_unregister_events;
}
+ trace_remote_apply_cmdline(name, remote);
no_free_ptr(remote);
return 0;
@@ -1708,3 +1829,15 @@ static struct remote_event *trace_remote_find_event(struct trace_remote *remote,
return bsearch((const void *)(unsigned long)id, remote->events, remote->nr_events,
sizeof(*remote->events), __cmp_events);
}
+
+static struct remote_event *
+trace_remote_find_event_by_name(struct trace_remote *remote, const char *name)
+{
+ int i;
+
+ for (i = 0; i < remote->nr_events; i++) {
+ if (!strcmp(remote->events[i].name, name))
+ return &remote->events[i];
+ }
+ return NULL;
+}
--
2.54.0.1032.g2f8565e1d1-goog
^ permalink raw reply related
* [PATCH 11/13] Documentation: tracing/remotes: Add detailed tracefs layout
From: Vincent Donnefort @ 2026-06-02 17:11 UTC (permalink / raw)
To: rostedt, mhiramat, mathieu.desnoyers, linux-trace-kernel
Cc: kernel-team, linux-kernel, Vincent Donnefort
In-Reply-To: <20260602171146.2238998-1-vdonnefort@google.com>
Add a description for each tracefs file available in a trace remote
instance.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/Documentation/trace/remotes.rst b/Documentation/trace/remotes.rst
index 1f9d764f69aa..bf63b929271e 100644
--- a/Documentation/trace/remotes.rst
+++ b/Documentation/trace/remotes.rst
@@ -19,8 +19,8 @@ for which the host kernel can see and expose to user space.
Register a remote
=================
-A remote must provide a set of callbacks `struct trace_remote_callbacks` whom
-description can be found below. Those callbacks allows Tracefs to enable and
+A remote must provide a set of callbacks `struct trace_remote_callbacks` whose
+description can be found below. Those callbacks allow Tracefs to enable and
disable tracing and events, to load and unload a tracing buffer (a set of
ring-buffers) and to swap a reader page with the head page, which enables
consuming reading.
@@ -28,8 +28,63 @@ consuming reading.
.. kernel-doc:: include/linux/trace_remote.h
Once registered, an instance will appear for this remote in the Tracefs
-directory **remotes/**. Buffers can then be read using the usual Tracefs files
-**trace_pipe** and **trace**.
+directory **remotes/**. The files within this directory allow configuring
+and reading the remote buffer (see `The File System` below).
+
+The File System
+===============
+A remote tracing instance is represented by a directory in Tracefs under
+**remotes/**. The layout and files within it are very similar to standard ftrace
+instances. Inside the remote directory, the following files and directories are
+available:
+
+ tracing_on
+ This file allows enabling or disabling the remote tracing.
+
+ buffer_size_kb
+ This file displays and allows changing the size of the per-CPU ring
+ buffers used by the remote. It also shows if the buffer is **loaded** or
+ **unloaded**. To change the size, the remote buffers must be unloaded
+ first. Remote buffers are automatically unloaded when **tracing_on** is
+ off, no one is reading the buffer (either by accessing **trace_pipe** or
+ when **printk** is on) and no events remain in the buffer.
+
+ trace
+ Display the human-readable content of the remote buffers. Reading this
+ file is non-consuming. Writing to this file clears the ring buffers.
+
+ trace_pipe
+ Similar to **trace** but reading it consumes the events from the ring
+ buffers (consuming read). It blocks if there are no new events.
+
+ printk
+ When enabled, all events from the remote are redirected to the kernel
+ dmesg. This is similar to the **tp_printk** option for in-kernel events.
+ It counts as a reader of the remote buffers and prevents unloading.
+
+ dump_on_oops
+ When enabled, the remote tracing buffer is dumped to the console when a
+ kernel panic occurs.
+
+ per_cpu/
+ This directory contains subdirectories for each possible CPU (e.g.,
+ **cpu0/**, **cpu1/** ...)
+
+ per_cpu/cpuX/trace
+ This is similar to the **trace** file, but it will only display the data
+ specific for the CPU. If written to, it only clears the specific CPU
+ buffer.
+
+ per_cpu/cpuX/trace_pipe
+ This is similar to the **trace_pipe** file, and is a consuming read, but
+ it will only display (and consume) the data specific to the CPU.
+
+ events/
+ This directory contains remote events that can be enabled or disabled.
+
+ events/enable
+ Allows enabling or disabling all the remote events.
+
Declare a remote event
======================
--
2.54.0.1032.g2f8565e1d1-goog
^ permalink raw reply related
* [PATCH 10/13] tracing/remotes: selftests: Add a test for the dump_on_oops tracefs file
From: Vincent Donnefort @ 2026-06-02 17:11 UTC (permalink / raw)
To: rostedt, mhiramat, mathieu.desnoyers, linux-trace-kernel
Cc: kernel-team, linux-kernel, Vincent Donnefort
In-Reply-To: <20260602171146.2238998-1-vdonnefort@google.com>
Exercise the newly introduced dump_on_oops tracefs file that turns on
or off the trace remote buffer dump on system panic.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/tools/testing/selftests/ftrace/test.d/remotes/00hypervisor/dump_on_oops.tc b/tools/testing/selftests/ftrace/test.d/remotes/00hypervisor/dump_on_oops.tc
new file mode 100644
index 000000000000..6c3f93ab922d
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/remotes/00hypervisor/dump_on_oops.tc
@@ -0,0 +1,11 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test hypervisor trace dump_on_oops
+# requires: remotes/hypervisor/write_event
+
+SOURCE_REMOTE_TEST=1
+. $TEST_DIR/remotes/dump_on_oops.tc
+
+set -e
+setup_remote "hypervisor"
+test_dump_on_oops
diff --git a/tools/testing/selftests/ftrace/test.d/remotes/dump_on_oops.tc b/tools/testing/selftests/ftrace/test.d/remotes/dump_on_oops.tc
new file mode 100644
index 000000000000..4f5f26cb3291
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/remotes/dump_on_oops.tc
@@ -0,0 +1,51 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test trace remote dump_on_oops
+# requires: remotes/test
+
+. $TEST_DIR/remotes/functions
+
+test_dump_on_oops()
+{
+ #
+ # Toggle when the buffer is unloaded
+ #
+ echo 1 > dump_on_oops
+ echo 0 > dump_on_oops
+
+ #
+ # Toggle when the buffer is loaded
+ #
+ echo 1 > tracing_on
+ assert_loaded
+
+ echo 1 > dump_on_oops
+ echo 0 > dump_on_oops
+
+ #
+ # Load and unload buffer while dump_on_oops is enabled
+ #
+ echo 0 > tracing_on
+ assert_unloaded
+
+ echo 1 > dump_on_oops
+ echo 1 > tracing_on
+ echo 0 > tracing_on
+
+ # REMOVE ME FOR A PROPER OOPS TEST
+ return
+
+ echo 1 > tracing_on
+
+ for i in $(seq 1 32); do
+ echo $i > write_event
+ done
+
+ echo c > /proc/sysrq-trigger
+}
+
+if [ -z "$SOURCE_REMOTE_TEST" ]; then
+ set -e
+ setup_remote_test
+ test_dump_on_oops
+fi
diff --git a/tools/testing/selftests/ftrace/test.d/remotes/functions b/tools/testing/selftests/ftrace/test.d/remotes/functions
index 8dd9c961977b..887b9296d0a7 100644
--- a/tools/testing/selftests/ftrace/test.d/remotes/functions
+++ b/tools/testing/selftests/ftrace/test.d/remotes/functions
@@ -9,6 +9,7 @@ setup_remote()
cd remotes/$name/
echo 0 > tracing_on
echo 0 > printk
+ echo 0 > dump_on_oops
clear_trace
echo 7 > buffer_size_kb
echo 0 > events/enable
--
2.54.0.1032.g2f8565e1d1-goog
^ permalink raw reply related
* [PATCH 09/13] tracing/remotes: Add dump_on_oops tracefs file
From: Vincent Donnefort @ 2026-06-02 17:11 UTC (permalink / raw)
To: rostedt, mhiramat, mathieu.desnoyers, linux-trace-kernel
Cc: kernel-team, linux-kernel, Vincent Donnefort
In-Reply-To: <20260602171146.2238998-1-vdonnefort@google.com>
When enabled, dump_on_oops will dump the content of the trace remote
buffer if the system panics.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/kernel/trace/trace_remote.c b/kernel/trace/trace_remote.c
index e708dcd7d258..3164f327d4d8 100644
--- a/kernel/trace/trace_remote.c
+++ b/kernel/trace/trace_remote.c
@@ -7,6 +7,7 @@
#include <linux/kstrtox.h>
#include <linux/lockdep.h>
#include <linux/mutex.h>
+#include <linux/panic_notifier.h>
#include <linux/tracefs.h>
#include <linux/trace_remote.h>
#include <linux/trace_seq.h>
@@ -22,6 +23,7 @@ enum tri_type {
TRI_CONSUMING,
TRI_NONCONSUMING,
TRI_PRINTK,
+ TRI_PANIC,
};
struct trace_remote_iterator {
@@ -58,6 +60,9 @@ struct trace_remote {
unsigned int poll_ms;
struct delayed_work poll_work;
unsigned int poll_cnt;
+ struct notifier_block panic_notifier;
+ struct trace_remote_iterator *panic_iter;
+ bool panic_on;
bool tracing_on;
};
@@ -66,10 +71,15 @@ static bool trace_remote_loaded(struct trace_remote *remote)
return !!remote->trace_buffer;
}
+static void trace_remote_unload(struct trace_remote *remote);
+static int trace_remote_panic_load(struct trace_remote *remote);
+static void trace_remote_panic_unload(struct trace_remote *remote);
+
static int trace_remote_load(struct trace_remote *remote)
{
struct ring_buffer_remote *rb_remote = &remote->rb_remote;
struct trace_buffer_desc *desc;
+ int ret = 0;
lockdep_assert_held(&remote->lock);
@@ -84,15 +94,28 @@ static int trace_remote_load(struct trace_remote *remote)
rb_remote->swap_reader_page = remote->cbs->swap_reader_page;
rb_remote->priv = remote->priv;
rb_remote->reset = remote->cbs->reset;
+ remote->trace_buffer_desc = desc;
remote->trace_buffer = ring_buffer_alloc_remote(rb_remote);
if (!remote->trace_buffer) {
remote->cbs->unload_trace_buffer(desc, remote->priv);
return -ENOMEM;
}
- remote->trace_buffer_desc = desc;
+ if (remote->panic_on) {
+ ret = trace_remote_panic_load(remote);
+ if (ret)
+ trace_remote_unload(remote);
+ }
- return 0;
+ return ret;
+}
+
+static void trace_remote_unload(struct trace_remote *remote)
+{
+ trace_remote_panic_unload(remote);
+ ring_buffer_free(remote->trace_buffer);
+ remote->trace_buffer = NULL;
+ remote->cbs->unload_trace_buffer(remote->trace_buffer_desc, remote->priv);
}
static void trace_remote_try_unload(struct trace_remote *remote)
@@ -110,9 +133,7 @@ static void trace_remote_try_unload(struct trace_remote *remote)
if (!ring_buffer_empty(remote->trace_buffer))
return;
- ring_buffer_free(remote->trace_buffer);
- remote->trace_buffer = NULL;
- remote->cbs->unload_trace_buffer(remote->trace_buffer_desc, remote->priv);
+ trace_remote_unload(remote);
}
static int trace_remote_enable_tracing(struct trace_remote *remote)
@@ -375,58 +396,68 @@ static void trace_remote_dec_poll(struct trace_remote *remote)
static struct trace_remote_iterator
*trace_remote_iter(struct trace_remote *remote, int cpu, enum tri_type type)
{
- struct trace_remote_iterator *iter = NULL;
+ struct trace_remote_iterator *iter __free(kfree) = kzalloc_obj(*iter);
int ret;
lockdep_assert_held(&remote->lock);
- if (type == TRI_NONCONSUMING && !trace_remote_loaded(remote))
- return NULL;
+ if (!iter)
+ return ERR_PTR(-ENOMEM);
- ret = trace_remote_get(remote, cpu);
- if (ret)
- return ERR_PTR(ret);
+ switch (type) {
+ case TRI_NONCONSUMING:
+ if (!trace_remote_loaded(remote))
+ return NULL;
+ fallthrough;
+ case TRI_CONSUMING:
+ case TRI_PRINTK:
+ ret = trace_remote_get(remote, cpu);
+ if (ret)
+ return ERR_PTR(ret);
+ break;
+ case TRI_PANIC:
+ break;
+ }
if (!trace_remote_has_cpu(remote, cpu)) {
ret = -ENODEV;
goto err;
}
- iter = kzalloc_obj(*iter);
- if (iter) {
- iter->remote = remote;
- iter->cpu = cpu;
- iter->type = type;
- trace_seq_init(&iter->seq);
+ iter->remote = remote;
+ iter->cpu = cpu;
+ iter->type = type;
+ trace_seq_init(&iter->seq);
- switch (type) {
- case TRI_PRINTK:
- /* only one printk iter allowed */
- if (WARN_ON_ONCE(remote->printk)) {
- ret = -EBUSY;
- break;
- }
- smp_store_release(&remote->printk, iter);
- fallthrough;
- case TRI_CONSUMING:
- trace_remote_inc_poll(remote);
- break;
- case TRI_NONCONSUMING:
- ret = __alloc_ring_buffer_iter(iter, cpu);
- break;
+ switch (type) {
+ case TRI_PRINTK:
+ /* only one printk iter allowed */
+ if (WARN_ON_ONCE(remote->printk)) {
+ ret = -EBUSY;
+ goto err;
}
-
+ smp_store_release(&remote->printk, iter);
+ fallthrough;
+ case TRI_CONSUMING:
+ trace_remote_inc_poll(remote);
+ break;
+ case TRI_PANIC:
+ case TRI_NONCONSUMING:
+ ret = __alloc_ring_buffer_iter(iter, cpu);
if (ret)
goto err;
-
- return iter;
+ break;
}
- ret = -ENOMEM;
-err:
- kfree(iter);
- trace_remote_put(remote);
+ return no_free_ptr(iter);
+err:
+ switch (type) {
+ case TRI_PANIC:
+ break;
+ default:
+ trace_remote_put(remote);
+ }
return ERR_PTR(ret);
}
@@ -449,14 +480,18 @@ static void trace_remote_iter_free(struct trace_remote_iterator *iter)
fallthrough;
case TRI_CONSUMING:
trace_remote_dec_poll(remote);
+ trace_remote_put(remote);
break;
case TRI_NONCONSUMING:
+ trace_remote_put(remote);
+ __free_ring_buffer_iter(iter, iter->cpu);
+ break;
+ case TRI_PANIC:
__free_ring_buffer_iter(iter, iter->cpu);
break;
}
kfree(iter);
- trace_remote_put(remote);
}
static void trace_remote_iter_read_start(struct trace_remote_iterator *iter)
@@ -527,6 +562,7 @@ __peek_event(struct trace_remote_iterator *iter, int cpu, u64 *ts, unsigned long
case TRI_PRINTK:
case TRI_CONSUMING:
return ring_buffer_peek(iter->remote->trace_buffer, cpu, ts, lost_events);
+ case TRI_PANIC:
case TRI_NONCONSUMING:
rb_iter = __get_rb_iter(iter, cpu);
if (!rb_iter)
@@ -596,6 +632,7 @@ static void trace_remote_iter_move(struct trace_remote_iterator *iter)
case TRI_CONSUMING:
ring_buffer_consume(trace_buffer, iter->evt_cpu, NULL, NULL);
break;
+ case TRI_PANIC:
case TRI_NONCONSUMING:
ring_buffer_iter_advance(__get_rb_iter(iter, iter->evt_cpu));
break;
@@ -910,6 +947,116 @@ static int printk_show(struct seq_file *s, void *unused)
}
DEFINE_SHOW_STORE_ATTRIBUTE(printk);
+static int trace_remote_panic_handler(struct notifier_block *self, unsigned long ev, void *v)
+{
+ struct trace_remote *remote = container_of(self, struct trace_remote, panic_notifier);
+ struct trace_remote_iterator *iter = smp_load_acquire(&remote->panic_iter);
+ int cpu;
+
+ if (!iter) {
+ pr_warn("Unexpected error: no panic iterator for the trace remote\n");
+ return NOTIFY_DONE;
+ }
+
+ for_each_possible_cpu(cpu) {
+ if (iter->rb_iters[cpu]) {
+ /* No RING_BUFFER_ALL_CPUS to avoid taking cpu_read_lock() */
+ ring_buffer_read_remote_meta_page(remote->trace_buffer, cpu);
+ ring_buffer_iter_reset(iter->rb_iters[cpu]);
+ }
+ }
+
+ while (trace_remote_iter_read_event(iter)) {
+ trace_seq_init(&iter->seq);
+
+ trace_remote_iter_print_event(iter);
+ pr_emerg("%s", iter->seq.buffer);
+
+ trace_remote_iter_move(iter);
+ }
+
+ return NOTIFY_DONE;
+}
+
+static int trace_remote_panic_load(struct trace_remote *remote)
+{
+ struct notifier_block *notifier = &remote->panic_notifier;
+ struct trace_remote_iterator *iter;
+
+ lockdep_assert_held(&remote->lock);
+
+ if (remote->panic_iter)
+ return 0;
+
+ iter = trace_remote_iter(remote, RING_BUFFER_ALL_CPUS, TRI_PANIC);
+ if (IS_ERR(iter))
+ return PTR_ERR(iter);
+
+ smp_store_release(&remote->panic_iter, iter);
+
+ notifier->notifier_call = trace_remote_panic_handler;
+ notifier->priority = INT_MAX - 1;
+ atomic_notifier_chain_register(&panic_notifier_list, notifier);
+
+ return 0;
+}
+
+static void trace_remote_panic_unload(struct trace_remote *remote)
+{
+ struct trace_remote_iterator *iter = remote->panic_iter;
+
+ lockdep_assert_held(&remote->lock);
+
+ if (!iter)
+ return;
+
+ atomic_notifier_chain_unregister(&panic_notifier_list, &remote->panic_notifier);
+ smp_store_release(&remote->panic_iter, NULL);
+ trace_remote_iter_free(iter);
+}
+
+static ssize_t dump_on_oops_write(struct file *filp, const char __user *ubuf,
+ size_t cnt, loff_t *ppos)
+{
+ struct seq_file *seq = filp->private_data;
+ struct trace_remote *remote = seq->private;
+ bool enable;
+ int ret;
+
+ ret = kstrtobool_from_user(ubuf, cnt, &enable);
+ if (ret)
+ return ret;
+
+ guard(mutex)(&remote->lock);
+
+ if (enable == remote->panic_on)
+ return cnt;
+
+ if (trace_remote_loaded(remote)) {
+ if (enable) {
+ ret = trace_remote_panic_load(remote);
+ if (ret)
+ return ret;
+ } else {
+ trace_remote_panic_unload(remote);
+ }
+ }
+
+ remote->panic_on = enable;
+
+ return cnt;
+}
+
+static int dump_on_oops_show(struct seq_file *s, void *unused)
+{
+ struct trace_remote *remote = s->private;
+
+ seq_printf(s, "%d\n", remote->panic_on);
+
+ return 0;
+}
+DEFINE_SHOW_STORE_ATTRIBUTE(dump_on_oops);
+
static struct dentry *tracefs_root;
static DEFINE_MUTEX(tracefs_lock);
static u64 tracefs_root_count;
@@ -941,6 +1088,11 @@ static int trace_remote_init_tracefs(const char *name, struct trace_remote *remo
if (!d)
goto err;
+ d = trace_create_file("dump_on_oops", TRACEFS_MODE_WRITE, remote_d, remote,
+ &dump_on_oops_fops);
+ if (!d)
+ goto err;
+
d = trace_create_file("buffer_size_kb", TRACEFS_MODE_WRITE, remote_d, remote,
&buffer_size_kb_fops);
if (!d)
--
2.54.0.1032.g2f8565e1d1-goog
^ permalink raw reply related
* [PATCH 08/13] ring-buffer: Add ring_buffer_read_remote_meta_page()
From: Vincent Donnefort @ 2026-06-02 17:11 UTC (permalink / raw)
To: rostedt, mhiramat, mathieu.desnoyers, linux-trace-kernel
Cc: kernel-team, linux-kernel, Vincent Donnefort
In-Reply-To: <20260602171146.2238998-1-vdonnefort@google.com>
In preparation for the introduction of a panic handler for trace
remotes, add a ring_buffer_read_remote_meta_page(). This is basically
similar to ring_buffer_poll_remote, but it doesn't try to wake-up
readers and, in the !RING_BUFFER_ALL_CPUS case, uses panic-friendly
locks.
While at it, update trace_remote_has_cpu() to use this new function
instead of ring_buffer_poll_remote(), avoiding unnecessary wakeups when
verifying if a CPU buffer is active.
Finally, the distracted engineer who wrote that
ring_buffer_poll_remote() forgot to document it. Add a kerneldoc for
that function too.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 994f52b34344..6e008a548063 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -298,6 +298,7 @@ struct ring_buffer_remote {
void *priv;
};
+int ring_buffer_read_remote_meta_page(struct trace_buffer *buffer, int cpu);
int ring_buffer_poll_remote(struct trace_buffer *buffer, int cpu);
struct trace_buffer *
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 7b07d2004cc6..88ac346c65ec 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -6623,6 +6623,59 @@ bool ring_buffer_empty_cpu(struct trace_buffer *buffer, int cpu)
}
EXPORT_SYMBOL_GPL(ring_buffer_empty_cpu);
+/**
+ * ring_buffer_read_remote_meta_page - read the meta page of a remote ring buffer
+ * @buffer: The ring buffer
+ * @cpu: The CPU buffer to read (or RING_BUFFER_ALL_CPUS)
+ *
+ * Returns:
+ * 0 on success, or -EINVAL if the CPU is not in the buffer's cpumask.
+ */
+int ring_buffer_read_remote_meta_page(struct trace_buffer *buffer, int cpu)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+
+ if (cpu != RING_BUFFER_ALL_CPUS) {
+ unsigned long flags;
+ bool dolock;
+
+ if (!cpumask_test_cpu(cpu, buffer->cpumask))
+ return -EINVAL;
+
+ cpu_buffer = buffer->buffers[cpu];
+
+ local_irq_save(flags);
+ dolock = rb_reader_lock(cpu_buffer);
+ rb_read_remote_meta_page(cpu_buffer);
+ rb_reader_unlock(cpu_buffer, dolock);
+ local_irq_restore(flags);
+ return 0;
+ }
+
+ guard(cpus_read_lock)();
+
+ for_each_buffer_cpu(buffer, cpu) {
+ cpu_buffer = buffer->buffers[cpu];
+
+ guard(raw_spinlock)(&cpu_buffer->reader_lock);
+ rb_read_remote_meta_page(cpu_buffer);
+ }
+
+ return 0;
+}
+
+/**
+ * ring_buffer_poll_remote - poll a remote ring buffer for new data
+ * @buffer: The ring buffer
+ * @cpu: The CPU buffer to poll (or RING_BUFFER_ALL_CPUS)
+ *
+ * This function polls the specified remote CPU buffer (or all of them)
+ * by reading its meta page to update the local reader's view. If new
+ * entries are detected, it triggers wakeups for any waiting readers.
+ *
+ * Returns:
+ * 0 on success, or -EINVAL if the CPU is not in the buffer's cpumask.
+ */
int ring_buffer_poll_remote(struct trace_buffer *buffer, int cpu)
{
struct ring_buffer_per_cpu *cpu_buffer;
diff --git a/kernel/trace/trace_remote.c b/kernel/trace/trace_remote.c
index 1bf0ba159c92..e708dcd7d258 100644
--- a/kernel/trace/trace_remote.c
+++ b/kernel/trace/trace_remote.c
@@ -291,7 +291,7 @@ static bool trace_remote_has_cpu(struct trace_remote *remote, int cpu)
if (cpu == RING_BUFFER_ALL_CPUS)
return true;
- return ring_buffer_poll_remote(remote->trace_buffer, cpu) == 0;
+ return ring_buffer_read_remote_meta_page(remote->trace_buffer, cpu) == 0;
}
static void __free_ring_buffer_iter(struct trace_remote_iterator *iter, int cpu)
--
2.54.0.1032.g2f8565e1d1-goog
^ permalink raw reply related
* [PATCH 07/13] tracing/remotes: selftests: Prefix hypervisor folder
From: Vincent Donnefort @ 2026-06-02 17:11 UTC (permalink / raw)
To: rostedt, mhiramat, mathieu.desnoyers, linux-trace-kernel
Cc: kernel-team, linux-kernel, Vincent Donnefort
In-Reply-To: <20260602171146.2238998-1-vdonnefort@google.com>
Avoid interleaving run tests by prefixing the hypervisor folder.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/tools/testing/selftests/ftrace/test.d/remotes/hypervisor/buffer_size.tc b/tools/testing/selftests/ftrace/test.d/remotes/00hypervisor/buffer_size.tc
similarity index 100%
rename from tools/testing/selftests/ftrace/test.d/remotes/hypervisor/buffer_size.tc
rename to tools/testing/selftests/ftrace/test.d/remotes/00hypervisor/buffer_size.tc
diff --git a/tools/testing/selftests/ftrace/test.d/remotes/hypervisor/hotplug.tc b/tools/testing/selftests/ftrace/test.d/remotes/00hypervisor/hotplug.tc
similarity index 100%
rename from tools/testing/selftests/ftrace/test.d/remotes/hypervisor/hotplug.tc
rename to tools/testing/selftests/ftrace/test.d/remotes/00hypervisor/hotplug.tc
diff --git a/tools/testing/selftests/ftrace/test.d/remotes/hypervisor/printk.tc b/tools/testing/selftests/ftrace/test.d/remotes/00hypervisor/printk.tc
similarity index 100%
rename from tools/testing/selftests/ftrace/test.d/remotes/hypervisor/printk.tc
rename to tools/testing/selftests/ftrace/test.d/remotes/00hypervisor/printk.tc
diff --git a/tools/testing/selftests/ftrace/test.d/remotes/hypervisor/reset.tc b/tools/testing/selftests/ftrace/test.d/remotes/00hypervisor/reset.tc
similarity index 100%
rename from tools/testing/selftests/ftrace/test.d/remotes/hypervisor/reset.tc
rename to tools/testing/selftests/ftrace/test.d/remotes/00hypervisor/reset.tc
diff --git a/tools/testing/selftests/ftrace/test.d/remotes/hypervisor/trace.tc b/tools/testing/selftests/ftrace/test.d/remotes/00hypervisor/trace.tc
similarity index 100%
rename from tools/testing/selftests/ftrace/test.d/remotes/hypervisor/trace.tc
rename to tools/testing/selftests/ftrace/test.d/remotes/00hypervisor/trace.tc
diff --git a/tools/testing/selftests/ftrace/test.d/remotes/hypervisor/trace_pipe.tc b/tools/testing/selftests/ftrace/test.d/remotes/00hypervisor/trace_pipe.tc
similarity index 100%
rename from tools/testing/selftests/ftrace/test.d/remotes/hypervisor/trace_pipe.tc
rename to tools/testing/selftests/ftrace/test.d/remotes/00hypervisor/trace_pipe.tc
diff --git a/tools/testing/selftests/ftrace/test.d/remotes/hypervisor/unloading.tc b/tools/testing/selftests/ftrace/test.d/remotes/00hypervisor/unloading.tc
similarity index 100%
rename from tools/testing/selftests/ftrace/test.d/remotes/hypervisor/unloading.tc
rename to tools/testing/selftests/ftrace/test.d/remotes/00hypervisor/unloading.tc
--
2.54.0.1032.g2f8565e1d1-goog
^ permalink raw reply
* [PATCH 06/13] tracing/remotes: selftests: Add a test for the printk tracefs file
From: Vincent Donnefort @ 2026-06-02 17:11 UTC (permalink / raw)
To: rostedt, mhiramat, mathieu.desnoyers, linux-trace-kernel
Cc: kernel-team, linux-kernel, Vincent Donnefort
In-Reply-To: <20260602171146.2238998-1-vdonnefort@google.com>
Exercise the newly introduced printk tracefs file that turns on and off
the dmesg redirection.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/tools/testing/selftests/ftrace/test.d/remotes/functions b/tools/testing/selftests/ftrace/test.d/remotes/functions
index 05224fac3653..8dd9c961977b 100644
--- a/tools/testing/selftests/ftrace/test.d/remotes/functions
+++ b/tools/testing/selftests/ftrace/test.d/remotes/functions
@@ -8,6 +8,7 @@ setup_remote()
cd remotes/$name/
echo 0 > tracing_on
+ echo 0 > printk
clear_trace
echo 7 > buffer_size_kb
echo 0 > events/enable
diff --git a/tools/testing/selftests/ftrace/test.d/remotes/hypervisor/printk.tc b/tools/testing/selftests/ftrace/test.d/remotes/hypervisor/printk.tc
new file mode 100644
index 000000000000..aca7a2bfe293
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/remotes/hypervisor/printk.tc
@@ -0,0 +1,11 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test trace remote printk, the dmesg redirection
+# requires: remotes/hypervisor/write_event
+
+SOURCE_REMOTE_TEST=1
+. $TEST_DIR/remotes/printk.tc
+
+set -e
+setup_remote "hypervisor"
+test_printk
diff --git a/tools/testing/selftests/ftrace/test.d/remotes/printk.tc b/tools/testing/selftests/ftrace/test.d/remotes/printk.tc
new file mode 100644
index 000000000000..80eaf13e240d
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/remotes/printk.tc
@@ -0,0 +1,72 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test trace remote printk, the dmesg redirection
+# requires: remotes/test
+
+. $TEST_DIR/remotes/functions
+
+test_printk()
+{
+ echo 0 > tracing_on
+ assert_unloaded
+
+ #
+ # Test printk on/off when tracing is disabled
+ #
+ echo 1 > printk
+ test $(cat printk) -eq 1
+ assert_loaded
+
+ echo 0 > printk
+ test $(cat printk) -eq 0
+ assert_unloaded
+
+ #
+ # Test events are logged to dmesg
+ #
+ dmesg -c > /dev/null
+
+ echo 1 > tracing_on
+ assert_loaded
+ echo 1 > printk
+ test $(cat printk) -eq 1
+
+ nr_events=128
+ for i in $(seq 1 $nr_events); do
+ echo $i > write_event
+ done
+
+ sleep 1
+ output=$(mktemp $TMPDIR/remote_test.XXXXXX)
+ dmesg | grep "selftest id=" | sed 's/^[^]]*] //'> $output
+
+ check_trace 1 $nr_events $output
+
+ rm $output
+
+ #
+ # Disable printk and Test events were not consumed by dmesg
+ #
+ echo 0 > printk
+ test $(cat printk) -eq 0
+
+ start_id=$(($nr_events + 1))
+ end_id=$(($start_id + $nr_events))
+
+ for i in $(seq $start_id $end_id); do
+ echo $i > write_event
+ done
+
+ sleep 1
+
+ output=$(dump_trace_pipe)
+ check_trace $start_id $end_id $output
+ rm $output
+}
+
+if [ -z "$SOURCE_REMOTE_TEST" ]; then
+ set -e
+
+ setup_remote_test
+ test_printk
+fi
--
2.54.0.1032.g2f8565e1d1-goog
^ permalink raw reply related
* [PATCH 05/13] tracing/remotes: Add printk tracefs file
From: Vincent Donnefort @ 2026-06-02 17:11 UTC (permalink / raw)
To: rostedt, mhiramat, mathieu.desnoyers, linux-trace-kernel
Cc: kernel-team, linux-kernel, Vincent Donnefort
In-Reply-To: <20260602171146.2238998-1-vdonnefort@google.com>
When enabled, the printk tracefs file enables the redirection of all
events to dmesg. This is similar to tp_printk.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/kernel/trace/trace_remote.c b/kernel/trace/trace_remote.c
index 21583fae1bd9..1bf0ba159c92 100644
--- a/kernel/trace/trace_remote.c
+++ b/kernel/trace/trace_remote.c
@@ -21,6 +21,7 @@
enum tri_type {
TRI_CONSUMING,
TRI_NONCONSUMING,
+ TRI_PRINTK,
};
struct trace_remote_iterator {
@@ -42,6 +43,7 @@ struct trace_remote {
void *priv;
struct trace_buffer *trace_buffer;
struct trace_buffer_desc *trace_buffer_desc;
+ struct trace_remote_iterator *printk;
struct dentry *dentry;
struct eventfs_inode *eventfs_root;
struct eventfs_inode *eventfs_subdir;
@@ -335,6 +337,8 @@ static int __alloc_ring_buffer_iter(struct trace_remote_iterator *iter, int cpu)
return 0;
}
+static void trace_remote_do_printk(struct trace_remote *remote);
+
static void __poll_remote(struct work_struct *work)
{
struct delayed_work *dwork = to_delayed_work(work);
@@ -342,6 +346,7 @@ static void __poll_remote(struct work_struct *work)
remote = container_of(dwork, struct trace_remote, poll_work);
ring_buffer_poll_remote(remote->trace_buffer, RING_BUFFER_ALL_CPUS);
+ trace_remote_do_printk(remote);
schedule_delayed_work(dwork, msecs_to_jiffies(remote->poll_ms));
}
@@ -351,6 +356,8 @@ static void trace_remote_inc_poll(struct trace_remote *remote)
/* poll_cnt <= nr_readers, inherits its overflow protection */
if (!remote->poll_cnt++) {
ring_buffer_poll_remote(remote->trace_buffer, RING_BUFFER_ALL_CPUS);
+ trace_remote_do_printk(remote);
+
schedule_delayed_work(&remote->poll_work, msecs_to_jiffies(remote->poll_ms));
}
}
@@ -393,6 +400,14 @@ static struct trace_remote_iterator
trace_seq_init(&iter->seq);
switch (type) {
+ case TRI_PRINTK:
+ /* only one printk iter allowed */
+ if (WARN_ON_ONCE(remote->printk)) {
+ ret = -EBUSY;
+ break;
+ }
+ smp_store_release(&remote->printk, iter);
+ fallthrough;
case TRI_CONSUMING:
trace_remote_inc_poll(remote);
break;
@@ -427,6 +442,11 @@ static void trace_remote_iter_free(struct trace_remote_iterator *iter)
lockdep_assert_held(&remote->lock);
switch (iter->type) {
+ case TRI_PRINTK:
+ WARN_ON_ONCE(remote->printk != iter);
+ smp_store_release(&remote->printk, NULL);
+ flush_delayed_work(&remote->poll_work);
+ fallthrough;
case TRI_CONSUMING:
trace_remote_dec_poll(remote);
break;
@@ -504,6 +524,7 @@ __peek_event(struct trace_remote_iterator *iter, int cpu, u64 *ts, unsigned long
struct ring_buffer_iter *rb_iter;
switch (iter->type) {
+ case TRI_PRINTK:
case TRI_CONSUMING:
return ring_buffer_peek(iter->remote->trace_buffer, cpu, ts, lost_events);
case TRI_NONCONSUMING:
@@ -571,6 +592,7 @@ static void trace_remote_iter_move(struct trace_remote_iterator *iter)
struct trace_buffer *trace_buffer = iter->remote->trace_buffer;
switch (iter->type) {
+ case TRI_PRINTK:
case TRI_CONSUMING:
ring_buffer_consume(trace_buffer, iter->evt_cpu, NULL, NULL);
break;
@@ -814,6 +836,80 @@ static const struct file_operations trace_fops = {
.release = trace_release,
};
+static void trace_remote_do_printk(struct trace_remote *remote)
+{
+ struct trace_remote_iterator *iter = smp_load_acquire(&remote->printk);
+
+ if (!iter)
+ return;
+
+ trace_remote_iter_read_start(iter);
+
+ while (trace_remote_iter_read_event(iter)) {
+ trace_seq_init(&iter->seq);
+
+ trace_remote_iter_print_event(iter);
+ if (!pr_emerg("%s", iter->seq.buffer))
+ break;
+
+ trace_remote_iter_move(iter);
+ }
+
+ trace_remote_iter_read_finished(iter);
+}
+
+static int trace_remote_enable_printk(struct trace_remote *remote, bool enable)
+{
+ struct trace_remote_iterator *iter = remote->printk;
+
+ lockdep_assert_held(&remote->lock);
+
+ if (enable == !!iter)
+ return 0;
+
+ if (enable) {
+ iter = trace_remote_iter(remote, RING_BUFFER_ALL_CPUS, TRI_PRINTK);
+ if (IS_ERR(iter))
+ return PTR_ERR(iter);
+ } else {
+ trace_remote_iter_free(remote->printk);
+ /* trace_remote_iter_free has reset remote->printk */
+ }
+
+ return 0;
+}
+
+static ssize_t
+printk_write(struct file *filp, const char __user *ubuf, size_t cnt, loff_t *ppos)
+{
+ struct seq_file *seq = filp->private_data;
+ struct trace_remote *remote = seq->private;
+ bool val;
+ int ret;
+
+ ret = kstrtobool_from_user(ubuf, cnt, &val);
+ if (ret)
+ return ret;
+
+ guard(mutex)(&remote->lock);
+
+ ret = trace_remote_enable_printk(remote, val);
+ if (ret)
+ return ret;
+
+ return cnt;
+}
+
+static int printk_show(struct seq_file *s, void *unused)
+{
+ struct trace_remote *remote = s->private;
+
+ seq_printf(s, "%d\n", !!remote->printk);
+
+ return 0;
+}
+DEFINE_SHOW_STORE_ATTRIBUTE(printk);
+
static struct dentry *tracefs_root;
static DEFINE_MUTEX(tracefs_lock);
static u64 tracefs_root_count;
@@ -858,6 +954,10 @@ static int trace_remote_init_tracefs(const char *name, struct trace_remote *remo
if (!d)
goto err;
+ d = trace_create_file("printk", TRACEFS_MODE_WRITE, remote_d, remote, &printk_fops);
+ if (!d)
+ goto err;
+
percpu_d = tracefs_create_dir("per_cpu", remote_d);
if (!percpu_d) {
pr_err("Failed to create tracefs dir "TRACEFS_DIR"%s/per_cpu/\n", name);
--
2.54.0.1032.g2f8565e1d1-goog
^ permalink raw reply related
* [PATCH 04/13] tracing/simple_ring_buffer: Add support for compressed length
From: Vincent Donnefort @ 2026-06-02 17:11 UTC (permalink / raw)
To: rostedt, mhiramat, mathieu.desnoyers, linux-trace-kernel
Cc: kernel-team, linux-kernel, Vincent Donnefort
In-Reply-To: <20260602171146.2238998-1-vdonnefort@google.com>
The array length is the total size in bytes of the data for the current
event. It is possible to compress this value into the event header type,
which has 28 unused types, saving 32 bits for sufficiently small events.
The compressed length is expressed as a multiple of the ring-buffer
alignment, 4-bytes by default. Enforces this alignment.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/kernel/trace/simple_ring_buffer.c b/kernel/trace/simple_ring_buffer.c
index f4642f5adda3..6be4aa19adca 100644
--- a/kernel/trace/simple_ring_buffer.c
+++ b/kernel/trace/simple_ring_buffer.c
@@ -207,7 +207,12 @@ static unsigned long rb_event_size(unsigned long length)
{
struct ring_buffer_event *event;
- return length + RB_EVNT_HDR_SIZE + sizeof(event->array[0]);
+ length = ALIGN(length, RB_ALIGNMENT);
+
+ if (length > RB_MAX_SMALL_DATA)
+ length += sizeof(event->array[0]);
+
+ return length + RB_EVNT_HDR_SIZE;
}
static struct ring_buffer_event *
@@ -223,12 +228,15 @@ rb_event_add_ts_extend(struct ring_buffer_event *event, u64 delta)
static struct ring_buffer_event *
simple_rb_reserve_next(struct simple_rb_per_cpu *cpu_buffer, unsigned long length, u64 timestamp)
{
- unsigned long ts_ext_size = 0, event_size = rb_event_size(length);
struct simple_buffer_page *tail = cpu_buffer->tail_page;
+ unsigned long event_size, array_size, ts_ext_size = 0;
struct ring_buffer_event *event;
u32 write, prev_write;
u64 time_delta;
+ event_size = rb_event_size(length);
+ array_size = event_size - RB_EVNT_HDR_SIZE;
+
time_delta = timestamp - cpu_buffer->write_stamp;
if (test_time_stamp(time_delta))
@@ -259,9 +267,13 @@ simple_rb_reserve_next(struct simple_rb_per_cpu *cpu_buffer, unsigned long lengt
time_delta = 0;
}
- event->type_len = 0;
+ if (length > RB_MAX_SMALL_DATA) {
+ event->type_len = 0;
+ event->array[0] = array_size;
+ } else {
+ event->type_len = DIV_ROUND_UP(array_size, RB_ALIGNMENT);
+ }
event->time_delta = time_delta;
- event->array[0] = event_size - RB_EVNT_HDR_SIZE;
return event;
}
@@ -284,7 +296,7 @@ void *simple_ring_buffer_reserve(struct simple_rb_per_cpu *cpu_buffer, unsigned
rb_event = simple_rb_reserve_next(cpu_buffer, length, timestamp);
- return &rb_event->array[1];
+ return rb_event->type_len ? &rb_event->array[0] : &rb_event->array[1];
}
EXPORT_SYMBOL_GPL(simple_ring_buffer_reserve);
--
2.54.0.1032.g2f8565e1d1-goog
^ permalink raw reply related
* [PATCH 03/13] tracing/remotes: Use a single per-remote polling work
From: Vincent Donnefort @ 2026-06-02 17:11 UTC (permalink / raw)
To: rostedt, mhiramat, mathieu.desnoyers, linux-trace-kernel
Cc: kernel-team, linux-kernel, Vincent Donnefort
In-Reply-To: <20260602171146.2238998-1-vdonnefort@google.com>
Having a per-iterator polling work is wasteful when logging several
trace_remote per_cpu/trace_pipe files in parallel. This result in one
work running per-CPU, where only one would suffice.
Transition to a single per-remote polling work, scheduled on the first
consumer creation and stopped when the last consuming iterator is freed.
This blanket polls all CPUs, regardless of which ones are actually being
read. This is acceptable because the poll consists of reading the
meta-page, which is a fast operation. Also, it is more common to log all
CPUs in the system than only one, so this use-case should be favoured.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/kernel/trace/trace_remote.c b/kernel/trace/trace_remote.c
index 7fed18f28fa7..21583fae1bd9 100644
--- a/kernel/trace/trace_remote.c
+++ b/kernel/trace/trace_remote.c
@@ -26,7 +26,6 @@ enum tri_type {
struct trace_remote_iterator {
struct trace_remote *remote;
struct trace_seq seq;
- struct delayed_work poll_work;
unsigned long lost_events;
u64 ts;
struct ring_buffer_iter *rb_iter;
@@ -55,6 +54,8 @@ struct trace_remote {
struct rw_semaphore *pcpu_reader_locks;
unsigned int nr_readers;
unsigned int poll_ms;
+ struct delayed_work poll_work;
+ unsigned int poll_cnt;
bool tracing_on;
};
@@ -291,17 +292,6 @@ static bool trace_remote_has_cpu(struct trace_remote *remote, int cpu)
return ring_buffer_poll_remote(remote->trace_buffer, cpu) == 0;
}
-static void __poll_remote(struct work_struct *work)
-{
- struct delayed_work *dwork = to_delayed_work(work);
- struct trace_remote_iterator *iter;
-
- iter = container_of(dwork, struct trace_remote_iterator, poll_work);
- ring_buffer_poll_remote(iter->remote->trace_buffer, iter->cpu);
- schedule_delayed_work((struct delayed_work *)work,
- msecs_to_jiffies(iter->remote->poll_ms));
-}
-
static void __free_ring_buffer_iter(struct trace_remote_iterator *iter, int cpu)
{
if (cpu != RING_BUFFER_ALL_CPUS) {
@@ -345,6 +335,36 @@ static int __alloc_ring_buffer_iter(struct trace_remote_iterator *iter, int cpu)
return 0;
}
+static void __poll_remote(struct work_struct *work)
+{
+ struct delayed_work *dwork = to_delayed_work(work);
+ struct trace_remote *remote;
+
+ remote = container_of(dwork, struct trace_remote, poll_work);
+ ring_buffer_poll_remote(remote->trace_buffer, RING_BUFFER_ALL_CPUS);
+
+ schedule_delayed_work(dwork, msecs_to_jiffies(remote->poll_ms));
+}
+
+static void trace_remote_inc_poll(struct trace_remote *remote)
+{
+ /* poll_cnt <= nr_readers, inherits its overflow protection */
+ if (!remote->poll_cnt++) {
+ ring_buffer_poll_remote(remote->trace_buffer, RING_BUFFER_ALL_CPUS);
+ schedule_delayed_work(&remote->poll_work, msecs_to_jiffies(remote->poll_ms));
+ }
+}
+
+static void trace_remote_dec_poll(struct trace_remote *remote)
+{
+ if (WARN_ON_ONCE(!remote->poll_cnt))
+ return;
+
+ remote->poll_cnt--;
+ if (!remote->poll_cnt)
+ cancel_delayed_work_sync(&remote->poll_work);
+}
+
static struct trace_remote_iterator
*trace_remote_iter(struct trace_remote *remote, int cpu, enum tri_type type)
{
@@ -374,9 +394,7 @@ static struct trace_remote_iterator
switch (type) {
case TRI_CONSUMING:
- ring_buffer_poll_remote(remote->trace_buffer, cpu);
- INIT_DELAYED_WORK(&iter->poll_work, __poll_remote);
- schedule_delayed_work(&iter->poll_work, msecs_to_jiffies(remote->poll_ms));
+ trace_remote_inc_poll(remote);
break;
case TRI_NONCONSUMING:
ret = __alloc_ring_buffer_iter(iter, cpu);
@@ -410,7 +428,7 @@ static void trace_remote_iter_free(struct trace_remote_iterator *iter)
switch (iter->type) {
case TRI_CONSUMING:
- cancel_delayed_work_sync(&iter->poll_work);
+ trace_remote_dec_poll(remote);
break;
case TRI_NONCONSUMING:
__free_ring_buffer_iter(iter, iter->cpu);
@@ -939,6 +957,7 @@ int trace_remote_register(const char *name, struct trace_remote_callbacks *cbs,
remote->poll_ms = 100;
mutex_init(&remote->lock);
init_rwsem(&remote->reader_lock);
+ INIT_DELAYED_WORK(&remote->poll_work, __poll_remote);
ret = trace_remote_init_tracefs(name, remote);
if (ret)
--
2.54.0.1032.g2f8565e1d1-goog
^ permalink raw reply related
* [PATCH 02/13] tracing/remotes: Use kstrtobool for boolean tracefs files
From: Vincent Donnefort @ 2026-06-02 17:11 UTC (permalink / raw)
To: rostedt, mhiramat, mathieu.desnoyers, linux-trace-kernel
Cc: kernel-team, linux-kernel, Vincent Donnefort
In-Reply-To: <20260602171146.2238998-1-vdonnefort@google.com>
Use kstrtobool in trace_remote.c where possible. This is more user-friendly
as it allows a better variety of input strings.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/kernel/trace/trace_remote.c b/kernel/trace/trace_remote.c
index cfe84e9b8fe6..7fed18f28fa7 100644
--- a/kernel/trace/trace_remote.c
+++ b/kernel/trace/trace_remote.c
@@ -176,10 +176,10 @@ tracing_on_write(struct file *filp, const char __user *ubuf, size_t cnt, loff_t
{
struct seq_file *seq = filp->private_data;
struct trace_remote *remote = seq->private;
- unsigned long val;
+ bool val;
int ret;
- ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
+ ret = kstrtobool_from_user(ubuf, cnt, &val);
if (ret)
return ret;
@@ -1090,10 +1090,10 @@ static ssize_t remote_event_enable_write(struct file *filp, const char __user *u
struct seq_file *seq = filp->private_data;
struct remote_event *evt = seq->private;
struct trace_remote *remote = evt->remote;
- u8 enable;
+ bool enable;
int ret;
- ret = kstrtou8_from_user(ubuf, count, 10, &enable);
+ ret = kstrtobool_from_user(ubuf, count, &enable);
if (ret)
return ret;
@@ -1174,10 +1174,10 @@ static ssize_t remote_events_dir_enable_write(struct file *filp, const char __us
size_t count, loff_t *ppos)
{
struct trace_remote *remote = file_inode(filp)->i_private;
+ bool enable;
int i, ret;
- u8 enable;
- ret = kstrtou8_from_user(ubuf, count, 10, &enable);
+ ret = kstrtobool_from_user(ubuf, count, &enable);
if (ret)
return ret;
--
2.54.0.1032.g2f8565e1d1-goog
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox