* [PATCH] Documentation: tracing: fix typo in events documentation
From: Yudistira Putra @ 2026-06-22 14:37 UTC
To: Steven Rostedt, Masami Hiramatsu
Cc: Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
linux-trace-kernel, linux-doc, linux-kernel, Yudistira Putra
Fix a typo in the tracing events documentation: "can by built up"
should be "can be built up".
Signed-off-by: Yudistira Putra <pyudistira519@gmail.com>
---
Documentation/trace/events.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
index 18d112963dec..581f2260614b 100644
--- a/Documentation/trace/events.rst
+++ b/Documentation/trace/events.rst
@@ -1064,7 +1064,7 @@ correct command type, and a pointer to an event-specific run_command()
callback that will be called to actually execute the event-specific
command function.
-Once that's done, the command string can by built up by successive
+Once that's done, the command string can be built up by successive
calls to argument-adding functions.
To add a single argument, define and initialize a struct dynevent_arg
--
2.43.0
* [RFC PATCH v1.3 06/18] mm/damon/core: use damon_nr_accesses_mvsum() for damos region tracing
From: SeongJae Park @ 2026-06-22 14:21 UTC
Cc: SeongJae Park, Andrew Morton, Masami Hiramatsu, Mathieu Desnoyers,
Steven Rostedt, damon, linux-kernel, linux-mm, linux-trace-kernel
In-Reply-To: <20260622142139.30269-1-sj@kernel.org>
damon_nr_accesses_mvsum() returns the same value as nr_accesses_bp. The
function is also simpler and therefore less error-prone. Calling the
function is more expensive than simply reading the field, but because
the function is quite simple, the overhead should be negligible. Use it
in the DAMON region exporting tracepoints instead of nr_accesses_bp.
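The diff below calls the function only inside the branch that also sets
do_trace, and otherwise passes the initial 0. A minimal user-space sketch
of that "compute the tracepoint argument only when tracing is on" pattern
could look like this. All names here (trace_enabled, expensive_value(),
value_for_trace()) are hypothetical stand-ins for illustration, not
kernel symbols:

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the kernel. */
static bool trace_enabled;           /* models "is the tracepoint on?" */
static unsigned int nr_computations; /* counts how often the helper ran */

/* Stand-in for a helper that costs more than a plain field read. */
static unsigned int expensive_value(unsigned int raw)
{
	nr_computations++;
	return raw / 2;
}

/* Returns the value that would be handed to the tracepoint. */
static unsigned int value_for_trace(unsigned int raw)
{
	unsigned int v = 0; /* default used while tracing is off */

	if (trace_enabled)
		v = expensive_value(raw);
	return v;
}
```

With tracing off, the helper is never entered, so the extra cost is paid
only by users who actually enabled the tracepoint.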
Signed-off-by: SeongJae Park <sj@kernel.org>
---
include/trace/events/damon.h | 8 +++++---
mm/damon/core.c | 5 +++--
2 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h
index 78388538acf44..8851727ae1627 100644
--- a/include/trace/events/damon.h
+++ b/include/trace/events/damon.h
@@ -78,9 +78,11 @@ TRACE_EVENT_CONDITION(damos_before_apply,
TP_PROTO(unsigned int context_idx, unsigned int scheme_idx,
unsigned int target_idx, struct damon_region *r,
- unsigned int nr_regions, bool do_trace),
+ unsigned int nr_accesses, unsigned int nr_regions,
+ bool do_trace),
- TP_ARGS(context_idx, scheme_idx, target_idx, r, nr_regions, do_trace),
+ TP_ARGS(context_idx, scheme_idx, target_idx, r, nr_accesses,
+ nr_regions, do_trace),
TP_CONDITION(do_trace),
@@ -101,7 +103,7 @@ TRACE_EVENT_CONDITION(damos_before_apply,
__entry->target_idx = target_idx;
__entry->start = r->ar.start;
__entry->end = r->ar.end;
- __entry->nr_accesses = r->nr_accesses_bp / 10000;
+ __entry->nr_accesses = nr_accesses;
__entry->age = r->age;
__entry->nr_regions = nr_regions;
),
diff --git a/mm/damon/core.c b/mm/damon/core.c
index d6cc538172b40..ca68c4835c391 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -2442,7 +2442,7 @@ static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t,
struct damos *siter; /* schemes iterator */
unsigned int sidx = 0;
struct damon_target *titer; /* targets iterator */
- unsigned int tidx = 0;
+ unsigned int tidx = 0, nr_accesses = 0;
bool do_trace = false;
/* get indices for trace_damos_before_apply() */
@@ -2457,6 +2457,7 @@ static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t,
break;
tidx++;
}
+ nr_accesses = damon_nr_accesses_mvsum(r, c);
do_trace = true;
}
@@ -2472,7 +2473,7 @@ static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t,
if (damos_core_filter_out(c, t, r, s))
return;
ktime_get_coarse_ts64(&begin);
- trace_damos_before_apply(cidx, sidx, tidx, r,
+ trace_damos_before_apply(cidx, sidx, tidx, r, nr_accesses,
damon_nr_regions(t), do_trace);
sz_applied = c->ops.apply_scheme(c, t, r, s,
&sz_ops_filter_passed);
--
2.47.3
* [RFC PATCH v1.3 00/18] mm/damon: optimize out nr_accesses_bp
From: SeongJae Park @ 2026-06-22 14:21 UTC
Cc: SeongJae Park, Andrew Morton, Brendan Higgins, David Gow,
Masami Hiramatsu, Mathieu Desnoyers, Shuah Khan, Steven Rostedt,
damon, kunit-dev, linux-kernel, linux-kselftest, linux-mm,
linux-trace-kernel
TLDR: Replace damon_region->nr_accesses_bp, which is easy to get wrong,
with a simpler on-demand moving sum function, damon_nr_accesses_mvsum().
Background
==========
DAMON's monitoring output (access pattern snapshot, or more technically
speaking, damon_region->nr_accesses) is completed once per aggregation
interval, which is 100 ms by default. Users can increase the interval
arbitrarily to suit their needs. Under the suggested intervals
auto-tuning setup, it can span up to 200 seconds. If the aggregation
interval is too long, users of the snapshot cannot get it in a
reasonable time. To mitigate this, we introduced a new field of
damon_region, namely nr_accesses_bp. It contains a pseudo moving sum of
nr_accesses in bp units and is updated every sampling interval.
It turned out that keeping it correctly updated every sampling interval
is not that easy. While developing the online parameter update feature
and more experimental hacks, we found it is easily corrupted, because it
requires every per-sampling-interval update to be correct. Once it is
corrupted, DAMON's monitoring outputs become nonsensical. Hence we added
a few validation checks.
Solution
========
There is no real reason to keep it updated every sampling interval. Due
to the simple pseudo-moving sum mechanism and existing helper field
(last_nr_accesses), we can also calculate the pseudo moving sum on
demand in a much simpler way.
Implement a function for getting the pseudo moving sum on demand, and
replace nr_accesses_bp uses with the new function. Also remove the no
longer needed tests for nr_accesses_bp and the per-sampling-interval
update functions. Finally, remove nr_accesses_bp. The new function is
quite simple.
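As a rough illustration of the idea only, and not the implementation in
this series, an on-demand pseudo moving sum can be derived from the
previous window's total and the partial total of the current window. The
helper name pseudo_mvsum() and all of its parameters are hypothetical;
the zero-window and over-run guards merely mirror the cases mentioned in
the changelog below:

```c
/*
 * Hypothetical helper, not the function from this series.
 *
 * last_sum: total aggregated over the previous full window
 * cur_sum:  partial total accumulated so far in the current window
 * passed:   sampling intervals elapsed in the current window
 * window:   sampling intervals per aggregation window
 */
static unsigned int pseudo_mvsum(unsigned int last_sum, unsigned int cur_sum,
				 unsigned int passed, unsigned int window)
{
	unsigned int expired;

	if (window == 0)	/* avoid divide-by-zero */
		return cur_sum;
	if (passed > window)	/* clamp an over-run window */
		passed = window;

	/* Share of the previous window that has slid out of view. */
	expired = (unsigned int)((unsigned long long)last_sum * passed / window);

	return last_sum - expired + cur_sum;
}
```

For example, with last_sum = 10, cur_sum = 3, passed = 5 and window = 20,
the expired share is 10 * 5 / 20 = 2 (integer division), so the result is
10 - 2 + 3 = 11. Nothing needs to be stored or updated per sampling
interval; the value is recomputed from fields that already exist whenever
a reader asks for it.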
Discussion
==========
Depending on the use case, multiple nr_accesses readers could be
executed in the same kdamond_fn() main loop iteration, which is executed
once per sampling interval. Such readers include DAMON region exporting
tracepoints (damon_[region_]aggregated and damos_before_apply), DAMOS,
and DAMON sysfs interface logic for update_schemes_tried_regions
command. In this case, the new function will be called multiple times
and this could add overhead compared to the old logic, which simply
reads the field without any additional work. Nonetheless, the new
function is quite simple, and the new approach does nothing when there
is no need to read. The old approach had to execute its update function
for each region every sampling interval. Hence the new approach is
believed to be even more lightweight in the common case, and the
overhead is negligible anyway.
One more advantage of this change is that one field is removed from the
damon_region struct. On setups that use a high number of DAMON regions,
this could save memory.
Patches Sequence
================
Patch 1 introduces the new function for getting the pseudo moving sum of
nr_accesses on demand. Patch 2 implements a unit test for the new
function's internal logic. Patches 3 and 4 update the monitoring logic
and the new function so that it is safe to use with the existing logic.
Patches 5-7 replace uses of nr_accesses_bp in DAMOS, tracepoints and the
DAMON sysfs interface with the new function, respectively. Patches 8-10
remove nr_accesses_bp validation functions in DAMON core, one by one.
Patches 11 and 12 further remove tests and a test helper for
nr_accesses_bp, respectively. Patch 13 removes the setups and updates of
the nr_accesses_bp field. Patches 14-16 clean up function parameters
that are no longer used due to the previous patch. Patch 17 removes the
function that was used for updating the nr_accesses_bp field, together
with its unit test, which is the single remaining caller of the
function. Finally, patch 18 removes the damon_region->nr_accesses_bp
field.
Changes from RFC v1.2
- RFC v1.2: https://lore.kernel.org/20260621155715.87932-1-sj@kernel.org
- Explicitly ignore nr_accesses from mvsum at the beginning of
aggregation.
- Fix a typo in a commit message.
Changes from RFC v1.1
- RFC v1.1: https://lore.kernel.org/20260620172244.90953-1-sj@kernel.org
- Handle next_aggregation_sis < passed_sample_intervals in
nr_accesses_mvsum().
- Always rescale ->last_nr_accesses for parameter changes.
- Remove unused attrs params from damon_update_region_access_rate() and
its callers.
Changes from RFC v1
- RFC v1: https://lore.kernel.org/20260619193415.73833-1-sj@kernel.org
- Avoid divide-by-zero from zero aggregation interval.
- Call damon_nr_accesses_mvsum() for damos tracing only when it is enabled.
- Remove obsolete mentions of nr_accesses_bp in comments.
SeongJae Park (18):
mm/damon: introduce damon_nr_accesses_mvsum()
mm/damon/tests/core-kunit: test damon_mvsum()
mm/damon/core: always update ->last_nr_accesses for intervals change
mm/damon/core: handle unreset nr_accesses in damon_nr_accesses_mvsum()
mm/damon/core: use damon_nr_accesses_mvsum() in __damos_valid_target()
mm/damon/core: use damon_nr_accesses_mvsum() for damos region tracing
mm/damon/sysfs-schemes: use damon_nr_accesses_mvsum() for damo regions
mm/damon/core: remove damon_warn_fix_nr_accesses_corruption()
mm/damon/core: remove damon_verify_reset_aggregated()
mm/damon/core: remove damon_verify_merge_regions_of()
mm/damon/tests/core-kunit: remove nr_accesses_bp setup and tests
selftests/damon/drgn_dump_damon_status: do not dump nr_accesses_bp
mm/damon/core: remove nr_accesses_bp setups and updates
mm/damon/core: remove attrs param from
damon_update_region_access_rate()
mm/damon/paddr: remove attrs param from __damon_pa_check_access()
mm/damon/vaddr: remove attrs param from __damon_va_check_access()
mm/damon/core: remove damon_moving_sum() and its unit test
mm/damon: remove damon_region->nr_accesses_bp
include/linux/damon.h | 15 +-
include/trace/events/damon.h | 8 +-
mm/damon/core.c | 201 +++++++-----------
mm/damon/paddr.c | 9 +-
mm/damon/sysfs-schemes.c | 6 +-
mm/damon/tests/core-kunit.h | 37 ++--
mm/damon/vaddr.c | 12 +-
.../selftests/damon/drgn_dump_damon_status.py | 1 -
8 files changed, 119 insertions(+), 170 deletions(-)
base-commit: e08d3bec1dc38cc991fc819afd698bf7bd07bd6d
--
2.47.3
* Re: [PATCH v2 1/2] tracing: Move non-trace_printk prototypes into trace_controls.h
From: Yury Norov @ 2026-06-22 13:41 UTC
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
Peter Zijlstra, Julia Lawall, Yury Norov
In-Reply-To: <20260622131029.655382134@kernel.org>
On Mon, Jun 22, 2026 at 09:07:40AM -0400, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> In order to remove the include to trace_printk.h from kernel.h the tracing
> control prototypes need to be separated into their own header file as they
> are used in other common header files like rcu.h. There's no point in
> removing trace_printk.h from kernel.h if it just gets added back to other
> common headers.
>
> Prototypes are very cheap for the compiler and should not be an issue.
>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Yury Norov <yury.norov@gmail.com>
> ---
> Changes since v1: https://patch.msgid.link/20260621093811.007634476@kernel.org
>
> - Instead of moving back into kernel.h, create a new trace_controls.h
> header.
>
> arch/powerpc/xmon/xmon.c | 1 +
> arch/s390/kernel/ipl.c | 1 +
> arch/s390/kernel/machine_kexec.c | 1 +
> drivers/gpu/drm/i915/i915_gem.h | 1 +
> drivers/tty/sysrq.c | 1 +
> include/linux/trace_controls.h | 54 ++++++++++++++++++++++++++++++++
> include/linux/trace_printk.h | 51 ------------------------------
> kernel/debug/debug_core.c | 1 +
> kernel/panic.c | 1 +
> kernel/rcu/rcu.h | 2 ++
> kernel/rcu/rcutorture.c | 1 +
> kernel/trace/trace.h | 1 +
> kernel/trace/trace_benchmark.c | 1 +
> lib/sys_info.c | 1 +
> 14 files changed, 67 insertions(+), 51 deletions(-)
> create mode 100644 include/linux/trace_controls.h
>
> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> index cb3a3244ae6f..2135f319e0dd 100644
> --- a/arch/powerpc/xmon/xmon.c
> +++ b/arch/powerpc/xmon/xmon.c
> @@ -27,6 +27,7 @@
> #include <linux/highmem.h>
> #include <linux/security.h>
> #include <linux/debugfs.h>
> +#include <linux/trace_controls.h>
>
> #include <asm/ptrace.h>
> #include <asm/smp.h>
> diff --git a/arch/s390/kernel/ipl.c b/arch/s390/kernel/ipl.c
> index 3c346b02ceb9..baac66cc4de4 100644
> --- a/arch/s390/kernel/ipl.c
> +++ b/arch/s390/kernel/ipl.c
> @@ -22,6 +22,7 @@
> #include <linux/debug_locks.h>
> #include <linux/vmalloc.h>
> #include <linux/secure_boot.h>
> +#include <linux/trace_controls.h>
> #include <asm/asm-extable.h>
> #include <asm/machine.h>
> #include <asm/diag.h>
> diff --git a/arch/s390/kernel/machine_kexec.c b/arch/s390/kernel/machine_kexec.c
> index baeb3dcfc1c8..33f9a89eb3ad 100644
> --- a/arch/s390/kernel/machine_kexec.c
> +++ b/arch/s390/kernel/machine_kexec.c
> @@ -12,6 +12,7 @@
> #include <linux/delay.h>
> #include <linux/reboot.h>
> #include <linux/ftrace.h>
> +#include <linux/trace_controls.h>
> #include <linux/debug_locks.h>
> #include <linux/cpufeature.h>
> #include <asm/guarded_storage.h>
> diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
> index 20b3cb29cfff..1da8fb61c09e 100644
> --- a/drivers/gpu/drm/i915/i915_gem.h
> +++ b/drivers/gpu/drm/i915/i915_gem.h
> @@ -116,6 +116,7 @@ int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file);
> #endif
>
> #if IS_ENABLED(CONFIG_DRM_I915_TRACE_GEM)
> +#include <linux/trace_controls.h>
> #define GEM_TRACE(...) trace_printk(__VA_ARGS__)
> #define GEM_TRACE_ERR(...) do { \
> pr_err(__VA_ARGS__); \
> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
> index c2e4b31b699a..d3f72dc430b8 100644
> --- a/drivers/tty/sysrq.c
> +++ b/drivers/tty/sysrq.c
> @@ -324,6 +324,7 @@ static const struct sysrq_key_op sysrq_showstate_blocked_op = {
> };
>
> #ifdef CONFIG_TRACING
> +#include <linux/trace_controls.h>
> #include <linux/ftrace.h>
>
> static void sysrq_ftrace_dump(u8 key)
> diff --git a/include/linux/trace_controls.h b/include/linux/trace_controls.h
> new file mode 100644
> index 000000000000..995b97e963b4
> --- /dev/null
> +++ b/include/linux/trace_controls.h
> @@ -0,0 +1,54 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_TRACE_CONTROLS_H
> +#define _LINUX_TRACE_CONTROLS_H
> +
> +
> +/*
> + * General tracing related utility functions - trace_printk(),
> + * tracing_on/tracing_off and tracing_start()/tracing_stop
> + *
> + * Use tracing_on/tracing_off when you want to quickly turn on or off
> + * tracing. It simply enables or disables the recording of the trace events.
> + * This also corresponds to the user space /sys/kernel/tracing/tracing_on
> + * file, which gives a means for the kernel and userspace to interact.
> + * Place a tracing_off() in the kernel where you want tracing to end.
> + * From user space, examine the trace, and then echo 1 > tracing_on
> + * to continue tracing.
> + *
> + * tracing_stop/tracing_start has slightly more overhead. It is used
> + * by things like suspend to ram where disabling the recording of the
> + * trace is not enough, but tracing must actually stop because things
> + * like calling smp_processor_id() may crash the system.
> + *
> + * Most likely, you want to use tracing_on/tracing_off.
> + */
> +enum ftrace_dump_mode {
> + DUMP_NONE,
> + DUMP_ALL,
> + DUMP_ORIG,
> + DUMP_PARAM,
> +};
> +
> +#ifdef CONFIG_TRACING
> +void tracing_on(void);
> +void tracing_off(void);
> +int tracing_is_on(void);
> +void tracing_snapshot(void);
> +void tracing_snapshot_alloc(void);
> +void tracing_start(void);
> +void tracing_stop(void);
> +void trace_dump_stack(int skip);
The function description says:
record a stack back trace in the trace buffer
So, to me it sounds like it should go to the trace_printk.h.
> +void ftrace_dump(enum ftrace_dump_mode oops_dump_mode);
Same here, based on the function name, it relates to ftrace.h, not the
tracing control itself.
For example, lib/sys_info.c only calls ftrace_dump(), and already
includes ftrace.h. If you place ftrace_dump() as suggested, you can
include only ftrace.h there, without including trace_printk.h or
trace_controls.h.
Thanks,
Yury
> +#else
> +static inline void tracing_start(void) { }
> +static inline void tracing_stop(void) { }
> +static inline void tracing_on(void) { }
> +static inline void tracing_off(void) { }
> +static inline int tracing_is_on(void) { return 0; }
> +static inline void tracing_snapshot(void) { }
> +static inline void tracing_snapshot_alloc(void) { }
> +static inline void trace_dump_stack(int skip) { }
> +static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
> +#endif
> +
> +#endif /* _LINUX_TRACE_CONTROLS_H */
> diff --git a/include/linux/trace_printk.h b/include/linux/trace_printk.h
> index 3d54f440dccf..a488ea9e9f85 100644
> --- a/include/linux/trace_printk.h
> +++ b/include/linux/trace_printk.h
> @@ -7,43 +7,7 @@
> #include <linux/stddef.h>
> #include <linux/stringify.h>
>
> -/*
> - * General tracing related utility functions - trace_printk(),
> - * tracing_on/tracing_off and tracing_start()/tracing_stop
> - *
> - * Use tracing_on/tracing_off when you want to quickly turn on or off
> - * tracing. It simply enables or disables the recording of the trace events.
> - * This also corresponds to the user space /sys/kernel/tracing/tracing_on
> - * file, which gives a means for the kernel and userspace to interact.
> - * Place a tracing_off() in the kernel where you want tracing to end.
> - * From user space, examine the trace, and then echo 1 > tracing_on
> - * to continue tracing.
> - *
> - * tracing_stop/tracing_start has slightly more overhead. It is used
> - * by things like suspend to ram where disabling the recording of the
> - * trace is not enough, but tracing must actually stop because things
> - * like calling smp_processor_id() may crash the system.
> - *
> - * Most likely, you want to use tracing_on/tracing_off.
> - */
> -
> -enum ftrace_dump_mode {
> - DUMP_NONE,
> - DUMP_ALL,
> - DUMP_ORIG,
> - DUMP_PARAM,
> -};
> -
> #ifdef CONFIG_TRACING
> -void tracing_on(void);
> -void tracing_off(void);
> -int tracing_is_on(void);
> -void tracing_snapshot(void);
> -void tracing_snapshot_alloc(void);
> -
> -extern void tracing_start(void);
> -extern void tracing_stop(void);
> -
> static inline __printf(1, 2)
> void ____trace_printk_check_format(const char *fmt, ...)
> {
> @@ -149,8 +113,6 @@ int __trace_printk(unsigned long ip, const char *fmt, ...);
> extern int __trace_bputs(unsigned long ip, const char *str);
> extern int __trace_puts(unsigned long ip, const char *str);
>
> -extern void trace_dump_stack(int skip);
> -
> /*
> * The double __builtin_constant_p is because gcc will give us an error
> * if we try to allocate the static variable to fmt if it is not a
> @@ -173,19 +135,7 @@ __ftrace_vbprintk(unsigned long ip, const char *fmt, va_list ap);
>
> extern __printf(2, 0) int
> __ftrace_vprintk(unsigned long ip, const char *fmt, va_list ap);
> -
> -extern void ftrace_dump(enum ftrace_dump_mode oops_dump_mode);
> #else
> -static inline void tracing_start(void) { }
> -static inline void tracing_stop(void) { }
> -static inline void trace_dump_stack(int skip) { }
> -
> -static inline void tracing_on(void) { }
> -static inline void tracing_off(void) { }
> -static inline int tracing_is_on(void) { return 0; }
> -static inline void tracing_snapshot(void) { }
> -static inline void tracing_snapshot_alloc(void) { }
> -
> static inline __printf(1, 2)
> int trace_printk(const char *fmt, ...)
> {
> @@ -196,7 +146,6 @@ ftrace_vprintk(const char *fmt, va_list ap)
> {
> return 0;
> }
> -static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
> #endif /* CONFIG_TRACING */
>
> #endif
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index b276504c1c6b..f9c83a470c98 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -27,6 +27,7 @@
>
> #define pr_fmt(fmt) "KGDB: " fmt
>
> +#include <linux/trace_controls.h>
> #include <linux/pid_namespace.h>
> #include <linux/clocksource.h>
> #include <linux/serial_core.h>
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 213725b612aa..1415e910371d 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -9,6 +9,7 @@
> * This function is used through-out the kernel (including mm and fs)
> * to indicate a major problem.
> */
> +#include <linux/trace_controls.h>
> #include <linux/debug_locks.h>
> #include <linux/sched/debug.h>
> #include <linux/interrupt.h>
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index fa6d30ce73d1..b3e2c8f25a4f 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -280,6 +280,8 @@ extern int rcu_cpu_stall_notifiers;
>
> #ifdef CONFIG_RCU_STALL_COMMON
>
> +#include <linux/trace_controls.h>
> +
> extern int rcu_cpu_stall_ftrace_dump;
> extern int rcu_cpu_stall_suppress;
> extern int rcu_cpu_stall_timeout;
> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> index 882a158ada7b..76bf0184b267 100644
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -39,6 +39,7 @@
> #include <linux/srcu.h>
> #include <linux/slab.h>
> #include <linux/trace_clock.h>
> +#include <linux/trace_controls.h>
> #include <asm/byteorder.h>
> #include <linux/torture.h>
> #include <linux/vmalloc.h>
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 80fe152af1dd..2537c33ddd49 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -22,6 +22,7 @@
> #include <linux/ctype.h>
> #include <linux/once_lite.h>
> #include <linux/ftrace_regs.h>
> +#include <linux/trace_controls.h>
> #include <linux/llist.h>
>
> #include "pid_list.h"
> diff --git a/kernel/trace/trace_benchmark.c b/kernel/trace/trace_benchmark.c
> index e19c32f2a938..69cc39008c36 100644
> --- a/kernel/trace/trace_benchmark.c
> +++ b/kernel/trace/trace_benchmark.c
> @@ -3,6 +3,7 @@
> #include <linux/module.h>
> #include <linux/kthread.h>
> #include <linux/trace_clock.h>
> +#include <linux/trace_controls.h>
>
> #define CREATE_TRACE_POINTS
> #include "trace_benchmark.h"
> diff --git a/lib/sys_info.c b/lib/sys_info.c
> index f32a06ec9ed4..e3c9ca05601b 100644
> --- a/lib/sys_info.c
> +++ b/lib/sys_info.c
> @@ -8,6 +8,7 @@
> #include <linux/ftrace.h>
> #include <linux/nmi.h>
> #include <linux/sched/debug.h>
> +#include <linux/trace_controls.h>
> #include <linux/string.h>
> #include <linux/sysctl.h>
>
> --
> 2.53.0
>
* Re: [PATCH 0/2] tracing: Move trace_printk.h out of kernel.h
From: Yury Norov @ 2026-06-22 13:11 UTC
To: Steven Rostedt
Cc: Christophe Leroy (CS GROUP), linux-kernel, linux-trace-kernel,
Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov,
linux-doc, linux-kbuild, linuxppc-dev, dri-devel, linux-stm32,
linux-arm-kernel, linux-rdma, linux-usb, linux-ext4, linux-nfs,
kvm, intel-gfx
In-Reply-To: <20260622090826.20efadb3@fedora>
On Mon, Jun 22, 2026 at 09:08:26AM -0400, Steven Rostedt wrote:
> On Mon, 22 Jun 2026 10:05:13 +0200
> "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> wrote:
>
> > > There's been complaints about trace_printk() being defined in kernel.h as it
> > > can increase the compilation time. As it is only used by some developers for
> > > debugging purposes, it should not be in kernel.h causing lots of wasted CPU
> > > cycles for those that do not ever care about it.
> >
> > Do we have a measurement of the increased compilation time ?
>
> I believe Yury does.
I re-ran the compilation in a stricter environment, and the difference
is negligible.
* [PATCH v2 1/2] tracing: Move non-trace_printk prototypes into trace_controls.h
From: Steven Rostedt @ 2026-06-22 13:07 UTC
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov
In-Reply-To: <20260622130739.375198646@kernel.org>
From: Steven Rostedt <rostedt@goodmis.org>
In order to remove the include to trace_printk.h from kernel.h the tracing
control prototypes need to be separated into their own header file as they
are used in other common header files like rcu.h. There's no point in
removing trace_printk.h from kernel.h if it just gets added back to other
common headers.
Prototypes are very cheap for the compiler and should not be an issue.
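For readers unfamiliar with the layout of such a header, here is a
stand-alone miniature of the prototype/stub split that the new file in
the diff below uses. The demo_* names and the CONFIG_DEMO_TRACING macro
are hypothetical, chosen so the sketch builds outside the kernel; they
are not kernel symbols:

```c
/*
 * Hypothetical miniature of a prototypes-or-stubs header.
 * CONFIG_DEMO_TRACING is an illustrative macro, not a kernel config.
 */
#ifdef CONFIG_DEMO_TRACING
/* Real implementations live elsewhere; only prototypes are exposed. */
void demo_tracing_on(void);
void demo_tracing_off(void);
int demo_tracing_is_on(void);
#else
/* Feature compiled out: callers still build, calls become no-ops. */
static inline void demo_tracing_on(void) { }
static inline void demo_tracing_off(void) { }
static inline int demo_tracing_is_on(void) { return 0; }
#endif

/* A caller needs no #ifdef of its own. */
static int run_with_tracing_window(void)
{
	int was_on;

	demo_tracing_on();
	was_on = demo_tracing_is_on();
	demo_tracing_off();
	return was_on;
}
```

Because the header carries only prototypes and empty inline stubs, a
file that needs just these controls does not have to pull in the heavier
trace_printk() machinery.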
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
Changes since v1: https://patch.msgid.link/20260621093811.007634476@kernel.org
- Instead of moving back into kernel.h, create a new trace_controls.h
header.
arch/powerpc/xmon/xmon.c | 1 +
arch/s390/kernel/ipl.c | 1 +
arch/s390/kernel/machine_kexec.c | 1 +
drivers/gpu/drm/i915/i915_gem.h | 1 +
drivers/tty/sysrq.c | 1 +
include/linux/trace_controls.h | 54 ++++++++++++++++++++++++++++++++
include/linux/trace_printk.h | 51 ------------------------------
kernel/debug/debug_core.c | 1 +
kernel/panic.c | 1 +
kernel/rcu/rcu.h | 2 ++
kernel/rcu/rcutorture.c | 1 +
kernel/trace/trace.h | 1 +
kernel/trace/trace_benchmark.c | 1 +
lib/sys_info.c | 1 +
14 files changed, 67 insertions(+), 51 deletions(-)
create mode 100644 include/linux/trace_controls.h
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index cb3a3244ae6f..2135f319e0dd 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -27,6 +27,7 @@
#include <linux/highmem.h>
#include <linux/security.h>
#include <linux/debugfs.h>
+#include <linux/trace_controls.h>
#include <asm/ptrace.h>
#include <asm/smp.h>
diff --git a/arch/s390/kernel/ipl.c b/arch/s390/kernel/ipl.c
index 3c346b02ceb9..baac66cc4de4 100644
--- a/arch/s390/kernel/ipl.c
+++ b/arch/s390/kernel/ipl.c
@@ -22,6 +22,7 @@
#include <linux/debug_locks.h>
#include <linux/vmalloc.h>
#include <linux/secure_boot.h>
+#include <linux/trace_controls.h>
#include <asm/asm-extable.h>
#include <asm/machine.h>
#include <asm/diag.h>
diff --git a/arch/s390/kernel/machine_kexec.c b/arch/s390/kernel/machine_kexec.c
index baeb3dcfc1c8..33f9a89eb3ad 100644
--- a/arch/s390/kernel/machine_kexec.c
+++ b/arch/s390/kernel/machine_kexec.c
@@ -12,6 +12,7 @@
#include <linux/delay.h>
#include <linux/reboot.h>
#include <linux/ftrace.h>
+#include <linux/trace_controls.h>
#include <linux/debug_locks.h>
#include <linux/cpufeature.h>
#include <asm/guarded_storage.h>
diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index 20b3cb29cfff..1da8fb61c09e 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -116,6 +116,7 @@ int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file);
#endif
#if IS_ENABLED(CONFIG_DRM_I915_TRACE_GEM)
+#include <linux/trace_controls.h>
#define GEM_TRACE(...) trace_printk(__VA_ARGS__)
#define GEM_TRACE_ERR(...) do { \
pr_err(__VA_ARGS__); \
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index c2e4b31b699a..d3f72dc430b8 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -324,6 +324,7 @@ static const struct sysrq_key_op sysrq_showstate_blocked_op = {
};
#ifdef CONFIG_TRACING
+#include <linux/trace_controls.h>
#include <linux/ftrace.h>
static void sysrq_ftrace_dump(u8 key)
diff --git a/include/linux/trace_controls.h b/include/linux/trace_controls.h
new file mode 100644
index 000000000000..995b97e963b4
--- /dev/null
+++ b/include/linux/trace_controls.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_TRACE_CONTROLS_H
+#define _LINUX_TRACE_CONTROLS_H
+
+
+/*
+ * General tracing related utility functions - trace_printk(),
+ * tracing_on/tracing_off and tracing_start()/tracing_stop
+ *
+ * Use tracing_on/tracing_off when you want to quickly turn on or off
+ * tracing. It simply enables or disables the recording of the trace events.
+ * This also corresponds to the user space /sys/kernel/tracing/tracing_on
+ * file, which gives a means for the kernel and userspace to interact.
+ * Place a tracing_off() in the kernel where you want tracing to end.
+ * From user space, examine the trace, and then echo 1 > tracing_on
+ * to continue tracing.
+ *
+ * tracing_stop/tracing_start has slightly more overhead. It is used
+ * by things like suspend to ram where disabling the recording of the
+ * trace is not enough, but tracing must actually stop because things
+ * like calling smp_processor_id() may crash the system.
+ *
+ * Most likely, you want to use tracing_on/tracing_off.
+ */
+enum ftrace_dump_mode {
+ DUMP_NONE,
+ DUMP_ALL,
+ DUMP_ORIG,
+ DUMP_PARAM,
+};
+
+#ifdef CONFIG_TRACING
+void tracing_on(void);
+void tracing_off(void);
+int tracing_is_on(void);
+void tracing_snapshot(void);
+void tracing_snapshot_alloc(void);
+void tracing_start(void);
+void tracing_stop(void);
+void trace_dump_stack(int skip);
+void ftrace_dump(enum ftrace_dump_mode oops_dump_mode);
+#else
+static inline void tracing_start(void) { }
+static inline void tracing_stop(void) { }
+static inline void tracing_on(void) { }
+static inline void tracing_off(void) { }
+static inline int tracing_is_on(void) { return 0; }
+static inline void tracing_snapshot(void) { }
+static inline void tracing_snapshot_alloc(void) { }
+static inline void trace_dump_stack(int skip) { }
+static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
+#endif
+
+#endif /* _LINUX_TRACE_CONTROLS_H */
diff --git a/include/linux/trace_printk.h b/include/linux/trace_printk.h
index 3d54f440dccf..a488ea9e9f85 100644
--- a/include/linux/trace_printk.h
+++ b/include/linux/trace_printk.h
@@ -7,43 +7,7 @@
#include <linux/stddef.h>
#include <linux/stringify.h>
-/*
- * General tracing related utility functions - trace_printk(),
- * tracing_on/tracing_off and tracing_start()/tracing_stop
- *
- * Use tracing_on/tracing_off when you want to quickly turn on or off
- * tracing. It simply enables or disables the recording of the trace events.
- * This also corresponds to the user space /sys/kernel/tracing/tracing_on
- * file, which gives a means for the kernel and userspace to interact.
- * Place a tracing_off() in the kernel where you want tracing to end.
- * From user space, examine the trace, and then echo 1 > tracing_on
- * to continue tracing.
- *
- * tracing_stop/tracing_start has slightly more overhead. It is used
- * by things like suspend to ram where disabling the recording of the
- * trace is not enough, but tracing must actually stop because things
- * like calling smp_processor_id() may crash the system.
- *
- * Most likely, you want to use tracing_on/tracing_off.
- */
-
-enum ftrace_dump_mode {
- DUMP_NONE,
- DUMP_ALL,
- DUMP_ORIG,
- DUMP_PARAM,
-};
-
#ifdef CONFIG_TRACING
-void tracing_on(void);
-void tracing_off(void);
-int tracing_is_on(void);
-void tracing_snapshot(void);
-void tracing_snapshot_alloc(void);
-
-extern void tracing_start(void);
-extern void tracing_stop(void);
-
static inline __printf(1, 2)
void ____trace_printk_check_format(const char *fmt, ...)
{
@@ -149,8 +113,6 @@ int __trace_printk(unsigned long ip, const char *fmt, ...);
extern int __trace_bputs(unsigned long ip, const char *str);
extern int __trace_puts(unsigned long ip, const char *str);
-extern void trace_dump_stack(int skip);
-
/*
* The double __builtin_constant_p is because gcc will give us an error
* if we try to allocate the static variable to fmt if it is not a
@@ -173,19 +135,7 @@ __ftrace_vbprintk(unsigned long ip, const char *fmt, va_list ap);
extern __printf(2, 0) int
__ftrace_vprintk(unsigned long ip, const char *fmt, va_list ap);
-
-extern void ftrace_dump(enum ftrace_dump_mode oops_dump_mode);
#else
-static inline void tracing_start(void) { }
-static inline void tracing_stop(void) { }
-static inline void trace_dump_stack(int skip) { }
-
-static inline void tracing_on(void) { }
-static inline void tracing_off(void) { }
-static inline int tracing_is_on(void) { return 0; }
-static inline void tracing_snapshot(void) { }
-static inline void tracing_snapshot_alloc(void) { }
-
static inline __printf(1, 2)
int trace_printk(const char *fmt, ...)
{
@@ -196,7 +146,6 @@ ftrace_vprintk(const char *fmt, va_list ap)
{
return 0;
}
-static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
#endif /* CONFIG_TRACING */
#endif
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index b276504c1c6b..f9c83a470c98 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -27,6 +27,7 @@
#define pr_fmt(fmt) "KGDB: " fmt
+#include <linux/trace_controls.h>
#include <linux/pid_namespace.h>
#include <linux/clocksource.h>
#include <linux/serial_core.h>
diff --git a/kernel/panic.c b/kernel/panic.c
index 213725b612aa..1415e910371d 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -9,6 +9,7 @@
* This function is used through-out the kernel (including mm and fs)
* to indicate a major problem.
*/
+#include <linux/trace_controls.h>
#include <linux/debug_locks.h>
#include <linux/sched/debug.h>
#include <linux/interrupt.h>
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index fa6d30ce73d1..b3e2c8f25a4f 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -280,6 +280,8 @@ extern int rcu_cpu_stall_notifiers;
#ifdef CONFIG_RCU_STALL_COMMON
+#include <linux/trace_controls.h>
+
extern int rcu_cpu_stall_ftrace_dump;
extern int rcu_cpu_stall_suppress;
extern int rcu_cpu_stall_timeout;
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 882a158ada7b..76bf0184b267 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -39,6 +39,7 @@
#include <linux/srcu.h>
#include <linux/slab.h>
#include <linux/trace_clock.h>
+#include <linux/trace_controls.h>
#include <asm/byteorder.h>
#include <linux/torture.h>
#include <linux/vmalloc.h>
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 80fe152af1dd..2537c33ddd49 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -22,6 +22,7 @@
#include <linux/ctype.h>
#include <linux/once_lite.h>
#include <linux/ftrace_regs.h>
+#include <linux/trace_controls.h>
#include <linux/llist.h>
#include "pid_list.h"
diff --git a/kernel/trace/trace_benchmark.c b/kernel/trace/trace_benchmark.c
index e19c32f2a938..69cc39008c36 100644
--- a/kernel/trace/trace_benchmark.c
+++ b/kernel/trace/trace_benchmark.c
@@ -3,6 +3,7 @@
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/trace_clock.h>
+#include <linux/trace_controls.h>
#define CREATE_TRACE_POINTS
#include "trace_benchmark.h"
diff --git a/lib/sys_info.c b/lib/sys_info.c
index f32a06ec9ed4..e3c9ca05601b 100644
--- a/lib/sys_info.c
+++ b/lib/sys_info.c
@@ -8,6 +8,7 @@
#include <linux/ftrace.h>
#include <linux/nmi.h>
#include <linux/sched/debug.h>
+#include <linux/trace_controls.h>
#include <linux/string.h>
#include <linux/sysctl.h>
--
2.53.0
^ permalink raw reply related
* [PATCH v2 2/2] tracing: Remove trace_printk.h from kernel.h
From: Steven Rostedt @ 2026-06-22 13:07 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov
In-Reply-To: <20260622130739.375198646@kernel.org>
From: Steven Rostedt <rostedt@goodmis.org>
There have been complaints that including trace_printk.h from kernel.h
increases build time. Remove it from kernel.h and include it directly in
the headers and C files that use it.
Link: https://lore.kernel.org/all/CAHk-=wikCBeVFjVXiY4o-oepdbjAoir5+TcAgtL12c4u1TpZLQ@mail.gmail.com/
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
Changes since v1: https://patch.msgid.link/20260621093811.168514984@kernel.org
- Just remove trace_printk.h and fix up all the places that need it.
arch/powerpc/kvm/book3s_xics.c | 1 +
drivers/gpu/drm/i915/gt/intel_gtt.h | 1 +
drivers/gpu/drm/i915/i915_gem.h | 1 +
drivers/hwtracing/stm/dummy_stm.c | 4 ++++
drivers/infiniband/hw/hfi1/trace_dbg.h | 1 +
drivers/usb/early/xhci-dbc.c | 1 +
fs/ext4/inline.c | 1 +
include/linux/ftrace.h | 2 ++
include/linux/kernel.h | 1 -
include/linux/sunrpc/debug.h | 1 +
include/linux/trace_printk.h | 5 +++--
kernel/trace/ring_buffer_benchmark.c | 1 +
samples/fprobe/fprobe_example.c | 1 +
samples/ftrace/ftrace-direct-too.c | 1 -
samples/trace_printk/trace-printk.c | 1 +
15 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 74a44fa702b0..ef5eb596a56e 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -26,6 +26,7 @@
#if 1
#define XICS_DBG(fmt...) do { } while (0)
#else
+#include <linux/trace_printk.h>
#define XICS_DBG(fmt...) trace_printk(fmt)
#endif
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index b54ee4f25af1..f6f223090760 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -35,6 +35,7 @@
#define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
#if IS_ENABLED(CONFIG_DRM_I915_TRACE_GTT)
+#include <linux/trace_printk.h>
#define GTT_TRACE(...) trace_printk(__VA_ARGS__)
#else
#define GTT_TRACE(...)
diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index 1da8fb61c09e..f490052e8964 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -117,6 +117,7 @@ int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file);
#if IS_ENABLED(CONFIG_DRM_I915_TRACE_GEM)
#include <linux/trace_controls.h>
+#include <linux/trace_printk.h>
#define GEM_TRACE(...) trace_printk(__VA_ARGS__)
#define GEM_TRACE_ERR(...) do { \
pr_err(__VA_ARGS__); \
diff --git a/drivers/hwtracing/stm/dummy_stm.c b/drivers/hwtracing/stm/dummy_stm.c
index 38528ffdc0b3..784f9af7ccba 100644
--- a/drivers/hwtracing/stm/dummy_stm.c
+++ b/drivers/hwtracing/stm/dummy_stm.c
@@ -14,6 +14,10 @@
#include <linux/stm.h>
#include <uapi/linux/stm.h>
+#ifdef DEBUG
+#include <linux/trace_printk.h>
+#endif
+
static ssize_t notrace
dummy_stm_packet(struct stm_data *stm_data, unsigned int master,
unsigned int channel, unsigned int packet, unsigned int flags,
diff --git a/drivers/infiniband/hw/hfi1/trace_dbg.h b/drivers/infiniband/hw/hfi1/trace_dbg.h
index 58304b91380f..30df5e246586 100644
--- a/drivers/infiniband/hw/hfi1/trace_dbg.h
+++ b/drivers/infiniband/hw/hfi1/trace_dbg.h
@@ -103,6 +103,7 @@ __hfi1_trace_def(IOCTL);
*/
#ifdef HFI1_EARLY_DBG
+#include <linux/trace_printk.h>
#define hfi1_dbg_early(fmt, ...) \
trace_printk(fmt, ##__VA_ARGS__)
#else
diff --git a/drivers/usb/early/xhci-dbc.c b/drivers/usb/early/xhci-dbc.c
index 41118bba9197..955c73bd601f 100644
--- a/drivers/usb/early/xhci-dbc.c
+++ b/drivers/usb/early/xhci-dbc.c
@@ -30,6 +30,7 @@ static struct xdbc_state xdbc;
static bool early_console_keep;
#ifdef XDBC_TRACE
+#include <linux/trace_printk.h>
#define xdbc_trace trace_printk
#else
static inline void xdbc_trace(const char *fmt, ...) { }
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 8045e4ff270c..0eff4a0c6a6c 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -934,6 +934,7 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping,
}
#ifdef INLINE_DIR_DEBUG
+#include <linux/trace_printk.h>
void ext4_show_inline_dir(struct inode *dir, struct buffer_head *bh,
void *inline_start, int inline_size)
{
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 02bc5027523a..b5336a81e619 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -8,6 +8,8 @@
#define _LINUX_FTRACE_H
#include <linux/trace_recursion.h>
+#include <linux/trace_controls.h>
+#include <linux/trace_printk.h>
#include <linux/trace_clock.h>
#include <linux/jump_label.h>
#include <linux/kallsyms.h>
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index e5570a16cbb1..e87a40fbd152 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -31,7 +31,6 @@
#include <linux/build_bug.h>
#include <linux/sprintf.h>
#include <linux/static_call_types.h>
-#include <linux/trace_printk.h>
#include <linux/util_macros.h>
#include <linux/wordpart.h>
diff --git a/include/linux/sunrpc/debug.h b/include/linux/sunrpc/debug.h
index ab61bed2f7af..7524f5d82fba 100644
--- a/include/linux/sunrpc/debug.h
+++ b/include/linux/sunrpc/debug.h
@@ -29,6 +29,7 @@ extern unsigned int nlm_debug;
# define ifdebug(fac) if (unlikely(rpc_debug & RPCDBG_##fac))
# if IS_ENABLED(CONFIG_SUNRPC_DEBUG_TRACE)
+# include <linux/trace_printk.h>
# define __sunrpc_printk(fmt, ...) trace_printk(fmt, ##__VA_ARGS__)
# else
# define __sunrpc_printk(fmt, ...) printk(KERN_DEFAULT fmt, ##__VA_ARGS__)
diff --git a/include/linux/trace_printk.h b/include/linux/trace_printk.h
index a488ea9e9f85..74ce4f8995c4 100644
--- a/include/linux/trace_printk.h
+++ b/include/linux/trace_printk.h
@@ -1,11 +1,12 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_TRACE_PRINTK_H
#define _LINUX_TRACE_PRINTK_H
+#if !defined(__ASSEMBLY__) && !defined(__GENKSYMS__) && !defined(BUILD_VDSO)
-#include <linux/compiler_attributes.h>
#include <linux/instruction_pointer.h>
#include <linux/stddef.h>
#include <linux/stringify.h>
+#include <linux/stdarg.h>
#ifdef CONFIG_TRACING
static inline __printf(1, 2)
@@ -147,5 +148,5 @@ ftrace_vprintk(const char *fmt, va_list ap)
return 0;
}
#endif /* CONFIG_TRACING */
-
+#endif /* !defined(__ASSEMBLY__) && !defined(__GENKSYMS__) && !defined(BUILD_VDSO) */
#endif
diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index 593e3b59e42e..2bb25caebb75 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -5,6 +5,7 @@
* Copyright (C) 2009 Steven Rostedt <srostedt@redhat.com>
*/
#include <linux/ring_buffer.h>
+#include <linux/trace_printk.h>
#include <linux/completion.h>
#include <linux/kthread.h>
#include <uapi/linux/sched/types.h>
diff --git a/samples/fprobe/fprobe_example.c b/samples/fprobe/fprobe_example.c
index bfe98ce826f3..de81b9b4ca7d 100644
--- a/samples/fprobe/fprobe_example.c
+++ b/samples/fprobe/fprobe_example.c
@@ -12,6 +12,7 @@
#define pr_fmt(fmt) "%s: " fmt, __func__
+#include <linux/trace_printk.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/fprobe.h>
diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
index bf2411aa6fd7..159190f4103f 100644
--- a/samples/ftrace/ftrace-direct-too.c
+++ b/samples/ftrace/ftrace-direct-too.c
@@ -1,6 +1,5 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/module.h>
-
#include <linux/mm.h> /* for handle_mm_fault() */
#include <linux/ftrace.h>
#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
diff --git a/samples/trace_printk/trace-printk.c b/samples/trace_printk/trace-printk.c
index cfc159580263..ff37aeb8523e 100644
--- a/samples/trace_printk/trace-printk.c
+++ b/samples/trace_printk/trace-printk.c
@@ -1,4 +1,5 @@
// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/trace_printk.h>
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/irq_work.h>
--
2.53.0
^ permalink raw reply related
* [PATCH v2 0/2] tracing: Remove trace_printk.h from kernel.h
From: Steven Rostedt @ 2026-06-22 13:07 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov
Remove trace_printk.h from kernel.h. Create a trace_controls.h header for the
places that need tracing prototypes like tracing_off(), and include
trace_printk.h directly in the places that need trace_printk().
Changes since v1: https://lore.kernel.org/all/20260621093430.264983361@kernel.org/
- Create a trace_controls.h header to move the prototypes into and not
include it back into kernel.h
- Just remove trace_printk.h from kernel.h with no alternative to keep the
previous behavior.
Steven Rostedt (2):
tracing: Move non-trace_printk prototypes into trace_controls.h
tracing: Remove trace_printk.h from kernel.h
----
arch/powerpc/kvm/book3s_xics.c | 1 +
arch/powerpc/xmon/xmon.c | 1 +
arch/s390/kernel/ipl.c | 1 +
arch/s390/kernel/machine_kexec.c | 1 +
drivers/gpu/drm/i915/gt/intel_gtt.h | 1 +
drivers/gpu/drm/i915/i915_gem.h | 2 ++
drivers/hwtracing/stm/dummy_stm.c | 4 +++
drivers/infiniband/hw/hfi1/trace_dbg.h | 1 +
drivers/tty/sysrq.c | 1 +
drivers/usb/early/xhci-dbc.c | 1 +
fs/ext4/inline.c | 1 +
include/linux/ftrace.h | 2 ++
include/linux/kernel.h | 1 -
include/linux/sunrpc/debug.h | 1 +
include/linux/trace_controls.h | 54 ++++++++++++++++++++++++++++++++
include/linux/trace_printk.h | 56 ++--------------------------------
kernel/debug/debug_core.c | 1 +
kernel/panic.c | 1 +
kernel/rcu/rcu.h | 2 ++
kernel/rcu/rcutorture.c | 1 +
kernel/trace/ring_buffer_benchmark.c | 1 +
kernel/trace/trace.h | 1 +
kernel/trace/trace_benchmark.c | 1 +
lib/sys_info.c | 1 +
samples/fprobe/fprobe_example.c | 1 +
samples/ftrace/ftrace-direct-too.c | 1 -
samples/trace_printk/trace-printk.c | 1 +
27 files changed, 86 insertions(+), 55 deletions(-)
create mode 100644 include/linux/trace_controls.h
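For readers following the split, the pattern behind the moved prototypes can
be pictured with a minimal userspace sketch, assuming CONFIG_TRACING is not
defined. This is not the kernel header itself (see patch 1/2 for that), and
stop_trace_on_error() is a hypothetical caller used only for illustration.

```c
/*
 * Userspace sketch of the CONFIG_TRACING stub pattern; this is not the
 * kernel header, only an illustration of its shape.
 */
#ifdef CONFIG_TRACING
void tracing_on(void);
void tracing_off(void);
int tracing_is_on(void);
#else
/* With tracing compiled out, callers still build and the calls vanish. */
static inline void tracing_on(void) { }
static inline void tracing_off(void) { }
static inline int tracing_is_on(void) { return 0; }
#endif

/* Hypothetical caller: freeze the trace when an error is seen. */
static inline int stop_trace_on_error(int err)
{
	if (err)
		tracing_off();
	return tracing_is_on();
}
```

A file that only needs these controls would include the small header and
never pull in the trace_printk() machinery.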
^ permalink raw reply
* Re: [PATCH 0/2] tracing: Move trace_printk.h out of kernel.h
From: Steven Rostedt @ 2026-06-22 13:08 UTC (permalink / raw)
To: Christophe Leroy (CS GROUP)
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
Peter Zijlstra, Julia Lawall, Yury Norov, linux-doc, linux-kbuild,
linuxppc-dev, dri-devel, linux-stm32, linux-arm-kernel,
linux-rdma, linux-usb, linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <dbb5915e-6587-4de9-87f3-76bea5024da8@kernel.org>
On Mon, 22 Jun 2026 10:05:13 +0200
"Christophe Leroy (CS GROUP)" <chleroy@kernel.org> wrote:
> > There's been complaints about trace_printk() being defined in kernel.h as it
> > can increase the compilation time. As it is only used by some developers for
> > debugging purposes, it should not be in kernel.h causing lots of wasted CPU
> > cycles for those that do not ever care about it.
>
> Do we have a measurement of the increased compilation time ?
I believe Yury does.
-- Steve
^ permalink raw reply
* Re: [syzbot] [trace?] general protection fault in mtree_load
From: Oleg Nesterov @ 2026-06-22 13:04 UTC (permalink / raw)
To: syzbot
Cc: bp, dave.hansen, hpa, linux-kernel, linux-trace-kernel, mhiramat,
mingo, peterz, syzkaller-bugs, tglx, x86
In-Reply-To: <6a38dd47.713c5d62.148f7.000c.GAE@google.com>
On 06/21, syzbot wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 6b5a2b7d9bc1 Merge tag 'trace-tools-v7.2' of git://git.ker..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=16d56986580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=ea6584355d75e0cd
> dashboard link: https://syzkaller.appspot.com/bug?extid=61ce80689253f42e6d80
> compiler: gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-6b5a2b7d.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/b3cb0499fbe9/vmlinux-6b5a2b7d.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/47cfbe57f6ea/bzImage-6b5a2b7d.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+61ce80689253f42e6d80@syzkaller.appspotmail.com
>
> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000011: 0000 [#1] SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x0000000000000088-0x000000000000008f]
> CPU: 3 UID: 0 PID: 24402 Comm: syz.4.5217 Tainted: G L syzkaller #0 PREEMPT(full)
> Tainted: [L]=SOFTLOCKUP
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> RIP: 0010:mas_root lib/maple_tree.c:759 [inline]
> RIP: 0010:mas_start lib/maple_tree.c:1179 [inline]
> RIP: 0010:mtree_load+0x16d/0xa90 lib/maple_tree.c:5657
> Code: 00 00 00 00 48 c7 44 24 78 ff ff ff ff e8 6b bd 84 f6 48 8b 5c 24 50 c6 84 24 9c 00 00 00 00 48 8d 7b 48 48 89 f8 48 c1 e8 03 <42> 80 3c 20 00 0f 85 d6 08 00 00 48 8b 5b 48 e8 6f 1a 08 00 31 ff
> RSP: 0018:ffffc900039c76d8 EFLAGS: 00010206
> RAX: 0000000000000011 RBX: 0000000000000040 RCX: ffffffff8b848746
> RDX: ffff888041b6a540 RSI: ffffffff8b848775 RDI: 0000000000000088
> RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000001
> R10: 0000000000000001 R11: 000000000000751b R12: dffffc0000000000
> R13: ffff88802693adc0 R14: 00001fff904365a7 R15: dffffc0000000000
> FS: 0000000000000000(0000) GS:ffff8880d665f000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f44aa04f156 CR3: 00000000364d5000 CR4: 0000000000352ef0
> Call Trace:
> <TASK>
> vma_lookup include/linux/mm.h:4204 [inline]
> __in_uprobe_trampoline arch/x86/kernel/uprobes.c:766 [inline]
> __is_optimized arch/x86/kernel/uprobes.c:1056 [inline]
> is_optimized arch/x86/kernel/uprobes.c:1067 [inline]
> set_orig_insn+0x1ec/0x2a0 arch/x86/kernel/uprobes.c:1098
> remove_breakpoint kernel/events/uprobes.c:1185 [inline]
> register_for_each_vma+0xbb7/0xdb0 kernel/events/uprobes.c:1318
> uprobe_unregister_nosync+0x12a/0x1c0 kernel/events/uprobes.c:1343
> bpf_uprobe_unregister kernel/trace/bpf_trace.c:2936 [inline]
> bpf_uprobe_multi_link_release+0xb3/0x1c0 kernel/trace/bpf_trace.c:2947
> bpf_link_free+0xec/0x4a0 kernel/bpf/syscall.c:3273
> bpf_link_put_direct kernel/bpf/syscall.c:3326 [inline]
> bpf_link_release+0x5d/0x80 kernel/bpf/syscall.c:3333
> __fput+0x3ff/0xb50 fs/file_table.c:512
> task_work_run+0x150/0x240 kernel/task_work.c:233
> exit_task_work include/linux/task_work.h:40 [inline]
current->mm is already NULL, the exiting task has already passed exit_mm().
Hopefully
[PATCHv4 01/13] uprobes/x86: Use proper mm_struct in __in_uprobe_trampoline
https://lore.kernel.org/all/20260526205840.173790-2-jolsa@kernel.org/
should help...
Oleg.
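As a rough cross-check of the register dump, and assuming the usual x86-64
generic KASAN mapping (shadow = (addr >> 3) + 0xdffffc0000000000), the
faulting shadow address can be recomputed from RDI. The helper name below is
made up for illustration; only the constants come from the report.

```c
#include <stdint.h>

/* Assumed x86-64 generic KASAN parameters (shift 3, offset as in R12/R15). */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL

/* Hypothetical helper: shadow byte address that covers 'addr'. */
static inline uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}
```

With RBX = 0x40 and the 0x48 displacement from "lea 0x48(%rbx),%rdi" in the
Code: bytes, RDI is 0x88, and kasan_shadow_addr(0x88) gives
0xdffffc0000000011, the non-canonical address in the oops. A base as small as
0x40 looks like a field offset from a NULL pointer, which fits the NULL mm.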
^ permalink raw reply
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM)
From: Vlastimil Babka (SUSE) @ 2026-06-22 12:31 UTC (permalink / raw)
To: Gregory Price
Cc: David Hildenbrand (Arm), Balbir Singh, lsf-pc, linux-kernel,
linux-cxl, cgroups, linux-mm, linux-trace-kernel, damon,
kernel-team, gregkh, rafael, dakr, dave, jonathan.cameron,
dave.jiang, alison.schofield, vishal.l.verma, ira.weiny,
dan.j.williams, longman, akpm, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, osalvador, ziy, matthew.brost,
joshua.hahnjy, rakie.kim, byungchul, ying.huang, apopple,
axelrasmussen, yuanchu, weixugc, yury.norov, linux, mhiramat,
mathieu.desnoyers, tj, hannes, mkoutny, jackmanb, sj, baolin.wang,
npache, ryan.roberts, dev.jain, baohua, lance.yang, muchun.song,
xu.xin16, chengming.zhou, jannh, linmiaohe, nao.horiguchi,
pfalcato, rientjes, shakeel.butt, riel, harry.yoo, cl,
roman.gushchin, chrisl, kasong, shikemeng, nphamcs, bhe,
zhengqi.arch, terry.bowman, Matthew Wilcox
In-Reply-To: <ajPS3AKrZEbZbXBw@gourry-fedora-PF4VCD3F>
On 6/18/26 13:13, Gregory Price wrote:
> On Thu, Jun 18, 2026 at 10:21:30AM +0200, Vlastimil Babka (SUSE) wrote:
>> On 6/15/26 17:37, Gregory Price wrote:
>> >
>> > One thought would be a way to switch what fallback list is used, and
>> > then have specific fallback lists for certain contexts.
>> >
>> > Right now there is a single example of this: __GFP_THISNODE
>> > |= __GFP_THISNODE => NOFALLBACK
>> > &= ~__GFP_THISNODE => FALLBACK
>> >
>> > We could add an interface with the desired fallback list passed as an
>> > argument, and let get_page_from_freelist() prefer that over the default
>> > global lists.
>>
>> Does it mean a new argument in a number of functions in the page allocator,
>> or can it be mapped to alloc_flags (at least internally?), because the
>> number of possible fallback lists is small enough?
>>
>
> What I ended up with was adding a single page_alloc.c external interface
> that allows you to define the zonelist via an enum, and then an internal
> selector resolution in prepare_alloc_pages() stored in alloc_context
OK. Since it's in alloc_context then there should be no parameter bloat
inside page allocator. And for the single external entry point it's better
to be explicit.
>
> eg:
>
> static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
> int preferred_nid, nodemask_t *nodemask,
> struct alloc_context *ac, gfp_t *alloc_gfp,
> unsigned int *alloc_flags)
> {
> ac->highest_zoneidx = gfp_zone(gfp_mask);
> ac->zonelist = select_zonelist(preferred_nid, gfp_mask, ac->zlsel);
> ... snip ...
> }
>
> struct folio *__folio_alloc_zonelist_noprof(gfp_t gfp, unsigned int order,
> int preferred_nid, nodemask_t *nodemask,
> enum alloc_zonelist zlsel);
>
>
> The original __folio_alloc* functions just add a DEFAULT - which tells
> select_zonelist() to base the decision on __GFP_THISNODE.
>
>
> struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
> nodemask_t *nodemask)
> {
> return __folio_alloc_core(gfp, order, preferred_nid, nodemask,
> ALLOC_ZONELIST_DEFAULT);
> }
> EXPORT_SYMBOL(__folio_alloc_noprof);
>
>
> This does a few things
> - The isolation is structural, there is no way to accidentally
> allocate private memory without passing ALLOC_ZONELIST_PRIVATE
>
> - The isolation forces folios - there are no non-folio interfaces
> which allow zonelist selection
>
> - The zonelist selection is confined to this allocation context,
> so no inheritance is possible.
>
Ack.
>
> I tried to avoid using an ALLOC_ flag so we can avoid yet another flag
> crunch, but there certainly are few enough zonelists that we could
> encode it there and expose it. I know Brendan was looking at plumbing
> alloc flags out to an interface, so I'm open to that.
>
> Externally the way I determine what zonelist to use is a lookup based on
> reason - letting the node filter. This is really only needed in a
> couple spots:
>
> mm/khugepaged.c: enum alloc_zonelist zlsel = alloc_zonelist_for_node(node, NODE_ALLOC_RECLAIM);
> mm/vmscan.c: mtc->zlsel = alloc_zonelist_for_nodemask(mtc->nmask, NODE_ALLOC_TIERING);
> mm/migrate.c: .zlsel = alloc_zonelist_for_node(node, NODE_ALLOC_USER_MIGRATE),
>
> static inline enum alloc_zonelist
> alloc_zonelist_for_node(int nid, enum node_alloc_reason reason)
> {
> bool ok;
>
> if (!node_state(nid, N_MEMORY_PRIVATE))
> return ALLOC_ZONELIST_DEFAULT;
> switch (reason) {
> case NODE_ALLOC_RECLAIM:
> ok = node_is_reclaimable(nid);
> break;
> case NODE_ALLOC_TIERING:
> ok = node_allows_tiering(nid);
> break;
> case NODE_ALLOC_USER_MIGRATE:
> ok = node_allows_user_migrate(nid);
> break;
> default:
> ok = false;
> }
> return ok ? ALLOC_ZONELIST_PRIVATE : ALLOC_ZONELIST_DEFAULT;
> }
>
> Otherwise... everything is now a mempolicy w/ MPOL_F_BIND and all the
> handling goes through the normal fault-paths :]
>
> static struct page *__alloc_pages_mpol(gfp_t gfp, unsigned int order,
> struct mempolicy *pol, pgoff_t ilx, int nid)
> {
> nodemask_t *nodemask;
> struct page *page;
> enum alloc_zonelist zlsel = (pol->flags & MPOL_F_PRIVATE) ?
> ALLOC_ZONELIST_PRIVATE : ALLOC_ZONELIST_DEFAULT;
> ...
> if (pol->mode == MPOL_PREFERRED_MANY)
> return alloc_pages_preferred_many(gfp, order, nid, nodemask,
> zlsel);
> ...
> }
>
>
> Switching to an alloc_flag would probably be trivial if that's really
> wanted
I guess not. Thanks for the explanation!
> ~Gregory
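select_zonelist() itself is not shown in the thread. The following is only a
guess at its shape, written as a userspace sketch from the description above:
an explicit PRIVATE request wins, and DEFAULT falls back to the existing
__GFP_THISNODE decision. Every SKETCH_/sketch_ name and value is hypothetical
and not taken from the series.

```c
/* Mock flag and enums for a userspace sketch; values are not the kernel's. */
#define SKETCH_GFP_THISNODE 0x1u

enum alloc_zonelist {
	ALLOC_ZONELIST_DEFAULT,
	ALLOC_ZONELIST_PRIVATE,
};

enum sketch_zonelist {
	SKETCH_ZONELIST_FALLBACK,
	SKETCH_ZONELIST_NOFALLBACK,
	SKETCH_ZONELIST_PRIVATE,
};

/*
 * Hypothetical selector: an explicit PRIVATE request wins; DEFAULT falls
 * back to the existing __GFP_THISNODE decision.
 */
static inline enum sketch_zonelist
sketch_select_zonelist(unsigned int gfp, enum alloc_zonelist zlsel)
{
	if (zlsel == ALLOC_ZONELIST_PRIVATE)
		return SKETCH_ZONELIST_PRIVATE;
	return (gfp & SKETCH_GFP_THISNODE) ? SKETCH_ZONELIST_NOFALLBACK
					   : SKETCH_ZONELIST_FALLBACK;
}
```

The point of the design, as described in the thread, is that no GFP flag can
reach the private list; only the explicit selector can.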
^ permalink raw reply
* Re: [PATCH v4 6/7] Documentation: bootconfig: document build-time cmdline rendering
From: Breno Leitao @ 2026-06-22 12:30 UTC (permalink / raw)
To: Masami Hiramatsu
Cc: Andrew Morton, Nathan Chancellor, paulmck, Nicolas Schier,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
bpf, kernel-team
In-Reply-To: <20260618094719.17bf5448351adc2e56c267fb@kernel.org>
On Thu, Jun 18, 2026 at 09:47:19AM +0900, Masami Hiramatsu wrote:
> On Wed, 17 Jun 2026 02:56:23 -0700
> Breno Leitao <leitao@debian.org> wrote:
>
> > On Wed, Jun 10, 2026 at 07:58:10AM -0700, Breno Leitao wrote:
> > > On Wed, Jun 10, 2026 at 11:37:20PM +0900, Masami Hiramatsu wrote:
> > > > To avoid confusion, when this option is used, shouldn't we treat it
> > > > the same way as if embedded command lines were enabled, and either
> > > > not display it in /proc/bootconfig (or always display it, by merging
> > > > the rendered string)?
> > >
> > > You're right that EMBED_CMDLINE breaks it: the embedded kernel.* keys
> > > are already in boot_command_line before setup_boot_config() ever sees
> > > the initrd bconf, so a user reading /proc/bootconfig would see only
> > > the initrd keys while parse_early_param() acted on the embedded ones.
> > > That's exactly the split-state Sashiko was circling around.
> > >
> > > Both options you suggest work for me, but they pull in opposite
> > > directions and I'd rather not guess wrong on the user-facing
> > > contract. Which do you prefer for v5?
> > >
> > > (a) Don't display embedded in /proc/bootconfig -- keep the current
> > > "file shows the active bootconfig source" behavior and document
> > > that with EMBED_CMDLINE=y, the kernel.* subtree may have been
> > > applied separately via the cmdline.
> > >
> > > (b) Always display embedded by merging the rendered string into
> > > /proc/bootconfig when EMBED_CMDLINE=y, so the file reflects
> > > what was actually applied.
> > >
> > > Happy to go either way
> >
> > Following up on my own mail rather than leaving it fully open: after
> > looking at the code more, I'd like to recommend (a).
>
> Agreed. Sorry for replying late.
No problem, thanks. Quick heads-up: v5 already went out and crossed with
this mail. It takes (a) and extends bootconfig.rst to walk through the
four sources (bootloader cmdline, embedded cmdline, initrd bootconfig,
embedded bootconfig), so that part is already in flight:
https://lore.kernel.org/r/20260617-bootconfig_using_tools-v5-0-fd589a9cc5e3@debian.org
The naming/mutual-exclusion rework below I'll fold into v6.
> Indeed. So I think this EMBED_CMDLINE is more like CMDLINE set by
> bootconfig file, instead of embedded string. That is useful for reusing
> the boot options. We need to change the explanation and clarify it.
Agreed, that's a much clearer model. v6 will reframe the Kconfig help and
bootconfig.rst around "this is CONFIG_CMDLINE, sourced from a bootconfig
file at build time" rather than "an embedded bootconfig that also feeds
the cmdline".
It also matches what the code already does precedence-wise: the rendered
"kernel" string is prepended to boot_command_line in setup_arch(), so it
sits in front of the bootloader args and parse_args() last-wins lets the
bootloader override it -- i.e. exactly CONFIG_CMDLINE without _OVERRIDE.
So this is mostly a rename + dependency + docs change, not a behavioral
one. (A _FORCE/_EXTEND-style variant could come later if there's demand;
the current behavior is the plain "overridable default" one.)
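To make the precedence concrete, here is a minimal userspace sketch of
"prepend the default, last occurrence wins". It is not the kernel's
parse_args(); cmdline_last_value() and build_cmdline() are hypothetical
helpers that only mimic the ordering described above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan a space-separated "key=value" command line and
 * copy the value of the last occurrence of 'key' into 'out'.
 * Returns 1 if the key was found, 0 otherwise.
 */
static int cmdline_last_value(const char *cmdline, const char *key,
			      char *out, size_t outlen)
{
	size_t klen = strlen(key);
	const char *p = cmdline;
	int found = 0;

	while (*p) {
		size_t toklen;

		while (*p == ' ')
			p++;
		toklen = strcspn(p, " ");
		if (toklen > klen && !strncmp(p, key, klen) && p[klen] == '=') {
			/* A later occurrence overwrites an earlier one. */
			snprintf(out, outlen, "%.*s",
				 (int)(toklen - klen - 1), p + klen + 1);
			found = 1;
		}
		p += toklen;
	}
	return found;
}

/* Prepend the built-in default so that bootloader arguments come last. */
static void build_cmdline(char *dst, size_t dstlen,
			  const char *builtin, const char *bootloader)
{
	snprintf(dst, dstlen, "%s %s", builtin, bootloader);
}
```

With a rendered default of "loglevel=3 quiet" and bootloader arguments
"loglevel=7", the combined line is "loglevel=3 quiet loglevel=7" and the
lookup for loglevel yields "7", i.e. the bootloader overrides the default.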
> Thus we should make those configs mutually exclusive. If the user already
> sets CONFIG_CMDLINE, EMBED_CMDLINE should not be enabled.
Makes sense -- two built-in cmdline sources at once is confusing. I'll
make them mutually exclusive in v6. I'm thinking:
depends on CMDLINE = ""
on the new symbol. On x86 CONFIG_CMDLINE is a string that depends on
CMDLINE_BOOL and defaults to "", so this reads as "only offer the
bootconfig-rendered cmdline when no static CONFIG_CMDLINE is configured",
and it works the same on other arches that define CMDLINE as a string.
Does that match what you had in mind, or would you rather gate it the
other way (CMDLINE depends on !the-new-symbol)?
> So you can see CONFIG_BOOT_CONFIG_EMBED_CMDLINE is a bit special.
> I think it may be natural to call it CONFIG_CMDLINE_BOOT_CONFIG.
> In this case, we render the cmdline string from bootconfig at build time
> and set CONFIG_CMDLINE with the rendered cmdline string.
I'll rename it for v6. One nit: the arch opt-in symbol is already
ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG, so CONFIG_CMDLINE_FROM_BOOTCONFIG would
pair with it verbatim. I'll use CONFIG_CMDLINE_FROM_BOOTCONFIG unless you'd
rather keep CONFIG_CMDLINE_BOOT_CONFIG -- either is fine by me.
One clarification on "set CONFIG_CMDLINE with the rendered string":
CONFIG_CMDLINE is a Kconfig string fixed when .config is read, while the
render happens later during the build, so we can't literally store the
rendered text into CONFIG_CMDLINE. The mechanism stays "render into
.init.rodata, merge into boot_command_line in setup_arch()"; what changes
is how we name and document it, plus the mutual exclusion above. Let me
know if you can envision a way to get it done.
> I think we can proceed without rendering it in /proc/bootconfig
> at this point. If we later find a way to detect early parameters
> correctly, we can fix it.
Sounds good. I'll document the sharp edge (with both an embedded cmdline and an
initrd bootconfig, early params reflect the embedded values because the initrd
isn't parsed yet) and leave the early-param-aware override detection as the
follow-up you describe.
> (BTW, the early parameter problem is a bit complicated. It is not hard
> to distinguish early parameters, but the kernel accepts the same key
> for both an early parameter and a normal parameter, e.g. "console=".)
Right, console= being both is the awkward case. Agreed that's better as
its own series once we have a reliable way to detect early params.
So the v6 plan:
- rename CONFIG_BOOT_CONFIG_EMBED_CMDLINE -> CONFIG_CMDLINE_FROM_BOOTCONFIG
(or _BOOT_CONFIG, your call)
- make it mutually exclusive with CONFIG_CMDLINE (depends on CMDLINE = "")
- reframe the Kconfig help + bootconfig.rst as "CONFIG_CMDLINE from a
bootconfig file"
- keep (a): no rendering in /proc/bootconfig; document the early-param
sharp edge
- defer early-param-aware override detection to a follow-up
Thanks for the direction,
--breno
^ permalink raw reply
* Re: [PATCH v3 0/7] Prepare mutable list iterators to cache cursor state
From: David Hildenbrand (Arm) @ 2026-06-22 11:27 UTC (permalink / raw)
To: Alexei Starovoitov, Kaitao Cheng
Cc: Andrew Morton, Jens Axboe, Tejun Heo, Alexander Viro,
Christian Brauner, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
Juri Lelli, Vincent Guittot, Paul Moore, Andy Shevchenko,
Paul E. McKenney, Shakeel Butt, Christian König,
David Howells, Simona Vetter, Randy Dunlap, Luca Ceresoli,
Philipp Stanner, linux-block, LKML,
open list:CONTROL GROUP (CGROUP), linux-ntfs-dev, Linux-Fsdevel,
io-uring, audit, bpf, Network Development, dri-devel,
linux-perf-use., linux-trace-kernel, kexec, live-patching,
linux-modules, Linux Crypto Mailing List, Linux Power Management,
rcu, sched-ext, linux-mm, virtualization, damon,
clang-built-linux, chengkaitao
In-Reply-To: <CAADnVQJmPWFT01b7DuLdtafv=8FyB84GYHNZ8zSTck+9Aw0JpA@mail.gmail.com>
On 6/22/26 07:28, Alexei Starovoitov wrote:
> On Sun, Jun 21, 2026 at 9:06 PM Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>>
>> From: chengkaitao <chengkaitao@kylinos.cn>
>>
>> The list_for_each*_safe() helpers are used when the loop body may remove
>> the current entry. Their current interface, however, forces every caller
>> to define a temporary cursor outside the macro and pass it in, even when
>> the caller never uses that cursor directly. For most call sites this
>> extra cursor is just boilerplate required by the macro implementation.
>>
>> This is awkward because the saved next pointer is an internal detail of
>> the iteration. Callers that only remove or move the current entry do not
>> need to spell it out.
>>
>> The _safe() suffix has also caused confusion. Christian Koenig pointed
>> out that the name is easy to read as a thread-safe variant, especially
>> for beginners, even though it only means that the iterator keeps enough
>> state to tolerate removal of the current entry. He suggested _mutable()
>> as a clearer description of what the loop permits.
>>
>> Add *_mutable() iterator variants for list, hlist and llist. The new
>> helpers are variadic and support both forms. In the common case, the
>> caller omits the temporary cursor and the macro creates a unique internal
>> cursor with typeof(pos) and __UNIQUE_ID(). If a loop really needs an
>> explicit temporary cursor, the caller can still pass it and the helper
>> keeps the existing *_safe() behaviour.
>>
>> For example, a call site may use the shorter form:
>>
>> list_for_each_entry_mutable(pos, head, member)
>>
>> or keep the explicit temporary cursor form:
>>
>> list_for_each_entry_mutable(pos, tmp, head, member)
>>
>> The existing *_safe() helpers remain available for compatibility. This
>> series only converts users in mm, block, kernel, init and io_uring. If
>> this approach looks acceptable, the remaining users can be converted in
>> follow-up series.
>>
>> Changes in v3 (Christian König, Andy Shevchenko):
>> - Convert safe list walks to mutable iterators
>>
>> Changes in v2 (Muchun Song, Andy Shevchenko):
>> - Drop the list_for_each_entry_mutable*() helpers from v1 and make the
>> cursor change directly in the existing list_for_each_entry*() helpers.
>> - Open-code special list walks that rely on updating the loop cursor in
>> the body, preserving their existing traversal semantics.
>>
>> Link to v2:
>> https://lore.kernel.org/all/20260609061347.93688-1-kaitao.cheng@linux.dev/
>>
>> Link to v1:
>> https://lore.kernel.org/all/20260529082149.76764-1-kaitao.cheng@linux.dev/
>>
>> Kaitao Cheng (7):
>> list: Add mutable iterator variants
>> llist: Add mutable iterator variants
>> mm: Use mutable list iterators
>> block: Use mutable list iterators
>> kernel: Use mutable list iterators
>> initramfs: Use mutable list iterator
>> io_uring: Use mutable list iterators
>>
>> block/bfq-iosched.c | 17 +-
>> block/blk-cgroup.c | 12 +-
>> block/blk-flush.c | 4 +-
>> block/blk-iocost.c | 18 +-
>> block/blk-mq.c | 8 +-
>> block/blk-throttle.c | 4 +-
>> block/kyber-iosched.c | 4 +-
>> block/partitions/ldm.c | 8 +-
>> block/sed-opal.c | 4 +-
>> include/linux/list.h | 269 ++++++++++++++++++++++++----
>> include/linux/llist.h | 81 +++++++--
>> init/initramfs.c | 5 +-
>> io_uring/cancel.c | 6 +-
>> io_uring/poll.c | 3 +-
>> io_uring/rw.c | 4 +-
>> io_uring/timeout.c | 8 +-
>> io_uring/uring_cmd.c | 3 +-
>> kernel/audit_tree.c | 4 +-
>> kernel/audit_watch.c | 16 +-
>> kernel/auditfilter.c | 4 +-
>> kernel/auditsc.c | 4 +-
>> kernel/bpf/arena.c | 10 +-
>> kernel/bpf/arraymap.c | 8 +-
>> kernel/bpf/bpf_local_storage.c | 3 +-
>> kernel/bpf/bpf_lru_list.c | 25 ++-
>> kernel/bpf/btf.c | 18 +-
>> kernel/bpf/cgroup.c | 7 +-
>> kernel/bpf/cpumap.c | 4 +-
>> kernel/bpf/devmap.c | 10 +-
>> kernel/bpf/helpers.c | 8 +-
>> kernel/bpf/local_storage.c | 4 +-
>> kernel/bpf/memalloc.c | 16 +-
>> kernel/bpf/offload.c | 8 +-
>> kernel/bpf/states.c | 4 +-
>> kernel/bpf/stream.c | 4 +-
>> kernel/bpf/verifier.c | 6 +-
>> kernel/cgroup/cgroup-v1.c | 4 +-
>> kernel/cgroup/cgroup.c | 54 +++---
>> kernel/cgroup/dmem.c | 12 +-
>> kernel/cgroup/rdma.c | 8 +-
>> kernel/events/core.c | 44 +++--
>> kernel/events/uprobes.c | 12 +-
>> kernel/exit.c | 8 +-
>> kernel/fail_function.c | 4 +-
>> kernel/gcov/clang.c | 4 +-
>> kernel/irq_work.c | 4 +-
>> kernel/kexec_core.c | 4 +-
>> kernel/kprobes.c | 16 +-
>> kernel/livepatch/core.c | 4 +-
>> kernel/livepatch/core.h | 4 +-
>> kernel/liveupdate/kho_block.c | 4 +-
>> kernel/liveupdate/luo_flb.c | 4 +-
>> kernel/locking/rwsem.c | 2 +-
>> kernel/locking/test-ww_mutex.c | 2 +-
>> kernel/module/main.c | 11 +-
>> kernel/padata.c | 4 +-
>> kernel/power/snapshot.c | 8 +-
>> kernel/power/wakelock.c | 4 +-
>> kernel/printk/printk.c | 11 +-
>> kernel/ptrace.c | 4 +-
>> kernel/rcu/rcutorture.c | 3 +-
>> kernel/rcu/tasks.h | 9 +-
>> kernel/rcu/tree.c | 6 +-
>> kernel/resource.c | 4 +-
>> kernel/sched/core.c | 4 +-
>> kernel/sched/ext.c | 22 +--
>> kernel/sched/fair.c | 28 +--
>> kernel/sched/topology.c | 4 +-
>> kernel/sched/wait.c | 4 +-
>> kernel/seccomp.c | 4 +-
>> kernel/signal.c | 11 +-
>> kernel/smp.c | 4 +-
>> kernel/taskstats.c | 8 +-
>> kernel/time/clockevents.c | 6 +-
>> kernel/time/clocksource.c | 4 +-
>> kernel/time/posix-cpu-timers.c | 4 +-
>> kernel/time/posix-timers.c | 3 +-
>> kernel/torture.c | 3 +-
>> kernel/trace/bpf_trace.c | 4 +-
>> kernel/trace/ftrace.c | 49 +++--
>> kernel/trace/ring_buffer.c | 25 ++-
>> kernel/trace/trace.c | 12 +-
>> kernel/trace/trace_dynevent.c | 6 +-
>> kernel/trace/trace_dynevent.h | 5 +-
>> kernel/trace/trace_events.c | 35 ++--
>> kernel/trace/trace_events_filter.c | 4 +-
>> kernel/trace/trace_events_hist.c | 8 +-
>> kernel/trace/trace_events_trigger.c | 17 +-
>> kernel/trace/trace_events_user.c | 16 +-
>> kernel/trace/trace_stat.c | 4 +-
>> kernel/user-return-notifier.c | 3 +-
>> kernel/workqueue.c | 16 +-
>> mm/backing-dev.c | 8 +-
>> mm/balloon.c | 8 +-
>> mm/cma.c | 4 +-
>> mm/compaction.c | 4 +-
>> mm/damon/core.c | 4 +-
>> mm/damon/sysfs-schemes.c | 4 +-
>> mm/dmapool.c | 4 +-
>> mm/huge_memory.c | 8 +-
>> mm/hugetlb.c | 56 +++---
>> mm/hugetlb_vmemmap.c | 16 +-
>> mm/khugepaged.c | 14 +-
>> mm/kmemleak.c | 7 +-
>> mm/ksm.c | 25 +--
>> mm/list_lru.c | 4 +-
>> mm/memcontrol-v1.c | 8 +-
>> mm/memory-failure.c | 12 +-
>> mm/memory-tiers.c | 4 +-
>> mm/migrate.c | 23 ++-
>> mm/mmu_notifier.c | 9 +-
>> mm/page_alloc.c | 8 +-
>> mm/page_reporting.c | 2 +-
>> mm/percpu.c | 11 +-
>> mm/pgtable-generic.c | 4 +-
>> mm/rmap.c | 10 +-
>> mm/shmem.c | 9 +-
>> mm/slab_common.c | 14 +-
>> mm/slub.c | 33 ++--
>> mm/swapfile.c | 4 +-
>> mm/userfaultfd.c | 12 +-
>> mm/vmalloc.c | 24 +--
>> mm/vmscan.c | 7 +-
>> mm/zsmalloc.c | 4 +-
>> 124 files changed, 875 insertions(+), 681 deletions(-)
>
> Not sure what you were thinking, but this diff stat
> is not landable.
Agreed. If we decide we want this, I guess we should target per-subsystem
conversions.
If this goes through the MM tree, I would even appreciate doing this on a per-MM
component granularity.
(unless we have some magic "Linus converts all of them" script, which I doubt we
will have)
Is there a way forward to replace list_for_each_*_safe entirely, possibly just
reusing the old name but simplifying the parameters?
--
Cheers,
David
^ permalink raw reply
* Re: [PATCH v3 0/7] Prepare mutable list iterators to cache cursor state
From: Andy Shevchenko @ 2026-06-22 10:46 UTC (permalink / raw)
To: Kaitao Cheng
Cc: Alexei Starovoitov, Andrew Morton, David Hildenbrand, Jens Axboe,
Tejun Heo, Alexander Viro, Christian Brauner, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
Paul E. McKenney, Shakeel Butt, Christian König,
David Howells, Simona Vetter, Randy Dunlap, Luca Ceresoli,
Philipp Stanner, linux-block, LKML,
open list:CONTROL GROUP (CGROUP), linux-ntfs-dev, Linux-Fsdevel,
io-uring, audit, bpf, Network Development, dri-devel,
linux-perf-use., linux-trace-kernel, kexec, live-patching,
linux-modules, Linux Crypto Mailing List, Linux Power Management,
rcu, sched-ext, linux-mm, virtualization, damon,
clang-built-linux, chengkaitao, Muchun Song
In-Reply-To: <8c8f1849-86d3-4c69-be27-30bbdffdf616@linux.dev>
On Mon, Jun 22, 2026 at 02:15:01PM +0800, Kaitao Cheng wrote:
> 在 2026/6/22 13:28, Alexei Starovoitov 写道:
> > On Sun, Jun 21, 2026 at 9:06 PM Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
...
> >> 124 files changed, 875 insertions(+), 681 deletions(-)
> >
> > Not sure what you were thinking, but this diff stat
> > is not landable.
>
> [PATCH v3 1/7] and [PATCH v3 2/7] contain the main logic and can
> be merged directly. They are also compatible with the old API.
> [PATCH v3 3/7] through [PATCH v3 7/7] are just simple interface
> replacements and do not change any functional logic. They can be
> left unmerged for now; individual modules can pick them up later
> if needed.
>
> In v2, Andy Shevchenko mentioned: "If it's done by Linus himself
> during the day when he prepares -rc1, it's fine."
Yes, but you need to get his blessing first to go with this.
Have you communicated with him on this?
> Even so, the
> changes in this patch series are indeed quite large and touch
> almost every subsystem. I have only converted part of them for
> now, so I wanted to send this out first and see what people think.
That's why it's better to provide a script to convert (e.g., coccinelle)
instead of tons of patches.
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* Re: [RFC PATCH 1/3] mm/compaction: skip isolate mlocked folios when compact_unevictable_allowed=0
From: Vlastimil Babka (SUSE) @ 2026-06-22 9:55 UTC (permalink / raw)
To: Wandun, linux-mm, linux-kernel, linux-trace-kernel,
linux-rt-devel
Cc: akpm, surenb, mhocko, jackmanb, hannes, ziy, rostedt, mhiramat,
mathieu.desnoyers, david, ljs, liam, rppt, bigeasy, clrkwllms,
Alexander.Krabler, Hugh Dickins
In-Reply-To: <040788a9-e0d5-478e-bb48-3d22b8b41020@gmail.com>
On 6/18/26 13:43, Wandun wrote:
>
>
> On 6/18/26 02:52, Vlastimil Babka (SUSE) wrote:
>> On 6/4/26 04:38, Wandun Chen wrote:
>>> From: Wandun Chen <chenwandun@lixiang.com>
>>>
>>> compact_unevictable_allowed is default 0 under PREEMPT_RT,
>>> isolate_migratepages_block() skips folios with PG_unevictable set.
>>> However, mlock_folio() sets PG_mlocked immediately but defers
>>> PG_unevictable to mlock_folio_batch(), result in a folio with
>>> PG_mlocked=1 but PG_unevictable=0. Compaction will isolate such a
>>> folio.
>>>
>>> Fix by checking folio_test_mlocked() together with the existing
>>> folio_test_unevictable() check.
>>>
>>> A similar issue has been reported by Alexander Krabler on a 6.12-rt
>>> aarch64 system. Vlastimil suggested to check the mlocked flag [1].
>>>
>>> Reported-by: Alexander Krabler <Alexander.Krabler@kuka.com>
>>> Closes: https://lore.kernel.org/all/DU0PR01MB10385345F7153F334100981888259A@DU0PR01MB10385.eurprd01.prod.exchangelabs.com/
>>> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
>>> Signed-off-by: Wandun Chen <chenwandun@lixiang.com>
>>> Link: https://lore.kernel.org/all/33275585-f2db-4779-89f0-3ae24b455a67@suse.cz/ [1]
>>
>> Well in that thread, Hugh doubted my suggestion and then it seems we didn't
>> concluded anything. Did you actually in practice observe the issue that
>> Alexander had, and that this patch fixed it, or is that theoretical?
>>
> Yes, I wrote a test case that can reproduce it in a few second.
>
> The test case contains 3 steps:
> 1. mlockall
> 2. mmap file(2GB) + trigger file write page fault;
> 3. during step 1, trigger compact via /proc/sys/vm/compact_memory
>
>
> My reproduction environment is qemu with 4GB ram, 8 core, aarch64,
> preempt_rt and includes the tracepoint in patch 02.
> After running the reproduction program for a few seconds, the
> following output appears.
Ah, nice.
> repro-403 [004] ....1 101.270505: mm_compaction_isolate_folio: pfn=0x71e3a mode=0x0 flags=referenced|uptodate|mlocked
> repro-403 [004] ....1 101.270507: mm_compaction_isolate_folio: pfn=0x71e3b mode=0x0 flags=referenced|uptodate|mlocked
> repro-403 [004] ....1 101.270513: mm_compaction_isolate_folio: pfn=0x71e3c mode=0x0 flags=referenced|uptodate|mlocked
> repro-403 [004] ....1 101.270515: mm_compaction_isolate_folio: pfn=0x71e3d mode=0x0 flags=uptodate|mlocked
> repro-403 [004] ....1 101.270517: mm_compaction_isolate_folio: pfn=0x71e3e mode=0x0 flags=uptodate|mlocked
> repro-403 [004] ....1 101.270520: mm_compaction_isolate_folio: pfn=0x71e3f mode=0x0 flags=uptodate|mlocked
>
>
> Unfortunately, I recently found that there is still a bug in the
> fix patch. Setting mlocked in the mlock_folio function could happen
> even after the page is successfully isolated, so it still cannot
> prevent migration. Because of this, I need to think more about how
> to fix it.
>
> Perhaps we should double-check whether the page is mlocked during
> the actual migration phase.
So IIUC the isolation+migration might be started between the folio being
allocated and being mlocked? In that case the check during migration could still
be racy, and if the page is isolated, it's already bad for the RT process.
So this would only be a short-term problem after the mlockall, but we don't
have a way for the RT process to know the moment it's all settled, right?
Probably the proper solution would be for mlock[all]() itself to wait for an
isolated page, and only continue once it knows it can't be isolated anymore.
This might, however, go against some of the folio batching optimizations?
> What do you think of this best-effort approach?
>
>
> Best regards,
> Wandun
>
>
>
>
>
> The full reproducer is as below:
>
> /* gcc repro.c -o repro -lpthread */
>
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <pthread.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <unistd.h>
>
> #define PAGE_SIZE 4096
> #define NR_PAGES 32
> #define FILE_SIZE (2ULL * 1024 * 1024 * 1024)
>
> static void *worker_fn(void *arg)
> {
> int fd = (long)arg;
> size_t len = (size_t)FILE_SIZE;
> char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> if (p == MAP_FAILED)
> return NULL;
>
> for (size_t off = 0; off + NR_PAGES * PAGE_SIZE <= len;
> off += NR_PAGES * PAGE_SIZE) {
> for (int i = 0; i < NR_PAGES; i++)
> p[off + i * PAGE_SIZE] = 1;
> usleep(200);
> }
>
> munmap(p, len);
> return NULL;
> }
>
> static void *compact_fn(void *arg)
> {
> (void)arg;
> int fd = open("/proc/sys/vm/compact_memory", O_WRONLY);
> if (fd < 0)
> return NULL;
>
> while (1) {
> if (write(fd, "1", 1) < 0) {}
> usleep(5000);
> }
> }
>
> int main(void)
> {
> mlockall(MCL_CURRENT | MCL_FUTURE);
>
> int fd = open("./repro_largefile.dat", O_RDWR | O_CREAT, 0600);
> if (fd < 0)
> return 1;
> unlink("./repro_largefile.dat");
> if (ftruncate(fd, (off_t)FILE_SIZE) < 0)
> return 1;
>
> printf("repro_largefile: 1 worker, %d pages/batch, Ctrl-C to stop\n",
> NR_PAGES);
>
> pthread_t compact, worker;
> pthread_create(&compact, NULL, compact_fn, NULL);
> pthread_create(&worker, NULL, worker_fn, (void *)(long)fd);
>
> pthread_join(worker, NULL);
> return 0;
> }
>
>>> ---
>>> mm/compaction.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index b776f35ad020..7e07b792bcb5 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -1116,7 +1116,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>> is_unevictable = folio_test_unevictable(folio);
>>>
>>> /* Compaction might skip unevictable pages but CMA takes them */
>>> - if (!(mode & ISOLATE_UNEVICTABLE) && is_unevictable)
>>> + if (!(mode & ISOLATE_UNEVICTABLE) &&
>>> + (is_unevictable || folio_test_mlocked(folio)))
>>> goto isolate_fail_put;
>>>
>>> /*
>>
>
^ permalink raw reply
* [PATCH v2] tracing: Use seq_buf for string concatenation
From: Woradorn Laodhanadhaworn @ 2026-06-22 9:46 UTC (permalink / raw)
To: rostedt
Cc: mhiramat, mathieu.desnoyers, linux-kernel, linux-trace-kernel,
linux-hardening, linux-kernel-mentees, shuah, skhan, me,
jkoolstra, woradorn.laon
In preparation for removing the strlcat API[1],
replace the string concatenation logic with a struct seq_buf,
which tracks the current position and the remaining space internally.
Use seq_buf_str() to NUL-terminate before passing to early_enable_events().
Link: https://github.com/KSPP/linux/issues/370 [1]
Signed-off-by: Woradorn Laodhanadhaworn <woradorn.laon@gmail.com>
---
v1 -> v2: Fixed WARN_ON when booting with empty trace_event.
v1: https://lore.kernel.org/all/20260620175441.223342-1-woradorn.laon@gmail.com
kernel/trace/trace_events.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index c46e623e7e0d..1be62a46e49a 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -22,6 +22,7 @@
#include <linux/sort.h>
#include <linux/slab.h>
#include <linux/delay.h>
+#include <linux/seq_buf.h>
#include <trace/events/sched.h>
#include <trace/syscall.h>
@@ -4500,14 +4501,20 @@ static void __add_event_to_tracers(struct trace_event_call *call)
extern struct trace_event_call *__start_ftrace_events[];
extern struct trace_event_call *__stop_ftrace_events[];
-static char bootup_event_buf[COMMAND_LINE_SIZE] __initdata;
+static struct seq_buf bootup_event_buf __initdata = {
+ .buffer = (char[COMMAND_LINE_SIZE]) {},
+ .size = COMMAND_LINE_SIZE,
+};
static __init int setup_trace_event(char *str)
{
- if (bootup_event_buf[0] != '\0')
- strlcat(bootup_event_buf, ",", COMMAND_LINE_SIZE);
+ if (seq_buf_used(&bootup_event_buf) > 0)
+ seq_buf_puts(&bootup_event_buf, ",");
+
+ seq_buf_puts(&bootup_event_buf, str);
- strlcat(bootup_event_buf, str, COMMAND_LINE_SIZE);
+ if (seq_buf_has_overflowed(&bootup_event_buf))
+ return -ENOMEM;
trace_set_ring_buffer_expanded(NULL);
disable_tracing_selftest("running event tracing");
@@ -4766,7 +4773,7 @@ static __init int event_trace_enable(void)
*/
__trace_early_add_events(tr);
- early_enable_events(tr, bootup_event_buf, false);
+ early_enable_events(tr, (char *)seq_buf_str(&bootup_event_buf), false);
trace_printk_start_comm();
@@ -4794,7 +4801,7 @@ static __init int event_trace_enable_again(void)
if (!tr)
return -ENODEV;
- early_enable_events(tr, bootup_event_buf, true);
+ early_enable_events(tr, (char *)seq_buf_str(&bootup_event_buf), true);
return 0;
}
--
2.43.0
^ permalink raw reply related
* Re: [PATCH v3 9/9] selftests/verification: add tlob selftests
From: Gabriele Monaco @ 2026-06-22 9:26 UTC (permalink / raw)
To: wen.yang; +Cc: Steven Rostedt, linux-trace-kernel, linux-kernel
In-Reply-To: <4aeb668c8446a9f6366d92e218df386bef7bc965.1780847473.git.wen.yang@linux.dev>
On Mon, 2026-06-08 at 00:13 +0800, wen.yang@linux.dev wrote:
> From: Wen Yang <wen.yang@linux.dev>
>
> +grep -qE "^p ${UPROBE_TARGET}:0x[0-9a-f]+ 0x[0-9a-f]+ threshold=[0-9]+$" "$TLOB_MONITOR"
> +grep -q "threshold=5000000000" "$TLOB_MONITOR"
> +
> +! echo "p ${UPROBE_TARGET}:${busy_offset} ${stop_offset} threshold=9999000" > "$TLOB_MONITOR" 2>/dev/null
> +
> +echo "-${UPROBE_TARGET}:${busy_offset}" > "$TLOB_MONITOR"
> +! grep -q "^p .*:0x${busy_offset#0x} " "$TLOB_MONITOR"
...
> +! grep -q "error_env_tlob" /sys/kernel/tracing/trace
There is a widespread misconception in selftests that, under set -e, ! cmd
aborts the script if cmd succeeds, so that it can be used to expect a program
failure. That isn't the case though [1]: errexit (what set -e enables) ignores
the status of commands starting with ! unless the result is explicitly checked.
Essentially if you want to say exit if the command succeeds, you can do:
cmd && exit 1 # explicit
! cmd || exit 1 # explicit
or still exploiting errexit
cmd && false
! cmd || false
I'm going to fix it in existing selftests using the last variant, but keep
this in mind in your selftests as well. (Variants with ! are preferred because
they also return 0 if cmd fails, as you'd expect. I personally prefer to use
false instead of an explicit exit, but that's up to you.)
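To make the difference concrete, here is a minimal sketch (not taken from the
selftests; the run_case helper and the use of true/false as stand-ins for cmd
are assumptions for illustration). Each snippet runs in its own sh -e child so
that an errexit abort does not stop the demonstration itself:

```sh
#!/bin/sh
# run_case SNIPPET: run SNIPPET in a child shell with errexit enabled and
# report whether the child got past it ("continued") or exited ("aborted").
# run_case is a hypothetical helper, not part of any selftest framework.
run_case() {
	if sh -ec "$1; echo reached" 2>/dev/null | grep -q reached; then
		echo continued
	else
		echo aborted
	fi
}

# 'true' stands in for a command that unexpectedly succeeds.
bare=$(run_case '! true')                # errexit ignores the '!' pipeline
guarded=$(run_case '! true || false')    # 'false' ends the list, errexit fires

# 'false' stands in for a command that fails, as the test expects.
expected=$(run_case '! false || false')  # '! false' is 0, '|| false' never runs

echo "! cmd          (cmd succeeds): $bare"
echo "! cmd || false (cmd succeeds): $guarded"
echo "! cmd || false (cmd fails):    $expected"
```

Only the guarded form turns an unexpected success into a failure, while still
letting the script continue when cmd fails as intended.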
Thanks,
Gabriele
[1] - https://www.shellcheck.net/wiki/SC2251
> +echo 0 > monitors/tlob/enable
> +echo 0 > /sys/kernel/tracing/events/rv/error_env_tlob/enable
> +echo > /sys/kernel/tracing/trace
> diff --git a/tools/testing/selftests/verification/test.d/tlob/uprobe_violation.tc b/tools/testing/selftests/verification/test.d/tlob/uprobe_violation.tc
> new file mode 100644
> index 000000000000..d210d9c3a92d
> --- /dev/null
> +++ b/tools/testing/selftests/verification/test.d/tlob/uprobe_violation.tc
> @@ -0,0 +1,67 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +# description: Test tlob monitor budget violation (error_env_tlob and detail_env_tlob fire with correct fields)
> +# requires: tlob:monitor
> +
> +RV_BINDIR="${RV_BINDIR:-$(realpath "$(dirname "${1:-$0}")")}"
> +UPROBE_TARGET="${RV_BINDIR}/tlob_target"
> +TLOB_SYM="${RV_BINDIR}/tlob_sym"
> +[ -x "$UPROBE_TARGET" ] || exit_unsupported
> +[ -x "$TLOB_SYM" ] || exit_unsupported
> +TLOB_MONITOR=monitors/tlob/monitor
> +
> +busy_offset=$("$TLOB_SYM" sym_offset "$UPROBE_TARGET" tlob_busy_work 2>/dev/null)
> +stop_offset=$("$TLOB_SYM" sym_offset "$UPROBE_TARGET" tlob_busy_work_done 2>/dev/null)
> +[ -n "$busy_offset" ] || exit_unsupported
> +[ -n "$stop_offset" ] || exit_unsupported
> +
> +"$UPROBE_TARGET" 30000 &
> +busy_pid=$!
> +sleep 0.05
> +
> +echo 1 > /sys/kernel/tracing/events/rv/error_env_tlob/enable
> +echo 1 > /sys/kernel/tracing/events/rv/detail_env_tlob/enable
> +echo 1 > /sys/kernel/tracing/tracing_on
> +echo 1 > monitors/tlob/enable
> +echo > /sys/kernel/tracing/trace
> +
> +# 10 µs budget - fires almost immediately; task is busy-spinning on-CPU.
> +echo "p ${UPROBE_TARGET}:${busy_offset} ${stop_offset} threshold=10000" > "$TLOB_MONITOR"
> +
> +# wait up to 2 s for detail_env_tlob
> +found=0; i=0
> +while [ "$i" -lt 20 ]; do
> + sleep 0.1
> + grep -q "detail_env_tlob" /sys/kernel/tracing/trace && { found=1; break; }
> + i=$((i+1))
> +done
> +
> +echo "-${UPROBE_TARGET}:${busy_offset}" > "$TLOB_MONITOR" 2>/dev/null
> +kill "$busy_pid" 2>/dev/null || true; wait "$busy_pid" 2>/dev/null || true
> +echo 0 > /sys/kernel/tracing/events/rv/error_env_tlob/enable
> +echo 0 > /sys/kernel/tracing/events/rv/detail_env_tlob/enable
> +echo 0 > monitors/tlob/enable
> +
> +[ "$found" = "1" ]
> +
> +# error_env_tlob must carry the clk_elapsed environment field.
> +# The event label is "budget_exceeded" when detected by the hrtimer callback,
> +# or the triggering sched event name when detected by the constraint path on a
> +# preemption that races with the timer (common on PREEMPT_RT / VM). Both are
> +# valid detections; check the env field instead of the label.
> +grep "error_env_tlob" /sys/kernel/tracing/trace | head -n 1 | grep -q "clk_elapsed="
> +
> +# detail_env_tlob must have all five fields with the correct threshold
> +line=$(grep "detail_env_tlob" /sys/kernel/tracing/trace | head -n 1)
> +echo "$line" | grep -q "pid="
> +echo "$line" | grep -q "threshold_ns=10000"
> +echo "$line" | grep -q "running_ns="
> +echo "$line" | grep -q "waiting_ns="
> +echo "$line" | grep -q "sleeping_ns="
> +
> +# Busy-spin keeps the task on-CPU: running_ns must exceed sleeping_ns.
> +running=$(echo "$line" | sed 's/.*running_ns=\([0-9]*\).*/\1/')
> +sleeping=$(echo "$line" | sed 's/.*sleeping_ns=\([0-9]*\).*/\1/')
> +[ "$running" -gt "$sleeping" ]
> +
> +echo > /sys/kernel/tracing/trace
^ permalink raw reply
* Re: [PATCH v8 01/46] KVM: guest_memfd: Introduce per-gmem attributes, use to guard user mappings
From: Binbin Wu @ 2026-06-22 9:08 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, brauner, chao.p.peng, david, jmattson,
jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, tabba, willy,
wyihan, yan.y.zhao, forkloop, pratyush, suzuki.poulose,
aneesh.kumar, liam, Paolo Bonzini, Sean Christopherson,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
Kemeng Shi, Nhat Pham, Barry Song, Axel Rasmussen, Yuanchu Xie,
Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt, Kiryl Shutsemau,
Baoquan He, Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-1-9d2959357853@google.com>
On 6/19/2026 8:31 AM, Ackerley Tng via B4 Relay wrote:
[...]
>
> +static u64 kvm_gmem_get_attributes(struct inode *inode, pgoff_t index)
> +{
> + struct maple_tree *mt = &GMEM_I(inode)->attributes;
> + void *entry = mtree_load(mt, index);
> +
> + return WARN_ON_ONCE(!entry) ? 0 : xa_to_value(entry);
If the entry is unexpectedly missing, returning 0 means the attributes would be treated as shared.
Then kvm_gmem_fault_user_mapping() would allow userspace to fault in the folio.
Should gmem deny access in such an edge case?
> +}
> +
> +static bool kvm_gmem_is_private_mem(struct inode *inode, pgoff_t index)
> +{
> + return kvm_gmem_get_attributes(inode, index) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> +}
> +
> +static bool kvm_gmem_is_shared_mem(struct inode *inode, pgoff_t index)
> +{
> + return !kvm_gmem_is_private_mem(inode, index);
> +}
> +
> static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
> pgoff_t index, struct folio *folio)
> {
> @@ -397,10 +423,13 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
> if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
> return VM_FAULT_SIGBUS;
>
> - if (!(GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED))
> - return VM_FAULT_SIGBUS;
> + filemap_invalidate_lock_shared(inode->i_mapping);
> + if (kvm_gmem_is_shared_mem(inode, vmf->pgoff))
> + folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> + else
> + folio = ERR_PTR(-EACCES);
> + filemap_invalidate_unlock_shared(inode->i_mapping);
>
> - folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> if (IS_ERR(folio)) {
> if (PTR_ERR(folio) == -EAGAIN)
> return VM_FAULT_RETRY;
> @@ -557,6 +586,51 @@ bool __weak kvm_arch_supports_gmem_init_shared(struct kvm *kvm)
> return true;
> }
>
^ permalink raw reply
* Re: [PATCH 0/2] tracing: Move trace_printk.h out of kernel.h
From: Peter Zijlstra @ 2026-06-22 8:34 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
Julia Lawall, Yury Norov, linux-doc, linux-kbuild, linuxppc-dev,
dri-devel, linux-stm32, linux-arm-kernel, linux-rdma, linux-usb,
linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <20260621093430.264983361@kernel.org>
On Sun, Jun 21, 2026 at 05:34:30AM -0400, Steven Rostedt wrote:
> There's been complaints about trace_printk() being defined in kernel.h as it
> can increase the compilation time. As it is only used by some developers for
> debugging purposes, it should not be in kernel.h causing lots of wasted CPU
> cycles for those that do not ever care about it.
>
> Instead, add a CONFIG_TRACE_PRINTK_DEBUGGING option that developers that do
> use it can set and not have to always remember to add #include <linux/trace_printk.h>
> to the files they add trace_printk() while debugging. It also means that
> those that do not have that config set will not have to worry about wasted
> CPU cycles as it is only include in the CFLAGS when the option is set, and
> its completely ignored otherwise.
Did you forget your C 101 class? If you use a function, you gotta
include the relevant header.
You don't see userspace saying: 'Hey, you know what, perhaps we should
add stdio.h to every other header, just in case someone wants to
printf()' either.
I really don't understand your argument. Yes, maybe someone will forget
and then either their editor (if they have a halfway modern setup with
LSP enabled) or their build will complain, but so what? This is all
trivial stuff, surely we have more pressing matters to concern ourselves
with?
^ permalink raw reply
* Re: [PATCH 0/2] tracing: Move trace_printk.h out of kernel.h
From: Steven Rostedt @ 2026-06-22 8:53 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
Julia Lawall, Yury Norov, linux-doc, linux-kbuild, linuxppc-dev,
dri-devel, linux-stm32, linux-arm-kernel, linux-rdma, linux-usb,
linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <20260622083440.GX49951@noisy.programming.kicks-ass.net>
On Mon, 22 Jun 2026 10:34:40 +0200
Peter Zijlstra <peterz@infradead.org> wrote:
> Did you forget your C 101 class? If you use a function, you gotta
> include the relevant header.
If this was the way it was back in 2009, yeah sure. But the header
wasn't needed for 17 years. Now it suddenly will be.
-- Steve
^ permalink raw reply
* Re: [PATCH v3 1/7] list: Add mutable iterator variants
From: Christian König @ 2026-06-22 8:51 UTC (permalink / raw)
To: Kaitao Cheng, Andrew Morton, David Hildenbrand, Jens Axboe,
Tejun Heo, Alexander Viro, Christian Brauner, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
Andy Shevchenko, Paul E. McKenney, Shakeel Butt
Cc: David Howells, Simona Vetter, Randy Dunlap, Luca Ceresoli,
Philipp Stanner, linux-block, linux-kernel, cgroups,
linux-ntfs-dev, linux-fsdevel, io-uring, audit, bpf, netdev,
dri-devel, linux-perf-users, linux-trace-kernel, kexec,
live-patching, linux-modules, linux-crypto, linux-pm, rcu,
sched-ext, linux-mm, virtualization, damon, llvm, Kaitao Cheng
In-Reply-To: <20260622040533.29824-2-kaitao.cheng@linux.dev>
On 6/22/26 06:05, Kaitao Cheng wrote:
> From: Kaitao Cheng <chengkaitao@kylinos.cn>
>
> The list_for_each*_safe() helpers are used when the loop body may
> remove the current entry. Their API exposes the temporary cursor at
> every call site, even though most users only need it for the iterator
> implementation and never reference it in the loop body.
>
> Add *_mutable() variants for list and hlist iteration. The new helpers
> support both forms: callers may keep passing an explicit temporary cursor
> when they need to inspect or reset it, or omit it and let the helper use
> a unique internal cursor.
That sounds like a bad idea to me. The macro should really do one job, and do it as well as it can.
> This makes call sites that only mutate the list through the current entry
> less noisy, while keeping the existing *_safe() helpers available for
> compatibility.
This can perfectly well be used by code which really needs the separate variable for the next entry.
Regards,
Christian.
>
> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
> ---
> include/linux/list.h | 269 +++++++++++++++++++++++++++++++++++++------
> 1 file changed, 231 insertions(+), 38 deletions(-)
>
> diff --git a/include/linux/list.h b/include/linux/list.h
> index 09d979976b3b..1081def7cea9 100644
> --- a/include/linux/list.h
> +++ b/include/linux/list.h
> @@ -7,6 +7,7 @@
> #include <linux/stddef.h>
> #include <linux/poison.h>
> #include <linux/const.h>
> +#include <linux/args.h>
>
> #include <asm/barrier.h>
>
> @@ -763,28 +764,72 @@ static inline void list_splice_tail_init(struct list_head *list,
> #define list_for_each_prev(pos, head) \
> for (pos = (head)->prev; !list_is_head(pos, (head)); pos = pos->prev)
>
> -/**
> - * list_for_each_safe - iterate over a list safe against removal of list entry
> - * @pos: the &struct list_head to use as a loop cursor.
> - * @n: another &struct list_head to use as temporary storage
> - * @head: the head for your list.
> +/*
> + * list_for_each_safe is an old interface, use list_for_each_mutable instead.
> */
> #define list_for_each_safe(pos, n, head) \
> for (pos = (head)->next, n = pos->next; \
> !list_is_head(pos, (head)); \
> pos = n, n = pos->next)
>
> +#define __list_for_each_mutable_internal(pos, tmp, head) \
> + for (typeof(pos) tmp = (pos = (head)->next)->next; \
> + !list_is_head(pos, (head)); \
> + pos = tmp, tmp = pos->next)
> +
> +#define __list_for_each_mutable1(pos, head) \
> + __list_for_each_mutable_internal(pos, __UNIQUE_ID(next), head)
> +
> +#define __list_for_each_mutable2(pos, next, head) \
> + list_for_each_safe(pos, next, head)
> +
> /**
> - * list_for_each_prev_safe - iterate over a list backwards safe against removal of list entry
> + * list_for_each_mutable - iterate over a list safe against entry removal
> * @pos: the &struct list_head to use as a loop cursor.
> - * @n: another &struct list_head to use as temporary storage
> - * @head: the head for your list.
> + * @...: either (head) or (next, head)
> + *
> + * next: another &struct list_head to use as optional temporary storage.
> + * The temporary cursor is internal unless explicitly supplied by
> + * the caller.
> + * head: the head for your list.
> + */
> +#define list_for_each_mutable(pos, ...) \
> + CONCATENATE(__list_for_each_mutable, COUNT_ARGS(__VA_ARGS__)) \
> + (pos, __VA_ARGS__)
> +
> +/*
> + * list_for_each_prev_safe is an old interface, use list_for_each_prev_mutable instead.
> */
> #define list_for_each_prev_safe(pos, n, head) \
> for (pos = (head)->prev, n = pos->prev; \
> !list_is_head(pos, (head)); \
> pos = n, n = pos->prev)
>
> +#define __list_for_each_prev_mutable_internal(pos, tmp, head) \
> + for (typeof(pos) tmp = (pos = (head)->prev)->prev; \
> + !list_is_head(pos, (head)); \
> + pos = tmp, tmp = pos->prev)
> +
> +#define __list_for_each_prev_mutable1(pos, head) \
> + __list_for_each_prev_mutable_internal(pos, __UNIQUE_ID(prev), head)
> +
> +#define __list_for_each_prev_mutable2(pos, prev, head) \
> + list_for_each_prev_safe(pos, prev, head)
> +
> +/**
> + * list_for_each_prev_mutable - iterate over a list backwards safe against entry removal
> + * @pos: the &struct list_head to use as a loop cursor.
> + * @...: either (head) or (prev, head)
> + *
> + * prev: another &struct list_head to use as optional temporary storage.
> + * The temporary cursor is internal unless explicitly supplied by
> + * the caller.
> + * head: the head for your list.
> + */
> +#define list_for_each_prev_mutable(pos, ...) \
> + CONCATENATE(__list_for_each_prev_mutable, COUNT_ARGS(__VA_ARGS__)) \
> + (pos, __VA_ARGS__)
> +
> /**
> * list_count_nodes - count nodes in the list
> * @head: the head for your list.
> @@ -895,12 +940,8 @@ static inline size_t list_count_nodes(struct list_head *head)
> for (; !list_entry_is_head(pos, head, member); \
> pos = list_prev_entry(pos, member))
>
> -/**
> - * list_for_each_entry_safe - iterate over list of given type safe against removal of list entry
> - * @pos: the type * to use as a loop cursor.
> - * @n: another type * to use as temporary storage
> - * @head: the head for your list.
> - * @member: the name of the list_head within the struct.
> +/*
> + * list_for_each_entry_safe is an old interface, use list_for_each_entry_mutable instead.
> */
> #define list_for_each_entry_safe(pos, n, head, member) \
> for (pos = list_first_entry(head, typeof(*pos), member), \
> @@ -908,15 +949,36 @@ static inline size_t list_count_nodes(struct list_head *head)
> !list_entry_is_head(pos, head, member); \
> pos = n, n = list_next_entry(n, member))
>
> +#define __list_for_each_entry_mutable_internal(pos, tmp, head, member) \
> + for (typeof(pos) tmp = list_next_entry(pos = \
> + list_first_entry(head, typeof(*pos), member), member); \
> + !list_entry_is_head(pos, head, member); \
> + pos = tmp, tmp = list_next_entry(tmp, member))
> +
> +#define __list_for_each_entry_mutable2(pos, head, member) \
> + __list_for_each_entry_mutable_internal(pos, __UNIQUE_ID(next), head, member)
> +
> +#define __list_for_each_entry_mutable3(pos, next, head, member) \
> + list_for_each_entry_safe(pos, next, head, member)
> +
> /**
> - * list_for_each_entry_safe_continue - continue list iteration safe against removal
> + * list_for_each_entry_mutable - iterate over a list safe against entry removal
> * @pos: the type * to use as a loop cursor.
> - * @n: another type * to use as temporary storage
> - * @head: the head for your list.
> - * @member: the name of the list_head within the struct.
> + * @...: either (head, member) or (next, head, member)
> *
> - * Iterate over list of given type, continuing after current point,
> - * safe against removal of list entry.
> + * next: another type * to use as optional temporary storage. The
> + * temporary cursor is internal unless explicitly supplied by the
> + * caller.
> + * head: the head for your list.
> + * member: the name of the list_head within the struct.
> + */
> +#define list_for_each_entry_mutable(pos, ...) \
> + CONCATENATE(__list_for_each_entry_mutable, COUNT_ARGS(__VA_ARGS__)) \
> + (pos, __VA_ARGS__)
> +
> +/*
> + * list_for_each_entry_safe_continue is an old interface,
> + * use list_for_each_entry_mutable_continue instead.
> */
> #define list_for_each_entry_safe_continue(pos, n, head, member) \
> for (pos = list_next_entry(pos, member), \
> @@ -924,30 +986,79 @@ static inline size_t list_count_nodes(struct list_head *head)
> !list_entry_is_head(pos, head, member); \
> pos = n, n = list_next_entry(n, member))
>
> +#define __list_for_each_entry_mutable_continue_internal(pos, tmp, head, member) \
> + for (typeof(pos) tmp = list_next_entry(pos = \
> + list_next_entry(pos, member), member); \
> + !list_entry_is_head(pos, head, member); \
> + pos = tmp, tmp = list_next_entry(tmp, member))
> +
> +#define __list_for_each_entry_mutable_continue2(pos, head, member) \
> + __list_for_each_entry_mutable_continue_internal(pos, \
> + __UNIQUE_ID(next), head, member)
> +
> +#define __list_for_each_entry_mutable_continue3(pos, next, head, member) \
> + list_for_each_entry_safe_continue(pos, next, head, member)
> +
> /**
> - * list_for_each_entry_safe_from - iterate over list from current point safe against removal
> + * list_for_each_entry_mutable_continue - continue list iteration safe against removal
> * @pos: the type * to use as a loop cursor.
> - * @n: another type * to use as temporary storage
> - * @head: the head for your list.
> - * @member: the name of the list_head within the struct.
> + * @...: either (head, member) or (next, head, member)
> *
> - * Iterate over list of given type from current point, safe against
> - * removal of list entry.
> + * next: another type * to use as optional temporary storage. The
> + * temporary cursor is internal unless explicitly supplied by the
> + * caller.
> + * head: the head for your list.
> + * member: the name of the list_head within the struct.
> + *
> + * Iterate over list of given type, continuing after current point,
> + * safe against removal of list entry.
> + */
> +#define list_for_each_entry_mutable_continue(pos, ...) \
> + CONCATENATE(__list_for_each_entry_mutable_continue, \
> + COUNT_ARGS(__VA_ARGS__))(pos, __VA_ARGS__)
> +
> +/*
> + * list_for_each_entry_safe_from is an old interface,
> + * use list_for_each_entry_mutable_from instead.
> */
> #define list_for_each_entry_safe_from(pos, n, head, member) \
> for (n = list_next_entry(pos, member); \
> !list_entry_is_head(pos, head, member); \
> pos = n, n = list_next_entry(n, member))
>
> +#define __list_for_each_entry_mutable_from_internal(pos, tmp, head, member) \
> + for (typeof(pos) tmp = list_next_entry(pos, member); \
> + !list_entry_is_head(pos, head, member); \
> + pos = tmp, tmp = list_next_entry(tmp, member))
> +
> +#define __list_for_each_entry_mutable_from2(pos, head, member) \
> + __list_for_each_entry_mutable_from_internal(pos, \
> + __UNIQUE_ID(next), head, member)
> +
> +#define __list_for_each_entry_mutable_from3(pos, next, head, member) \
> + list_for_each_entry_safe_from(pos, next, head, member)
> +
> /**
> - * list_for_each_entry_safe_reverse - iterate backwards over list safe against removal
> + * list_for_each_entry_mutable_from - iterate over list from current point safe against removal
> * @pos: the type * to use as a loop cursor.
> - * @n: another type * to use as temporary storage
> - * @head: the head for your list.
> - * @member: the name of the list_head within the struct.
> + * @...: either (head, member) or (next, head, member)
> *
> - * Iterate backwards over list of given type, safe against removal
> - * of list entry.
> + * next: another type * to use as optional temporary storage. The
> + * temporary cursor is internal unless explicitly supplied by the
> + * caller.
> + * head: the head for your list.
> + * member: the name of the list_head within the struct.
> + *
> + * Iterate over list of given type from current point, safe against
> + * removal of list entry.
> + */
> +#define list_for_each_entry_mutable_from(pos, ...) \
> + CONCATENATE(__list_for_each_entry_mutable_from, \
> + COUNT_ARGS(__VA_ARGS__))(pos, __VA_ARGS__)
> +
> +/*
> + * list_for_each_entry_safe_reverse is an old interface,
> + * use list_for_each_entry_mutable_reverse instead.
> */
> #define list_for_each_entry_safe_reverse(pos, n, head, member) \
> for (pos = list_last_entry(head, typeof(*pos), member), \
> @@ -955,6 +1066,37 @@ static inline size_t list_count_nodes(struct list_head *head)
> !list_entry_is_head(pos, head, member); \
> pos = n, n = list_prev_entry(n, member))
>
> +#define __list_for_each_entry_mutable_reverse_internal(pos, tmp, head, member) \
> + for (typeof(pos) tmp = list_prev_entry(pos = \
> + list_last_entry(head, typeof(*pos), member), member); \
> + !list_entry_is_head(pos, head, member); \
> + pos = tmp, tmp = list_prev_entry(tmp, member))
> +
> +#define __list_for_each_entry_mutable_reverse2(pos, head, member) \
> + __list_for_each_entry_mutable_reverse_internal(pos, \
> + __UNIQUE_ID(prev), head, member)
> +
> +#define __list_for_each_entry_mutable_reverse3(pos, prev, head, member) \
> + list_for_each_entry_safe_reverse(pos, prev, head, member)
> +
> +/**
> + * list_for_each_entry_mutable_reverse - iterate backwards over list safe against removal
> + * @pos: the type * to use as a loop cursor.
> + * @...: either (head, member) or (prev, head, member)
> + *
> + * prev: another type * to use as optional temporary storage. The
> + * temporary cursor is internal unless explicitly supplied by the
> + * caller.
> + * head: the head for your list.
> + * member: the name of the list_head within the struct.
> + *
> + * Iterate backwards over list of given type, safe against removal
> + * of list entry.
> + */
> +#define list_for_each_entry_mutable_reverse(pos, ...) \
> + CONCATENATE(__list_for_each_entry_mutable_reverse, \
> + COUNT_ARGS(__VA_ARGS__))(pos, __VA_ARGS__)
> +
> /**
> * list_safe_reset_next - reset a stale list_for_each_entry_safe loop
> * @pos: the loop cursor used in the list_for_each_entry_safe loop
> @@ -1189,6 +1331,31 @@ static inline void hlist_splice_init(struct hlist_head *from,
> for (pos = (head)->first; pos && ({ n = pos->next; 1; }); \
> pos = n)
>
> +#define __hlist_for_each_mutable_internal(pos, tmp, head) \
> + for (typeof(pos) tmp = (pos = (head)->first) ? pos->next : NULL; \
> + pos; \
> + pos = tmp, tmp = pos ? pos->next : NULL)
> +
> +#define __hlist_for_each_mutable1(pos, head) \
> + __hlist_for_each_mutable_internal(pos, __UNIQUE_ID(next), head)
> +
> +#define __hlist_for_each_mutable2(pos, next, head) \
> + hlist_for_each_safe(pos, next, head)
> +
> +/**
> + * hlist_for_each_mutable - iterate over a hlist safe against entry removal
> + * @pos: the &struct hlist_node to use as a loop cursor.
> + * @...: either (head) or (next, head)
> + *
> + * next: another &struct hlist_node to use as optional temporary storage.
> + * The temporary cursor is internal unless explicitly supplied by
> + * the caller.
> + * head: the head for your hlist.
> + */
> +#define hlist_for_each_mutable(pos, ...) \
> + CONCATENATE(__hlist_for_each_mutable, COUNT_ARGS(__VA_ARGS__)) \
> + (pos, __VA_ARGS__)
> +
> #define hlist_entry_safe(ptr, type, member) \
> ({ typeof(ptr) ____ptr = (ptr); \
> ____ptr ? hlist_entry(____ptr, type, member) : NULL; \
> @@ -1224,18 +1391,44 @@ static inline void hlist_splice_init(struct hlist_head *from,
> for (; pos; \
> pos = hlist_entry_safe((pos)->member.next, typeof(*(pos)), member))
>
> -/**
> - * hlist_for_each_entry_safe - iterate over list of given type safe against removal of list entry
> - * @pos: the type * to use as a loop cursor.
> - * @n: a &struct hlist_node to use as temporary storage
> - * @head: the head for your list.
> - * @member: the name of the hlist_node within the struct.
> +/*
> + * hlist_for_each_entry_safe is an old interface, use hlist_for_each_entry_mutable instead.
> */
> #define hlist_for_each_entry_safe(pos, n, head, member) \
> for (pos = hlist_entry_safe((head)->first, typeof(*pos), member);\
> pos && ({ n = pos->member.next; 1; }); \
> pos = hlist_entry_safe(n, typeof(*pos), member))
>
> +#define __hlist_for_each_entry_mutable_internal(pos, tmp, head, member) \
> + for (struct hlist_node *tmp = (pos = \
> + hlist_entry_safe((head)->first, typeof(*pos), member)) ? \
> + pos->member.next : NULL; \
> + pos; \
> + pos = hlist_entry_safe((tmp), typeof(*pos), member), \
> + tmp = pos ? pos->member.next : NULL)
> +
> +#define __hlist_for_each_entry_mutable2(pos, head, member) \
> + __hlist_for_each_entry_mutable_internal(pos, \
> + __UNIQUE_ID(next), head, member)
> +
> +#define __hlist_for_each_entry_mutable3(pos, next, head, member) \
> + hlist_for_each_entry_safe(pos, next, head, member)
> +
> +/**
> + * hlist_for_each_entry_mutable - iterate over hlist safe against entry removal
> + * @pos: the type * to use as a loop cursor.
> + * @...: either (head, member) or (next, head, member)
> + *
> + * next: a &struct hlist_node to use as optional temporary storage. The
> + * temporary cursor is internal unless explicitly supplied by the
> + * caller.
> + * head: the head for your hlist.
> + * member: the name of the hlist_node within the struct.
> + */
> +#define hlist_for_each_entry_mutable(pos, ...) \
> + CONCATENATE(__hlist_for_each_entry_mutable, \
> + COUNT_ARGS(__VA_ARGS__))(pos, __VA_ARGS__)
> +
> /**
> * hlist_count_nodes - count nodes in the hlist
> * @head: the head for your hlist.
^ permalink raw reply
* Re: [PATCH v3 1/7] list: Add mutable iterator variants
From: David Laight @ 2026-06-22 8:42 UTC (permalink / raw)
To: Kaitao Cheng
Cc: Andrew Morton, David Hildenbrand, Jens Axboe, Tejun Heo,
Alexander Viro, Christian Brauner, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
Andy Shevchenko, Paul E. McKenney, Shakeel Butt,
Christian König, David Howells, Simona Vetter, Randy Dunlap,
Luca Ceresoli, Philipp Stanner, linux-block, linux-kernel,
cgroups, linux-ntfs-dev, linux-fsdevel, io-uring, audit, bpf,
netdev, dri-devel, linux-perf-users, linux-trace-kernel, kexec,
live-patching, linux-modules, linux-crypto, linux-pm, rcu,
sched-ext, linux-mm, virtualization, damon, llvm, Kaitao Cheng
In-Reply-To: <20260622040533.29824-2-kaitao.cheng@linux.dev>
On Mon, 22 Jun 2026 12:05:31 +0800
Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
> From: Kaitao Cheng <chengkaitao@kylinos.cn>
>
> The list_for_each*_safe() helpers are used when the loop body may
> remove the current entry. Their API exposes the temporary cursor at
> every call site, even though most users only need it for the iterator
> implementation and never reference it in the loop body.
>
> Add *_mutable() variants for list and hlist iteration. The new helpers
> support both forms: callers may keep passing an explicit temporary cursor
> when they need to inspect or reset it, or omit it and let the helper use
> a unique internal cursor.
I'm not really sure 'mutable' means anything either.
It is possible to make it valid for the loop body (or even other threads)
to delete arbitrary list items - but that needs significant extra overhead.
It might be worth doing something that doesn't need the extra variable,
but there is little point doing all the churn just to rename things.
>
> This makes call sites that only mutate the list through the current entry
> less noisy, while keeping the existing *_safe() helpers available for
> compatibility.
>
> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
> ---
> include/linux/list.h | 269 +++++++++++++++++++++++++++++++++++++------
> 1 file changed, 231 insertions(+), 38 deletions(-)
>
> diff --git a/include/linux/list.h b/include/linux/list.h
> index 09d979976b3b..1081def7cea9 100644
> --- a/include/linux/list.h
> +++ b/include/linux/list.h
> @@ -7,6 +7,7 @@
> #include <linux/stddef.h>
> #include <linux/poison.h>
> #include <linux/const.h>
> +#include <linux/args.h>
>
> #include <asm/barrier.h>
>
> @@ -763,28 +764,72 @@ static inline void list_splice_tail_init(struct list_head *list,
> #define list_for_each_prev(pos, head) \
> for (pos = (head)->prev; !list_is_head(pos, (head)); pos = pos->prev)
>
> -/**
> - * list_for_each_safe - iterate over a list safe against removal of list entry
> - * @pos: the &struct list_head to use as a loop cursor.
> - * @n: another &struct list_head to use as temporary storage
> - * @head: the head for your list.
> +/*
> + * list_for_each_safe is an old interface, use list_for_each_mutable instead.
> */
> #define list_for_each_safe(pos, n, head) \
> for (pos = (head)->next, n = pos->next; \
> !list_is_head(pos, (head)); \
> pos = n, n = pos->next)
>
> +#define __list_for_each_mutable_internal(pos, tmp, head) \
> + for (typeof(pos) tmp = (pos = (head)->next)->next; \
Use auto
> + !list_is_head(pos, (head)); \
> + pos = tmp, tmp = pos->next)
> +
> +#define __list_for_each_mutable1(pos, head) \
> + __list_for_each_mutable_internal(pos, __UNIQUE_ID(next), head)
> +
> +#define __list_for_each_mutable2(pos, next, head) \
> + list_for_each_safe(pos, next, head)
> +
> /**
> - * list_for_each_prev_safe - iterate over a list backwards safe against removal of list entry
> + * list_for_each_mutable - iterate over a list safe against entry removal
> * @pos: the &struct list_head to use as a loop cursor.
> - * @n: another &struct list_head to use as temporary storage
> - * @head: the head for your list.
> + * @...: either (head) or (next, head)
> + *
> + * next: another &struct list_head to use as optional temporary storage.
> + * The temporary cursor is internal unless explicitly supplied by
> + * the caller.
> + * head: the head for your list.
> + */
> +#define list_for_each_mutable(pos, ...) \
> + CONCATENATE(__list_for_each_mutable, COUNT_ARGS(__VA_ARGS__)) \
> + (pos, __VA_ARGS__)
The variable argument count logic really just slows down compilation.
Maybe there aren't enough copies of this code to make that significant.
But just because you can do it doesn't mean it is a good idea.
I'm also not sure it really adds anything to the readability.
And, if you are going to make the middle argument optional there is
no need to change the macro name.
David
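For readers following the argument-count discussion, here is a minimal userspace sketch of the dispatch pattern under review. It is not the patch itself: struct demo_list and every demo_/DEMO_ name are hypothetical stand-ins for struct list_head, COUNT_ARGS() and CONCATENATE(), and the hidden cursor uses a fixed name where the patch uses typeof(pos) with __UNIQUE_ID().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct list_head; all demo_ and DEMO_ names are hypothetical. */
struct demo_list {
	struct demo_list *next, *prev;
};

void demo_init(struct demo_list *head)
{
	head->next = head->prev = head;
}

void demo_add_tail(struct demo_list *node, struct demo_list *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Unlink and clear the pointers, so a loop that re-read pos->next would fault. */
void demo_del(struct demo_list *node)
{
	assert(node->next && node->prev);
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node->prev = NULL;
}

/* Count one or two trailing arguments, then paste the count onto a macro name. */
#define DEMO_COUNT_ARGS_(_1, _2, n, ...) n
#define DEMO_COUNT_ARGS(...) DEMO_COUNT_ARGS_(__VA_ARGS__, 2, 1, 0)
#define DEMO_CONCAT_(a, b) a##b
#define DEMO_CONCAT(a, b) DEMO_CONCAT_(a, b)

/* Explicit-cursor form, same shape as the existing *_safe() helpers. */
#define demo_for_each_mutable2(pos, cur, head) \
	for (pos = (head)->next, cur = pos->next; pos != (head); \
	     pos = cur, cur = pos->next)

/* Hidden-cursor form; the cursor lives only in the for-loop scope. */
#define demo_for_each_mutable1(pos, head) \
	for (struct demo_list *demo_tmp_ = (pos = (head)->next)->next; \
	     pos != (head); pos = demo_tmp_, demo_tmp_ = pos->next)

/* Dispatch to ...1 or ...2 depending on how many arguments follow pos. */
#define demo_for_each_mutable(pos, ...) \
	DEMO_CONCAT(demo_for_each_mutable, DEMO_COUNT_ARGS(__VA_ARGS__)) \
		(pos, __VA_ARGS__)

/* Remove every entry while iterating, using either call form. */
int demo_drain(struct demo_list *head, int explicit_cursor)
{
	struct demo_list *pos, *next;
	int removed = 0;

	if (explicit_cursor) {
		demo_for_each_mutable(pos, next, head) {
			demo_del(pos);
			removed++;
		}
	} else {
		demo_for_each_mutable(pos, head) {
			demo_del(pos);
			removed++;
		}
	}
	return removed;
}
```

Both call sites in demo_drain() use the same macro name, and the optional argument sits in the middle of the list, which is the property the replies in this thread object to.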
^ permalink raw reply
* Re: [PATCH v3 0/7] Prepare mutable list iterators to cache cursor state
From: Jani Nikula @ 2026-06-22 8:37 UTC (permalink / raw)
To: Kaitao Cheng, Andrew Morton, David Hildenbrand, Jens Axboe,
Tejun Heo, Alexander Viro, Christian Brauner, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
Andy Shevchenko, Paul E. McKenney, Shakeel Butt,
Christian König
Cc: David Howells, Simona Vetter, Randy Dunlap, Luca Ceresoli,
Philipp Stanner, linux-block, linux-kernel, cgroups,
linux-ntfs-dev, linux-fsdevel, io-uring, audit, bpf, netdev,
dri-devel, linux-perf-users, linux-trace-kernel, kexec,
live-patching, linux-modules, linux-crypto, linux-pm, rcu,
sched-ext, linux-mm, virtualization, damon, llvm, chengkaitao
In-Reply-To: <20260622040533.29824-1-kaitao.cheng@linux.dev>
On Mon, 22 Jun 2026, Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
> Add *_mutable() iterator variants for list, hlist and llist. The new
> helpers are variadic and support both forms. In the common case, the
> caller omits the temporary cursor and the macro creates a unique internal
> cursor with typeof(pos) and __UNIQUE_ID(). If a loop really needs an
> explicit temporary cursor, the caller can still pass it and the helper
> keeps the existing *_safe() behaviour.
>
> For example, a call site may use the shorter form:
>
> list_for_each_entry_mutable(pos, head, member)
>
> or keep the explicit temporary cursor form:
>
> list_for_each_entry_mutable(pos, tmp, head, member)
I'm unconvinced it's a good idea to allow two forms with macro trickery,
*especially* when it's not the last argument you can omit. I think it's
a footgun.
IMO stick with the first form only, and there'll always be the _safe
variant that can be used when the temp pointer is needed.
BR,
Jani.
--
Jani Nikula, Intel
^ permalink raw reply
* Re: [PATCH] tracing: Use seq_buf for string concatenation
From: Woradorn Laodhanadhaworn @ 2026-06-22 8:18 UTC (permalink / raw)
To: Jori Koolstra, rostedt
Cc: mhiramat, mathieu.desnoyers, linux-kernel, linux-trace-kernel,
linux-hardening, linux-kernel-mentees, shuah, skhan, me
In-Reply-To: <70408559.2499685.1782062190117@kpc.webmail.kpnmail.nl>
On 22/6/2569 BE 00:16, Jori Koolstra wrote:
>
>> On 20-06-2026 19:54 CEST, Woradorn Laodhanadhaworn <woradorn.laon@gmail.com> wrote:
>>
>>
>> In preparation for removing the strlcat API[1],
>> replace the string concatenation logic with a struct seq_buf,
>> which tracks the current position and the remaining space internally.
>>
>> The backing buffer bootup_event_buf allocation is unchanged.
>> Use seq_buf_str() to NUL-terminate before passing to early_enable_events().
>>
>> Link: https://github.com/KSPP/linux/issues/370 [1]
>>
>> Signed-off-by: Woradorn Laodhanadhaworn <woradorn.laon@gmail.com>
>> ---
>> kernel/trace/trace_events.c | 21 ++++++++++++++++-----
>> 1 file changed, 16 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
>> index c46e623e7e0d..15164723e028 100644
>> --- a/kernel/trace/trace_events.c
>> +++ b/kernel/trace/trace_events.c
>> @@ -22,6 +22,7 @@
>> #include <linux/sort.h>
>> #include <linux/slab.h>
>> #include <linux/delay.h>
>> +#include <linux/seq_buf.h>
>>
>> #include <trace/events/sched.h>
>> #include <trace/syscall.h>
>> @@ -4501,13 +4502,23 @@ extern struct trace_event_call *__start_ftrace_events[];
>> extern struct trace_event_call *__stop_ftrace_events[];
>>
>> static char bootup_event_buf[COMMAND_LINE_SIZE] __initdata;
>
> Isn't this now unused?
>
>> +static struct seq_buf bootup_event_seq;
>> +static bool bootup_event_seq_initialized;
>>
>
> I think this can be refactored to avoid the bool. And should bootup_event_seq not be
> __initdata?
>
>> static __init int setup_trace_event(char *str)
>> {
>> - if (bootup_event_buf[0] != '\0')
>> - strlcat(bootup_event_buf, ",", COMMAND_LINE_SIZE);
>> + if (!bootup_event_seq_initialized) {
>> + seq_buf_init(&bootup_event_seq, bootup_event_buf, COMMAND_LINE_SIZE);
>> + bootup_event_seq_initialized = true;
>> + }
>> +
>> + if (seq_buf_used(&bootup_event_seq) > 0)
>> + seq_buf_puts(&bootup_event_seq, ",");
>>
>> - strlcat(bootup_event_buf, str, COMMAND_LINE_SIZE);
>> + seq_buf_puts(&bootup_event_seq, str);
>> +
>> + if (seq_buf_has_overflowed(&bootup_event_seq))
>> + return -ENOMEM;
>>
>> trace_set_ring_buffer_expanded(NULL);
>> disable_tracing_selftest("running event tracing");
>> @@ -4766,7 +4777,7 @@ static __init int event_trace_enable(void)
>> */
>> __trace_early_add_events(tr);
>>
>> - early_enable_events(tr, bootup_event_buf, false);
>> + early_enable_events(tr, (char *)seq_buf_str(&bootup_event_seq), false);
>
> What if trace_event is empty? Then setup_trace_event does not run AFAIK. See the
> WARN_ON in seq_buf_str too. Have you tested this?
>
>>
>> trace_printk_start_comm();
>>
>> @@ -4794,7 +4805,7 @@ static __init int event_trace_enable_again(void)
>> if (!tr)
>> return -ENODEV;
>>
>> - early_enable_events(tr, bootup_event_buf, true);
>> + early_enable_events(tr, (char *)seq_buf_str(&bootup_event_seq), true);
>>
>> return 0;
>> }
>> --
>> 2.43.0
>
> Thanks,
> Jori.
Thank you, Jori, for your review. I will send v2.
I tested both empty and non-empty trace_event cases with QEMU, and both now boot successfully.
The logs below demonstrate this.
Non-empty trace_event:
qemu-system-x86_64 \
-kernel arch/x86/boot/bzImage \
-initrd initramfs.cpio.gz \
-append "console=ttyS0 trace_event=:mod:rproc_qcom_common,:mod:qrtr,:mod:qcom_aoss trace_event=:mod:rproc_qcom_common" \
-nographic \
-m 512M
[ 0.082316] Kernel command line: console=ttyS0 trace_event=:mod:rproc_qcom_common,:mod:qrtr,:mod:qcom_aoss trace_event=:mod:rproc_qcom_common
[ 0.083324] bootup_event_buf: :mod:rproc_qcom_common,:mod:qrtr,:mod:qcom_aoss
[ 0.083413] bootup_event_buf: :mod:rproc_qcom_common,:mod:qrtr,:mod:qcom_aoss,:mod:rproc_qcom_common
[ 0.083689] printk: log buffer data + meta data: 262144 + 917504 = 1179648 bytes
Empty trace_event:
qemu-system-x86_64 \
-kernel arch/x86/boot/bzImage \
-initrd initramfs.cpio.gz \
-append "console=ttyS0" \
-nographic \
-m 512M
[ 0.085213] Kernel command line: console=ttyS0
[ 0.086458] printk: log buffer data + meta data: 262144 + 917504 = 1179648 bytes
Thanks,
Woradorn
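As an illustration of the position-tracking concatenation discussed in this review, here is a minimal userspace sketch. It is an assumption-laden stand-in and not the kernel seq_buf API: struct demo_buf and all demo_ functions are hypothetical, and it makes no claim about what v2 of the patch does. The descriptor is set up with an initializer, so no separate "initialized" flag is needed and a buffer that was never appended to still reads as an empty string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct seq_buf; not the kernel API. */
struct demo_buf {
	char *buffer;
	size_t size;
	size_t len;	/* len > size marks overflow */
};

int demo_buf_overflowed(const struct demo_buf *b)
{
	return b->len > b->size;
}

/* Append s and keep the buffer NUL-terminated, or mark overflow if it cannot fit. */
int demo_buf_puts(struct demo_buf *b, const char *s)
{
	size_t n = strlen(s);

	assert(b->buffer && b->size > 0);
	if (b->len + n + 1 <= b->size) {
		memcpy(b->buffer + b->len, s, n);
		b->len += n;
		b->buffer[b->len] = '\0';
		return 0;
	}
	b->len = b->size + 1;
	return -1;
}

/* Comma-join successive values, as setup_trace_event() does per trace_event= option. */
int demo_append_event(struct demo_buf *b, const char *str)
{
	if (b->len > 0)
		demo_buf_puts(b, ",");
	demo_buf_puts(b, str);
	return demo_buf_overflowed(b) ? -1 : 0;
}
```

A caller would declare the storage and descriptor together, for example `char store[16] = ""; struct demo_buf b = { .buffer = store, .size = sizeof(store) };`, and then call demo_append_event() once per value.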
^ permalink raw reply
* Re: [PATCH 0/2] tracing: Move trace_printk.h out of kernel.h
From: Christophe Leroy (CS GROUP) @ 2026-06-22 8:05 UTC (permalink / raw)
To: Steven Rostedt, linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov,
linux-doc, linux-kbuild, linuxppc-dev, dri-devel, linux-stm32,
linux-arm-kernel, linux-rdma, linux-usb, linux-ext4, linux-nfs,
kvm, intel-gfx
In-Reply-To: <20260621093430.264983361@kernel.org>
On 21/06/2026 at 11:34, Steven Rostedt wrote:
> There's been complaints about trace_printk() being defined in kernel.h as it
> can increase the compilation time. As it is only used by some developers for
> debugging purposes, it should not be in kernel.h causing lots of wasted CPU
> cycles for those that do not ever care about it.
Do we have a measurement of the increased compilation time?
Christophe
>
> Instead, add a CONFIG_TRACE_PRINTK_DEBUGGING option that developers that do
> use it can set and not have to always remember to add #include <linux/trace_printk.h>
> to the files they add trace_printk() while debugging. It also means that
> those that do not have that config set will not have to worry about wasted
> CPU cycles as it is only include in the CFLAGS when the option is set, and
> its completely ignored otherwise.
>
> Steven Rostedt (2):
> tracing: Move non-trace_printk prototypes back to kernel.h
> tracing: Add CONFIG_TRACE_PRINTK_DEBUGGING to clean up kernel.h
>
> ----
> .../driver_development_debugging_guide.rst | 2 +-
> Makefile | 5 +++++
> arch/powerpc/kvm/book3s_xics.c | 1 +
> drivers/gpu/drm/i915/gt/intel_gtt.h | 1 +
> drivers/gpu/drm/i915/i915_gem.h | 1 +
> drivers/hwtracing/stm/dummy_stm.c | 4 ++++
> drivers/infiniband/hw/hfi1/trace_dbg.h | 1 +
> drivers/usb/early/xhci-dbc.c | 1 +
> fs/ext4/inline.c | 1 +
> include/linux/kernel.h | 19 ++++++++++++++++++-
> include/linux/sunrpc/debug.h | 1 +
> include/linux/trace_printk.h | 22 +++-------------------
> kernel/trace/Kconfig | 10 ++++++++++
> kernel/trace/ring_buffer_benchmark.c | 1 +
> kernel/trace/trace.h | 1 +
> samples/fprobe/fprobe_example.c | 1 +
> samples/ftrace/ftrace-direct-modify.c | 1 +
> samples/ftrace/ftrace-direct-multi-modify.c | 1 +
> samples/ftrace/ftrace-direct-multi.c | 2 +-
> samples/ftrace/ftrace-direct-too.c | 2 +-
> samples/ftrace/ftrace-direct.c | 2 +-
> 21 files changed, 56 insertions(+), 24 deletions(-)
>
^ permalink raw reply