Linux Trace Kernel
* Re: [PATCH v3 7/9] rv/tlob: add KUnit tests for the tlob monitor
From: Gabriele Monaco @ 2026-06-17  7:49 UTC
  To: wen.yang; +Cc: Steven Rostedt, linux-trace-kernel, linux-kernel
In-Reply-To: <d9abf11ee0af54689fce87ab95f8e8f7f5eb233d.1780847473.git.wen.yang@linux.dev>

On Mon, 2026-06-08 at 00:13 +0800, wen.yang@linux.dev wrote:
> +/*
> + * Valid add lines return -ENOENT (kern_path() finds no such file in
> the test
> + * environment) rather than 0; a non-(-EINVAL) return confirms the
> format was
> + * accepted by the parser.
> + */
> +static void tlob_parse_valid_accepted(struct kunit *test)
> +{
> +	char buf[128];
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(tlob_parse_valid); i++) {
> +		strscpy(buf, tlob_parse_valid[i], sizeof(buf));
> +		KUNIT_EXPECT_NE(test,
> tlob_create_or_delete_uprobe(buf), -EINVAL);

Couldn't this test case end up adding those uprobes for real?
Unit tests should not touch the system; you usually avoid that by:

1. stubbing the tested function at the point where it starts doing bad things
2. testing with dummy data without attaching anything (not applicable here)
3. testing a different function that does not affect the system

I think the cleanest here is 3, so you could just KUnit-test
tlob_parse_uprobe_line() and tlob_parse_remove_line().

Alternatively, just stub the entirety of tlob_add_uprobe() and
tlob_remove_uprobe_by_key(), and maybe even check that they're called
when expected and not called on failure (right now you aren't testing
valid removals, probably because that's going to break).

I believe a good unit test should be validating the parsing logic only
/or/ the add/remove logic (but that's hard, you can skip it or even
check in selftests).
Right now your tests are trying to do both, so you don't know whether
failures came from the uprobes subsystem or from allocation (you shouldn't
even get there from the unit test).

Then you can just check for success and not for != -EINVAL, which is
confusing.
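
A minimal userspace sketch of the two suggestions above, not the tlob code
itself. Everything here is hypothetical: the "<threshold_us>:<path>" line
format, struct demo_probe, and the demo_* functions are made up for
illustration, since the real tlob_parse_uprobe_line() signature is not shown
in this thread. The parser is a pure function, so a test can expect exactly 0;
the side-effecting step is injected, so a stub can count calls:

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form; the real tlob structures are not shown here. */
struct demo_probe {
	unsigned long threshold_us;
	const char *path;
};

/*
 * Pure parser for a made-up "<threshold_us>:<path>" line. It touches no
 * global state, so a unit test can expect exactly 0, -EINVAL or -ERANGE.
 * The line is modified in place (the ':' becomes a NUL).
 */
static int demo_parse_line(char *line, struct demo_probe *out)
{
	char *sep = strchr(line, ':');
	char *end;
	unsigned long val;

	if (!sep || sep == line || sep[1] == '\0')
		return -EINVAL;
	*sep = '\0';

	/* strtoul() would accept a sign or leading blanks; reject them. */
	if (!isdigit((unsigned char)line[0]))
		return -EINVAL;

	errno = 0;
	val = strtoul(line, &end, 10);
	if (*end != '\0')
		return -EINVAL;
	if (errno == ERANGE || val == 0)
		return -ERANGE;

	out->threshold_us = val;
	out->path = sep + 1;
	return 0;
}

/* Side-effecting step, injected so a test can replace it with a stub. */
typedef int (*demo_add_fn)(const struct demo_probe *p);

static int demo_create(char *line, demo_add_fn add)
{
	struct demo_probe p;
	int ret = demo_parse_line(line, &p);

	if (ret)
		return ret;	/* the add step must not run on parse failure */
	return add(&p);
}

/* Stub that only records that it was called. */
static int demo_stub_calls;

static int demo_stub_add(const struct demo_probe *p)
{
	(void)p;
	demo_stub_calls++;
	return 0;
}
```

With this split, a valid line is checked with "== 0" and the stub counter
shows that the add step ran once and never ran for rejected lines. In KUnit
itself, the static stub mechanism (kunit_activate_static_stub()) plays the
role of the injected function pointer.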

Thanks,
Gabriele

> +	}
> +}
> +
> +static void tlob_parse_invalid_rejected(struct kunit *test)
> +{
> +	char buf[128];
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(tlob_parse_invalid); i++) {
> +		strscpy(buf, tlob_parse_invalid[i], sizeof(buf));
> +		KUNIT_EXPECT_EQ(test,
> tlob_create_or_delete_uprobe(buf), -EINVAL);
> +	}
> +}
> +
> +static void tlob_parse_out_of_range_rejected(struct kunit *test)
> +{
> +	char buf[128];
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(tlob_parse_out_of_range); i++) {
> +		strscpy(buf, tlob_parse_out_of_range[i],
> sizeof(buf));
> +		KUNIT_EXPECT_EQ(test,
> tlob_create_or_delete_uprobe(buf), -ERANGE);
> +	}
> +}
> +
> +static struct kunit_case tlob_parse_cases[] = {
> +	KUNIT_CASE(tlob_parse_valid_accepted),
> +	KUNIT_CASE(tlob_parse_invalid_rejected),
> +	KUNIT_CASE(tlob_parse_out_of_range_rejected),
> +	{}
> +};
> +
> +static struct kunit_suite tlob_parse_suite = {
> +	.name       = "tlob_parse",
> +	.test_cases = tlob_parse_cases,
> +};
> +
> +kunit_test_suite(tlob_parse_suite);
> +
> +MODULE_DESCRIPTION("KUnit tests for the tlob RV monitor");
> +MODULE_LICENSE("GPL");



* Re: [PATCH] tracing: eprobe: read the complete FILTER_PTR_STRING pointer
From: Martin Kaiser @ 2026-06-17  8:32 UTC
  To: Masami Hiramatsu; +Cc: Steven Rostedt, linux-trace-kernel, linux-kernel
In-Reply-To: <20260616110910.e6420488b6a798d49951cde9@kernel.org>

Hiramatsu-san,

thank you for reviewing my patch.

Thus wrote Masami Hiramatsu (mhiramat@kernel.org):

> Ah, this is a bit complicated. It seems to work with sched_switch event
> as commit f04dec93466a ("tracing/eprobes: Fix reading of string fields"):

> echo 'e:sw sched/sched_switch comm=$next_comm:string' > dynamic_events

> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> #              | |         |   |||||     |         |
>               sh-162     [002] d..3.    54.027213: sw: (sched.sched_switch) comm="swapper/2"
>           <idle>-0       [007] d..3.    54.034573: sw: (sched.sched_switch) comm="rcu_preempt"
>      rcu_preempt-15      [007] d..3.    54.034589: sw: (sched.sched_switch) comm="swapper/7"

> Maybe comm is stored as a fixed string information in the event record?

Yes, this example does not execute my change.

> /sys/kernel/tracing # cat events/sched/sched_switch/format 
> name: sched_switch
> ID: 254
> format:
> 	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
> 	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
> 	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
> 	field:int common_pid;	offset:4;	size:4;	signed:1;

> 	field:char prev_comm[16];	offset:8;	size:16;	signed:0;
> 	field:pid_t prev_pid;	offset:24;	size:4;	signed:1;
> 	field:int prev_prio;	offset:28;	size:4;	signed:1;
> 	field:long prev_state;	offset:32;	size:8;	signed:1;
> 	field:char next_comm[16];	offset:40;	size:16;	signed:0;
> 	field:pid_t next_pid;	offset:56;	size:4;	signed:1;
> 	field:int next_prio;	offset:60;	size:4;	signed:1;

> But the filename is a pointer.

> /sys/kernel/tracing # cat events/syscalls/sys_enter_openat/format 
> name: sys_enter_openat
> ID: 705
> format:
> 	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
> 	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
> 	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
> 	field:int common_pid;	offset:4;	size:4;	signed:1;

> 	field:int __syscall_nr;	offset:8;	size:4;	signed:1;
> 	field:int dfd;	offset:16;	size:8;	signed:0;
> 	field:const char * filename;	offset:24;	size:8;	signed:0;
> 	field:int flags;	offset:32;	size:8;	signed:0;
> 	field:umode_t mode;	offset:40;	size:8;	signed:0;
> 	field:__data_loc char[] __filename_val;	offset:48;	size:4;	signed:0;

> In this case, the filename field should use __data_loc directly instead of
> pointing data on the ring buffer.

> Can you try 

> echo 'e syscalls.sys_enter_openat $__filename_val:string' > \
>  		/sys/kernel/tracing/dynamic_events

> Instead?

This field is working as expected.

I still believe that the handling of FILTER_PTR_STRING is not correct. The
pointer is stored in the ringbuffer as unsigned long and read as a char. This
gives us a truncated pointer that cannot be dereferenced.

> I think a better solution is fixing the syscall tracer.

I would say that syscall trace is doing the right thing. The ringbuffer entry
is a struct syscall_trace_enter, the syscall arguments are unsigned longs.
They are written in ftrace_syscall_enter, this looks correct to me.

A const char * syscall argument is using FILTER_PTR_STRING, the unsigned long
argument from the ringbuffer is read as a char and then converted to a
truncated pointer.
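
The truncation described above can be reproduced in userspace. This is a
minimal sketch, assuming an LP64 system where unsigned long is pointer-sized;
demo_read_as_char() and demo_read_as_ulong() are hypothetical helpers and not
the eprobe code:

```c
#include <stdint.h>
#include <string.h>

/*
 * Read a pointer-sized slot as a single char, as the thread describes for
 * FILTER_PTR_STRING. Only one byte of the stored value survives.
 */
static uintptr_t demo_read_as_char(const void *slot)
{
	char c;

	memcpy(&c, slot, sizeof(c));
	return (uintptr_t)(unsigned char)c;
}

/* Read the complete slot, which recovers the original pointer value. */
static uintptr_t demo_read_as_ulong(const void *slot)
{
	unsigned long v;

	memcpy(&v, slot, sizeof(v));
	return (uintptr_t)v;
}
```

Storing the address of a string in an unsigned long and reading it back with
demo_read_as_ulong() yields the same address; demo_read_as_char() yields a
value no larger than 0xff, which cannot be dereferenced as the original
pointer.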

Thanks,
Martin


* Re: [PATCH v9 0/6] mm/memory-failure: add panic option for unrecoverable pages
From: Breno Leitao @ 2026-06-17  9:40 UTC
  To: Miaohe Lin, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Shuah Khan, Naoya Horiguchi, Jonathan Corbet, Shuah Khan,
	Liam R. Howlett, lance.yang, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers
  Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest,
	linux-trace-kernel, kernel-team
In-Reply-To: <20260609-ecc_panic-v9-0-432a74002e74@debian.org>

On Tue, Jun 09, 2026 at 03:56:54AM -0700, Breno Leitao wrote:
> A multi-bit ECC error on a kernel-owned page that the memory failure
> handler cannot recover is currently swallowed: PG_hwpoison is set, the
> event is logged, and the kernel keeps running.  The corrupted memory
> remains accessible to the kernel and either drives silent data
> corruption or surfaces seconds-to-minutes later as an apparently
> unrelated crash.  In a large fleet that delayed, unattributable crash
> turns into significant engineering effort to root-cause; in a kdump
> configuration, by the time the crash happens the original error
> context (faulting PFN, MCE/GHES record, page state) is long gone.
> 
> This series adds an opt-in sysctl,
> vm.panic_on_unrecoverable_memory_failure, that converts an
> unrecoverable kernel-page hwpoison event into an immediate panic with
> a clean dmesg/vmcore that still contains the original failure
> context.  The default is disabled so existing workloads see no
> change.
> 
> There is a selftest that test different cases, and I tested it using
> the following variants:
> 
>   ┌─────────┬──────────┬───────────────────────────────────────────────────────────┐
>   │ Variant │   PFN    │                          Result                           │
>   ├─────────┼──────────┼───────────────────────────────────────────────────────────┤
>   │ rodata  │ 0x2600   │ Panic with "Memory failure: 0x2600: unrecoverable page"   │
>   ├─────────┼──────────┼───────────────────────────────────────────────────────────┤
>   │ slab    │ 0x100032 │ Panic with "Memory failure: 0x100032: unrecoverable page" │
>   ├─────────┼──────────┼───────────────────────────────────────────────────────────┤
>   │ pgtable │ 0x100000 │ Panic with "Memory failure: 0x100000: unrecoverable page" │
>   └─────────┴──────────┴───────────────────────────────────────────────────────────┘
> 
> Each one shows the same call trace, exactly the path the series builds:
> 
>   hard_offline_page_store
>     → memory_failure
>       → action_result
>         → panic("Memory failure: %#lx: unrecoverable page")

While debugging another issue earlier today, I found a kernel crash where a
page whose memory failure was ignored gets hit later in the day, leading to
random misbehavior and crashes.

 Memory failure: 0x140ae: unhandlable page.
 Memory failure: 0x140ae: recovery action for get hwpoison page: Ignored                     <-- Ignored 
 loop0: detected capacity change from 0 to 15241056
 EDAC MC0: 1 UE multi-bit ECC on LP5x_0 LP5x_0 (node:0 card:0 module:0 rank:0 bank:2 device:28 row:42700 column:96
 {3}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 308
 {3}[Hardware Error]: event severity: recoverable
 {3}[Hardware Error]:  imprecise tstamp: 2026-06-16 02:50:03
 {3}[Hardware Error]:  Error 0, type: recoverable
 {3}[Hardware Error]:   section_type: memory error
 {3}[Hardware Error]:   physical_address: 0x0000000aeccde180
 {3}[Hardware Error]:   physical_address_mask: 0xfffffffffffff000
 {3}[Hardware Error]:   node:0 card:0 module:0 rank:0 bank:2 device:28 row:42700 column:960 requestor_id:0x0000000
 {3}[Hardware Error]:   error_type: 3, multi-bit ECC
 {3}[Hardware Error]:   DIMM location: LP5x_0 LP5x_0
 Memory failure: 0xaeccd: recovery action for dirty LRU page: Recovered

 Internal error: synchronous external abort: 0000000096000410 [#1]  SMP
 Modules linked in: ghes_edac(E) squashfs(E) act_gact(E) sch_fq(E) tcp_diag(E) inet_diag(E) cls_bpf(E) evdev(E) sm
 CPU: 51 UID: 0 PID: 1 Comm: systemd Kdump: loaded Tainted: G   M       OE K     6.16.1-0_fbk2_0_gf40efc324cc8 #1
 Tainted: [M]=MACHINE_CHECK, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE, [K]=LIVEPATCH
 pstate: 834010c9 (Nzcv daIF +PAN -UAO +TCO +DIT +SSBS BTYPE=--)
 pc : clear_inode+0x34/0x108
 lr : proc_evict_inode.llvm.1771226604092943895+0x28/0x68
 sp : ffff800083f6f8d0
 x29: ffff800083f6f8e0 x28: 0000000000000011 x27: ffff0000c1378788
 x26: ffffffffffffffff x25: ffff800082747de0 x24: ffff0000c0ae9898
 x23: ffff8000819155f8 x22: ffff0000c0ae9888 x21: ffff0000c0ae9808
 x20: ffff0000c0ae9818 x19: ffff0000c0ae9788 x18: 000000000000001c
 x17: 0000000000000018 x16: 0000000000000040 x15: 0000000000000000
 x14: 0000000000000001 x13: 0000000000000000 x12: 0000000000002710
 x11: ffff0000c0ae9898 x10: ffff0000c1299b58 x9 : 0000000000000001
 x8 : ffff0000c0ae9900 x7 : ffff8000828db000 x6 : 0000000000005040
 x5 : ffffffffffffffff x4 : ffffffdfc05c8aa0 x3 : ffff000126470000
 x2 : ffffffffffffffff x1 : 0000000000000000 x0 : ffff0000c0ae9788
 Call trace:
  clear_inode+0x34/0x108 (P)
  proc_evict_inode.llvm.1771226604092943895+0x28/0x68
  evict+0xec/0x328
  iput+0xa8/0x310
  dentry_unlink_inode+0xa4/0x188
  __dentry_kill+0x74/0x358
  shrink_dentry_list+0xc8/0x198
 ....


* Re: [PATCH v4 6/7] Documentation: bootconfig: document build-time cmdline rendering
From: Breno Leitao @ 2026-06-17  9:56 UTC
  To: Masami Hiramatsu
  Cc: Andrew Morton, Nathan Chancellor, paulmck, Nicolas Schier,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, kernel-team
In-Reply-To: <ail6rQnRYKsXPxyF@gmail.com>

On Wed, Jun 10, 2026 at 07:58:10AM -0700, Breno Leitao wrote:
> On Wed, Jun 10, 2026 at 11:37:20PM +0900, Masami Hiramatsu wrote:
> > To avoid confusion, when this option is used, shouldn't we treat it
> > the same way as if embedded command lines were enabled, and either
> > not display it in /proc/bootconfig (or always display it, by merging
> > the rendered string)?
> 
> You're right that EMBED_CMDLINE breaks it: the embedded kernel.* keys
> are already in boot_command_line before setup_boot_config() ever sees
> the initrd bconf, so a user reading /proc/bootconfig would see only
> the initrd keys while parse_early_param() acted on the embedded ones.
> That's exactly the split-state Sashiko was circling around.
> 
> Both options you suggest work for me, but they pull in opposite
> directions and I'd rather not guess wrong on the user-facing
> contract.  Which do you prefer for v5?
> 
>   (a) Don't display embedded in /proc/bootconfig -- keep the current
>       "file shows the active bootconfig source" behavior and document
>       that with EMBED_CMDLINE=y, the kernel.* subtree may have been
>       applied separately via the cmdline.
> 
>   (b) Always display embedded by merging the rendered string into
>       /proc/bootconfig when EMBED_CMDLINE=y, so the file reflects
>       what was actually applied.
> 
> Happy to go either way

Following up on my own mail rather than leaving it fully open: after
looking at the code more, I'd like to recommend (a).

The deciding factor is ordering. EMBED_CMDLINE only works because the
embedded "kernel" keys are folded into boot_command_line in
setup_arch(), before parse_early_param() -- which is long before
setup_boot_config() looks at the initrd.

So for early params the embedded values are necessarily applied first, and an
initrd bootconfig cannot override them no matter how we present
/proc/bootconfig. That makes the embedded cmdline behave like a build-time
CONFIG_CMDLINE rather than a bootconfig source, and (a) is the option that
describes it honestly: it shows in /proc/cmdline, and /proc/bootconfig keeps
meaning "the bootconfig tree that was parsed".

(a) is also what the tree already does -- saved_boot_config is built
only from the XBC tree, the rendered string never enters it -- so it is
no new code on the /proc side and keeps the series small.

(b) would pull the flattened cmdline string back into the structured
tree view and need dedup against the initrd keys, which muddies what
/proc/bootconfig means for little gain.

So unless you'd rather have (b), I'll take (a) for v5 and extend
bootconfig.rst to cover the four sources (bootloader cmdline, embedded
cmdline, initrd bootconfig, embedded bootconfig).

I'll also document the sharp edge -- with both an embedded cmdline and an
initrd bootconfig, early params reflect the embedded values because the initrd
is not parsed yet.

Thanks,
--breno


* Re: [PATCH v3 09/13] verification/rvgen: Delete __parse_constraint()
From: Nam Cao @ 2026-06-17  9:59 UTC
  To: Gabriele Monaco
  Cc: Steven Rostedt, Wander Lairson Costa, linux-trace-kernel,
	linux-kernel
In-Reply-To: <e3f63618b48dba9299c88bfffeae14b71717fa77.camel@redhat.com>

Gabriele Monaco <gmonaco@redhat.com> writes:
> This function used to validate things we are no longer validating, now it's
> alright to create a model where a clock is never reset, which doesn't fully
> make sense. Should we add that check somewhere else?

Theory does not require clock reset, right? This is not some sort of
hidden issue that trips up unsuspecting people. It is obvious from the
model that the clock is never reset. So I think it's fine to allow
people to do that, maybe there will be an actual useful model without
clock reset, you never know.

The self.env_types check is enforced by the grammar. We do lose the
self.env_types check, but that is likely redundant anyway because we
have this:

        for transition in self.transitions:
            [...]
            if transition.reset:
                envs.append(transition.reset.env)
                self.env_stored.add(transition.reset.env)

so it is clear that all envs that are reset do have a storage.

That said, I am fine with keeping these sanity checks, if you are
paranoid.

Nam


* [PATCH] tracing: make tracepoint_printk static as not exported
From: Ben Dooks @ 2026-06-17 10:58 UTC
  To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel
  Cc: Ben Dooks

The tracepoint_printk symbol is not exported, so make it
static to remove the following sparse warning:

kernel/trace/trace.c:90:5: warning: symbol 'tracepoint_printk' was not declared. Should it be static?

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
---
 kernel/trace/trace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6eb4d3097a4d..4c3729c8d5e2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -87,7 +87,7 @@ void __init disable_tracing_selftest(const char *reason)
 
 /* Pipe tracepoints to printk */
 static struct trace_iterator *tracepoint_print_iter;
-int tracepoint_printk;
+static int tracepoint_printk;
 static bool tracepoint_printk_stop_on_boot __initdata;
 static bool traceoff_after_boot __initdata;
 static DEFINE_STATIC_KEY_FALSE(tracepoint_printk_key);
-- 
2.37.2.352.g3c44437643



* Re: [PATCH v3] mm/lruvec: trace LRU add drains and drain-all requests
From: David Hildenbrand (Arm) @ 2026-06-17 11:11 UTC
  To: JP Kobryn, linux-mm, willy, shakeel.butt, usama.arif, akpm,
	vbabka, mhocko, rostedt, mhiramat, mathieu.desnoyers, kasong,
	qi.zheng, baohua, axelrasmussen, yuanchu, weixugc, chrisl,
	shikemeng, nphamcs, baoquan.he, youngjun.park
  Cc: linux-kernel, linux-trace-kernel
In-Reply-To: <20260610195220.12403-1-jp.kobryn@linux.dev>

On 6/10/26 21:52, JP Kobryn wrote:
> LRU add batches can be drained before they reach capacity. This can be a
> source of LRU lock contention, but it is not currently possible to
> attribute these drains to callers with existing tracepoints.
> 
> Add mm_lru_add_drain to report the CPU and lru_add batch count when an
> lru_add batch is drained. This allows tracing to distinguish full drains
> from partial drains and attribute them to the calling stack.
> 
> Add mm_lru_add_drain_all to capture callers of __lru_add_drain_all and
> whether they set the force flag for all CPUs. The tracepoint resembles
> the signature of the enclosing function, but is needed because of
> potential inlining.
> 
> Signed-off-by: JP Kobryn <jp.kobryn@linux.dev>
> ---
>  include/trace/events/pagemap.h | 37 ++++++++++++++++++++++++++++++++++
>  mm/swap.c                      |  7 ++++++-
>  2 files changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/include/trace/events/pagemap.h b/include/trace/events/pagemap.h
> index 171524d3526d..ff3da07ccb40 100644
> --- a/include/trace/events/pagemap.h
> +++ b/include/trace/events/pagemap.h
> @@ -77,6 +77,43 @@ TRACE_EVENT(mm_lru_activate,
>  	TP_printk("folio=%p pfn=0x%lx", __entry->folio, __entry->pfn)
>  );
>  
> +TRACE_EVENT(mm_lru_add_drain,
> +
> +	TP_PROTO(int cpu, unsigned int nr),
> +
> +	TP_ARGS(cpu, nr),
> +
> +	TP_STRUCT__entry(
> +		__field(int,		cpu	)
> +		__field(unsigned int,	nr	)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->cpu	= cpu;
> +		__entry->nr	= nr;
> +	),
> +
> +	TP_printk("cpu=%d nr=%u", __entry->cpu, __entry->nr)
> +);
> +
> +TRACE_EVENT(mm_lru_add_drain_all,
> +
> +	TP_PROTO(bool force_all_cpus),
> +
> +	TP_ARGS(force_all_cpus),
> +
> +	TP_STRUCT__entry(
> +		__field(bool,	force_all_cpus	)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->force_all_cpus	= force_all_cpus;
> +	),
> +
> +	TP_printk("force_all_cpus=%s",
> +		__entry->force_all_cpus ? "true" : "false")
> +);
> +
>  #endif /* _TRACE_PAGEMAP_H */
>  
>  /* This part must be outside protection */
> diff --git a/mm/swap.c b/mm/swap.c
> index 588f50d8f1a8..e14b7612f896 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -694,9 +694,12 @@ void lru_add_drain_cpu(int cpu)
>  {
>  	struct cpu_fbatches *fbatches = &per_cpu(cpu_fbatches, cpu);
>  	struct folio_batch *fbatch = &fbatches->lru_add;
> +	unsigned int nr_folios_add = folio_batch_count(fbatch);
>  
> -	if (folio_batch_count(fbatch))
> +	if (nr_folios_add) {
>  		folio_batch_move_lru(fbatch, lru_add);
> +		trace_mm_lru_add_drain(cpu, nr_folios_add);
> +	}
>  
>  	fbatch = &fbatches->lru_move_tail;
>  	/* Disabling interrupts below acts as a compiler barrier. */
> @@ -869,6 +872,8 @@ static inline void __lru_add_drain_all(bool force_all_cpus)
>  	if (WARN_ON(!mm_percpu_wq))
>  		return;
>  
> +	trace_mm_lru_add_drain_all(force_all_cpus);
> +
>  	/*
>  	 * Guarantee folio_batch counter stores visible by this CPU
>  	 * are visible to other CPUs before loading the current drain

Given that trace events can quickly become stable ABI [1], are we really sure we
want to add this?

[1] https://lore.kernel.org/r/20260603130006.7d2c4a62@gandalf.local.home

-- 
Cheers,

David


* Re: [PATCH 1/3] rv/reactors: fix lockdep "Invalid wait context" in rv_react()
From: Nam Cao @ 2026-06-17 11:12 UTC
  To: wen.yang, Gabriele Monaco
  Cc: linux-trace-kernel, linux-kernel, Wen Yang, Thomas Weißschuh
In-Reply-To: <bc01343ae74acf6bdf142434aeaa4e6b40aa72a9.1781541556.git.wen.yang@linux.dev>

wen.yang@linux.dev writes:
> The DEFINE_WAIT_OVERRIDE_MAP() macro creates a lockdep map with
> wait_type_inner = LD_WAIT_CONFIG, which inherits the outer context's
> wait type.  When rv_react() is called from a LD_WAIT_FREE context
> (e.g., a KUnit test with busy-wait), and the reactor callback triggers
> a timer interrupt during the busy-loop,

I am confused by the last sentence. How can the reactor callback trigger a
timer interrupt?

Do you mean a timer interrupt happens in the middle of the reactor
callback? And this only happens sporadically, right?

> the interrupt exit path attempts
> to schedule (preempt_schedule_irq -> __schedule -> rq->__lock), which is
> LD_WAIT_SPIN.  Lockdep then reports:
>
>     [ BUG: Invalid wait context ]
>     context-{5:5}
>     1 lock held by kunit_try_catch/209:
>      #0: rv_react_map-wait-type-override at rv_react+0x9d/0xf0
>
> The wait_type_override map allowed the outer LD_WAIT_FREE to propagate
> inward, but scheduling from an interrupt is LD_WAIT_SPIN, violating the
> constraint.
>
> Fix by explicitly setting wait_type_inner = LD_WAIT_SPIN, which is the
> tightest constraint rv_react() callbacks must satisfy: they may not
> sleep (LD_WAIT_SLEEP) or use mutexes, but can use spinlocks and be
> interrupted. This matches the documented LD_WAIT_FREE constraint.

These concepts are new to me. Let me do some studying before reviewing.

Nam


* [PATCH v5 0/7] bootconfig: embed kernel.* cmdline at build time
From: Breno Leitao @ 2026-06-17 11:23 UTC
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, Breno Leitao, kernel-team

The userspace pieces (xbc_snprint_cmdline() in lib/, tools/bootconfig -C)
already landed; this series wires the rendered cmdline into the kernel.

Motivation: today the embedded bootconfig is parsed at runtime, after
parse_early_param() has already run, so early_param() handlers can't
see embedded values. Folding the kernel.* subtree into the cmdline at
build time gives a CONFIG_CMDLINE-equivalent for embedded-bootconfig
users without forcing them to maintain two cmdline sources.

Behaviorally, the "kernel" subtree is rendered to a flat string at
build time and stashed in .init.rodata. setup_arch() prepends it to
boot_command_line before parse_early_param() runs. Overflow is a soft
error: the helper logs and leaves boot_command_line untouched rather
than panicking, so an oversized embedded bconf cannot brick a boot.
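
The soft-error behavior described above can be sketched in userspace as
follows. This is only an illustration: demo_prepend_cmdline(), the separating
space, and the -E2BIG return value are assumptions, not the actual
xbc_prepend_embedded_cmdline() helper from this series. The point is that the
size check happens before any byte of the destination is modified:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Prepend @extra plus a separating space to the NUL-terminated string in
 * @cmdline, whose buffer holds @size bytes. On overflow, return -E2BIG and
 * leave @cmdline untouched. Assumes strlen(cmdline) < size.
 */
static int demo_prepend_cmdline(char *cmdline, size_t size, const char *extra)
{
	size_t cur = strlen(cmdline);
	size_t add = strlen(extra);

	if (!add)
		return 0;

	/* extra + ' ' + existing string + NUL must all fit. */
	if (add + 1 + cur + 1 > size)
		return -E2BIG;

	/* Shift the existing string (including its NUL) to make room. */
	memmove(cmdline + add + 1, cmdline, cur + 1);
	memcpy(cmdline, extra, add);
	cmdline[add] = ' ';
	return 0;
}
```

A caller that gets a non-zero return can log the failure and continue booting
with the original command line, which is the behavior the cover letter
describes.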

Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v5:
- Patch 3 (Kconfig): drop the redundant "depends on BOOT_CONFIG_EMBED"
  from BOOT_CONFIG_EMBED_CMDLINE (Julian Braha).
- Patch 6 (Documentation): spell out how the embedded cmdline interacts
  with the bootloader cmdline, an initrd bootconfig, and the embedded
  bootconfig.
- Link to v4: https://lore.kernel.org/r/20260609-bootconfig_using_tools-v4-0-73c463f03a97@debian.org

Changes in v4:
- Patch 3 (build pipeline): clear CROSS_COMPILE= in the kernel-side
  tools/bootconfig sub-make. Without it, an LLVM=1 cross build
  inherits CROSS_COMPILE and tools/scripts/Makefile.include injects
  --target=/--sysroot= into the host clang, producing a target
  binary that fails to exec.
- Patch 3 (build pipeline): place embedded-cmdline.S in its own
  .init.rodata.embed_cmdline subsection ("a") so ld.lld does not
  see a section-type mismatch against lib/bootconfig-data.S's
  writable .init.rodata ("aw"). The linker's *(.init.rodata
  .init.rodata.*) glob still folds it into the init image.
- Patch 6 (x86/setup): also accept the bootconfig=<anything> form
  via cmdline_find_option(), matching the runtime parse_args() loop.
  Without it, bootconfig=0/=off would skip the early prepend but
  still trigger the late runtime apply -- a split-brain state.
- New patch 7: document CONFIG_BOOT_CONFIG_EMBED_CMDLINE in
  Documentation/admin-guide/bootconfig.rst (semantics, opt-in,
  precedence, overflow behavior, example).
- Link to v3: https://lore.kernel.org/r/20260608-bootconfig_using_tools-v3-0-4ddd079a0696@debian.org

Changes in v3:
- Patch 3: Move HOSTCC override to the kernel-side rule; tool keeps
  $(CC) for standalone/cross builds.
- Patch 6: Drop the false fail-safe wording; document the
  BOOT_CONFIG_FORCE=y default interaction.
- Link to v2:
  https://lore.kernel.org/r/20260605-bootconfig_using_tools-v2-0-d309f544b5f7@debian.org

Changes in v2 (addressing review of v1):
- Split out a standalone fix for the NULL-pointer arithmetic in
  xbc_snprint_cmdline() so the build-time render cannot trip host
  UBSan/FORTIFY_SOURCE.
- Rework the leaf-root handling: instead of returning early, skip @root
  inside the loop so a root carrying both a value and subkeys
  (kernel = x together with kernel.foo = bar) still renders its
  descendant keys.
- Build tools/bootconfig with $(HOSTCC) so cross-compiled (ARCH=...)
  builds render the cmdline on the build host instead of failing with
  "Exec format error".
- Mark the embedded cmdline section read-only (drop the "w" flag from
  .init.rodata).
- Add a make-clean hook so tools/bootconfig artifacts are removed by
  make clean.
- Gate the x86 prepend on "bootconfig" being present on the command
  line (or CONFIG_BOOT_CONFIG_FORCE), matching the init.* opt-in
  semantics documented in bootconfig.rst and preserving fail-safe
  recovery: dropping "bootconfig" from the bootloader cmdline now also
  disables the embedded kernel.* keys.
- Link to v1: https://patch.msgid.link/20260527-bootconfig_using_tools-v1-0-b6906a86e7d5@debian.org

---
Breno Leitao (7):
      bootconfig: fix NULL-pointer arithmetic in xbc_snprint_cmdline()
      bootconfig: render descendant keys when xbc_snprint_cmdline() root has a value
      bootconfig: render embedded bootconfig as a kernel cmdline at build time
      bootconfig: clean build-time tools/bootconfig from make clean
      bootconfig: add xbc_prepend_embedded_cmdline() helper
      Documentation: bootconfig: document build-time cmdline rendering
      x86/setup: prepend embedded bootconfig cmdline before parse_early_param

 Documentation/admin-guide/bootconfig.rst |  81 ++++++++++++++++++++++
 MAINTAINERS                              |   1 +
 Makefile                                 |  28 +++++++-
 arch/x86/Kconfig                         |   1 +
 arch/x86/kernel/setup.c                  |  27 ++++++++
 include/linux/bootconfig.h               |   9 +++
 init/Kconfig                             |  35 ++++++++++
 init/main.c                              |  25 ++++++-
 lib/Makefile                             |  16 +++++
 lib/bootconfig.c                         | 112 +++++++++++++++++++++++++++++--
 lib/embedded-cmdline.S                   |  16 +++++
 tools/bootconfig/Makefile                |   4 +-
 12 files changed, 342 insertions(+), 13 deletions(-)
---
base-commit: a87737435cfa134f9cdcc696ba3080759d04cf72
change-id: 20260508-bootconfig_using_tools-cfa7aa9d6a5a

Best regards,
-- 
Breno Leitao <leitao@debian.org>



* [PATCH v5 1/7] bootconfig: fix NULL-pointer arithmetic in xbc_snprint_cmdline()
From: Breno Leitao @ 2026-06-17 11:23 UTC
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, Breno Leitao, kernel-team
In-Reply-To: <20260617-bootconfig_using_tools-v5-0-fd589a9cc5e3@debian.org>

xbc_snprint_cmdline() is meant to be called twice: first with
buf=NULL, size=0 to probe the rendered length, then with a real
buffer to fill it (the standard snprintf() two-pass pattern). The
probe call makes the function compute "buf + size" (NULL + 0) and,
on every iteration, advance "buf += ret" from that NULL base and
pass the result back into snprintf().

Pointer arithmetic on a NULL pointer is undefined behavior. It is
harmless in the in-kernel callers today, but the follow-up patches
run this same code in the userspace tools/bootconfig parser at kernel
build time, where host UBSan / FORTIFY_SOURCE abort the build.

Track a running written length (size_t) instead of mutating @buf, and
only form "buf + len" when @buf is non-NULL. snprintf(NULL, 0, ...)
is itself well defined and returns the would-be length, so the
two-pass "probe then fill" usage returns identical byte counts.
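
The "probe then fill" pattern with a running length can be sketched in plain C
as below. This is an illustration under assumptions, not the patched function:
demo_rest() and demo_render() are hypothetical names, and the key/value arrays
stand in for the XBC tree walk. Unlike the patch, the sketch also passes NULL
once the output is truncated, so it never forms a pointer past the buffer:

```c
#include <stddef.h>
#include <stdio.h>

/* Remaining capacity; 0 once the running length has reached @size. */
static size_t demo_rest(size_t len, size_t size)
{
	return len < size ? size - len : 0;
}

/*
 * Render "key=value " pairs into @buf. With buf == NULL and size == 0 the
 * call only measures the output. The destination is derived from a running
 * length, so no arithmetic is ever done on a NULL pointer.
 */
static int demo_render(char *buf, size_t size,
		       const char *const keys[], const char *const vals[],
		       size_t n)
{
	size_t len = 0;
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		/* Only point into the buffer while there is room left. */
		char *dst = (buf && len < size) ? buf + len : NULL;

		ret = snprintf(dst, demo_rest(len, size), "%s=%s ",
			       keys[i], vals[i]);
		if (ret < 0)
			return ret;
		len += (size_t)ret;
	}

	/* Would-be length, as with snprintf(), even when truncated. */
	return (int)len;
}
```

Calling demo_render(NULL, 0, ...) first and then again with a buffer of at
least the returned length plus one byte returns the same count both times.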

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 lib/bootconfig.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index f445b7703fdd9..2ed9ee3dc81c7 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -427,10 +427,18 @@ static char xbc_namebuf[XBC_KEYLEN_MAX] __initdata;
 int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 {
 	struct xbc_node *knode, *vnode;
-	char *end = buf + size;
 	const char *val, *q;
+	size_t len = 0;
 	int ret;
 
+	/*
+	 * Track the running written length rather than advancing @buf, so we
+	 * never form "buf + size" or "buf += ret" while @buf is NULL (the
+	 * size-probe call passes buf=NULL, size=0). NULL pointer arithmetic
+	 * is undefined behavior and trips host UBSan / FORTIFY_SOURCE when
+	 * this renderer runs at kernel build time. snprintf(NULL, 0, ...)
+	 * itself is well defined and returns the would-be length.
+	 */
 	xbc_node_for_each_key_value(root, knode, val) {
 		ret = xbc_node_compose_key_after(root, knode,
 					xbc_namebuf, XBC_KEYLEN_MAX);
@@ -439,10 +447,11 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 
 		vnode = xbc_node_get_child(knode);
 		if (!vnode) {
-			ret = snprintf(buf, rest(buf, end), "%s ", xbc_namebuf);
+			ret = snprintf(buf ? buf + len : NULL, rest(len, size),
+				       "%s ", xbc_namebuf);
 			if (ret < 0)
 				return ret;
-			buf += ret;
+			len += ret;
 			continue;
 		}
 		xbc_array_for_each_value(vnode, val) {
@@ -452,15 +461,15 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 			 * whitespace.
 			 */
 			q = strpbrk(val, " \t\r\n") ? "\"" : "";
-			ret = snprintf(buf, rest(buf, end), "%s=%s%s%s ",
-				       xbc_namebuf, q, val, q);
+			ret = snprintf(buf ? buf + len : NULL, rest(len, size),
+				       "%s=%s%s%s ", xbc_namebuf, q, val, q);
 			if (ret < 0)
 				return ret;
-			buf += ret;
+			len += ret;
 		}
 	}
 
-	return buf - (end - size);
+	return len;
 }
 #undef rest
 

-- 
2.53.0-Meta



* [PATCH v5 2/7] bootconfig: render descendant keys when xbc_snprint_cmdline() root has a value
From: Breno Leitao @ 2026-06-17 11:23 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, Breno Leitao, kernel-team
In-Reply-To: <20260617-bootconfig_using_tools-v5-0-fd589a9cc5e3@debian.org>

xbc_node_for_each_key_value() walks to the first leaf under @root, and
when @root is itself a leaf it yields @root. That happens not only for
an empty "kernel {}" subtree, but also when @root carries both a value
and subkeys, e.g.

	kernel = x
	kernel.foo = bar

Here @root ("kernel") is a leaf because its first child is the value
node "x", so the iterator returns @root first. Feeding @root back into
xbc_node_compose_key_after(root, root) returns -EINVAL, which the only
in-kernel caller papers over with a "len <= 0" check -- but the
follow-up tools/bootconfig -C user propagates the error and turns such
a bootconfig into a build failure. Worse, short-circuiting the whole
call on a leaf @root would silently drop the valid "kernel.foo = bar"
descendant that this patch should render.

Skip @root inside the loop instead of bailing out: the value-only entry
is dropped (it is rendered through the "kernel" cmdline path, not here),
while real descendant keys are still emitted. An entirely empty subtree
now renders nothing and returns 0 rather than -EINVAL, matching the
"nothing to render is not an error" semantics expected by the new
build-time caller.
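
The difference between skipping @root and bailing out can be modelled
in a few lines of standalone C. This is only a sketch under simplified
assumptions: struct node, compose_key() and render_count() are
hypothetical stand-ins, and the "yielded" array stands for whatever
sequence the iterator produces:

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct xbc_node. */
struct node {
	const char *key;
};

/* Stand-in for key composition: a node has no key relative to itself. */
static int compose_key(const struct node *root, const struct node *n,
		       char *out, size_t size)
{
	if (n == root)
		return -EINVAL;
	return snprintf(out, size, "%s", n->key);
}

/* Count the keys rendered from the sequence an iterator would yield. */
static int render_count(const struct node *root,
			const struct node *const *yielded, size_t n)
{
	char name[64];
	int rendered = 0;
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		/* Skip the root itself instead of failing the whole call. */
		if (yielded[i] == root)
			continue;

		ret = compose_key(root, yielded[i], name, sizeof(name));
		if (ret < 0)
			return ret;
		rendered++;
	}

	return rendered;
}
```

In this model a sequence of { root, foo } renders one key, and a
sequence containing only the root renders zero keys without an error.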

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 lib/bootconfig.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index 2ed9ee3dc81c7..926094d97397e 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -440,6 +440,17 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 	 * itself is well defined and returns the would-be length.
 	 */
 	xbc_node_for_each_key_value(root, knode, val) {
+		/*
+		 * An empty or value-only @root (e.g. "kernel {}" or
+		 * "kernel = x", possibly alongside "kernel.foo = bar")
+		 * yields @root itself here. Skip it: composing a key for it
+		 * would fail with -EINVAL, yet any real descendant keys must
+		 * still be rendered. An entirely empty subtree then renders
+		 * nothing and returns 0 rather than an error.
+		 */
+		if (knode == root)
+			continue;
+
 		ret = xbc_node_compose_key_after(root, knode,
 					xbc_namebuf, XBC_KEYLEN_MAX);
 		if (ret < 0)

-- 
2.53.0-Meta



* [PATCH v5 3/7] bootconfig: render embedded bootconfig as a kernel cmdline at build time
From: Breno Leitao @ 2026-06-17 11:23 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, Breno Leitao, kernel-team
In-Reply-To: <20260617-bootconfig_using_tools-v5-0-fd589a9cc5e3@debian.org>

Add the build-time pipeline that renders the "kernel" subtree of
CONFIG_BOOT_CONFIG_EMBED_FILE into a flat cmdline string and stashes
it in .init.rodata as embedded_kernel_cmdline[]. A follow-up patch
adds the runtime helper that prepends this string to boot_command_line
during early architecture setup so parse_early_param() sees the values.

The build wires up:
  tools/bootconfig -C kernel - userspace tool already shared with
                               lib/bootconfig.c, used here in -C mode
                               to render a bootconfig file to a cmdline
  lib/embedded-cmdline.S     - .incbin's the rendered text plus a NUL
                               (listed under the EXTRA BOOT CONFIG
                               MAINTAINERS entry)
  lib/Makefile rule          - runs tools/bootconfig at build time
  Makefile prepare dep       - ensures tools/bootconfig is built first,
                               same pattern as tools/objtool and
                               tools/bpf/resolve_btfids

Drop the test target from tools/bootconfig/Makefile's default 'all'
recipe so that hooking the binary into the kernel build does not run
test-bootconfig.sh on every prepare. The tests stay available as
'make -C tools/bootconfig test', matching the convention of
tools/objtool and tools/bpf/resolve_btfids whose 'all' targets only
build the binary.

Require BOOT_CONFIG_EMBED_FILE to be non-empty before the new option
can be enabled, otherwise tools/bootconfig -C runs against an empty
file and prints a parse error on every kernel build.

The feature gates on CONFIG_ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG, a
silent symbol arches select once they've wired the prepend call into
setup_arch(). No arch selects it in this patch, so the user-visible
CONFIG_BOOT_CONFIG_EMBED_CMDLINE is not yet enableable; when an arch
later opts in, the runtime behavior is added by the follow-up patches.

tools/bootconfig also installs on target systems, so its own Makefile
keeps $(CC) and stays cross-buildable as a standalone tool. The kernel
build, which runs the tool on the build host during prepare, instead
forces CC=$(HOSTCC) from a dedicated tools/bootconfig rule and clears
CROSS_COMPILE= in the sub-make. Without that clear, an LLVM=1 cross
build would inherit CROSS_COMPILE and tools/scripts/Makefile.include
would inject --target=/--sysroot= flags into the host clang invocation,
producing a target binary that fails to exec ("Exec format error").

embedded-cmdline.S places the rendered string in its own .init.rodata
subsection (.init.rodata.embed_cmdline) with the "a" (allocatable,
read-only) flag and %progbits. lib/bootconfig-data.S already places
the embedded bootconfig blob in .init.rodata with the "aw" flag
(xbc_init() rewrites separators in place, so that data must be
writable). Using a distinct subsection name avoids the ld.lld section-
type mismatch that would otherwise arise from mixing "a" and "aw"
under the same name; the linker's "*(.init.rodata .init.rodata.*)"
glob still folds both into the init image and frees them after boot.

A follow-up patch wires the build-time tools/bootconfig into the
top-level clean target.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 MAINTAINERS               |  1 +
 Makefile                  | 15 +++++++++++++++
 init/Kconfig              | 35 +++++++++++++++++++++++++++++++++++
 lib/Makefile              | 16 ++++++++++++++++
 lib/embedded-cmdline.S    | 16 ++++++++++++++++
 tools/bootconfig/Makefile |  2 +-
 6 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 57656ec0e9d5d..953231df1911d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9844,6 +9844,7 @@ F:	fs/proc/bootconfig.c
 F:	include/linux/bootconfig.h
 F:	lib/bootconfig-data.S
 F:	lib/bootconfig.c
+F:	lib/embedded-cmdline.S
 F:	tools/bootconfig/*
 F:	tools/bootconfig/scripts/*
 
diff --git a/Makefile b/Makefile
index bf196c6df5b92..a7abb3f9a6264 100644
--- a/Makefile
+++ b/Makefile
@@ -1545,6 +1545,21 @@ prepare: tools/bpf/resolve_btfids
 endif
 endif
 
+# tools/bootconfig renders the embedded bootconfig into a cmdline at build time.
+ifdef CONFIG_BOOT_CONFIG_EMBED_CMDLINE
+prepare: tools/bootconfig
+endif
+
+# tools/bootconfig is run on the build host during prepare, so force a host
+# binary here; its own Makefile keeps $(CC) for standalone and cross builds.
+# CROSS_COMPILE= is cleared so tools/scripts/Makefile.include does not inject
+# the target's --target=/--sysroot= flags into the host clang invocation under
+# LLVM=1 cross builds (which would produce a target binary that fails to exec).
+tools/bootconfig: FORCE
+	$(Q)mkdir -p $(objtree)/tools
+	$(Q)$(MAKE) O=$(abspath $(objtree)) subdir=tools -C $(srctree)/tools/ \
+		bootconfig CC=$(HOSTCC) CROSS_COMPILE=
+
 # The tools build system is not a part of Kbuild and tends to introduce
 # its own unique issues. If you need to integrate a new tool into Kbuild,
 # please consider locating that tool outside the tools/ tree and using the
diff --git a/init/Kconfig b/init/Kconfig
index 5230d4879b1c8..d2b8613a6b927 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1566,6 +1566,41 @@ config BOOT_CONFIG_EMBED_FILE
 	  This bootconfig will be used if there is no initrd or no other
 	  bootconfig in the initrd.
 
+config ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
+	bool
+	help
+	  Silent symbol; no C code reads it directly. Architectures
+	  select it once their setup_arch() calls
+	  xbc_prepend_embedded_cmdline() before parse_early_param().
+	  Its only role is to gate the user-visible
+	  BOOT_CONFIG_EMBED_CMDLINE option per-arch, the same
+	  ARCH_SUPPORTS_* idiom used by ARCH_SUPPORTS_CFI, etc.
+
+config BOOT_CONFIG_EMBED_CMDLINE
+	bool "Render embedded bootconfig as kernel cmdline at build time"
+	depends on BOOT_CONFIG_EMBED_FILE != ""
+	depends on ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
+	default n
+	help
+	  Render the "kernel" subtree of the embedded bootconfig file into a
+	  flat cmdline string at kernel build time and prepend it to
+	  boot_command_line during early architecture setup. This makes
+	  early_param() handlers (e.g. mem=, earlycon=, loglevel=) see the
+	  values supplied via the embedded bootconfig.
+
+	  The runtime bootconfig parser is unaffected, so tree-structured
+	  consumers such as ftrace boot-time tracing keep working.
+
+	  Note: when an initrd also carries a bootconfig, its "kernel"
+	  subtree is still parsed at runtime, but the embedded "kernel"
+	  keys remain in boot_command_line for parse_early_param() and
+	  end up later than the initrd keys in saved_command_line, so
+	  parse_args() last-wins favors the embedded values. If you need
+	  initrd to override embedded kernel.* keys, leave this option
+	  off.
+
+	  If unsure, say N.
+
 config CMDLINE_LOG_WRAP_IDEAL_LEN
 	int "Length to try to wrap the cmdline when logged at boot"
 	default 1021
diff --git a/lib/Makefile b/lib/Makefile
index 7f75cc6edf94a..4ace86a5cb6de 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -273,6 +273,22 @@ filechk_defbconf = cat $(or $(real-prereqs), /dev/null)
 $(obj)/default.bconf: $(CONFIG_BOOT_CONFIG_EMBED_FILE) FORCE
 	$(call filechk,defbconf)
 
+obj-$(CONFIG_BOOT_CONFIG_EMBED_CMDLINE) += embedded-cmdline.o
+$(obj)/embedded-cmdline.o: $(obj)/embedded_cmdline.bin
+
+# Render the bootconfig "kernel" subtree to a flat cmdline string using
+# the userspace tools/bootconfig parser (-C mode). The runtime prepend
+# helper enforces COMMAND_LINE_SIZE at boot, so no build-time size
+# check is performed here (COMMAND_LINE_SIZE is an arch header
+# constant, not a Kconfig value).
+quiet_cmd_render_cmdline = BCONF2C $@
+      cmd_render_cmdline = \
+	$(objtree)/tools/bootconfig/bootconfig -C $< > $@
+
+targets += embedded_cmdline.bin
+$(obj)/embedded_cmdline.bin: $(obj)/default.bconf $(objtree)/tools/bootconfig/bootconfig FORCE
+	$(call if_changed,render_cmdline)
+
 obj-$(CONFIG_RBTREE_TEST) += rbtree_test.o
 obj-$(CONFIG_INTERVAL_TREE_TEST) += interval_tree_test.o
 
diff --git a/lib/embedded-cmdline.S b/lib/embedded-cmdline.S
new file mode 100644
index 0000000000000..bda81b4a42bea
--- /dev/null
+++ b/lib/embedded-cmdline.S
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Embed the build-time-rendered bootconfig "kernel" subtree as a flat
+ * cmdline string. setup_arch() prepends this to boot_command_line on
+ * architectures that select ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG.
+ *
+ * Copyright (c) 2026 Meta Platforms, Inc. and affiliates
+ * Copyright (c) 2026 Breno Leitao <leitao@debian.org>
+ */
+	.section .init.rodata.embed_cmdline, "a", %progbits
+	.global embedded_kernel_cmdline
+embedded_kernel_cmdline:
+	.incbin "lib/embedded_cmdline.bin"
+	.byte 0
+	.global embedded_kernel_cmdline_end
+embedded_kernel_cmdline_end:
diff --git a/tools/bootconfig/Makefile b/tools/bootconfig/Makefile
index 90eb47c9d8de6..4e82fd9553cde 100644
--- a/tools/bootconfig/Makefile
+++ b/tools/bootconfig/Makefile
@@ -15,7 +15,7 @@ override CFLAGS += -Wall -g -I$(CURDIR)/include
 ALL_TARGETS := bootconfig
 ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
 
-all: $(ALL_PROGRAMS) test
+all: $(ALL_PROGRAMS)
 
 $(OUTPUT)bootconfig: main.c include/linux/bootconfig.h $(LIBSRC)
 	$(CC) $(filter %.c,$^) $(CFLAGS) $(LDFLAGS) -o $@

-- 
2.53.0-Meta



* [PATCH v5 4/7] bootconfig: clean build-time tools/bootconfig from make clean
From: Breno Leitao @ 2026-06-17 11:23 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, Breno Leitao, kernel-team
In-Reply-To: <20260617-bootconfig_using_tools-v5-0-fd589a9cc5e3@debian.org>

The previous patch builds tools/bootconfig during 'make prepare' to
render the embedded bootconfig cmdline, but nothing removes it on
'make clean', leaving the compiled tool and its objects behind.

Wire a bootconfig_clean hook into the top-level clean target so the
compiled tool and its objects are removed by make clean, matching the
prepare-wired tools/objtool and tools/bpf/resolve_btfids.

The hook runs tools/bootconfig's Makefile via $(MAKE), which the kernel
build invokes with -rR (MAKEFLAGS += -rR). -rR drops the built-in $(RM)
variable, so the existing "$(RM) -f ..." clean recipe would expand to a
bare "-f ..." and fail. Spell the recipe with a literal "rm -f" so it
keeps working both standalone and when invoked from Kbuild.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 Makefile                  | 13 ++++++++++++-
 tools/bootconfig/Makefile |  2 +-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index a7abb3f9a6264..a6e13fa1c1dc1 100644
--- a/Makefile
+++ b/Makefile
@@ -1586,6 +1586,17 @@ ifneq ($(wildcard $(objtool_O)),)
 	$(Q)$(MAKE) -sC $(abs_srctree)/tools/objtool O=$(objtool_O) srctree=$(abs_srctree) $(patsubst objtool_%,%,$@)
 endif
 
+PHONY += bootconfig_clean
+
+bootconfig_O = $(abspath $(objtree))/tools/bootconfig
+
+# tools/bootconfig is only built (via the prepare hook above) when
+# CONFIG_BOOT_CONFIG_EMBED_CMDLINE is set; skip its clean otherwise.
+bootconfig_clean:
+ifneq ($(wildcard $(bootconfig_O)),)
+	$(Q)$(MAKE) -sC $(srctree)/tools/bootconfig O=$(bootconfig_O) clean
+endif
+
 tools/: FORCE
 	$(Q)mkdir -p $(objtree)/tools
 	$(Q)$(MAKE) O=$(abspath $(objtree)) subdir=tools -C $(srctree)/tools/
@@ -1756,7 +1767,7 @@ vmlinuxclean:
 	$(Q)$(CONFIG_SHELL) $(srctree)/scripts/link-vmlinux.sh clean
 	$(Q)$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) clean)
 
-clean: archclean vmlinuxclean resolve_btfids_clean objtool_clean
+clean: archclean vmlinuxclean resolve_btfids_clean objtool_clean bootconfig_clean
 
 # mrproper - Delete all generated files, including .config
 #
diff --git a/tools/bootconfig/Makefile b/tools/bootconfig/Makefile
index 4e82fd9553cde..3cb8066d5141b 100644
--- a/tools/bootconfig/Makefile
+++ b/tools/bootconfig/Makefile
@@ -27,4 +27,4 @@ install: $(ALL_PROGRAMS)
 	install $(OUTPUT)bootconfig $(DESTDIR)$(bindir)
 
 clean:
-	$(RM) -f $(OUTPUT)*.o $(ALL_PROGRAMS)
+	rm -f $(OUTPUT)*.o $(ALL_PROGRAMS)

-- 
2.53.0-Meta



* [PATCH v5 5/7] bootconfig: add xbc_prepend_embedded_cmdline() helper
From: Breno Leitao @ 2026-06-17 11:23 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, Breno Leitao, kernel-team
In-Reply-To: <20260617-bootconfig_using_tools-v5-0-fd589a9cc5e3@debian.org>

Add a helper that prepends the build-time-rendered embedded bootconfig
"kernel" subtree (embedded_kernel_cmdline[] from embedded-cmdline.S) to
a cmdline buffer with a separating space. Architectures call this from
setup_arch() before parse_early_param() so early_param() handlers
(mem=, earlycon=, loglevel=, ...) see values supplied via the embedded
bootconfig.

The in-place prepend (shift the existing string right, then drop the
embedded string in front) is factored into a small str_prepend() helper.

On overflow the helper logs an error and leaves the cmdline untouched
rather than panicking. Booting without the embedded values is better
than refusing to boot, and the error tells the user why their embedded
keys are missing.
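
For readers who want to try the prepend-or-leave-untouched behavior
outside the kernel, a minimal userspace sketch might look like the
following. prepend_checked() is a hypothetical name; unlike the kernel
helper it takes a NUL-terminated @src and reports the overflow through
its return value instead of logging:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Prepend @src to the NUL-terminated string in @dst, whose total
 * capacity is @size bytes. Returns false and leaves @dst untouched
 * when the combined string plus its NUL would not fit.
 */
static bool prepend_checked(char *dst, size_t size, const char *src)
{
	size_t src_len, dst_len;
	const char *nul;

	assert(dst && src);

	/* @dst must already hold a NUL within its capacity. */
	nul = memchr(dst, '\0', size);
	if (!nul)
		return false;

	dst_len = (size_t)(nul - dst);
	src_len = strlen(src);
	if (src_len + dst_len + 1 > size)
		return false;

	/* Shift the old string right, carrying its NUL along. */
	memmove(dst + src_len, dst, dst_len + 1);
	/* Drop the new prefix in front, without its own NUL. */
	memcpy(dst, src, src_len);
	return true;
}
```

Moving dst_len + 1 bytes is what makes an empty @dst work without a
special case: the single NUL byte is shifted and ends up terminating
the prepended text.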

The helper records whether it actually prepended, exposed via
xbc_embedded_cmdline_applied(). setup_boot_config() uses this to decide
whether the runtime "kernel" render would duplicate keys already folded
into boot_command_line.

When CONFIG_BOOT_CONFIG_EMBED_CMDLINE=n, the public declaration in
<linux/bootconfig.h> resolves to a no-op stub so callers compile
unchanged.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/linux/bootconfig.h |  9 ++++++
 lib/bootconfig.c           | 78 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+)

diff --git a/include/linux/bootconfig.h b/include/linux/bootconfig.h
index 1c7f3b74ffcf3..c186137f87ac5 100644
--- a/include/linux/bootconfig.h
+++ b/include/linux/bootconfig.h
@@ -308,4 +308,13 @@ static inline const char *xbc_get_embedded_bootconfig(size_t *size)
 }
 #endif
 
+/* Build-time-rendered bootconfig cmdline prepended in setup_arch() */
+#ifdef CONFIG_BOOT_CONFIG_EMBED_CMDLINE
+void __init xbc_prepend_embedded_cmdline(char *dst, size_t size);
+bool __init xbc_embedded_cmdline_applied(void);
+#else
+static inline void xbc_prepend_embedded_cmdline(char *dst, size_t size) { }
+static inline bool xbc_embedded_cmdline_applied(void) { return false; }
+#endif
+
 #endif
diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index 926094d97397e..f66be0b2dc241 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -19,6 +19,7 @@
 #include <linux/errno.h>
 #include <linux/cache.h>
 #include <linux/compiler.h>
+#include <linux/printk.h>
 #include <linux/sprintf.h>
 #include <linux/memblock.h>
 #include <linux/string.h>
@@ -34,6 +35,83 @@ const char * __init xbc_get_embedded_bootconfig(size_t *size)
 	return (*size) ? embedded_bootconfig_data : NULL;
 }
 #endif
+
+#ifdef CONFIG_BOOT_CONFIG_EMBED_CMDLINE
+/* embedded_kernel_cmdline is defined in embedded-cmdline.S */
+extern __visible const char embedded_kernel_cmdline[];
+extern __visible const char embedded_kernel_cmdline_end[];
+
+/* Set once the embedded cmdline has actually been prepended. */
+static bool xbc_cmdline_applied __initdata;
+
+/*
+ * str_prepend() - Prepend @src in front of the string in @dst, in place
+ * @dst: NUL-terminated destination buffer, currently @dst_len bytes long
+ * @dst_len: length of the current @dst string (excluding its NUL)
+ * @src: bytes to prepend (not NUL-terminated)
+ * @src_len: number of bytes from @src to prepend
+ *
+ * The caller must guarantee @dst has room for src_len + dst_len + 1 bytes.
+ * Moving dst_len + 1 bytes carries @dst's NUL terminator too, so an empty
+ * @dst needs no special case.
+ */
+static void __init str_prepend(char *dst, size_t dst_len,
+			       const char *src, size_t src_len)
+{
+	memmove(dst + src_len, dst, dst_len + 1);
+	memcpy(dst, src, src_len);
+}
+
+/**
+ * xbc_prepend_embedded_cmdline() - Prepend embedded bootconfig cmdline
+ * @dst: cmdline buffer to prepend into (must already contain a NUL byte)
+ * @size: total capacity of @dst in bytes
+ *
+ * Prepend the build-time-rendered "kernel" subtree of the embedded
+ * bootconfig to @dst. The rendered string already ends with a single
+ * space (the xbc_snprint_cmdline() invariant), which serves as the
+ * separator between the embedded keys and any existing content of @dst.
+ * On overflow, log an error and leave @dst untouched rather than
+ * silently truncating: booting without the embedded values is better
+ * than refusing to boot, and the error message tells the user why
+ * their embedded keys are missing.
+ *
+ * Intended to be called from setup_arch() before parse_early_param() so
+ * that early_param() handlers see the embedded values.
+ */
+void __init xbc_prepend_embedded_cmdline(char *dst, size_t size)
+{
+	size_t embed_len = embedded_kernel_cmdline_end - embedded_kernel_cmdline;
+	size_t dst_len;
+
+	if (!size || embed_len <= 1)	/* trailing NUL only */
+		return;
+	embed_len--;			/* exclude trailing NUL byte */
+
+	dst_len = strnlen(dst, size);
+	if (embed_len + dst_len + 1 > size) {
+		pr_err("embedded bootconfig cmdline (%zu bytes) does not fit in COMMAND_LINE_SIZE with %zu bytes already used; ignoring embedded values\n",
+		       embed_len, dst_len);
+		return;
+	}
+
+	str_prepend(dst, dst_len, embedded_kernel_cmdline, embed_len);
+	xbc_cmdline_applied = true;
+}
+
+/**
+ * xbc_embedded_cmdline_applied() - Did the embedded cmdline get prepended?
+ *
+ * Return true if xbc_prepend_embedded_cmdline() actually prepended the
+ * embedded "kernel" subtree. setup_boot_config() uses this to avoid
+ * rendering the same keys a second time.
+ */
+bool __init xbc_embedded_cmdline_applied(void)
+{
+	return xbc_cmdline_applied;
+}
+#endif
+
 #endif
 
 /*

-- 
2.53.0-Meta



* [PATCH v5 6/7] Documentation: bootconfig: document build-time cmdline rendering
From: Breno Leitao @ 2026-06-17 11:23 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, Breno Leitao, kernel-team
In-Reply-To: <20260617-bootconfig_using_tools-v5-0-fd589a9cc5e3@debian.org>

Add a section describing CONFIG_BOOT_CONFIG_EMBED_CMDLINE: what it
does (renders the embedded "kernel" subtree to a flat cmdline at
build time so early_param() handlers see the values), what it
requires (BOOT_CONFIG_EMBED, a non-empty BOOT_CONFIG_EMBED_FILE,
and ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG -- currently x86 only),
the bootconfig opt-in semantics, the initrd-vs-embedded precedence,
and the soft-error overflow behavior.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 Documentation/admin-guide/bootconfig.rst | 81 ++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/Documentation/admin-guide/bootconfig.rst b/Documentation/admin-guide/bootconfig.rst
index f712758472d5c..4a7e90c21f968 100644
--- a/Documentation/admin-guide/bootconfig.rst
+++ b/Documentation/admin-guide/bootconfig.rst
@@ -234,6 +234,87 @@ Kconfig option selected.
 Note that even if you set this option, you can override the embedded
 bootconfig by another bootconfig which attached to the initrd.
 
+Rendering Embedded kernel.* Keys at Build Time
+----------------------------------------------
+
+By default, the embedded bootconfig (``CONFIG_BOOT_CONFIG_EMBED=y``) is
+parsed at runtime, after ``parse_early_param()`` has already run. Early
+parameter handlers (``mem=``, ``earlycon=``, ``loglevel=``, ...) therefore
+cannot see values supplied via the embedded ``kernel`` subtree.
+
+``CONFIG_BOOT_CONFIG_EMBED_CMDLINE`` resolves this by rendering the
+``kernel`` subtree of ``CONFIG_BOOT_CONFIG_EMBED_FILE`` into a flat cmdline
+string at kernel build time (via ``tools/bootconfig -C``) and prepending
+it to ``boot_command_line`` during early architecture setup, so the keys
+are visible to ``parse_early_param()``.
+
+The option requires ``CONFIG_BOOT_CONFIG_EMBED=y``, a non-empty
+``CONFIG_BOOT_CONFIG_EMBED_FILE``, and an architecture that selects
+``CONFIG_ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG``. Currently only x86
+selects it; on other architectures the embedded bootconfig still works,
+but only through the late runtime parser.
+
+The same ``bootconfig`` opt-in applies as elsewhere: the rendered keys
+are prepended only when ``bootconfig`` (in any form) appears on the
+kernel command line, or when ``CONFIG_BOOT_CONFIG_FORCE`` is set, which
+defaults to ``y`` when ``CONFIG_BOOT_CONFIG_EMBED`` is set.
+
+For example, given::
+
+ kernel {
+   loglevel = 7
+   mem = 4G
+ }
+
+the kernel boots as if ``loglevel=7 mem=4G`` had been prepended to the
+bootloader command line, with the values visible to early-parsed
+handlers. Comma-separated values are still expanded into multiple
+cmdline entries per the bootconfig array convention -- the embedded
+``kernel.earlycon = "uart8250,io,0x3f8"`` must be quoted to land as a
+single ``earlycon=`` entry, exactly as for the runtime parser.
+
+If the rendered string would not fit in ``COMMAND_LINE_SIZE`` together
+with the existing command line, the prepend is skipped and an error is
+logged, so an oversized embedded bootconfig cannot brick a boot.
+
+Interaction with other command line and bootconfig sources
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+With ``CONFIG_BOOT_CONFIG_EMBED_CMDLINE=y`` the rendered ``kernel``
+subtree behaves like a build-time command line (similar to
+``CONFIG_CMDLINE``), not like a bootconfig source. It is prepended to
+``boot_command_line`` in ``setup_arch()``, before ``parse_early_param()``
+and long before the runtime parser looks at an initrd. Options can reach
+the kernel from up to four places:
+
+- Bootloader command line: the arguments the boot loader passes. The
+  embedded cmdline is prepended in front of them, so for last-one-wins
+  parameters a bootloader option still overrides the embedded value.
+  Visible in /proc/cmdline.
+- Embedded cmdline (this option): the rendered ``kernel`` subtree,
+  prepended early so it is seen by ``parse_early_param()``. Visible in
+  /proc/cmdline.
+- Initrd bootconfig: parsed late in ``setup_boot_config()``; its
+  ``kernel`` keys are placed ahead of ``boot_command_line``, i.e. before
+  the embedded cmdline, so last-wins favors the embedded values. As a
+  bootconfig source, an initrd bootconfig still replaces the embedded
+  bootconfig. Visible in /proc/cmdline and /proc/bootconfig.
+- Embedded bootconfig (runtime): parsed late, only when no initrd
+  bootconfig is present. Visible in /proc/cmdline and /proc/bootconfig.
+
+So with this option the embedded ``kernel.*`` values take precedence
+over an initrd bootconfig's ``kernel.*`` values: for early parameters
+the initrd is not parsed yet, and for ordinary parameters the embedded
+keys land later in the command line. If you need an initrd bootconfig to
+override the embedded ``kernel.*`` keys, leave this option off and rely
+on the runtime parser.
+
+The rendered string is part of the command line, so it appears in
+/proc/cmdline. It is deliberately not shown in /proc/bootconfig: that
+file keeps reporting the parsed bootconfig tree -- the initrd bootconfig
+if present, otherwise the embedded bootconfig -- independent of whether
+build-time cmdline rendering is enabled.
+
 Kernel parameters via Boot Config
 =================================
 

-- 
2.53.0-Meta



* [PATCH v5 7/7] x86/setup: prepend embedded bootconfig cmdline before parse_early_param
From: Breno Leitao @ 2026-06-17 11:23 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, Breno Leitao, kernel-team
In-Reply-To: <20260617-bootconfig_using_tools-v5-0-fd589a9cc5e3@debian.org>

Call xbc_prepend_embedded_cmdline() in setup_arch() right after the
CONFIG_CMDLINE merge and before strscpy(command_line, ...) so the
build-time-rendered embedded bootconfig "kernel" subtree is part of
boot_command_line by the time parse_early_param() runs. early_param()
handlers (mem=, earlycon=, loglevel=, ...) now see values supplied via
CONFIG_BOOT_CONFIG_EMBED_FILE without parsing bootconfig at runtime.

Gate the prepend on the same opt-in the runtime parser uses: prepend
when "bootconfig" is present on the command line, or when
CONFIG_BOOT_CONFIG_FORCE is set. setup_boot_config()'s parse_args()
loop treats any presence of the "bootconfig" key as opt-in regardless
of value, so check both cmdline_find_option_bool() (matches the bare
key) and cmdline_find_option() (matches "bootconfig=<anything>").
Without the latter check, "bootconfig=0" would skip the early prepend
yet still trigger the late runtime apply, leaving the embedded keys
invisible to early_param() but applied to saved_command_line.
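
The "any presence of the key" rule can be sketched in standalone C as
below. This is not the x86 cmdline_find_option*() implementation:
cmdline_has_key() is a hypothetical helper that splits on whitespace
only and ignores quoting, and it is meant just to show why both the
bare form and the "key=<anything>" form count as opt-in:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if @key appears in @cmdline either as a bare parameter
 * or as "key=<anything>". Assumes a non-empty @key and parameters
 * separated by whitespace, with no quoting.
 */
static bool cmdline_has_key(const char *cmdline, const char *key)
{
	size_t klen = strlen(key);
	const char *p = cmdline;

	while (*p) {
		size_t tok;

		/* Skip separators, then measure the next parameter. */
		p += strspn(p, " \t\n");
		tok = strcspn(p, " \t\n");

		/* Accept an exact match or a match followed by '='. */
		if (tok >= klen && !strncmp(p, key, klen) &&
		    (tok == klen || p[klen] == '='))
			return true;

		p += tok;
	}

	return false;
}
```

With this helper "bootconfig", "bootconfig=0" and "bootconfig=1" all
count as present, while "nobootconfig" and "bootconfigx=1" do not.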

The prepend necessarily runs before setup_boot_config() detects an
initrd bootconfig, so an initrd cannot override the embedded "kernel"
keys for early_param(). This is intentional: the embedded cmdline acts
like a build-time CONFIG_CMDLINE. An initrd bootconfig's "kernel" keys
never reached early_param() anyway (they apply late via
extra_command_line), so nothing is lost -- the initrd keys still apply
late, with last-wins keeping the embedded values in effect.
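
To make the last-wins ordering concrete, here is a small standalone
sketch. last_value() is a hypothetical helper, not parse_args(); it
assumes space-separated "key=value" parameters without quoting and
simply reports the value of the final occurrence of a key:

```c
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the value of the last "key=value" in @cmdline
 * and store the value's length in @len. Returns NULL (and leaves @len
 * alone) if the key never appears with a value.
 */
static const char *last_value(const char *cmdline, const char *key,
			      size_t *len)
{
	size_t klen = strlen(key);
	const char *found = NULL;
	const char *p = cmdline;

	while (*p) {
		size_t tok;

		p += strspn(p, " ");
		tok = strcspn(p, " ");

		/* Later occurrences overwrite earlier ones: last wins. */
		if (tok > klen && !strncmp(p, key, klen) && p[klen] == '=') {
			found = p + klen + 1;
			*len = tok - klen - 1;
		}

		p += tok;
	}

	return found;
}
```

If the initrd keys come first and the embedded keys follow, as
described above, a line such as "loglevel=3 loglevel=7" resolves to 7,
the embedded value. A bootloader "loglevel=4" appended after both
would win over either.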

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 arch/x86/Kconfig        |  1 +
 arch/x86/kernel/setup.c | 27 +++++++++++++++++++++++++++
 init/main.c             | 25 ++++++++++++++++++++++---
 3 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0de23e6471973..8ab11199c16d5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -127,6 +127,7 @@ config X86
 	select ARCH_SUPPORTS_NUMA_BALANCING	if X86_64
 	select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP	if NR_CPUS <= 4096
 	select ARCH_SUPPORTS_CFI		if X86_64
+	select ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
 	select ARCH_USES_CFI_TRAPS		if X86_64 && CFI
 	select ARCH_SUPPORTS_LTO_CLANG
 	select ARCH_SUPPORTS_LTO_CLANG_THIN
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 46882ce79c3a4..d69ba84c203f1 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -6,6 +6,7 @@
  * parts of early kernel initialization.
  */
 #include <linux/acpi.h>
+#include <linux/bootconfig.h>
 #include <linux/console.h>
 #include <linux/cpu.h>
 #include <linux/crash_dump.h>
@@ -36,6 +37,7 @@
 #include <asm/bios_ebda.h>
 #include <asm/bugs.h>
 #include <asm/cacheinfo.h>
+#include <asm/cmdline.h>
 #include <asm/coco.h>
 #include <asm/cpu.h>
 #include <asm/efi.h>
@@ -924,6 +926,31 @@ void __init setup_arch(char **cmdline_p)
 	builtin_cmdline_added = true;
 #endif
 
+	/*
+	 * Match the runtime bootconfig parser's opt-in: only fold the
+	 * embedded kernel.* keys into the cmdline when "bootconfig" is
+	 * present on the command line, or CONFIG_BOOT_CONFIG_FORCE is set.
+	 * setup_boot_config()'s parse_args() loop treats any presence of
+	 * the "bootconfig" key as opt-in (bare, =0, =1, ...), so check both
+	 * forms here: cmdline_find_option_bool() matches the bare key,
+	 * cmdline_find_option() matches "bootconfig=<anything>". Without
+	 * the second check, "bootconfig=0" would skip the early prepend
+	 * but still trigger the late runtime apply -- a split-brain state.
+	 * CONFIG_BOOT_CONFIG_FORCE defaults to y when BOOT_CONFIG_EMBED is
+	 * set, so on the default config the embedded keys are applied
+	 * unconditionally.
+	 */
+	{
+		char buf[8];
+
+		if (IS_ENABLED(CONFIG_BOOT_CONFIG_FORCE) ||
+		    cmdline_find_option_bool(boot_command_line, "bootconfig") ||
+		    cmdline_find_option(boot_command_line, "bootconfig",
+					buf, sizeof(buf)) >= 0)
+			xbc_prepend_embedded_cmdline(boot_command_line,
+						     COMMAND_LINE_SIZE);
+	}
+
 	strscpy(command_line, boot_command_line, COMMAND_LINE_SIZE);
 	*cmdline_p = command_line;
 
diff --git a/init/main.c b/init/main.c
index e363232b428b4..2ecb6aa536dd1 100644
--- a/init/main.c
+++ b/init/main.c
@@ -378,12 +378,15 @@ static void __init setup_boot_config(void)
 	int pos, ret;
 	size_t size;
 	char *err;
+	bool from_embedded = false;
 
 	/* Cut out the bootconfig data even if we have no bootconfig option */
 	data = get_boot_config_from_initrd(&size);
 	/* If there is no bootconfig in initrd, try embedded one. */
-	if (!data)
+	if (!data) {
 		data = xbc_get_embedded_bootconfig(&size);
+		from_embedded = true;
+	}
 
 	strscpy(tmp_cmdline, boot_command_line, COMMAND_LINE_SIZE);
 	err = parse_args("bootconfig", tmp_cmdline, NULL, 0, 0, 0, NULL,
@@ -421,8 +424,24 @@ static void __init setup_boot_config(void)
 	} else {
 		xbc_get_info(&ret, NULL);
 		pr_info("Load bootconfig: %ld bytes %d nodes\n", (long)size, ret);
-		/* keys starting with "kernel." are passed via cmdline */
-		extra_command_line = xbc_make_cmdline("kernel");
+		/*
+		 * keys starting with "kernel." are passed via cmdline. When
+		 * this bootconfig came from the embedded source and
+		 * setup_arch() already prepended the rendered "kernel" subtree
+		 * to boot_command_line, rendering again here would duplicate
+		 * the keys in saved_command_line and make accumulating handlers
+		 * (console=, earlycon=, ...) re-register the same value. Skip
+		 * only when the prepend really happened.
+		 *
+		 * On arches that do not select ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG,
+		 * CONFIG_BOOT_CONFIG_EMBED_CMDLINE is unselectable and
+		 * xbc_embedded_cmdline_applied() collapses to a stub returning
+		 * false, so this path still runs and the embedded "kernel"
+		 * keys reach the cmdline via the runtime parser exactly as
+		 * before this series.
+		 */
+		if (!from_embedded || !xbc_embedded_cmdline_applied())
+			extra_command_line = xbc_make_cmdline("kernel");
 		/* Also, "init." keys are init arguments */
 		extra_init_args = xbc_make_cmdline("init");
 	}

-- 
2.53.0-Meta


^ permalink raw reply related

* [RFC PATCH v2 0/4] tracing/osnoise: Track IPIs
From: Valentin Schneider @ 2026-06-17 13:17 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Tomas Glozar,
	Costa Shulyupin, Crystal Wood, John Kacur, Ivan Pravdin,
	Jonathan Corbet

Hi folks,

So I've seen a few times now reports of latency spikes caused by IPIs, usually
because of isolation misconfiguration, but only detected at the tail end of
e.g. a 24h timerlat run.

It's not because those IPIs are rare, but rather that they don't by themselves
cause a monitored CPU to reach the latency threshold; it's usually a combined
interference that gets us there.

I'd like to make it easier to detect such misconfigurations and thus IPIs
hitting supposedly-isolated CPUs. I initially kludged a timerlat option to stop
tracing as soon as an IPI was sent to a monitored CPU, regardless of the latency
threshold. It sort of did the trick, but Tomáš convinced me timerlat wasn't
really the place for that.

So here's IPI tracking added to osnoise. This time around fully in userspace, as
Tomáš pointed out to me that this will make it a lot easier to deploy to older
kernels.

Based on top of linux/next at 'next-20260616' to have the latest libsubcmd
changes.
  
Cheers,
Valentin

Revisions
=========

v1 -> v2
++++++++

o Dropped the in-kernel osnoise_sample changes and made it all userspace

Valentin Schneider (4):
  rtla/osnoise: Add IPI tracking cmdline option
  rtla/osnoise: Record IPI count in osnoise top
  rtla/osnoise: Trace IPI events when recording a trace file
  rtla/osnoise: Leverage IPI event filters when tracing a subset of CPUs

 Documentation/tools/rtla/rtla-osnoise-top.rst |   4 +
 tools/tracing/rtla/src/cli.c                  |   1 +
 tools/tracing/rtla/src/cli_p.h                |   3 +
 tools/tracing/rtla/src/common.c               |   2 +-
 tools/tracing/rtla/src/common.h               |   3 +-
 tools/tracing/rtla/src/osnoise.c              |  17 +-
 tools/tracing/rtla/src/osnoise_top.c          | 153 +++++++++++++++++-
 7 files changed, 179 insertions(+), 4 deletions(-)

--
2.54.0


^ permalink raw reply

* [RFC PATCH v2 1/4] rtla/osnoise: Add IPI tracking cmdline option
From: Valentin Schneider @ 2026-06-17 13:17 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Tomas Glozar, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Costa Shulyupin, Crystal Wood, John Kacur, Ivan Pravdin,
	Jonathan Corbet
In-Reply-To: <20260617131803.2988989-1-vschneid@redhat.com>

Later commits will add IPI tracking to osnoise top. To avoid breaking
existing scripts, this new feature will be gated behind a new -i option.

Suggested-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
 Documentation/tools/rtla/rtla-osnoise-top.rst | 4 ++++
 tools/tracing/rtla/src/cli.c                  | 1 +
 tools/tracing/rtla/src/cli_p.h                | 3 +++
 tools/tracing/rtla/src/common.h               | 1 +
 4 files changed, 9 insertions(+)

diff --git a/Documentation/tools/rtla/rtla-osnoise-top.rst b/Documentation/tools/rtla/rtla-osnoise-top.rst
index b91c02ac2bbe1..98f77f8971a69 100644
--- a/Documentation/tools/rtla/rtla-osnoise-top.rst
+++ b/Documentation/tools/rtla/rtla-osnoise-top.rst
@@ -28,6 +28,10 @@ OPTIONS
 =======
 .. include:: common_osnoise_options.txt
 
+**-i**, **--ipi**
+
+	Track sources of IPIs.
+
 .. include:: common_top_options.txt
 
 .. include:: common_options.txt
diff --git a/tools/tracing/rtla/src/cli.c b/tools/tracing/rtla/src/cli.c
index c5279c9875310..eb1e76a6b0dea 100644
--- a/tools/tracing/rtla/src/cli.c
+++ b/tools/tracing/rtla/src/cli.c
@@ -78,6 +78,7 @@ struct common_params *osnoise_top_parse_args(int argc, char **argv)
 		RTLA_OPT_STOP_TOTAL('S', "stop-total", "total sample"),
 		OSNOISE_OPT_THRESHOLD,
 		RTLA_OPT_TRACE_OUTPUT("osnoise", opt_osnoise_trace_output_cb),
+		OSNOISE_OPT_IPI,
 
 	OPT_GROUP("Event Configuration:"),
 		RTLA_OPT_EVENT,
diff --git a/tools/tracing/rtla/src/cli_p.h b/tools/tracing/rtla/src/cli_p.h
index 3c939de9abf02..7d3f982cfabdb 100644
--- a/tools/tracing/rtla/src/cli_p.h
+++ b/tools/tracing/rtla/src/cli_p.h
@@ -305,6 +305,9 @@ static int opt_filter_cb(const struct option *opt, const char *arg, int unset)
 	"the minimum delta to be considered a noise", \
 	opt_llong_callback)
 
+#define OSNOISE_OPT_IPI OPT_BOOLEAN('i', "ipi", &params->common.ipi, \
+	"track sources of IPIs")
+
 /*
  * Callback functions for command line options for osnoise tools
  */
diff --git a/tools/tracing/rtla/src/common.h b/tools/tracing/rtla/src/common.h
index 04b287a03f6d4..045253230fcf2 100644
--- a/tools/tracing/rtla/src/common.h
+++ b/tools/tracing/rtla/src/common.h
@@ -108,6 +108,7 @@ struct common_params {
 	bool			kernel_workload;
 	bool			user_data;
 	bool			aa_only;
+	bool			ipi;
 
 	struct actions		threshold_actions;
 	struct actions		end_actions;
-- 
2.54.0


^ permalink raw reply related

* [RFC PATCH v2 2/4] rtla/osnoise: Record IPI count in osnoise top
From: Valentin Schneider @ 2026-06-17 13:17 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Tomas Glozar,
	Costa Shulyupin, Crystal Wood, John Kacur, Ivan Pravdin,
	Jonathan Corbet
In-Reply-To: <20260617131803.2988989-1-vschneid@redhat.com>

Leverage the ipi_send_cpu and ipi_send_cpumask trace events to record the
count of IPIs sent to monitored CPUs. These interferences are already
accounted for in the IRQ count, but this split gives a better overall picture.

This uses the newly added -i cmdline option.
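
As a rough standalone sketch of the accounting described above (not the
code from this patch): intersect the event's destination mask with the
monitored set, then bump a per-CPU counter for every CPU left in the
intersection. NR_CPUS_SKETCH, ipi_count[] and account_ipi_mask() are
hypothetical names used only for illustration; the CPU_* macros are the
glibc ones from <sched.h>.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* needed for the glibc CPU_* macros */
#endif
#include <assert.h>
#include <sched.h>

#define NR_CPUS_SKETCH 8	/* hypothetical CPU count for this sketch */

/* Hypothetical per-CPU counters, standing in for a per-CPU ipi_count. */
static unsigned long long ipi_count[NR_CPUS_SKETCH];

/*
 * Account one IPI for every monitored CPU present in the event's mask.
 * Returns the number of CPUs that were accounted.
 */
static int account_ipi_mask(const cpu_set_t *event_cpus,
			    const cpu_set_t *monitored_cpus)
{
	cpu_set_t hit;
	int accounted = 0;

	assert(event_cpus && monitored_cpus);

	/* Keep only the destination CPUs that are also monitored. */
	CPU_AND(&hit, event_cpus, monitored_cpus);

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		if (CPU_ISSET(cpu, &hit)) {
			ipi_count[cpu]++;
			accounted++;
		}
	}

	return accounted;
}
```

A single-destination event is the degenerate case of the same idea: a
CPU_ISSET() check on the destination CPU followed by one increment.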

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
 tools/tracing/rtla/src/osnoise_top.c | 124 ++++++++++++++++++++++++++-
 1 file changed, 123 insertions(+), 1 deletion(-)

diff --git a/tools/tracing/rtla/src/osnoise_top.c b/tools/tracing/rtla/src/osnoise_top.c
index 512a6299cb018..5b462a3543b97 100644
--- a/tools/tracing/rtla/src/osnoise_top.c
+++ b/tools/tracing/rtla/src/osnoise_top.c
@@ -8,6 +8,7 @@
 #include <string.h>
 #include <signal.h>
 #include <unistd.h>
+#include <errno.h>
 #include <stdio.h>
 #include <time.h>
 
@@ -25,6 +26,7 @@ struct osnoise_top_cpu {
 	unsigned long long	irq_count;
 	unsigned long long	softirq_count;
 	unsigned long long	thread_count;
+	unsigned long long	ipi_count;
 
 	int			sum_cycles;
 };
@@ -70,6 +72,91 @@ static struct osnoise_top_data *osnoise_alloc_top(void)
 	return NULL;
 }
 
+static void account_ipi(struct osnoise_tool *tool,
+			unsigned long long src_cpu, unsigned long long dst_cpu)
+{
+	struct osnoise_top_cpu *cpu_data;
+	struct osnoise_top_data *data;
+	unsigned long long inc = 1;
+
+	data = tool->data;
+	cpu_data = &data->cpu_data[dst_cpu];
+
+	update_sum(&cpu_data->ipi_count, &inc);
+}
+
+/*
+ * osnoise_ipi_cpu_handler - this is the handler for single CPU IPI events.
+ */
+static int
+osnoise_ipi_cpu_handler(struct trace_seq *s, struct tep_record *record,
+		     struct tep_event *event, void *context)
+{
+	struct osnoise_tool *tool;
+	struct osnoise_params *params;
+	unsigned long long src_cpu, dst_cpu;
+	struct trace_instance *trace = context;
+
+	tool = container_of(trace, struct osnoise_tool, trace);
+	params = to_osnoise_params(tool->params);
+
+	src_cpu = record->cpu;
+	tep_get_field_val(s, event, "cpu", record, &dst_cpu, 1);
+
+	if (CPU_ISSET(dst_cpu, &params->common.monitored_cpus))
+		account_ipi(tool, src_cpu, dst_cpu);
+
+	return 0;
+}
+
+static cpu_set_t cpumask_tmp_cpus;
+
+/*
+ * osnoise_ipi_cpumask_handler - this is the handler for broadcasted IPI events.
+ */
+static int
+osnoise_ipi_cpumask_handler(struct trace_seq *s, struct tep_record *record,
+			 struct tep_event *event, void *context)
+{
+	struct trace_instance *trace = context;
+	struct osnoise_tool *tool;
+	struct osnoise_params *params;
+	struct tep_format_field *field;
+	unsigned long long src_cpu;
+	cpu_set_t *event_cpus;
+	int len;
+
+	tool = container_of(trace, struct osnoise_tool, trace);
+	params = to_osnoise_params(tool->params);
+
+	src_cpu = record->cpu;
+
+	field = tep_find_field(event, "cpumask");
+	if (!field)
+		return 0;
+
+	event_cpus = tep_get_field_raw(s, event, "cpumask", record, &len, 1);
+	if (!event_cpus) {
+		err_msg("Failed to get cpumask field\n");
+		return 0;
+	}
+
+	CPU_AND(&cpumask_tmp_cpus, event_cpus, &params->common.monitored_cpus);
+
+	/*
+	 * Computing the mask weight is overkill but there is no leaner option
+	 * provided by glibc, e.g cpumask_first() or somesuch.
+	 */
+	if (CPU_COUNT(&cpumask_tmp_cpus)) {
+		for (int cpu = 0; cpu < nr_cpus; cpu++) {
+			if (CPU_ISSET(cpu, &cpumask_tmp_cpus))
+				account_ipi(tool, src_cpu, cpu);
+		}
+	}
+
+	return 0;
+}
+
 /*
  * osnoise_top_handler - this is the handler for osnoise tracer events
  */
@@ -164,6 +251,8 @@ static void osnoise_top_header(struct osnoise_tool *top)
 		goto eol;
 
 	trace_seq_printf(s, "          IRQ      Softirq       Thread");
+	if (params->common.ipi)
+		trace_seq_printf(s, "          IPI");
 
 eol:
 	if (pretty)
@@ -218,7 +307,13 @@ static void osnoise_top_print(struct osnoise_tool *tool, int cpu)
 
 	trace_seq_printf(s, "%12llu ", cpu_data->irq_count);
 	trace_seq_printf(s, "%12llu ", cpu_data->softirq_count);
-	trace_seq_printf(s, "%12llu\n", cpu_data->thread_count);
+	trace_seq_printf(s, "%12llu", cpu_data->thread_count);
+	if (!params->common.ipi) {
+		trace_seq_printf(s, "\n");
+		return;
+	}
+
+	trace_seq_printf(s, " %12llu\n", cpu_data->ipi_count);
 }
 
 /*
@@ -281,6 +376,7 @@ osnoise_top_apply_config(struct osnoise_tool *tool)
 struct osnoise_tool *osnoise_init_top(struct common_params *params)
 {
 	struct osnoise_tool *tool;
+	int retval;
 
 	tool = osnoise_init_tool("osnoise_top");
 	if (!tool)
@@ -295,7 +391,33 @@ struct osnoise_tool *osnoise_init_top(struct common_params *params)
 	tep_register_event_handler(tool->trace.tep, -1, "ftrace", "osnoise",
 				   osnoise_top_handler, NULL);
 
+	if (!params->ipi)
+		goto out;
+
+	retval = tracefs_event_enable(tool->trace.inst, "ipi", "ipi_send_cpu");
+	if (retval < 0 && !errno) {
+		err_msg("Could not find ipi_send_cpu event\n");
+		goto out_err;
+	}
+
+	retval = tracefs_event_enable(tool->trace.inst, "ipi", "ipi_send_cpumask");
+	if (retval < 0 && !errno) {
+		err_msg("Could not find ipi_send_cpumask event\n");
+		goto out_err;
+	}
+
+	tep_register_event_handler(tool->trace.tep, -1, "ipi", "ipi_send_cpu",
+				   osnoise_ipi_cpu_handler, NULL);
+
+	tep_register_event_handler(tool->trace.tep, -1, "ipi", "ipi_send_cpumask",
+				   osnoise_ipi_cpumask_handler, NULL);
+
+out:
 	return tool;
+out_err:
+	osnoise_free_top_tool(tool);
+	osnoise_destroy_tool(tool);
+	return NULL;
 }
 
 struct tool_ops osnoise_top_ops = {
-- 
2.54.0


^ permalink raw reply related

* [RFC PATCH v2 3/4] rtla/osnoise: Trace IPI events when recording a trace file
From: Valentin Schneider @ 2026-06-17 13:17 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Tomas Glozar,
	Costa Shulyupin, Crystal Wood, John Kacur, Ivan Pravdin,
	Jonathan Corbet
In-Reply-To: <20260617131803.2988989-1-vschneid@redhat.com>

IPIs can now be monitored and accounted for by osnoise top. When that is
the case, also record them when saving a trace file.

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
 tools/tracing/rtla/src/common.c  |  2 +-
 tools/tracing/rtla/src/common.h  |  2 +-
 tools/tracing/rtla/src/osnoise.c | 17 ++++++++++++++++-
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/tools/tracing/rtla/src/common.c b/tools/tracing/rtla/src/common.c
index d0a8a6edbf0cb..dd302427557ca 100644
--- a/tools/tracing/rtla/src/common.c
+++ b/tools/tracing/rtla/src/common.c
@@ -204,7 +204,7 @@ int run_tool(struct tool_ops *ops, int argc, char *argv[])
 
 	if (params->threshold_actions.present[ACTION_TRACE_OUTPUT] ||
 	    params->end_actions.present[ACTION_TRACE_OUTPUT]) {
-		tool->record = osnoise_init_trace_tool(ops->tracer);
+		tool->record = osnoise_init_trace_tool(params, ops->tracer);
 		if (!tool->record) {
 			err_msg("Failed to enable the trace instance\n");
 			goto out_free;
diff --git a/tools/tracing/rtla/src/common.h b/tools/tracing/rtla/src/common.h
index 045253230fcf2..421e06e10f3f1 100644
--- a/tools/tracing/rtla/src/common.h
+++ b/tools/tracing/rtla/src/common.h
@@ -178,7 +178,7 @@ int osnoise_set_workload(struct osnoise_context *context, bool onoff);
 
 void osnoise_destroy_tool(struct osnoise_tool *top);
 struct osnoise_tool *osnoise_init_tool(char *tool_name);
-struct osnoise_tool *osnoise_init_trace_tool(const char *tracer);
+struct osnoise_tool *osnoise_init_trace_tool(struct common_params *params, const char *tracer);
 bool osnoise_trace_is_off(struct osnoise_tool *tool, struct osnoise_tool *record);
 int osnoise_set_stop_us(struct osnoise_context *context, long long stop_us);
 int osnoise_set_stop_total_us(struct osnoise_context *context,
diff --git a/tools/tracing/rtla/src/osnoise.c b/tools/tracing/rtla/src/osnoise.c
index 4ff5dad013b10..281f6f57d15af 100644
--- a/tools/tracing/rtla/src/osnoise.c
+++ b/tools/tracing/rtla/src/osnoise.c
@@ -1181,7 +1181,8 @@ struct osnoise_tool *osnoise_init_tool(char *tool_name)
 /*
  * osnoise_init_trace_tool - init a tracer instance to trace osnoise events
  */
-struct osnoise_tool *osnoise_init_trace_tool(const char *tracer)
+struct osnoise_tool *osnoise_init_trace_tool(struct common_params *params,
+					     const char *tracer)
 {
 	struct osnoise_tool *trace;
 	int retval;
@@ -1196,6 +1197,20 @@ struct osnoise_tool *osnoise_init_trace_tool(const char *tracer)
 		goto out_err;
 	}
 
+	if (params->ipi) {
+		retval = tracefs_event_enable(trace->trace.inst, "ipi", "ipi_send_cpu");
+		if (retval < 0 && !errno) {
+			err_msg("Could not find ipi_send_cpu event\n");
+			goto out_err;
+		}
+
+		retval = tracefs_event_enable(trace->trace.inst, "ipi", "ipi_send_cpumask");
+		if (retval < 0 && !errno) {
+			err_msg("Could not find ipi_send_cpumask event\n");
+			goto out_err;
+		}
+	}
+
 	retval = enable_tracer_by_name(trace->trace.inst, tracer);
 	if (retval) {
 		err_msg("Could not enable %s tracer for tracing\n", tracer);
-- 
2.54.0


^ permalink raw reply related

* [RFC PATCH v2 4/4] rtla/osnoise: Leverage IPI event filters when tracing a subset of CPUs
From: Valentin Schneider @ 2026-06-17 13:17 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Tomas Glozar,
	Costa Shulyupin, Crystal Wood, John Kacur, Ivan Pravdin,
	Jonathan Corbet
In-Reply-To: <20260617131803.2988989-1-vschneid@redhat.com>

Instead of post-processing the events in the tracefs_iterate_raw_events()
callbacks, leverage the kernel event filtering infrastructure to only emit
IPI events if they target CPUs that are being traced, as specified by the
-c cmdline option.

Note that some post-processing is still required for the ipi_send_cpumask
event: the event being emitted means *some* CPUs targeted by that event
are monitored, but not necessarily all of them, so userspace has to
recompute that intersection.
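
For illustration only, a minimal sketch of how such a filter string
could be assembled, using the "<field> & CPUS{<list>}" form and trailing
newline that appear in the diff below. build_ipi_filter() is a
hypothetical helper, not part of rtla, and the truncation check is an
addition of this sketch rather than something the patch does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<field> & CPUS{<cpus>}\n" into buf.
 * Returns 0 on success, -1 if the result would not fit in len bytes.
 */
static int build_ipi_filter(char *buf, size_t len,
			    const char *field, const char *cpus)
{
	int n;

	assert(buf && field && cpus);

	n = snprintf(buf, len, "%s & CPUS{%s}\n", field, cpus);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* output error or truncated filter */

	return 0;
}
```

With a CPU list of "1-3,7", the two filters would read
"cpu & CPUS{1-3,7}" and "cpumask & CPUS{1-3,7}" respectively.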

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
 tools/tracing/rtla/src/osnoise_top.c | 37 +++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 4 deletions(-)

diff --git a/tools/tracing/rtla/src/osnoise_top.c b/tools/tracing/rtla/src/osnoise_top.c
index 5b462a3543b97..8040521710884 100644
--- a/tools/tracing/rtla/src/osnoise_top.c
+++ b/tools/tracing/rtla/src/osnoise_top.c
@@ -93,18 +93,15 @@ osnoise_ipi_cpu_handler(struct trace_seq *s, struct tep_record *record,
 		     struct tep_event *event, void *context)
 {
 	struct osnoise_tool *tool;
-	struct osnoise_params *params;
 	unsigned long long src_cpu, dst_cpu;
 	struct trace_instance *trace = context;
 
 	tool = container_of(trace, struct osnoise_tool, trace);
-	params = to_osnoise_params(tool->params);
 
 	src_cpu = record->cpu;
 	tep_get_field_val(s, event, "cpu", record, &dst_cpu, 1);
 
-	if (CPU_ISSET(dst_cpu, &params->common.monitored_cpus))
-		account_ipi(tool, src_cpu, dst_cpu);
+	account_ipi(tool, src_cpu, dst_cpu);
 
 	return 0;
 }
@@ -141,6 +138,11 @@ osnoise_ipi_cpumask_handler(struct trace_seq *s, struct tep_record *record,
 		return 0;
 	}
 
+	/*
+	 * Despite already filtering for such an intersection, we need to compute
+	 * the intersection here as the @cpumask field may contain non-monitored
+	 * CPUs.
+	 */
 	CPU_AND(&cpumask_tmp_cpus, event_cpus, &params->common.monitored_cpus);
 
 	/*
@@ -406,6 +408,33 @@ struct osnoise_tool *osnoise_init_top(struct common_params *params)
 		goto out_err;
 	}
 
+	/*
+	 * If tracing on a subset of possible CPUs, leverage the kernel filtering
+	 * infrastructure to only generate events on traced CPUs.
+	 */
+	if (params->cpus) {
+		char filter[MAX_PATH];
+
+		snprintf(filter, ARRAY_SIZE(filter), "cpu & CPUS{%s}\n", params->cpus);
+		retval = tracefs_event_file_write(tool->trace.inst,
+						  "ipi", "ipi_send_cpu", "filter",
+						  filter);
+		if (retval) {
+			err_msg("Could not set ipi_send_cpu CPU filter\n");
+			goto out_err;
+		}
+
+
+		snprintf(filter, ARRAY_SIZE(filter), "cpumask & CPUS{%s}\n", params->cpus);
+		retval = tracefs_event_file_write(tool->trace.inst,
+						  "ipi", "ipi_send_cpumask", "filter",
+						  filter);
+		if (retval) {
+			err_msg("Could not set ipi_send_cpumask CPU filter\n");
+			goto out_err;
+		}
+	}
+
 	tep_register_event_handler(tool->trace.tep, -1, "ipi", "ipi_send_cpu",
 				   osnoise_ipi_cpu_handler, NULL);
 
-- 
2.54.0


^ permalink raw reply related

* Re: [PATCH] tracing: ring-buffer: allowlist clang-generated symbols
From: Vincent Donnefort @ 2026-06-17 13:26 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Steven Rostedt, Masami Hiramatsu, Nathan Chancellor,
	Arnd Bergmann, Mathieu Desnoyers, Nick Desaulniers, Bill Wendling,
	Justin Stitt, Marc Zyngier, Thomas Weißschuh, Paolo Bonzini,
	linux-kernel, linux-trace-kernel, llvm
In-Reply-To: <20260616164211.3733326-1-arnd@kernel.org>

On Tue, Jun 16, 2026 at 06:42:03PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> In randconfig build testing using clang-22, I came across two
> sets of extra symbols in the ring buffer code that may get
> inserted by the compiler:
> 
> Unexpected symbols in kernel/trace/simple_ring_buffer.o:
>          U memset
> 
> Unexpected symbols in kernel/trace/simple_ring_buffer.o:
>                  U llvm_gcda_emit_arcs
>                  U llvm_gcda_emit_function
>                  U llvm_gcda_end_file
>                  U llvm_gcda_start_file
>                  U llvm_gcda_summary_info
>                  U llvm_gcov_init
> 
> Add all of these to the allowlist.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  kernel/trace/Makefile | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> index f934ff586bd4..aa8564fb8ff4 100644
> --- a/kernel/trace/Makefile
> +++ b/kernel/trace/Makefile
> @@ -146,6 +146,7 @@ KASAN_SANITIZE_undefsyms_base.o := y

Would "GCOV_PROFILE_undefsyms_base.o := y" work?

>  
>  UNDEFINED_ALLOWLIST = __asan __gcov __kasan __kcsan __hwasan __sancov __sanitizer __tsan __ubsan __msan \
>  		      __aeabi_unwind_cpp __s390_indirect_jump __x86_indirect_thunk simple_ring_buffer \
> +		      memset llvm_gcda llvm_gcov \
>  		      $(shell $(NM) -u $(obj)/undefsyms_base.o 2>/dev/null | awk '{print $$2}')
>  
>  quiet_cmd_check_undefined = NM      $<
> -- 
> 2.39.5
> 

^ permalink raw reply

* Re: [PATCH v5 3/7] bootconfig: render embedded bootconfig as a kernel cmdline at build time
From: Nicolas Schier @ 2026-06-17 13:30 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, linux-kernel,
	linux-trace-kernel, linux-kbuild, bpf, kernel-team
In-Reply-To: <20260617-bootconfig_using_tools-v5-3-fd589a9cc5e3@debian.org>

On Wed, Jun 17, 2026 at 04:23:35AM -0700, Breno Leitao wrote:
> Add the build-time pipeline that renders the "kernel" subtree of
> CONFIG_BOOT_CONFIG_EMBED_FILE into a flat cmdline string and stashes
> it in .init.rodata as embedded_kernel_cmdline[]. A follow-up patch
> adds the runtime helper that prepends this string to boot_command_line
> during early architecture setup so parse_early_param() sees the values.
> 
> The build wires up:
>   tools/bootconfig -C kernel - userspace tool already shared with
>                                lib/bootconfig.c, used here in -C mode
>                                to render a bootconfig file to a cmdline
>   lib/embedded-cmdline.S     - .incbin's the rendered text plus a NUL
>                                (listed under the EXTRA BOOT CONFIG
>                                MAINTAINERS entry)
>   lib/Makefile rule          - runs tools/bootconfig at build time
>   Makefile prepare dep       - ensures tools/bootconfig is built first,
>                                same pattern as tools/objtool and
>                                tools/bpf/resolve_btfids
[...]
> 
> Drop the test target from tools/bootconfig/Makefile's default 'all'
> recipe so that hooking the binary into the kernel build does not run
> test-bootconfig.sh on every prepare. The tests stay available as
> 'make -C tools/bootconfig test', matching the convention of
> tools/objtool and tools/bpf/resolve_btfids whose 'all' targets only
> build the binary.
> 
> Require BOOT_CONFIG_EMBED_FILE to be non-empty before the new option
> can be enabled, otherwise tools/bootconfig -C runs against an empty
> file and prints a parse error on every kernel build.
> 
> The feature gates on CONFIG_ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG, a
> silent symbol arches select once they've wired the prepend call into
> setup_arch(). No arch selects it in this patch, so the user-visible
> CONFIG_BOOT_CONFIG_EMBED_CMDLINE is not yet enableable; when an arch
> later opts in, the runtime behavior is added by the follow-up patches.
> 
> tools/bootconfig also installs on target systems, so its own Makefile
> keeps $(CC) and stays cross-buildable as a standalone tool. The kernel
> build, which runs the tool on the build host during prepare, instead
> forces CC=$(HOSTCC) from a dedicated tools/bootconfig rule and clears
> CROSS_COMPILE= in the sub-make. Without that clear, an LLVM=1 cross
> build would inherit CROSS_COMPILE and tools/scripts/Makefile.include
> would inject --target=/--sysroot= flags into the host clang invocation,
> producing a target binary that fails to exec ("Exec format error").
> 
> embedded-cmdline.S places the rendered string in its own .init.rodata
> subsection (.init.rodata.embed_cmdline) with the "a" (allocatable,
> read-only) flag and %progbits. lib/bootconfig-data.S already places
> the embedded bootconfig blob in .init.rodata with the "aw" flag
> (xbc_init() rewrites separators in place, so that data must be
> writable). Using a distinct subsection name avoids the ld.lld section-
> type mismatch that would otherwise arise from mixing "a" and "aw"
> under the same name; the linker's "*(.init.rodata .init.rodata.*)"
> glob still folds both into the init image and frees them after boot.
> 
> A follow-up patch wires the build-time tools/bootconfig into the
> top-level clean target.
> 
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  MAINTAINERS               |  1 +
>  Makefile                  | 15 +++++++++++++++
>  init/Kconfig              | 35 +++++++++++++++++++++++++++++++++++
>  lib/Makefile              | 16 ++++++++++++++++
>  lib/embedded-cmdline.S    | 16 ++++++++++++++++
>  tools/bootconfig/Makefile |  2 +-
>  6 files changed, 84 insertions(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 57656ec0e9d5d..953231df1911d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9844,6 +9844,7 @@ F:	fs/proc/bootconfig.c
>  F:	include/linux/bootconfig.h
>  F:	lib/bootconfig-data.S
>  F:	lib/bootconfig.c
> +F:	lib/embedded-cmdline.S
>  F:	tools/bootconfig/*
>  F:	tools/bootconfig/scripts/*
>  
> diff --git a/Makefile b/Makefile
> index bf196c6df5b92..a7abb3f9a6264 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1545,6 +1545,21 @@ prepare: tools/bpf/resolve_btfids
>  endif
>  endif
>  
> +# tools/bootconfig renders the embedded bootconfig into a cmdline at build time.
> +ifdef CONFIG_BOOT_CONFIG_EMBED_CMDLINE
> +prepare: tools/bootconfig
> +endif
> +
> +# tools/bootconfig is run on the build host during prepare, so force a host
> +# binary here; its own Makefile keeps $(CC) for standalone and cross builds.
> +# CROSS_COMPILE= is cleared so tools/scripts/Makefile.include does not inject
> +# the target's --target=/--sysroot= flags into the host clang invocation under
> +# LLVM=1 cross builds (which would produce a target binary that fails to exec).
> +tools/bootconfig: FORCE
> +	$(Q)mkdir -p $(objtree)/tools
> +	$(Q)$(MAKE) O=$(abspath $(objtree)) subdir=tools -C $(srctree)/tools/ \
> +		bootconfig CC=$(HOSTCC) CROSS_COMPILE=

sashiko whines (priority: low) about the 'CC=$(HOSTCC)' as HOSTCC might
contain spaces (e.g. "ccache gcc") [1].  Instead of adding quotes (as
sashiko suggests), the CC could be redefined locally for the target, for
example:


tools/bootconfig: export CC := $(HOSTCC)
tools/bootconfig: FORCE
	$(Q)mkdir -p $(objtree)/tools
	$(Q)$(MAKE) O=$(abspath $(objtree)) subdir=tools -C $(srctree)/tools/ \
		bootconfig CROSS_COMPILE=


That way, make handles the variable definition as it should and there is 
no interference with shell escaping.

for Kbuild:

Reviewed-by: Nicolas Schier <n.schier@fritz.com>


Kind regards,
Nicolas


[1]: http://sashiko.dev/#/message/20260617113701.0405E1F000E9%40smtp.kernel.org


> +
>  # The tools build system is not a part of Kbuild and tends to introduce
>  # its own unique issues. If you need to integrate a new tool into Kbuild,
>  # please consider locating that tool outside the tools/ tree and using the
> diff --git a/init/Kconfig b/init/Kconfig
> index 5230d4879b1c8..d2b8613a6b927 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1566,6 +1566,41 @@ config BOOT_CONFIG_EMBED_FILE
>  	  This bootconfig will be used if there is no initrd or no other
>  	  bootconfig in the initrd.
>  
> +config ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
> +	bool
> +	help
> +	  Silent symbol; no C code reads it directly. Architectures
> +	  select it once their setup_arch() calls
> +	  xbc_prepend_embedded_cmdline() before parse_early_param().
> +	  Its only role is to gate the user-visible
> +	  BOOT_CONFIG_EMBED_CMDLINE option per-arch, the same
> +	  ARCH_SUPPORTS_* idiom used by ARCH_SUPPORTS_CFI, etc.
> +
> +config BOOT_CONFIG_EMBED_CMDLINE
> +	bool "Render embedded bootconfig as kernel cmdline at build time"
> +	depends on BOOT_CONFIG_EMBED_FILE != ""
> +	depends on ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
> +	default n
> +	help
> +	  Render the "kernel" subtree of the embedded bootconfig file into a
> +	  flat cmdline string at kernel build time and prepend it to
> +	  boot_command_line during early architecture setup. This makes
> +	  early_param() handlers (e.g. mem=, earlycon=, loglevel=) see the
> +	  values supplied via the embedded bootconfig.
> +
> +	  The runtime bootconfig parser is unaffected, so tree-structured
> +	  consumers such as ftrace boot-time tracing keep working.
> +
> +	  Note: when an initrd also carries a bootconfig, its "kernel"
> +	  subtree is still parsed at runtime, but the embedded "kernel"
> +	  keys remain in boot_command_line for parse_early_param() and
> +	  end up later than the initrd keys in saved_command_line, so
> +	  parse_args() last-wins favors the embedded values. If you need
> +	  initrd to override embedded kernel.* keys, leave this option
> +	  off.
> +
> +	  If unsure, say N.
> +
>  config CMDLINE_LOG_WRAP_IDEAL_LEN
>  	int "Length to try to wrap the cmdline when logged at boot"
>  	default 1021
> diff --git a/lib/Makefile b/lib/Makefile
> index 7f75cc6edf94a..4ace86a5cb6de 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -273,6 +273,22 @@ filechk_defbconf = cat $(or $(real-prereqs), /dev/null)
>  $(obj)/default.bconf: $(CONFIG_BOOT_CONFIG_EMBED_FILE) FORCE
>  	$(call filechk,defbconf)
>  
> +obj-$(CONFIG_BOOT_CONFIG_EMBED_CMDLINE) += embedded-cmdline.o
> +$(obj)/embedded-cmdline.o: $(obj)/embedded_cmdline.bin
> +
> +# Render the bootconfig "kernel" subtree to a flat cmdline string using
> +# the userspace tools/bootconfig parser (-C mode). The runtime prepend
> +# helper enforces COMMAND_LINE_SIZE at boot, so no build-time size
> +# check is performed here (COMMAND_LINE_SIZE is an arch header
> +# constant, not a Kconfig value).
> +quiet_cmd_render_cmdline = BCONF2C $@
> +      cmd_render_cmdline = \
> +	$(objtree)/tools/bootconfig/bootconfig -C $< > $@
> +
> +targets += embedded_cmdline.bin
> +$(obj)/embedded_cmdline.bin: $(obj)/default.bconf $(objtree)/tools/bootconfig/bootconfig FORCE
> +	$(call if_changed,render_cmdline)
> +
>  obj-$(CONFIG_RBTREE_TEST) += rbtree_test.o
>  obj-$(CONFIG_INTERVAL_TREE_TEST) += interval_tree_test.o
>  
> diff --git a/lib/embedded-cmdline.S b/lib/embedded-cmdline.S
> new file mode 100644
> index 0000000000000..bda81b4a42bea
> --- /dev/null
> +++ b/lib/embedded-cmdline.S
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Embed the build-time-rendered bootconfig "kernel" subtree as a flat
> + * cmdline string. setup_arch() prepends this to boot_command_line on
> + * architectures that select ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG.
> + *
> + * Copyright (c) 2026 Meta Platforms, Inc. and affiliates
> + * Copyright (c) 2026 Breno Leitao <leitao@debian.org>
> + */
> +	.section .init.rodata.embed_cmdline, "a", %progbits
> +	.global embedded_kernel_cmdline
> +embedded_kernel_cmdline:
> +	.incbin "lib/embedded_cmdline.bin"
> +	.byte 0
> +	.global embedded_kernel_cmdline_end
> +embedded_kernel_cmdline_end:
> diff --git a/tools/bootconfig/Makefile b/tools/bootconfig/Makefile
> index 90eb47c9d8de6..4e82fd9553cde 100644
> --- a/tools/bootconfig/Makefile
> +++ b/tools/bootconfig/Makefile
> @@ -15,7 +15,7 @@ override CFLAGS += -Wall -g -I$(CURDIR)/include
>  ALL_TARGETS := bootconfig
>  ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
>  
> -all: $(ALL_PROGRAMS) test
> +all: $(ALL_PROGRAMS)
>  
>  $(OUTPUT)bootconfig: main.c include/linux/bootconfig.h $(LIBSRC)
>  	$(CC) $(filter %.c,$^) $(CFLAGS) $(LDFLAGS) -o $@
> 
> -- 
> 2.53.0-Meta
> 

* Re: [PATCH v5 4/7] bootconfig: clean build-time tools/bootconfig from make clean
From: Nicolas Schier @ 2026-06-17 13:45 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, linux-kernel,
	linux-trace-kernel, linux-kbuild, bpf, kernel-team
In-Reply-To: <20260617-bootconfig_using_tools-v5-4-fd589a9cc5e3@debian.org>

On Wed, Jun 17, 2026 at 04:23:36AM -0700, Breno Leitao wrote:
> The previous patch builds tools/bootconfig during 'make prepare' to
> render the embedded bootconfig cmdline, but nothing removes it on
> 'make clean', leaving the compiled tool and its objects behind.
> 
> Wire a bootconfig_clean hook into the top-level clean target so the
> compiled tool and its objects are removed by make clean, matching the
> prepare-wired tools/objtool and tools/bpf/resolve_btfids.
> 
> The hook runs tools/bootconfig's Makefile via $(MAKE), which the kernel
> build invokes with -rR (MAKEFLAGS += -rR). -rR drops the built-in $(RM)
> variable, so the existing "$(RM) -f ..." clean recipe would expand to a
> bare "-f ..." and fail. Spell the recipe with a literal "rm -f" so it
> keeps working both standalone and when invoked from Kbuild.
> 
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  Makefile                  | 13 ++++++++++++-
>  tools/bootconfig/Makefile |  2 +-
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index a7abb3f9a6264..a6e13fa1c1dc1 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1586,6 +1586,17 @@ ifneq ($(wildcard $(objtool_O)),)
>  	$(Q)$(MAKE) -sC $(abs_srctree)/tools/objtool O=$(objtool_O) srctree=$(abs_srctree) $(patsubst objtool_%,%,$@)
>  endif
>  
> +PHONY += bootconfig_clean
> +
> +bootconfig_O = $(abspath $(objtree))/tools/bootconfig
> +
> +# tools/bootconfig is only built (via the prepare hook above) when
> +# CONFIG_BOOT_CONFIG_EMBED_CMDLINE is set; skip its clean otherwise.

The wildcard below matches for all in-source builds and also for all 
out-of-source builds that _once_ built bootconfig (as the directory will 
never be removed).  I'd like the comment to be removed; it's obvious 
enough what is happening here.

> +bootconfig_clean:
> +ifneq ($(wildcard $(bootconfig_O)),)
> +	$(Q)$(MAKE) -sC $(srctree)/tools/bootconfig O=$(bootconfig_O) clean
> +endif
> +

Some additional bike-shedding:  I'd rather keep this as short and 
simple as possible:


PHONY += bootconfig_clean
bootconfig_clean: bootconfig_O = $(abs_output)/tools/bootconfig
	$(Q)$(MAKE) -sC $(srctree)/tools/bootconfig O=$(bootconfig_O) clean


Nevertheless, for kbuild:

Reviewed-by: Nicolas Schier <n.schier@fritz.com>



Kind regards,
Nicolas

* Re: [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM)
From: Gregory Price @ 2026-06-17 14:03 UTC (permalink / raw)
  To: Balbir Singh
  Cc: David Hildenbrand (Arm), lsf-pc, linux-kernel, linux-cxl, cgroups,
	linux-mm, linux-trace-kernel, damon, kernel-team, gregkh, rafael,
	dakr, dave, jonathan.cameron, dave.jiang, alison.schofield,
	vishal.l.verma, ira.weiny, dan.j.williams, longman, akpm,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	osalvador, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
	byungchul, ying.huang, apopple, axelrasmussen, yuanchu, weixugc,
	yury.norov, linux, mhiramat, mathieu.desnoyers, tj, hannes,
	mkoutny, jackmanb, sj, baolin.wang, npache, ryan.roberts,
	dev.jain, baohua, lance.yang, muchun.song, xu.xin16,
	chengming.zhou, jannh, linmiaohe, nao.horiguchi, pfalcato,
	rientjes, shakeel.butt, riel, harry.yoo, cl, roman.gushchin,
	chrisl, kasong, shikemeng, nphamcs, bhe, zhengqi.arch,
	terry.bowman
In-Reply-To: <ajIb4DJdLGPbMB4V@parvat>

On Wed, Jun 17, 2026 at 02:02:47PM +1000, Balbir Singh wrote:
> On Wed, Jun 10, 2026 at 12:37:34PM -0400, Gregory Price wrote:
> > On Wed, Jun 10, 2026 at 05:00:33PM +0200, David Hildenbrand (Arm) wrote:
> > > On 6/10/26 12:41, Gregory Price wrote:
> > > > On Wed, Jun 03, 2026 at 03:00:01PM +1000, Balbir Singh wrote:
> > > > 
> > 
> > > For mm/slub.c we can choose to do one of two things
> > 
> >   1) 100% refuse slab allocations on private nodes, i.e.:
> > 
> >      kmalloc_node(..., private_nid, __GFP_THISNODE)
> > 
> >      And will fail (return NULL).
> > 
> 
> Doesn't this iterate through N_MEMORY only? N_MEMORY_PRIVATE should not
> be in the regular for_each(...) loops
> 

If a node is in neither FALLBACK nor NOFALLBACK - it is *completely*
unreachable in the current page allocator.

In the next RFC I've reduced this to a ZONELIST_PRIVATE, separate from
ZONELIST_FALLBACK and ZONELIST_NOFALLBACK, plus an explicit folio
allocation interface that selects which fallback list to use.
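
To illustrate the reachability argument, here is a minimal userspace
sketch. It is a toy model, not kernel code: every toy_* name, the node
layout, and the contents of each list are assumptions made for the
example. An allocation only ever walks the list it was asked to use, so
a node that appears in no FALLBACK or NOFALLBACK list can only be
reached through the separate private list.

```c
#include <assert.h>

/* Toy userspace model. None of these names are kernel APIs. */
enum toy_zonelist {
	TOY_ZL_FALLBACK,	/* preferred node first, then other normal nodes */
	TOY_ZL_NOFALLBACK,	/* the node itself only (the __GFP_THISNODE case) */
	TOY_ZL_PRIVATE,		/* assumed: the only list that names private nodes */
	TOY_ZL_MAX,
};

#define TOY_NR_NODES	3
#define TOY_END		(-1)	/* list terminator */

struct toy_node {
	int free_pages;
	/* One TOY_END-terminated list of candidate node ids per kind. */
	int zonelists[TOY_ZL_MAX][TOY_NR_NODES + 1];
};

/* Assumed layout: nodes 0 and 1 are ordinary memory, node 2 is private. */
static struct toy_node toy_nodes[TOY_NR_NODES] = {
	[0] = { .free_pages = 1, .zonelists = {
		[TOY_ZL_FALLBACK]   = { 0, 1, TOY_END },
		[TOY_ZL_NOFALLBACK] = { 0, TOY_END },
		[TOY_ZL_PRIVATE]    = { TOY_END },
	} },
	[1] = { .free_pages = 1, .zonelists = {
		[TOY_ZL_FALLBACK]   = { 1, 0, TOY_END },
		[TOY_ZL_NOFALLBACK] = { 1, TOY_END },
		[TOY_ZL_PRIVATE]    = { TOY_END },
	} },
	[2] = { .free_pages = 1, .zonelists = {
		[TOY_ZL_FALLBACK]   = { TOY_END },
		[TOY_ZL_NOFALLBACK] = { TOY_END },
		[TOY_ZL_PRIVATE]    = { 2, TOY_END },
	} },
};

/*
 * Walk the selected list of node @nid and take one page from the first
 * node that has any. Returns the serving node id, or -1 on failure.
 */
static int toy_alloc_page(int nid, enum toy_zonelist zl)
{
	const int *list;

	assert(nid >= 0 && nid < TOY_NR_NODES);
	assert(zl < TOY_ZL_MAX);

	for (list = toy_nodes[nid].zonelists[zl]; *list != TOY_END; list++) {
		if (toy_nodes[*list].free_pages > 0) {
			toy_nodes[*list].free_pages--;
			return *list;
		}
	}
	return -1;
}
```

In this model, asking for node 2 through TOY_ZL_FALLBACK or
TOY_ZL_NOFALLBACK always fails, and exhausting nodes 0 and 1 never
spills onto node 2. Only a caller that explicitly passes TOY_ZL_PRIVATE
gets a page from it.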

The feedback over the past week has been helpful in homing in on a
solution that I think is generalizable.  Have just been taking the time
to test various behaviors to make sure I haven't been regressing any
userland API/ABIs (mbind, mempolicy, etc).

~Gregory
