* Re: [PATCH 3/4] tracing: probes: fix typo in a log message
From: Masami Hiramatsu @ 2026-06-18 23:43 UTC
To: Martin Kaiser; +Cc: Steven Rostedt, linux-trace-kernel, linux-kernel
In-Reply-To: <20260507081041.885781-4-martin@kaiser.cx>
On Thu, 7 May 2026 10:09:08 +0200
Martin Kaiser <martin@kaiser.cx> wrote:
> Fix a typo ("Invalid $-variable") in a log message.
>
> Signed-off-by: Martin Kaiser <martin@kaiser.cx>
This looks good to me. Let me pick it.
Thanks,
> ---
> kernel/trace/trace_probe.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> index 262d8707a3df..df68d40de161 100644
> --- a/kernel/trace/trace_probe.h
> +++ b/kernel/trace/trace_probe.h
> @@ -509,7 +509,7 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
> C(NO_RETVAL, "This function returns 'void' type"), \
> C(BAD_STACK_NUM, "Invalid stack number"), \
> C(BAD_ARG_NUM, "Invalid argument number"), \
> - C(BAD_VAR, "Invalid $-valiable specified"), \
> + C(BAD_VAR, "Invalid $-variable specified"), \
> C(BAD_REG_NAME, "Invalid register name"), \
> C(BAD_MEM_ADDR, "Invalid memory address"), \
> C(BAD_IMM, "Invalid immediate value"), \
> --
> 2.43.7
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
* [PATCH] tracing/user_events: fix use-after-free of enabler in user_event_mm_dup()
From: Michael Bommarito @ 2026-06-18 22:27 UTC
To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers
Cc: Beau Belgrave, linux-trace-kernel, linux-kernel, stable
user_event_enabler_destroy() removes an enabler from the per-mm
mm->enablers list with list_del_rcu() and then frees it immediately with
kfree(). That list is walked locklessly by user_event_mm_dup() during
fork(), under rcu_read_lock() only:
rcu_read_lock();
list_for_each_entry_rcu(enabler, &old_mm->enablers, mm_enablers_link)
...
user_event_mm_dup() does not take event_mutex. The per-enabler destroy
path user_events_ioctl_unreg() (DIAG_IOCSUNREG) takes event_mutex but
nothing that excludes the dup walk. Threads that share an mm share one
user_event_mm and one enabler list, so an unregister on one thread can
free an enabler while another thread is forking and user_event_mm_dup()
is mid-walk. The walk then dereferences the freed enabler (for example
enabler->event in user_event_enabler_dup()).
This is reachable by an unprivileged task that can open user_events_data:
a single multithreaded process that registers an enabler and then
concurrently unregisters it and calls fork() triggers the race. KASAN
reports a slab-use-after-free read in user_event_enabler_dup() called
from user_event_mm_dup() and copy_process() during clone(); with
kasan.fault=panic the kernel panics.
Free the enabler after a grace period with kfree_rcu(), matching the
list_del_rcu() removal and the rcu_read_lock() readers in
user_event_mm_dup(). Add an rcu_head to struct user_event_enabler for
this. The error path in user_event_enabler_create() keeps using kfree()
because that enabler is freed before it is published to the RCU list.
Cc: stable@vger.kernel.org
Fixes: 7235759084a4 ("tracing/user_events: Use remote writes for event enablement")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
Notes:
KASAN on the unpatched tree (v7.1, x86-64, CONFIG_KASAN=y, SMP):
BUG: KASAN: slab-use-after-free in user_event_enabler_dup+0x50a/0x540
Read of size 8 (enabler->event, 16 bytes into a freed kmalloc-cg-64):
user_event_enabler_dup
user_event_mm_dup
copy_process
__do_sys_clone
Allocated by the registering task; freed on another CPU via the
DIAG_IOCSUNREG path. With kasan.fault=panic the access panics.
After the patch the same reproducer runs cleanly (no splat, no panic)
across the full window, and a serialized control (same paths, no
concurrency) is clean on both stock and patched.
Re-ran tools/testing/selftests/user_events on stock and patched, both
clean: abi_test pass:6/6, dyn_test pass:4/4, ftrace_test pass:6/6.
kernel/trace/trace_events_user.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index c4ba484f7b38b..412ca1e3a40cf 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -109,6 +109,9 @@ struct user_event_enabler {
/* Track enable bit, flags, etc. Aligned for bitops. */
unsigned long values;
+
+ /* Defer free so RCU list readers (user_event_mm_dup) are safe. */
+ struct rcu_head rcu;
};
/* Bits 0-5 are for the bit to update upon enable/disable (0-63 allowed) */
@@ -404,7 +407,12 @@ static void user_event_enabler_destroy(struct user_event_enabler *enabler,
/* No longer tracking the event via the enabler */
user_event_put(enabler->event, locked);
- kfree(enabler);
+ /*
+ * The enabler is removed from an RCU-traversed list
+ * (user_event_mm_dup walks mm->enablers under rcu_read_lock only),
+ * so the backing memory must outlive a grace period.
+ */
+ kfree_rcu(enabler, rcu);
}
static int user_event_mm_fault_in(struct user_event_mm *mm, unsigned long uaddr,
--
2.53.0
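
The "unlink now, free after readers are done" ordering that the patch above relies on can be illustrated with a small single-threaded user-space model. This is only a sketch: the toy_* names are hypothetical, the reader counter is a crude stand-in for rcu_read_lock() and a grace period, and none of it is the kernel's RCU implementation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for struct user_event_enabler; not the kernel layout. */
struct toy_enabler {
	int event;                      /* field a reader dereferences */
	struct toy_enabler *next;       /* list linkage */
	struct toy_enabler *defer_next; /* plays the role of the rcu_head */
};

static struct toy_enabler *enablers; /* models the mm->enablers list */
static struct toy_enabler *deferred; /* unlinked but not yet freed */
static int active_readers;           /* models rcu_read_lock() sections */

static void toy_read_lock(void)   { active_readers++; }
static void toy_read_unlock(void) { active_readers--; }

/* Publish a new enabler at the head of the list. */
static struct toy_enabler *toy_enabler_add(int event)
{
	struct toy_enabler *e = calloc(1, sizeof(*e));

	if (!e)
		return NULL;
	e->event = event;
	e->next = enablers;
	enablers = e;
	return e;
}

/* Unlink immediately, queue the free: the shape of list_del_rcu() + kfree_rcu(). */
static void toy_enabler_destroy(struct toy_enabler *victim)
{
	struct toy_enabler **pp;

	for (pp = &enablers; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			break;
		}
	}
	/* victim->next stays intact so a reader mid-walk can continue. */
	victim->defer_next = deferred;
	deferred = victim;
}

/* Free queued objects only when no reader is active; returns the count freed. */
static int toy_grace_period(void)
{
	int freed = 0;

	if (active_readers > 0)
		return 0;
	while (deferred) {
		struct toy_enabler *e = deferred;

		deferred = e->defer_next;
		free(e);
		freed++;
	}
	return freed;
}
```

A reader that picked up a pointer before toy_enabler_destroy() can still read ->event until it calls toy_read_unlock(); only then does toy_grace_period() release the memory. With an immediate free in the destroy path, that same read would touch freed memory, which is the splat KASAN reports.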
* Re: [PATCH 0/3] rv/reactors: fix lockdep warning and add KUnit tests
From: Gabriele Monaco @ 2026-06-18 15:35 UTC
To: Wen Yang; +Cc: Nam Cao, linux-trace-kernel, linux-kernel
In-Reply-To: <4053c9bb-6229-438c-8c14-917909c1618f@linux.dev>
On Thu, 2026-06-18 at 01:11 +0800, Wen Yang wrote:
> Thank you for your feedback.
> I am using a WSL dev environment with 12 cores and 16GB. The config
> of the tested kernel code is as follows:
Uhm that's a strange one, I cannot get a machine like that..
The closest is a 16 CPUs where I can limit the resources in vng.
> And then, using vng to build and run kselftests (since kunit is
> already
> built-in) can reproduce this issue:
>
> $ vng --build
>
> $ vng -v --run arch/x86/boot/bzImage --user root --
> tools/testing/selftests/verification/verificationtest-ktap
Well whenever I pass some argument to vng (instead of just vng -v that brings
up an interactive shell), I see an unrelated lockdep splat in
timekeeping_init(), but all clear when the KUnit runs..
I'm going to try and understand better what's going on, I don't think I can
reproduce it easily.
Thanks,
Gabriele
* [PATCH] rv: update rvgen monitor synthesis documentation path
From: Yu Chuanyu via B4 Relay @ 2026-06-18 13:45 UTC
To: Steven Rostedt, Gabriele Monaco
Cc: linux-trace-kernel, linux-kernel, Yu Chuanyu
From: Yu Chuanyu <lucayu.alight@gmail.com>
The rvgen source comments still refer to da_monitor_synthesis.rst, which
no longer exists. The documentation is now available in
monitor_synthesis.rst. Update both references to point to the current
file.
Signed-off-by: Yu Chuanyu <lucayu.alight@gmail.com>
---
tools/verification/rvgen/__main__.py | 2 +-
tools/verification/rvgen/rvgen/dot2k.py | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/verification/rvgen/__main__.py b/tools/verification/rvgen/__main__.py
index 5c923dc..2a2bb03 100644
--- a/tools/verification/rvgen/__main__.py
+++ b/tools/verification/rvgen/__main__.py
@@ -6,7 +6,7 @@
# dot2k: transform dot files into a monitor for the Linux kernel.
#
# For further information, see:
-# Documentation/trace/rv/da_monitor_synthesis.rst
+# Documentation/trace/rv/monitor_synthesis.rst
if __name__ == '__main__':
from rvgen.dot2k import da2k, ha2k
diff --git a/tools/verification/rvgen/rvgen/dot2k.py b/tools/verification/rvgen/rvgen/dot2k.py
index 110cfd6..326984f 100644
--- a/tools/verification/rvgen/rvgen/dot2k.py
+++ b/tools/verification/rvgen/rvgen/dot2k.py
@@ -6,7 +6,7 @@
# dot2k: transform dot files into a monitor for the Linux kernel.
#
# For further information, see:
-# Documentation/trace/rv/da_monitor_synthesis.rst
+# Documentation/trace/rv/monitor_synthesis.rst
from collections import deque
from .dot2c import Dot2c
---
base-commit: e771677c937da5808f7b6c1f0e4a97ec1a84f8a8
change-id: 20260618-rvgen-doc-path-11695c57153d
Best regards,
--
Yu Chuanyu <lucayu.alight@gmail.com>
* Re: [PATCH v3 09/13] verification/rvgen: Delete __parse_constraint()
From: Nam Cao @ 2026-06-18 13:24 UTC
To: Gabriele Monaco
Cc: Steven Rostedt, Wander Lairson Costa, linux-trace-kernel,
linux-kernel
In-Reply-To: <9035cc5b83dda3a8ec06e8488fba62ceb7431123.camel@redhat.com>
Gabriele Monaco <gmonaco@redhat.com> writes:
> Yeah, I don't see it explicitly mandated in the theory, but the
> description (from the sources) states:
>
> The value of a clock thus denotes the amount of time that has been
> elapsed since its last reset
>
> But it also says (emphasis added by me):
>
> Clocks /can/ be reset to zero after which they start increasing ...
>
> Nowhere does it say clocks /must/ be reset; their value simply won't make
> sense (according to the definition).
>
> Now in our implementation we may have some automatic reset when the
> monitor starts (I'm planning that to avoid invalid states), which could
> make explicit resets superfluous in some cases.
Resetting the clocks on monitor start sounds sensible.
> Let's leave that to the user for now and skip this check.
Thanks,
Nam
* Re: [PATCH v3] mm/lruvec: trace LRU add drains and drain-all requests
From: David Hildenbrand (Arm) @ 2026-06-18 12:38 UTC
To: Vlastimil Babka (SUSE), Shakeel Butt
Cc: JP Kobryn, linux-mm, willy, usama.arif, akpm, mhocko, rostedt,
mhiramat, mathieu.desnoyers, kasong, qi.zheng, baohua,
axelrasmussen, yuanchu, weixugc, chrisl, shikemeng, nphamcs,
baoquan.he, youngjun.park, linux-kernel, linux-trace-kernel
In-Reply-To: <bbcb6db5-6a01-46b7-979f-dadd52a5176f@kernel.org>
On 6/18/26 10:30, Vlastimil Babka (SUSE) wrote:
> On 6/18/26 10:21, David Hildenbrand (Arm) wrote:
>> On 6/17/26 20:18, Vlastimil Babka (SUSE) wrote:
>>>
>>> Yeah, and I don't recall a change to an mm tracepoint ever
>>> breaking someone who'd complain so that we'd have to revert it.
>> Really? :)
>>
>> Read the context of the link I posted once more.
>
> Ah, I see. I've only read the single mail from Steven that referred to the
> old powertop breakage and didn't notice the context.
>
> But I don't think these worries should stop us from adding easily usable
> tracepoints.
Steve explained how scheduler people apparently handle it without
trace events.
You can always remove/modify tracepoints, but not trace events.
Anyhow, I just wanted to mention it, because so far MM didn't really know about
this implication.
--
Cheers,
David
* Re: [RFC PATCH 3/3] mm/compaction: respect compact_unevictable_allowed in alloc_contig path
From: Wandun @ 2026-06-18 11:47 UTC
To: Vlastimil Babka (SUSE), linux-mm, linux-kernel,
linux-trace-kernel, linux-rt-devel
Cc: akpm, surenb, mhocko, jackmanb, hannes, ziy, rostedt, mhiramat,
mathieu.desnoyers, david, ljs, liam, rppt, bigeasy, clrkwllms,
Alexander.Krabler
In-Reply-To: <9890b8f5-69b9-49bc-8ed6-ea47723b644e@kernel.org>
On 6/18/26 02:57, Vlastimil Babka (SUSE) wrote:
> On 6/4/26 04:38, Wandun Chen wrote:
>> From: Wandun Chen <chenwandun@lixiang.com>
>>
>> vm.compact_unevictable_allowed=0 is used to prevent compacting
>> unevictable pages. However, isolate_migratepages_range() passes
>> ISOLATE_UNEVICTABLE regardless of this sysctl, so the setting
>> has no effect in the alloc_contig path.
>>
>> Fix it by:
>> - Keep ISOLATE_UNEVICTABLE for CMA allocation, discussed in [1].
>> - Honour sysctl_compact_unevictable_allowed for non-CMA allocation.
>>
>> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
>> Signed-off-by: Wandun Chen <chenwandun@lixiang.com>
>> Link: https://lore.kernel.org/all/25ba0d77-eb61-4efc-b2fc-73878cbd85c1@suse.cz/ [1]
>
> There was also the "Ideally by not having mlock'd pages in CMA areas at
> all." part. Is it the case? It was more elaborated here:
Yes, it is the case.
> https://lore.kernel.org/all/CAPTztWZpnX1j8-7yeppVUsxE=O9hbVeqricDjZt8_pnN7a-kBQ@mail.gmail.com/
I missed this important information. Thanks for pointing it out, Vlastimil.
Best regards,
Wandun
>
>> ---
>> include/linux/compaction.h | 6 ++++++
>> mm/compaction.c | 9 +++++++--
>> mm/internal.h | 1 +
>> mm/page_alloc.c | 2 ++
>> 4 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
>> index f29ef0653546..04e60f65b976 100644
>> --- a/include/linux/compaction.h
>> +++ b/include/linux/compaction.h
>> @@ -106,6 +106,7 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
>> extern void __meminit kcompactd_run(int nid);
>> extern void __meminit kcompactd_stop(int nid);
>> extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int highest_zoneidx);
>> +extern bool compaction_allow_unevictable(void);
>>
>> #else
>> static inline void reset_isolation_suitable(pg_data_t *pgdat)
>> @@ -131,6 +132,11 @@ static inline void wakeup_kcompactd(pg_data_t *pgdat,
>> {
>> }
>>
>> +static inline bool compaction_allow_unevictable(void)
>> +{
>> + return true;
>> +}
>> +
>> #endif /* CONFIG_COMPACTION */
>>
>> struct node;
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 007d5e00a8ae..a10acb273454 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1341,6 +1341,7 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
>> unsigned long end_pfn)
>> {
>> unsigned long pfn, block_start_pfn, block_end_pfn;
>> + isolate_mode_t mode = cc->allow_unevictable ? ISOLATE_UNEVICTABLE : 0;
>> int ret = 0;
>>
>> /* Scan block by block. First and last block may be incomplete */
>> @@ -1360,8 +1361,7 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
>> block_end_pfn, cc->zone))
>> continue;
>>
>> - ret = isolate_migratepages_block(cc, pfn, block_end_pfn,
>> - ISOLATE_UNEVICTABLE);
>> + ret = isolate_migratepages_block(cc, pfn, block_end_pfn, mode);
>>
>> if (ret)
>> break;
>> @@ -1902,6 +1902,11 @@ typedef enum {
>> * compactable pages.
>> */
>> static int sysctl_compact_unevictable_allowed __read_mostly = CONFIG_COMPACT_UNEVICTABLE_DEFAULT;
>> +
>> +bool compaction_allow_unevictable(void)
>> +{
>> + return sysctl_compact_unevictable_allowed;
>> +}
>> /*
>> * Tunable for proactive compaction. It determines how
>> * aggressively the kernel should compact memory in the
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 181e79f1d6a2..163f9d6b37f3 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -1052,6 +1052,7 @@ struct compact_control {
>> * ensure forward progress.
>> */
>> bool alloc_contig; /* alloc_contig_range allocation */
>> + bool allow_unevictable; /* Allow isolation of unevictable folios */
>> };
>>
>> /*
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 81a9d4d1e6c0..1cf9d4a3b14c 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -7118,6 +7118,8 @@ int alloc_contig_frozen_range_noprof(unsigned long start, unsigned long end,
>> .ignore_skip_hint = true,
>> .no_set_skip_hint = true,
>> .alloc_contig = true,
>> + .allow_unevictable = !!(alloc_flags & ACR_FLAGS_CMA) ||
>> + compaction_allow_unevictable(),
>> };
>> INIT_LIST_HEAD(&cc.migratepages);
>> enum pb_isolate_mode mode = (alloc_flags & ACR_FLAGS_CMA) ?
>
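
The decision the quoted patch encodes in alloc_contig_frozen_range_noprof() and isolate_migratepages_range() can be restated as a small user-space truth-table model. This is a sketch under assumptions: the toy_* names are hypothetical and the flag values are illustrative placeholders, not the kernel's definitions.

```c
#include <stdbool.h>

#define TOY_ACR_FLAGS_CMA       (1u << 0) /* illustrative value only */
#define TOY_ISOLATE_UNEVICTABLE (1u << 3) /* illustrative value only */

/* CMA allocations always may isolate unevictable folios; others follow the sysctl. */
static bool toy_allow_unevictable(unsigned int alloc_flags, bool sysctl_allowed)
{
	return (alloc_flags & TOY_ACR_FLAGS_CMA) || sysctl_allowed;
}

/* Isolation mode handed to the per-block isolation step. */
static unsigned int toy_isolate_mode(bool allow_unevictable)
{
	return allow_unevictable ? TOY_ISOLATE_UNEVICTABLE : 0;
}
```

Only the combination "non-CMA allocation with the sysctl set to 0" yields mode 0; every other combination keeps the previous behaviour of passing ISOLATE_UNEVICTABLE.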
* Re: [RFC PATCH 1/3] mm/compaction: skip isolate mlocked folios when compact_unevictable_allowed=0
From: Wandun @ 2026-06-18 11:43 UTC
To: Vlastimil Babka (SUSE), linux-mm, linux-kernel,
linux-trace-kernel, linux-rt-devel
Cc: akpm, surenb, mhocko, jackmanb, hannes, ziy, rostedt, mhiramat,
mathieu.desnoyers, david, ljs, liam, rppt, bigeasy, clrkwllms,
Alexander.Krabler, Hugh Dickins
In-Reply-To: <969cb14b-5b8b-48e6-add6-4dd13101dd89@kernel.org>
On 6/18/26 02:52, Vlastimil Babka (SUSE) wrote:
> On 6/4/26 04:38, Wandun Chen wrote:
>> From: Wandun Chen <chenwandun@lixiang.com>
>>
>> compact_unevictable_allowed defaults to 0 under PREEMPT_RT, so
>> isolate_migratepages_block() skips folios with PG_unevictable set.
>> However, mlock_folio() sets PG_mlocked immediately but defers
>> PG_unevictable to mlock_folio_batch(), resulting in a folio with
>> PG_mlocked=1 but PG_unevictable=0. Compaction will isolate such a
>> folio.
>>
>> Fix by checking folio_test_mlocked() together with the existing
>> folio_test_unevictable() check.
>>
>> A similar issue has been reported by Alexander Krabler on a 6.12-rt
>> aarch64 system. Vlastimil suggested to check the mlocked flag [1].
>>
>> Reported-by: Alexander Krabler <Alexander.Krabler@kuka.com>
>> Closes: https://lore.kernel.org/all/DU0PR01MB10385345F7153F334100981888259A@DU0PR01MB10385.eurprd01.prod.exchangelabs.com/
>> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
>> Signed-off-by: Wandun Chen <chenwandun@lixiang.com>
>> Link: https://lore.kernel.org/all/33275585-f2db-4779-89f0-3ae24b455a67@suse.cz/ [1]
>
> Well, in that thread, Hugh doubted my suggestion and then it seems we didn't
> conclude anything. Did you actually observe in practice the issue that
> Alexander had, and that this patch fixed it, or is that theoretical?
>
Yes, I wrote a test case that can reproduce it in a few seconds.
The test case contains 3 steps:
1. mlockall
2. mmap a file (2GB) + trigger file write page faults;
3. during step 2, trigger compaction via /proc/sys/vm/compact_memory
My reproduction environment is qemu with 4GB ram, 8 core, aarch64,
preempt_rt and includes the tracepoint in patch 02.
After running the reproduction program for a few seconds, the
following output appears.
repro-403 [004] ....1 101.270505: mm_compaction_isolate_folio: pfn=0x71e3a mode=0x0 flags=referenced|uptodate|mlocked
repro-403 [004] ....1 101.270507: mm_compaction_isolate_folio: pfn=0x71e3b mode=0x0 flags=referenced|uptodate|mlocked
repro-403 [004] ....1 101.270513: mm_compaction_isolate_folio: pfn=0x71e3c mode=0x0 flags=referenced|uptodate|mlocked
repro-403 [004] ....1 101.270515: mm_compaction_isolate_folio: pfn=0x71e3d mode=0x0 flags=uptodate|mlocked
repro-403 [004] ....1 101.270517: mm_compaction_isolate_folio: pfn=0x71e3e mode=0x0 flags=uptodate|mlocked
repro-403 [004] ....1 101.270520: mm_compaction_isolate_folio: pfn=0x71e3f mode=0x0 flags=uptodate|mlocked
Unfortunately, I recently found that there is still a bug in the
fix patch. Setting mlocked in the mlock_folio function could happen
even after the page is successfully isolated, so it still cannot
prevent migration. Because of this, I need to think more about how
to fix it.
Perhaps we should double-check whether the page is mlocked during
the actual migration phase.
What do you think of this best-effort approach?
Best regards,
Wandun
The full reproducer is as below:
/* gcc repro.c -o repro -lpthread */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define PAGE_SIZE 4096
#define NR_PAGES 32
#define FILE_SIZE (2ULL * 1024 * 1024 * 1024)

static void *worker_fn(void *arg)
{
	int fd = (long)arg;
	size_t len = (size_t)FILE_SIZE;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	if (p == MAP_FAILED)
		return NULL;
	for (size_t off = 0; off + NR_PAGES * PAGE_SIZE <= len;
	     off += NR_PAGES * PAGE_SIZE) {
		for (int i = 0; i < NR_PAGES; i++)
			p[off + i * PAGE_SIZE] = 1;
		usleep(200);
	}
	munmap(p, len);
	return NULL;
}

static void *compact_fn(void *arg)
{
	(void)arg;
	int fd = open("/proc/sys/vm/compact_memory", O_WRONLY);

	if (fd < 0)
		return NULL;
	while (1) {
		if (write(fd, "1", 1) < 0) {}
		usleep(5000);
	}
}

int main(void)
{
	mlockall(MCL_CURRENT | MCL_FUTURE);
	int fd = open("./repro_largefile.dat", O_RDWR | O_CREAT, 0600);

	if (fd < 0)
		return 1;
	unlink("./repro_largefile.dat");
	if (ftruncate(fd, (off_t)FILE_SIZE) < 0)
		return 1;
	printf("repro_largefile: 1 worker, %d pages/batch, Ctrl-C to stop\n",
	       NR_PAGES);
	pthread_t compact, worker;

	pthread_create(&compact, NULL, compact_fn, NULL);
	pthread_create(&worker, NULL, worker_fn, (void *)(long)fd);
	pthread_join(worker, NULL);
	return 0;
}
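
The flag window behind the trace output above (PG_mlocked already set, PG_unevictable not yet set) can be modelled in user space by comparing the isolation check before and after the proposed change. This is a sketch only: the toy_* names are hypothetical and the bit values are illustrative, not the kernel's page-flag or isolate-mode encodings. It also does not model the later case mentioned above, where mlocked is set after the folio has already been isolated.

```c
#include <stdbool.h>

#define TOY_PG_MLOCKED          (1u << 0) /* illustrative value only */
#define TOY_PG_UNEVICTABLE      (1u << 1) /* illustrative value only */
#define TOY_ISOLATE_UNEVICTABLE (1u << 3) /* illustrative value only */

/* Check before the patch: only the unevictable flag is consulted. */
static bool toy_skip_old(unsigned int mode, unsigned int flags)
{
	bool unevictable = flags & TOY_PG_UNEVICTABLE;

	return !(mode & TOY_ISOLATE_UNEVICTABLE) && unevictable;
}

/* Check after the patch: an mlocked folio is skipped as well. */
static bool toy_skip_new(unsigned int mode, unsigned int flags)
{
	bool unevictable = flags & TOY_PG_UNEVICTABLE;
	bool mlocked = flags & TOY_PG_MLOCKED;

	return !(mode & TOY_ISOLATE_UNEVICTABLE) && (unevictable || mlocked);
}
```

For a folio carrying only the mlocked bit and mode 0 (the combination shown in the trace lines), the old check does not skip it and the new one does.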
>> ---
>> mm/compaction.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index b776f35ad020..7e07b792bcb5 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1116,7 +1116,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> is_unevictable = folio_test_unevictable(folio);
>>
>> /* Compaction might skip unevictable pages but CMA takes them */
>> - if (!(mode & ISOLATE_UNEVICTABLE) && is_unevictable)
>> + if (!(mode & ISOLATE_UNEVICTABLE) &&
>> + (is_unevictable || folio_test_mlocked(folio)))
>> goto isolate_fail_put;
>>
>> /*
>
* Re: [PATCH] usb: typec: add trace point for typec_set_mode
From: Heikki Krogerus @ 2026-06-18 11:31 UTC
To: Ahmad Fatoum
Cc: Greg Kroah-Hartman, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, linux-kernel, linux-usb, linux-trace-kernel,
kernel
In-Reply-To: <43e13854-a634-4706-bc12-723c871a5579@pengutronix.de>
Hi,
On Thu, Jun 18, 2026 at 01:00:58PM +0200, Ahmad Fatoum wrote:
> Hello Heikki,
>
> On 6/18/26 12:56 PM, Heikki Krogerus wrote:
> > On Wed, Jun 17, 2026 at 10:03:04PM +0200, Ahmad Fatoum wrote:
> >> --- a/drivers/usb/typec/class.c
> >> +++ b/drivers/usb/typec/class.c
> >> @@ -20,6 +20,9 @@
> >> #include "class.h"
> >> #include "pd.h"
> >>
> >> +#define CREATE_TRACE_POINTS
> >> +#include <trace/events/typec.h>
> >
> > Those should probably go to drivers/usb/typec/trace.c and then you
> > need to add something like this to drivers/usb/typec/Makefile:
> >
> > obj-$(CONFIG_TYPEC) += typec.o
> > typec-y := class.o mux.o bus.o pd.o retimer.o mode_selection.o
> > typec-$(CONFIG_ACPI) += port-mapper.o
> > +typec-$(CONFIG_TRACING) += trace.o
>
> Thanks for the suggestion. I will do that for v2.
>
> I also saw there is Sashiko AI feedback on this patch[1], but I am not
> familiar enough with how the event headers are used outside the kernel
> to determine if that's actionable advice or if it can be ignored.
>
> Do you have an opinion on that?
>
> [1]:
> https://sashiko.dev/#/patchset/20260617-typec_set_mode-tracepoint-v1-1-bdfbb39cfccd%40pengutronix.de
It's correct. You need to use a private trace.h in this case, so just
move it here: drivers/usb/typec/trace.h
And also make sure you include everything needed in that header like
it's telling you.
Thanks,
--
heikki
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM)
From: Gregory Price @ 2026-06-18 11:13 UTC
To: Vlastimil Babka (SUSE)
Cc: David Hildenbrand (Arm), Balbir Singh, lsf-pc, linux-kernel,
linux-cxl, cgroups, linux-mm, linux-trace-kernel, damon,
kernel-team, gregkh, rafael, dakr, dave, jonathan.cameron,
dave.jiang, alison.schofield, vishal.l.verma, ira.weiny,
dan.j.williams, longman, akpm, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, osalvador, ziy, matthew.brost,
joshua.hahnjy, rakie.kim, byungchul, ying.huang, apopple,
axelrasmussen, yuanchu, weixugc, yury.norov, linux, mhiramat,
mathieu.desnoyers, tj, hannes, mkoutny, jackmanb, sj, baolin.wang,
npache, ryan.roberts, dev.jain, baohua, lance.yang, muchun.song,
xu.xin16, chengming.zhou, jannh, linmiaohe, nao.horiguchi,
pfalcato, rientjes, shakeel.butt, riel, harry.yoo, cl,
roman.gushchin, chrisl, kasong, shikemeng, nphamcs, bhe,
zhengqi.arch, terry.bowman, Matthew Wilcox
In-Reply-To: <90418cd3-751f-439d-83ed-a0c33517c3bd@kernel.org>
On Thu, Jun 18, 2026 at 10:21:30AM +0200, Vlastimil Babka (SUSE) wrote:
> On 6/15/26 17:37, Gregory Price wrote:
> >
> > One thought would be a way to switch what fallback list is used, and
> > then have specific fallback lists for certain contexts.
> >
> > Right now there is a single example of this: __GFP_THISNODE
> > |= __GFP_THISNODE => NOFALLBACK
> > &= ~__GFP_THISNODE => FALLBACK
> >
> > We could add an interface with the desired fallback list based as an
> > argument, and let get_page_from_freelist to prefer that over the default
> > global lists.
>
> Does it mean a new argument in a number of functions in the page allocator,
> or can it be mapped to alloc_flags (at least internally?), because the
> number of possible fallback lists is small enough?
>
What I ended up with was adding a single page_alloc.c external interface
that allows you to define the zonelist via an enum, and then an internal
selector resolution in prepare_alloc_pages() stored in alloc_context,
e.g.:
static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
int preferred_nid, nodemask_t *nodemask,
struct alloc_context *ac, gfp_t *alloc_gfp,
unsigned int *alloc_flags)
{
ac->highest_zoneidx = gfp_zone(gfp_mask);
ac->zonelist = select_zonelist(preferred_nid, gfp_mask, ac->zlsel);
... snip ...
}
struct folio *__folio_alloc_zonelist_noprof(gfp_t gfp, unsigned int order,
int preferred_nid, nodemask_t *nodemask,
enum alloc_zonelist zlsel);
The original __folio_alloc* functions just add a DEFAULT - which tells
select_zonelist() to base the decision on __GFP_THISNODE.
struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
nodemask_t *nodemask)
{
return __folio_alloc_core(gfp, order, preferred_nid, nodemask,
ALLOC_ZONELIST_DEFAULT);
}
EXPORT_SYMBOL(__folio_alloc_noprof);
This does a few things:
- The isolation is structural, there is no way to accidentally
allocate private memory without passing ALLOC_ZONELIST_PRIVATE
- The isolation forces folios - there are no non-folio interfaces
which allow zonelist selection
- The zonelist selection is confined to this allocation context,
so no inheritance is possible.
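
The body of select_zonelist() is not shown in this mail, so the following is purely a guess at its shape, written as a user-space model. Every toy_* name, the TOY_ZONELIST_PRIVATE index and the flag value are hypothetical, and the choice that the private selector takes precedence over the THISNODE bit is an assumption made for illustration.

```c
/* Selector passed in by the caller, mirroring enum alloc_zonelist above. */
enum toy_alloc_zonelist {
	TOY_ALLOC_ZONELIST_DEFAULT,
	TOY_ALLOC_ZONELIST_PRIVATE,
};

/* Which per-node zonelist ends up being used; PRIVATE is a hypothetical index. */
enum toy_zonelist_idx {
	TOY_ZONELIST_FALLBACK,
	TOY_ZONELIST_NOFALLBACK,
	TOY_ZONELIST_PRIVATE,
};

#define TOY_GFP_THISNODE (1u << 0) /* illustrative value only */

static enum toy_zonelist_idx
toy_select_zonelist(unsigned int gfp, enum toy_alloc_zonelist zlsel)
{
	/* Assumption: an explicit private selector wins over the gfp bit. */
	if (zlsel == TOY_ALLOC_ZONELIST_PRIVATE)
		return TOY_ZONELIST_PRIVATE;

	/* DEFAULT: fall back to the THISNODE-based choice. */
	return (gfp & TOY_GFP_THISNODE) ? TOY_ZONELIST_NOFALLBACK
					: TOY_ZONELIST_FALLBACK;
}
```

In this model a caller that never passes the private selector can only ever reach the two existing lists, which is the structural isolation property listed above.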
I tried to avoid using an ALLOC_ flag so we can avoid yet another flag
crunch, but there certainly are few enough zonelists that we could
encode it there and expose it. I know Brendan was looking at plumbing
alloc flags out to an interface, so I'm open to that.
Externally the way I determine what zonelist to use is a lookup based on
reason - letting the node filter. This is really only needed in a
couple spots:
mm/khugepaged.c: enum alloc_zonelist zlsel = alloc_zonelist_for_node(node, NODE_ALLOC_RECLAIM);
mm/vmscan.c: mtc->zlsel = alloc_zonelist_for_nodemask(mtc->nmask, NODE_ALLOC_TIERING);
mm/migrate.c: .zlsel = alloc_zonelist_for_node(node, NODE_ALLOC_USER_MIGRATE),
static inline enum alloc_zonelist
alloc_zonelist_for_node(int nid, enum node_alloc_reason reason)
{
bool ok;
if (!node_state(nid, N_MEMORY_PRIVATE))
return ALLOC_ZONELIST_DEFAULT;
switch (reason) {
case NODE_ALLOC_RECLAIM:
ok = node_is_reclaimable(nid);
break;
case NODE_ALLOC_TIERING:
ok = node_allows_tiering(nid);
break;
case NODE_ALLOC_USER_MIGRATE:
ok = node_allows_user_migrate(nid);
break;
default:
ok = false;
}
return ok ? ALLOC_ZONELIST_PRIVATE : ALLOC_ZONELIST_DEFAULT;
}
Otherwise... everything is now a mempolicy w/ MPOL_F_BIND and all the
handling goes through the normal fault-paths :]
static struct page *__alloc_pages_mpol(gfp_t gfp, unsigned int order,
struct mempolicy *pol, pgoff_t ilx, int nid)
{
nodemask_t *nodemask;
struct page *page;
enum alloc_zonelist zlsel = (pol->flags & MPOL_F_PRIVATE) ?
ALLOC_ZONELIST_PRIVATE : ALLOC_ZONELIST_DEFAULT;
...
if (pol->mode == MPOL_PREFERRED_MANY)
return alloc_pages_preferred_many(gfp, order, nid, nodemask,
zlsel);
...
}
Switching to an alloc_flag would probably be trivial if that's really
wanted.
~Gregory
* Re: [PATCH] usb: typec: add trace point for typec_set_mode
From: Ahmad Fatoum @ 2026-06-18 11:00 UTC
To: Heikki Krogerus
Cc: Greg Kroah-Hartman, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, linux-kernel, linux-usb, linux-trace-kernel,
kernel
In-Reply-To: <ajPO6roV4HRZYGNd@kuha>
Hello Heikki,
On 6/18/26 12:56 PM, Heikki Krogerus wrote:
> On Wed, Jun 17, 2026 at 10:03:04PM +0200, Ahmad Fatoum wrote:
>> --- a/drivers/usb/typec/class.c
>> +++ b/drivers/usb/typec/class.c
>> @@ -20,6 +20,9 @@
>> #include "class.h"
>> #include "pd.h"
>>
>> +#define CREATE_TRACE_POINTS
>> +#include <trace/events/typec.h>
>
> Those should probably go to drivers/usb/typec/trace.c and then you
> need to add something like this to drivers/usb/typec/Makefile:
>
> obj-$(CONFIG_TYPEC) += typec.o
> typec-y := class.o mux.o bus.o pd.o retimer.o mode_selection.o
> typec-$(CONFIG_ACPI) += port-mapper.o
> +typec-$(CONFIG_TRACING) += trace.o
Thanks for the suggestion. I will do that for v2.
I also saw there is Sashiko AI feedback on this patch[1], but I am not
familiar enough with how the event headers are used outside the kernel
to determine if that's actionable advice or if it can be ignored.
Do you have an opinion on that?
[1]:
https://sashiko.dev/#/patchset/20260617-typec_set_mode-tracepoint-v1-1-bdfbb39cfccd%40pengutronix.de
Thanks,
Ahmad
>
>
> Thanks,
>
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
* Re: [PATCH] usb: typec: add trace point for typec_set_mode
From: Heikki Krogerus @ 2026-06-18 10:56 UTC
To: Ahmad Fatoum
Cc: Greg Kroah-Hartman, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, linux-kernel, linux-usb, linux-trace-kernel,
kernel
In-Reply-To: <20260617-typec_set_mode-tracepoint-v1-1-bdfbb39cfccd@pengutronix.de>
Hi Ahmad,
On Wed, Jun 17, 2026 at 10:03:04PM +0200, Ahmad Fatoum wrote:
> Some Type-C controllers toggle muxes themselves. Other controllers like
> the TUSB320 report the mode to the host, so it can control the muxes.
>
> To improve debuggability of both kinds of drivers, add a trace point that
> can be used to keep track of the mode being set inside the Type-C
> framework:
>
> echo 1 > /sys/kernel/debug/tracing/events/typec/typec_mode/enable
>
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
> ---
> MAINTAINERS | 1 +
> drivers/usb/typec/class.c | 9 ++++++++-
> include/trace/events/typec.h | 36 ++++++++++++++++++++++++++++++++++++
> 3 files changed, 45 insertions(+), 1 deletion(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c8d4b913f26c..ddd59e5e6eaf 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -27753,6 +27753,7 @@ F: Documentation/ABI/testing/sysfs-class-typec
> F: Documentation/driver-api/usb/typec.rst
> F: drivers/usb/typec/
> F: include/linux/usb/typec.h
> +F: include/trace/events/typec*.h
>
> USB TYPEC INTEL PMC MUX DRIVER
> M: Heikki Krogerus <heikki.krogerus@linux.intel.com>
> diff --git a/drivers/usb/typec/class.c b/drivers/usb/typec/class.c
> index 0977581ad1b6..9316d067f19a 100644
> --- a/drivers/usb/typec/class.c
> +++ b/drivers/usb/typec/class.c
> @@ -20,6 +20,9 @@
> #include "class.h"
> #include "pd.h"
>
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/typec.h>
Those should probably go to drivers/usb/typec/trace.c and then you
need to add something like this to drivers/usb/typec/Makefile:
obj-$(CONFIG_TYPEC) += typec.o
typec-y := class.o mux.o bus.o pd.o retimer.o mode_selection.o
typec-$(CONFIG_ACPI) += port-mapper.o
+typec-$(CONFIG_TRACING) += trace.o
obj-$(CONFIG_TYPEC) += altmodes/
obj-$(CONFIG_TYPEC_TCPM) += tcpm/
obj-$(CONFIG_TYPEC_UCSI) += ucsi/
Thanks,
--
heikki
* Re: [PATCH v5 1/2] serial: qcom-geni: trace: Drop redundant len field from geni_serial_data
From: Konrad Dybcio @ 2026-06-18 8:55 UTC
To: Praveen Talari, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Greg Kroah-Hartman, Jiri Slaby
Cc: linux-kernel, linux-trace-kernel, linux-arm-msm, linux-serial,
mukesh.savaliya, aniket.randive, chandana.chiluveru
In-Reply-To: <20260615-add-tracepoints-for-qcom-geni-serial-v5-1-2efa4c97e0e2@oss.qualcomm.com>
On 6/15/26 4:16 PM, Praveen Talari wrote:
> The dynamic array stored in the ring buffer already carries its own
> length in the array metadata. There is no need to also store it as a
> separate scalar field in the entry struct.
>
> Drop __field(unsigned int, len) and the corresponding __entry->len
> assignment, and use __get_dynamic_array_len(data) in the TP_printk for
> both the len=%u format argument and the __print_hex() size argument.
> This saves 4 bytes per event on the ring buffer.
>
> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
> ---
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Konrad
^ permalink raw reply
* [PATCH] tracing/probes: Remove WARN_ON_ONCE from parse_btf_arg
From: Masami Hiramatsu (Google) @ 2026-06-18 8:50 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Sashiko found that a user can easily trigger this WARN_ON_ONCE()
by adding a kprobe event based on a raw address with a BTF
parameter.
Since this is not an unexpected condition, remove the
WARN_ON_ONCE().
Link: https://sashiko.dev/#/patchset/178165816303.269421.7302603996990753309.stgit%40devnote2
Reported-by: Sashiko <sashiko-bot@kernel.org>
Fixes: b576e09701c7 ("tracing/probes: Support function parameters if BTF is available")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
kernel/trace/trace_probe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index fd1caa1f9723..98532c503d02 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -678,7 +678,7 @@ static int parse_btf_arg(char *varname,
int i, is_ptr, ret;
u32 tid;
- if (WARN_ON_ONCE(!ctx->funcname && !(ctx->flags & TPARG_FL_TEVENT)))
+ if (!ctx->funcname && !(ctx->flags & TPARG_FL_TEVENT))
return -EINVAL;
is_ptr = split_next_field(varname, &field, ctx);
^ permalink raw reply related
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM)
From: David Hildenbrand (Arm) @ 2026-06-18 8:31 UTC (permalink / raw)
To: Gregory Price, Brendan Jackman
Cc: Vlastimil Babka (SUSE), Balbir Singh, lsf-pc, linux-kernel,
linux-cxl, cgroups, linux-mm, linux-trace-kernel, damon,
kernel-team, gregkh, rafael, dakr, dave, jonathan.cameron,
dave.jiang, alison.schofield, vishal.l.verma, ira.weiny,
dan.j.williams, longman, akpm, lorenzo.stoakes, Liam.Howlett,
vbabka, rppt, surenb, mhocko, osalvador, ziy, matthew.brost,
joshua.hahnjy, rakie.kim, byungchul, ying.huang, apopple,
axelrasmussen, yuanchu, weixugc, yury.norov, linux, mhiramat,
mathieu.desnoyers, tj, hannes, mkoutny, jackmanb, sj, baolin.wang,
npache, ryan.roberts, dev.jain, baohua, lance.yang, muchun.song,
xu.xin16, chengming.zhou, jannh, linmiaohe, nao.horiguchi,
pfalcato, rientjes, shakeel.butt, riel, harry.yoo, cl,
roman.gushchin, chrisl, kasong, shikemeng, nphamcs, bhe,
zhengqi.arch, terry.bowman, Matthew Wilcox
In-Reply-To: <ajFT235iYsSJ7nbR@gourry-fedora-PF4VCD3F>
On 6/16/26 15:47, Gregory Price wrote:
> On Tue, Jun 16, 2026 at 11:57:42AM +0000, Brendan Jackman wrote:
>> On Mon Jun 15, 2026 at 2:38 PM UTC, Vlastimil Babka (SUSE) wrote:
>>>
>>> I think the memalloc approach is dangerous due to unexpected nesting. There
>>> might be nested page allocations in page allocation itself (due to some
>>> debugging option). But also interrupts do not change what "current" points
>>> to. Suddenly those could start requesting folios and/or private nodes and be
>>> surprised, I'm afraid.
>>
>> Minor side-note: couldn't we just define it such that the allocator
>> ignores the context when not in_task() (and warn if you try to enter the
>> context while not currently in_task())?
>>
>> (Don't think this would change the conclusion very much, e.g. doesn't
>> help with the nesting issues. Mostly curious in case I'm missing a
>> detail here).
>>
So I took a look at which nested allocations we could end up having, and I
wonder whether gfp_nested_mask() indicates all these?
If we could reliably identify them, all we'd have to do is save+restore some
context (activating a "nested" context).
>
> I looked at this - only solves one issue and oh boy is that an obtuse
> confusing condition to understand. We still suffer from recursion in
> reclaim.
Right, we'd have to clear the context before calling into reclaim/compaction
that does weird things.
I'm sure BPF hooks could just arbitrarily try to allocate pages with
kmalloc_nolock(). So that would require a context save/restore as well.
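To make the save/restore idea a bit more concrete, here is a rough userspace
sketch, modelled loosely on the existing memalloc_*_save()/restore() pattern.
Everything in it (struct alloc_ctx, alloc_ctx_save(), alloc_ctx_restore(),
the -1 "no private node" value) is a hypothetical name for illustration only,
not an existing or proposed kernel interface:

```c
#include <assert.h>

/* Hypothetical stand-in for a per-task allocation context. */
struct alloc_ctx {
	int private_node;	/* -1: no private node requested */
};

/* Models the state that would hang off "current" in the kernel. */
static struct alloc_ctx current_ctx = { .private_node = -1 };

/* Enter a scope requesting @node; return the old context for restore. */
struct alloc_ctx alloc_ctx_save(int node)
{
	struct alloc_ctx old = current_ctx;

	assert(node >= -1);
	current_ctx.private_node = node;
	return old;
}

/* Leave a scope by putting back whatever alloc_ctx_save() returned. */
void alloc_ctx_restore(struct alloc_ctx old)
{
	current_ctx = old;
}

/* Node that an allocation issued right now would be steered to. */
int main_alloc_node(void)
{
	return current_ctx.private_node;
}

/*
 * A nested path (reclaim, compaction, a debugging hook, a BPF program)
 * clears the context on entry and restores it on exit, so its own
 * allocations never see the outer request.
 */
int nested_alloc_node(void)
{
	struct alloc_ctx outer = alloc_ctx_save(-1);
	int seen = main_alloc_node();

	alloc_ctx_restore(outer);
	return seen;
}
```

The point of the sketch is only that every nested entry point needs its own
save/restore pair, which is exactly the part that is hard to get right in the
real allocator.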
--
Cheers,
David
^ permalink raw reply
* Re: [PATCH v3] mm/lruvec: trace LRU add drains and drain-all requests
From: Vlastimil Babka (SUSE) @ 2026-06-18 8:30 UTC (permalink / raw)
To: David Hildenbrand (Arm), Shakeel Butt
Cc: JP Kobryn, linux-mm, willy, usama.arif, akpm, mhocko, rostedt,
mhiramat, mathieu.desnoyers, kasong, qi.zheng, baohua,
axelrasmussen, yuanchu, weixugc, chrisl, shikemeng, nphamcs,
baoquan.he, youngjun.park, linux-kernel, linux-trace-kernel
In-Reply-To: <3149bc84-dd3a-43ba-826e-6364965fdafd@kernel.org>
On 6/18/26 10:21, David Hildenbrand (Arm) wrote:
> On 6/17/26 20:18, Vlastimil Babka (SUSE) wrote:
>> On 6/17/26 17:03, Shakeel Butt wrote:
>>> On Wed, Jun 17, 2026 at 01:11:16PM +0200, David Hildenbrand (Arm) wrote:
>>>>
>>>> Given that trace events can quickly become stable ABI [1], are we really sure we
>>>> want to add this?
>>>
>>> Yes, I think so as this is useful to get insights into lru cache draining.
>>> Trace events being stable or not is secondary IMHO. If in future we rearchitect
>>> the lru page handling where there is no cache draining anymore, we can make
>>> these a noops.
>>
>> Yeah and I don't recall ever that a change to a mm tracepoint would ever
>> break someone who'd complain and we'd have to revert it.
> Really? :)
>
> Read the context of the link I posted once more.
Ah, I see. I've only read the single mail from Steven that referred to the
old powertop breakage and didn't notice the context.
But I don't think these worries should stop us from adding easily usable
tracepoints.
^ permalink raw reply
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM)
From: Vlastimil Babka (SUSE) @ 2026-06-18 8:21 UTC (permalink / raw)
To: Gregory Price, David Hildenbrand (Arm)
Cc: Balbir Singh, lsf-pc, linux-kernel, linux-cxl, cgroups, linux-mm,
linux-trace-kernel, damon, kernel-team, gregkh, rafael, dakr,
dave, jonathan.cameron, dave.jiang, alison.schofield,
vishal.l.verma, ira.weiny, dan.j.williams, longman, akpm,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
osalvador, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
byungchul, ying.huang, apopple, axelrasmussen, yuanchu, weixugc,
yury.norov, linux, mhiramat, mathieu.desnoyers, tj, hannes,
mkoutny, jackmanb, sj, baolin.wang, npache, ryan.roberts,
dev.jain, baohua, lance.yang, muchun.song, xu.xin16,
chengming.zhou, jannh, linmiaohe, nao.horiguchi, pfalcato,
rientjes, shakeel.butt, riel, harry.yoo, cl, roman.gushchin,
chrisl, kasong, shikemeng, nphamcs, bhe, zhengqi.arch,
terry.bowman, Matthew Wilcox
In-Reply-To: <ajAcIwBAnqgEEWSD@gourry-fedora-PF4VCD3F>
On 6/15/26 17:37, Gregory Price wrote:
> On Mon, Jun 15, 2026 at 05:18:55PM +0200, David Hildenbrand (Arm) wrote:
>> On 6/15/26 16:38, Vlastimil Babka (SUSE) wrote:
>> >
>> > I think the memalloc approach is dangerous due to unexpected nesting. There
>> > might be nested page allocations in page allocation itself (due to some
>> > debugging option). But also interrupts do not change what "current" points
>> > to. Suddenly those could start requesting folios and/or private nodes and be
>> > surprised, I'm afraid.
>>
>> Yeah, we'd need some way to distinguish the main allocation from these other
>> (nested) allocations.
>>
>>
>> >
>> > The memalloc scopes only work well when they restrict the context wrt
>> > reclaim, and allocations in IRQ have to be already restricted heavily
>> > (atomic) so further memalloc restrictions don't do anything in practice. But
>> > to make them change other aspects of the allocations like this won't work.
>>
>> I was assuming that memalloc_pin_save() would already violate that, but really
>> it only restricts where movable allocations land, and that doesn't matter for
>> other kernel allocations.
>>
>> Do you see any other way to make something like an allocation context work, and
>> avoid introducing more GFP flags?
>>
>
> One thought would be a way to switch what fallback list is used, and
> then have specific fallback lists for certain contexts.
>
> Right now there is a single example of this: __GFP_THISNODE
> |= __GFP_THISNODE => NOFALLBACK
> &= ~__GFP_THISNODE => FALLBACK
>
> We could add an interface with the desired fallback list based as an
> argument, and let get_page_from_freelist to prefer that over the default
> global lists.
Does it mean a new argument in a number of functions in the page allocator,
or can it be mapped to alloc_flags (at least internally?), because the
number of possible fallback lists is small enough?
> Omit all special nodes from FALLBACK/NOFALLBACK and make the special
> contexts provide the fallback-base that should be used.
>
> On my current branch i think that would include modifying, in totality:
>
> alloc_folio_mpol()
> alloc_demotion_folio()
> alloc_migration_target()
>
> And i'm pretty sure that all just nests nicely.
>
> We might not even need memalloc... hmmm
>
> ~Gregory
^ permalink raw reply
* Re: [PATCH v3] mm/lruvec: trace LRU add drains and drain-all requests
From: David Hildenbrand (Arm) @ 2026-06-18 8:21 UTC (permalink / raw)
To: Vlastimil Babka (SUSE), Shakeel Butt
Cc: JP Kobryn, linux-mm, willy, usama.arif, akpm, mhocko, rostedt,
mhiramat, mathieu.desnoyers, kasong, qi.zheng, baohua,
axelrasmussen, yuanchu, weixugc, chrisl, shikemeng, nphamcs,
baoquan.he, youngjun.park, linux-kernel, linux-trace-kernel
In-Reply-To: <1136baf3-3967-4202-9eaa-5fd667c235cf@kernel.org>
On 6/17/26 20:18, Vlastimil Babka (SUSE) wrote:
> On 6/17/26 17:03, Shakeel Butt wrote:
>> On Wed, Jun 17, 2026 at 01:11:16PM +0200, David Hildenbrand (Arm) wrote:
>>>
>>> Given that trace events can quickly become stable ABI [1], are we really sure we
>>> want to add this?
>>
>> Yes, I think so as this is useful to get insights into lru cache draining.
>> Trace events being stable or not is secondary IMHO. If in future we rearchitect
>> the lru page handling where there is no cache draining anymore, we can make
>> these a noops.
>
> Yeah and I don't recall ever that a change to a mm tracepoint would ever
> break someone who'd complain and we'd have to revert it.
Really? :)
Read the context of the link I posted once more.
--
Cheers,
David
^ permalink raw reply
* Re: [PATCH v3 09/13] verification/rvgen: Delete __parse_constraint()
From: Gabriele Monaco @ 2026-06-18 8:13 UTC (permalink / raw)
To: Nam Cao
Cc: Steven Rostedt, Wander Lairson Costa, linux-trace-kernel,
linux-kernel
In-Reply-To: <87tsr1mqrj.fsf@yellow.woof>
On Wed, 2026-06-17 at 11:59 +0200, Nam Cao wrote:
> Gabriele Monaco <gmonaco@redhat.com> writes:
> > This function used to validate things we are no longer validating,
> > now it's
> > alright to create a model where a clock is never reset, which
> > doesn't fully
> > make sense. Should we add that check somewhere else?
>
> Theory does not require clock reset, right?
Yeah, I don't see it explicitly mandated in the theory, but the
description (from the sources) states:
The value of a clock thus denotes the amount of time that has been
elapsed since its last reset
But it also says (emphasis added by me):
Clocks /can/ be reset to zero after which they start increasing ...
Nowhere does it say clocks /must/ be reset; their value simply won't make
sense (according to the definition).
Now in our implementation we may have some automatic reset when the
monitor starts (I'm planning that to avoid invalid states), which could
make explicit resets superfluous in some cases.
Let's leave that to the user for now and skip this check.
Thanks,
Gabriele
> This is not some sort of
> hidden issue that trips up unsuspecting people. It is obvious from
> the
> model that the clock is never reset. So I think it's fine to allow
> people to do that, maybe there will be an actual useful model without
> clock reset, you never know.
>
> The self.env_types check is enforced by the grammar. We do lose the
> self.env_types check, but that is likely redundant anyway because we
> have this:
>
> for transition in self.transitions:
> [...]
> if transition.reset:
> envs.append(transition.reset.env)
> self.env_stored.add(transition.reset.env)
>
> so it is clear that all envs that are reset do have a storage.
>
> That said, I am fine with keeping these sanity checks, if you are
> paranoid.
>
> Nam
^ permalink raw reply
* Re: [PATCH] tracing: eprobe: read the complete FILTER_PTR_STRING pointer
From: Masami Hiramatsu @ 2026-06-18 1:52 UTC (permalink / raw)
To: Martin Kaiser; +Cc: Steven Rostedt, linux-trace-kernel, linux-kernel
In-Reply-To: <ajJbkeK0zXb8MtcS@akranes.kaiser.cx>
On Wed, 17 Jun 2026 10:32:17 +0200
Martin Kaiser <martin@kaiser.cx> wrote:
> Hiramatsu-san,
>
> thank you for reviewing my patch.
>
> Thus wrote Masami Hiramatsu (mhiramat@kernel.org):
>
> > Ah, this is a bit complicated. It seems to work with sched_switch event
> > as commit f04dec93466a ("tracing/eprobes: Fix reading of string fields"):
>
> > echo 'e:sw sched/sched_switch comm=$next_comm:string' > dynamic_events
>
> > # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
> > # | | | ||||| | |
> > sh-162 [002] d..3. 54.027213: sw: (sched.sched_switch) comm="swapper/2"
> > <idle>-0 [007] d..3. 54.034573: sw: (sched.sched_switch) comm="rcu_preempt"
> > rcu_preempt-15 [007] d..3. 54.034589: sw: (sched.sched_switch) comm="swapper/7"
>
> > Maybe comm is stored as a fixed string information in the event record?
>
> Yes, this example does not execute my change.
>
> > /sys/kernel/tracing # cat events/sched/sched_switch/format
> > name: sched_switch
> > ID: 254
> > format:
> > field:unsigned short common_type; offset:0; size:2; signed:0;
> > field:unsigned char common_flags; offset:2; size:1; signed:0;
> > field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
> > field:int common_pid; offset:4; size:4; signed:1;
>
> > field:char prev_comm[16]; offset:8; size:16; signed:0;
> > field:pid_t prev_pid; offset:24; size:4; signed:1;
> > field:int prev_prio; offset:28; size:4; signed:1;
> > field:long prev_state; offset:32; size:8; signed:1;
> > field:char next_comm[16]; offset:40; size:16; signed:0;
> > field:pid_t next_pid; offset:56; size:4; signed:1;
> > field:int next_prio; offset:60; size:4; signed:1;
>
> > But the filename is a pointer.
>
> > /sys/kernel/tracing # cat events/syscalls/sys_enter_openat/format
> > name: sys_enter_openat
> > ID: 705
> > format:
> > field:unsigned short common_type; offset:0; size:2; signed:0;
> > field:unsigned char common_flags; offset:2; size:1; signed:0;
> > field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
> > field:int common_pid; offset:4; size:4; signed:1;
>
> > field:int __syscall_nr; offset:8; size:4; signed:1;
> > field:int dfd; offset:16; size:8; signed:0;
> > field:const char * filename; offset:24; size:8; signed:0;
> > field:int flags; offset:32; size:8; signed:0;
> > field:umode_t mode; offset:40; size:8; signed:0;
> > field:__data_loc char[] __filename_val; offset:48; size:4; signed:0;
>
> > In this case, the filename field should use __data_loc directly instead of
> > pointing data on the ring buffer.
>
> > Can you try
>
> > echo 'e syscalls.sys_enter_openat $__filename_val:string' > \
> > /sys/kernel/tracing/dynamic_events
>
> > Instead?
>
> This field is working as expected.
>
> I still believe that the handling of FILTER_PTR_STRING is not correct. The
> pointer is stored in the ringbuffer as unsigned long and read as a char. This
> gives us a truncated pointer that cannot be dereferenced.
Ah, OK. I understand the problem.
- ring buffer and its records should be self-contained.
- In most cases, events use __data_loc/__rel_loc or fixed array to store
strings.
- only syscall events expose the char *, which is not recommended but is
important for debugging user space. (not for dereferencing)
The example usage of FILTER_PTR_STRING is actually using FILTER_STATIC_STRING
now, so FILTER_PTR_STRING is left broken. (hmm, but many
"const char *" fields are still used, especially in rcu events...)
OK, can you update your patch description to use rcu events?
BTW, I think those should also be decoded from an enum value in the events,
or use __rel_loc, since they are not self-contained. (it's a TODO item)
> > I think better solution is fixing sycall tracer.
>
> I would say that syscall trace is doing the right thing. The ringbuffer entry
> is a struct syscall_trace_enter, the syscall arguments are unsigned longs.
> They are written in ftrace_syscall_enter, this looks correct to me.
OK, I thought the filename pointed to the ringbuffer, but it actually points
to user space (the raw parameter values are saved). So it is OK.
For eprobe users, it should not access the user space data directly
because that can cause a page fault in the kernel without fixup. It may work
on x86, but it doesn't work on other architectures which have a separate
address space for user space. To avoid such mistakes, the actual
string is saved in the ringbuffer as __filename_val.
Hmm, this must be documented in eprobe example code...
>
> A const char * syscall argument is using FILTER_PTR_STRING, the unsigned long
> argument from the ringbuffer is read as a char and then converted to a
> truncated pointer.
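For reference, the truncation described above can be reproduced in plain
userspace C. This is only a sketch of the effect, not the eprobe code: the
helper names and the record layout are made up for illustration, and which
byte survives depends on endianness.

```c
#include <assert.h>
#include <string.h>

/* Read the field the way it was written: as a full unsigned long. */
unsigned long read_field_full(const unsigned char *rec, size_t offset)
{
	unsigned long v;

	assert(rec);
	memcpy(&v, rec + offset, sizeof(v));
	return v;
}

/*
 * Read only one byte of the same field and widen it again. The result
 * can never exceed 0xff, so it cannot be used as a pointer.
 */
unsigned long read_field_as_char(const unsigned char *rec, size_t offset)
{
	unsigned char c;

	assert(rec);
	memcpy(&c, rec + offset, sizeof(c));
	return (unsigned long)c;
}
```

Storing a value such as 0x12345678 in a record and reading it back with both
helpers shows that only the full-width read returns the original value.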
Thanks,
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply
* Re: [PATCH v4 6/7] Documentation: bootconfig: document build-time cmdline rendering
From: Masami Hiramatsu @ 2026-06-18 0:47 UTC (permalink / raw)
To: Breno Leitao
Cc: Andrew Morton, Nathan Chancellor, paulmck, Nicolas Schier,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
bpf, kernel-team
In-Reply-To: <ajJu2KlfVyuUH-VA@gmail.com>
On Wed, 17 Jun 2026 02:56:23 -0700
Breno Leitao <leitao@debian.org> wrote:
> On Wed, Jun 10, 2026 at 07:58:10AM -0700, Breno Leitao wrote:
> > On Wed, Jun 10, 2026 at 11:37:20PM +0900, Masami Hiramatsu wrote:
> > > To avoid confusion, when this option is used, shouldn't we treat it
> > > the same way as if embedded command lines were enabled, and either
> > > not display it in /proc/bootconfig (or always display it, by merging
> > > the rendered string)?
> >
> > You're right that EMBED_CMDLINE breaks it: the embedded kernel.* keys
> > are already in boot_command_line before setup_boot_config() ever sees
> > the initrd bconf, so a user reading /proc/bootconfig would see only
> > the initrd keys while parse_early_param() acted on the embedded ones.
> > That's exactly the split-state Sashiko was circling around.
> >
> > Both options you suggest work for me, but they pull in opposite
> > directions and I'd rather not guess wrong on the user-facing
> > contract. Which do you prefer for v5?
> >
> > (a) Don't display embedded in /proc/bootconfig -- keep the current
> > "file shows the active bootconfig source" behavior and document
> > that with EMBED_CMDLINE=y, the kernel.* subtree may have been
> > applied separately via the cmdline.
> >
> > (b) Always display embedded by merging the rendered string into
> > /proc/bootconfig when EMBED_CMDLINE=y, so the file reflects
> > what was actually applied.
> >
> > Happy to go either way
>
> Following up on my own mail rather than leaving it fully open: after
> looking at the code more, I'd like to recommend (a).
Agreed. Sorry for replying late.
>
> The deciding factor is ordering. EMBED_CMDLINE only works because the
> embedded "kernel" keys are folded into boot_command_line in
> setup_arch(), before parse_early_param() -- which is long before
> setup_boot_config() looks at the initrd.
Yes. Until setup_arch() has run we cannot get the initrd image, which means
we don't know whether there is a bootconfig or not at that point.
>
> So for early params the embedded values are necessarily applied first, and an
> initrd bootconfig cannot override them no matter how we present
> /proc/bootconfig. That makes the embedded cmdline behave like a build-time
> CONFIG_CMDLINE rather than a bootconfig source, and (a) is the option that
> describes it honestly: it shows in /proc/cmdline, and /proc/bootconfig keeps
> meaning "the bootconfig tree that was parsed".
Indeed. So I think this EMBED_CMDLINE is more like CMDLINE set by bootconfig
file, instead of embedded string. That is useful for reusing the boot options.
We need to change the explanation and clarify it.
Thus we should make those configs mutually exclusive. If the user already
sets CONFIG_CMDLINE, EMBED_CMDLINE should not be enabled.
But actually, there are other options we need to mention:
- CONFIG_CMDLINE: default cmdline, could be ignored if bootloader passes
a cmdline string.
- CONFIG_CMDLINE_FORCE: ignore the other cmdline. (but bootconfig can
overwrite it, hmm)
- CONFIG_CMDLINE_EXTEND: append the embedded cmdline string to bootloader
cmdline. (similar to bootconfig current behavior)
- CONFIG_BOOT_CONFIG_EMBED: just an embedded bootconfig. extends the
existing cmdline, but does not support early parameters. This is ignored
if user passed bootconfig via initrd.
- CONFIG_BOOT_CONFIG_EMBED_CMDLINE: replacing CONFIG_CMDLINE with bootconfig
but it will not be shown in /proc/bootconfig.
So you can see CONFIG_BOOT_CONFIG_EMBED_CMDLINE is a bit special.
I think it may be natural to call it CONFIG_CMDLINE_BOOT_CONFIG.
In this case, we render the cmdline string from bootconfig at build time
and set CONFIG_CMDLINE to the rendered cmdline string.
>
> (a) is also what the tree already does -- saved_boot_config is built
> only from the XBC tree, the rendered string never enters it -- so it is
> no new code on the /proc side and keeps the series small.
Agreed.
>
> (b) would pull the flattened cmdline string back into the structured
> tree view and need dedup against the initrd keys, which muddies what
> /proc/bootconfig means for little gain.
I would like to avoid such complexity, just keep it as simple as possible.
>
> So unless you'd rather have (b), I'll take (a) for v5 and extend
> bootconfig.rst to cover the four sources (bootloader cmdline, embedded
> cmdline, initrd bootconfig, embedded bootconfig).
Yes, I agree with you.
>
> I'll also document the sharp edge -- with both an embedded cmdline and an
> initrd bootconfig, early params reflect the embedded values because the initrd
> is not parsed yet.
My recommendation is to give users a simpler mental model. If it simply
extends CONFIG_CMDLINE so that it can be described by a bootconfig file,
that is more manageable outside of kernel configuration.
Or would you like to access the cmdline settings via /proc/bootconfig?
In that case, the problem is more a limitation on the bootconfig side.
Since the kernel cmdline accepts contradictory settings, if "foo=A foo=B"
is passed, bootconfig will raise an error because foo has 2 different
settings.
Typically, this is represented as an array in bootconfig.
foo = A, B;
But if cmdline bootconfig says:
foo = A;
and initrd bootconfig says:
foo := B;
":=" means overriding the previous settings. Thus a contradiction
arises between these two, when rendering /proc/bootconfig. It can not
show 2 different settings for the same key. (it is possible if we
render it twice, but /proc/bootconfig user may not expect it.)
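As a rough illustration of the array representation, the sketch below folds
repeated "key=value" tokens of a cmdline-style string into one
"key = v1, v2;" line per key. It is not the bootconfig parser: the function
name fold_cmdline(), the fixed table sizes and the decision to reject tokens
without '=' are all assumptions made to keep the example short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_KEYS 16
#define MAX_LEN  64

struct entry {
	char key[MAX_LEN];
	char vals[MAX_LEN * 4];	/* values seen so far, joined by ", " */
};

/*
 * Fold repeated "key=value" tokens into one "key = v1, v2;" line per key,
 * in order of first appearance. Returns the number of distinct keys, or
 * -1 on overflow or on a token without a key or '='.
 */
int fold_cmdline(const char *cmdline, char *out, size_t outsz)
{
	struct entry tab[MAX_KEYS];
	char buf[512];
	char *tok, *eq;
	size_t used = 0;
	int n = 0, i, len;

	assert(cmdline && out && outsz > 0);
	if (strlen(cmdline) >= sizeof(buf))
		return -1;
	strcpy(buf, cmdline);

	for (tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
		eq = strchr(tok, '=');
		if (!eq || eq == tok)
			return -1;
		*eq++ = '\0';
		if (strlen(tok) >= MAX_LEN)
			return -1;

		/* Look up the key; append a new slot if it is unseen. */
		for (i = 0; i < n; i++)
			if (!strcmp(tab[i].key, tok))
				break;
		if (i == n) {
			if (n == MAX_KEYS)
				return -1;
			strcpy(tab[n].key, tok);
			tab[n].vals[0] = '\0';
			n++;
		}

		/* Room for ", " + value + terminating NUL? */
		if (strlen(tab[i].vals) + strlen(eq) + 3 > sizeof(tab[i].vals))
			return -1;
		if (tab[i].vals[0])
			strcat(tab[i].vals, ", ");
		strcat(tab[i].vals, eq);
	}

	out[0] = '\0';
	for (i = 0; i < n; i++) {
		len = snprintf(out + used, outsz - used, "%s = %s;\n",
			       tab[i].key, tab[i].vals);
		if (len < 0 || (size_t)len >= outsz - used)
			return -1;
		used += len;
	}
	return n;
}
```

With the input "foo=A foo=B bar=1" this yields the two lines "foo = A, B;"
and "bar = 1;". Note that it deliberately keeps both values instead of
applying ":=" style overriding.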
I think it's fine to represent it as an array (foo = A, B) if this
EMBED_CMDLINE is set, but it still seems risky if early parameters
aren't detected. If `early_param = A` is set in the embedded
bootconfig, and the initrd bootconfig accidentally sets `early_param = B`,
we should ignore the latter one (with a warning). But maybe that is
another story.
I think we can proceed without rendering it in /proc/bootconfig
at this point. If we later find a way to detect early parameters
correctly, we can fix it.
(BTW, early parameter problem is a bit complicated. It is not hard
to distinguish early parameters, but kernel accepts the same key
for early parameter and normal parameter. e.g. "console=")
Thank you,
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply
* Re: [PATCH 2/2] selftests/ftrace: Account for 8-byte aligned trace_marker_raw events
From: Shuah Khan @ 2026-06-17 23:19 UTC (permalink / raw)
To: Steven Rostedt, Hui Wang
Cc: mhiramat, mathieu.desnoyers, pjw, linux-trace-kernel, shuah,
wangfushuai, linux-kselftest, Shuah Khan
In-Reply-To: <20260608125049.092d4543@fedora>
On 6/8/26 10:50, Steven Rostedt wrote:
> On Sun, 7 Jun 2026 15:24:31 +0800
> Hui Wang <hui.wang@canonical.com> wrote:
>
>> trace_marker_raw.tc assumes that the raw marker payload length
>> reported in trace_pipe is the result of int((id + 3) / 4) * 4, but
>> that is not true on kernels with CONFIG_HAVE_64BIT_ALIGNED_ACCESS
>> enabled.
>>
>> With forced 8-byte alignment, the ring buffer event forces 8-byte
>> alignment. The event length is stored in array[0], the payload data
>> and id are placed in a struct raw_data_entry which is stored starting
>> at array[1]. In this case, the printed payload data length is 8*N+4
>> bytes.
>>
>> To make the testcase pass in this case, add a kconfig_enabled() helper
>> and use it to detect CONFIG_HAVE_64BIT_ALIGNED_ACCESS so
>> trace_marker_raw.tc can calculate the expected length correctly.
>>
>> Assisted-by: Copilot:gpt-5.5
>> Signed-off-by: Hui Wang <hui.wang@canonical.com>
>
> NACK
>
> Let's not change the kernel for a broken test. Also this has already
> been fixed but appears not to be applied yet.
>
> Shuah, can you please apply the below fix.
>
> https://lore.kernel.org/all/20260601023251.1916483-1-dtcccc@linux.alibaba.com/
I applied the above to linux-kselftest next - will send it up later
this week for Linux 7.2-rc1
thanks,
-- Shuah
^ permalink raw reply
* Re: [PATCH] selftests/ftrace: Drop invalid top-level local in test_ownership
From: Shuah Khan @ 2026-06-17 22:22 UTC (permalink / raw)
To: Masami Hiramatsu (Google), Steven Rostedt
Cc: CaoRuichuang, mathieu.desnoyers, shuah, linux-kernel,
linux-trace-kernel, linux-kselftest, Shuah Khan
In-Reply-To: <20260601133146.b16b0ad7c2204adcc168c945@kernel.org>
On 5/31/26 22:31, Masami Hiramatsu (Google) wrote:
> On Tue, 7 Apr 2026 20:37:27 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
>>
>> Shuah,
>>
>> Care to take this through your tree. Probably could even add:
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 8b55572e51805 ("tracing/selftests: Add tracefs mount options test")
>>
>> As well as:
>>
>> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
>>
>
> Shuah, here is my ack too.
>
> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>
Thanks - sorry for the delay. I will send this up for Linux 7.2-rc1
thanks,
-- Shuah
^ permalink raw reply
* Re: [GIT PULL v2] RTLA additional fixes for v7.2
From: Steven Rostedt @ 2026-06-17 20:37 UTC (permalink / raw)
To: Tomas Glozar; +Cc: LKML, linux-trace-kernel
In-Reply-To: <20260617153045.546686-1-tglozar@redhat.com>
On Wed, 17 Jun 2026 17:30:45 +0200
Tomas Glozar <tglozar@redhat.com> wrote:
> Steven,
>
> The following changes since commit 6b5a2b7d9bc156e505f09e698d85d6a1547c1206:
>
> Merge tag 'trace-tools-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace (2026-06-16 17:50:34 +0530)
>
> are available in the Git repository at:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/tglozar/linux.git tags/rtla-v7.2-fixups-v2
>
> for you to fetch changes up to c35eb77a67515d4201bc91294f40761591f43bbd:
>
> rtla/tests: Fix pgrep filter in get_workload_pids.sh (2026-06-17 16:26:44 +0200)
Thanks,
As these are fixes they can still go in later in the merge window. This
week I'm really hurting for free time to work on upstream so this may
have to wait until next week. I don't want to rush and screw up again :-/
-- Steve
^ permalink raw reply
* [PATCH] usb: typec: add trace point for typec_set_mode
From: Ahmad Fatoum @ 2026-06-17 20:03 UTC (permalink / raw)
To: Heikki Krogerus, Greg Kroah-Hartman, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers
Cc: linux-kernel, linux-usb, linux-trace-kernel, kernel, Ahmad Fatoum
Some Type-C controllers toggle muxes themselves. Other controllers like
the TUSB320 report the mode to the host, so it can control the muxes.
To improve debuggability of both kinds of drivers, add a trace point that
can be used to keep track of the mode being set inside the Type-C
framework:
echo 1 > /sys/kernel/debug/tracing/events/typec/typec_mode/enable
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
MAINTAINERS | 1 +
drivers/usb/typec/class.c | 9 ++++++++-
include/trace/events/typec.h | 36 ++++++++++++++++++++++++++++++++++++
3 files changed, 45 insertions(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index c8d4b913f26c..ddd59e5e6eaf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -27753,6 +27753,7 @@ F: Documentation/ABI/testing/sysfs-class-typec
F: Documentation/driver-api/usb/typec.rst
F: drivers/usb/typec/
F: include/linux/usb/typec.h
+F: include/trace/events/typec*.h
USB TYPEC INTEL PMC MUX DRIVER
M: Heikki Krogerus <heikki.krogerus@linux.intel.com>
diff --git a/drivers/usb/typec/class.c b/drivers/usb/typec/class.c
index 0977581ad1b6..9316d067f19a 100644
--- a/drivers/usb/typec/class.c
+++ b/drivers/usb/typec/class.c
@@ -20,6 +20,9 @@
#include "class.h"
#include "pd.h"
+#define CREATE_TRACE_POINTS
+#include <trace/events/typec.h>
+
static DEFINE_IDA(typec_index_ida);
const struct class typec_class = {
@@ -2427,10 +2430,14 @@ EXPORT_SYMBOL_GPL(typec_get_orientation);
int typec_set_mode(struct typec_port *port, int mode)
{
struct typec_mux_state state = { };
+ int ret;
state.mode = mode;
- return typec_mux_set(port->mux, &state);
+ ret = typec_mux_set(port->mux, &state);
+ trace_typec_mode(port, mode, ret);
+
+ return ret;
}
EXPORT_SYMBOL_GPL(typec_set_mode);
diff --git a/include/trace/events/typec.h b/include/trace/events/typec.h
new file mode 100644
index 000000000000..a7dcb9f3fd49
--- /dev/null
+++ b/include/trace/events/typec.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM typec
+
+#if !defined(_TRACE_TYPEC_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_TYPEC_H
+
+#include <linux/usb/typec.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(typec_mode,
+
+ TP_PROTO(struct typec_port *port, int mode, int err),
+
+ TP_ARGS(port, mode, err),
+
+ TP_STRUCT__entry(
+ __string(device, dev_name(&port->dev))
+ __field(int, mode)
+ __field(int, err)
+ ),
+
+ TP_fast_assign(
+ __assign_str(device);
+ __entry->mode = mode;
+ __entry->err = err;
+ ),
+
+ TP_printk("%s mode=%d (%d)",
+ __get_str(device), __entry->mode, __entry->err)
+);
+
+#endif /* if !defined(_TRACE_TYPEC_H) || defined(TRACE_HEADER_MULTI_READ) */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
---
base-commit: 8cd9520d35a6c38db6567e97dd93b1f11f185dc6
change-id: 20260617-typec_set_mode-tracepoint-011fc43feaca
Best regards,
--
Ahmad Fatoum <a.fatoum@pengutronix.de>
^ permalink raw reply related