Linux Trace Kernel


* Re: [PATCH] fsl-edma: tracing: no ptr dereference during log output
From: Steven Rostedt @ 2026-06-30 20:44 UTC
  To: Martin Kaiser
  Cc: Frank Li, Vinod Koul, Masami Hiramatsu, linux-kernel,
	linux-trace-kernel, imx, dmaengine
In-Reply-To: <20260630160544.4211ae88@gandalf.local.home>

On Tue, 30 Jun 2026 16:05:44 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> >  	TP_printk("offset %08x: value %08x",
> > -		(u32)(__entry->addr - __entry->edma->membase), __entry->value)
> > +		(u32)(__entry->addr - __entry->membase), __entry->value)  
> 
> Hmm, I think I should update the TP_printk checks at boot to cover this too.

I created the following to catch this:

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index c46e623e7e0d..2da3c02bea54 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -400,10 +400,37 @@ static bool process_string(const char *fmt, int len, struct trace_event_call *ca
 	return true;
 }
 
+static void test_double_dereference(const char *str, int len,
+				    struct trace_event_call *call)
+{
+	const char *ptr;
+	const char *end = str + len;
+
+	ptr = strstr(str, "REC->");
+
+	while (ptr && ptr < end) {
+
+		ptr += 5;
+		for (; ptr < end; ptr++) {
+			if (ptr[0] == '-' && ptr[1] == '>') {
+				WARN_ONCE(1, "Event %s has double dereference in TP_printk: %*s\n",
+					  trace_event_name(call), len, str);
+				return;
+			}
+			if (!isalnum(*ptr) && *ptr != '_')
+				break;
+		}
+
+		ptr = strstr(ptr, "REC->");
+	}
+}
+
 static void handle_dereference_arg(const char *arg_str, u64 string_flags, int len,
 				   u64 *dereference_flags, int arg,
 				   struct trace_event_call *call)
 {
+	test_double_dereference(arg_str, len, call);
+
 	if (string_flags & (1ULL << arg)) {
 		if (process_string(arg_str, len, call))
 			*dereference_flags &= ~(1ULL << arg);

Enabled this event to see if it would trigger, but instead it found *another* BUG!

[    0.719012][    T0] ------------[ cut here ]------------
[    0.720850][    T0] Event ufshcd_exception_event has double dereference in TP_printk: dev_name(REC->hba->dev), REC->status
[    0.724646][    T0] WARNING: kernel/trace/trace_events.c:416 at handle_dereference_arg+0x342/0x5a0, CPU#0: swapper/0/0


I'll go make a fix for the ufshcd_exception_event event, and then I will
definitely add this patch to make sure this bug isn't in other places.
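
For readers who want to try the scan outside the kernel, here is a minimal
userspace sketch of the same idea. It is an illustration only: the name
has_double_dereference() is hypothetical, it returns a bool instead of calling
WARN_ONCE(), and it assumes the argument text uses the "REC->" prefix seen in
the warning output above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace approximation of the scan: return true if any
 * "REC->field" within the first len bytes of str is directly followed by
 * another "->", i.e. the saved field is itself dereferenced.
 */
static bool has_double_dereference(const char *str, size_t len)
{
	const char *end;
	const char *ptr;

	assert(str != NULL);
	assert(len <= strlen(str));
	end = str + len;

	ptr = strstr(str, "REC->");
	while (ptr && ptr < end) {
		ptr += 5;	/* skip over "REC->" */
		for (; ptr < end; ptr++) {
			/* A second "->" right after the field name. */
			if (ptr[0] == '-' && ptr + 1 < end && ptr[1] == '>')
				return true;
			/* Any non-identifier character ends the field name. */
			if (!isalnum((unsigned char)*ptr) && *ptr != '_')
				break;
		}
		ptr = strstr(ptr, "REC->");
	}
	return false;
}
```

Passing the argument text from the warning, "dev_name(REC->hba->dev), REC->status",
makes the sketch report a double dereference, while "REC->addr - REC->membase"
does not.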

-- Steve

* Re: [PATCH] ring-buffer: serialize read-page order with subbuffer resize
From: Yousef Alhouseen @ 2026-06-30 20:45 UTC
  To: rostedt
  Cc: mhiramat, mathieu.desnoyers, petr.pavlu, linux-trace-kernel,
	linux-kernel
In-Reply-To: <20260630101425.2f7cfbea@robin>

One issue turned up while checking the suggested locking:
ring_buffer_subbuf_order_set() writes buffer->subbuf_order before
taking reader_lock and never takes cpu_buffer->lock. An allocator can
therefore take cpu_buffer->lock after the new order is published but
before resize clears the old-order free_page, tag that old page with
the new order, and return it.

I can keep the allocations outside buffer->mutex and hold the mutex
only while snapshotting subbuf_order and taking or returning
free_page. That removes allocation from the critical section and
serializes the order/free-page pair with resize. Would you prefer
that, or should free_page and the order transition be synchronized
another way?
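
The narrower locking proposed above can be sketched in userspace as follows.
This is a simplified model, not the ring buffer code: struct buf_model,
struct read_page_model and the three helpers are hypothetical, a pthread mutex
stands in for buffer->mutex, and the model assumes the resize path drops the
cached page in the same critical section that changes the order.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define BASE_PAGE_BYTES 4096UL	/* assumed base page size for the model */

/* Hypothetical stand-in for the buffer state discussed in the thread. */
struct buf_model {
	pthread_mutex_t mutex;	/* plays the role of buffer->mutex */
	int subbuf_order;	/* current sub-buffer order */
	void *free_page;	/* cached spare page, always of the current order */
};

/* Hypothetical stand-in for a reader's spare page. */
struct read_page_model {
	int order;		/* order the data was sized for */
	void *data;
};

/* Resize: change the order and drop the cached page in one critical section. */
static void set_order(struct buf_model *b, int new_order)
{
	void *stale;

	assert(b != NULL);
	pthread_mutex_lock(&b->mutex);
	stale = b->free_page;
	b->free_page = NULL;
	b->subbuf_order = new_order;
	pthread_mutex_unlock(&b->mutex);

	free(stale);		/* freeing happens outside the lock */
}

static int alloc_read_page(struct buf_model *b, struct read_page_model *rp)
{
	assert(b != NULL && rp != NULL);

	/* Short critical section: snapshot the order and try the cache. */
	pthread_mutex_lock(&b->mutex);
	rp->order = b->subbuf_order;
	rp->data = b->free_page;
	b->free_page = NULL;
	pthread_mutex_unlock(&b->mutex);

	/* Cache miss: allocate with no lock held, sized for the snapshot. */
	if (!rp->data)
		rp->data = calloc(1, BASE_PAGE_BYTES << rp->order);

	return rp->data ? 0 : -1;
}

static void free_read_page(struct buf_model *b, struct read_page_model *rp)
{
	assert(b != NULL && rp != NULL);

	/* Only cache the page if it still matches the current order. */
	pthread_mutex_lock(&b->mutex);
	if (!b->free_page && rp->order == b->subbuf_order) {
		b->free_page = rp->data;
		rp->data = NULL;
	}
	pthread_mutex_unlock(&b->mutex);

	free(rp->data);		/* no-op when the page went back into the cache */
	rp->data = NULL;
}
```

In this model a page is always tagged with the order it was sized for, so a
page handed out before a resize is simply not reused afterwards.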

On Tue, 30 Jun 2026 10:14:25 -0400, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Sun, 28 Jun 2026 02:46:53 +0200
> Yousef Alhouseen <alhouseenyousef@gmail.com> wrote:
>
> > ring_buffer_read_page() checks that its spare page has the current
> > subbuffer order before taking cpu_buffer->reader_lock. A concurrent
> > ring_buffer_subbuf_order_set() can change the order and replace the
> > reader page after that check. The reader then copies a larger subbuffer
> > into the old allocation, causing an out-of-bounds write.
> >
> > Keep spare-page allocation and release under buffer->mutex, which already
> > serializes order changes. Move the read-side order check under
> > reader_lock, the lock used by resize when replacing per-CPU pages.
> >
> > Fixes: f9b94daa542a ("ring-buffer: Set new size of the ring buffer sub page")
> > Reported-by: syzbot+2dd9d02f60775ce5c1fb@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=2dd9d02f60775ce5c1fb
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
> > ---
> > kernel/trace/ring_buffer.c | 9 ++++++---
> > 1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index 56a328e94395..eed5d7cffdee 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c
> > @@ -6950,6 +6950,8 @@ ring_buffer_alloc_read_page(struct trace_buffer *buffer, int cpu)
> > if (!cpumask_test_cpu(cpu, buffer->cpumask))
> > return ERR_PTR(-ENODEV);
> >
> > + guard(mutex)(&buffer->mutex);
> > +
> > bpage = kzalloc_obj(*bpage);
>
> First, do not grab locks around allocations unless they are really needed.
> This is bad practice, as it extends the critical section and may even add
> the allocation locking to the lock chain.
>
> That said, just moving things around the current locks should work.
>
> Like this (not compiled nor tested):
>
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 56a328e94395..8352f935a223 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -6954,11 +6954,11 @@ ring_buffer_alloc_read_page(struct trace_buffer *buffer, int cpu)
> if (!bpage)
> return ERR_PTR(-ENOMEM);
>
> - bpage->order = buffer->subbuf_order;
> cpu_buffer = buffer->buffers[cpu];
> local_irq_save(flags);
> arch_spin_lock(&cpu_buffer->lock);
>
> + bpage->order = buffer->subbuf_order;
> if (cpu_buffer->free_page) {
> bpage->data = cpu_buffer->free_page;
> cpu_buffer->free_page = NULL;
> @@ -7007,13 +7007,13 @@ void ring_buffer_free_read_page(struct trace_buffer *buffer, int cpu,
> * is different from the subbuffer order of the buffer -
> * we can't reuse it
> */
> - if (page_ref_count(page) > 1 || data_page->order != buffer->subbuf_order)
> + if (page_ref_count(page) > 1)
> goto out;
>
> local_irq_save(flags);
> arch_spin_lock(&cpu_buffer->lock);
>
> - if (!cpu_buffer->free_page) {
> + if (!cpu_buffer->free_page && data_page->order == buffer->subbuf_order) {
> cpu_buffer->free_page = dpage;
> dpage = NULL;
> }
> @@ -7091,15 +7091,15 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
> if (!data_page || !data_page->data)
> return -1;
>
> - if (data_page->order != buffer->subbuf_order)
> - return -1;
> -
> dpage = data_page->data;
> if (!dpage)
> return -1;
>
> guard(raw_spinlock_irqsave)(&cpu_buffer->reader_lock);
>
> + if (data_page->order != buffer->subbuf_order)
> + return -1;
> +
> reader = rb_get_reader_page(cpu_buffer);
> if (!reader)
> return -1;
>
> -- Steve

* Re: [PATCH v10 0/6] mm/memory-failure: add panic option for unrecoverable pages
From: Andrew Morton @ 2026-06-30 20:55 UTC
  To: Breno Leitao
  Cc: Miaohe Lin, David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	Naoya Horiguchi, Jonathan Corbet, Shuah Khan, Liam R. Howlett,
	lance.yang, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	linux-mm, linux-kernel, linux-doc, linux-kselftest,
	linux-trace-kernel, kernel-team
In-Reply-To: <20260630-ecc_panic-v10-0-c6ed5b62eea2@debian.org>

On Tue, 30 Jun 2026 05:46:03 -0700 Breno Leitao <leitao@debian.org> wrote:

> A multi-bit ECC error on a kernel-owned page that the memory failure
> handler cannot recover is currently swallowed: PG_hwpoison is set, the
> event is logged, and the kernel keeps running.  The corrupted memory
> remains accessible to the kernel and either drives silent data
> corruption or surfaces seconds-to-minutes later as an apparently
> unrelated crash.  In a large fleet that delayed, unattributable crash
> turns into significant engineering effort to root-cause; in a kdump
> configuration, by the time the crash happens the original error
> context (faulting PFN, MCE/GHES record, page state) is long gone.
> 
> This series adds an opt-in sysctl,
> vm.panic_on_unrecoverable_memory_failure, that converts an
> unrecoverable kernel-page hwpoison event into an immediate panic with
> a clean dmesg/vmcore that still contains the original failure
> context.  The default is disabled so existing workloads see no
> change.

Updated, thanks.

Sashiko said things:
	https://sashiko.dev/#/patchset/20260630-ecc_panic-v10-0-c6ed5b62eea2@debian.org


> Changes in v10:
> - Reuse kselftest declarations
> - Residual race harmless documentation
> - Link to v9: https://lore.kernel.org/r/20260609-ecc_panic-v9-0-432a74002e74@debian.org

Here's how v10 altered mm.git:


 mm/memory-failure.c                          |    6 +-
 tools/testing/selftests/mm/hwpoison-panic.sh |   42 +++++++++--------
 2 files changed, 28 insertions(+), 20 deletions(-)

--- a/mm/memory-failure.c~b
+++ a/mm/memory-failure.c
@@ -1366,8 +1366,10 @@ static inline bool is_kernel_owned_page(
 	 * Page-type bits live only on the head page, so resolve any tail
 	 * first.  The check takes no refcount; recheck the head afterwards
 	 * so a concurrent split or compound free cannot leave us trusting
-	 * a stale view.  A free->alloc->free in the same window is still
-	 * possible but closing it would require taking a reference here.
+	 * a stale view.  A residual free->alloc->free cannot be closed here
+	 * (frozen slab and large-kmalloc pages cannot be pinned), but is
+	 * harmless: where a wrong verdict could panic, memory_failure() has
+	 * already set PageHWPoison, which bars the page from the allocator.
 	 */
 retry:
 	head = compound_head(page);
--- a/tools/testing/selftests/mm/hwpoison-panic.sh~b
+++ a/tools/testing/selftests/mm/hwpoison-panic.sh
@@ -35,7 +35,11 @@
 
 set -u
 
-ksft_skip=4
+# KTAP output helpers (ktap_print_msg, ktap_skip_all, ktap_exit_fail_msg, ...).
+DIR="$(dirname "$(readlink -f "$0")")"
+# shellcheck source=../kselftest/ktap_helpers.sh
+source "${DIR}"/../kselftest/ktap_helpers.sh
+
 sysctl_path=/proc/sys/vm/panic_on_unrecoverable_memory_failure
 inject_path=/sys/devices/system/memory/hard_offline_page
 kpageflags_path=/proc/kpageflags
@@ -53,24 +57,24 @@ pagesize=$(getconf PAGE_SIZE)
 
 kind=${1:-rodata}
 
-ksft_print() { echo "# $*"; }
-ksft_exit_skip() { ksft_print "$*"; exit "$ksft_skip"; }
-ksft_exit_fail() { echo "not ok 1 $*"; exit 1; }
-
 if [ "$(id -u)" -ne 0 ]; then
-	ksft_exit_skip "must run as root"
+	ktap_skip_all "must run as root"
+	exit "$KSFT_SKIP"
 fi
 
 if [ ! -w "$sysctl_path" ]; then
-	ksft_exit_skip "$sysctl_path not present (kernel without the sysctl?)"
+	ktap_skip_all "$sysctl_path not present (kernel without the sysctl?)"
+	exit "$KSFT_SKIP"
 fi
 
 if [ ! -w "$inject_path" ]; then
-	ksft_exit_skip "$inject_path not present (no MEMORY_HOTPLUG?)"
+	ktap_skip_all "$inject_path not present (no MEMORY_HOTPLUG?)"
+	exit "$KSFT_SKIP"
 fi
 
 if [ "${RUN_DESTRUCTIVE:-0}" != "1" ]; then
-	ksft_exit_skip "destructive test; re-run with RUN_DESTRUCTIVE=1 inside a disposable VM"
+	ktap_skip_all "destructive test; re-run with RUN_DESTRUCTIVE=1 inside a disposable VM"
+	exit "$KSFT_SKIP"
 fi
 
 # Pick a PFN inside the kernel image rodata region of /proc/iomem.
@@ -208,21 +212,22 @@ pgtable)
 	missing_msg="no usable page-table PFN found in $kpageflags_path"
 	;;
 *)
-	ksft_exit_fail "unknown kind '$kind' (expected: rodata|slab|pgtable)"
+	ktap_exit_fail_msg "unknown kind '$kind' (expected: rodata|slab|pgtable)"
 	;;
 esac
 
 if [ -z "$phys_addr" ]; then
-	ksft_exit_skip "$missing_msg"
+	ktap_skip_all "$missing_msg"
+	exit "$KSFT_SKIP"
 fi
 
-ksft_print "enabling $sysctl_path"
+ktap_print_msg "enabling $sysctl_path"
 prior=$(cat "$sysctl_path")
-echo 1 > "$sysctl_path" || ksft_exit_fail "failed to enable sysctl"
+echo 1 > "$sysctl_path" || ktap_exit_fail_msg "failed to enable sysctl"
 
 pfn=$((phys_addr / pagesize))
-ksft_print "injecting hwpoison at phys 0x$(printf '%x' "$phys_addr") (pfn 0x$(printf '%x' "$pfn"), kind=$kind)"
-ksft_print "expecting kernel panic: 'Memory failure: <pfn>: unrecoverable page'"
+ktap_print_msg "injecting hwpoison at phys 0x$(printf '%x' "$phys_addr") (pfn 0x$(printf '%x' "$pfn"), kind=$kind)"
+ktap_print_msg "expecting kernel panic: 'Memory failure: <pfn>: unrecoverable page'"
 
 # A successful run never returns from the inject -- it panics the kernel.
 # Reaching the code below therefore means no panic fired.  Note whether
@@ -243,7 +248,8 @@ try_unpoison "$pfn"
 # if it raced to another type the run is inconclusive, so skip instead.
 kpageflags_bit_set "$pfn" "$recheck_bit"
 case $? in
-0)	ksft_exit_fail "$verdict (page still $kind)" ;;
-1)	ksft_exit_skip "target PFN no longer $kind; raced before inject, inconclusive" ;;
-*)	ksft_exit_fail "$verdict (could not reconfirm page type via $kpageflags_path)" ;;
+0)	ktap_exit_fail_msg "$verdict (page still $kind)" ;;
+1)	ktap_skip_all "target PFN no longer $kind; raced before inject, inconclusive"
+	exit "$KSFT_SKIP" ;;
+*)	ktap_exit_fail_msg "$verdict (could not reconfirm page type via $kpageflags_path)" ;;
 esac
_


* Re: [PATCH] ring-buffer: serialize read-page order with subbuffer resize
From: Steven Rostedt @ 2026-06-30 21:16 UTC
  To: Yousef Alhouseen
  Cc: mhiramat, mathieu.desnoyers, petr.pavlu, linux-trace-kernel,
	linux-kernel
In-Reply-To: <CAMuQ4bV-HJWbS7NHPGtoyKz-_+LR335phbXeOi83EbV6kn+3gg@mail.gmail.com>

On Tue, 30 Jun 2026 13:45:05 -0700
Yousef Alhouseen <alhouseenyousef@gmail.com> wrote:

> One issue turned up while checking the suggested locking:
> ring_buffer_subbuf_order_set() writes buffer->subbuf_order before
> taking reader_lock and never takes cpu_buffer->lock. An allocator can
> therefore take cpu_buffer->lock after the new order is published but
> before resize clears the old-order free_page, tag that old page with
> the new order, and return it.
> 
> I can keep the allocations outside buffer->mutex and hold the mutex
> only while snapshotting subbuf_order and taking or returning
> free_page. That removes allocation from the critical section and
> serializes the order/free-page pair with resize. Would you prefer
> that, or should free_page and the order transition be synchronized
> another way?

Nothing should be reading when the subbuf_order is being updated. Let's add
a flag to state that it's being updated, and make all reads simply fail
during that time.
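
A minimal userspace sketch of such a flag, under stated assumptions: struct
rb_model and its helpers are hypothetical, a pthread mutex stands in for
whichever lock serializes the read path, and the -EBUSY / -EINVAL return
values are chosen for the illustration (the functions quoted earlier in the
thread return -1).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a buffer whose sub-buffer order can be changed. */
struct rb_model {
	pthread_mutex_t lock;	/* stands in for the lock serializing readers */
	bool resizing;		/* assumed "order update in progress" flag */
	int subbuf_order;
};

/* Mark the start of an order change; reads fail from here on. */
static void resize_begin(struct rb_model *rb)
{
	assert(rb != NULL);
	pthread_mutex_lock(&rb->lock);
	rb->resizing = true;
	pthread_mutex_unlock(&rb->lock);
}

/* Publish the new order and clear the flag in one critical section. */
static void resize_finish(struct rb_model *rb, int new_order)
{
	assert(rb != NULL);
	pthread_mutex_lock(&rb->lock);
	rb->subbuf_order = new_order;
	rb->resizing = false;
	pthread_mutex_unlock(&rb->lock);
}

/*
 * Read path: the flag and the order are both checked under the lock, so
 * a reader either sees a stable order or fails.
 */
static int read_page(struct rb_model *rb, int page_order)
{
	int ret = 0;

	assert(rb != NULL);
	pthread_mutex_lock(&rb->lock);
	if (rb->resizing)
		ret = -EBUSY;		/* resize in flight: refuse to read */
	else if (page_order != rb->subbuf_order)
		ret = -EINVAL;		/* spare page has a stale order */
	pthread_mutex_unlock(&rb->lock);

	return ret;
}
```

The point of the model is only that the flag is tested under the same lock
that the resize side takes when it sets and clears it.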

-- Steve

* Re: [PATCH] ring-buffer: serialize read-page order with subbuffer resize
From: Yousef Alhouseen @ 2026-06-30 21:16 UTC
  To: rostedt
  Cc: mhiramat, mathieu.desnoyers, petr.pavlu, linux-trace-kernel,
	linux-kernel
In-Reply-To: <20260630171603.26530150@gandalf.local.home>

Agreed. I’ll add an explicit resize-in-progress flag, set it around
the order transition, and make the external read-page
allocation/free/read paths reject work while it is set. I’ll check the
flag under the locks that serialize each path so it cannot race the
transition, then compile and test the resulting v2.

On Tue, 30 Jun 2026 17:16:03 -0400, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 30 Jun 2026 13:45:05 -0700
> Yousef Alhouseen <alhouseenyousef@gmail.com> wrote:
>
> > One issue turned up while checking the suggested locking:
> > ring_buffer_subbuf_order_set() writes buffer->subbuf_order before
> > taking reader_lock and never takes cpu_buffer->lock. An allocator can
> > therefore take cpu_buffer->lock after the new order is published but
> > before resize clears the old-order free_page, tag that old page with
> > the new order, and return it.
> >
> > I can keep the allocations outside buffer->mutex and hold the mutex
> > only while snapshotting subbuf_order and taking or returning
> > free_page. That removes allocation from the critical section and
> > serializes the order/free-page pair with resize. Would you prefer
> > that, or should free_page and the order transition be synchronized
> > another way?
>
> Nothing should be reading when the subbuf_order is being updated. Let's add
> a flag to state that it's being updated, and make all reads simply fail
> during that time.
>
> -- Steve

* Re: [PATCH v3] ufs: core: add hba parameter to trace events
From: Steven Rostedt @ 2026-06-30 21:49 UTC
  To: peter.wang
  Cc: linux-scsi, martin.petersen, avri.altman, alim.akhtar, jejb,
	wsd_upstream, linux-mediatek, chun-hung.wu, alice.chao, cc.chou,
	chaotian.jing, jiajie.hao, yi-fan.peng, qilin.tan, lin.gui,
	tun-yu.yu, eddie.huang, naomi.chu, ed.tsai, bvanassche,
	Linux Trace Kernel
In-Reply-To: <20260630165612.3e21b510@gandalf.local.home>

On Tue, 30 Jun 2026 16:56:12 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> >  
> >  	TP_printk("%s: gating state changed to %s",
> > -		__get_str(dev_name),
> > +		dev_name(__entry->hba->dev),  
> 
> NO YOU CAN NOT DO THIS!!!!

This is why you should always Cc linux-trace-kernel@vger.kernel.org on any
trace event updates. We look to catch bugs like this.

The below patch should fix it, and I'll send it as a proper patch soon:

diff --git a/drivers/ufs/core/ufs_trace.h b/drivers/ufs/core/ufs_trace.h
index 309ae51b4906..377a3c54b9f5 100644
--- a/drivers/ufs/core/ufs_trace.h
+++ b/drivers/ufs/core/ufs_trace.h
@@ -89,16 +89,18 @@ TRACE_EVENT(ufshcd_clk_gating,
 
 	TP_STRUCT__entry(
 		__field(struct ufs_hba *, hba)
+		__string(dev_name, dev_name(hba->dev))
 		__field(int, state)
 	),
 
 	TP_fast_assign(
+		__assign_str(dev_name);
 		__entry->hba = hba;
 		__entry->state = state;
 	),
 
 	TP_printk("%s: gating state changed to %s",
-		dev_name(__entry->hba->dev),
+		__get_str(dev_name),
 		__print_symbolic(__entry->state, UFSCHD_CLK_GATING_STATES))
 );
 
@@ -111,6 +113,7 @@ TRACE_EVENT(ufshcd_clk_scaling,
 
 	TP_STRUCT__entry(
 		__field(struct ufs_hba *, hba)
+		__string(dev_name, dev_name(hba->dev))
 		__string(state, state)
 		__string(clk, clk)
 		__field(u32, prev_state)
@@ -119,6 +122,7 @@ TRACE_EVENT(ufshcd_clk_scaling,
 
 	TP_fast_assign(
 		__entry->hba = hba;
+		__assign_str(dev_name);
 		__assign_str(state);
 		__assign_str(clk);
 		__entry->prev_state = prev_state;
@@ -126,7 +130,7 @@ TRACE_EVENT(ufshcd_clk_scaling,
 	),
 
 	TP_printk("%s: %s %s from %u to %u Hz",
-		dev_name(__entry->hba->dev), __get_str(state), __get_str(clk),
+		__get_str(dev_name), __get_str(state), __get_str(clk),
 		__entry->prev_state, __entry->curr_state)
 );
 
@@ -138,16 +142,18 @@ TRACE_EVENT(ufshcd_auto_bkops_state,
 
 	TP_STRUCT__entry(
 		__field(struct ufs_hba *, hba)
+		__string(dev_name, dev_name(hba->dev))
 		__string(state, state)
 	),
 
 	TP_fast_assign(
 		__entry->hba = hba;
+		__assign_str(dev_name);
 		__assign_str(state);
 	),
 
 	TP_printk("%s: auto bkops - %s",
-		dev_name(__entry->hba->dev), __get_str(state))
+		__get_str(dev_name), __get_str(state))
 );
 
 DECLARE_EVENT_CLASS(ufshcd_profiling_template,
@@ -158,6 +164,7 @@ DECLARE_EVENT_CLASS(ufshcd_profiling_template,
 
 	TP_STRUCT__entry(
 		__field(struct ufs_hba *, hba)
+		__string(dev_name, dev_name(hba->dev))
 		__string(profile_info, profile_info)
 		__field(s64, time_us)
 		__field(int, err)
@@ -165,13 +172,14 @@ DECLARE_EVENT_CLASS(ufshcd_profiling_template,
 
 	TP_fast_assign(
 		__entry->hba = hba;
+		__assign_str(dev_name);
 		__assign_str(profile_info);
 		__entry->time_us = time_us;
 		__entry->err = err;
 	),
 
 	TP_printk("%s: %s: took %lld usecs, err %d",
-		dev_name(__entry->hba->dev), __get_str(profile_info),
+		__get_str(dev_name), __get_str(profile_info),
 		__entry->time_us, __entry->err)
 );
 
@@ -200,6 +208,7 @@ DECLARE_EVENT_CLASS(ufshcd_template,
 		__field(s64, usecs)
 		__field(int, err)
 		__field(struct ufs_hba *, hba)
+		__string(dev_name, dev_name(hba->dev))
 		__field(int, dev_state)
 		__field(int, link_state)
 	),
@@ -208,13 +217,14 @@ DECLARE_EVENT_CLASS(ufshcd_template,
 		__entry->usecs = usecs;
 		__entry->err = err;
 		__entry->hba = hba;
+		__assign_str(dev_name);
 		__entry->dev_state = dev_state;
 		__entry->link_state = link_state;
 	),
 
 	TP_printk(
 		"%s: took %lld usecs, dev_state: %s, link_state: %s, err %d",
-		dev_name(__entry->hba->dev),
+		__get_str(dev_name),
 		__entry->usecs,
 		__print_symbolic(__entry->dev_state, UFS_PWR_MODES),
 		__print_symbolic(__entry->link_state, UFS_LINK_STATES),
@@ -279,6 +289,7 @@ TRACE_EVENT(ufshcd_command,
 	TP_STRUCT__entry(
 		__field(struct scsi_device *, sdev)
 		__field(struct ufs_hba *, hba)
+		__string(dev_name, dev_name(&sdev->sdev_dev))
 		__field(enum ufs_trace_str_t, str_t)
 		__field(unsigned int, tag)
 		__field(u32, doorbell)
@@ -291,6 +302,7 @@ TRACE_EVENT(ufshcd_command,
 	),
 
 	TP_fast_assign(
+		__assign_str(dev_name);
 		__entry->sdev = sdev;
 		__entry->hba = hba;
 		__entry->str_t = str_t;
@@ -307,7 +319,7 @@ TRACE_EVENT(ufshcd_command,
 	TP_printk(
 		"%s: %s: tag: %u, DB: 0x%x, size: %d, IS: %u, LBA: %llu, opcode: 0x%x (%s), group_id: 0x%x, hwq_id: %d",
 		show_ufs_cmd_trace_str(__entry->str_t),
-		dev_name(&__entry->sdev->sdev_dev), __entry->tag,
+		__get_str(dev_name), __entry->tag,
 		__entry->doorbell, __entry->transfer_len, __entry->intr,
 		__entry->lba, (u32)__entry->opcode, str_opcode(__entry->opcode),
 		(u32)__entry->group_id, __entry->hwq_id
@@ -322,6 +334,7 @@ TRACE_EVENT(ufshcd_uic_command,
 
 	TP_STRUCT__entry(
 		__field(struct ufs_hba *, hba)
+		__string(dev_name, dev_name(hba->dev))
 		__field(enum ufs_trace_str_t, str_t)
 		__field(u32, cmd)
 		__field(u32, arg1)
@@ -331,6 +344,7 @@ TRACE_EVENT(ufshcd_uic_command,
 
 	TP_fast_assign(
 		__entry->hba = hba;
+		__assign_str(dev_name);
 		__entry->str_t = str_t;
 		__entry->cmd = cmd;
 		__entry->arg1 = arg1;
@@ -340,7 +354,7 @@ TRACE_EVENT(ufshcd_uic_command,
 
 	TP_printk(
 		"%s: %s: cmd: 0x%x, arg1: 0x%x, arg2: 0x%x, arg3: 0x%x",
-		show_ufs_cmd_trace_str(__entry->str_t), dev_name(__entry->hba->dev),
+		show_ufs_cmd_trace_str(__entry->str_t), __get_str(dev_name),
 		__entry->cmd, __entry->arg1, __entry->arg2, __entry->arg3
 	)
 );
@@ -353,6 +367,7 @@ TRACE_EVENT(ufshcd_upiu,
 
 	TP_STRUCT__entry(
 		__field(struct ufs_hba *, hba)
+		__string(dev_name, dev_name(hba->dev))
 		__field(enum ufs_trace_str_t, str_t)
 		__array(unsigned char, hdr, 12)
 		__array(unsigned char, tsf, 16)
@@ -361,6 +376,7 @@ TRACE_EVENT(ufshcd_upiu,
 
 	TP_fast_assign(
 		__entry->hba = hba;
+		__assign_str(dev_name);
 		__entry->str_t = str_t;
 		memcpy(__entry->hdr, hdr, sizeof(__entry->hdr));
 		memcpy(__entry->tsf, tsf, sizeof(__entry->tsf));
@@ -369,7 +385,7 @@ TRACE_EVENT(ufshcd_upiu,
 
 	TP_printk(
 		"%s: %s: HDR:%s, %s:%s",
-		show_ufs_cmd_trace_str(__entry->str_t), dev_name(__entry->hba->dev),
+		show_ufs_cmd_trace_str(__entry->str_t), __get_str(dev_name),
 		__print_hex(__entry->hdr, sizeof(__entry->hdr)),
 		show_ufs_cmd_trace_tsf(__entry->tsf_t),
 		__print_hex(__entry->tsf, sizeof(__entry->tsf))
@@ -384,16 +400,18 @@ TRACE_EVENT(ufshcd_exception_event,
 
 	TP_STRUCT__entry(
 		__field(struct ufs_hba *, hba)
+		__string(dev_name, dev_name(hba->dev))
 		__field(u16, status)
 	),
 
 	TP_fast_assign(
 		__entry->hba = hba;
+		__assign_str(dev_name);
 		__entry->status = status;
 	),
 
 	TP_printk("%s: status 0x%x",
-		dev_name(__entry->hba->dev), __entry->status
+		__get_str(dev_name), __entry->status
 	)
 );
 
-- Steve

* Re: [PATCH v2 5.15.y] ring-buffer: Remove ring_buffer_read_prepare_sync()
From: Sasha Levin @ 2026-06-30 22:23 UTC
  To: stable
  Cc: Sasha Levin, rostedt, mhiramat, mathieu.desnoyers, dhowells,
	linux-trace-kernel, linux-kernel, doebel
In-Reply-To: <20260630060321.1494832-1-doebel@amazon.de>

> [doebel@amazon.de: move patch section using guard() macro into a
> separate block to address declaration after statement warning.]
> Signed-off-by: Bjoern Doebel <doebel@amazon.de>

Queued for 5.15, thanks.

--
Thanks,
Sasha

* Re: [PATCH v2 5.10.y] ring-buffer: Remove ring_buffer_read_prepare_sync()
From: Sasha Levin @ 2026-06-30 22:23 UTC
  To: stable
  Cc: Sasha Levin, rostedt, mhiramat, mathieu.desnoyers, dhowells,
	linux-trace-kernel, linux-kernel, doebel
In-Reply-To: <20260630060634.1496989-1-doebel@amazon.de>

> [doebel@amazon.de: move patch section using guard() macro into a
> separate block to address declaration after statement warning.]
> Signed-off-by: Bjoern Doebel <doebel@amazon.de>

Queued for 5.10, thanks.

--
Thanks,
Sasha

* Re: [PATCH] riscv: probes: save original sp in rethook trampoline
From: Masami Hiramatsu @ 2026-06-30 22:33 UTC
  To: Martin Kaiser
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Steven Rostedt,
	Masami Hiramatsu, linux-riscv, linux-kernel, linux-trace-kernel
In-Reply-To: <20260630194010.1824039-1-martin@kaiser.cx>

On Tue, 30 Jun 2026 21:40:03 +0200
Martin Kaiser <martin@kaiser.cx> wrote:

> Reading a word from the stack in a kretprobe crashes a risc-v kernel.
> 
> $ cd /sys/kernel/tracing/
> $ echo 'r n_tty_write $stack0' > dynamic_events
> $ echo 1 > events/kprobes/enable
> Unable to handle kernel paging request at virtual address 0000000200000128
> ...
> [<ffffffff80016d16>] regs_get_kernel_stack_nth+0x26/0x38
> [<ffffffff80177196>] process_fetch_insn+0x3ee/0x760
> [<ffffffff80177836>] kretprobe_trace_func+0x116/0x1f0
> [<ffffffff8017795a>] kretprobe_dispatcher+0x4a/0x58
> [<ffffffff8013572e>] kretprobe_rethook_handler+0x5e/0x90
> [<ffffffff80180838>] rethook_trampoline_handler+0x70/0x108
> [<ffffffff8001ba32>] arch_rethook_trampoline_callback+0x12/0x1c
> [<ffffffff8001ba84>] arch_rethook_trampoline+0x48/0x94
> [<ffffffff8067872a>] tty_write+0x1a/0x30
> 
> In regs_get_kernel_stack_nth, regs->sp contains an arbitrary value.
> 
> arch_rethook_trampoline saves the registers from the probed function in a
> struct pt_regs. sp is not saved. Instead, sp is decremented for
> arch_rethook_trampoline's local stack.
> 
> Fix this crash and save the original sp along with the other registers.
> Use a0 as a temporary register, it is overwritten anyway.
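
The effect of the fix described above can be modelled in plain C. This is a
sketch under assumptions, not the RISC-V code: struct regs_model,
trampoline_save() and stack_nth() are hypothetical, and the frame_words
parameter plays the role of PT_SIZE_ON_STACK expressed in stack words.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of pt_regs: only the saved stack pointer. */
struct regs_model {
	unsigned long *sp;
};

/*
 * The trampoline has already moved sp down by frame_words to make room
 * for its register frame. Saving that decremented value would be wrong;
 * the probed function's sp is frame_words above it.
 */
static void trampoline_save(struct regs_model *regs,
			    unsigned long *sp_after_frame, size_t frame_words)
{
	assert(regs != NULL && sp_after_frame != NULL);
	regs->sp = sp_after_frame + frame_words;
}

/* Read the n-th word above the saved stack pointer. */
static unsigned long stack_nth(const struct regs_model *regs, size_t n)
{
	assert(regs != NULL && regs->sp != NULL);
	return regs->sp[n];
}
```

With the original sp stored in the frame, a fetch such as $stack0 reads the
probed function's stack rather than whatever value happened to be in the slot.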

Good catch!

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

I would like this to be handled by the RISC-V maintainers.

Thank you,

> 
> Signed-off-by: Martin Kaiser <martin@kaiser.cx>
> ---
>  arch/riscv/kernel/probes/rethook_trampoline.S | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/riscv/kernel/probes/rethook_trampoline.S b/arch/riscv/kernel/probes/rethook_trampoline.S
> index f2cd83d9b0f0..c3aa8d8cf5af 100644
> --- a/arch/riscv/kernel/probes/rethook_trampoline.S
> +++ b/arch/riscv/kernel/probes/rethook_trampoline.S
> @@ -41,6 +41,9 @@
>  	REG_S x29, PT_T4(sp)
>  	REG_S x30, PT_T5(sp)
>  	REG_S x31, PT_T6(sp)
> +	/* save original sp */
> +	addi a0, sp, PT_SIZE_ON_STACK
> +	REG_S a0, PT_SP(sp)
>  	.endm
>  
>  	.macro restore_all_base_regs
> -- 
> 2.43.7
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

* Re: [PATCH v13 04/11] perf/probe: Ignore comment lines in dynamic_events/kprobe_events file
From: Masami Hiramatsu @ 2026-06-30 22:39 UTC
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Steven Rostedt, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, linux-kernel, linux-trace-kernel,
	linux-doc, linux-kselftest
In-Reply-To: <akMO53eG_4YKJH0j@google.com>

On Mon, 29 Jun 2026 17:33:43 -0700
Namhyung Kim <namhyung@kernel.org> wrote:

> Hi Masami,
> 
> On Tue, Jun 30, 2026 at 07:32:11AM +0900, Masami Hiramatsu wrote:
> > Hi Arnaldo, Namhyung,
> > 
> > I forgot to CC this. Can I pick this patch via linux-trace tree,
> > or would you pick this?
> > This is a part of typecast series [1] only for debugging.
> 
> Thanks for letting me know.
> 
> I think it's better to route this through the perf tree as we're seeing
> a lot of cleanups all around the code base.  Having this together would
> reduce chances of future conflicts.  Does that sound ok to you?

OK, thanks for confirmation. Then I'll drop it from probes/for-next (and probes/core).

Thank you,

> 
> Thanks,
> Namhyung
> 
> 
> > 
> > [1] https://lore.kernel.org/all/178271361825.1176915.16095297120719039761.stgit@devnote2/
> > 
> > Thanks,
> > 
> > On Mon, 29 Jun 2026 15:13:38 +0900
> > "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
> > 
> > > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > > 
> > > Since dynamic_events/kprobe_events files show the fetcharg debug
> > > information as comment lines, its reader needs to ignore it.
> > > 
> > > Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > > ---
> > >  tools/perf/util/probe-file.c |    2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
> > > index 4032572cbf55..4d12693a83b3 100644
> > > --- a/tools/perf/util/probe-file.c
> > > +++ b/tools/perf/util/probe-file.c
> > > @@ -197,6 +197,8 @@ struct strlist *probe_file__get_rawlist(int fd)
> > >  		idx = strlen(p) - 1;
> > >  		if (p[idx] == '\n')
> > >  			p[idx] = '\0';
> > > +		if (buf[0] == '#')
> > > +			continue;
> > >  		ret = strlist__add(sl, buf);
> > >  		if (ret < 0) {
> > >  			pr_debug("strlist__add failed (%d)\n", ret);
> > > 
> > 
> > 
> > -- 
> > Masami Hiramatsu (Google) <mhiramat@kernel.org>
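
The comment-skipping rule in the quoted patch can be illustrated with a small
standalone sketch. It is not the perf code: count_event_lines() is a
hypothetical helper that walks an in-memory copy of the file contents instead
of reading a file descriptor, and it only counts the lines a reader would keep.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count the lines in text that do not start with
 * '#'. Lines are separated by '\n'; a final line without a newline is
 * still counted.
 */
static size_t count_event_lines(const char *text)
{
	size_t count = 0;

	assert(text != NULL);
	while (*text) {
		const char *eol = strchr(text, '\n');
		size_t len = eol ? (size_t)(eol - text) : strlen(text);

		/* Same test as the patch: a leading '#' marks a comment. */
		if (text[0] != '#')
			count++;

		/* Advance past this line and its newline, if any. */
		text += len + (eol ? 1 : 0);
	}
	return count;
}
```

Given two event definitions with one '#' line between them, the sketch counts
two lines.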


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

* [PATCH] tracing: Warn when an event dereferences a pointer in TP_printk()
From: Steven Rostedt @ 2026-06-30 22:48 UTC
  To: LKML, Linux Trace Kernel
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Martin Kaiser, Frank Li,
	Vinod Koul

From: Steven Rostedt <rostedt@goodmis.org>

Currently on boot up and when modules are loaded, the trace event
infrastructure will examine the TP_printk's of every event looking to see
if it dereferences pointers on the ring buffer via printk formats like
"%pB" and such. What it doesn't do is check if the arguments themselves
do a dereference from a pointer.

This came to light with a fix[1] to the fsl_edma event, which had the
following in its TP_printk() arguments: "__entry->edma->membase"

The __entry->edma is a pointer saved in the ring buffer. The dereference
from TP_printk() happens when the user reads the "trace" file which can be
seconds, minutes, hours, days, weeks, or even months later! There is no
guarantee that the __entry->edma pointer will still be pointing to what it
was when it was recorded, and could crash the kernel when a user reads the
event.
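
The safe pattern is to copy what the printout needs at record time, so that
read time touches only the record. A minimal userspace sketch of that pattern
follows. It does not use the tracing macros: struct device_model,
struct event_record, record_event() and print_event() are all hypothetical
names made up for the illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_MAX_LEN 32

/* Hypothetical device object whose lifetime is shorter than the log's. */
struct device_model {
	char name[NAME_MAX_LEN];
};

/* Hypothetical record: it owns a copy of the name, not a device pointer. */
struct event_record {
	char dev_name[NAME_MAX_LEN];
	unsigned int status;
};

/* "Record time": copy everything the printout will need. */
static void record_event(struct event_record *rec,
			 const struct device_model *dev, unsigned int status)
{
	assert(rec != NULL && dev != NULL);
	snprintf(rec->dev_name, sizeof(rec->dev_name), "%s", dev->name);
	rec->status = status;
}

/* "Read time": format using only data stored inside the record. */
static int print_event(const struct event_record *rec, char *buf, size_t size)
{
	assert(rec != NULL && buf != NULL);
	return snprintf(buf, size, "%s: status 0x%x",
			rec->dev_name, rec->status);
}
```

Because print_event() never follows a pointer out of the record, the device
can be torn down between recording and reading without affecting the output.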

Add logic to test_event_printk() that also checks for this case and warns
if the event dereferences a pointer from the ring buffer.

[1] https://lore.kernel.org/all/20260630200022.1826420-1-martin@kaiser.cx/

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace_events.c | 35 +++++++++++++++++++++++++++++------
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index c46e623e7e0d..3b52bfd8b300 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -400,6 +400,31 @@ static bool process_string(const char *fmt, int len, struct trace_event_call *ca
 	return true;
 }
 
+static void test_double_dereference(const char *str, int len,
+				    struct trace_event_call *call)
+{
+	const char *ptr;
+	const char *end = str + len;
+
+	ptr = strstr(str, "REC->");
+
+	while (ptr && ptr < end) {
+
+		ptr += 5;
+		for (; ptr < end; ptr++) {
+			if (ptr[0] == '-' && ptr[1] == '>') {
+				WARN_ONCE(1, "Event %s has double dereference in TP_printk: %.*s\n",
+					  trace_event_name(call), len, str);
+				return;
+			}
+			if (!isalnum(*ptr) && *ptr != '_')
+				break;
+		}
+
+		ptr = strstr(ptr, "REC->");
+	}
+}
+
 static void handle_dereference_arg(const char *arg_str, u64 string_flags, int len,
 				   u64 *dereference_flags, int arg,
 				   struct trace_event_call *call)
@@ -459,12 +484,6 @@ static void test_event_printk(struct trace_event_call *call)
 				if (in_quote) {
 					arg = 0;
 					first = false;
-					/*
-					 * If there was no %p* uses
-					 * the fmt is OK.
-					 */
-					if (!dereference_flags)
-						return;
 				}
 			}
 			if (in_quote) {
@@ -576,6 +595,8 @@ static void test_event_printk(struct trace_event_call *call)
 				continue;
 			}
 
+			test_double_dereference(fmt + start_arg, e - start_arg, call);
+
 			if (dereference_flags & (1ULL << arg)) {
 				handle_dereference_arg(fmt + start_arg, string_flags,
 						       e - start_arg,
@@ -589,6 +610,8 @@ static void test_event_printk(struct trace_event_call *call)
 		}
 	}
 
+	test_double_dereference(fmt + start_arg, i - start_arg, call);
+
 	if (dereference_flags & (1ULL << arg)) {
 		handle_dereference_arg(fmt + start_arg, string_flags,
 				       i - start_arg,
-- 
2.53.0



* [PATCH] ufs: core: tracing: Do not dereference pointers in TP_printk()
From: Steven Rostedt @ 2026-06-30 22:54 UTC (permalink / raw)
  To: LKML, Linux Trace Kernel, linux-scsi
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Alim Akhtar, Avri Altman,
	Bart Van Assche, James Bottomley, Martin K. Petersen, Peter Wang

From: Steven Rostedt <rostedt@goodmis.org>

The trace events in drivers/ufs/core/ufs_trace.h were converted to take a
pointer to the hba structure as the tracepoint argument. As part of that
change, TP_printk() stopped printing the dev_name saved in the ring buffer
and instead dereferences the dev pointer through the saved hba pointer.

This is not allowed, as TP_printk() is executed when the trace event is
read from the /sys/kernel/tracing/trace file. That can literally happen
seconds, minutes, hours, days, weeks, or even months later! There is no
guarantee that the hba pointer will still be valid by the time it is
dereferenced when the "trace" file is read.

Instead, save the device name from the hba pointer at the time the
tracepoint is called and place it into the ring buffer event. Then the
TP_printk() can read the name directly from the ring buffer and remove the
possibility that it will read a freed pointer and crash the kernel.
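
The idea of copying at record time instead of dereferencing at read time
can be sketched in plain C. Everything here is hypothetical (struct
fake_dev, struct fake_event, record_event(), print_event()), and the
sketch uses a fixed-size array where the patch below uses the tracing
macros __string(), __assign_str() and __get_str().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEV_NAME_LEN 32

/* Hypothetical stand-ins for a device and a recorded trace event. */
struct fake_dev {
	char name[DEV_NAME_LEN];
};

struct fake_event {
	char dev_name[DEV_NAME_LEN];	/* copied when the event is recorded */
	int state;
};

/* Record time: the device is known to be alive, so copy what is needed. */
static void record_event(struct fake_event *ev, const struct fake_dev *dev,
			 int state)
{
	snprintf(ev->dev_name, sizeof(ev->dev_name), "%s", dev->name);
	ev->state = state;
}

/* Read time: only touch data stored in the event itself. */
static int print_event(char *out, size_t size, const struct fake_event *ev)
{
	return snprintf(out, size, "%s: gating state changed to %d",
			ev->dev_name, ev->state);
}
```

Because print_event() never follows a pointer back to the device, the
output stays correct even if the device has been torn down between
recording and reading.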

This was detected when testing the trace event code that looks for
TP_printk() parameters doing illegal dereferences[1].

[1] https://lore.kernel.org/all/20260630184836.74d477b6@gandalf.local.home/

Cc: stable@vger.kernel.org
Fixes: 583e518e71003 ("scsi: ufs: core: Add hba parameter to trace events")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 drivers/ufs/core/ufs_trace.h | 36 +++++++++++++++++++++++++++---------
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/ufs/core/ufs_trace.h b/drivers/ufs/core/ufs_trace.h
index 309ae51b4906..377a3c54b9f5 100644
--- a/drivers/ufs/core/ufs_trace.h
+++ b/drivers/ufs/core/ufs_trace.h
@@ -89,16 +89,18 @@ TRACE_EVENT(ufshcd_clk_gating,
 
 	TP_STRUCT__entry(
 		__field(struct ufs_hba *, hba)
+		__string(dev_name, dev_name(hba->dev))
 		__field(int, state)
 	),
 
 	TP_fast_assign(
+		__assign_str(dev_name);
 		__entry->hba = hba;
 		__entry->state = state;
 	),
 
 	TP_printk("%s: gating state changed to %s",
-		dev_name(__entry->hba->dev),
+		__get_str(dev_name),
 		__print_symbolic(__entry->state, UFSCHD_CLK_GATING_STATES))
 );
 
@@ -111,6 +113,7 @@ TRACE_EVENT(ufshcd_clk_scaling,
 
 	TP_STRUCT__entry(
 		__field(struct ufs_hba *, hba)
+		__string(dev_name, dev_name(hba->dev))
 		__string(state, state)
 		__string(clk, clk)
 		__field(u32, prev_state)
@@ -119,6 +122,7 @@ TRACE_EVENT(ufshcd_clk_scaling,
 
 	TP_fast_assign(
 		__entry->hba = hba;
+		__assign_str(dev_name);
 		__assign_str(state);
 		__assign_str(clk);
 		__entry->prev_state = prev_state;
@@ -126,7 +130,7 @@ TRACE_EVENT(ufshcd_clk_scaling,
 	),
 
 	TP_printk("%s: %s %s from %u to %u Hz",
-		dev_name(__entry->hba->dev), __get_str(state), __get_str(clk),
+		__get_str(dev_name), __get_str(state), __get_str(clk),
 		__entry->prev_state, __entry->curr_state)
 );
 
@@ -138,16 +142,18 @@ TRACE_EVENT(ufshcd_auto_bkops_state,
 
 	TP_STRUCT__entry(
 		__field(struct ufs_hba *, hba)
+		__string(dev_name, dev_name(hba->dev))
 		__string(state, state)
 	),
 
 	TP_fast_assign(
 		__entry->hba = hba;
+		__assign_str(dev_name);
 		__assign_str(state);
 	),
 
 	TP_printk("%s: auto bkops - %s",
-		dev_name(__entry->hba->dev), __get_str(state))
+		__get_str(dev_name), __get_str(state))
 );
 
 DECLARE_EVENT_CLASS(ufshcd_profiling_template,
@@ -158,6 +164,7 @@ DECLARE_EVENT_CLASS(ufshcd_profiling_template,
 
 	TP_STRUCT__entry(
 		__field(struct ufs_hba *, hba)
+		__string(dev_name, dev_name(hba->dev))
 		__string(profile_info, profile_info)
 		__field(s64, time_us)
 		__field(int, err)
@@ -165,13 +172,14 @@ DECLARE_EVENT_CLASS(ufshcd_profiling_template,
 
 	TP_fast_assign(
 		__entry->hba = hba;
+		__assign_str(dev_name);
 		__assign_str(profile_info);
 		__entry->time_us = time_us;
 		__entry->err = err;
 	),
 
 	TP_printk("%s: %s: took %lld usecs, err %d",
-		dev_name(__entry->hba->dev), __get_str(profile_info),
+		__get_str(dev_name), __get_str(profile_info),
 		__entry->time_us, __entry->err)
 );
 
@@ -200,6 +208,7 @@ DECLARE_EVENT_CLASS(ufshcd_template,
 		__field(s64, usecs)
 		__field(int, err)
 		__field(struct ufs_hba *, hba)
+		__string(dev_name, dev_name(hba->dev))
 		__field(int, dev_state)
 		__field(int, link_state)
 	),
@@ -208,13 +217,14 @@ DECLARE_EVENT_CLASS(ufshcd_template,
 		__entry->usecs = usecs;
 		__entry->err = err;
 		__entry->hba = hba;
+		__assign_str(dev_name);
 		__entry->dev_state = dev_state;
 		__entry->link_state = link_state;
 	),
 
 	TP_printk(
 		"%s: took %lld usecs, dev_state: %s, link_state: %s, err %d",
-		dev_name(__entry->hba->dev),
+		__get_str(dev_name),
 		__entry->usecs,
 		__print_symbolic(__entry->dev_state, UFS_PWR_MODES),
 		__print_symbolic(__entry->link_state, UFS_LINK_STATES),
@@ -279,6 +289,7 @@ TRACE_EVENT(ufshcd_command,
 	TP_STRUCT__entry(
 		__field(struct scsi_device *, sdev)
 		__field(struct ufs_hba *, hba)
+		__string(dev_name, dev_name(&sdev->sdev_dev))
 		__field(enum ufs_trace_str_t, str_t)
 		__field(unsigned int, tag)
 		__field(u32, doorbell)
@@ -291,6 +302,7 @@ TRACE_EVENT(ufshcd_command,
 	),
 
 	TP_fast_assign(
+		__assign_str(dev_name);
 		__entry->sdev = sdev;
 		__entry->hba = hba;
 		__entry->str_t = str_t;
@@ -307,7 +319,7 @@ TRACE_EVENT(ufshcd_command,
 	TP_printk(
 		"%s: %s: tag: %u, DB: 0x%x, size: %d, IS: %u, LBA: %llu, opcode: 0x%x (%s), group_id: 0x%x, hwq_id: %d",
 		show_ufs_cmd_trace_str(__entry->str_t),
-		dev_name(&__entry->sdev->sdev_dev), __entry->tag,
+		__get_str(dev_name), __entry->tag,
 		__entry->doorbell, __entry->transfer_len, __entry->intr,
 		__entry->lba, (u32)__entry->opcode, str_opcode(__entry->opcode),
 		(u32)__entry->group_id, __entry->hwq_id
@@ -322,6 +334,7 @@ TRACE_EVENT(ufshcd_uic_command,
 
 	TP_STRUCT__entry(
 		__field(struct ufs_hba *, hba)
+		__string(dev_name, dev_name(hba->dev))
 		__field(enum ufs_trace_str_t, str_t)
 		__field(u32, cmd)
 		__field(u32, arg1)
@@ -331,6 +344,7 @@ TRACE_EVENT(ufshcd_uic_command,
 
 	TP_fast_assign(
 		__entry->hba = hba;
+		__assign_str(dev_name);
 		__entry->str_t = str_t;
 		__entry->cmd = cmd;
 		__entry->arg1 = arg1;
@@ -340,7 +354,7 @@ TRACE_EVENT(ufshcd_uic_command,
 
 	TP_printk(
 		"%s: %s: cmd: 0x%x, arg1: 0x%x, arg2: 0x%x, arg3: 0x%x",
-		show_ufs_cmd_trace_str(__entry->str_t), dev_name(__entry->hba->dev),
+		show_ufs_cmd_trace_str(__entry->str_t), __get_str(dev_name),
 		__entry->cmd, __entry->arg1, __entry->arg2, __entry->arg3
 	)
 );
@@ -353,6 +367,7 @@ TRACE_EVENT(ufshcd_upiu,
 
 	TP_STRUCT__entry(
 		__field(struct ufs_hba *, hba)
+		__string(dev_name, dev_name(hba->dev))
 		__field(enum ufs_trace_str_t, str_t)
 		__array(unsigned char, hdr, 12)
 		__array(unsigned char, tsf, 16)
@@ -361,6 +376,7 @@ TRACE_EVENT(ufshcd_upiu,
 
 	TP_fast_assign(
 		__entry->hba = hba;
+		__assign_str(dev_name);
 		__entry->str_t = str_t;
 		memcpy(__entry->hdr, hdr, sizeof(__entry->hdr));
 		memcpy(__entry->tsf, tsf, sizeof(__entry->tsf));
@@ -369,7 +385,7 @@ TRACE_EVENT(ufshcd_upiu,
 
 	TP_printk(
 		"%s: %s: HDR:%s, %s:%s",
-		show_ufs_cmd_trace_str(__entry->str_t), dev_name(__entry->hba->dev),
+		show_ufs_cmd_trace_str(__entry->str_t), __get_str(dev_name),
 		__print_hex(__entry->hdr, sizeof(__entry->hdr)),
 		show_ufs_cmd_trace_tsf(__entry->tsf_t),
 		__print_hex(__entry->tsf, sizeof(__entry->tsf))
@@ -384,16 +400,18 @@ TRACE_EVENT(ufshcd_exception_event,
 
 	TP_STRUCT__entry(
 		__field(struct ufs_hba *, hba)
+		__string(dev_name, dev_name(hba->dev))
 		__field(u16, status)
 	),
 
 	TP_fast_assign(
 		__entry->hba = hba;
+		__assign_str(dev_name);
 		__entry->status = status;
 	),
 
 	TP_printk("%s: status 0x%x",
-		dev_name(__entry->hba->dev), __entry->status
+		__get_str(dev_name), __entry->status
 	)
 );
 
-- 
2.53.0



* Re: [PATCH v2] lib/bootconfig: fix undefined behavior involving NULL pointer arithmetic
From: Masami Hiramatsu @ 2026-06-30 22:58 UTC (permalink / raw)
  To: Bradley Morgan; +Cc: akpm, linux-kernel, linux-trace-kernel, stable
In-Reply-To: <20260630174746.14795-1-include@grrlz.net>

On Tue, 30 Jun 2026 17:47:46 +0000
Bradley Morgan <include@grrlz.net> wrote:

> When xbc_snprint_cmdline() is called during the size-probing phase
> (with buf = NULL and size = 0), the function computes the end pointer
> as 'buf + size' (NULL + 0) and repeatedly advances 'buf' via 'buf += ret'.
> 
> Under the C standard, performing pointer arithmetic on a NULL pointer is
> undefined behavior. While harmless inside the kernel, this code is also
> compiled into the userspace host tool 'tools/bootconfig', where host
> compilers with UBSan or FORTIFY_SOURCE enabled abort the build when they
> detect NULL pointer arithmetic.
> 
> Fix this by guarding the pointer arithmetic so 'buf' is only advanced when
> non-NULL, and track the running written length in a separate 'len' counter
> for the return value (which cannot be recovered from pointer math when
> 'buf' is NULL). The rest() helper and snprintf call sites are unchanged.
> 
> Fixes: 51887d03aca1 ("bootconfig: init: Allow admin to use bootconfig for kernel command line")
> Cc: stable@vger.kernel.org
> Assisted-by: GLM:glm-5.2
> Signed-off-by: Bradley Morgan <include@grrlz.net>
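
The size-probing pattern described in the quoted commit message can be
sketched as follows. join_words() is a hypothetical helper, not the
bootconfig code, and it indexes from the start of the buffer rather than
advancing 'buf' as the quoted patch does. The shared idea is the
separate 'len' counter, so no arithmetic is ever done on a NULL pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: join words, each followed by a space. With
 * buf == NULL and size == 0 it only reports the length needed.
 */
static int join_words(char *buf, size_t size, const char *const *words,
		      size_t nr_words)
{
	size_t len = 0;	/* running length, valid even when buf is NULL */
	size_t i;
	int ret;

	for (i = 0; i < nr_words; i++) {
		/* Space left in the buffer; 0 when probing or once full. */
		size_t room = (buf && len < size) ? size - len : 0;

		/* buf + len is only computed when buf is non-NULL. */
		ret = snprintf(room ? buf + len : NULL, room, "%s ", words[i]);
		if (ret < 0)
			return ret;
		len += ret;
	}
	return (int)len;
}
```

A caller would typically call it once with (NULL, 0) to learn the size,
allocate that many bytes plus one for the terminating NUL, and then call
it again with the real buffer.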

Thanks for the fix!
Let me pick this to bootconfig/fixes.

Thank you,

> ---
>  lib/bootconfig.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> Changes since v1:
> - Got the big guns out! :) (see Assisted-by).
> - Addressed review from Masami Hiramatsu and Breno Leitao.
> 
> diff --git a/lib/bootconfig.c b/lib/bootconfig.c
> index f445b7703fdd..c913259c80ce 100644
> --- a/lib/bootconfig.c
> +++ b/lib/bootconfig.c
> @@ -427,8 +427,9 @@ static char xbc_namebuf[XBC_KEYLEN_MAX] __initdata;
>  int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
>  {
>  	struct xbc_node *knode, *vnode;
> -	char *end = buf + size;
> +	char *end = buf ? buf + size : NULL;
>  	const char *val, *q;
> +	size_t len = 0;
>  	int ret;
>  
>  	xbc_node_for_each_key_value(root, knode, val) {
> @@ -442,7 +443,9 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
>  			ret = snprintf(buf, rest(buf, end), "%s ", xbc_namebuf);
>  			if (ret < 0)
>  				return ret;
> -			buf += ret;
> +			len += ret;
> +			if (buf)
> +				buf += ret;
>  			continue;
>  		}
>  		xbc_array_for_each_value(vnode, val) {
> @@ -456,11 +459,13 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
>  				       xbc_namebuf, q, val, q);
>  			if (ret < 0)
>  				return ret;
> -			buf += ret;
> +			len += ret;
> +			if (buf)
> +				buf += ret;
>  		}
>  	}
>  
> -	return buf - (end - size);
> +	return len;
>  }
>  #undef rest
>  
> -- 
> 2.53.0
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCH v2] ufs: core: add hba parameter to trace events
From: Steven Rostedt @ 2026-06-30 22:58 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: peter.wang, linux-scsi, martin.petersen, avri.altman, alim.akhtar,
	jejb, sutoshd, wsd_upstream, linux-mediatek, chun-hung.wu,
	alice.chao, cc.chou, chaotian.jing, jiajie.hao, yi-fan.peng,
	qilin.tan, lin.gui, tun-yu.yu, eddie.huang, naomi.chu, ed.tsai,
	Linux Trace Kernel
In-Reply-To: <16f26ea9-69d6-4f2f-9adc-c576c288a2f5@acm.org>

On Thu, 13 Feb 2025 09:19:42 -0800
Bart Van Assche <bvanassche@acm.org> wrote:

> On 2/13/25 3:35 AM, peter.wang@mediatek.com wrote:
> > diff --git a/drivers/ufs/core/ufs_trace.h b/drivers/ufs/core/ufs_trace.h
> > index 84deca2b841d..2f79982846b6 100644
> > --- a/drivers/ufs/core/ufs_trace.h
> > +++ b/drivers/ufs/core/ufs_trace.h
> > @@ -83,16 +83,18 @@ UFS_CMD_TRACE_TSF_TYPES
> >   
> >   TRACE_EVENT(ufshcd_clk_gating,
> >   
> > -	TP_PROTO(const char *dev_name, int state),
> > +	TP_PROTO(struct ufs_hba *hba, int state),
> >   
> > -	TP_ARGS(dev_name, state),
> > +	TP_ARGS(hba, state),
> >   
> >   	TP_STRUCT__entry(
> > -		__string(dev_name, dev_name)
> > +		__field(struct ufs_hba *, hba)
> > +		__string(dev_name, dev_name(hba->dev))
> >   		__field(int, state)
> >   	),  
> 
> Please reduce the size of the tracing entries by removing dev_name from 
> TP_STRUCT__entry() and by replacing 'dev_name' with 'dev_name(hba->dev)'
> in the TP_printk() calls.

For future reference, please do not recommend moving dereferences into
TP_printk(). Its arguments are evaluated when the event is read by the
user, and the hba pointer may no longer exist by then.

-- Steve


* Re: [PATCH v2] lib/bootconfig: fix undefined behavior involving NULL pointer arithmetic
From: Masami Hiramatsu @ 2026-06-30 23:06 UTC (permalink / raw)
  To: Bradley Morgan; +Cc: akpm, linux-kernel, linux-trace-kernel, stable
In-Reply-To: <20260630174746.14795-1-include@grrlz.net>

On Tue, 30 Jun 2026 17:47:46 +0000
Bradley Morgan <include@grrlz.net> wrote:

> When xbc_snprint_cmdline() is called during the size-probing phase
> (with buf = NULL and size = 0), the function computes the end pointer
> as 'buf + size' (NULL + 0) and repeatedly advances 'buf' via 'buf += ret'.
> 
> Under the C standard, performing pointer arithmetic on a NULL pointer is
> undefined behavior. While harmless inside the kernel, this code is also
> compiled into the userspace host tool 'tools/bootconfig', where host
> compilers with UBSan or FORTIFY_SOURCE enabled abort the build when they
> detect NULL pointer arithmetic.
> 
> Fix this by guarding the pointer arithmetic so 'buf' is only advanced when
> non-NULL, and track the running written length in a separate 'len' counter
> for the return value (which cannot be recovered from pointer math when
> 'buf' is NULL). The rest() helper and snprintf call sites are unchanged.
> 
> Fixes: 51887d03aca1 ("bootconfig: init: Allow admin to use bootconfig for kernel command line")
> Cc: stable@vger.kernel.org
> Assisted-by: GLM:glm-5.2
> Signed-off-by: Bradley Morgan <include@grrlz.net>

Oops, Breno already did it.

https://lore.kernel.org/all/20260626-bootconfig_using_tools-v7-1-24ab72139c29@debian.org/

Let me drop this patch since it conflicts with Breno's patch.

Thanks, 

> ---
>  lib/bootconfig.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> Changes since v1:
> - Got the big guns out! :) (see Assisted-by).
> - Addressed review from Masami Hiramatsu and Breno Leitao.
> 
> diff --git a/lib/bootconfig.c b/lib/bootconfig.c
> index f445b7703fdd..c913259c80ce 100644
> --- a/lib/bootconfig.c
> +++ b/lib/bootconfig.c
> @@ -427,8 +427,9 @@ static char xbc_namebuf[XBC_KEYLEN_MAX] __initdata;
>  int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
>  {
>  	struct xbc_node *knode, *vnode;
> -	char *end = buf + size;
> +	char *end = buf ? buf + size : NULL;
>  	const char *val, *q;
> +	size_t len = 0;
>  	int ret;
>  
>  	xbc_node_for_each_key_value(root, knode, val) {
> @@ -442,7 +443,9 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
>  			ret = snprintf(buf, rest(buf, end), "%s ", xbc_namebuf);
>  			if (ret < 0)
>  				return ret;
> -			buf += ret;
> +			len += ret;
> +			if (buf)
> +				buf += ret;
>  			continue;
>  		}
>  		xbc_array_for_each_value(vnode, val) {
> @@ -456,11 +459,13 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
>  				       xbc_namebuf, q, val, q);
>  			if (ret < 0)
>  				return ret;
> -			buf += ret;
> +			len += ret;
> +			if (buf)
> +				buf += ret;
>  		}
>  	}
>  
> -	return buf - (end - size);
> +	return len;
>  }
>  #undef rest
>  
> -- 
> 2.53.0
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCH v2] lib/bootconfig: fix undefined behavior involving NULL pointer arithmetic
From: Masami Hiramatsu @ 2026-06-30 23:26 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Bradley Morgan, akpm, linux-kernel, linux-trace-kernel, stable
In-Reply-To: <20260701075843.a308d7dadf327eda4015236b@kernel.org>

On Wed, 1 Jul 2026 07:58:43 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> On Tue, 30 Jun 2026 17:47:46 +0000
> Bradley Morgan <include@grrlz.net> wrote:
> 
> > When xbc_snprint_cmdline() is called during the size-probing phase
> > (with buf = NULL and size = 0), the function computes the end pointer
> > as 'buf + size' (NULL + 0) and repeatedly advances 'buf' via 'buf += ret'.
> > 
> > Under the C standard, performing pointer arithmetic on a NULL pointer is
> > undefined behavior. While harmless inside the kernel, this code is also
> > compiled into the userspace host tool 'tools/bootconfig', where host
> > compilers with UBSan or FORTIFY_SOURCE enabled abort the build when they
> > detect NULL pointer arithmetic.
> > 
> > Fix this by guarding the pointer arithmetic so 'buf' is only advanced when
> > non-NULL, and track the running written length in a separate 'len' counter
> > for the return value (which cannot be recovered from pointer math when
> > 'buf' is NULL). The rest() helper and snprintf call sites are unchanged.
> > 
> > Fixes: 51887d03aca1 ("bootconfig: init: Allow admin to use bootconfig for kernel command line")
> > Cc: stable@vger.kernel.org
> > Assisted-by: GLM:glm-5.2
> > Signed-off-by: Bradley Morgan <include@grrlz.net>
> 
> Thanks for the fix!
> Let me pick this to bootconfig/fixes.

Sorry, I eventually decided to pick Breno's fix [1], because it fixes
the same issue earlier (in bootconfig/core) and has a well-documented
comment in the code.

[1] https://lore.kernel.org/all/20260626-bootconfig_using_tools-v7-1-24ab72139c29@debian.org/

BTW, I decided to have several branches for bootconfig and probes.

bootconfig/core is the core development branch, which is the main branch.
The patches in this branch are for development, including new features
and fixes (but fixes will be moved to */fixes soon).

bootconfig/fixes is a branch for managing fixes. These will be sent to
Linus soon (for urgent fixes), or after an -rc is released.

bootconfig/for-next is for new features or cleanups, for preparing the
next merge window, and for merge testing in linux-next.

If you write a patch, please check bootconfig/core first, and check
bootconfig/fixes if it is a fix.

Note: The core branch is regularly force-updated and actively rebased on
top of bootconfig/fixes. The for-next branch is updated less frequently,
but may be force-updated to fix merge conflicts etc. The fixes branch
should be solid, but if I make a mistake I will force-update it before
sending the PR.

Thank you,

> 
> Thank you,
> 
> > ---
> >  lib/bootconfig.c | 13 +++++++++----
> >  1 file changed, 9 insertions(+), 4 deletions(-)
> > 
> > Changes since v1:
> > - Got the big guns out! :) (see Assisted-by).
> > - Addressed review from Masami Hiramatsu and Breno Leitao.
> > 
> > diff --git a/lib/bootconfig.c b/lib/bootconfig.c
> > index f445b7703fdd..c913259c80ce 100644
> > --- a/lib/bootconfig.c
> > +++ b/lib/bootconfig.c
> > @@ -427,8 +427,9 @@ static char xbc_namebuf[XBC_KEYLEN_MAX] __initdata;
> >  int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
> >  {
> >  	struct xbc_node *knode, *vnode;
> > -	char *end = buf + size;
> > +	char *end = buf ? buf + size : NULL;
> >  	const char *val, *q;
> > +	size_t len = 0;
> >  	int ret;
> >  
> >  	xbc_node_for_each_key_value(root, knode, val) {
> > @@ -442,7 +443,9 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
> >  			ret = snprintf(buf, rest(buf, end), "%s ", xbc_namebuf);
> >  			if (ret < 0)
> >  				return ret;
> > -			buf += ret;
> > +			len += ret;
> > +			if (buf)
> > +				buf += ret;
> >  			continue;
> >  		}
> >  		xbc_array_for_each_value(vnode, val) {
> > @@ -456,11 +459,13 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
> >  				       xbc_namebuf, q, val, q);
> >  			if (ret < 0)
> >  				return ret;
> > -			buf += ret;
> > +			len += ret;
> > +			if (buf)
> > +				buf += ret;
> >  		}
> >  	}
> >  
> > -	return buf - (end - size);
> > +	return len;
> >  }
> >  #undef rest
> >  
> > -- 
> > 2.53.0
> > 
> 
> 
> -- 
> Masami Hiramatsu (Google) <mhiramat@kernel.org>


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCH] riscv: probes: save original sp in rethook trampoline
From: Paul Walmsley @ 2026-07-01  0:51 UTC (permalink / raw)
  To: Martin Kaiser, Masami Hiramatsu
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Steven Rostedt,
	linux-riscv, linux-kernel, linux-trace-kernel
In-Reply-To: <20260701073335.548d8f0b435b1a5fb4e41a69@kernel.org>

On Wed, 1 Jul 2026, Masami Hiramatsu wrote:

> On Tue, 30 Jun 2026 21:40:03 +0200
> Martin Kaiser <martin@kaiser.cx> wrote:
> 
> > Reading a word from the stack in a kretprobe crashes a risc-v kernel.
> > 
> > $ cd /sys/kernel/tracing/
> > $ echo 'r n_tty_write $stack0' > dynamic_events
> > $ echo 1 > events/kprobes/enable
> > Unable to handle kernel paging request at virtual address 0000000200000128
> > ...
> > [<ffffffff80016d16>] regs_get_kernel_stack_nth+0x26/0x38
> > [<ffffffff80177196>] process_fetch_insn+0x3ee/0x760
> > [<ffffffff80177836>] kretprobe_trace_func+0x116/0x1f0
> > [<ffffffff8017795a>] kretprobe_dispatcher+0x4a/0x58
> > [<ffffffff8013572e>] kretprobe_rethook_handler+0x5e/0x90
> > [<ffffffff80180838>] rethook_trampoline_handler+0x70/0x108
> > [<ffffffff8001ba32>] arch_rethook_trampoline_callback+0x12/0x1c
> > [<ffffffff8001ba84>] arch_rethook_trampoline+0x48/0x94
> > [<ffffffff8067872a>] tty_write+0x1a/0x30
> > 
> > In regs_get_kernel_stack_nth, regs->sp contains an arbitrary value.
> > 
> > arch_rethook_trampoline saves the registers from the probed function in a
> > struct pt_regs. sp is not saved. Instead, sp is decremented for
> > arch_rethook_trampoline's local stack.
> > 
> > Fix this crash and save the original sp along with the other registers.
> > Use a0 as a temporary register, it is overwritten anyway.
> 
> Good catch!
> 
> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> 
> I would like this to be handled by the RISC-V maintainers.

Thanks, added a Fixes: tag and cc'ed stable, and queued for v7.2-rc.


- Paul


* [PATCH v4 2/7] mm/page_owner: add MR_NEVER to enum migrate_reason and use it for last_migrate_reason
From: Ye Liu @ 2026-07-01  1:22 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Steven Rostedt,
	Masami Hiramatsu, Vlastimil Babka, Jan Kiszka, Kieran Bingham
  Cc: Ye Liu, Zi Yan, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Gregory Price, Ying Huang, Alistair Popple,
	Mathieu Desnoyers, Suren Baghdasaryan, Michal Hocko,
	Brendan Jackman, Johannes Weiner, linux-mm, linux-kernel,
	linux-trace-kernel
In-Reply-To: <20260701012239.315262-1-ye.liu@linux.dev>

The last_migrate_reason field uses -1 as a sentinel value to mean "no
migration has happened".  Replace the four bare -1 occurrences by
adding a proper MR_NEVER member to enum migrate_reason, defining a
corresponding "never_migrated" string in the MIGRATE_REASON trace
macro, and updating the GDB page_owner script to use MR_NEVER instead
of the hardcoded -1 so that lx-dump-page-owner does not incorrectly
report unmigrated pages as migrated.
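
As a rough illustration of the sentinel change, here is a standalone
sketch. The enum is abbreviated (the real one has more members), and the
name table and last_migrate_reason_name() are hypothetical stand-ins for
the page_owner code touched below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Abbreviated copy of the enum for illustration only. */
enum migrate_reason {
	MR_COMPACTION,
	MR_DEMOTION,
	MR_DAMON,
	MR_NEVER,	/* page has never been migrated */
	MR_TYPES
};

/* Hypothetical name table; the sentinel gets a valid entry too. */
static const char *const migrate_reason_names[MR_TYPES] = {
	[MR_COMPACTION]	= "compaction",
	[MR_DEMOTION]	= "demotion",
	[MR_DAMON]	= "damon",
	[MR_NEVER]	= "never_migrated",
};

/* Return the last migrate reason as a string, or NULL if never migrated. */
static const char *last_migrate_reason_name(short last_migrate_reason)
{
	if (last_migrate_reason == MR_NEVER)
		return NULL;
	return migrate_reason_names[last_migrate_reason];
}
```

In this sketch the comparison names its intent, and a lookup that
forgets the check would still land on a valid table entry rather than
indexing the table with -1.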

No functional change.

Signed-off-by: Ye Liu <ye.liu@linux.dev>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
 include/linux/migrate_mode.h    | 1 +
 include/trace/events/migrate.h  | 3 ++-
 mm/page_owner.c                 | 8 ++++----
 scripts/gdb/linux/page_owner.py | 4 +++-
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index 265c4328b36a..05102d4d2490 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -25,6 +25,7 @@ enum migrate_reason {
 	MR_LONGTERM_PIN,
 	MR_DEMOTION,
 	MR_DAMON,
+	MR_NEVER,		/* page has never been migrated */
 	MR_TYPES
 };
 
diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
index cd01dd7b3640..11bc0aa14c7e 100644
--- a/include/trace/events/migrate.h
+++ b/include/trace/events/migrate.h
@@ -23,7 +23,8 @@
 	EM( MR_CONTIG_RANGE,	"contig_range")			\
 	EM( MR_LONGTERM_PIN,	"longterm_pin")			\
 	EM( MR_DEMOTION,	"demotion")			\
-	EMe(MR_DAMON,		"damon")
+	EM( MR_DAMON,		"damon")			\
+	EMe(MR_NEVER,		"never_migrated")
 
 /*
  * First define the enums in the above macros to be exported to userspace
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 342549891a8d..c2f43ab860eb 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -339,7 +339,7 @@ noinline void __set_page_owner(struct page *page, unsigned short order,
 	depot_stack_handle_t handle;
 
 	handle = save_stack(gfp_mask);
-	__update_page_owner_handle(page, handle, order, gfp_mask, -1,
+	__update_page_owner_handle(page, handle, order, gfp_mask, MR_NEVER,
 				   ts_nsec, current->pid, current->tgid,
 				   current->comm);
 	inc_stack_record_count(handle, gfp_mask, 1 << order);
@@ -596,7 +596,7 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
 	if (ret >= count)
 		goto err;
 
-	if (page_owner->last_migrate_reason != -1) {
+	if (page_owner->last_migrate_reason != MR_NEVER) {
 		ret += scnprintf(kbuf + ret, count - ret,
 			"Page has been migrated, last migrate reason: %s\n",
 			migrate_reason_names[page_owner->last_migrate_reason]);
@@ -667,7 +667,7 @@ void __dump_page_owner(const struct page *page)
 		stack_depot_print(handle);
 	}
 
-	if (page_owner->last_migrate_reason != -1)
+	if (page_owner->last_migrate_reason != MR_NEVER)
 		pr_alert("page has been migrated, last migrate reason: %s\n",
 			migrate_reason_names[page_owner->last_migrate_reason]);
 	page_ext_put(page_ext);
@@ -826,7 +826,7 @@ static void init_pages_in_zone(struct zone *zone)
 
 			/* Found early allocated page */
 			__update_page_owner_handle(page, early_handle, 0, 0,
-						   -1, local_clock(), current->pid,
+						   MR_NEVER, local_clock(), current->pid,
 						   current->tgid, current->comm);
 			count++;
 ext_put_continue:
diff --git a/scripts/gdb/linux/page_owner.py b/scripts/gdb/linux/page_owner.py
index 8e713a09cfe7..eeabaeed438b 100644
--- a/scripts/gdb/linux/page_owner.py
+++ b/scripts/gdb/linux/page_owner.py
@@ -34,6 +34,7 @@ class DumpPageOwner(gdb.Command):
     max_pfn = None
     p_ops = None
     migrate_reason_names = None
+    mr_never = None
 
     def __init__(self):
         super(DumpPageOwner, self).__init__("lx-dump-page-owner", gdb.COMMAND_SUPPORT)
@@ -65,6 +66,7 @@ class DumpPageOwner(gdb.Command):
         self.max_pfn = int(gdb.parse_and_eval("max_pfn"))
         self.page_ext_size = int(gdb.parse_and_eval("page_ext_size"))
         self.migrate_reason_names = gdb.parse_and_eval('migrate_reason_names')
+        self.mr_never = int(gdb.parse_and_eval('MR_NEVER'))
 
     def page_ext_invalid(self, page_ext):
         if page_ext == gdb.Value(0):
@@ -138,7 +140,7 @@ class DumpPageOwner(gdb.Command):
         else:
             gdb.write('page last free stack trace:\n')
             stackdepot.stack_depot_print(page_owner["free_handle"])
-        if page_owner['last_migrate_reason'] != -1:
+        if page_owner['last_migrate_reason'] != self.mr_never:
             gdb.write('page has been migrated, last migrate reason: %s\n' % self.migrate_reason_names[page_owner['last_migrate_reason']])
 
     def read_page_owner(self):
-- 
2.43.0



* [PATCH v4 3/7] mm: use enum migrate_reason instead of int for migration reason parameters
From: Ye Liu @ 2026-07-01  1:22 UTC (permalink / raw)
  To: Muchun Song, Oscar Salvador, Andrew Morton, David Hildenbrand,
	Steven Rostedt, Masami Hiramatsu, Vlastimil Babka
  Cc: Ye Liu, Zi Yan, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Gregory Price, Ying Huang, Alistair Popple,
	Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Mathieu Desnoyers,
	Brendan Jackman, Johannes Weiner, linux-mm, linux-kernel,
	linux-trace-kernel
In-Reply-To: <20260701012239.315262-1-ye.liu@linux.dev>

Replace all 'int reason' function parameters that carry migrate_reason
values with the proper 'enum migrate_reason' type.  This makes the
intent explicit and leverages compiler type checking.  The affected
subsystems are:

  - page_owner: __folio_set_owner_migrate_reason(),
                folio_set_owner_migrate_reason()
  - migrate: migrate_pages(), migrate_pages_sync(),
             migrate_pages_batch(), migrate_folios_move(),
             migrate_hugetlbs(), unmap_and_move_huge_page()
  - hugetlb: move_hugetlb_state(), htlb_allow_alloc_fallback()
  - trace: mm_migrate_pages and mm_migrate_pages_start events

The 'short last_migrate_reason' struct field and internal helper
parameter in page_owner are intentionally left as 'short' since they
store per-page metadata where size matters.
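
A minimal sketch of the resulting split between the enum-typed interface
and the 'short' storage might look like this. The enum is abbreviated,
and struct page_owner_sketch, set_migrate_reason() and
allow_fallback_sketch() are hypothetical. In particular, the fallback
policy shown is made up for the example and is not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Abbreviated copy of the enum for illustration only. */
enum migrate_reason {
	MR_COMPACTION,
	MR_MEMORY_FAILURE,
	MR_SYSCALL,
	MR_NEVER,
	MR_TYPES
};

/* Hypothetical per-page record: the field stays 'short' to keep it small. */
struct page_owner_sketch {
	short last_migrate_reason;
};

/* Callers pass the enum; the narrowing to short happens in one place. */
static void set_migrate_reason(struct page_owner_sketch *po,
			       enum migrate_reason reason)
{
	po->last_migrate_reason = (short)reason;
}

/*
 * With an enum parameter and no default label, a compiler warning such
 * as -Wswitch can point out members the switch forgot to handle.
 */
static bool allow_fallback_sketch(enum migrate_reason reason)
{
	switch (reason) {
	case MR_MEMORY_FAILURE:
	case MR_SYSCALL:
		return true;
	case MR_COMPACTION:
	case MR_NEVER:
	case MR_TYPES:
		return false;
	}
	return false;
}
```

The function signatures document what values are expected, while the
stored field keeps its compact size.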

No functional change.

Signed-off-by: Ye Liu <ye.liu@linux.dev>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
 include/linux/hugetlb.h        |  9 +++++----
 include/linux/migrate.h        |  6 ++++--
 include/linux/page_owner.h     |  7 ++++---
 include/trace/events/migrate.h |  8 ++++----
 mm/hugetlb.c                   |  3 ++-
 mm/migrate.c                   | 12 ++++++------
 mm/page_owner.c                |  2 +-
 7 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 2abaf99321e9..fa828232dfcc 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -154,7 +154,8 @@ long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
 bool folio_isolate_hugetlb(struct folio *folio, struct list_head *list);
 int get_hwpoison_hugetlb_folio(struct folio *folio, bool *hugetlb, bool unpoison);
 void folio_putback_hugetlb(struct folio *folio);
-void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio, int reason);
+void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio,
+			enum migrate_reason reason);
 void hugetlb_fix_reserve_counts(struct inode *inode);
 extern struct mutex *hugetlb_fault_mutex_table;
 u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx);
@@ -424,7 +425,7 @@ static inline void folio_putback_hugetlb(struct folio *folio)
 }
 
 static inline void move_hugetlb_state(struct folio *old_folio,
-					struct folio *new_folio, int reason)
+					struct folio *new_folio, enum migrate_reason reason)
 {
 }
 
@@ -956,7 +957,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
 	return modified_mask;
 }
 
-static inline bool htlb_allow_alloc_fallback(int reason)
+static inline bool htlb_allow_alloc_fallback(enum migrate_reason reason)
 {
 	bool allowed_fallback = false;
 
@@ -1238,7 +1239,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
 	return 0;
 }
 
-static inline bool htlb_allow_alloc_fallback(int reason)
+static inline bool htlb_allow_alloc_fallback(enum migrate_reason reason)
 {
 	return false;
 }
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index d5af2b7f577b..1f83924615d6 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -57,7 +57,8 @@ void putback_movable_pages(struct list_head *l);
 int migrate_folio(struct address_space *mapping, struct folio *dst,
 		struct folio *src, enum migrate_mode mode);
 int migrate_pages(struct list_head *l, new_folio_t new, free_folio_t free,
-		  unsigned long private, enum migrate_mode mode, int reason,
+		  unsigned long private, enum migrate_mode mode,
+		  enum migrate_reason reason,
 		  unsigned int *ret_succeeded);
 struct folio *alloc_migration_target(struct folio *src, unsigned long private);
 bool isolate_movable_ops_page(struct page *page, isolate_mode_t mode);
@@ -77,7 +78,8 @@ int set_movable_ops(const struct movable_operations *ops, enum pagetype type);
 static inline void putback_movable_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_folio_t new,
 		free_folio_t free, unsigned long private,
-		enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
+		enum migrate_mode mode, enum migrate_reason reason,
+		unsigned int *ret_succeeded)
 	{ return -ENOSYS; }
 static inline struct folio *alloc_migration_target(struct folio *src,
 		unsigned long private)
diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
index 3328357f6dba..9fe51dfccf26 100644
--- a/include/linux/page_owner.h
+++ b/include/linux/page_owner.h
@@ -3,6 +3,7 @@
 #define __LINUX_PAGE_OWNER_H
 
 #include <linux/jump_label.h>
+#include <linux/migrate_mode.h>
 
 #ifdef CONFIG_PAGE_OWNER
 extern struct static_key_false page_owner_inited;
@@ -14,7 +15,7 @@ extern void __set_page_owner(struct page *page,
 extern void __split_page_owner(struct page *page, int old_order,
 			int new_order);
 extern void __folio_copy_owner(struct folio *newfolio, struct folio *old);
-extern void __folio_set_owner_migrate_reason(struct folio *folio, int reason);
+extern void __folio_set_owner_migrate_reason(struct folio *folio, enum migrate_reason reason);
 extern void __dump_page_owner(const struct page *page);
 extern void pagetypeinfo_showmixedcount_print(struct seq_file *m,
 					pg_data_t *pgdat, struct zone *zone);
@@ -43,7 +44,7 @@ static inline void folio_copy_owner(struct folio *newfolio, struct folio *old)
 	if (static_branch_unlikely(&page_owner_inited))
 		__folio_copy_owner(newfolio, old);
 }
-static inline void folio_set_owner_migrate_reason(struct folio *folio, int reason)
+static inline void folio_set_owner_migrate_reason(struct folio *folio, enum migrate_reason reason)
 {
 	if (static_branch_unlikely(&page_owner_inited))
 		__folio_set_owner_migrate_reason(folio, reason);
@@ -68,7 +69,7 @@ static inline void split_page_owner(struct page *page, int old_order,
 static inline void folio_copy_owner(struct folio *newfolio, struct folio *folio)
 {
 }
-static inline void folio_set_owner_migrate_reason(struct folio *folio, int reason)
+static inline void folio_set_owner_migrate_reason(struct folio *folio, enum migrate_reason reason)
 {
 }
 static inline void dump_page_owner(const struct page *page)
diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
index 11bc0aa14c7e..15ee2ef201b5 100644
--- a/include/trace/events/migrate.h
+++ b/include/trace/events/migrate.h
@@ -52,7 +52,7 @@ TRACE_EVENT(mm_migrate_pages,
 	TP_PROTO(unsigned long succeeded, unsigned long failed,
 		 unsigned long thp_succeeded, unsigned long thp_failed,
 		 unsigned long thp_split, unsigned long large_folio_split,
-		 enum migrate_mode mode, int reason),
+		 enum migrate_mode mode, enum migrate_reason reason),
 
 	TP_ARGS(succeeded, failed, thp_succeeded, thp_failed,
 		thp_split, large_folio_split, mode, reason),
@@ -65,7 +65,7 @@ TRACE_EVENT(mm_migrate_pages,
 		__field(	unsigned long,		thp_split)
 		__field(	unsigned long,		large_folio_split)
 		__field(	enum migrate_mode,	mode)
-		__field(	int,			reason)
+		__field(	enum migrate_reason,	reason)
 	),
 
 	TP_fast_assign(
@@ -92,13 +92,13 @@ TRACE_EVENT(mm_migrate_pages,
 
 TRACE_EVENT(mm_migrate_pages_start,
 
-	TP_PROTO(enum migrate_mode mode, int reason),
+	TP_PROTO(enum migrate_mode mode, enum migrate_reason reason),
 
 	TP_ARGS(mode, reason),
 
 	TP_STRUCT__entry(
 		__field(enum migrate_mode, mode)
-		__field(int, reason)
+		__field(enum migrate_reason, reason)
 	),
 
 	TP_fast_assign(
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 571212b80835..17732d1fdc5e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7182,7 +7182,8 @@ void folio_putback_hugetlb(struct folio *folio)
 	folio_put(folio);
 }
 
-void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio, int reason)
+void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio,
+			enum migrate_reason reason)
 {
 	struct hstate *h = folio_hstate(old_folio);
 
diff --git a/mm/migrate.c b/mm/migrate.c
index d9b23909d716..49e10feeb094 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1469,7 +1469,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 		free_folio_t put_new_folio, unsigned long private,
 		struct folio *src, int force, enum migrate_mode mode,
-		int reason, struct list_head *ret)
+		enum migrate_reason reason, struct list_head *ret)
 {
 	struct folio *dst;
 	int rc = -EAGAIN;
@@ -1626,7 +1626,7 @@ struct migrate_pages_stats {
  */
 static int migrate_hugetlbs(struct list_head *from, new_folio_t get_new_folio,
 			    free_folio_t put_new_folio, unsigned long private,
-			    enum migrate_mode mode, int reason,
+			    enum migrate_mode mode, enum migrate_reason reason,
 			    struct migrate_pages_stats *stats,
 			    struct list_head *ret_folios)
 {
@@ -1716,7 +1716,7 @@ static int migrate_hugetlbs(struct list_head *from, new_folio_t get_new_folio,
 static void migrate_folios_move(struct list_head *src_folios,
 		struct list_head *dst_folios,
 		free_folio_t put_new_folio, unsigned long private,
-		enum migrate_mode mode, int reason,
+		enum migrate_mode mode, enum migrate_reason reason,
 		struct list_head *ret_folios,
 		struct migrate_pages_stats *stats,
 		int *retry, int *thp_retry, int *nr_failed,
@@ -1799,7 +1799,7 @@ static void migrate_folios_undo(struct list_head *src_folios,
  */
 static int migrate_pages_batch(struct list_head *from,
 		new_folio_t get_new_folio, free_folio_t put_new_folio,
-		unsigned long private, enum migrate_mode mode, int reason,
+		unsigned long private, enum migrate_mode mode, enum migrate_reason reason,
 		struct list_head *ret_folios, struct list_head *split_folios,
 		struct migrate_pages_stats *stats, int nr_pass)
 {
@@ -2011,7 +2011,7 @@ static int migrate_pages_batch(struct list_head *from,
 
 static int migrate_pages_sync(struct list_head *from, new_folio_t get_new_folio,
 		free_folio_t put_new_folio, unsigned long private,
-		enum migrate_mode mode, int reason,
+		enum migrate_mode mode, enum migrate_reason reason,
 		struct list_head *ret_folios, struct list_head *split_folios,
 		struct migrate_pages_stats *stats)
 {
@@ -2088,7 +2088,7 @@ static int migrate_pages_sync(struct list_head *from, new_folio_t get_new_folio,
  */
 int migrate_pages(struct list_head *from, new_folio_t get_new_folio,
 		free_folio_t put_new_folio, unsigned long private,
-		enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
+		enum migrate_mode mode, enum migrate_reason reason, unsigned int *ret_succeeded)
 {
 	int rc, rc_gather;
 	int nr_pages;
diff --git a/mm/page_owner.c b/mm/page_owner.c
index c2f43ab860eb..4e352941a6e2 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -345,7 +345,7 @@ noinline void __set_page_owner(struct page *page, unsigned short order,
 	inc_stack_record_count(handle, gfp_mask, 1 << order);
 }
 
-void __folio_set_owner_migrate_reason(struct folio *folio, int reason)
+void __folio_set_owner_migrate_reason(struct folio *folio, enum migrate_reason reason)
 {
 	struct page_ext *page_ext = page_ext_get(&folio->page);
 	struct page_owner *page_owner;
-- 
2.43.0



* Re: [syzbot] [trace?] general protection fault in mtree_load
From: syzbot @ 2026-07-01  1:32 UTC (permalink / raw)
  To: bp, dave.hansen, hpa, linux-kernel, linux-trace-kernel, mhiramat,
	mingo, oleg, olsajiri, peterz, syzkaller-bugs, tglx, x86
In-Reply-To: <6a38dd47.713c5d62.148f7.000c.GAE@google.com>

syzbot has found a reproducer for the following issue on:

HEAD commit:    dc59e4fea9d8 Linux 7.2-rc1
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15c7d61c580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=f9bf5d2bfae96234
dashboard link: https://syzkaller.appspot.com/bug?extid=61ce80689253f42e6d80
compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12bbb11c580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=130bf4ea580000

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-dc59e4fe.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/1bc8aed8d2e8/vmlinux-dc59e4fe.xz
kernel image: https://storage.googleapis.com/syzbot-assets/0b1fdfc4aa09/bzImage-dc59e4fe.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+61ce80689253f42e6d80@syzkaller.appspotmail.com

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000011: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000088-0x000000000000008f]
CPU: 3 UID: 0 PID: 6107 Comm: syz.0.85 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:mas_root lib/maple_tree.c:759 [inline]
RIP: 0010:mas_start lib/maple_tree.c:1179 [inline]
RIP: 0010:mtree_load+0x16d/0xa90 lib/maple_tree.c:5657
Code: 00 00 00 00 48 c7 44 24 78 ff ff ff ff e8 5b c8 74 f6 48 8b 5c 24 50 c6 84 24 9c 00 00 00 00 48 8d 7b 48 48 89 f8 48 c1 e8 03 <42> 80 3c 20 00 0f 85 d6 08 00 00 48 8b 5b 48 e8 3f 1a 08 00 31 ff
RSP: 0018:ffffc9000412f740 EFLAGS: 00010206
RAX: 0000000000000011 RBX: 0000000000000040 RCX: ffffffff8b94b796
RDX: ffff888035462540 RSI: ffffffff8b94b7c5 RDI: 0000000000000088
RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: dffffc0000000000
R13: ffff888013522280 R14: 0000200000ffc007 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff8880d63e0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc8aaa4cff8 CR3: 0000000038832000 CR4: 0000000000352ef0
Call Trace:
 <TASK>
 vma_lookup include/linux/mm.h:4238 [inline]
 __in_uprobe_trampoline arch/x86/kernel/uprobes.c:766 [inline]
 __is_optimized arch/x86/kernel/uprobes.c:1056 [inline]
 is_optimized arch/x86/kernel/uprobes.c:1067 [inline]
 set_orig_insn+0x1ec/0x2a0 arch/x86/kernel/uprobes.c:1098
 remove_breakpoint kernel/events/uprobes.c:1185 [inline]
 register_for_each_vma+0xbb7/0xdb0 kernel/events/uprobes.c:1318
 uprobe_unregister_nosync+0x12a/0x1c0 kernel/events/uprobes.c:1343
 bpf_uprobe_unregister kernel/trace/bpf_trace.c:2982 [inline]
 bpf_uprobe_multi_link_release+0xb3/0x1c0 kernel/trace/bpf_trace.c:2993
 bpf_link_free+0xec/0x4a0 kernel/bpf/syscall.c:3395
 bpf_link_put_direct kernel/bpf/syscall.c:3448 [inline]
 bpf_link_release+0x5d/0x80 kernel/bpf/syscall.c:3455
 __fput+0x3ff/0xb50 fs/file_table.c:512
 task_work_run+0x150/0x240 kernel/task_work.c:233
 exit_task_work include/linux/task_work.h:40 [inline]
 do_exit+0x951/0x2ae0 kernel/exit.c:1004
 do_group_exit+0xd5/0x2a0 kernel/exit.c:1147
 get_signal+0x1ec7/0x21e0 kernel/signal.c:3038
 arch_do_signal_or_restart+0x91/0x7e0 arch/x86/kernel/signal.c:337
 __exit_to_user_mode_loop kernel/entry/common.c:66 [inline]
 exit_to_user_mode_loop+0x139/0x6f0 kernel/entry/common.c:101
 __exit_to_user_mode_prepare include/linux/irq-entry-common.h:207 [inline]
 syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:230 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:318 [inline]
 do_syscall_64+0x666/0x870 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fc8a9b9de59
Code: Unable to access opcode bytes at 0x7fc8a9b9de2f.
RSP: 002b:00007fc8aaa4d0e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 00007fc8a9e25fa8 RCX: 00007fc8a9b9de59
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007fc8a9e25fa8
RBP: 00007fc8a9e25fa0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fc8a9e26038 R14: 00007fff99db45c0 R15: 00007fff99db46a8
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:mas_root lib/maple_tree.c:759 [inline]
RIP: 0010:mas_start lib/maple_tree.c:1179 [inline]
RIP: 0010:mtree_load+0x16d/0xa90 lib/maple_tree.c:5657
Code: 00 00 00 00 48 c7 44 24 78 ff ff ff ff e8 5b c8 74 f6 48 8b 5c 24 50 c6 84 24 9c 00 00 00 00 48 8d 7b 48 48 89 f8 48 c1 e8 03 <42> 80 3c 20 00 0f 85 d6 08 00 00 48 8b 5b 48 e8 3f 1a 08 00 31 ff
RSP: 0018:ffffc9000412f740 EFLAGS: 00010206

RAX: 0000000000000011 RBX: 0000000000000040 RCX: ffffffff8b94b796
RDX: ffff888035462540 RSI: ffffffff8b94b7c5 RDI: 0000000000000088
RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: dffffc0000000000
R13: ffff888013522280 R14: 0000200000ffc007 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff8880d63e0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc8aaa4cff8 CR3: 0000000038832000 CR4: 0000000000352ef0
----------------
Code disassembly (best guess):
   0:	00 00                	add    %al,(%rax)
   2:	00 00                	add    %al,(%rax)
   4:	48 c7 44 24 78 ff ff 	movq   $0xffffffffffffffff,0x78(%rsp)
   b:	ff ff
   d:	e8 5b c8 74 f6       	call   0xf674c86d
  12:	48 8b 5c 24 50       	mov    0x50(%rsp),%rbx
  17:	c6 84 24 9c 00 00 00 	movb   $0x0,0x9c(%rsp)
  1e:	00
  1f:	48 8d 7b 48          	lea    0x48(%rbx),%rdi
  23:	48 89 f8             	mov    %rdi,%rax
  26:	48 c1 e8 03          	shr    $0x3,%rax
* 2a:	42 80 3c 20 00       	cmpb   $0x0,(%rax,%r12,1) <-- trapping instruction
  2f:	0f 85 d6 08 00 00    	jne    0x90b
  35:	48 8b 5b 48          	mov    0x48(%rbx),%rbx
  39:	e8 3f 1a 08 00       	call   0x81a7d
  3e:	31 ff                	xor    %edi,%edi


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.


* [PATCH v5 2/9] mm/page_owner: add MR_NEVER to enum migrate_reason and use it for last_migrate_reason
From: Ye Liu @ 2026-07-01  6:10 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Steven Rostedt,
	Masami Hiramatsu, Vlastimil Babka, Jan Kiszka, Kieran Bingham
  Cc: Ye Liu, Zi Yan, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Gregory Price, Ying Huang, Alistair Popple,
	Mathieu Desnoyers, Suren Baghdasaryan, Michal Hocko,
	Brendan Jackman, Johannes Weiner, linux-mm, linux-kernel,
	linux-trace-kernel
In-Reply-To: <20260701061101.344679-1-ye.liu@linux.dev>

The last_migrate_reason field uses -1 as a sentinel value to mean "no
migration has happened".  Replace the four bare -1 occurrences by
adding a proper MR_NEVER member to enum migrate_reason, defining a
corresponding "never_migrated" string in the MIGRATE_REASON trace
macro, and updating the GDB page_owner script to use MR_NEVER instead
of the hardcoded -1 so that lx-dump-page-owner does not incorrectly
report unmigrated pages as migrated.

No functional change.

Signed-off-by: Ye Liu <ye.liu@linux.dev>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
 include/linux/migrate_mode.h    | 1 +
 include/trace/events/migrate.h  | 3 ++-
 mm/page_owner.c                 | 8 ++++----
 scripts/gdb/linux/page_owner.py | 4 +++-
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index 265c4328b36a..05102d4d2490 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -25,6 +25,7 @@ enum migrate_reason {
 	MR_LONGTERM_PIN,
 	MR_DEMOTION,
 	MR_DAMON,
+	MR_NEVER,		/* page has never been migrated */
 	MR_TYPES
 };
 
diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
index cd01dd7b3640..11bc0aa14c7e 100644
--- a/include/trace/events/migrate.h
+++ b/include/trace/events/migrate.h
@@ -23,7 +23,8 @@
 	EM( MR_CONTIG_RANGE,	"contig_range")			\
 	EM( MR_LONGTERM_PIN,	"longterm_pin")			\
 	EM( MR_DEMOTION,	"demotion")			\
-	EMe(MR_DAMON,		"damon")
+	EM( MR_DAMON,		"damon")			\
+	EMe(MR_NEVER,		"never_migrated")
 
 /*
  * First define the enums in the above macros to be exported to userspace
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 342549891a8d..c2f43ab860eb 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -339,7 +339,7 @@ noinline void __set_page_owner(struct page *page, unsigned short order,
 	depot_stack_handle_t handle;
 
 	handle = save_stack(gfp_mask);
-	__update_page_owner_handle(page, handle, order, gfp_mask, -1,
+	__update_page_owner_handle(page, handle, order, gfp_mask, MR_NEVER,
 				   ts_nsec, current->pid, current->tgid,
 				   current->comm);
 	inc_stack_record_count(handle, gfp_mask, 1 << order);
@@ -596,7 +596,7 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
 	if (ret >= count)
 		goto err;
 
-	if (page_owner->last_migrate_reason != -1) {
+	if (page_owner->last_migrate_reason != MR_NEVER) {
 		ret += scnprintf(kbuf + ret, count - ret,
 			"Page has been migrated, last migrate reason: %s\n",
 			migrate_reason_names[page_owner->last_migrate_reason]);
@@ -667,7 +667,7 @@ void __dump_page_owner(const struct page *page)
 		stack_depot_print(handle);
 	}
 
-	if (page_owner->last_migrate_reason != -1)
+	if (page_owner->last_migrate_reason != MR_NEVER)
 		pr_alert("page has been migrated, last migrate reason: %s\n",
 			migrate_reason_names[page_owner->last_migrate_reason]);
 	page_ext_put(page_ext);
@@ -826,7 +826,7 @@ static void init_pages_in_zone(struct zone *zone)
 
 			/* Found early allocated page */
 			__update_page_owner_handle(page, early_handle, 0, 0,
-						   -1, local_clock(), current->pid,
+						   MR_NEVER, local_clock(), current->pid,
 						   current->tgid, current->comm);
 			count++;
 ext_put_continue:
diff --git a/scripts/gdb/linux/page_owner.py b/scripts/gdb/linux/page_owner.py
index 8e713a09cfe7..eeabaeed438b 100644
--- a/scripts/gdb/linux/page_owner.py
+++ b/scripts/gdb/linux/page_owner.py
@@ -34,6 +34,7 @@ class DumpPageOwner(gdb.Command):
     max_pfn = None
     p_ops = None
     migrate_reason_names = None
+    mr_never = None
 
     def __init__(self):
         super(DumpPageOwner, self).__init__("lx-dump-page-owner", gdb.COMMAND_SUPPORT)
@@ -65,6 +66,7 @@ class DumpPageOwner(gdb.Command):
         self.max_pfn = int(gdb.parse_and_eval("max_pfn"))
         self.page_ext_size = int(gdb.parse_and_eval("page_ext_size"))
         self.migrate_reason_names = gdb.parse_and_eval('migrate_reason_names')
+        self.mr_never = int(gdb.parse_and_eval('MR_NEVER'))
 
     def page_ext_invalid(self, page_ext):
         if page_ext == gdb.Value(0):
@@ -138,7 +140,7 @@ class DumpPageOwner(gdb.Command):
         else:
             gdb.write('page last free stack trace:\n')
             stackdepot.stack_depot_print(page_owner["free_handle"])
-        if page_owner['last_migrate_reason'] != -1:
+        if page_owner['last_migrate_reason'] != self.mr_never:
             gdb.write('page has been migrated, last migrate reason: %s\n' % self.migrate_reason_names[page_owner['last_migrate_reason']])
 
     def read_page_owner(self):
-- 
2.43.0



* [PATCH v5 3/9] mm: use enum migrate_reason instead of int for migration reason parameters
From: Ye Liu @ 2026-07-01  6:10 UTC (permalink / raw)
  To: Muchun Song, Oscar Salvador, Andrew Morton, David Hildenbrand,
	Steven Rostedt, Masami Hiramatsu, Vlastimil Babka
  Cc: Ye Liu, Zi Yan, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Gregory Price, Ying Huang, Alistair Popple,
	Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Mathieu Desnoyers,
	Brendan Jackman, Johannes Weiner, linux-mm, linux-kernel,
	linux-trace-kernel
In-Reply-To: <20260701061101.344679-1-ye.liu@linux.dev>

Replace all 'int reason' function parameters that carry migrate_reason
values with the proper 'enum migrate_reason' type.  This makes the
intent explicit and leverages compiler type checking.  The affected
subsystems are:

  - page_owner: __folio_set_owner_migrate_reason(),
                folio_set_owner_migrate_reason()
  - migrate: migrate_pages(), migrate_pages_sync(),
             migrate_pages_batch(), migrate_folios_move(),
             migrate_hugetlbs(), unmap_and_move_huge_page()
  - hugetlb: move_hugetlb_state(), htlb_allow_alloc_fallback()
  - trace: mm_migrate_pages and mm_migrate_pages_start events

The 'short last_migrate_reason' struct field and internal helper
parameter in page_owner are intentionally left as 'short' since they
store per-page metadata where size matters.

No functional change.

Signed-off-by: Ye Liu <ye.liu@linux.dev>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
---
 include/linux/hugetlb.h        |  9 +++++----
 include/linux/migrate.h        |  6 ++++--
 include/linux/page_owner.h     |  7 ++++---
 include/trace/events/migrate.h |  8 ++++----
 mm/hugetlb.c                   |  3 ++-
 mm/migrate.c                   | 12 ++++++------
 mm/page_owner.c                |  2 +-
 7 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 2abaf99321e9..fa828232dfcc 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -154,7 +154,8 @@ long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
 bool folio_isolate_hugetlb(struct folio *folio, struct list_head *list);
 int get_hwpoison_hugetlb_folio(struct folio *folio, bool *hugetlb, bool unpoison);
 void folio_putback_hugetlb(struct folio *folio);
-void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio, int reason);
+void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio,
+			enum migrate_reason reason);
 void hugetlb_fix_reserve_counts(struct inode *inode);
 extern struct mutex *hugetlb_fault_mutex_table;
 u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx);
@@ -424,7 +425,7 @@ static inline void folio_putback_hugetlb(struct folio *folio)
 }
 
 static inline void move_hugetlb_state(struct folio *old_folio,
-					struct folio *new_folio, int reason)
+					struct folio *new_folio, enum migrate_reason reason)
 {
 }
 
@@ -956,7 +957,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
 	return modified_mask;
 }
 
-static inline bool htlb_allow_alloc_fallback(int reason)
+static inline bool htlb_allow_alloc_fallback(enum migrate_reason reason)
 {
 	bool allowed_fallback = false;
 
@@ -1238,7 +1239,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
 	return 0;
 }
 
-static inline bool htlb_allow_alloc_fallback(int reason)
+static inline bool htlb_allow_alloc_fallback(enum migrate_reason reason)
 {
 	return false;
 }
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index d5af2b7f577b..1f83924615d6 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -57,7 +57,8 @@ void putback_movable_pages(struct list_head *l);
 int migrate_folio(struct address_space *mapping, struct folio *dst,
 		struct folio *src, enum migrate_mode mode);
 int migrate_pages(struct list_head *l, new_folio_t new, free_folio_t free,
-		  unsigned long private, enum migrate_mode mode, int reason,
+		  unsigned long private, enum migrate_mode mode,
+		  enum migrate_reason reason,
 		  unsigned int *ret_succeeded);
 struct folio *alloc_migration_target(struct folio *src, unsigned long private);
 bool isolate_movable_ops_page(struct page *page, isolate_mode_t mode);
@@ -77,7 +78,8 @@ int set_movable_ops(const struct movable_operations *ops, enum pagetype type);
 static inline void putback_movable_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_folio_t new,
 		free_folio_t free, unsigned long private,
-		enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
+		enum migrate_mode mode, enum migrate_reason reason,
+		unsigned int *ret_succeeded)
 	{ return -ENOSYS; }
 static inline struct folio *alloc_migration_target(struct folio *src,
 		unsigned long private)
diff --git a/include/linux/page_owner.h b/include/linux/page_owner.h
index 3328357f6dba..9fe51dfccf26 100644
--- a/include/linux/page_owner.h
+++ b/include/linux/page_owner.h
@@ -3,6 +3,7 @@
 #define __LINUX_PAGE_OWNER_H
 
 #include <linux/jump_label.h>
+#include <linux/migrate_mode.h>
 
 #ifdef CONFIG_PAGE_OWNER
 extern struct static_key_false page_owner_inited;
@@ -14,7 +15,7 @@ extern void __set_page_owner(struct page *page,
 extern void __split_page_owner(struct page *page, int old_order,
 			int new_order);
 extern void __folio_copy_owner(struct folio *newfolio, struct folio *old);
-extern void __folio_set_owner_migrate_reason(struct folio *folio, int reason);
+extern void __folio_set_owner_migrate_reason(struct folio *folio, enum migrate_reason reason);
 extern void __dump_page_owner(const struct page *page);
 extern void pagetypeinfo_showmixedcount_print(struct seq_file *m,
 					pg_data_t *pgdat, struct zone *zone);
@@ -43,7 +44,7 @@ static inline void folio_copy_owner(struct folio *newfolio, struct folio *old)
 	if (static_branch_unlikely(&page_owner_inited))
 		__folio_copy_owner(newfolio, old);
 }
-static inline void folio_set_owner_migrate_reason(struct folio *folio, int reason)
+static inline void folio_set_owner_migrate_reason(struct folio *folio, enum migrate_reason reason)
 {
 	if (static_branch_unlikely(&page_owner_inited))
 		__folio_set_owner_migrate_reason(folio, reason);
@@ -68,7 +69,7 @@ static inline void split_page_owner(struct page *page, int old_order,
 static inline void folio_copy_owner(struct folio *newfolio, struct folio *folio)
 {
 }
-static inline void folio_set_owner_migrate_reason(struct folio *folio, int reason)
+static inline void folio_set_owner_migrate_reason(struct folio *folio, enum migrate_reason reason)
 {
 }
 static inline void dump_page_owner(const struct page *page)
diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
index 11bc0aa14c7e..15ee2ef201b5 100644
--- a/include/trace/events/migrate.h
+++ b/include/trace/events/migrate.h
@@ -52,7 +52,7 @@ TRACE_EVENT(mm_migrate_pages,
 	TP_PROTO(unsigned long succeeded, unsigned long failed,
 		 unsigned long thp_succeeded, unsigned long thp_failed,
 		 unsigned long thp_split, unsigned long large_folio_split,
-		 enum migrate_mode mode, int reason),
+		 enum migrate_mode mode, enum migrate_reason reason),
 
 	TP_ARGS(succeeded, failed, thp_succeeded, thp_failed,
 		thp_split, large_folio_split, mode, reason),
@@ -65,7 +65,7 @@ TRACE_EVENT(mm_migrate_pages,
 		__field(	unsigned long,		thp_split)
 		__field(	unsigned long,		large_folio_split)
 		__field(	enum migrate_mode,	mode)
-		__field(	int,			reason)
+		__field(	enum migrate_reason,	reason)
 	),
 
 	TP_fast_assign(
@@ -92,13 +92,13 @@ TRACE_EVENT(mm_migrate_pages,
 
 TRACE_EVENT(mm_migrate_pages_start,
 
-	TP_PROTO(enum migrate_mode mode, int reason),
+	TP_PROTO(enum migrate_mode mode, enum migrate_reason reason),
 
 	TP_ARGS(mode, reason),
 
 	TP_STRUCT__entry(
 		__field(enum migrate_mode, mode)
-		__field(int, reason)
+		__field(enum migrate_reason, reason)
 	),
 
 	TP_fast_assign(
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 571212b80835..17732d1fdc5e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7182,7 +7182,8 @@ void folio_putback_hugetlb(struct folio *folio)
 	folio_put(folio);
 }
 
-void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio, int reason)
+void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio,
+			enum migrate_reason reason)
 {
 	struct hstate *h = folio_hstate(old_folio);
 
diff --git a/mm/migrate.c b/mm/migrate.c
index d9b23909d716..49e10feeb094 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1469,7 +1469,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 		free_folio_t put_new_folio, unsigned long private,
 		struct folio *src, int force, enum migrate_mode mode,
-		int reason, struct list_head *ret)
+		enum migrate_reason reason, struct list_head *ret)
 {
 	struct folio *dst;
 	int rc = -EAGAIN;
@@ -1626,7 +1626,7 @@ struct migrate_pages_stats {
  */
 static int migrate_hugetlbs(struct list_head *from, new_folio_t get_new_folio,
 			    free_folio_t put_new_folio, unsigned long private,
-			    enum migrate_mode mode, int reason,
+			    enum migrate_mode mode, enum migrate_reason reason,
 			    struct migrate_pages_stats *stats,
 			    struct list_head *ret_folios)
 {
@@ -1716,7 +1716,7 @@ static int migrate_hugetlbs(struct list_head *from, new_folio_t get_new_folio,
 static void migrate_folios_move(struct list_head *src_folios,
 		struct list_head *dst_folios,
 		free_folio_t put_new_folio, unsigned long private,
-		enum migrate_mode mode, int reason,
+		enum migrate_mode mode, enum migrate_reason reason,
 		struct list_head *ret_folios,
 		struct migrate_pages_stats *stats,
 		int *retry, int *thp_retry, int *nr_failed,
@@ -1799,7 +1799,7 @@ static void migrate_folios_undo(struct list_head *src_folios,
  */
 static int migrate_pages_batch(struct list_head *from,
 		new_folio_t get_new_folio, free_folio_t put_new_folio,
-		unsigned long private, enum migrate_mode mode, int reason,
+		unsigned long private, enum migrate_mode mode, enum migrate_reason reason,
 		struct list_head *ret_folios, struct list_head *split_folios,
 		struct migrate_pages_stats *stats, int nr_pass)
 {
@@ -2011,7 +2011,7 @@ static int migrate_pages_batch(struct list_head *from,
 
 static int migrate_pages_sync(struct list_head *from, new_folio_t get_new_folio,
 		free_folio_t put_new_folio, unsigned long private,
-		enum migrate_mode mode, int reason,
+		enum migrate_mode mode, enum migrate_reason reason,
 		struct list_head *ret_folios, struct list_head *split_folios,
 		struct migrate_pages_stats *stats)
 {
@@ -2088,7 +2088,7 @@ static int migrate_pages_sync(struct list_head *from, new_folio_t get_new_folio,
  */
 int migrate_pages(struct list_head *from, new_folio_t get_new_folio,
 		free_folio_t put_new_folio, unsigned long private,
-		enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
+		enum migrate_mode mode, enum migrate_reason reason, unsigned int *ret_succeeded)
 {
 	int rc, rc_gather;
 	int nr_pages;
diff --git a/mm/page_owner.c b/mm/page_owner.c
index c2f43ab860eb..4e352941a6e2 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -345,7 +345,7 @@ noinline void __set_page_owner(struct page *page, unsigned short order,
 	inc_stack_record_count(handle, gfp_mask, 1 << order);
 }
 
-void __folio_set_owner_migrate_reason(struct folio *folio, int reason)
+void __folio_set_owner_migrate_reason(struct folio *folio, enum migrate_reason reason)
 {
 	struct page_ext *page_ext = page_ext_get(&folio->page);
 	struct page_owner *page_owner;
-- 
2.43.0



* Re: [PATCH v3] ufs: core: add hba parameter to trace events
From: Peter Wang (王信友) @ 2026-07-01  6:11 UTC (permalink / raw)
  To: rostedt@goodmis.org
  Cc: linux-trace-kernel@vger.kernel.org,
	CC Chou (周志杰), jejb@linux.ibm.com,
	bvanassche@acm.org, linux-scsi@vger.kernel.org,
	linux-mediatek@lists.infradead.org,
	Chaotian Jing (井朝天),
	Eddie Huang (黃智傑),
	Qilin Tan (谭麒麟), Lin Gui (桂林),
	Yi-fan Peng (彭羿凡), alim.akhtar@samsung.com,
	Jiajie Hao (郝加节),
	Naomi Chu (朱詠田),
	Alice Chao (趙珮均),
	Ed Tsai (蔡宗軒), wsd_upstream,
	avri.altman@wdc.com, martin.petersen@oracle.com,
	Chun-Hung Wu (巫駿宏),
	Tun-yu Yu (游敦聿)
In-Reply-To: <20260630174949.16a9d867@gandalf.local.home>

On Tue, 2026-06-30 at 17:49 -0400, Steven Rostedt wrote:
> On Tue, 30 Jun 2026 16:56:12 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > > 
> > >     TP_printk("%s: gating state changed to %s",
> > > -           __get_str(dev_name),
> > > +           dev_name(__entry->hba->dev),
> > 
> > NO YOU CAN NOT DO THIS!!!!
> 
> This is why you should always Cc
> linux-trace-kernel@vger.kernel.org on any
> trace event updates. We look to catch bugs like this.
> 
> The below patch should fix it, and I'll send it as a proper patch
> soon:

Hi Steven,

Thank you for the reminder and for fixing this bug.
This was indeed not thoroughly considered.

However, I am curious: if the HBA is removed, implying that the 
storage would become unusable, might the system encounter an 
I/O hang or shutdown, potentially preventing its detection? 
Perhaps it's a theoretical issue that would not manifest 
in a real-world situation?

Thanks
Peter




* Re: [PATCH] ufs: core: tracing: Do not dereference pointers in TP_printk()
From: Peter Wang (王信友) @ 2026-07-01  6:12 UTC (permalink / raw)
  To: linux-scsi@vger.kernel.org, rostedt@goodmis.org,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
  Cc: mhiramat@kernel.org, James.Bottomley@HansenPartnership.com,
	avri.altman@sandisk.com, alim.akhtar@samsung.com,
	mathieu.desnoyers@efficios.com, martin.petersen@oracle.com,
	bvanassche@acm.org
In-Reply-To: <20260630185412.283c26c5@gandalf.local.home>

On Tue, 2026-06-30 at 18:54 -0400, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
> 
> The trace events in drivers/ufs/core/ufs_trace.h were converted to
> take a
> pointer to the hba structure as an argument for the tracepoint and
> then in
> TP_printk() the printing of the dev_name from the ring buffer was
> converted to using the dev dereferenced pointer from the hba saved
> pointer.
> 
> This is not allowed as the TP_printk() is executed at the time the
> trace
> event is read from /sys/kernel/tracing/trace file. That can happen
> literally, seconds, minutes, hours, weeks, days, or even months
> later!
> There is no guarantee that the hba pointer will still exist by the
> time it
> is dereferenced when the "trace" file is read.
> 
> Instead, save the device name from the hba pointer at the time the
> tracepoint is called and place it into the ring buffer event. Then
> the
> TP_printk() can read the name directly from the ring buffer and
> remove the
> possibility that it will read a freed pointer and crash the kernel.
> 
> This was detected when testing the trace event code that looks for
> TP_printk() parameters doing illegal dereferences[1]
> 
> [1]
> https://lore.kernel.org/all/20260630184836.74d477b6@gandalf.local.home/
> 
> Cc: stable@vger.kernel.org
> Fixes: 583e518e71003 ("scsi: ufs: core: Add hba parameter to trace
> events")
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Reviewed-by: Peter Wang <peter.wang@mediatek.com>
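
For readers following the thread, the fix described in the commit message above
can be illustrated with a small userspace sketch. This is not the kernel code
and not the actual patch: struct fake_dev, struct fake_event, record_event()
and format_event() are hypothetical stand-ins for the hba device, the ring
buffer record, the TP_fast_assign() step and the TP_printk() step. The point
it tries to show is only that the name is copied when the event is recorded,
so formatting later never touches the original object.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_MAX_LEN 32

/* Hypothetical stand-in for a device that may go away after the event fires. */
struct fake_dev {
	char name[NAME_MAX_LEN];
};

/* Hypothetical stand-in for one record stored in the ring buffer. */
struct fake_event {
	char dev_name[NAME_MAX_LEN];	/* copied at record time, not a pointer */
	int state;
};

/* Record time (analogous to TP_fast_assign()): copy the name into the record. */
static void record_event(struct fake_event *ev, const struct fake_dev *dev,
			 int state)
{
	assert(ev && dev);
	snprintf(ev->dev_name, sizeof(ev->dev_name), "%s", dev->name);
	ev->state = state;
}

/* Read time (analogous to TP_printk()): use only data held in the record. */
static int format_event(const struct fake_event *ev, char *out, size_t len)
{
	return snprintf(out, len, "%s: gating state changed to %d",
			ev->dev_name, ev->state);
}
```

Calling record_event() and then wiping or freeing the fake_dev before
format_event() runs still produces the original name, because the record owns
its own copy. Storing a struct fake_dev pointer in the record instead would
leave format_event() reading whatever happens to be at that address later.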



* Re: [PATCH v8 23/46] KVM: TDX: Make source page optional for KVM_TDX_INIT_MEM_REGION
From: Yan Zhao @ 2026-07-01  6:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ackerley Tng, aik@amd.com, andrew.jones@linux.dev,
	binbin.wu@linux.intel.com, brauner@kernel.org,
	chao.p.peng@linux.intel.com, david@kernel.org,
	jmattson@google.com, jthoughton@google.com, michael.roth@amd.com,
	oupton@kernel.org, pankaj.gupta@amd.com, qperret@google.com,
	Edgecombe, Rick P, rientjes@google.com, shivankg@amd.com,
	steven.price@arm.com, tabba@google.com, willy@infradead.org,
	wyihan@google.com, forkloop@google.com, pratyush@kernel.org,
	suzuki.poulose@arm.com, aneesh.kumar@kernel.org,
	liam@infradead.org, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86@kernel.org, H. Peter Anvin,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, Shuah Khan, Annapurve, Vishal,
	Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
	Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
	Jason Gunthorpe, Vlastimil Babka, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-mm@kvack.org, linux-coco@lists.linux.dev
In-Reply-To: <akPEKslqAhygyjhg@google.com>

On Tue, Jun 30, 2026 at 09:27:06PM +0800, Sean Christopherson wrote:
> On Tue, Jun 30, 2026, Yan Zhao wrote:
> > On Tue, Jun 30, 2026 at 08:35:49AM +0800, Sean Christopherson wrote:
> > > Gah, I thought I had sent this out this morning, long before Ackerley's response.
> > > But I got distracted by a meeting and forgot to get back to this... *sigh*
> > > 
> > > Sending what I already wrote, even though there's a lot of overlap with Ackerley's
> > > mail.
> > > 
> > > On Mon, Jun 29, 2026, Yan Zhao wrote:
> > > > On Fri, Jun 26, 2026 at 08:28:32AM -0700, Ackerley Tng wrote:
> > > > > Yan Zhao <yan.y.zhao@intel.com> writes:
> > > > > > But if a user configures 0 uaddr as valid, writes to it, and then passes 0 as
> > > > > > source_addr(not from gmem), I'm not sure if it's good for the kernel to silently
> > > > > > treat 0 uaddr as an identifier for in-place copy from the private PFN in gmem.
> > > > > >
> > > > > 
> > > > > I'd say the original uAPI perhaps just didn't document 0 as an
> > > > > unsupported uaddr. Given that commit 2a62345b3052 already merged, uAPI
> > > > > was perhaps accidentally changed and no customer complained, I think we
> > > > > can move forward with 0 as an invalid src_address? I wouldn't think
> > > > > anyone relies on 0 intentionally being a valid address.
> > > > > 
> > > > > I could document that, if it helps?
> > > > What about just documenting that 0 is an unsupported uaddr which will be
> > > > re-purposed as an indicator to use the target pfn as the source, regardless of
> > > > whether gmem_in_place_conversion is true? i.e.,
> > > > 
> > > > if (!src_page) 
> > > > 	src_page = pfn_to_page(pfn);
> > > 
> > > Because KVM can't generally use the target page as the source without in-place
> > > conversion, it's not supported today, and out-of-place conversion is being
> > > deprecated.
> > By "out-of-place conversion", do you mean using per-VM memory attribute
> > conversion?
> 
> Yep, I couldn't come up with a better description.
> 
> > > > I don't get why the two scenarios should be treated differently:
> > > > 1. gmem_in_place_conversion==true, shared memory is not from gmem 
> > > > 2. gmem_in_place_conversion==false, shared memory is not from gmem
> > > > 
> > > > In both cases, a 0 uaddr could be mapped to a valid page not from gmem.
> > > 
> > > That's immaterial.  KVM's ABI (that we're solidifying) is that an address of '0'
> > > for the source means NULL.  The fact that userspace could have a valid mapping
> > > at virtual address '0' is irrelevant.
> > So, I'm wondering if we can document that 0 uaddr could always mean using target
> > PFN.
> 
> I would document it as saying "no source page", and then state that a source page
> is required if in-place conversion isn't enabled/supported/allowed.
Ok.

> > i.e., for both scenarios 1 and 2, as long as 0 uaddr is specified, we always
> > use target PFN as source for in-place add.
> > 
> > > Again, just because something is technically possible doesn't mean it needs to
> > > be supported by every piece of KVM's uAPI.
> > > 
> > > > So why not update the uAPI to handle both cases consistently? :)
> > > 
> > > Because retroactively adding support for out-of-place conversion is pointless
> > > (requires a userspace update for a feature that's being deprecated), KVM can't
> > > generally support using the source for out-of-place conversion (it's effectively
> > > an obscure zero-page optimization), and IMO rejecting the out-of-place conversion
> > > scenario is valuable for KVM developers, e.g. to help newcomers understand what
> > > exactly is and isn't possible.
> > Ok. You mean per-VM memory attributes are being deprecated, and source pages from
> > a !gmem backend are also being deprecated, so we don't want to change uAPI for scenarios under
> > gmem_in_place_conversion==false. Right?
> 
> Right.
> 
> > 
> > > Side topic, isn't TDX broken if target page has already been added to the TD?
> > > IIUC, kvm_tdp_mmu_map_private_pfn() will be a glorified nop due to the page
> > > already having a valid S-EPT mapping, and so KVM will incorrectly allow a double
> > Not sure if I understand out-of-place conversion correctly.
> > Given target PFNs and GFNs are not duplicated, what would cause double add? :)
> 
> I was working through what would happen if userspace did KVM_TDX_INIT_MEM_REGION
> on the same target page multiple times.
Oh. To have KVM_TDX_INIT_MEM_REGION on the same target page multiple times, the
user needs to invoke KVM_TDX_INIT_MEM_REGION on the same GPA multiple times.
In that case, yes, kvm_tdp_mmu_map_private_pfn() will return -EIO.

> > 
> > > add.  Ahhh, no, because KVM will return RET_PF_SPURIOUS and
> > > kvm_tdp_mmu_map_private_pfn() will then return -EIO.
> > My question was whether we could document that 0 uaddr always means using target PFN, since
> > TDX's in-place add does not rely on gmem in-place conversion.
> 
> Yeah, I was on a tangent, ignore everything from "Side topic" on.
> 
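
The rule agreed on in this exchange (a source address of 0 means "no source
page", and a source page is required when in-place conversion is not enabled)
can be sketched in userspace C as follows. This is only an illustration under
stated assumptions, not KVM code: resolve_source(), enum src_kind and the
choice of -EINVAL as the error are all hypothetical; the thread does not say
which error would be returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: where the initial contents come from. */
enum src_kind {
	SRC_NONE,	/* no source page: use the target page in place */
	SRC_USER_PAGE,	/* copy from the page at the given user address */
};

/*
 * Hypothetical helper: treat source_addr == 0 as "no source page".
 * Returns 0 on success, or -EINVAL (an assumed error code) when no source
 * page is given but in-place conversion is not enabled.
 */
static int resolve_source(uint64_t source_addr, bool in_place_conversion,
			  enum src_kind *kind)
{
	assert(kind);

	if (source_addr) {
		*kind = SRC_USER_PAGE;
		return 0;
	}

	/* Address 0 is never a valid source, even if userspace mapped it. */
	if (!in_place_conversion)
		return -EINVAL;

	*kind = SRC_NONE;
	return 0;
}
```

With this shape, a non-zero address is accepted in both modes, while 0 is
accepted only when in-place conversion is enabled, which matches the "reject
the out-of-place scenario" position stated above.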
