Linux Trace Kernel


* [PATCH] samples/ftrace: Prevent division by zero when nr_function_calls is zero
From: Samuel Moelius @ 2026-06-29 15:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Samuel Moelius, Masami Hiramatsu, Mark Rutland,
	open list:FUNCTION HOOKS (FTRACE)

The ftrace-ops sample exposes nr_function_calls as a module parameter
and uses it as the divisor when printing the measured time per call.
Loading the module with nr_function_calls=0 skips the benchmark loop and
then divides the elapsed time by zero, crashing the kernel during sample
module initialization.

Keep accepting the parameter value, but report -1LL as the per-call
duration when the call count is zero instead of dividing by it.

Assisted-by: Codex:gpt-5.5-cyber-preview
Signed-off-by: Samuel Moelius <sam.moelius@trailofbits.com>
---
 samples/ftrace/ftrace-ops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/samples/ftrace/ftrace-ops.c b/samples/ftrace/ftrace-ops.c
index 68d6685c80bd..e6c07da407cc 100644
--- a/samples/ftrace/ftrace-ops.c
+++ b/samples/ftrace/ftrace-ops.c
@@ -223,7 +223,7 @@ static int __init ftrace_ops_sample_init(void)
 
 	pr_info("Attempted %u calls to %ps in %lluns (%lluns / call)\n",
 		nr_function_calls, tracee_relevant,
-		period, div_u64(period, nr_function_calls));
+		period, nr_function_calls ? div_u64(period, nr_function_calls) : -1LL);
 
 	if (persist)
 		return 0;
-- 
2.43.0
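
For readers following along outside the kernel tree, the guard can be
sketched in user space. This is a minimal illustration under stated
assumptions, not the sample module itself: per_call_ns() is a hypothetical
helper and plain 64-bit division stands in for div_u64(). Because the other
arm of the conditional is an unsigned 64-bit value, -1LL is converted to the
all-ones value, which a %llu format prints as 18446744073709551615.

```c
#include <stdint.h>

/*
 * Hypothetical user-space stand-in for the patched expression.
 * Plain division replaces the kernel's div_u64() here.
 */
static uint64_t per_call_ns(uint64_t period_ns, uint32_t nr_calls)
{
	/* Never divide by zero; -1LL converts to UINT64_MAX as a sentinel. */
	return nr_calls ? period_ns / nr_calls : (uint64_t)-1LL;
}
```

A caller that wants a friendlier message could compare the result against
UINT64_MAX before printing, at the cost of a second format string.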



* Re: [PATCH 01/30] mm: move vma_start_pgoff() into mm.h and clean up
From: Gregory Price @ 2026-06-29 15:27 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Russell King, Dinh Nguyen, Simon Schuster,
	James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
	Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
	Thierry Reding, Mikko Perttunen, Jonathan Hunter,
	Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
	Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
	Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
	Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
	Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
	Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
	linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
	dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
	linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
	linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
	Harry Yoo, Jann Horn
In-Reply-To: <b28b698df4c009e85c4728446ca5863d8e633164.1782735110.git.ljs@kernel.org>

On Mon, Jun 29, 2026 at 01:23:12PM +0100, Lorenzo Stoakes wrote:
> vma_last_pgoff() already lives there, so it's a bit odd to keep
> vma_start_pgoff() in mm/interval_tree.c. Move them together.
> 
> These each return unsigned long, which pgoff_t is typedef'd to. Make this
> consistent and have these functions return pgoff_t instead.
> 
> Additionally, express vma_last_pgoff() in terms of vma_start_pgoff(): since
> we wrap the vma->vm_pgoff access, we may as well use it here.
> 
> Also while we're here, const-ify the VMA and clean up a bit.
> 
> No functional change intended.
> 
> Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>

Reviewed-by: Gregory Price <gourry@gourry.net>



* Re: [PATCH 02/30] mm: add kdoc comments for vma_start/last_pgoff()
From: Gregory Price @ 2026-06-29 15:31 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Russell King, Dinh Nguyen, Simon Schuster,
	James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
	Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
	Thierry Reding, Mikko Perttunen, Jonathan Hunter,
	Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
	Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
	Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
	Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
	Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
	Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
	linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
	dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
	linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
	linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
	Harry Yoo, Jann Horn
In-Reply-To: <8c618dfd7de419e3b797b8bd1cd921d4c5b8878b.1782735110.git.ljs@kernel.org>

On Mon, Jun 29, 2026 at 01:23:13PM +0100, Lorenzo Stoakes wrote:
> Describe what vma_start_pgoff() and vma_last_pgoff() actually provide in
> detail.
> 
> This is so that we can differentiate these from functions that will be
> added in a subsequent patch which provide a different page offset.
> 
> No functional change intended.
> 
> Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>

Reviewed-by: Gregory Price <gourry@gourry.net>



* Re: [PATCH 03/30] tools/testing/vma: use vma_start_pgoff() in merge tests
From: Gregory Price @ 2026-06-29 15:40 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Russell King, Dinh Nguyen, Simon Schuster,
	James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
	Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
	Thierry Reding, Mikko Perttunen, Jonathan Hunter,
	Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
	Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
	Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
	Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
	Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
	Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
	linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
	dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
	linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
	linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
	Harry Yoo, Jann Horn
In-Reply-To: <b501eca378b9d9734e83838102aadc9276590fba.1782735110.git.ljs@kernel.org>

On Mon, Jun 29, 2026 at 01:23:14PM +0100, Lorenzo Stoakes wrote:
> Now we have the vma_start_pgoff() helper, update the merge tests to make
> use of it for consistency.
> 
> No functional change intended.
> 
> Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>

Question: Should we have primitive tests for vma_*_pgoff() since
the behavior changes depending on file/anon?

Nice to have the cleanup and clarity. Maybe worth asserting no
one ever breaks this.

for this patch though

Reviewed-by: Gregory Price <gourry@gourry.net>


* Re: [PATCH v10 6/6] selftests/mm: add hwpoison-panic destructive test
From: Breno Leitao @ 2026-06-29 15:50 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Miaohe Lin, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Vlastimil Babka, Suren Baghdasaryan, Michal Hocko, Shuah Khan,
	Naoya Horiguchi, Jonathan Corbet, Shuah Khan, Liam R. Howlett,
	lance.yang, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	linux-mm, linux-kernel, linux-doc, linux-kselftest,
	linux-trace-kernel, kernel-team
In-Reply-To: <aj-BShkN6BXex_ku@kernel.org>

On Sat, Jun 27, 2026 at 10:52:42AM +0300, Mike Rapoport wrote:
> Hi Breno,
> 
> On Fri, Jun 26, 2026 at 08:33:20AM -0700, Breno Leitao wrote:
> > Add a destructive selftest that verifies
> > vm.panic_on_unrecoverable_memory_failure actually panics when a
> > hwpoison error hits a kernel-owned page.
> 
> > +ksft_skip=4
> 
> ...
> 
> > +ksft_print() { echo "# $*"; }
> > +ksft_exit_skip() { ksft_print "$*"; exit "$ksft_skip"; }
> > +ksft_exit_fail() { echo "not ok 1 $*"; exit 1; }
> 
> There is tools/testing/selftests/kselftest/ktap_helpers.sh that already
> implements this :)

Ack, let me source that file in my selftest.

	DIR="$(dirname "$(readlink -f "$0")")"
	source "${DIR}"/../kselftest/ktap_helpers.sh

I will update, thanks for the review,
--breno


* Re: [PATCH 04/30] mm: introduce and use vma_end_pgoff()
From: Gregory Price @ 2026-06-29 15:54 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Russell King, Dinh Nguyen, Simon Schuster,
	James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
	Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
	Thierry Reding, Mikko Perttunen, Jonathan Hunter,
	Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
	Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
	Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
	Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
	Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
	Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
	linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
	dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
	linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
	linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
	Harry Yoo, Jann Horn
In-Reply-To: <e379a1cb6a897126ad96e3a263fdb91d6c11f6cb.1782735110.git.ljs@kernel.org>

On Mon, Jun 29, 2026 at 01:23:15PM +0100, Lorenzo Stoakes wrote:
> We already have vma_last_pgoff() which retrieves the last page offset
> within a VMA.
> 
> However, code often wishes to span a page offset range, which requires the
> exclusive end of this range.
> 
> So provide this in vma_end_pgoff() and update vma_last_pgoff() to use this
> function.
> 
> No functional change intended.
> 
> Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>

Reviewed-by: Gregory Price <gourry@gourry.net>
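
The inclusive/exclusive distinction described in the quoted commit message
can be sketched with a toy model. Everything here is an assumption made for
illustration: struct toy_vma, the toy_*() helpers and the 4 KiB page size are
not the kernel's definitions, they only mirror the relationship between the
start, exclusive end, and inclusive last page offsets.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Toy model of the few VMA fields involved; not the kernel's struct. */
struct toy_vma {
	uint64_t vm_start;	/* first byte of the mapping */
	uint64_t vm_end;	/* one past the last byte */
	uint64_t vm_pgoff;	/* page offset of vm_start */
};

static uint64_t toy_start_pgoff(const struct toy_vma *vma)
{
	return vma->vm_pgoff;
}

/* Exclusive end: the first page offset past the VMA. */
static uint64_t toy_end_pgoff(const struct toy_vma *vma)
{
	uint64_t nr_pages = (vma->vm_end - vma->vm_start) >> TOY_PAGE_SHIFT;

	return toy_start_pgoff(vma) + nr_pages;
}

/* Inclusive last: derived from the exclusive end. */
static uint64_t toy_last_pgoff(const struct toy_vma *vma)
{
	return toy_end_pgoff(vma) - 1;
}
```

For a four-page mapping starting at page offset 7, this gives the half-open
range [7, 11) with a last offset of 10.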



* Re: [PATCH 05/30] mm/rmap: update mm/interval_tree.c comments
From: Gregory Price @ 2026-06-29 16:01 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Russell King, Dinh Nguyen, Simon Schuster,
	James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
	Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
	Thierry Reding, Mikko Perttunen, Jonathan Hunter,
	Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
	Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
	Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
	Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
	Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
	Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
	linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
	dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
	linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
	linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
	Harry Yoo, Jann Horn
In-Reply-To: <80d482a927b2e9862487b812e0ab48ebc1289a70.1782735110.git.ljs@kernel.org>

On Mon, Jun 29, 2026 at 01:23:16PM +0100, Lorenzo Stoakes wrote:
> Update the file comment to clarify that both file-backed and anonymous
> interval trees are provided, referencing the relevant data types for
> clarity.
>

Isn't this self-evident by nature of the function definitions?
(one takes a vm_area_struct, the other takes an anon_vma_chain)

> -	VM_BUG_ON_VMA(vma_start_pgoff(node) != vma_start_pgoff(prev), node);
> +	VM_WARN_ON_ONCE_VMA(vma_start_pgoff(node) != vma_start_pgoff(prev), node);
>  

For my own edification - I know not to add new BUG(), should I be
converting BUG->WARN/something when I find them in areas I happen to be
working in?

~Gregory


* Re: [PATCH 03/30] tools/testing/vma: use vma_start_pgoff() in merge tests
From: Lorenzo Stoakes @ 2026-06-29 16:35 UTC (permalink / raw)
  To: Gregory Price
  Cc: Andrew Morton, Russell King, Dinh Nguyen, Simon Schuster,
	James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
	Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
	Thierry Reding, Mikko Perttunen, Jonathan Hunter,
	Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
	Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
	Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
	Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
	Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
	Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
	linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
	dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
	linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
	linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
	Harry Yoo, Jann Horn
In-Reply-To: <akKR3bFV7393yOUs@gourry-fedora-PF4VCD3F>

On Mon, Jun 29, 2026 at 11:40:13AM -0400, Gregory Price wrote:
> On Mon, Jun 29, 2026 at 01:23:14PM +0100, Lorenzo Stoakes wrote:
> > Now we have the vma_start_pgoff() helper, update the merge tests to make
> > use of it for consistency.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
>
> Question: Should we have primitive tests for vma_*_pgoff() since
> the behavior changes depending on file/anon?
>
> Nice to have the cleanup and clarity. Maybe worth asserting no
> one ever breaks this.

Well funny you should mention that :) I do add some asserts as I go.

In my RFC series, which this series is the prerequisite for, I add more, as
then we track the virtual page offset separately (see [0]).

Amusingly (or not) /dev/zero breaks assumptions a bit (anonymous VMA with
vma->vm_file that tracks by file index, just glorious). But I plan to fix
that later!

>
> for this patch though
>
> Reviewed-by: Gregory Price <gourry@gourry.net>

Thanks!

Cheers, Lorenzo

[0]:https://lore.kernel.org/linux-mm/cover.1782745153.git.ljs@kernel.org/


* Re: [PATCHv4 05/13] uprobes/x86: Move optimized uprobe from nop5 to nop10
From: Oleg Nesterov @ 2026-06-29 16:40 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Ingo Molnar, Masami Hiramatsu, Andrii Nakryiko,
	bpf, linux-trace-kernel
In-Reply-To: <akJNZN9gvZ30zKUf@krava>

On 06/29, Jiri Olsa wrote:
>
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -265,6 +265,10 @@ static bool is_prefix_bad(struct insn *insn)
>
>  		attr = inat_get_opcode_attribute(p);
>  		switch (attr) {
> +		case INAT_MAKE_PREFIX(INAT_PFX_CS):
> +			if (insn->x86_64)
> +				break;
> +			fallthrough;
>  		case INAT_MAKE_PREFIX(INAT_PFX_ES):
>  		case INAT_MAKE_PREFIX(INAT_PFX_DS):
>  		case INAT_MAKE_PREFIX(INAT_PFX_SS):
>
> or we could just skip it for nop10.. maybe that's better

Well, if you ask me I'd agree with the "maybe that's better" plan ;)
I mean... I don't think that INAT_PFX_CS should be "special" in is_prefix_bad.

But, whatever you do - I agree, feel free to keep my r-b.

Oleg.



* Re: [PATCH 05/30] mm/rmap: update mm/interval_tree.c comments
From: Lorenzo Stoakes @ 2026-06-29 16:41 UTC (permalink / raw)
  To: Gregory Price
  Cc: Andrew Morton, Russell King, Dinh Nguyen, Simon Schuster,
	James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
	Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
	Thierry Reding, Mikko Perttunen, Jonathan Hunter,
	Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
	Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
	Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
	Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
	Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
	Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
	linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
	dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
	linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
	linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
	Harry Yoo, Jann Horn
In-Reply-To: <akKWvnU2Ua-8ceSb@gourry-fedora-PF4VCD3F>

On Mon, Jun 29, 2026 at 12:01:02PM -0400, Gregory Price wrote:
> On Mon, Jun 29, 2026 at 01:23:16PM +0100, Lorenzo Stoakes wrote:
> > Update the file comment to clarify that both file-backed and anonymous
> > interval trees are provided, referencing the relevant data types for
> > clarity.
> >
>
> Isn't this self-evident by nature of the function definitions?
> (one takes a vm_area_struct, the other takes an anon_vma_chain)

Well you see you're already hitting up on issues there, they both take an
rb_root_cached and the vma_*() ones do not instantly scream 'file-backed' do
they? As VMAs are obviously used for both anon and file-backed...

But later patches fix this stuff :)

And I feel it's hard visually to see where one set of definitions end and
another begins, which was really the motive for this, as trivial as it is!

>
> > -	VM_BUG_ON_VMA(vma_start_pgoff(node) != vma_start_pgoff(prev), node);
> > +	VM_WARN_ON_ONCE_VMA(vma_start_pgoff(node) != vma_start_pgoff(prev), node);
> >
>
> For my own edification - I know not to add new BUG(), should I be
> converting BUG->WARN/something when I find them in areas I happen to be
> working in?

Yeah pretty much in all cases.

It's very rare that you'd want the kernel to definitely oops, and I can't think
of any circumstance where you'd only want that if CONFIG_DEBUG_VM was set :))

>
> ~Gregory

Cheers, Lorenzo
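
The behavioural difference being discussed (report the broken invariant once
and carry on, rather than oops) can be sketched in user space. This is only
an analogue: toy_warn_on_once() and its counter are hypothetical names, and
the kernel's macros are implemented differently.

```c
#include <stdbool.h>
#include <stdio.h>

static int toy_warnings_printed;	/* counts reports, for illustration */

/*
 * Report a violated invariant the first time only, then keep going.
 * Returns the condition so callers can still branch on it.
 */
static bool toy_warn_on_once(bool cond, bool *already_warned, const char *what)
{
	if (cond && !*already_warned) {
		*already_warned = true;
		toy_warnings_printed++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}
```

Each call site would keep its own already_warned flag, so one noisy check
does not silence the others.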


* Re: [PATCH 05/30] mm/rmap: update mm/interval_tree.c comments
From: Gregory Price @ 2026-06-29 17:11 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Russell King, Dinh Nguyen, Simon Schuster,
	James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
	Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
	Thierry Reding, Mikko Perttunen, Jonathan Hunter,
	Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
	Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
	Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
	Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
	Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
	Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
	linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
	dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
	linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
	linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
	Harry Yoo, Jann Horn
In-Reply-To: <akKfAl-wdIAbexNR@lucifer>

On Mon, Jun 29, 2026 at 05:41:16PM +0100, Lorenzo Stoakes wrote:
> On Mon, Jun 29, 2026 at 12:01:02PM -0400, Gregory Price wrote:
> > On Mon, Jun 29, 2026 at 01:23:16PM +0100, Lorenzo Stoakes wrote:
> > > Update the file comment to clarify that both file-backed and anonymous
> > > interval trees are provided, referencing the relevant data types for
> > > clarity.
> > >
> >
> > Isn't this self-evident by nature of the function definitions?
> > (one takes a vm_area_struct, the other takes an anon_vma_chain)
> 
> Well you see you're already hitting up on issues there, they both take an
> rb_root_cached and the vma_*() ones do not instantly scream 'file-backed' do
> they? As VMAs are obviously used for both anon and file-backed...
> 
> But later patches fix this stuff :)
> 
> And I feel it's hard visually to see where one set of definitions end and
> another begins, which was really the motive for this, as trivial as it is!
> 

Fair enough, I scanned the rest initially but I'm trying to wrap my head
around everything as I go through one by one.  Generally this really
screams "fix the apis" not "comment the bad ones" - but I suppose that's
the whole point here.

It's definitely an improvement either way.

Reviewed-by: Gregory Price <gourry@gourry.net>

~Gregory


* Re: [PATCH 05/30] mm/rmap: update mm/interval_tree.c comments
From: Lorenzo Stoakes @ 2026-06-29 17:40 UTC (permalink / raw)
  To: Gregory Price
  Cc: Andrew Morton, Russell King, Dinh Nguyen, Simon Schuster,
	James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
	Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
	Thierry Reding, Mikko Perttunen, Jonathan Hunter,
	Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
	Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
	Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
	Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
	Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, SeongJae Park,
	Miaohe Lin, Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
	linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
	dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
	linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
	linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
	Harry Yoo, Jann Horn
In-Reply-To: <akKnNy64lhNqPtLL@gourry-fedora-PF4VCD3F>

On Mon, Jun 29, 2026 at 01:11:19PM -0400, Gregory Price wrote:
> On Mon, Jun 29, 2026 at 05:41:16PM +0100, Lorenzo Stoakes wrote:
> > On Mon, Jun 29, 2026 at 12:01:02PM -0400, Gregory Price wrote:
> > > On Mon, Jun 29, 2026 at 01:23:16PM +0100, Lorenzo Stoakes wrote:
> > > > Update the file comment to clarify that both file-backed and anonymous
> > > > interval trees are provided, referencing the relevant data types for
> > > > clarity.
> > > >
> > >
> > > Isn't this self-evident by nature of the function definitions?
> > > (one takes a vm_area_struct, the other takes an anon_vma_chain)
> >
> > Well you see you're already hitting up on issues there, they both take an
> > rb_root_cached and the vma_*() ones do not instantly scream 'file-backed' do
> > they? As VMAs are obviously used for both anon and file-backed...
> >
> > But later patches fix this stuff :)
> >
> > And I feel it's hard visually to see where one set of definitions end and
> > another begins, which was really the motive for this, as trivial as it is!
> >
>
> Fair enough, I scanned the rest initially but I'm trying to wrap my head
> around everything as I go through one by one.  Generally this really
> screams "fix the apis" not "comment the bad ones" - but I suppose that's
> the whole point here.

Yeah, the intent is to eventually completely remove the anon stuff from here
at least :)

And Pedro I think is looking at the file rmap so we'll get there :)

>
> It's definitely an improvement either way.
>
> Reviewed-by: Gregory Price <gourry@gourry.net>
>
> ~Gregory

Cheers, Lorenzo


* Re: [PATCH v2] tracing: Use seq_buf for string concatenation
From: Steven Rostedt @ 2026-06-29 18:39 UTC (permalink / raw)
  To: Woradorn Laodhanadhaworn
  Cc: mhiramat, mathieu.desnoyers, linux-kernel, linux-trace-kernel,
	linux-hardening, linux-kernel-mentees, shuah, skhan, me,
	jkoolstra
In-Reply-To: <20260622094623.18469-1-woradorn.laon@gmail.com>

On Mon, 22 Jun 2026 16:46:23 +0700
Woradorn Laodhanadhaworn <woradorn.laon@gmail.com> wrote:

>  
>  #include <trace/events/sched.h>
>  #include <trace/syscall.h>
> @@ -4500,14 +4501,20 @@ static void __add_event_to_tracers(struct trace_event_call *call)
>  extern struct trace_event_call *__start_ftrace_events[];
>  extern struct trace_event_call *__stop_ftrace_events[];
>  
> -static char bootup_event_buf[COMMAND_LINE_SIZE] __initdata;

Keep the above string and just assign it.

> +static struct seq_buf bootup_event_buf __initdata = {
> +	.buffer = (char[COMMAND_LINE_SIZE]) {},
> +	.size = COMMAND_LINE_SIZE,
> +};

static struct seq_buf bootup_event_seq __initdata = {
	.buffer = bootup_event_buf,
	.size = sizeof(bootup_event_buf),
};

>  
>  static __init int setup_trace_event(char *str)
>  {
> -	if (bootup_event_buf[0] != '\0')
> -		strlcat(bootup_event_buf, ",", COMMAND_LINE_SIZE);
> +	if (seq_buf_used(&bootup_event_buf) > 0)
> +		seq_buf_puts(&bootup_event_buf, ",");
> +
> +	seq_buf_puts(&bootup_event_buf, str);
>  
> -	strlcat(bootup_event_buf, str, COMMAND_LINE_SIZE);
> +	if (seq_buf_has_overflowed(&bootup_event_buf))
> +		return -ENOMEM;
>  
>  	trace_set_ring_buffer_expanded(NULL);
>  	disable_tracing_selftest("running event tracing");
> @@ -4766,7 +4773,7 @@ static __init int event_trace_enable(void)
>  	 */
>  	__trace_early_add_events(tr);
>  
> -	early_enable_events(tr, bootup_event_buf, false);
> +	early_enable_events(tr, (char *)seq_buf_str(&bootup_event_buf), false);

The above then would be:

	seq_buf_str(&bootup_event_seq);
	early_enable_events(tr, bootup_event_buf, false);

Don't typecast a const char* to non const.

>  
>  	trace_printk_start_comm();
>  
> @@ -4794,7 +4801,7 @@ static __init int event_trace_enable_again(void)
>  	if (!tr)
>  		return -ENODEV;
>  
> -	early_enable_events(tr, bootup_event_buf, true);
> +	early_enable_events(tr, (char *)seq_buf_str(&bootup_event_buf), true);

Same here.

-- Steve

>  
>  	return 0;
>  }
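
The layout suggested above (keep the plain char array, point a separate
cursor structure at it, and read the array directly afterwards) can be
sketched in user space. This is a simplified analogue under stated
assumptions, not the kernel's seq_buf: every toy_* name, the 32-byte buffer
and the -1 error return are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy analogue of a seq_buf: a cursor over storage it does not own. */
struct toy_seq {
	char *buffer;
	size_t size;
	size_t len;	/* bytes used; size + 1 marks overflow */
};

static char toy_event_buf[32];	/* the plain array stays the backing store */

static struct toy_seq toy_event_seq = {
	.buffer = toy_event_buf,
	.size = sizeof(toy_event_buf),
};

static bool toy_seq_has_overflowed(const struct toy_seq *s)
{
	return s->len > s->size;
}

/* Append str (with its NUL) if it fits, otherwise mark overflow. */
static void toy_seq_puts(struct toy_seq *s, const char *str)
{
	size_t n = strlen(str);

	if (!toy_seq_has_overflowed(s) && s->len + n + 1 <= s->size) {
		memcpy(s->buffer + s->len, str, n + 1);
		s->len += n;
		return;
	}
	s->len = s->size + 1;
}

/* Mirrors the shape of setup_trace_event(): comma-join successive values. */
static int toy_add_event(const char *str)
{
	if (toy_event_seq.len > 0)
		toy_seq_puts(&toy_event_seq, ",");
	toy_seq_puts(&toy_event_seq, str);

	return toy_seq_has_overflowed(&toy_event_seq) ? -1 : 0;
}
```

Since the array is written through the cursor but remains an ordinary
non-const char array, later code can pass it on directly and no cast away
from const is needed.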



* Re: [PATCH 1/2] tracing: Embed 'char comm[16]' in a structure
From: Steven Rostedt @ 2026-06-29 20:26 UTC (permalink / raw)
  To: David Laight
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel, Michal Koutný
In-Reply-To: <20260626212356.64150-2-david.laight.linux@gmail.com>

On Fri, 26 Jun 2026 22:23:55 +0100
David Laight <david.laight.linux@gmail.com> wrote:

> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -52,6 +52,7 @@
>  #include <linux/sort.h>
>  #include <linux/io.h> /* vmap_page_range() */
>  #include <linux/fs_context.h>
> +#include <linux/trace_printk.h>

Left over debugging? ;-)

-- Steve


* Re: [PATCH 1/2] x86/uprobes: Keep shadow stack in sync for emulated CALLs
From: David Windsor @ 2026-06-29 20:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, mhiramat, tglx, mingo, bp, dave.hansen, x86,
	shuah, linux-trace-kernel, linux-kselftest, linux-kernel
In-Reply-To: <CAEXv5_gFBvtP9HZrQ14VAYpFbrY44o0+Qw9i=c6GM1Mnp4R83A@mail.gmail.com>

On Sat, Jun 27, 2026 at 1:14 PM David Windsor <dwindsor@gmail.com> wrote:
>
> On Tue, Jun 23, 2026 at 9:25 AM Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > On 06/23, Peter Zijlstra wrote:
> > >
> > > On Tue, Jun 23, 2026 at 02:52:32PM +0200, Oleg Nesterov wrote:
> > > > On 06/22, David Windsor wrote:
> > > > >
> > > > > --- a/arch/x86/kernel/uprobes.c
> > > > > +++ b/arch/x86/kernel/uprobes.c
> > > > > @@ -1246,8 +1246,12 @@ static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs
> > > > >           long correction = utask->vaddr - utask->xol_vaddr;
> > > > >           regs->ip += correction;
> > > > >   } else if (auprobe->defparam.fixups & UPROBE_FIX_CALL) {
> > > > > +         unsigned long retaddr = utask->vaddr + auprobe->defparam.ilen;
> > > > > +
> > > > >           regs->sp += sizeof_long(regs); /* Pop incorrect return address */
> > > > > -         if (emulate_push_stack(regs, utask->vaddr + auprobe->defparam.ilen))
> > > > > +         if (emulate_push_stack(regs, retaddr))
> > > > > +                 return -ERESTART;
> > > > > +         if (shstk_update_last_frame(retaddr))
> > > > >                   return -ERESTART;
> > > >
> > > > Well, if shstk_update_last_frame() fails after emulate_push_stack(), we should
> > > > probably return another error, so that the caller handle_singlestep() will kill
> > > > this task?
> > >
> > > Makes sense, the other user has a force_sig(SIGSEGV) on failure.
> >
> > Offtopic question... both shstk_update_last_frame() and shstk_push() are only
> > used by arch/x86/kernel/uprobes.c. But they are not symmetric in that
> > shstk_update_last_frame() returns 0 if !features_enabled(ARCH_SHSTK_SHSTK),
> > while shstk_push() returns -ENOTSUPP in this case.
> >
> > That is why the users can't just do "if (shstk_push(xxx)) ...". This is really
> > minor, but perhaps it makes sense to change shstk_push() to return 0 in this
> > case too? I don't think -ENOTSUPP is actually useful...
> >
>
> Agreed, will send a follow-up patch changing shstk_push() to return 0
> rather than -ENOTSUPP.
>
> I'll send v2 shortly with the additional call to force_sig(SIGSEGV) to
> balance out the callers.
>

Actually, we don't need to force_sig() ourselves in this case, the
handle_singlestep() path will take care of that. We just need to
return something that's not -ERESTART. You may have meant that in your
reply, but just noting it for the record for when I send v2.

> > Oleg.
> >


* Re: [PATCH 2/2] tracing: Keep pid and comm[] in the same structure
From: Steven Rostedt @ 2026-06-29 20:49 UTC (permalink / raw)
  To: David Laight
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel, Michal Koutný
In-Reply-To: <20260626212356.64150-3-david.laight.linux@gmail.com>

On Fri, 26 Jun 2026 22:23:56 +0100
David Laight <david.laight.linux@gmail.com> wrote:

> Rather than have two separate dynamic arrays on the end of struct
> saved_commandlines_buffer have a single dynamic array where each
> entry contains the pid and associated task->comm[].
> This simplifies the initialisation and lookup.
> 
> Don't bother trying to initialise the pid field to a non-zero value,
> it only matters in the tracing_saved_cmdlines_seq_ops code.
> Allocate entry [0] first so that the tracing_saved_cmdlines_seq_ops
> code can just index the array with the file offset.
> 
> The code now uses the correct size when determining the page 'order'
> to free the structure. The smaller size will always give the same
> 'order'.
> 
> Signed-off-by: David Laight <david.laight.linux@gmail.com>
> ---
> 
> Is there any reason why this code uses alloc_pages() rather
> than vmalloc()?

It's been a long time since I worked on this, but IIRC, it was to keep
the pressure down on the TLB when tracing. It updates at every
sched_switch that has a trace event occurring, so I likely used normal
pages, which are part of the huge pages the kernel sets up and don't
affect the TLB as much. vmalloc does add to TLB pressure, and tracing
should always try to avoid that.

> map_pid_to_cmdline[] is 64k*sizeof(int) so the whole structure
> expands to 512k with about 64k/20 (about 3200) pid entries even
> though the default is 128.

That's because it is not dynamic. That array needs to be able to hold
most PIDs. The default is 128, but it expands to however many entries
fit in the pages allocated for the full map_pid_to_cmdline[]. The real
default on architectures with 4096-byte pages is 6552 entries.
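
The rounding effect described above can be modelled in userspace C. This is only a rough sketch: the pid_comm layout, the header_bytes argument, the 4096-byte page size and both helper names are assumptions made for illustration, not the kernel's actual definitions.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size */
#define SKETCH_COMM_LEN  16	/* assumed TASK_COMM_LEN */

/* Hypothetical combined entry: pid and comm[] kept together. */
struct pid_comm {
	int pid;
	char comm[SKETCH_COMM_LEN];
};

/* Smallest order such that (PAGE_SIZE << order) covers 'bytes'. */
static unsigned int sketch_page_order(size_t bytes)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < bytes)
		order++;
	return order;
}

/*
 * Number of entries that fit once the allocation has been rounded up to
 * a power-of-two number of pages. 'header_bytes' stands in for the fixed
 * part of the structure (the pid map plus bookkeeping fields).
 */
static size_t sketch_entries_that_fit(size_t header_bytes, size_t requested)
{
	size_t want = header_bytes + requested * sizeof(struct pid_comm);
	size_t total = SKETCH_PAGE_SIZE << sketch_page_order(want);

	return (total - header_bytes) / sizeof(struct pid_comm);
}
```

Calling sketch_entries_that_fit() with a header of roughly 128 KiB and a request for 128 entries shows the point being made: the fixed map dominates the allocation, and rounding up to whole pages leaves room for far more entries than were asked for.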

> AFAICT there is only one copy of the data - so it could be static.
> Perhaps with pointers to map_pid_to_cmdline[] and (after this patch)
> pid_comm[], both of which could be separately resized.

map_pid_to_cmdline[] is sized to hold PID_MAX_DEFAULT PIDs in order to
avoid collisions. I wouldn't resize it.

> 
> I also noticed that map_pid_to_cmdline[] contains indexes into
> pid_comm[], restricting these to 16 bits would halve the data area.

Hmm, yeah, this could be useful, as it doesn't appear one could make
saved_cmdline_size greater than 65536 (or even close to that).
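
As a rough illustration of the 16-bit idea, a userspace sketch might look like the following. The slot count, the "no entry" marker and the helper name are all hypothetical; the only point is that a uint16_t map takes half the space of a 32-bit one, provided every stored index is checked against the narrower range.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PID_SLOTS  32769		/* assumed PID_MAX_DEFAULT + 1 */
#define SKETCH_NO_CMDLINE UINT16_MAX	/* hypothetical "no entry" marker */

/* Half the size of an equivalent uint32_t map with the same slot count. */
static uint16_t sketch_map16[SKETCH_PID_SLOTS];

/*
 * Store an index into the 16-bit map. Indexes that do not fit below the
 * marker value are rejected rather than silently truncated.
 */
static int sketch_map_set(uint16_t *map, unsigned int pid, size_t idx)
{
	if (pid >= SKETCH_PID_SLOTS || idx >= SKETCH_NO_CMDLINE)
		return -1;
	map[pid] = (uint16_t)idx;
	return 0;
}
```

Reserving one value as the marker means the usable index range is one smaller than 65536, which is still far above the entry counts discussed in this thread.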

-- Steve

^ permalink raw reply

* Re: [PATCH v13 04/11] perf/probe: Ignore comment lines in dynamic_events/kprobe_events file
From: Masami Hiramatsu @ 2026-06-29 22:32 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: Masami Hiramatsu (Google), Steven Rostedt, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, linux-kernel, linux-trace-kernel,
	linux-doc, linux-kselftest
In-Reply-To: <178271361825.1176915.16095297120719039761.stgit@devnote2>

Hi Arnaldo, Namhyung,

I forgot to CC this. Can I pick this patch via linux-trace tree,
or would you pick this?
This is a part of typecast series [1] only for debugging.

[1] https://lore.kernel.org/all/178271361825.1176915.16095297120719039761.stgit@devnote2/

Thanks,

On Mon, 29 Jun 2026 15:13:38 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:

> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> 
> Since dynamic_events/kprobe_events files show the fetcharg debug
> information as comment lines, its reader needs to ignore it.
> 
> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> ---
>  tools/perf/util/probe-file.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
> index 4032572cbf55..4d12693a83b3 100644
> --- a/tools/perf/util/probe-file.c
> +++ b/tools/perf/util/probe-file.c
> @@ -197,6 +197,8 @@ struct strlist *probe_file__get_rawlist(int fd)
>  		idx = strlen(p) - 1;
>  		if (p[idx] == '\n')
>  			p[idx] = '\0';
> +		if (buf[0] == '#')
> +			continue;
>  		ret = strlist__add(sl, buf);
>  		if (ret < 0) {
>  			pr_debug("strlist__add failed (%d)\n", ret);
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH v8 23/46] KVM: TDX: Make source page optional for KVM_TDX_INIT_MEM_REGION
From: Ackerley Tng @ 2026-06-30  0:00 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Sean Christopherson, aik, andrew.jones, binbin.wu, brauner,
	chao.p.peng, david, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, tabba, willy, wyihan, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Barry Song, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt, Kiryl Shutsemau,
	Baoquan He, Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
	linux-coco
In-Reply-To: <akI9m02jgKAdi4gX@yzhao56-desk.sh.intel.com>

Yan Zhao <yan.y.zhao@intel.com> writes:

> On Fri, Jun 26, 2026 at 08:28:32AM -0700, Ackerley Tng wrote:
>> Yan Zhao <yan.y.zhao@intel.com> writes:
>>
>> > On Thu, Jun 25, 2026 at 05:07:23PM -0700, Ackerley Tng wrote:
>> >> Yan Zhao <yan.y.zhao@intel.com> writes:
>> >>
>> >> > On Wed, Jun 24, 2026 at 04:00:32PM -0700, Ackerley Tng wrote:
>> >> >> Sean Christopherson <seanjc@google.com> writes:
>> >> >>
>> >> >> > On Tue, Jun 23, 2026, Yan Zhao wrote:
>> >> >> >> On Tue, Jun 23, 2026 at 01:16:14PM +0800, Yan Zhao wrote:
>> >> >> >> > On Mon, Jun 22, 2026 at 06:22:45PM -0700, Sean Christopherson wrote:
>> >> >> >> > > On Mon, Jun 22, 2026, Yan Zhao wrote:
>> >> >> >> > > > On Thu, Jun 18, 2026 at 05:32:00PM -0700, Ackerley Tng via B4 Relay wrote:
>> >> >> >> > > > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
>> >> >> >> > > > > index ffe9d0db58c59..56d10333c61a7 100644
>> >> >> >> > > > > --- a/arch/x86/kvm/vmx/tdx.c
>> >> >> >> > > > > +++ b/arch/x86/kvm/vmx/tdx.c
>> >> >> >> > > > > @@ -3198,8 +3198,12 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
>> >> >> >> > > > >  	if (KVM_BUG_ON(kvm_tdx->page_add_src, kvm))
>> >> >> >> > > > >  		return -EIO;
>> >> >> >> > > > >
>> >> >> >> > > > > -	if (!src_page)
>> >> >> >> > > > > -		return -EOPNOTSUPP;
>> >> >> >> > > > > +	if (!src_page) {
>> >> >> >> > > > > +		if (!gmem_in_place_conversion)
>> >> >> >> > > > When userspace turns on gmem_in_place_conversion while creating guest_memfd
>> >> >> >> > > > without the MMAP flag, the absence of src_page should still be treated as an
>> >> >> >> > > > error.
>> >> >> >> > >
>> >> >> >> > > Why MMAP?
>> >> >> >> > Hmm, I was showing a scenario that in-place conversion couldn't occur.
>> >> >> >> > I didn't mean that with the MMAP flag, mmap() and user write must occur.
>> >> >> >> >
>> >> >> >> > > Shouldn't this be a general "if (!src_page && !up-to-date)"?  Just
>> >> >> >> > > because userspace _can_ mmap() the memory doesn't mean userspace _has_ mmap()'d
>> >> >> >> > > and written memory.  And when write() lands, MMAP wouldn't be necessary to
>> >> >> >> > > initialize the memory.
>> >> >> >> > Do you mean using up-to-date flag as below?
>> >> >> >
>> >> >> > Yes?  I didn't actually look at the implementation details.
>> >> >> >
>> >> >> >> > if (!src_page) {
>> >> >> >> > 	src_page = pfn_to_page(pfn);
>> >> >> >> > 	if (!folio_test_uptodate(page_folio(src_page)))
>> >> >> >> > 		return -EOPNOTSUPP;
>> >> >> >> > }
>> >> >>
>> >> >> Yan is right that with the earlier patch "Zero page while getting pfn",
>> >> >> folio_test_uptodate() here will always return true.
>> >> >>
>> >> >> Actually, this is an alternative fix for the issue Sashiko pointed out
>> >> >> on v7 where userspace can do a populate() (either TDX or SNP) without
>> >> >> first allocating the page, with src_address == NULL, and leak
>> >> >> uninitialized memory into the guest.
>> >> >>
>> >> >> Advantage of using the uptodate check in populate: if the host never
>> >> >> allocates the page, populate doesn't incur zeroing before writing the
>> >> >> page anyway in populate().
>> >> >>
>> >> >> Disadvantage: Both TDX and SNP will have to implement this uptodate
>> >> >> check. guest_memfd can't check centrally because for SNP, for a
>> >> >> PAGE_TYPE_ZERO, !src_page should be allowed with a !uptodate page since
>> >> >> firmware will zero and there's no leakage of uninitialized host memory?
>> >> > Another disadvantage: the uptodate flag is per-folio. What if the folio
>> >> > is only partially initialized by the userspace especially after huge page is
>> >> > supported?
>> >> >
>> >>
>> >> Good point on huge pages!
>> >>
>> >> The uptodate flag on the folio in guest_memfd means "this folio has been
>> >> written to". As of now (before patch at [1]), this happens when
>> >>
>> >> + folio is zeroed on first use by userspace
>> >> + folio is zeroed on first use of the guest
>> >> + folio is populated
>> >>
>> >> When huge pages are supported, the folio can't partially be initialized?
>> >>
>> >> On allocation, if any part is shared, we split the page. The parts are
>> >> separate folios that have their own uptodate flags.
>> >>
>> >> On splitting, if the huge page is uptodate, the split pages will also be
>> >> uptodate. If the huge page is not uptodate, the split pages won't be
>> >> uptodate, but that's ok since they will be marked uptodate on first use.
>> >>
>> >> On merging, the non-uptodate parts have to be zeroed and then marked
>> > If that's true, it would be good.
>> >
>> >> uptodate. Any parts that are in use would have been marked uptodate
>> >> already, so there's no overwriting data that is in use. I'll need to
>> >> think more about when it's safe to zero.
>> >>
>> >> I'm still on the fence between the two options
>> >>
>> >> 1. Using uptodate check in populate to reject src_pages that have never
>> >>    been written to or
>> >> 2. Always zero before populate
>> > 2 does not work?
>> > The flow is
>> > 1. mmap gmem_fd, make GFN shared, and write initial content.
>> > 2. convert GFN to private
>> > 3. invoke ioctl to trigger populate.
>> >
>>
>> This flow is correct; it is what users of in-place conversion should do.
>>
>> "Always" is the wrong word, I should have said "zero if not uptodate
>> before populate", as in, with patch at [1].
>>
>> By doing the zeroing in __kvm_gmem_get_pfn instead, by the time populate
>> gets the pfn, the page would be zeroed, either because userspace faulted
>> it in, and the zeroing happened in kvm_gmem_fault_user_mapping(), or if
>> userspace never faulted it in, the zeroing would happen because
>> populate() allocated the page.
>
> I see.
>
>> >> but whether the uptodate flag is per-folio or not doesn't affect these
>> >> two options in terms of fixing the leak of uninitialized host memory,
>> >> right?
>> > yes, provided "On merging, the non-uptodate parts have to be zeroed and then
>> > marked uptodate".
>> >
>>
>> Thank you so much for bringing this up, I hadn't considered this
>> before. I'll do that when I get to guest_memfd hugepage restructuring.
>>
>> >> >
>> >> >> >> Another concern with this fix is that:
>> >> >> >> commit "KVM: guest_memfd: Zero page while getting pfn" [1] always marks the
>> >> >> >> folio uptodate before reaching post_populate().
>> >> >> >>
>> >> >> >> [1] https://lore.kernel.org/all/20260618-gmem-inplace-conversion-v8-21-9d2959357853@google.com/
>> >> >> >>
>> >> >> >> > One concern is that TDX now does not much care about the up-to-date flag since
>> >> >> >> > TDX doesn't rely on the flag to clear pages on conversions.
>> >> >> >> > I'm not sure if the flag can be reliably checked in this case. e.g.,
>> >> >> >> > now the whole folio is marked up-to-date even if only part of it is faulted by
>> >> >> >> > user access.
>> >> >> >> > Ensuring that the up-to-date flag works correctly with huge page support seems
>> >> >> >> > to have more effort than introducing a dedicated flag for TDX.
>> >> >> >> >
>> >> >> >> > > > Additionally, to properly enable in-place copying for the TDX initial memory
>> >> >> >> > > > region, userspace must not only specify source_addr to NULL, but also follow
>> >> >> >> > > > a specific sequence (where steps 1/2/3/7 are required only for in-place copy):
>> >> >> >> > > > 1. create guest_memfd with MMAP flag
>> >> >> >> > > > 2. mmap the guest_memfd.
>> >> >> >> > > > 3. convert the initial memory range to shared.
>> >> >> >> > > > 4. copy initial content to the source page.
>> >> >> >> > > > 5. convert the initial memory range to private
>> >> >> >> > > > 6. invoke ioctl KVM_TDX_INIT_MEM_REGION.
>> >> >> >> > > > 7. do not unmap the source backend.
>> >> >> >> > > >
>> >> >> >> > > > So, would it be reasonable to introduce a dedicated flag that allows userspace
>> >> >> >> > > > to explicitly opt into the in-place copy functionality? e.g.,
>> >> >> >> > >
>> >> >> >> > > Why?  It's userspace's responsibility to get the above right.  If userspace fails
>> >> >> >> > > to provide a src_page when it doesn't want in-place copy, that's a userspace bug.
>> >> >>
>> >> >> Yan, is your concern that userspace forgot to update the code and
>> >> >> forgets to provide a src_page, and if we keep the "Zero page while
>> >> > Yes. Previously, it would be rejected after GUP fails.
>> >> >
>> >>
>> >> I see, didn't realize previously it would be rejected because GUP
>> >> fails. GUP failed because it wasn't faulted into the host?
>> > GUP fails if 0 is not a valid user address.
>> > But GUP would not fail if 0 is a valid address. e.g., in below scenario:
>> >
>> > #include <sys/mman.h>
>> > #include <stdio.h>
>> > int main(void)
>> > {
>> >         void *p=mmap((void*)0,4096,PROT_READ|PROT_WRITE, MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS,-1,0);
>> >         if (p==MAP_FAILED) {
>> >                 perror("mmap");
>> >                 return 1;
>> >         }
>> >         *(char*)0='Y';
>> >         printf("addr0=%p val=%c\n",p,*(char*)0);
>> >         return 0;
>> > }
>> >
>> >
>> >> That's kind of orthogonal, I don't think GUP fail leading to rejecting
>> >> populate was meant to help userspace catch these issues. GUP would also
>> >> fail if the user did mmap(), write to it, unmap using
>> >> madvise(MADV_DONTNEED), then forget and pass 0 as src_address.
>> > The original uAPI did not explicitly define 0 as an invalid uaddr. Whether 0 was
>> > rejected depended on whether the user mmap()'d address 0. If 0 was a valid
>> > mapping, populate() could proceed.
>> >
>> > commit 2a62345b3052 ("KVM: guest_memfd: GUP source pages prior to populating
>> > guest memory") changed the behavior though. It would return -EOPNOTSUPP for a 0
>> > uaddr.
>> >
>>
>> I see, I only looked at this after commit 2a62345b3052.
>>
>> > But if a user configures 0 uaddr as valid, writes to it, and then passes 0 as
>> > source_addr(not from gmem), I'm not sure if it's good for the kernel to silently
>> > treat 0 uaddr as an identifier for in-place copy from the private PFN in gmem.
>> >
>>
>> I'd say the original uAPI perhaps just didn't document 0 as an
>> unsupported uaddr. Given that commit 2a62345b3052 already merged, uAPI
>> was perhaps accidentally changed and no customer complained, I think we
>> can move forward with 0 as an invalid src_address? I wouldn't think
>> anyone relies on 0 intentionally being a valid address.
>>
>> I could document that, if it helps?
> What about just documenting that 0 is an unsupported uaddr which will be
> re-purposed as an indicator to use the target pfn as the source, regardless of
> whether gmem_in_place_conversion is true? i.e.,
>
> if (!src_page)
> 	src_page = pfn_to_page(pfn);
>
> I don't get why the two scenarios should be treated differently:
> 1. gmem_in_place_conversion==true, shared memory is not from gmem
> 2. gmem_in_place_conversion==false, shared memory is not from gmem
>
> In both cases, a 0 uaddr could be mapped to a valid page not from gmem.

This is true, but this check isn't about whether the page is from gmem.

> So why not update the uAPI to handle both cases consistently? :)
>

Wait, but before this series, if region.src_address = 0, src_page = NULL
and that's not supported so it returns -EOPNOTSUPP.

If that's dropped, then suddenly if region.src_address = 0 and
!gmem_in_place_conversion, tdx_gmem_post_populate() will now load the
memory (zeroed) after [1] into the guest? I don't think we want to
change that behavior.

I could document that 0 is an unsupported uaddr only for TDX, and only
when gmem_in_place_conversion = false.

Since it is unsupported only when gmem_in_place_conversion = false, the
two check lines marked with <<==== can't go away?

	if (!src_page) {
		if (!gmem_in_place_conversion)  <<====
			return -EOPNOTSUPP;     <<====

		src_page = pfn_to_page(pfn);
	}

Also, for SNP, src_address == 0 is permitted (and desired, I believe, to
avoid a pointless kernel memcpy) if the type of population is
KVM_SEV_SNP_PAGE_TYPE_ZERO.
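
To make the TDX case being argued for concrete, here is a small userspace model of the decision. It is a sketch only: struct sketch_page and sketch_pick_source() are made-up stand-ins, and the real tdx_gmem_post_populate() does considerably more than this.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct page; purely illustrative. */
struct sketch_page {
	int id;
};

/*
 * Model of the source-page choice: a NULL source is accepted only when
 * in-place conversion is enabled, in which case the target page doubles
 * as the source. Otherwise a NULL source keeps failing with -EOPNOTSUPP,
 * as it did before the series.
 */
static int sketch_pick_source(struct sketch_page *src,
			      struct sketch_page *target,
			      bool in_place,
			      struct sketch_page **out)
{
	if (!src) {
		if (!in_place)
			return -EOPNOTSUPP;
		src = target;
	}
	*out = src;
	return 0;
}
```

Dropping the inner check in this model would make a NULL source succeed in every configuration, which is the behaviour change being questioned above.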

>> >> >> getting pfn" patch, ends up with the guest silently having a zero page?
>> >> >> I think that would be found quite early in userspace VMM testing...
>>
>> [...snip...]
>>

^ permalink raw reply

* Re: [PATCH 16/30] mm/vma: use vma_start_pgoff(), linear_page_index() in mm code
From: SJ Park @ 2026-06-30  0:11 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: SJ Park, Andrew Morton, Dinh Nguyen, Simon Schuster,
	James E . J . Bottomley, Helge Deller, Jarkko Sakkinen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	Ian Abbott, H Hartley Sweeten, Lucas Stach, David Airlie,
	Simona Vetter, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Rob Clark, Dmitry Baryshkov, Tomi Valkeinen,
	Thierry Reding, Mikko Perttunen, Jonathan Hunter,
	Christian Koenig, Huang Rui, Ankit Agrawal, Alex Williamson,
	Alexander Viro, Christian Brauner, Dan Williams, Muchun Song,
	Oscar Salvador, David Hildenbrand, Suren Baghdasaryan,
	Liam R . Howlett, Matthew Wilcox, Marek Szyprowski,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim,
	Masami Hiramatsu, Oleg Nesterov, Steven Rostedt, Miaohe Lin,
	Hugh Dickins, Mike Rapoport, Kees Cook, Paolo Bonzini,
	linux-kernel, linux-arm-kernel, linux-parisc, linux-sgx, etnaviv,
	dri-devel, linux-arm-msm, freedreno, linux-tegra, kvm,
	linux-fsdevel, nvdimm, linux-mm, iommu, linux-perf-users,
	linux-trace-kernel, kasan-dev, damon, Pedro Falcato, Rik van Riel,
	Harry Yoo, Jann Horn
In-Reply-To: <33d79008948391d30bab38db5ae31072ce12f0a1.1782735110.git.ljs@kernel.org>

On Mon, 29 Jun 2026 13:23:27 +0100 Lorenzo Stoakes <ljs@kernel.org> wrote:

> There are many instances in which linear_page_index() (as well as
> linear_page_delta()) is open-coded, which is confusing and inconsistent.
> 
> Additionally, vma->vm_pgoff doesn't necessarily make it clear that this is
> the page offset of the start of the VMA range.
> 
> Doing so also aids greppability.
> 
> So use vma_start_pgoff() in favour of directly accessing vma->vm_pgoff, and
> linear_page_index() where we can.
> 
> This also lays the ground for future changes which will add an anonymous
> page offset in order to be able to index MAP_PRIVATE-file backed anon
> folios in terms of their virtual page offset.
> 
> No functional change intended.
> 
> Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
> ---
>  include/linux/huge_mm.h    |  1 +
>  include/linux/hugetlb.h    |  3 +--
>  include/linux/pagemap.h    |  2 +-
>  mm/damon/vaddr.c           |  5 +++--

I only took a quick look at the DAMON part.  Looks nice and cleaner, thank you!

Reviewed-by: SJ Park <sj@kernel.org>


Thanks,
SJ

[...]

^ permalink raw reply

* [PATCH v2 1/2] x86/uprobes: Keep shadow stack in sync for emulated CALLs
From: David Windsor @ 2026-06-30  0:13 UTC (permalink / raw)
  To: mhiramat, oleg, peterz
  Cc: tglx, mingo, bp, dave.hansen, x86, shuah, rick.p.edgecombe, jolsa,
	linux-trace-kernel, linux-kselftest, linux-kernel, David Windsor

Uprobe CALL emulation updates the normal user stack, but not the CET user
shadow stack. The subsequent RET then sees a stale shadow stack entry and
raises #CP.

Update the relative CALL emulation and XOL CALL fixup paths to keep the
shadow stack in sync.
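
The failure mode can be modelled with two arrays in userspace C. This is a toy sketch with invented names (sketch_task, sketch_emulate_call, sketch_ret); it says nothing about how the kernel or the hardware implement shadow stacks, only why a missing shadow-stack push makes the next RET fault.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_DEPTH 16

/* Toy task state: a normal stack and a shadow stack of return addresses. */
struct sketch_task {
	unsigned long stack[SKETCH_DEPTH];
	unsigned long shadow[SKETCH_DEPTH];
	size_t sp;	/* next free slot on the normal stack */
	size_t ssp;	/* next free slot on the shadow stack */
};

/*
 * Emulated CALL: always pushes the return address on the normal stack,
 * and on the shadow stack only when 'sync_shadow' is set.
 */
static bool sketch_emulate_call(struct sketch_task *t, unsigned long ret,
				bool sync_shadow)
{
	if (t->sp == SKETCH_DEPTH || t->ssp == SKETCH_DEPTH)
		return false;
	t->stack[t->sp++] = ret;
	if (sync_shadow)
		t->shadow[t->ssp++] = ret;
	return true;
}

/*
 * RET: pops both stacks and compares. Returning false models the
 * control-protection fault raised on a mismatch.
 */
static bool sketch_ret(struct sketch_task *t)
{
	if (t->sp == 0 || t->ssp == 0)
		return false;
	return t->stack[--t->sp] == t->shadow[--t->ssp];
}
```

In this model, a call emulated without the shadow push leaves the caller's older entry on top of the shadow stack, so the following sketch_ret() compares two different addresses and fails.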

Fixes: 488af8ea7131 ("x86/shstk: Wire in shadow stack interface")
Signed-off-by: David Windsor <dwindsor@gmail.com>
---

v2:
 - propagate error from shstk_update_last_frame() rather than returning
   -ERESTART in default_post_xol_op(). (Oleg)

v1: https://lore.kernel.org/all/20260622183109.1137245-1-dwindsor@gmail.com/

 arch/x86/kernel/uprobes.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index ebb1baf1eb1d..d74bb54543b6 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -1246,9 +1246,15 @@ static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs
 		long correction = utask->vaddr - utask->xol_vaddr;
 		regs->ip += correction;
 	} else if (auprobe->defparam.fixups & UPROBE_FIX_CALL) {
+		unsigned long retaddr = utask->vaddr + auprobe->defparam.ilen;
+		int err;
+
 		regs->sp += sizeof_long(regs); /* Pop incorrect return address */
-		if (emulate_push_stack(regs, utask->vaddr + auprobe->defparam.ilen))
+		if (emulate_push_stack(regs, retaddr))
 			return -ERESTART;
+		err = shstk_update_last_frame(retaddr);
+		if (err)
+			return err;
 	}
 	/* popf; tell the caller to not touch TF */
 	if (auprobe->defparam.fixups & UPROBE_FIX_SETF)
@@ -1338,6 +1344,10 @@ static bool branch_emulate_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
 		 */
 		if (emulate_push_stack(regs, new_ip))
 			return false;
+		if (shstk_push(new_ip) == -EFAULT) {
+			regs->sp += sizeof_long(regs);
+			return false;
+		}
 	} else if (!check_jmp_cond(auprobe, regs)) {
 		offs = 0;
 	}
-- 
2.53.0


^ permalink raw reply related

* [PATCH v2 2/2] selftests/x86: Add shadow stack uprobe CALL test
From: David Windsor @ 2026-06-30  0:13 UTC (permalink / raw)
  To: mhiramat, oleg, peterz
  Cc: tglx, mingo, bp, dave.hansen, x86, shuah, rick.p.edgecombe, jolsa,
	linux-trace-kernel, linux-kselftest, linux-kernel, David Windsor
In-Reply-To: <8b5b1c7407b98f31664ad7b6a6faf20d2d4a6cad.1782777969.git.dwindsor@gmail.com>

Add coverage for entry uprobes installed on CALL instructions while user
shadow stack is enabled. The test puts an entry uprobe on a helper whose
first instruction is a relative CALL, then verifies that the call/return
sequence completes without SIGSEGV.

This catches regressions where x86 uprobe CALL emulation updates the
regular user stack but leaves the CET shadow stack stale.

Signed-off-by: David Windsor <dwindsor@gmail.com>
---

Notes:
    v2:
      - New patch. Adds a uprobe-on-CALL subtest to test_shadow_stack_64 to
        cover the fix in 1/2. Verifies that an emulated CALL through a
        uprobe leaves the shadow stack consistent with the user stack
        (no #CP on the matching RET).

 .../testing/selftests/x86/test_shadow_stack.c | 86 +++++++++++++++++++
 1 file changed, 86 insertions(+)

diff --git a/tools/testing/selftests/x86/test_shadow_stack.c b/tools/testing/selftests/x86/test_shadow_stack.c
index 21af54d5f4ea..3d6ca33edba4 100644
--- a/tools/testing/selftests/x86/test_shadow_stack.c
+++ b/tools/testing/selftests/x86/test_shadow_stack.c
@@ -873,6 +873,86 @@ static int test_uretprobe(void)
 	return err;
 }
 
+/* Keep the CALL first so the function address is exactly the probed CALL. */
+extern void uprobe_call_trigger(void);
+asm (".pushsection .text\n"
+	".global uprobe_call_target\n"
+	".type uprobe_call_target, @function\n"
+	"uprobe_call_target:\n"
+	"	ret\n"
+	".size uprobe_call_target, .-uprobe_call_target\n"
+
+	".global uprobe_call_trigger\n"
+	".type uprobe_call_trigger, @function\n"
+	"uprobe_call_trigger:\n"
+	"	call uprobe_call_target\n"
+	"	ret\n"
+	".size uprobe_call_trigger, .-uprobe_call_trigger\n"
+	".popsection\n"
+);
+
+/* If CALL emulation misses the shadow stack update, this exits via SIGSEGV. */
+static int test_uprobe_call(void)
+{
+	const size_t attr_sz = sizeof(struct perf_event_attr);
+	const char *file = "/proc/self/exe";
+	int fd = -1, type, err = 1;
+	struct perf_event_attr attr;
+	struct sigaction sa = {};
+	ssize_t offset;
+
+	type = determine_uprobe_perf_type();
+	if (type < 0) {
+		if (type == -ENOENT)
+			printf("[SKIP]\tUprobe on CALL test, uprobes are not available\n");
+		return 0;
+	}
+
+	offset = get_uprobe_offset(uprobe_call_trigger);
+	if (offset < 0)
+		return 1;
+
+	sa.sa_sigaction = segv_gp_handler;
+	sa.sa_flags = SA_SIGINFO;
+	if (sigaction(SIGSEGV, &sa, NULL))
+		return 1;
+
+	/* Setup entry uprobe through perf event interface. */
+	memset(&attr, 0, attr_sz);
+	attr.size = attr_sz;
+	attr.type = type;
+	attr.config = 0;
+	attr.config1 = (__u64)(unsigned long)file;
+	attr.config2 = offset;
+
+	fd = syscall(__NR_perf_event_open, &attr, 0 /* pid */, -1 /* cpu */,
+		     -1 /* group_fd */, PERF_FLAG_FD_CLOEXEC);
+	if (fd < 0)
+		goto out;
+
+	if (sigsetjmp(jmp_buffer, 1))
+		goto out;
+
+	if (ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK))
+		goto out;
+
+	/*
+	 * This either segfaults and goes through sigsetjmp above
+	 * or succeeds and we're good.
+	 */
+	uprobe_call_trigger();
+
+	printf("[OK]\tUprobe on CALL test\n");
+	err = 0;
+
+out:
+	ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK);
+	signal(SIGSEGV, SIG_DFL);
+	if (fd >= 0)
+		close(fd);
+	return err;
+}
+
 void segv_handler_ptrace(int signum, siginfo_t *si, void *uc)
 {
 	/* The SSP adjustment caused a segfault. */
@@ -1071,6 +1151,12 @@ int main(int argc, char *argv[])
 		goto out;
 	}
 
+	if (test_uprobe_call()) {
+		ret = 1;
+		printf("[FAIL]\tuprobe on CALL test\n");
+		goto out;
+	}
+
 	return ret;
 
 out:
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH v13 04/11] perf/probe: Ignore comment lines in dynamic_events/kprobe_events file
From: Namhyung Kim @ 2026-06-30  0:33 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Arnaldo Carvalho de Melo, Steven Rostedt, Mathieu Desnoyers,
	Jonathan Corbet, Shuah Khan, linux-kernel, linux-trace-kernel,
	linux-doc, linux-kselftest
In-Reply-To: <20260630073211.2a505d1f31e5fae1bf03b81a@kernel.org>

Hi Masami,

On Tue, Jun 30, 2026 at 07:32:11AM +0900, Masami Hiramatsu wrote:
> Hi Arnaldo, Namhyung,
> 
> I forgot to CC this. Can I pick this patch via linux-trace tree,
> or would you pick this?
> This is a part of typecast series [1] only for debugging.

Thanks for letting me know.

I think it's better to route this through the perf tree as we're seeing
a lot of cleanups all around the code base.  Having this together would
reduce chances of future conflicts.  Does that sound ok to you?

Thanks,
Namhyung


> 
> [1] https://lore.kernel.org/all/178271361825.1176915.16095297120719039761.stgit@devnote2/
> 
> Thanks,
> 
> On Mon, 29 Jun 2026 15:13:38 +0900
> "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
> 
> > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > 
> > Since dynamic_events/kprobe_events files show the fetcharg debug
> > information as comment lines, its reader needs to ignore it.
> > 
> > Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > ---
> >  tools/perf/util/probe-file.c |    2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
> > index 4032572cbf55..4d12693a83b3 100644
> > --- a/tools/perf/util/probe-file.c
> > +++ b/tools/perf/util/probe-file.c
> > @@ -197,6 +197,8 @@ struct strlist *probe_file__get_rawlist(int fd)
> >  		idx = strlen(p) - 1;
> >  		if (p[idx] == '\n')
> >  			p[idx] = '\0';
> > +		if (buf[0] == '#')
> > +			continue;
> >  		ret = strlist__add(sl, buf);
> >  		if (ret < 0) {
> >  			pr_debug("strlist__add failed (%d)\n", ret);
> > 
> 
> 
> -- 
> Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH v8 23/46] KVM: TDX: Make source page optional for KVM_TDX_INIT_MEM_REGION
From: Sean Christopherson @ 2026-06-30  0:35 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Ackerley Tng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
	david, jmattson, jthoughton, michael.roth, oupton, pankaj.gupta,
	qperret, rick.p.edgecombe, rientjes, shivankg, steven.price,
	tabba, willy, wyihan, forkloop, pratyush, suzuki.poulose,
	aneesh.kumar, liam, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Baoquan He, Jason Gunthorpe, Vlastimil Babka,
	kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <akI9m02jgKAdi4gX@yzhao56-desk.sh.intel.com>

Gah, I thought I had sent this out this morning, long before Ackerley's response.
But I got distracted by a meeting and forgot to get back to this... *sigh*

Sending what I already wrote, even though there's a lot of overlap with Ackerley's
mail.

On Mon, Jun 29, 2026, Yan Zhao wrote:
> On Fri, Jun 26, 2026 at 08:28:32AM -0700, Ackerley Tng wrote:
> > Yan Zhao <yan.y.zhao@intel.com> writes:
> > > But if a user configures 0 uaddr as valid, writes to it, and then passes 0 as
> > > source_addr(not from gmem), I'm not sure if it's good for the kernel to silently
> > > treat 0 uaddr as an identifier for in-place copy from the private PFN in gmem.
> > >
> > 
> > I'd say the original uAPI perhaps just didn't document 0 as an
> > unsupported uaddr. Given that commit 2a62345b3052 already merged, uAPI
> > was perhaps accidentally changed and no customer complained, I think we
> > can move forward with 0 as an invalid src_address? I wouldn't think
> > anyone relies on 0 intentionally being a valid address.
> > 
> > I could document that, if it helps?
> What about just documenting that 0 is an unsupported uaddr which will be
> re-purposed as an indicator to use the target pfn as the source, regardless of
> whether gmem_in_place_conversion is true? i.e.,
> 
> if (!src_page) 
> 	src_page = pfn_to_page(pfn);

Because KVM can't generally use the target page as the source without in-place
conversion, it's not supported today, and out-of-place conversion is being
deprecated.

> I don't get why the two scenarios should be treated differently:
> 1. gmem_in_place_conversion==true, shared memory is not from gmem 
> 2. gmem_in_place_conversion==false, shared memory is not from gmem
> 
> In both cases, a 0 uaddr could be mapped to a valid page not from gmem.

That's immaterial.  KVM's ABI (that we're solidifying) is that an address of '0'
for the source means NULL.  The fact that userspace could have a valid mapping
at virtual address '0' is irrelevant.

Again, just because something is technically possible doesn't mean it needs to
be supported by every piece of KVM's uAPI.

> So why not update the uAPI to handle both cases consistently? :)

Because retroactively adding support for out-of-place conversion is pointless
(requires a userspace update for a feature that's being deprecated), KVM can't
generally support using the source for out-of-place conversion (it's effectively
an obscure zero-page optimization), and IMO rejecting the out-of-place conversion
scenario is valuable for KVM developers, e.g. to help newcomers understand what
exactly is and isn't possible.

Side topic, isn't TDX broken if the target page has already been added to the TD?
IIUC, kvm_tdp_mmu_map_private_pfn() will be a glorified nop due to the page
already having a valid S-EPT mapping, and so KVM will incorrectly allow a double
add.  Ahhh, no, because KVM will return RET_PF_SPURIOUS and
kvm_tdp_mmu_map_private_pfn() will then return -EIO.

^ permalink raw reply

* Re: [PATCH] lib/bootconfig: fix undefined behavior involving NULL pointer arithmetic
From: Masami Hiramatsu @ 2026-06-30  0:46 UTC (permalink / raw)
  To: Bradley Morgan
  Cc: Breno Leitao, akpm, mhiramat, linux-kernel, linux-trace-kernel,
	stable
In-Reply-To: <0B594835-45AD-4B37-85A3-C7F54F8D668A@grrlz.net>

On Mon, 29 Jun 2026 14:53:05 +0100
Bradley Morgan <include@grrlz.net> wrote:

> On 29 June 2026 14:41:37 BST, Breno Leitao <leitao@debian.org> wrote:
> >On Sun, Jun 28, 2026 at 11:56:16AM +0000, Bradley Morgan wrote:
> >> When xbc_snprint_cmdline() is called during the size-probing phase
> >> (with buf = NULL and size = 0), the function computes the end pointer
> >> as 'buf + size' (NULL + 0) and repeatedly advances the pointer via
> >> 'buf += ret'.
> >> 
> >> Under the C standard, performing pointer arithmetic on a NULL pointer is
> >> undefined behavior. While harmless inside the kernel, this code is also
> >> compiled into the userspace host tool 'tools/bootconfig', where host
> >> compilers with UBSan or FORTIFY_SOURCE enabled abort the build when they
> >> detect NULL pointer arithmetic.
> >> 
> >> Fix this by tracking the running written length as an integer offset
> >> ('len') rather than advancing 'buf' directly. Only perform pointer
> >> arithmetic if 'buf' is actually non-NULL.
> >> 
> >> Fixes: 5a643e462323 ("bootconfig: move xbc_snprint_cmdline() to lib/bootconfig.c")
> >
> >Isn't commit 5a643e462323 ("bootconfig: move xbc_snprint_cmdline() to
> >lib/bootconfig.c") just a code movement?
> 
> Ugh, Gemini's bullcrap, you are right. I should've just manually looked
> for the Fixes tag (as I always do).

Yeah, please use the latest Linus kernel (v7.2-rc1, for now).

> 
> >>  	xbc_node_for_each_key_value(root, knode, val) {
> >> @@ -439,10 +437,12 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
> >>  
> >>  		vnode = xbc_node_get_child(knode);
> >>  		if (!vnode) {
> >> -			ret = snprintf(buf, rest(buf, end), "%s ", xbc_namebuf);
> >> +			ret = snprintf(buf ? buf + len : NULL,
> >> +				       size > len ? size - len : 0,
> >
> >Why not keep rest() and update it, instead of open-coding it?
> >
> >Thanks for the fix.
> 
> Sure, I'll do a v2. By the way, in case you didn't read it, Gemini found
> and fixed this. As in fully. :)

Hint: when fixing an issue, please focus on just that issue and try to
keep the change minimal so that it stays backportable.

Thanks,

> 
> 
> 
> >--breno
> >
> 
> Thanks!


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply
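
The size-probing pattern discussed in the thread above can be sketched in
userspace C as follows. This is only an illustration under stated
assumptions, not the kernel's xbc_snprint_cmdline(): join_words() and its
parameters are hypothetical names. It tracks the written length as an
integer offset and only forms 'buf + len' when 'buf' is non-NULL and the
offset is still inside the buffer, so a probing call with buf = NULL and
size = 0 never performs NULL pointer arithmetic.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): append each word followed by
 * a space. Returns the length the full output would need, excluding the
 * terminating NUL, like snprintf() does. A negative value means error.
 */
static int join_words(char *buf, size_t size,
		      const char *const *words, size_t nr_words)
{
	size_t len = 0;	/* running length as an integer, not a pointer */
	size_t i;

	for (i = 0; i < nr_words; i++) {
		/* Only compute buf + len when it stays inside the buffer. */
		int in_range = buf && len < size;
		int ret = snprintf(in_range ? buf + len : NULL,
				   in_range ? size - len : 0,
				   "%s ", words[i]);

		if (ret < 0)
			return ret;
		len += (size_t)ret;
	}

	return (int)len;
}
```

A caller would typically probe first with join_words(NULL, 0, ...),
allocate the returned length plus one byte, and then call the function
again with the real buffer.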

* [PATCH v3 2/7] mm/page_owner: add MR_NEVER to enum migrate_reason and use it for last_migrate_reason
From: Ye Liu @ 2026-06-30  1:53 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Steven Rostedt,
	Masami Hiramatsu, Vlastimil Babka
  Cc: Ye Liu, Zi Yan, Matthew Brost, Joshua Hahn, Rakie Kim,
	Byungchul Park, Gregory Price, Ying Huang, Alistair Popple,
	Mathieu Desnoyers, Suren Baghdasaryan, Michal Hocko,
	Brendan Jackman, Johannes Weiner, linux-mm, linux-kernel,
	linux-trace-kernel
In-Reply-To: <20260630015331.147174-1-ye.liu@linux.dev>

The last_migrate_reason field uses -1 as a sentinel value to mean "no
migration has happened".  Replace the four bare -1 occurrences by
adding a proper MR_NEVER member to enum migrate_reason, defining a
corresponding "never_migrated" string in the MIGRATE_REASON trace
macro, and removing the local MIGRATE_REASON_NONE define.

No functional change.

Signed-off-by: Ye Liu <ye.liu@linux.dev>
---
 include/linux/migrate_mode.h   | 1 +
 include/trace/events/migrate.h | 3 ++-
 mm/page_owner.c                | 8 ++++----
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index 265c4328b36a..05102d4d2490 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -25,6 +25,7 @@ enum migrate_reason {
 	MR_LONGTERM_PIN,
 	MR_DEMOTION,
 	MR_DAMON,
+	MR_NEVER,		/* page has never been migrated */
 	MR_TYPES
 };
 
diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
index cd01dd7b3640..11bc0aa14c7e 100644
--- a/include/trace/events/migrate.h
+++ b/include/trace/events/migrate.h
@@ -23,7 +23,8 @@
 	EM( MR_CONTIG_RANGE,	"contig_range")			\
 	EM( MR_LONGTERM_PIN,	"longterm_pin")			\
 	EM( MR_DEMOTION,	"demotion")			\
-	EMe(MR_DAMON,		"damon")
+	EM( MR_DAMON,		"damon")			\
+	EMe(MR_NEVER,		"never_migrated")
 
 /*
  * First define the enums in the above macros to be exported to userspace
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 342549891a8d..c2f43ab860eb 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -339,7 +339,7 @@ noinline void __set_page_owner(struct page *page, unsigned short order,
 	depot_stack_handle_t handle;
 
 	handle = save_stack(gfp_mask);
-	__update_page_owner_handle(page, handle, order, gfp_mask, -1,
+	__update_page_owner_handle(page, handle, order, gfp_mask, MR_NEVER,
 				   ts_nsec, current->pid, current->tgid,
 				   current->comm);
 	inc_stack_record_count(handle, gfp_mask, 1 << order);
@@ -596,7 +596,7 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
 	if (ret >= count)
 		goto err;
 
-	if (page_owner->last_migrate_reason != -1) {
+	if (page_owner->last_migrate_reason != MR_NEVER) {
 		ret += scnprintf(kbuf + ret, count - ret,
 			"Page has been migrated, last migrate reason: %s\n",
 			migrate_reason_names[page_owner->last_migrate_reason]);
@@ -667,7 +667,7 @@ void __dump_page_owner(const struct page *page)
 		stack_depot_print(handle);
 	}
 
-	if (page_owner->last_migrate_reason != -1)
+	if (page_owner->last_migrate_reason != MR_NEVER)
 		pr_alert("page has been migrated, last migrate reason: %s\n",
 			migrate_reason_names[page_owner->last_migrate_reason]);
 	page_ext_put(page_ext);
@@ -826,7 +826,7 @@ static void init_pages_in_zone(struct zone *zone)
 
 			/* Found early allocated page */
 			__update_page_owner_handle(page, early_handle, 0, 0,
-						   -1, local_clock(), current->pid,
+						   MR_NEVER, local_clock(), current->pid,
 						   current->tgid, current->comm);
 			count++;
 ext_put_continue:
-- 
2.43.0


^ permalink raw reply related
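
The idea in the patch above, replacing a bare -1 sentinel with a named
enum member that also has a string entry, can be sketched in standalone C
as follows. This is a simplified illustration, not the kernel code: enum
reason, R_NEVER, reason_names and last_reason_name() are hypothetical
names chosen for the sketch, and the real enum has more members.

```c
#include <stddef.h>

/* Hypothetical, shortened stand-in for enum migrate_reason. */
enum reason {
	R_COMPACTION,
	R_DEMOTION,
	R_NEVER,	/* sentinel: never migrated */
	R_TYPES		/* number of entries, keep last */
};

/* Every member, including the sentinel, has a printable name. */
static const char *const reason_names[R_TYPES] = {
	[R_COMPACTION]	= "compaction",
	[R_DEMOTION]	= "demotion",
	[R_NEVER]	= "never_migrated",
};

/*
 * Return the name of the last migration reason, or NULL when the
 * sentinel says no migration has happened.
 */
static const char *last_reason_name(enum reason last)
{
	if (last == R_NEVER)
		return NULL;
	return reason_names[last];
}
```

Because the sentinel is an ordinary enum member, comparisons such as
'last != R_NEVER' are self-describing, and indexing the name table with
it stays in bounds.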
