From: David Carlier <devnexen@gmail.com>
To: akpm@linux-foundation.org
Cc: syzbot+fd95a72470f5a44e464c@syzkaller.appspotmail.com,
	David Carlier <devnexen@gmail.com>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R. Howlett" <liam@infradead.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Kevin Tian <kevin.tian@intel.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH] mm: pgtable: protect lockless kernel page table walks with RCU
Date: Fri, 12 Jun 2026 05:38:27 +0100
Message-ID: <20260612043828.23558-1-devnexen@gmail.com>

ptdump walks the kernel page tables locklessly through
walk_kernel_page_table_range_lockless().  It only holds the init_mm
mmap lock and the memory hotplug lock, and neither excludes
vmalloc/ioremap teardown from freeing kernel PTE pages via
pmd_free_pte_page() -> pagetable_free_kernel().  syzbot hit a
use-after-free in ptdump_pte_entry() reading a PTE page that was freed
underneath the walk.

Deferring the kernel page table free only batches the TLB flush; it does
not wait for lockless walkers.  Mirror the user page table walk, where
pte_offset_map() already takes the RCU read lock: hold rcu_read_lock()
across the lockless kernel walk and wait for a grace period in the kernel
page table free worker before releasing the pages.  A walker then either
observes the cleared PMD and skips the page, or has already loaded the old
PMD entry, in which case the page stays allocated until the walker drops
the RCU read lock.

Fixes: 5ba2f0a15564 ("mm: introduce deferred freeing for kernel page tables")
Reported-by: syzbot+fd95a72470f5a44e464c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6a287988.39669fcc.33b062.00a0.GAE@google.com/T/
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: David Carlier <devnexen@gmail.com>
---
 mm/pagewalk.c        | 15 ++++++++++++++-
 mm/pgtable-generic.c |  8 ++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)
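
For readers who want the ordering argument from the changelog spelled out,
here is a single-threaded userspace toy model.  It is only a sketch and is
not kernel code: every toy_* name is hypothetical, and a plain reader count
stands in for an RCU grace period (real synchronize_rcu() only waits for
readers that started before it was called, so the model is stricter than
RCU).  It checks the two outcomes the changelog describes: a walker that
already holds the old PMD value keeps the page from being freed, and a
walker that starts after the PMD was cleared never sees the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a kernel PTE page. */
struct toy_pte_page {
	bool freed;
	int entries[4];
};

struct toy_state {
	struct toy_pte_page *pmd;	/* models a PMD entry pointing at a PTE page */
	int readers;			/* models open rcu_read_lock() sections */
	struct toy_pte_page *pending;	/* page queued for the deferred free */
};

/* Walker entry: models rcu_read_lock() followed by loading the PMD. */
static struct toy_pte_page *toy_walk_begin(struct toy_state *s)
{
	s->readers++;
	return s->pmd;		/* NULL once the PMD has been cleared */
}

/* Walker exit: models rcu_read_unlock(). */
static void toy_walk_end(struct toy_state *s)
{
	s->readers--;
}

/* Teardown: clear the PMD first, then queue the page for freeing. */
static void toy_unmap(struct toy_state *s)
{
	s->pending = s->pmd;
	s->pmd = NULL;
}

/*
 * Free worker.  Returns true if the page was freed.  Where the real
 * worker would block in synchronize_rcu(), the toy returns false so the
 * caller can retry once the readers are gone.
 */
static bool toy_free_worker(struct toy_state *s)
{
	if (!s->pending || s->readers > 0)
		return false;
	s->pending->freed = true;
	s->pending = NULL;
	return true;
}

/* Returns 0 if every step of the protocol behaves as described. */
int toy_demo(void)
{
	struct toy_pte_page page = { .freed = false, .entries = { 1, 2, 3, 4 } };
	struct toy_state s = { .pmd = &page, .readers = 0, .pending = NULL };
	struct toy_pte_page *seen, *late;

	seen = toy_walk_begin(&s);	/* walker A loads the old PMD */
	toy_unmap(&s);
	if (toy_free_worker(&s))	/* must not free while A is reading */
		return -1;
	if (seen->freed)
		return -2;

	late = toy_walk_begin(&s);	/* walker B starts after the clear */
	if (late != NULL)		/* B must see the cleared PMD */
		return -3;

	toy_walk_end(&s);		/* B done */
	toy_walk_end(&s);		/* A done */
	if (!toy_free_worker(&s))	/* no readers left, free proceeds */
		return -4;
	return page.freed ? 0 : -5;
}
```

Calling toy_demo() returns 0 when both properties hold.  Removing the
readers check from toy_free_worker() makes it return -1, which is the toy
equivalent of the use-after-free syzbot reported.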

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 3ae2586ff45b..6d9f14f86784 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -655,13 +655,26 @@ int walk_kernel_page_table_range_lockless(unsigned long start, unsigned long end
 		.private	= private,
 		.no_vma		= true
 	};
+	int err;
 
 	if (start >= end)
 		return -EINVAL;
 	if (!check_ops_safe(ops))
 		return -EINVAL;
 
-	return walk_pgd_range(start, end, &walk);
+	/*
+	 * Kernel intermediate page tables can be freed concurrently by
+	 * vmalloc/ioremap teardown (e.g. pmd_free_pte_page()), which routes
+	 * the freed pages through pagetable_free_kernel(). That path defers
+	 * the free past an RCU grace period, so hold the RCU read lock across
+	 * the lockless walk to prevent a page table from being freed while we
+	 * are still dereferencing it.
+	 */
+	rcu_read_lock();
+	err = walk_pgd_range(start, end, &walk);
+	rcu_read_unlock();
+
+	return err;
 }
 
 /**
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index b91b1a98029c..59e1315185b4 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -434,6 +434,14 @@ static void kernel_pgtable_work_func(struct work_struct *work)
 	spin_unlock(&kernel_pgtable_work.lock);
 
 	iommu_sva_invalidate_kva_range(PAGE_OFFSET, TLB_FLUSH_ALL);
+
+	/*
+	 * Lockless kernel page table walkers (ptdump, and any other user of
+	 * walk_kernel_page_table_range_lockless()) dereference these pages
+	 * under rcu_read_lock(). Wait for a grace period so no walker can
+	 * still be reading a page we are about to free.
+	 */
+	synchronize_rcu();
 	list_for_each_entry_safe(pt, next, &page_list, pt_list)
 		__pagetable_free(pt);
 }
-- 
2.53.0


Thread overview: 4+ messages
2026-06-12  4:38 David Carlier [this message]
2026-06-12  4:52 ` [PATCH] mm: pgtable: protect lockless kernel page table walks with RCU Matthew Wilcox
2026-06-12  4:59   ` David CARLIER
2026-06-12  5:05   ` [PATCH v2] " David Carlier
