From: David Carlier
To: akpm@linux-foundation.org
Cc: syzbot+fd95a72470f5a44e464c@syzkaller.appspotmail.com,
    David Carlier, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett",
    Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
    Kevin Tian, Jason Gunthorpe, Dave Hansen,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3] mm: pgtable: protect lockless kernel page table walks with RCU
Date: Fri, 12 Jun 2026 18:23:55 +0100
Message-ID: <20260612172356.356894-1-devnexen@gmail.com>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260612091215.b06dc7dc9dc894a5bfc75429@linux-foundation.org>
References: <20260612091215.b06dc7dc9dc894a5bfc75429@linux-foundation.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

ptdump walks the kernel page tables locklessly through
walk_kernel_page_table_range_lockless(). It only holds the init_mm mmap
lock and the memory hotplug lock, and neither excludes vmalloc/ioremap
teardown from freeing kernel PTE pages via pmd_free_pte_page() ->
pagetable_free_kernel(). syzbot hit a use-after-free in
ptdump_pte_entry() reading a PTE page that was freed underneath the
walk. Deferring the kernel page table free only batches the TLB flush;
it does not wait for lockless walkers.

Mirror the user page table walk, where pte_offset_map() already takes
the RCU read lock: hold rcu_read_lock() across the kernel walk in the
init_mm branch of walk_page_range_debug() and rcu-free the page tables
in the kernel page table free worker, after the batched TLB flush.
ptdump is the only walker that races with these frees and its callbacks
do not sleep, so the lockless walker itself stays lockless for its
other, exclusive-access callers (e.g. the arm64 page table split paths,
which allocate with GFP_PGTABLE_KERNEL and may sleep).

A walker then either observes the cleared PMD and skips the page, or
keeps it alive until it drops the RCU read lock.

Fixes: 5ba2f0a15564 ("mm: introduce deferred freeing for kernel page tables")
Reported-by: syzbot+fd95a72470f5a44e464c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6a287988.39669fcc.33b062.00a0.GAE@google.com/T/
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: David Carlier
---
v3: take rcu_read_lock() only in the init_mm branch of
    walk_page_range_debug() instead of inside
    walk_kernel_page_table_range_lockless(). The lockless helper is also
    reached by the arm64 split paths, which allocate page tables with
    GFP_PGTABLE_KERNEL and can sleep, so it must stay lockless
    (Andrew, Sashiko).
v2: rcu-free the page tables with call_rcu() instead of
    synchronize_rcu() (Matthew Wilcox).
---
 mm/pagewalk.c        | 21 ++++++++++++++++++---
 mm/pgtable-generic.c | 16 +++++++++++++++-
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 3ae2586ff45b..dbb443c72353 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -692,9 +692,24 @@ int walk_page_range_debug(struct mm_struct *mm, unsigned long start,
 	};
 
 	/* For convenience, we allow traversal of kernel mappings. */
-	if (mm == &init_mm)
-		return walk_kernel_page_table_range(start, end, ops,
-						pgd, private);
+	if (mm == &init_mm) {
+		int err;
+
+		/*
+		 * Kernel intermediate page tables can be freed concurrently by
+		 * vmalloc/ioremap teardown (e.g. pmd_free_pte_page()), which
+		 * routes the freed pages through pagetable_free_kernel(). That
+		 * path defers the free past an RCU grace period, so hold the RCU
+		 * read lock across the walk to prevent a page table from being
+		 * freed while we are still dereferencing it. ptdump is the only
+		 * caller here and its callbacks do not sleep, so this is safe.
+		 */
+		rcu_read_lock();
+		err = walk_kernel_page_table_range(start, end, ops, pgd, private);
+		rcu_read_unlock();
+		return err;
+	}
+
 	if (start >= end || !walk.mm)
 		return -EINVAL;
 	if (!check_ops_safe(ops))
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index b91b1a98029c..5b53e9a5b7f8 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -424,6 +424,13 @@ static struct {
 	.work = __WORK_INITIALIZER(kernel_pgtable_work.work, kernel_pgtable_work_func),
 };
 
+static void kernel_pgtable_free_rcu(struct rcu_head *head)
+{
+	struct ptdesc *pt = container_of(head, struct ptdesc, pt_rcu_head);
+
+	__pagetable_free(pt);
+}
+
 static void kernel_pgtable_work_func(struct work_struct *work)
 {
 	struct ptdesc *pt, *next;
@@ -434,8 +441,15 @@ static void kernel_pgtable_work_func(struct work_struct *work)
 	spin_unlock(&kernel_pgtable_work.lock);
 
 	iommu_sva_invalidate_kva_range(PAGE_OFFSET, TLB_FLUSH_ALL);
+
+	/*
+	 * ptdump walks these pages under rcu_read_lock(), taken in the
+	 * init_mm branch of walk_page_range_debug(); the other users of
+	 * walk_kernel_page_table_range_lockless() have exclusive access.
+	 * Free after a grace period so ptdump cannot read a released page.
+	 */
 	list_for_each_entry_safe(pt, next, &page_list, pt_list)
-		__pagetable_free(pt);
+		call_rcu(&pt->pt_rcu_head, kernel_pgtable_free_rcu);
 }
 
 void pagetable_free_kernel(struct ptdesc *pt)
-- 
2.53.0
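
Illustration only, not part of the patch: the ordering the commit message
relies on (the PMD is cleared first, and the page is released only once
no walker can still hold a pointer to it) can be sketched as a
single-threaded userspace toy model. Everything named toy_* below is
hypothetical and invented for this sketch. A plain reader counter stands
in for an RCU grace period, so this models the guarantee only; it is not
how kernel RCU is implemented and it is not safe for real concurrency.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a PTE page; NOT the kernel's struct ptdesc. */
struct toy_table {
	int entries[4];
	struct toy_table *next_deferred;	/* link on the deferred-free list */
};

static struct toy_table *toy_pmd;	/* stand-in for a PMD entry */
static struct toy_table *toy_deferred;	/* tables waiting for readers to leave */
static int toy_readers;			/* walkers inside a read-side section */
static int toy_freed;			/* tables actually released so far */

/* Release deferred tables, but only when no reader is active. */
static void toy_flush_deferred(void)
{
	if (toy_readers)
		return;
	while (toy_deferred) {
		struct toy_table *t = toy_deferred;

		toy_deferred = t->next_deferred;
		free(t);
		toy_freed++;
	}
}

static void toy_read_lock(void)
{
	toy_readers++;
}

static void toy_read_unlock(void)
{
	if (--toy_readers == 0)
		toy_flush_deferred();
}

/* Install a fresh table behind the toy PMD. */
static bool toy_map(void)
{
	toy_pmd = calloc(1, sizeof(*toy_pmd));
	if (!toy_pmd)
		return false;
	toy_pmd->entries[0] = 42;
	return true;
}

/* Teardown: clear the PMD first, then defer the free. */
static void toy_teardown(void)
{
	struct toy_table *t = toy_pmd;

	if (!t)
		return;
	toy_pmd = NULL;
	t->next_deferred = toy_deferred;
	toy_deferred = t;
	toy_flush_deferred();
}

/*
 * Walker: returns entry 0, or -1 if the PMD was already cleared.
 * teardown_midway simulates a teardown that runs after the walker has
 * loaded the table pointer but before it dereferences it.
 */
static int toy_walk(bool teardown_midway)
{
	int val = -1;

	toy_read_lock();
	struct toy_table *t = toy_pmd;

	if (teardown_midway)
		toy_teardown();
	if (t)
		val = t->entries[0];	/* free is deferred, so t is still valid */
	toy_read_unlock();
	return val;
}
```

In this model, a walk that races with teardown still reads the old table
and the free happens when the walker leaves its read-side section, while
a walk that starts after teardown sees the cleared PMD and skips the
page. These are the two outcomes the commit message describes.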