From: Lu Baolu <baolu.lu@linux.intel.com>
To: Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Kevin Tian <kevin.tian@intel.com>,
	Jason Gunthorpe <jgg@nvidia.com>, Jann Horn <jannh@google.com>,
	Vasant Hegde <vasant.hegde@amd.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@intel.com>,
	Alistair Popple <apopple@nvidia.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Jean-Philippe Brucker <jean-philippe@linaro.org>,
	Andy Lutomirski <luto@kernel.org>, Yi Lai <yi1.lai@intel.com>,
	David Hildenbrand <david@redhat.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
	Michal Hocko <mhocko@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: iommu@lists.linux.dev, security@kernel.org, x86@kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Lu Baolu <baolu.lu@linux.intel.com>
Subject: [PATCH v7 0/8] Fix stale IOTLB entries for kernel address space
Date: Wed, 22 Oct 2025 16:26:26 +0800
Message-ID: <20251022082635.2462433-1-baolu.lu@linux.intel.com>

This proposes a fix for a security vulnerability related to IOMMU Shared
Virtual Addressing (SVA). In an SVA context, an IOMMU can cache kernel
page table entries. When a kernel page table page is freed and
reallocated for another purpose, the IOMMU might still hold stale,
incorrect entries. This can be exploited to cause a use-after-free or
write-after-free condition, potentially leading to privilege escalation
or data corruption.

This solution introduces a deferred freeing mechanism for kernel page
table pages, which provides a safe window to notify the IOMMU to
invalidate its caches before the page is reused.
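
As a rough illustration of the ordering described above, the following
userspace sketch queues pages instead of freeing them at once, then has a
worker invalidate before freeing the batch. It is not the code from this
series: struct pt_page, the sketch_*() names, and the counters are all
hypothetical, and the locking a real kernel implementation needs is
omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel page-table page (not struct ptdesc). */
struct pt_page {
	struct pt_page *next;
	int id;
};

/* Pages queued for deferred freeing; a real kernel would guard this with a lock. */
static struct pt_page *pending;

/* Counters that make the ordering observable in this sketch. */
static int invalidations;
static int freed;

/* Hypothetical stand-in for the IOMMU cache invalidation hook. */
static void sketch_invalidate_iotlb(void)
{
	invalidations++;
}

/* Allocate a zeroed page object for the sketch. */
static struct pt_page *sketch_alloc(int id)
{
	struct pt_page *page = calloc(1, sizeof(*page));

	if (page)
		page->id = id;
	return page;
}

/* Queue a page instead of freeing it immediately. */
static void sketch_defer_free(struct pt_page *page)
{
	page->next = pending;
	pending = page;
}

/* Deferred worker: detach the batch, invalidate once, then free each page. */
static void sketch_free_worker(void)
{
	struct pt_page *batch = pending;

	pending = NULL;
	if (!batch)
		return;

	sketch_invalidate_iotlb();

	while (batch) {
		struct pt_page *next = batch->next;

		/* No page is released before the invalidation has run. */
		assert(invalidations > 0);
		free(batch);
		freed++;
		batch = next;
	}
}
```

The point of the sketch is only the ordering: a page reaches free() after
the invalidation hook has been called for its batch, never before.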

Change log:
v7:
 - The use of pmd_ptdesc() introduced a bug reported at
   https://lore.kernel.org/linux-iommu/68eeb99e.050a0220.91a22.0220.GAE@google.com/.
   Fix this by replacing it with page_ptdesc().
 - Discussed the approach of backporting and reached a consensus that we
   need an extra patch to disable SVA for x86 arch and re-enable it after
   the kernel page table free callback is done.
 - Use "const struct ptdesc *ptdesc" as the parameter for
   ptdesc_test_kernel().
 - Move "select ASYNC_KERNEL_PGTABLE_FREE" to the last patch.

v6:
 - https://lore.kernel.org/linux-iommu/20251014130437.1090448-1-baolu.lu@linux.intel.com/
 - Follow commit 522abd92279a to set/clear/test a flag of struct
   ptdesc.
 - Use pmd_ptdesc() helper.
 - Squash previous PATCH 6 and 7.
 - Rename CONFIG_ASYNC_PGTABLE_FREE to CONFIG_ASYNC_KERNEL_PGTABLE_FREE.
 - Refine commit message.
 - Rebase on top of v6.18-rc1.

v5:
 - https://lore.kernel.org/linux-iommu/20250919054007.472493-1-baolu.lu@linux.intel.com/
 - Renamed pagetable_free_async() to pagetable_free_kernel() to avoid
   confusion.
 - Removed list_del() when the list is on the stack, as it will be freed
   when the function returns.
 - Discussed a corner case related to memory unplug of memory that was
   present as reserved memory at boot. Given that it's extremely rare
   and cannot be triggered by unprivileged users, we decided to focus
   our efforts on the common vfree() case and noted that corner case in
   the commit message.
 - Some cleanups.

v4:
 - https://lore.kernel.org/linux-iommu/20250905055103.3821518-1-baolu.lu@linux.intel.com/
 - Introduce a mechanism to defer the freeing of page-table pages for
   KVA mappings. Call iommu_sva_invalidate_kva_range() in the deferred
   work thread before freeing the pages.

v3:
 - https://lore.kernel.org/linux-iommu/20250806052505.3113108-1-baolu.lu@linux.intel.com/
 - iommu_sva_mms is an unbounded list; iterating it in an atomic context
   could introduce significant latency issues. Schedule it in a kernel
   thread and replace the spinlock with a mutex.
 - Replace the static key with a normal bool; it can be brought back if
   data shows the benefit.
 - Invalidate KVA range in the flush_tlb_all() paths.
 - All previous reviewed-bys are preserved. Please let me know if there
   are any objections.

v2:
 - https://lore.kernel.org/linux-iommu/20250709062800.651521-1-baolu.lu@linux.intel.com/
 - Remove EXPORT_SYMBOL_GPL(iommu_sva_invalidate_kva_range).
 - Replace the mutex with a spinlock to make the interface usable in
   critical regions.

v1: https://lore.kernel.org/linux-iommu/20250704133056.4023816-1-baolu.lu@linux.intel.com/
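
For readers following the v6 and v7 notes about setting, clearing, and
testing a ptdesc flag, a minimal sketch of that pattern might look like
the following. The struct, the flag bit, and the sketch_*() helper names
are assumptions made for illustration; only ptdesc_test_kernel() and its
"const struct ptdesc *" parameter are named in the change log above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit; the real bit and its storage are defined by the series. */
#define SKETCH_PT_KERNEL (1UL << 0)

/* Simplified stand-in for struct ptdesc. */
struct sketch_ptdesc {
	unsigned long flags;
};

/* Mark a page-table descriptor as belonging to the kernel address space. */
static void sketch_ptdesc_set_kernel(struct sketch_ptdesc *pt)
{
	pt->flags |= SKETCH_PT_KERNEL;
}

/* Clear the mark, for example before the page is handed back. */
static void sketch_ptdesc_clear_kernel(struct sketch_ptdesc *pt)
{
	pt->flags &= ~SKETCH_PT_KERNEL;
}

/* Testing does not modify the descriptor, so it takes a const pointer. */
static bool sketch_ptdesc_test_kernel(const struct sketch_ptdesc *pt)
{
	return pt->flags & SKETCH_PT_KERNEL;
}
```

Taking a const pointer in the test helper lets callers that only hold a
read-only descriptor check the flag without a cast.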

Dave Hansen (5):
  mm: Add a ptdesc flag to mark kernel page tables
  mm: Actually mark kernel page table pages
  x86/mm: Use 'ptdesc' when freeing PMD pages
  mm: Introduce pure page table freeing function
  mm: Introduce deferred freeing for kernel page tables

Lu Baolu (3):
  iommu: Disable SVA when CONFIG_X86 is set
  x86/mm: Use pagetable_free()
  iommu/sva: Invalidate stale IOTLB entries for kernel address space

 arch/x86/Kconfig              |  1 +
 mm/Kconfig                    |  3 ++
 include/asm-generic/pgalloc.h | 18 ++++++++++
 include/linux/iommu.h         |  4 +++
 include/linux/mm.h            | 65 +++++++++++++++++++++++++++++++++--
 arch/x86/mm/init_64.c         |  2 +-
 arch/x86/mm/pat/set_memory.c  |  2 +-
 arch/x86/mm/pgtable.c         | 12 +++----
 drivers/iommu/iommu-sva.c     | 29 +++++++++++++++-
 mm/pgtable-generic.c          | 39 +++++++++++++++++++++
 10 files changed, 163 insertions(+), 12 deletions(-)

-- 
2.43.0