From: Baolu Lu <baolu.lu@linux.intel.com>
To: "Tian, Kevin" <kevin.tian@intel.com>, Jason Gunthorpe <jgg@nvidia.com>
Cc: "Hansen, Dave" <dave.hansen@intel.com>,
Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
Robin Murphy <robin.murphy@arm.com>, Jann Horn <jannh@google.com>,
Vasant Hegde <vasant.hegde@amd.com>,
Alistair Popple <apopple@nvidia.com>,
Peter Zijlstra <peterz@infradead.org>,
Uladzislau Rezki <urezki@gmail.com>,
Jean-Philippe Brucker <jean-philippe@linaro.org>,
Andy Lutomirski <luto@kernel.org>, "Lai, Yi1" <yi1.lai@intel.com>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
"security@kernel.org" <security@kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH v3 1/1] iommu/sva: Invalidate KVA range on kernel TLB flush
Date: Mon, 18 Aug 2025 14:21:55 +0800 [thread overview]
Message-ID: <87bfc80e-258e-4193-a56c-3096608aec30@linux.intel.com> (raw)
In-Reply-To: <BN9PR11MB52762A47B347C99F0C0E4C288C2FA@BN9PR11MB5276.namprd11.prod.outlook.com>
On 8/8/25 10:57, Tian, Kevin wrote:
>> From: Jason Gunthorpe <jgg@nvidia.com>
>> Sent: Friday, August 8, 2025 3:52 AM
>>
>> On Thu, Aug 07, 2025 at 10:40:39PM +0800, Baolu Lu wrote:
>>> +static void kernel_pte_work_func(struct work_struct *work)
>>> +{
>>> + struct page *page, *next;
>>> +
>>> + iommu_sva_invalidate_kva_range(0, TLB_FLUSH_ALL);
>>> +
>>> + guard(spinlock)(&kernel_pte_work.lock);
>>> + list_for_each_entry_safe(page, next, &kernel_pte_work.list, lru) {
>>> + list_del_init(&page->lru);
>>
>> Please don't add new usages of lru, we are trying to get rid of this. :(
>>
>> I think the memory should be struct ptdesc, use that..
>>
>
> btw with this change we should also defer free of the pmd page:
>
> pud_free_pmd_page()
> ...
> for (i = 0; i < PTRS_PER_PMD; i++) {
> if (!pmd_none(pmd_sv[i])) {
> pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
> pte_free_kernel(&init_mm, pte);
> }
> }
>
> free_page((unsigned long)pmd_sv);
>
> Otherwise the risk still exists if the pmd page is repurposed before the
> pte work is scheduled.
My understanding is that you were talking about pmd_free(), correct? It
appears that both pmd_free() and pte_free_kernel() call
pagetable_dtor_free(). If so, how about the following change?
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index dbddacdca2ce..04f6b5bc212c 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -172,10 +172,8 @@ static inline pmd_t *pmd_alloc_one_noprof(struct
mm_struct *mm, unsigned long ad
#ifndef __HAVE_ARCH_PMD_FREE
static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
{
- struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
-
BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
- pagetable_dtor_free(ptdesc);
+ pte_free_kernel(mm, (pte_t *)pmd);
}
>
> another observation - pte_free_kernel is not used in remove_pagetable ()
> and __change_page_attr(). Is it straightforward to put it in those paths
> or do we need duplicate some deferring logic there?
>
In both of these cases, pages in an array or list require deferred
freeing. I don't believe the previous approach, which calls
pagetable_dtor_free(), will still work here. Perhaps a separate list is
needed? What about something like the following?
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 76e33bd7c556..2e887463c165 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1013,7 +1013,10 @@ static void __meminit free_pagetable(struct page
*page, int order)
free_reserved_pages(page, nr_pages);
#endif
} else {
- free_pages((unsigned long)page_address(page), order);
+ if (IS_ENABLED(CONFIG_ASYNC_PGTABLE_FREE))
+ kernel_pgtable_free_pages_async(page, order);
+ else
+ free_pages((unsigned long)page_address(page),
order);
}
}
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 8834c76f91c9..7e567bdfddce 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -436,9 +436,13 @@ static void cpa_collapse_large_pages(struct
cpa_data *cpa)
flush_tlb_all();
- list_for_each_entry_safe(ptdesc, tmp, &pgtables, pt_list) {
- list_del(&ptdesc->pt_list);
- __free_page(ptdesc_page(ptdesc));
+ if (IS_ENABLED(CONFIG_ASYNC_PGTABLE_FREE)) {
+ kernel_pgtable_free_list_async(&pgtables);
+ } else {
+ list_for_each_entry_safe(ptdesc, tmp, &pgtables, pt_list) {
+ list_del(&ptdesc->pt_list);
+ __free_page(ptdesc_page(ptdesc));
+ }
}
}
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 4938eff5b482..13bbe1588872 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -48,8 +48,12 @@ static inline pte_t
*pte_alloc_one_kernel_noprof(struct mm_struct *mm)
#ifdef CONFIG_ASYNC_PGTABLE_FREE
void pte_free_kernel_async(struct ptdesc *ptdesc);
+void kernel_pgtable_free_list_async(struct list_head *list);
+void kernel_pgtable_free_pages_async(struct page *pages, int order);
#else
static inline void pte_free_kernel_async(struct ptdesc *ptdesc) {}
+static inline void kernel_pgtable_free_list_async(struct list_head
*list) {}
+void kernel_pgtable_free_pages_async(struct page *pages, int order) {}
#endif
/**
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index b9a785dd6b8d..d1d19132bbe8 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -413,6 +413,7 @@ static void kernel_pte_work_func(struct work_struct
*work);
struct {
struct list_head list;
+ struct list_head pages;
spinlock_t lock;
struct work_struct work;
} kernel_pte_work = {
@@ -425,17 +426,24 @@ static void kernel_pte_work_func(struct
work_struct *work)
{
struct ptdesc *ptdesc, *next;
LIST_HEAD(free_list);
+ LIST_HEAD(pages_list);
iommu_sva_invalidate_kva_range(0, TLB_FLUSH_ALL);
spin_lock(&kernel_pte_work.lock);
list_move(&kernel_pte_work.list, &free_list);
+ list_move(&kernel_pte_work.pages, &pages_list);
spin_unlock(&kernel_pte_work.lock);
list_for_each_entry_safe(ptdesc, next, &free_list, pt_list) {
list_del(&ptdesc->pt_list);
pagetable_dtor_free(ptdesc);
}
+
+ list_for_each_entry_safe(ptdesc, next, &pages_list, pt_list) {
+ list_del(&ptdesc->pt_list);
+ __free_page(ptdesc_page(ptdesc));
+ }
}
void pte_free_kernel_async(struct ptdesc *ptdesc)
@@ -444,4 +452,25 @@ void pte_free_kernel_async(struct ptdesc *ptdesc)
list_add(&ptdesc->pt_list, &kernel_pte_work.list);
schedule_work(&kernel_pte_work.work);
}
+
+void kernel_pgtable_free_list_async(struct list_head *list)
+{
+ guard(spinlock)(&kernel_pte_work.lock);
+ list_splice_tail(list, &kernel_pte_work.pages);
+ schedule_work(&kernel_pte_work.work);
+}
+
+void kernel_pgtable_free_pages_async(struct page *pages, int order)
+{
+ unsigned long nr_pages = 1 << order;
+ struct ptdesc *ptdesc;
+ int i;
+
+ guard(spinlock)(&kernel_pte_work.lock);
+ for (i = 0; i < nr_pages; i++) {
+ ptdesc = page_ptdesc(&pages[i]);
+ list_add(&ptdesc->pt_list, &kernel_pte_work.pages);
+ }
+ schedule_work(&kernel_pte_work.work);
+}
#endif
Thanks,
baolu
next prev parent reply other threads:[~2025-08-18 6:24 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-06 5:25 [PATCH v3 1/1] iommu/sva: Invalidate KVA range on kernel TLB flush Lu Baolu
2025-08-06 15:03 ` Dave Hansen
2025-08-06 15:52 ` Jason Gunthorpe
2025-08-06 16:04 ` Dave Hansen
2025-08-06 16:09 ` Jason Gunthorpe
2025-08-06 16:34 ` Dave Hansen
2025-08-06 16:42 ` Jason Gunthorpe
2025-08-07 14:40 ` Baolu Lu
2025-08-07 15:31 ` Dave Hansen
2025-08-08 5:15 ` Baolu Lu
2025-08-10 7:19 ` Ethan Zhao
2025-08-11 9:15 ` Uladzislau Rezki
2025-08-11 12:55 ` Jason Gunthorpe
2025-08-15 9:23 ` Baolu Lu
2025-08-11 13:55 ` Dave Hansen
2025-08-11 14:56 ` Uladzislau Rezki
2025-08-12 1:17 ` Ethan Zhao
2025-08-15 14:35 ` Dave Hansen
2025-08-11 12:57 ` Jason Gunthorpe
2025-08-13 3:17 ` Ethan Zhao
2025-08-18 1:34 ` Baolu Lu
2025-08-07 19:51 ` Jason Gunthorpe
2025-08-08 2:57 ` Tian, Kevin
2025-08-15 9:16 ` Baolu Lu
2025-08-15 9:46 ` Tian, Kevin
2025-08-18 5:58 ` Baolu Lu
2025-08-15 14:31 ` Dave Hansen
2025-08-18 6:08 ` Baolu Lu
2025-08-18 6:21 ` Baolu Lu [this message]
2025-08-21 7:05 ` Tian, Kevin
2025-08-23 3:26 ` Baolu Lu
2025-08-25 22:36 ` Dave Hansen
2025-08-26 1:25 ` Baolu Lu
2025-08-26 2:49 ` Baolu Lu
2025-08-26 14:22 ` Dave Hansen
2025-08-26 14:33 ` Matthew Wilcox
2025-08-26 14:57 ` Dave Hansen
2025-08-27 10:58 ` Baolu Lu
2025-08-27 23:31 ` Dave Hansen
2025-08-28 5:31 ` Baolu Lu
2025-08-28 7:08 ` Tian, Kevin
2025-08-28 18:56 ` Dave Hansen
2025-08-28 19:10 ` Jason Gunthorpe
2025-08-28 19:31 ` Dave Hansen
2025-08-28 19:39 ` Matthew Wilcox
2025-08-26 16:21 ` Dave Hansen
2025-08-27 6:34 ` Baolu Lu
2025-08-08 5:08 ` Baolu Lu
2025-08-07 6:53 ` Baolu Lu
2025-08-14 4:48 ` Ethan Zhao
2025-08-15 7:48 ` Baolu Lu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87bfc80e-258e-4193-a56c-3096608aec30@linux.intel.com \
--to=baolu.lu@linux.intel.com \
--cc=apopple@nvidia.com \
--cc=dave.hansen@intel.com \
--cc=iommu@lists.linux.dev \
--cc=jannh@google.com \
--cc=jean-philippe@linaro.org \
--cc=jgg@nvidia.com \
--cc=joro@8bytes.org \
--cc=kevin.tian@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@kernel.org \
--cc=peterz@infradead.org \
--cc=robin.murphy@arm.com \
--cc=security@kernel.org \
--cc=stable@vger.kernel.org \
--cc=urezki@gmail.com \
--cc=vasant.hegde@amd.com \
--cc=will@kernel.org \
--cc=yi1.lai@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.