linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Hansen <dave.hansen@intel.com>
To: Baolu Lu <baolu.lu@linux.intel.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	Jason Gunthorpe <jgg@nvidia.com>
Cc: Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>, Jann Horn <jannh@google.com>,
	Vasant Hegde <vasant.hegde@amd.com>,
	Alistair Popple <apopple@nvidia.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Jean-Philippe Brucker <jean-philippe@linaro.org>,
	Andy Lutomirski <luto@kernel.org>, "Lai, Yi1" <yi1.lai@intel.com>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	"security@kernel.org" <security@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>,
	vishal.moola@gmail.com, Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH v3 1/1] iommu/sva: Invalidate KVA range on kernel TLB flush
Date: Wed, 27 Aug 2025 16:31:47 -0700	[thread overview]
Message-ID: <400cf9ab-de3f-4e8a-ab0a-4ac68c534bb8@intel.com> (raw)
In-Reply-To: <c69950ee-660b-4f51-9277-522470d0ce5d@linux.intel.com>

On 8/27/25 03:58, Baolu Lu wrote:
> Following the insights above, I wrote the code as follows. Does it look
> good?

I'd really prefer an actual diff that compiles. Because:

> #ifdef CONFIG_ASYNC_PGTABLE_FREE
> /* a 'struct ptdesc' that needs its destructor run */
> #define ASYNC_PGTABLE_FREE_DTOR    BIT(NR_PAGEFLAGS)

This doesn't work, does it? Don't you need to _allocate_ a new bit?
Wouldn't this die a hilariously horrible death if NR_PAGEFLAGS==64? ;)

Also, I'm pretty sure you can't just go setting random bits in this
because it aliases with page flags.

...
> static void kernel_pgtable_work_func(struct work_struct *work)
> {
>     struct ptdesc *ptdesc, *next;
>     LIST_HEAD(page_list);
> 
>     spin_lock(&kernel_pgtable_work.lock);
>     list_splice_tail_init(&kernel_pgtable_work.list, &page_list);
>     spin_unlock(&kernel_pgtable_work.lock);
> 
>     iommu_sva_invalidate_kva_range(0, TLB_FLUSH_ALL);
> 
>     list_for_each_entry_safe(ptdesc, next, &page_list, pt_list) {
>         list_del(&ptdesc->pt_list);
>         if (ptdesc->__page_flags & ASYNC_PGTABLE_FREE_DTOR)
>             pagetable_dtor_free(ptdesc);
>         else
>             __free_page(ptdesc_page(ptdesc));
>     }
> }
What I had in mind was that kernel_pgtable_work_func() only does:

	__free_pages(page, compound_order(page));

All of the things that queue gunk to the list do the legwork of making
the work_func() simple. This also guides them toward proving a single,
compound page because _they_ have to do the work if they don't want
something simple.

> void kernel_pgtable_async_free_dtor(struct ptdesc *ptdesc)
> {
>     spin_lock(&kernel_pgtable_work.lock);
>     ptdesc->__page_flags |= ASYNC_PGTABLE_FREE_DTOR;
>     list_add(&ptdesc->pt_list, &kernel_pgtable_work.list);
>     spin_unlock(&kernel_pgtable_work.lock);
> 
>     schedule_work(&kernel_pgtable_work.work);
> }
> 
> void kernel_pgtable_async_free_page_list(struct list_head *list)
> {
>     spin_lock(&kernel_pgtable_work.lock);
>     list_splice_tail(list, &kernel_pgtable_work.list);
>     spin_unlock(&kernel_pgtable_work.lock);
> 
>     schedule_work(&kernel_pgtable_work.work);
> }
> 
> void kernel_pgtable_async_free_page(struct page *page)
> {
>     spin_lock(&kernel_pgtable_work.lock);
>     list_add(&page_ptdesc(page)->pt_list, &kernel_pgtable_work.list);
>     spin_unlock(&kernel_pgtable_work.lock);
> 
>     schedule_work(&kernel_pgtable_work.work);
> }

I wouldn't have three of these, I'd have _one_. If you want to free a
bunch of pages, then just call it a bunch of times. This is not
performance critical.

Oh, and the ptdesc flag shouldn't be for "I need to be asynchronously
freed". All kernel PTE pages should ideally set it at allocation so you
can do this:

static inline void pagetable_free(struct ptdesc *pt)
{
        struct page *page = ptdesc_page(pt);

	if (ptdesc->some_field | PTDESC_KERNEL)
		kernel_pgtable_async_free_page(page);
	else
	        __free_pages(page, compound_order(page));
}

The folks that, today, call pagetable_dtor_free() don't have to do
_anything_ at free time. They just set the PTDESC_KERNEL bit at
allocation time.

See how that code reads? "If you have a kernel page table page, you must
asynchronously free it." That's actually meaningful.

I'm pretty sure the lower 24 bits of ptdesc->__page_type are free. So
I'd just do this:

#define PTDESC_TYPE_KERNEL BIT(0)

Then, something needs to set:

	ptdesc->__page_type |= PTDESC_TYPE_KERNEL;

That could happen as late as here:

static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
{
	struct ptdesc *ptdesc = virt_to_ptdesc(pte);

	// here 	

        pagetable_dtor_free(ptdesc);
}

Or as early as ptdesc allocation/constructor time. Then you check for
PTDESC_TYPE_KERNEL in pagetable_free().

  reply	other threads:[~2025-08-27 23:31 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-06  5:25 [PATCH v3 1/1] iommu/sva: Invalidate KVA range on kernel TLB flush Lu Baolu
2025-08-06 15:03 ` Dave Hansen
2025-08-06 15:52   ` Jason Gunthorpe
2025-08-06 16:04     ` Dave Hansen
2025-08-06 16:09       ` Jason Gunthorpe
2025-08-06 16:34         ` Dave Hansen
2025-08-06 16:42           ` Jason Gunthorpe
2025-08-07 14:40           ` Baolu Lu
2025-08-07 15:31             ` Dave Hansen
2025-08-08  5:15               ` Baolu Lu
2025-08-10  7:19                 ` Ethan Zhao
2025-08-11  9:15                   ` Uladzislau Rezki
2025-08-11 12:55                     ` Jason Gunthorpe
2025-08-15  9:23                       ` Baolu Lu
2025-08-11 13:55                     ` Dave Hansen
2025-08-11 14:56                       ` Uladzislau Rezki
2025-08-12  1:17                       ` Ethan Zhao
2025-08-15 14:35                         ` Dave Hansen
2025-08-11 12:57                 ` Jason Gunthorpe
2025-08-13  3:17                   ` Ethan Zhao
2025-08-18  1:34                   ` Baolu Lu
2025-08-07 19:51             ` Jason Gunthorpe
2025-08-08  2:57               ` Tian, Kevin
2025-08-15  9:16                 ` Baolu Lu
2025-08-15  9:46                   ` Tian, Kevin
2025-08-18  5:58                     ` Baolu Lu
2025-08-15 14:31                   ` Dave Hansen
2025-08-18  6:08                     ` Baolu Lu
2025-08-18  6:21                 ` Baolu Lu
2025-08-21  7:05                   ` Tian, Kevin
2025-08-23  3:26                     ` Baolu Lu
2025-08-25 22:36                       ` Dave Hansen
2025-08-26  1:25                         ` Baolu Lu
2025-08-26  2:49                           ` Baolu Lu
2025-08-26 14:22                             ` Dave Hansen
2025-08-26 14:33                               ` Matthew Wilcox
2025-08-26 14:57                                 ` Dave Hansen
2025-08-27 10:58                               ` Baolu Lu
2025-08-27 23:31                                 ` Dave Hansen [this message]
2025-08-28  5:31                                   ` Baolu Lu
2025-08-28  7:08                                     ` Tian, Kevin
2025-08-28 18:56                                       ` Dave Hansen
2025-08-28 19:10                                         ` Jason Gunthorpe
2025-08-28 19:31                                           ` Dave Hansen
2025-08-28 19:39                                             ` Matthew Wilcox
2025-08-26 16:21                             ` Dave Hansen
2025-08-27  6:34                               ` Baolu Lu
2025-08-08  5:08               ` Baolu Lu
2025-08-07  6:53   ` Baolu Lu
2025-08-14  4:48 ` Ethan Zhao
2025-08-15  7:48   ` Baolu Lu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=400cf9ab-de3f-4e8a-ab0a-4ac68c534bb8@intel.com \
    --to=dave.hansen@intel.com \
    --cc=apopple@nvidia.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=iommu@lists.linux.dev \
    --cc=jannh@google.com \
    --cc=jean-philippe@linaro.org \
    --cc=jgg@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=peterz@infradead.org \
    --cc=robin.murphy@arm.com \
    --cc=security@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=urezki@gmail.com \
    --cc=vasant.hegde@amd.com \
    --cc=vishal.moola@gmail.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=yi1.lai@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).