From: Matthew Wilcox <willy@infradead.org>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Baolu Lu <baolu.lu@linux.intel.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
Jason Gunthorpe <jgg@nvidia.com>, Joerg Roedel <joro@8bytes.org>,
Will Deacon <will@kernel.org>,
Robin Murphy <robin.murphy@arm.com>, Jann Horn <jannh@google.com>,
Vasant Hegde <vasant.hegde@amd.com>,
Alistair Popple <apopple@nvidia.com>,
Peter Zijlstra <peterz@infradead.org>,
Uladzislau Rezki <urezki@gmail.com>,
Jean-Philippe Brucker <jean-philippe@linaro.org>,
Andy Lutomirski <luto@kernel.org>, "Lai, Yi1" <yi1.lai@intel.com>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
"security@kernel.org" <security@kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"stable@vger.kernel.org" <stable@vger.kernel.org>,
vishal.moola@gmail.com
Subject: Re: [PATCH v3 1/1] iommu/sva: Invalidate KVA range on kernel TLB flush
Date: Tue, 26 Aug 2025 15:33:21 +0100
Message-ID: <aK3FsU1Dds4OG79o@casper.infradead.org>
In-Reply-To: <b57d7b97-8110-47c5-9c7a-516b7b535ce9@intel.com>

On Tue, Aug 26, 2025 at 07:22:06AM -0700, Dave Hansen wrote:
> On 8/25/25 19:49, Baolu Lu wrote:
> >> The three separate lists are needed because we're handling three
> >> distinct types of page deallocation. Grouping the pages this way allows
> >> the workqueue handler to free each type using the correct function.
> >
> > Please allow me to add more details.
>
> Right, I know why it got added this way: it was the quickest way to hack
> together a patch that fixes the IOMMU issue without refactoring anything.
>
> I agree that you have three cases:
> 1. A full on 'struct ptdesc' that needs its destructor run
> 2. An order-0 'struct page'
> 3. A higher-order 'struct page'
>
> Long-term, #2 and #3 probably need to get converted over to 'struct
> ptdesc'. They don't look _that_ hard to convert to me. Willy, Vishal,
> any other mm folks: do you agree?

Uhh. I'm still quite ignorant about iommu page tables. Let me make
some general observations that may be helpful.

We are attempting to shrink struct page down to 8 bytes. That's going
to proceed in stages over the next two to three years. Depending on how
much information you need to keep around, you may be able to keep using
struct page for a while, but eventually (unless you only need a few bits
of information), you're going to need a memdesc for your allocations.

One memdesc type already assigned is for page tables. Maybe iommu page
tables are the same / similar enough to a CPU page table that we keep
the same data structure. Maybe they'll need their own data structure.
I lack the knowledge to make that decision.

For more on memdescs, please see here:
https://kernelnewbies.org/MatthewWilcox/Memdescs

> Short-term, I'd just consolidate your issue down to a single list.
>
> #1: For 'struct ptdesc', modify pte_free_kernel() to pass information in
> to pagetable_dtor_free() to tell it to use the deferred page table
> free list. Do this with a bit in the ptdesc or a new argument to
> pagetable_dtor_free().

We should be able to reuse one of the flags in __page_flags or there
are 24 unused bits in __page_type.
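
To make the "spare bit" idea concrete, here is a minimal userspace sketch,
not kernel code. struct ptdesc_model, the PT_MODEL_* names and the helper
functions are all hypothetical. It also assumes that the type tag sits in
the top 8 bits of the 32-bit field, so the low 24 bits are the unused ones.
That layout is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model only. struct ptdesc_model and every name below are
 * hypothetical stand-ins, not the kernel's struct ptdesc or its helpers.
 */
struct ptdesc_model {
	unsigned long pt_flags;	/* models __page_flags */
	uint32_t pt_type;	/* models __page_type */
};

/* Assumption: the type tag lives in the top 8 bits, leaving 24 spare. */
#define PT_MODEL_TYPE_SHIFT	24
#define PT_MODEL_SPARE_MASK	((UINT32_C(1) << PT_MODEL_TYPE_SHIFT) - 1)

/* Hypothetical spare bit meaning "free this via the deferred list". */
#define PT_MODEL_DEFERRED	(UINT32_C(1) << 0)

static inline void pt_model_set_type(struct ptdesc_model *pt, uint8_t type)
{
	/* Replace the tag, keep whatever is stored in the spare bits. */
	pt->pt_type = ((uint32_t)type << PT_MODEL_TYPE_SHIFT) |
		      (pt->pt_type & PT_MODEL_SPARE_MASK);
}

static inline uint8_t pt_model_type(const struct ptdesc_model *pt)
{
	return (uint8_t)(pt->pt_type >> PT_MODEL_TYPE_SHIFT);
}

static inline void pt_model_mark_deferred(struct ptdesc_model *pt)
{
	/* The marker must fit inside the spare bits, never in the tag. */
	assert((PT_MODEL_DEFERRED & ~PT_MODEL_SPARE_MASK) == 0);
	pt->pt_type |= PT_MODEL_DEFERRED;
}

static inline bool pt_model_is_deferred(const struct ptdesc_model *pt)
{
	return (pt->pt_type & PT_MODEL_DEFERRED) != 0;
}
```

The point of the sketch is only that the marker and the type tag do not
interfere: the free path could test the bit without a new argument being
threaded through the callers.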
> #2. Just append these to the deferred page table free list. Easy.
> #3. The biggest hacky way to do this is to just treat the higher-order
> non-compound page and put the pages on the deferred page table
> free list one at a time. The other way to do it is to track down how
> this thing got allocated in the first place and make sure it's got
> __GFP_COMP metadata. If so, you can just use __free_pages() for
> everything.

Non-compound allocations are bad news. Is there a good reason these
can't be made into compound allocations?
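
As a rough illustration of why compound allocations help here, the
following userspace sketch models a single deferred list whose entries
carry their own order, the way __GFP_COMP metadata would let the free
path recover it. Everything in it (struct page_model, struct
deferred_list, the function names) is hypothetical, and malloc()/free()
stand in for the page allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only; none of these names exist in the kernel. */
struct page_model {
	unsigned int order;	/* models the order kept in compound metadata */
	struct page_model *next;
};

struct deferred_list {
	struct page_model *head;
};

static struct page_model *page_model_alloc(unsigned int order)
{
	struct page_model *p = malloc(sizeof(*p));

	if (p) {
		p->order = order;
		p->next = NULL;
	}
	return p;
}

/* Queue an allocation of any order on the one shared list. */
static void deferred_add(struct deferred_list *list, struct page_model *p)
{
	assert(p != NULL);
	p->next = list->head;
	list->head = p;
}

/*
 * Drain the list through a single free path. Because each entry knows
 * its own order, no per-type list is needed. Returns the number of
 * base pages released.
 */
static unsigned long deferred_drain(struct deferred_list *list)
{
	unsigned long freed = 0;
	struct page_model *p = list->head;

	list->head = NULL;
	while (p) {
		struct page_model *next = p->next;

		freed += 1UL << p->order;
		free(p);
		p = next;
	}
	return freed;
}
```

With non-compound higher-order allocations the order is not recoverable
from the page itself, which is what would force either a separate list
or queueing the pages one at a time.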
> Yeah, it'll take a couple patches up front to refactor some things. But
> that refactoring will make things more consistent instead of adding
> complexity to deal with the inconsistency.