From: Jason Gunthorpe <jgg@nvidia.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>, linux-mm@kvack.org
Subject: Re: Where to put page->memdesc initially
Date: Tue, 2 Sep 2025 20:57:40 -0300
Message-ID: <20250902235740.GD470103@nvidia.com>
In-Reply-To: <aLd8l8v_DL75NU43@casper.infradead.org>

On Wed, Sep 03, 2025 at 12:24:07AM +0100, Matthew Wilcox wrote:
> On Tue, Sep 02, 2025 at 06:15:14PM -0300, Jason Gunthorpe wrote:
> > On Tue, Sep 02, 2025 at 10:06:05PM +0100, Matthew Wilcox wrote:
> >
> > > I'm concerned by things like compaction that are executing
> > > asynchronously and might see a page mid-transition. Or something like
> > > GUP or lockless pagecache lookup that might get a stale page
> > > pointer.
> >
> > At least GUP fast obtains a page refcount before touching the rest of
> > struct page, so I think it can't see those kinds of races since the
> > page shouldn't be transitioning with a non-zero refcount?
>
> OK, so ...
>
> - For folios, there's already no such thing as a page refcount (you may
>   already know this and are just being slightly sloppy while
>   speaking).

I was thinking more broadly: the things that can never appear in page
tables, like slab and ptdesc, must continue to have a refcount field,
it is just fixed to 0, right? But yes, the code all goes through
struct folio to get there.

> you're silently redirected to the folio refcount.
>
> - That's not going to change with memdescs; for pages which are part of
>   a memdesc, attempting to access the page's refcount will redirect to
>   the folio's refcount.

My point is that until the refcount memory is moved from struct folio
to a memdesc-allocated struct, you should be able to keep relying on
a non-zero refcount in the struct folio to stabilize reading the
memdesc/type.

That may address some of your concern for this in-between patch, if
the memdesc pointer and type are guaranteed to be stable while a
positive refcount is held.
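
To make the stabilization argument concrete, here is a rough userspace
sketch using C11 atomics. It is not kernel code: struct desc_sim and its
helpers are made up for illustration, and it assumes that a type change
is only ever done by an owner that first freezes the refcount from 1 to
0, so a reader holding an extra reference blocks the transition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted descriptor with a type field. */
struct desc_sim {
	atomic_int refcount;
	atomic_int type;
};

/* Take a reference only if the count is already non-zero. */
static bool desc_sim_tryget(struct desc_sim *d)
{
	int old = atomic_load(&d->refcount);

	while (old != 0) {
		/* On failure 'old' is reloaded and the zero check repeats. */
		if (atomic_compare_exchange_weak(&d->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void desc_sim_put(struct desc_sim *d)
{
	atomic_fetch_sub(&d->refcount, 1);
}

/*
 * Assumed writer rule: retype only as sole owner, by freezing the
 * refcount 1 -> 0 first. Any extra reference makes the freeze fail.
 */
static bool desc_sim_retype(struct desc_sim *d, int new_type)
{
	int expected = 1;

	if (!atomic_compare_exchange_strong(&d->refcount, &expected, 0))
		return false;	/* somebody else holds a reference */
	atomic_store(&d->type, new_type);
	atomic_store(&d->refcount, 1);	/* unfreeze */
	return true;
}
```

With this rule, a successful desc_sim_tryget() means the type cannot
change until the matching desc_sim_put(), and a descriptor whose
refcount is fixed to 0 can never be grabbed at all.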
Then you'd change things like you describe:
> - READ_ONCE(page->memdesc)
> - Check that the bottom bits match a folio. If not, fall back to
>   GUP-slow (or retry; I forget the details).
Falling back to gup-slow sounds right to me to resolve any races.
> - tryget the refcount, if fail fall back/retry
> - if (READ_ONCE(page->memdesc) != memdesc) { folio_put(); retry/fallback }
> - yay, we succeeded.

This is the same as what GUP fast does for the PTE today, so it would
now recheck both the PTE and the memdesc.

The recheck is needed because GUP fast effectively treats struct folio
as if it were SLAB_TYPESAFE_BY_RCU memory. I think the memdesc would
need to follow a SLAB_TYPESAFE_BY_RCU design as well.
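
As a rough illustration of the read / check / tryget / recheck sequence
quoted above, here is a userspace sketch with C11 atomics. It is not
kernel code. The page_sim and folio_sim structs, the 4-bit type mask and
the type values are all assumptions made up for the example, and the PTE
recheck is left out. Returning NULL stands for "fall back to the slow
path".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding: the low 4 bits of the memdesc word hold the type. */
#define MEMDESC_TYPE_MASK  ((uintptr_t)0xf)
#define MEMDESC_TYPE_FOLIO ((uintptr_t)0x1)	/* made-up value */
#define MEMDESC_TYPE_SLAB  ((uintptr_t)0x2)	/* made-up value */

struct folio_sim {
	_Alignas(16) atomic_int refcount;	/* alignment frees the low bits */
};

struct page_sim {
	_Atomic uintptr_t memdesc;		/* pointer | type */
};

static uintptr_t make_memdesc(struct folio_sim *folio, uintptr_t type)
{
	uintptr_t addr = (uintptr_t)folio;

	assert((addr & MEMDESC_TYPE_MASK) == 0);
	return addr | type;
}

/* Take a reference only if the count is already non-zero. */
static bool folio_sim_tryget(struct folio_sim *f)
{
	int old = atomic_load_explicit(&f->refcount, memory_order_relaxed);

	while (old != 0) {
		if (atomic_compare_exchange_weak_explicit(&f->refcount, &old,
				old + 1, memory_order_acquire,
				memory_order_relaxed))
			return true;
	}
	return false;
}

static void folio_sim_put(struct folio_sim *f)
{
	atomic_fetch_sub_explicit(&f->refcount, 1, memory_order_release);
}

/* Returns the folio with a reference held, or NULL to request fallback. */
static struct folio_sim *try_grab_folio_sim(struct page_sim *page)
{
	/* 1. Read the memdesc word once. */
	uintptr_t md = atomic_load_explicit(&page->memdesc,
					    memory_order_acquire);

	/* 2. The bottom bits must say "folio". */
	if ((md & MEMDESC_TYPE_MASK) != MEMDESC_TYPE_FOLIO)
		return NULL;

	struct folio_sim *folio = (struct folio_sim *)(md & ~MEMDESC_TYPE_MASK);

	/* 3. Speculative reference; fails if the count is 0. */
	if (!folio_sim_tryget(folio))
		return NULL;

	/* 4. Recheck: the page may have been retyped in the meantime. */
	if (atomic_load_explicit(&page->memdesc, memory_order_acquire) != md) {
		folio_sim_put(folio);
		return NULL;
	}
	return folio;
}
```

Step 3 touches the refcount of memory that may already have been
reused, which is only safe if that memory is still a folio_sim-shaped
object; that is the SLAB_TYPESAFE_BY_RCU-style requirement.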
Jason