How should we RCU-free folios?


* How should we RCU-free folios?
@ 2025-05-29 15:02 Matthew Wilcox
  2025-06-24 12:13 ` David Hildenbrand
  0 siblings, 1 reply; 3+ messages in thread
From: Matthew Wilcox @ 2025-05-29 15:02 UTC
  To: linux-mm

When folios are allocated separately from the underlying pages they
represent, they must also be freed.  See
https://kernelnewbies.org/MatthewWilcox/FolioAlloc

Since we want to do lockless lookups of folios in the page cache and
GUP, we must RCU free the folios somehow.  As I see it, we have three
options:

1. Free the folio back to the slab immediately, and mark the slab as
TYPESAFE_BY_RCU.  That means that the folio may get reallocated at
any time, but it must always remain a folio (until an RCU grace period
has passed and then the entire slab may be reallocated to a different
purpose).  Lookups will do:

a. Get a pointer to the folio
b. Tryget a refcount on the folio
c. If it succeeds, re-check the folio is still the one we want
   (If pagecache, check the xarray still points to the folio; if GUP,
   check the page still points to the folio)

2. RCU-free the folio.  The folio will not be reallocated until the
reader drops the RCU read lock.  The read side still needs to tryget
the folio refcount.  However, if it succeeds, it does not need to
re-check the pointer to the folio as the folio cannot have been
freed.  The downside is that folios will hang around in the system for
longer before being reallocated, and this may be an unacceptable
increase in memory usage.

3. RCU free the folio and RCU free the memory it controls.  Now an
RCU-protected lookup doesn't need to bump the refcount; if it found the
pointer, it knows the memory cannot be freed.  I think this is a
step too far and would 
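
The lookup steps a-c under option 1 can be sketched in userspace C with
C11 atomics.  This is only an illustration, not the kernel's code:
struct folio_sketch, struct slot_sketch and every function name below
are hypothetical, the slot stands in for the xarray entry (or the
page-to-folio pointer for GUP), and a real lookup would run under
rcu_read_lock() and would typically retry rather than return NULL.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a separately allocated folio. */
struct folio_sketch {
	atomic_int refcount;	/* 0 means free; may be reallocated at any time */
};

/* Hypothetical stand-in for whatever points at the folio. */
struct slot_sketch {
	_Atomic(struct folio_sketch *) folio;
};

/* Step b: take a reference unless the count is already zero. */
static bool folio_sketch_tryget(struct folio_sketch *f)
{
	int old = atomic_load(&f->refcount);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void folio_sketch_put(struct folio_sketch *f)
{
	int old = atomic_fetch_sub(&f->refcount, 1);

	assert(old > 0);
	(void)old;
}

/* Steps a-c.  Returns NULL if the caller should retry or give up. */
static struct folio_sketch *lookup_sketch(struct slot_sketch *slot)
{
	struct folio_sketch *f = atomic_load(&slot->folio);	/* a */

	if (!f || !folio_sketch_tryget(f))			/* b */
		return NULL;
	if (atomic_load(&slot->folio) != f) {			/* c */
		/* The folio was reallocated under us; drop our reference. */
		folio_sketch_put(f);
		return NULL;
	}
	return f;
}
```

Under option 2 the re-check in step c would be unnecessary, since the
folio could not have been reallocated while the reader is inside the
RCU read-side critical section.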

I'm favouring option 1; it's what we currently do.  But I wanted to
give people a chance to chime in and tell me my tradeoffs are wrong.
Or propose a fourth option.


* Re: How should we RCU-free folios?
  2025-05-29 15:02 How should we RCU-free folios? Matthew Wilcox
@ 2025-06-24 12:13 ` David Hildenbrand
  2025-06-24 20:29   ` Matthew Wilcox
  0 siblings, 1 reply; 3+ messages in thread
From: David Hildenbrand @ 2025-06-24 12:13 UTC
  To: Matthew Wilcox, linux-mm

On 29.05.25 17:02, Matthew Wilcox wrote:
> When folios are allocated separately from the underlying pages they
> represent, they must also be freed.  See
> https://kernelnewbies.org/MatthewWilcox/FolioAlloc
> 
> Since we want to do lockless lookups of folios in the page cache and
> GUP,

And in PFN walkers as well.

> we must RCU free the folios somehow.  As I see it, we have three
> options:
> 
> 1. Free the folio back to the slab immediately, and mark the slab as
> TYPESAFE_BY_RCU.  That means that the folio may get reallocated at
> any time, but it must always remain a folio (until an RCU grace period
> has passed and then the entire slab may be reallocated to a different
> purpose).  Lookups will do:
> 
> a. Get a pointer to the folio
> b. Tryget a refcount on the folio
> c. If it succeeds, re-check the folio is still the one we want
>     (If pagecache, check the xarray still points to the folio; if GUP,
>     check the page still points to the folio)

Hm, that means that all PFN walkers would now also have to do a tryget 
unconditionally.

Also, free hugetlb folios have a refcount of 0 right now ...

> 
> 2. RCU-free the folio.  The folio will not be reallocated until the
> reader drops the RCU read lock.  The read side still needs to tryget
> the folio refcount.  However, if it succeeds, it does not need to
> re-check the pointer to the folio as the folio cannot have been
> freed.  The downside is that folios will hang around in the system for
> longer before being reallocated, and this may be an unacceptable
> increase in memory usage.
> 
> 3. RCU free the folio and RCU free the memory it controls.  Now an
> RCU-protected lookup doesn't need to bump the refcount; if it found the
> pointer, it knows the memory cannot be freed.  I think this is a
> step too far and would

That sounds nice, though :)

> 
> I'm favouring option 1; it's what we currently do.  But I wanted to
> give people a chance to chime in and tell me my tradeoffs are wrong.
> Or propose a fourth option.

I really dislike the refcount dependency.

Also ... what about memdescs without a refcount (e.g., PFN walkers and 
slab?)?

-- 
Cheers,

David / dhildenb




* Re: How should we RCU-free folios?
  2025-06-24 12:13 ` David Hildenbrand
@ 2025-06-24 20:29   ` Matthew Wilcox
  0 siblings, 0 replies; 3+ messages in thread
From: Matthew Wilcox @ 2025-06-24 20:29 UTC
  To: David Hildenbrand; +Cc: linux-mm

On Tue, Jun 24, 2025 at 02:13:47PM +0200, David Hildenbrand wrote:
> On 29.05.25 17:02, Matthew Wilcox wrote:
> > When folios are allocated separately from the underlying pages they
> > represent, they must also be freed.  See
> > https://kernelnewbies.org/MatthewWilcox/FolioAlloc
> > 
> > Since we want to do lockless lookups of folios in the page cache and
> > GUP,
> 
> And in PFN walkers as well.

Good point.  For those not quite clear what David means here, think of
migration, where we're doing a physical walk and trying to decide what
to do with the memory.  That's just one example; hwpoison detection has
similar problems.

> > 1. Free the folio back to the slab immediately, and mark the slab as
> > TYPESAFE_BY_RCU.  That means that the folio may get reallocated at
> > any time, but it must always remain a folio (until an RCU grace period
> > has passed and then the entire slab may be reallocated to a different
> > purpose).  Lookups will do:
> > 
> > a. Get a pointer to the folio
> > b. Tryget a refcount on the folio
> > c. If it succeeds, re-check the folio is still the one we want
> >     (If pagecache, check the xarray still points to the folio; if GUP,
> >     check the page still points to the folio)
> 
> Hm, that means that all PFN walkers would now also have to do a tryget
> unconditionally.

To a certain extent.  At least for migration, there's a first pass where
we can just look at the value contained in the memdesc to decide if this
block is migratable, then in the second pass we get the refcount and
start doing migration-things to each page.
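
A rough userspace sketch of that two-pass idea, purely for
illustration: struct memdesc_sketch, its type field and all function
names are hypothetical, and the real memdesc encoding may look quite
different.  The first pass only reads the type and takes no references,
so its answer is a hint that may be stale by the time the second pass
runs; the second pass is the one that does the tryget.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memdesc kinds; only folios are treated as migratable. */
enum md_type_sketch { MD_FOLIO_SKETCH, MD_SLAB_SKETCH };

struct memdesc_sketch {
	enum md_type_sketch type;
	atomic_int refcount;	/* 0 means free */
};

/* Pass 1: look at the type only.  No reference is taken. */
static bool block_migratable_sketch(const struct memdesc_sketch *md, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (md[i].type != MD_FOLIO_SKETCH)
			return false;
	return true;
}

/* Take a reference unless the count is already zero. */
static bool md_tryget_sketch(struct memdesc_sketch *md)
{
	int old = atomic_load(&md->refcount);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&md->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Pass 2: pin each folio that is still in use; return how many we got. */
static size_t pin_block_sketch(struct memdesc_sketch *md, size_t n)
{
	size_t pinned = 0;

	for (size_t i = 0; i < n; i++)
		if (md[i].type == MD_FOLIO_SKETCH && md_tryget_sketch(&md[i]))
			pinned++;
	return pinned;
}
```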

> Also, free hugetlb folios have a refcount of 0 right now ...

Right ... I think handling of hugetlb folios will probably change
a bit.  Freeing a hugetlb folio probably won't free the struct folio
itself, but might set a flag indicating that it's free.  It'd be up to
the PFN walker to, say, grab the hugetlb_lock, which would make sure
this hugetlb folio isn't allocated while the walker is messing with it.
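
As a sketch of what that could look like (an assumption-laden
illustration, not a proposal for the actual interface): the is_free
flag, the spinlock standing in for hugetlb_lock, and both functions
below are hypothetical.  The point is only that the allocator and the
walker agree on one lock, so the flag cannot change while the walker
holds it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for hugetlb_lock: a trivial spinlock. */
static atomic_flag hugetlb_lock_sketch = ATOMIC_FLAG_INIT;

/* Hypothetical hugetlb folio that stays allocated while in the free pool. */
struct hugetlb_folio_sketch {
	bool is_free;	/* protected by hugetlb_lock_sketch */
};

static void lock_sketch(void)
{
	while (atomic_flag_test_and_set(&hugetlb_lock_sketch))
		;	/* spin */
}

static void unlock_sketch(void)
{
	atomic_flag_clear(&hugetlb_lock_sketch);
}

/* Allocator side: hand out the folio only if it is currently free. */
static bool hugetlb_alloc_sketch(struct hugetlb_folio_sketch *f)
{
	bool got;

	lock_sketch();
	got = f->is_free;
	if (got)
		f->is_free = false;
	unlock_sketch();
	return got;
}

/*
 * Walker side: if the folio is free, run fn on it with the lock held,
 * so it cannot be allocated in the meantime.  Returns whether it was free.
 */
static bool walker_visit_free_sketch(struct hugetlb_folio_sketch *f,
				     void (*fn)(struct hugetlb_folio_sketch *))
{
	bool was_free;

	lock_sketch();
	was_free = f->is_free;
	if (was_free && fn)
		fn(f);
	unlock_sketch();
	return was_free;
}
```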

> > 2. RCU-free the folio.  The folio will not be reallocated until the
> > reader drops the RCU read lock.  The read side still needs to tryget
> > the folio refcount.  However, if it succeeds, it does not need to
> > re-check the pointer to the folio as the folio cannot have been
> > freed.  The downside is that folios will hang around in the system for
> > longer before being reallocated, and this may be an unacceptable
> > increase in memory usage.
> > 
> > 3. RCU free the folio and RCU free the memory it controls.  Now an
> > RCU-protected lookup doesn't need to bump the refcount; if it found the
> > pointer, it knows the memory cannot be freed.  I think this is a
> > step too far and would
> 
> That sounds nice, though :)
> 
> > 
> > I'm favouring option 1; it's what we currently do.  But I wanted to
> > give people a chance to chime in and tell me my tradeoffs are wrong.
> > Or propose a fourth option.
> 
> I really dislike the refcount dependency.
> 
> Also ... what about memdescs without a refcount (e.g., PFN walkers and
> slab?)?

Depending on the PFN walker, it needs to know how to handle each kind
of memdesc.  Migration might choose to skip slabs and so "handle" them
by moving on to the next block.  hwpoison doesn't need to handle them
either (the system is dead if we see poison in a slab).  I'm not sure
how a PFN walker can protect against slab "doing something" with the
struct slab.  Maybe something like slab_lock() will be needed (yes,
I know slab mostly bypasses slab_lock()).  But it is going to be a
per-memdesc kind of problem to solve.

Two things I did want to raise though:

First, this is an improvement.  There's altogether too much code that
thinks "If I raise the refcount on the page, that will prevent the memory
from being freed".  And it'll certainly prevent the page from being
returned to the page allocator, but it won't prevent the slab allocator
from reusing the memory.  Other allocators (eg dma_pool)?  No idea.

Second, struct slab doesn't need to be RCU freed (unless we discover
PFN walkers are going to force us to).  The slab allocator knows it is
the only user, and when it's done, it can just free it and there's no
chance anybody else is looking at it.  Unless PFN walkers look at it,
which they can't today because struct slab is in mm/slab.h.
