* Where to put page->memdesc initially
From: Matthew Wilcox @ 2025-09-02 19:03 UTC
To: linux-mm; +Cc: David Hildenbrand, Jason Gunthorpe

With the recent patches to slab, I'm just about ready to allocate struct
slab separately from struct page. This will not be an immediate win,
of course. Indeed, it will likely be a slowdown (overhead of a second
allocation per slab). So there's no urgency to do this until we're
ready to shrink struct page, when we can at least point to that win
as justification.

Still, we should understand how we're going to get to Page2025 [1] one
step at a time. I had been thinking about coopting compound_head to point
to struct slab. But looking at the places which call folio_test_slab()
[an oxymoron in the New York Interpretation], it becomes apparent that
we need to keep compound_head() and page_folio() working for all pages
for a while.

As a reminder, compound_head() will _eventually_ return NULL for
slabs & folios. It will only be defined to work for page allocations.
Likewise page_folio() will return NULL for any pages not part of a folio
and page_slab() will return NULL for any pages not part of a slab.

My best offer right now is to use page->lru.prev. At least one of the
bottom two bits will be set to indicate that it's a memdesc (we're only
going to use thirteen of the memdesc types initially).

There are a few overlapping uses of these bits in struct page, so if we do
nothing we may get confused. We can deal with mlock_count and order (for
pcp_llist). But the biggest problem is the first tail page of a folio.
Depending on word size and endianness, there are four different atomic_t
fields that overlap with page->lru.prev. That can't be solved by using
a different field in struct page; the first tail page is jam-packed.

So, page_slab() will first load page->memdesc (the same bits as
page->lru.prev), check the bottom four bits match the slab memdesc, and
also check page->page_type matches PGTY_slab. I don't like this a lot,
because it's two loads rather than one atomic load, but it should only
be present for one commit.

In the next commit, we can separately allocate struct folio, make
page->memdesc point to struct folio and drop the PGTY_slab check (as
there will be no more uses of the first tail page for the mapcount stuff).

[1] https://kernelnewbies.org/MatthewWilcox/Memdescs/Path

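The two-load page_slab() described above can be modelled in userspace C. This is only a sketch under stated assumptions: the struct layouts, the `_model` names, the slab type value 0x3 and the 0xf5 stand-in for PGTY_slab are all hypothetical, and the real kernel code would use READ_ONCE() on struct page rather than plain loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: the low four bits of the memdesc word carry the type. */
#define MEMDESC_TYPE_MASK ((uintptr_t)0xf)
#define MEMDESC_TYPE_SLAB ((uintptr_t)0x3)	/* assumed value, not from the thread */
#define PGTY_SLAB_MODEL   0xf5u			/* placeholder for the kernel's PGTY_slab */

/* Separately allocated slab descriptor; 16-byte alignment frees the low four bits. */
struct slab_model {
	_Alignas(16) int objects;
};

/* Only the two fields that page_slab() looks at in this intermediate step. */
struct page_model {
	uintptr_t memdesc;	/* same bits as page->lru.prev in the proposal */
	unsigned int page_type;
};

/* Pack a slab pointer and the slab type tag into one memdesc word. */
static inline uintptr_t slab_to_memdesc(struct slab_model *slab)
{
	uintptr_t addr = (uintptr_t)slab;

	assert((addr & MEMDESC_TYPE_MASK) == 0);
	return addr | MEMDESC_TYPE_SLAB;
}

/* Return the slab for this page, or NULL if either of the two checks fails. */
static inline struct slab_model *page_slab_model(const struct page_model *page)
{
	uintptr_t memdesc = page->memdesc;		/* first load */

	if ((memdesc & MEMDESC_TYPE_MASK) != MEMDESC_TYPE_SLAB)
		return NULL;
	if (page->page_type != PGTY_SLAB_MODEL)		/* second load */
		return NULL;
	return (struct slab_model *)(memdesc & ~MEMDESC_TYPE_MASK);
}
```

Once the first tail page no longer overlaps the memdesc bits, the second check could be dropped and the lookup would reduce to the single tagged-pointer load.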
* Re: Where to put page->memdesc initially
From: Jason Gunthorpe @ 2025-09-02 20:08 UTC
To: Matthew Wilcox; +Cc: linux-mm, David Hildenbrand

On Tue, Sep 02, 2025 at 08:03:57PM +0100, Matthew Wilcox wrote:

> So, page_slab() will first load page->memdesc (the same bits as
> page->lru.prev), check the bottom four bits match the slab memdesc, and
> also check page->page_type matches PGTY_slab. I don't like this a lot,
> because it's two loads rather than one atomic load, but it should only
> be present for one commit.
>
> In the next commit, we can separately allocate struct folio, make
> page->memdesc point to struct folio and drop the PGTY_slab check (as
> there will be no more uses of the first tail page for the mapcount stuff).

So to rephrase, there is no great free space for memdesc in struct
page right now, but after the folio is split then it is fine?

Thus you have a few commits within a single series where it is less
efficient?

Seems OK to me..

Jason

* Re: Where to put page->memdesc initially
From: David Hildenbrand @ 2025-09-02 20:09 UTC
To: Matthew Wilcox, linux-mm; +Cc: Jason Gunthorpe

On 02.09.25 21:03, Matthew Wilcox wrote:
> With the recent patches to slab, I'm just about ready to allocate struct
> slab separately from struct page. This will not be an immediate win,
> of course. Indeed, it will likely be a slowdown (overhead of a second
> allocation per slab). So there's no urgency to do this until we're
> ready to shrink struct page, when we can at least point to that win
> as justification.
>
> Still, we should understand how we're going to get to Page2025 [1] one
> step at a time. I had been thinking about coopting compound_head to point
> to struct slab. But looking at the places which call folio_test_slab()
> [an oxymoron in the New York Interpretation], it becomes apparent that
> we need to keep compound_head() and page_folio() working for all pages
> for a while.
>
> As a reminder, compound_head() will _eventually_ return NULL for
> slabs & folios. It will only be defined to work for page allocations.
> Likewise page_folio() will return NULL for any pages not part of a folio
> and page_slab() will return NULL for any pages not part of a slab.
>
> My best offer right now is to use page->lru.prev. At least one of the
> bottom two bits will be set to indicate that it's a memdesc (we're only
> going to use thirteen of the memdesc types initially).
>

Just so I understand it correctly:

Would you want to move the page type already from the mapcount into the
memdesc? That sounds challenging, because for any typed folios we would
not be allowed to reuse a field we want to use for the memdesc. IIRC,
hugetlb pretty much uses all of it.

The easy way out for now would be making this page type specific: Only
selected typed pages will store the memdesc (here: slab pointer) e.g.,
in the old page->mapping place.

So PageSlab() still checks the existing page type, but page_slab() would
simply look up the pointer in the old page->mapping place.

> There are a few overlapping uses of these bits in struct page, so if we do
> nothing we may get confused. We can deal with mlock_count and order (for
> pcp_llist). But the biggest problem is the first tail page of a folio.
> Depending on word size and endianness, there are four different atomic_t
> fields that overlap with page->lru.prev. That can't be solved by using
> a different field in struct page; the first tail page is jam-packed.
>
> So, page_slab() will first load page->memdesc (the same bits as
> page->lru.prev), check the bottom four bits match the slab memdesc, and
> also check page->page_type matches PGTY_slab. I don't like this a lot,
> because it's two loads rather than one atomic load, but it should only
> be present for one commit.

As a first step, I would really not use the bottom four bits. Why
perform two type checks initially?

--
Cheers

David / dhildenb

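The page-type-specific alternative suggested above might look roughly like the following userspace model. It is a sketch, not kernel code: the `typed_page_*` names, the struct layout and the 0xf5 stand-in for PGTY_slab are assumptions made for illustration. The type test keeps using the existing page type, and the slab pointer is stored untagged in the slot that page->mapping used to occupy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PGTY_SLAB_MODEL 0xf5u	/* placeholder for the kernel's PGTY_slab */

struct slab_model {
	int objects;
};

/* Hypothetical page layout: a page type plus the old page->mapping slot. */
struct typed_page_model {
	unsigned int page_type;
	void *mapping;		/* reused to hold the slab pointer, no tag bits */
};

/* Mark the page as a slab page and record its descriptor. */
static inline void typed_page_set_slab(struct typed_page_model *page,
				       struct slab_model *slab)
{
	assert(slab != NULL);
	page->mapping = slab;
	page->page_type = PGTY_SLAB_MODEL;
}

/* Equivalent of PageSlab(): only the existing page type is consulted. */
static inline bool typed_page_is_slab(const struct typed_page_model *page)
{
	return page->page_type == PGTY_SLAB_MODEL;
}

/* Equivalent of page_slab(): one type check, then a plain pointer lookup. */
static inline struct slab_model *typed_page_slab(const struct typed_page_model *page)
{
	return typed_page_is_slab(page) ? page->mapping : NULL;
}
```

In this variant the pointer carries no type of its own, so it is only meaningful after the page type has been checked.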
* Re: Where to put page->memdesc initially
From: Matthew Wilcox @ 2025-09-02 21:06 UTC
To: David Hildenbrand; +Cc: linux-mm, Jason Gunthorpe

On Tue, Sep 02, 2025 at 10:09:49PM +0200, David Hildenbrand wrote:
> Would you want to move the page type already from the mapcount into the
> memdesc? That sounds challenging, because for any typed folios we would
> not be allowed to reuse a field we want to use for the memdesc. IIRC,
> hugetlb pretty much uses all of it.

That would definitely be part of the same series. But possibly not the
same patch. I think the series has to include separate allocations for
slab, folio and whichever other memdescs won't fit into 32 bytes.

> The easy way out for now would be making this page type specific: Only
> selected typed pages will store the memdesc (here: slab pointer) e.g.,
> in the old page->mapping place.
>
> So PageSlab() still checks the existing page type, but page_slab() would
> simply look up the pointer in the old page->mapping place.

I *think* that's roughly the same as what I'm proposing, except
that we already have a meaning for "the bottom two bits of
folio->mapping are set", so there's potential confusion for
folio_test_anon() & friends.

>
> > There are a few overlapping uses of these bits in struct page, so if we do
> > nothing we may get confused. We can deal with mlock_count and order (for
> > pcp_llist). But the biggest problem is the first tail page of a folio.
> > Depending on word size and endianness, there are four different atomic_t
> > fields that overlap with page->lru.prev. That can't be solved by using
> > a different field in struct page; the first tail page is jam-packed.
> >
> > So, page_slab() will first load page->memdesc (the same bits as
> > page->lru.prev), check the bottom four bits match the slab memdesc, and
> > also check page->page_type matches PGTY_slab. I don't like this a lot,
> > because it's two loads rather than one atomic load, but it should only
> > be present for one commit.
>
> As a first step, I would really not use the bottom four bits. Why
> perform two type checks initially?

I'm concerned by things like compaction that are executing
asynchronously and might see a page mid-transition. Or something like
GUP or lockless pagecache lookup that might get a stale page pointer.
It's a lot easier to reason about if we can do a single load and treat
that as a source of truth (with the appropriate reloads to make sure
nothing changed after we got a refcount). Doing two loads makes
my brain hurt a bit because it introduces more possibilities for
inconsistency. I'll need to write it up pretty carefully (which
annoys me because we're going to need it for a single or very
few commits ...)

* Re: Where to put page->memdesc initially
From: Jason Gunthorpe @ 2025-09-02 21:15 UTC
To: Matthew Wilcox; +Cc: David Hildenbrand, linux-mm

On Tue, Sep 02, 2025 at 10:06:05PM +0100, Matthew Wilcox wrote:

> I'm concerned by things like compaction that are executing
> asynchronously and might see a page mid-transition. Or something like
> GUP or lockless pagecache lookup that might get a stale page
> pointer.

At least GUP fast obtains a page refcount before touching the rest of
struct page, so I think it can't see those kinds of races since the
page shouldn't be transitioning with a non-zero refcount?

Jason

* Re: Where to put page->memdesc initially
From: Matthew Wilcox @ 2025-09-02 23:24 UTC
To: Jason Gunthorpe; +Cc: David Hildenbrand, linux-mm

On Tue, Sep 02, 2025 at 06:15:14PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 02, 2025 at 10:06:05PM +0100, Matthew Wilcox wrote:
>
> > I'm concerned by things like compaction that are executing
> > asynchronously and might see a page mid-transition. Or something like
> > GUP or lockless pagecache lookup that might get a stale page
> > pointer.
>
> At least GUP fast obtains a page refcount before touching the rest of
> struct page, so I think it can't see those kinds of races since the
> page shouldn't be transitioning with a non-zero refcount?

OK, so ...

 - For folios, there's already no such thing as a page refcount (you may
   already know this and are just being slightly sloppy while
   speaking). If you attempt to access the refcount on a tail page,
   you're silently redirected to the folio refcount.

 - That's not going to change with memdescs; for pages which are part of
   a memdesc, attempting to access the page's refcount will redirect to
   the folio's refcount.

What GUP-fast will do once we get to Page2025 is:

 - READ_ONCE(page->memdesc)
 - Check that the bottom bits match a folio. If not, fall back to
   GUP-slow (or retry; I forget the details).
 - tryget the refcount, if fail fall back/retry
 - if (READ_ONCE(page->memdesc) != memdesc) { folio_put(); retry/fallback }
 - yay, we succeeded.

So that's all a little more complicated with two places to check as an
intermediate state, but I think it's doable. It's just fiddly.

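The GUP-fast steps listed above can be sketched with C11 atomics in userspace. Everything here is a model under assumptions: the `_model` names, the folio type value 0x1 and the struct layouts are hypothetical, atomic loads stand in for READ_ONCE(), and returning NULL stands in for "fall back to GUP-slow or retry". The sketch also assumes the folio memory stays valid while the refcount is attempted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MEMDESC_TYPE_MASK  ((uintptr_t)0xf)
#define MEMDESC_TYPE_FOLIO ((uintptr_t)0x1)	/* assumed value, not from the thread */

/* Separately allocated folio; 16-byte alignment frees the low four bits. */
struct folio_model {
	_Alignas(16) atomic_int refcount;
};

struct page_model {
	_Atomic uintptr_t memdesc;
};

/* Pack a folio pointer and the folio type tag into one memdesc word. */
static uintptr_t folio_to_memdesc(struct folio_model *folio)
{
	uintptr_t addr = (uintptr_t)folio;

	assert((addr & MEMDESC_TYPE_MASK) == 0);
	return addr | MEMDESC_TYPE_FOLIO;
}

/* Take a reference unless the count has already dropped to zero. */
static bool folio_try_get_model(struct folio_model *folio)
{
	int old = atomic_load_explicit(&folio->refcount, memory_order_relaxed);

	while (old > 0) {
		if (atomic_compare_exchange_weak_explicit(&folio->refcount, &old,
							  old + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
	}
	return false;
}

static void folio_put_model(struct folio_model *folio)
{
	atomic_fetch_sub_explicit(&folio->refcount, 1, memory_order_release);
}

/* Returns the folio with a reference held, or NULL to signal fallback/retry. */
static struct folio_model *gup_fast_get_folio_model(struct page_model *page)
{
	uintptr_t memdesc = atomic_load_explicit(&page->memdesc,
						 memory_order_acquire);
	struct folio_model *folio;

	if ((memdesc & MEMDESC_TYPE_MASK) != MEMDESC_TYPE_FOLIO)
		return NULL;			/* not a folio */
	folio = (struct folio_model *)(memdesc & ~MEMDESC_TYPE_MASK);
	if (!folio_try_get_model(folio))
		return NULL;			/* refcount was zero */
	if (atomic_load_explicit(&page->memdesc, memory_order_acquire) != memdesc) {
		folio_put_model(folio);		/* page changed under us */
		return NULL;
	}
	return folio;
}
```

The reload after the tryget is what makes the single memdesc word usable as the source of truth: a reference is only kept if the page still points at the same descriptor.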
* Re: Where to put page->memdesc initially
From: Jason Gunthorpe @ 2025-09-02 23:57 UTC
To: Matthew Wilcox; +Cc: David Hildenbrand, linux-mm

On Wed, Sep 03, 2025 at 12:24:07AM +0100, Matthew Wilcox wrote:
> On Tue, Sep 02, 2025 at 06:15:14PM -0300, Jason Gunthorpe wrote:
> > On Tue, Sep 02, 2025 at 10:06:05PM +0100, Matthew Wilcox wrote:
> >
> > > I'm concerned by things like compaction that are executing
> > > asynchronously and might see a page mid-transition. Or something like
> > > GUP or lockless pagecache lookup that might get a stale page
> > > pointer.
> >
> > At least GUP fast obtains a page refcount before touching the rest of
> > struct page, so I think it can't see those kinds of races since the
> > page shouldn't be transitioning with a non-zero refcount?
>
> OK, so ...
>
> - For folios, there's already no such thing as a page refcount (you may
> already know this and are just being slightly sloppy while
> speaking).

I was thinking broadly about the impossible-in-page-tables things like
slab and ptdesc must continue to have a refcount field, it is just
fixed to 0, right? But yes, the code all goes through struct folio to
get there.

> you're silently redirected to the folio refcount.
>
> - That's not going to change with memdescs; for pages which are part of
> a memdesc, attempting to access the page's refcount will redirect to
> the folio's refcount.

My point is that until the refcount memory is moved from struct folio
to a memdesc allocated struct, you should be able to continue to rely
on checking a non-zero refcount in the struct folio to stabilize
reading the memdesc/type.

That seems like it may address some of your concern for this inbetween
patch if a memdesc pointer and type is guaranteed to be stable when a
positive refcount is being held.

Then you'd change things like you describe:

> - READ_ONCE(page->memdesc)
> - Check that the bottom bits match a folio. If not, fall back to
> GUP-slow (or retry; I forget the details).

gup-slow sounds right to resolve any races to me.

> - tryget the refcount, if fail fall back/retry
> - if (READ_ONCE(page->memdesc) != memdesc) { folio_put(); retry/fallback }
> - yay, we succeeded.

It is the same as GUP fast does for the PTE today. So this would now
recheck the PTE and the memdesc.

This recheck is because GUP fast effectively runs under a
SLAB_TYPESAFE_BY_RCU type of behavior for the struct folio. I think
the memdesc would also need to follow a SLAB_TYPESAFE_BY_RCU design as
well.

Jason

* Re: Where to put page->memdesc initially
From: Matthew Wilcox @ 2025-09-03 4:46 UTC
To: Jason Gunthorpe; +Cc: David Hildenbrand, linux-mm

On Tue, Sep 02, 2025 at 08:57:40PM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 03, 2025 at 12:24:07AM +0100, Matthew Wilcox wrote:
> > On Tue, Sep 02, 2025 at 06:15:14PM -0300, Jason Gunthorpe wrote:
> > > On Tue, Sep 02, 2025 at 10:06:05PM +0100, Matthew Wilcox wrote:
> > >
> > > > I'm concerned by things like compaction that are executing
> > > > asynchronously and might see a page mid-transition. Or something like
> > > > GUP or lockless pagecache lookup that might get a stale page
> > > > pointer.
> > >
> > > At least GUP fast obtains a page refcount before touching the rest of
> > > struct page, so I think it can't see those kinds of races since the
> > > page shouldn't be transitioning with a non-zero refcount?
> >
> > OK, so ...
> >
> > - For folios, there's already no such thing as a page refcount (you may
> > already know this and are just being slightly sloppy while
> > speaking).
>
> I was thinking broadly about the impossible-in-page-tables things like
> slab and ptdesc must continue to have a refcount field, it is just
> fixed to 0, right? But yes, the code all goes through struct folio to
> get there.

Once we switch to memdescs for these things, they no longer need a
refcount field. By the end of Page2025, plain pages have a refcount,
but folios/slabs/ptdesc/etc set the page->_refcount to 0.

put_page() moves out of line because it's really complicated; it looks
something like:

void put_page(struct page *page)
{
	memdesc_t memdesc = READ_ONCE(page->memdesc);

	if (memdesc_is_folio(memdesc)) {
		struct folio *folio = memdesc_folio(memdesc);
		folio_put(folio);
	} else if (memdesc_is_slab(memdesc) || memdesc_is_ptdesc(memdesc)) {
		BUG();
	} else {
		page = compound_head(page);
		if (page_put_testzero(page))
			__free_page(page);
	}
}

... there's probably a bit more to it ...

get_page() probably looks similar. GUP-fast obviously wouldn't use
get_page() because it needs to be very careful about what it's doing
(and it needs to fail properly if it sees a non-folio page).

> > you're silently redirected to the folio refcount.
> >
> > - That's not going to change with memdescs; for pages which are part of
> > a memdesc, attempting to access the page's refcount will redirect to
> > the folio's refcount.
>
> My point is that until the refcount memory is moved from struct folio
> to a memdesc allocated struct, you should be able to continue to rely
> on checking a non-zero refcount in the struct folio to stabilize
> reading the memdesc/type.

Definitely once you have a refcount on a folio, the page->folio
relationship is stable. page->slab is stabilised if you've allocated
an object from the slab. page->ptdesc is stabilised if you hold the
PTE lock or the mmap_lock ... we need to write all these things down.

> That seems like it may address some of your concern for this inbetween
> patch if a memdesc pointer and type is guaranteed to be stable when a
> positive refcount is being held.
>
> Then you'd change things like you describe:
>
> > - READ_ONCE(page->memdesc)
> > - Check that the bottom bits match a folio. If not, fall back to
> > GUP-slow (or retry; I forget the details).
>
> gup-slow sounds right to resolve any races to me.
>
> > - tryget the refcount, if fail fall back/retry
> > - if (READ_ONCE(page->memdesc) != memdesc) { folio_put(); retry/fallback }
> > - yay, we succeeded.
>
> It is the same as GUP fast does for the PTE today. So this would now
> recheck the PTE and the memdesc.

Ah, yes, I missed the step where we recheck the PTE. Thanks.

> This recheck is because GUP fast effectively runs under a
> SLAB_TYPESAFE_BY_RCU type of behavior for the struct folio. I think
> the memdesc would also need to follow a SLAB_TYPESAFE_BY_RCU design as
> well.

I haven't quite figured out if _all_ memdescs need to be TYPESAFE_BY_RCU
or only the ones which either have refcounts or are otherwise
migratable. Slab should be safe to be not TYPESAFE because if we ever
see a PageSlab, we won't try to dereference the pointer in GUP,
pagecache lookup or migration. I need to look through David's recent
patches again to understand how migration is going to work (obviously
we won't try to migrate slab pages).

* Re: Where to put page->memdesc initially
From: David Hildenbrand @ 2025-09-03 9:38 UTC
To: Matthew Wilcox, Jason Gunthorpe; +Cc: linux-mm

>> It is the same as GUP fast does for the PTE today. So this would now
>> recheck the PTE and the memdesc.
>
> Ah, yes, I missed the step where we recheck the PTE. Thanks.
>
>> This recheck is because GUP fast effectively runs under a
>> SLAB_TYPESAFE_BY_RCU type of behavior for the struct folio. I think
>> the memdesc would also need to follow a SLAB_TYPESAFE_BY_RCU design as
>> well.
>
> I haven't quite figured out if _all_ memdescs need to be TYPESAFE_BY_RCU
> or only the ones which either have refcounts or are otherwise
> migratable. Slab should be safe to be not TYPESAFE because if we ever
> see a PageSlab, we won't try to dereference the pointer in GUP,
> pagecache lookup or migration. I need to look through David's recent
> patches again to understand how migration is going to work (obviously
> we won't try to migrate slab pages).

The long term plan is to work on frozen pages (with balloon pages that's
easy, with zsmalloc I am not sure yet).

Migration core will be responsible for freeing these frozen pages after
migration succeeded etc.

PageOffline pages won't allocate any memdesc. Zsmalloc will have to
allocate one.

I would expect that the ->migrate_page() callback will just get a frozen
page and the callback will figure out what to do with regard to the
memdesc.

It's going to be a bunch of work and I am getting interrupted working on
it ...

--
Cheers

David / dhildenb

* Re: Where to put page->memdesc initially
From: Jason Gunthorpe @ 2025-09-03 12:28 UTC
To: Matthew Wilcox; +Cc: David Hildenbrand, linux-mm

On Wed, Sep 03, 2025 at 05:46:08AM +0100, Matthew Wilcox wrote:
> > This recheck is because GUP fast effectively runs under a
> > SLAB_TYPESAFE_BY_RCU type of behavior for the struct folio. I think
> > the memdesc would also need to follow a SLAB_TYPESAFE_BY_RCU design as
> > well.
>
> I haven't quite figured out if _all_ memdescs need to be TYPESAFE_BY_RCU
> or only the ones which either have refcounts or are otherwise
> migratable.

Anything that de-references page->memdesc under RCU using the re-check
flow you outlined has to also use TYPESAFE_BY_RCU for the
page->memdesc allocation to avoid UAF under an RCU read-side critical
section.

Likely it doesn't make sense to RCU de-reference page->memdesc for
anything other than incrementing a refcount.

Jason

* Re: Where to put page->memdesc initially
From: Jason Gunthorpe @ 2025-09-03 12:43 UTC
To: Matthew Wilcox; +Cc: David Hildenbrand, linux-mm

On Wed, Sep 03, 2025 at 05:46:08AM +0100, Matthew Wilcox wrote:

> Once we switch to memdescs for these things, they no longer need a
> refcount field. By the end of Page2025, plain pages have a refcount,
> but folios/slabs/ptdesc/etc set the page->_refcount to 0.

Reading this again, I didn't quite get this till now. Maybe it is
worth adding this detail to the wiki.

In this case, what are "plain pages"? One I can think of is naked
calls to alloc_page*(), which I see often used in place of
kmalloc(PAGE_SIZE), do you imagine a project to favour kmalloc
instead?

Jason

* Re: Where to put page->memdesc initially
From: David Hildenbrand @ 2025-09-03 9:33 UTC
To: Matthew Wilcox; +Cc: linux-mm, Jason Gunthorpe

On 02.09.25 23:06, Matthew Wilcox wrote:
> On Tue, Sep 02, 2025 at 10:09:49PM +0200, David Hildenbrand wrote:
>> Would you want to move the page type already from the mapcount into the
>> memdesc? That sounds challenging, because for any typed folios we would
>> not be allowed to reuse a field we want to use for the memdesc. IIRC,
>> hugetlb pretty much uses all of it.
>
> That would definitely be part of the same series. But possibly not the
> same patch. I think the series has to include separate allocations for
> slab, folio and whichever other memdescs won't fit into 32 bytes.

I was wondering whether there could be a single patch where we do this
change (separate allocations), and just prepare the code in previous
patches for that accordingly, such that the resulting patch is still
reasonably small.

I feel like this way of splitting patches might cause unnecessary
headaches :)

>
>> The easy way out for now would be making this page type specific: Only
>> selected typed pages will store the memdesc (here: slab pointer) e.g.,
>> in the old page->mapping place.
>>
>> So PageSlab() still checks the existing page type, but page_slab() would
>> simply look up the pointer in the old page->mapping place.
>
> I *think* that's roughly the same as what I'm proposing, except
> that we already have a meaning for "the bottom two bits of
> folio->mapping are set", so there's potential confusion for
> folio_test_anon() & friends.

IIRC, we must always make sure to never call folio_test_anon() on
something that is a slab already.

But if in doubt, we could use bit[2] in ->mapping, which should still be
unused IIRC.

>>
>>> There are a few overlapping uses of these bits in struct page, so if we do
>>> nothing we may get confused. We can deal with mlock_count and order (for
>>> pcp_llist). But the biggest problem is the first tail page of a folio.
>>> Depending on word size and endianness, there are four different atomic_t
>>> fields that overlap with page->lru.prev. That can't be solved by using
>>> a different field in struct page; the first tail page is jam-packed.
>>>
>>> So, page_slab() will first load page->memdesc (the same bits as
>>> page->lru.prev), check the bottom four bits match the slab memdesc, and
>>> also check page->page_type matches PGTY_slab. I don't like this a lot,
>>> because it's two loads rather than one atomic load, but it should only
>>> be present for one commit.
>>
>> As a first step, I would really not use the bottom four bits. Why
>> perform two type checks initially?
>
> I'm concerned by things like compaction that are executing
> asynchronously and might see a page mid-transition. Or something like
> GUP or lockless pagecache lookup that might get a stale page pointer.
> It's a lot easier to reason about if we can do a single load and treat
> that as a source of truth (with the appropriate reloads to make sure
> nothing changed after we got a refcount). Doing two loads makes
> my brain hurt a bit because it introduces more possibilities for
> inconsistency. I'll need to write it up pretty carefully (which
> annoys me because we're going to need it for a single or very
> few commits ...)

Makes sense, but maybe we can avoid all that by just structuring the
patches differently :)

--
Cheers

David / dhildenb

end of thread, other threads:[~2025-09-03 12:43 UTC | newest]

Thread overview: 12+ messages
2025-09-02 19:03 Where to put page->memdesc initially Matthew Wilcox
2025-09-02 20:08 ` Jason Gunthorpe
2025-09-02 20:09 ` David Hildenbrand
2025-09-02 21:06   ` Matthew Wilcox
2025-09-02 21:15     ` Jason Gunthorpe
2025-09-02 23:24       ` Matthew Wilcox
2025-09-02 23:57         ` Jason Gunthorpe
2025-09-03  4:46           ` Matthew Wilcox
2025-09-03  9:38             ` David Hildenbrand
2025-09-03 12:28             ` Jason Gunthorpe
2025-09-03 12:43             ` Jason Gunthorpe
2025-09-03  9:33     ` David Hildenbrand