Re: [PATCH v4 0/2] mm: improve folio refcount scalability

From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Zi Yan <ziy@nvidia.com>,
	ilya.gladyshev@linux.dev,
	Andrew Morton <akpm@linux-foundation.org>
Cc: ivgorbunov@me.com, Liam.Howlett@oracle.com, apopple@nvidia.com,
	artem.kuzin@huawei.com, baolin.wang@linux.alibaba.com,
	foxido@foxido.dev, harry.yoo@oracle.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	lorenzo.stoakes@oracle.com, mhocko@suse.com,
	muchun.song@linux.dev, rppt@kernel.org, surenb@google.com,
	torvalds@linuxfoundation.org, vbabka@suse.cz,
	willy@infradead.org, yuzhao@google.com, pfalcato@suse.de,
	kirill@shutemov.name
Subject: Re: [PATCH v4 0/2] mm: improve folio refcount scalability
Date: Mon, 22 Jun 2026 09:55:42 +0200
Message-ID: <6d576c2a-af34-493c-bccb-cf2e035e6506@kernel.org>
In-Reply-To: <DJECF2107V6T.23HGJJJ8ECKJ9@nvidia.com>

On 6/21/26 03:40, Zi Yan wrote:
> On Sat Jun 20, 2026 at 2:19 PM EDT,  wrote:
>>>
>>> Thanks. Nice numbers.
>>>
>>> AI review had some things to say:
>>>  https://sashiko.dev/#/patchset/df26082871b4c65b2bd38d409026237c08572836@linux.dev
>>
>> Among some minor issues, it also pointed out a funny ABA race:
>>
>> ```
>> T1/T2 work with pages of type X.
>> T3 works with pages of type Y.
>>
>> T1: page_dec_and_test()
>> T1: -> sub refcount [1 -> 0]
>> T1: -> *interrupted* (very bad hypervisor, for example)
>>
>> T2: optimistic get() [0 -> 1]
> 
> How is this possible? folio_get() and folio_try_get() should prevent
> getting a refcount-0 folio; get_page() uses folio_get(); try_get_page()
> also checks refcount before ref_inc.

With this patch set, it's different:

Essentially, there is a short time frame between dropping the refcount to 0 and
setting it to PAGEREF_FROZEN_BIT.

(1) atomic_sub_and_test() [1 -> 0]
(2) atomic_cmpxchg_relaxed(0, PAGEREF_FROZEN_BIT) [0 -> PAGEREF_FROZEN_BIT]

Someone in-between (1) and (2) can still "abort" this freeing process by
incrementing the refcount from 0 to 1. (2) will detect this and count it as
"well, not freed".

So with a refcount of 0, the page is "staged for freeing", which can be aborted.
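
To make the window concrete, here is a minimal userspace sketch of steps (1)
and (2), using C11 atomics instead of the kernel's atomic_t helpers. It is only
a model under my own assumptions: MODEL_FROZEN_BIT and all model_* names are
made up for illustration and are not taken from the patch set, and the
"intruder" is injected by hand rather than being a real concurrent thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in; the real PAGEREF_FROZEN_BIT value comes from the patch set. */
#define MODEL_FROZEN_BIT (1u << 31)

/* Step (1): drop one reference; true if this put took the count to 0. */
static bool model_drop(atomic_uint *ref)
{
	return atomic_fetch_sub(ref, 1) == 1;
}

/* Step (2): try to move 0 -> frozen; fails if someone raised the count. */
static bool model_freeze(atomic_uint *ref)
{
	unsigned int expected = 0;

	return atomic_compare_exchange_strong_explicit(ref, &expected,
						       MODEL_FROZEN_BIT,
						       memory_order_relaxed,
						       memory_order_relaxed);
}

/* Optimistic get: unconditional increment, even from 0. */
static void model_get(atomic_uint *ref)
{
	atomic_fetch_add(ref, 1);
}

/*
 * Put with an optional intruder between (1) and (2).
 * Returns true if this put froze (i.e. freed) the page.
 */
static bool model_put(atomic_uint *ref, bool intruder)
{
	/* Model precondition: the caller holds a reference, so the page is not frozen. */
	assert(!(atomic_load(ref) & MODEL_FROZEN_BIT));

	if (!model_drop(ref))
		return false;
	if (intruder)
		model_get(ref);	/* lands in the window between (1) and (2) */
	return model_freeze(ref);
}
```

Without the intruder, model_put() on a count of 1 ends with the count at
MODEL_FROZEN_BIT and reports "freed". With the intruder, the CAS in step (2)
sees 1 instead of 0, fails, and the put is counted as "not freed".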

> 
> For your patch, is it because of the separation of refcount-0 and
> frozen? The page goes refcount-0 before it is frozen? 

Exactly that. Refcount 0 is only transitional.

> Will it work if
> the page is frozen first then gets its refcount to 0? Basically, the
> frozen state prevents anyone else messing up with your refcount.
> 

Let me comment on the original sequence:

```
T1/T2 work with pages of type X.
T3 works with pages of type Y.

T1: page_dec_and_test()
T1: -> sub refcount [1 -> 0]
T1: -> interrupted (very bad hypervisor, for example)

-> T1 did (1) but not (2)

T2: optimistic get() [0 -> 1]

-> T2 intercepted after T1's (1).

T2: put page back [1 -> 0]

-> T2 started (1)

T2: calls dtor for type X, returns into the allocator

-> For that to happen it must perform (2) through page_dec_and_test().
-> Page is frozen now. [0 -> PAGEREF_FROZEN_BIT]


T3: receives page of type Y, sets refcount to 1

-> Ordinary refcounted allocation. [PAGEREF_FROZEN_BIT -> 1]

T3: page_dec_and_test()
T3: -> sub refcount [1 -> 0]

-> T3 did (1) but not (2) yet.

T1 resumes execution

T1: -> CAS [0->LOCKED]

-> T1 did its (2) now.

T1: BUG: calls dtor of type X on page of type Y
```

I am confused about the "dtor of type X vs. type Y". We have the page in our
hand; once we have won (2), we can look up the type and do the right thing.
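
As a sanity check of the sequence above, here is a single-threaded replay in
userspace C11. Again just a sketch under my assumptions: struct model_page, the
type tag, MODEL_FROZEN_BIT and the model_* helpers are hypothetical, and the
"threads" are simply statements executed in the interleaved order. It shows
that T1's stale step (2) does succeed on the recycled page, and that a type
lookup done after winning (2) sees type Y.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the frozen marker used by the patch set. */
#define MODEL_FROZEN_BIT (1u << 31)

enum model_type { MODEL_TYPE_X, MODEL_TYPE_Y };

/* Toy page: a refcount plus a type tag that the allocator rewrites on reuse. */
struct model_page {
	atomic_uint ref;
	enum model_type type;
};

/* Step (1): drop one reference; true if the count went from 1 to 0. */
static bool model_drop(struct model_page *p)
{
	return atomic_fetch_sub(&p->ref, 1) == 1;
}

/* Step (2): CAS 0 -> frozen; true if this caller won the freeing race. */
static bool model_freeze(struct model_page *p)
{
	unsigned int expected = 0;

	return atomic_compare_exchange_strong(&p->ref, &expected,
					      MODEL_FROZEN_BIT);
}

/*
 * Replay the interleaving in program order. Stores whether T1's late
 * step (2) succeeded and returns the type T1 finds after that step.
 */
static enum model_type model_replay(bool *t1_won)
{
	struct model_page page;
	bool t1_hit_zero, t2_freed, t3_hit_zero;

	atomic_init(&page.ref, 1);
	page.type = MODEL_TYPE_X;

	t1_hit_zero = model_drop(&page);	/* T1: (1), then stalls */
	atomic_fetch_add(&page.ref, 1);		/* T2: optimistic get [0 -> 1] */
	t2_freed = model_drop(&page) && model_freeze(&page); /* T2: (1) + (2) */
	assert(t1_hit_zero && t2_freed);

	page.type = MODEL_TYPE_Y;		/* allocator reuses the page for T3 */
	atomic_store(&page.ref, 1);		/* [frozen -> 1] */
	t3_hit_zero = model_drop(&page);	/* T3: (1) but not yet (2) */
	assert(t3_hit_zero);

	*t1_won = model_freeze(&page);		/* T1 resumes with its stale (2) */
	return page.type;			/* type looked up after (2) */
}
```

In this model T1 wins the CAS and then finds type Y, so a destructor chosen by
looking at the page after (2) would match the current type. Whether the caller
had already committed to a type-specific path before (2) is a separate
question.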


I guess the interesting part is where the destructor is called from the outside:

static inline void folio_put(struct folio *folio)
{
	if (folio_put_testzero(folio))
		__folio_put(folio);
}

Where we keep operating on it as if it were a folio, although it might now be a
non-folio thing.

So if T1 is doing a folio_put(), we'd call into page_cache_release() etc. with
a non-folio thing.

But IIUC, that can happen today already when we do
folio_try_get(folio)+folio_put(folio) and it wasn't a folio in the first place?

I'd assume that will all change once the refcount moves into
separately-allocated "struct folios".

-- 
Cheers,

David
