From: Bjorn Helgaas <helgaas@kernel.org>
To: Gorbunov Ivan <gorbunov.ivan@h-partners.com>
Cc: david@kernel.org, Liam.Howlett@oracle.com,
akpm@linux-foundation.org, apopple@nvidia.com,
baolin.wang@linux.alibaba.com, gladyshev.ilya1@h-partners.com,
harry.yoo@oracle.com, kirill@shutemov.name,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
lorenzo.stoakes@oracle.com, mhocko@suse.com,
muchun.song@linux.dev, rppt@kernel.org, surenb@google.com,
torvalds@linuxfoundation.org, vbabka@suse.cz,
willy@infradead.org, yuzhao@google.com, ziy@nvidia.com,
artem.kuzin@huawei.com
Subject: Re: [PATCH v2 1/2] mm: drop page refcount zero state semantics
Date: Thu, 23 Apr 2026 13:07:52 -0500
Message-ID: <20260423180752.GA31613@bhelgaas>
In-Reply-To: <9fd8ebbc0f4f45be611bae0d03dd25dd994233c0.1776350895.git.gorbunov.ivan@h-partners.com>
On Mon, Apr 20, 2026 at 08:01:18AM +0000, Gorbunov Ivan wrote:
> Right now the 'zero' refcount state can be interpreted in two ways:
> 1) An unfrozen page that currently has no explicit owner
> 2) A frozen page
>
> These states can only be 'logically' distinguished by operations such
> as page_ref_add, page_ref_inc, etc. In the first case we want the
> counter to increase.
>
> For example, one can write
>
> page = alloc_frozen_page(...);
> page_ref_inc(page);
>
> But in the second state, incrementing the counter of a frozen page
> should not be valid at all.
>
> Another reason for this change is our other patch (mm: implement page
> refcount locking via dedicated bit), in which frozen pages no longer
> store 0 in the refcount while frozen.
>
> This patch proposes two changes:
> 1) Deprecate the invariant that the value stored in the reference
>    count of a frozen page is 0 (the getters folio_ref_count() and
>    page_ref_count() must still return 0 for frozen pages)
> 2) Allow modification operations like page_ref_add() to be used only
>    on pages with owners
>
> We've looked at the places where pages are allocated, and they are
> always initialized via functions like set_page_count(page, 1).
> However, for clarity, we've added debug VM_BUG_ON() checks inside the
> modification functions to ensure that they are called only on pages
> with owners. In the future those checks could be strengthened by
> replacing the operations with their result-returning analogs, if
> needed.
>
> Co-developed-by: Gladyshev Ilya <gladyshev.ilya1@h-partners.com>
> Signed-off-by: Gladyshev Ilya <gladyshev.ilya1@h-partners.com>
> Signed-off-by: Gorbunov Ivan <gorbunov.ivan@h-partners.com>
No opinion about the rest of the content, but the p2pdma.c change
looks like a no-op, so:
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # p2pdma.c
You might consider rewrapping this commit log to fit in 75 columns or
so, as the log for the second patch does.
> ---
> drivers/pci/p2pdma.c | 2 +-
> include/linux/page_ref.h | 17 +++++++++++++++++
> kernel/liveupdate/kexec_handover.c | 2 +-
> mm/hugetlb.c | 2 +-
> mm/mm_init.c | 6 +++---
> mm/page_alloc.c | 4 ++--
> 6 files changed, 25 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index e0f546166eb8..e060ae7e1644 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -158,7 +158,7 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
> * because we don't want to trigger the
> * p2pdma_folio_free() path.
> */
> - set_page_count(page, 0);
> + set_page_count_as_frozen(page);
> percpu_ref_put(ref);
> return ret;
> }
> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> index 94d3f0e71c06..a7a07b61d2ae 100644
> --- a/include/linux/page_ref.h
> +++ b/include/linux/page_ref.h
> @@ -62,6 +62,11 @@ static inline void __page_ref_unfreeze(struct page *page, int v)
>
> #endif
>
> +static inline bool __page_count_is_frozen(int count)
> +{
> + return count == 0;
> +}
> +
> static inline int page_ref_count(const struct page *page)
> {
> return atomic_read(&page->_refcount);
> @@ -115,8 +120,14 @@ static inline void init_page_count(struct page *page)
> set_page_count(page, 1);
> }
>
> +static inline void set_page_count_as_frozen(struct page *page)
> +{
> + set_page_count(page, 0);
> +}
> +
> static inline void page_ref_add(struct page *page, int nr)
> {
> + VM_BUG_ON(__page_count_is_frozen(page_count(page)));
> atomic_add(nr, &page->_refcount);
> if (page_ref_tracepoint_active(page_ref_mod))
> __page_ref_mod(page, nr);
> @@ -129,6 +140,7 @@ static inline void folio_ref_add(struct folio *folio, int nr)
>
> static inline void page_ref_sub(struct page *page, int nr)
> {
> + VM_BUG_ON(__page_count_is_frozen(page_count(page)));
> atomic_sub(nr, &page->_refcount);
> if (page_ref_tracepoint_active(page_ref_mod))
> __page_ref_mod(page, -nr);
> @@ -142,6 +154,7 @@ static inline void folio_ref_sub(struct folio *folio, int nr)
> static inline int folio_ref_sub_return(struct folio *folio, int nr)
> {
> int ret = atomic_sub_return(nr, &folio->_refcount);
> + VM_BUG_ON(__page_count_is_frozen(ret + nr));
>
> if (page_ref_tracepoint_active(page_ref_mod_and_return))
> __page_ref_mod_and_return(&folio->page, -nr, ret);
> @@ -150,6 +163,7 @@ static inline int folio_ref_sub_return(struct folio *folio, int nr)
>
> static inline void page_ref_inc(struct page *page)
> {
> + VM_BUG_ON(__page_count_is_frozen(page_count(page)));
> atomic_inc(&page->_refcount);
> if (page_ref_tracepoint_active(page_ref_mod))
> __page_ref_mod(page, 1);
> @@ -162,6 +176,7 @@ static inline void folio_ref_inc(struct folio *folio)
>
> static inline void page_ref_dec(struct page *page)
> {
> + VM_BUG_ON(__page_count_is_frozen(page_count(page)));
> atomic_dec(&page->_refcount);
> if (page_ref_tracepoint_active(page_ref_mod))
> __page_ref_mod(page, -1);
> @@ -189,6 +204,7 @@ static inline int folio_ref_sub_and_test(struct folio *folio, int nr)
> static inline int page_ref_inc_return(struct page *page)
> {
> int ret = atomic_inc_return(&page->_refcount);
> + VM_BUG_ON(__page_count_is_frozen(ret - 1));
>
> if (page_ref_tracepoint_active(page_ref_mod_and_return))
> __page_ref_mod_and_return(page, 1, ret);
> @@ -217,6 +233,7 @@ static inline int folio_ref_dec_and_test(struct folio *folio)
> static inline int page_ref_dec_return(struct page *page)
> {
> int ret = atomic_dec_return(&page->_refcount);
> + VM_BUG_ON(__page_count_is_frozen(ret + 1));
>
> if (page_ref_tracepoint_active(page_ref_mod_and_return))
> __page_ref_mod_and_return(page, -1, ret);
> diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
> index b64f36a45296..36c21f3d8250 100644
> --- a/kernel/liveupdate/kexec_handover.c
> +++ b/kernel/liveupdate/kexec_handover.c
> @@ -390,7 +390,7 @@ static void kho_init_folio(struct page *page, unsigned int order)
>
> /* For higher order folios, tail pages get a page count of zero. */
> for (unsigned long i = 1; i < nr_pages; i++)
> - set_page_count(page + i, 0);
> + set_page_count_as_frozen(page + i);
>
> if (order > 0)
> prep_compound_page(page, order);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 1d41fa3dd43e..b364fda29111 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3186,7 +3186,7 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
> for (pfn = head_pfn + start_page_number; pfn < end_pfn; page++, pfn++) {
> __init_single_page(page, pfn, zone, nid);
> prep_compound_tail(page, &folio->page, order);
> - set_page_count(page, 0);
> + set_page_count_as_frozen(page);
> }
> }
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index cec7bb758bdd..e4ec672a9f51 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1066,7 +1066,7 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
> case MEMORY_DEVICE_PRIVATE:
> case MEMORY_DEVICE_COHERENT:
> case MEMORY_DEVICE_PCI_P2PDMA:
> - set_page_count(page, 0);
> + set_page_count_as_frozen(page);
> break;
>
> case MEMORY_DEVICE_GENERIC:
> @@ -1112,7 +1112,7 @@ static void __ref memmap_init_compound(struct page *head,
>
> __init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
> prep_compound_tail(page, head, order);
> - set_page_count(page, 0);
> + set_page_count_as_frozen(page);
> }
> prep_compound_head(head, order);
> }
> @@ -2250,7 +2250,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
>
> do {
> __ClearPageReserved(p);
> - set_page_count(p, 0);
> + set_page_count_as_frozen(p);
> } while (++p, --i);
>
> init_pageblock_migratetype(page, MIGRATE_CMA, false);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 65e702fade61..27734cf795da 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1639,14 +1639,14 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
> for (loop = 0; loop < nr_pages; loop++, p++) {
> VM_WARN_ON_ONCE(PageReserved(p));
> __ClearPageOffline(p);
> - set_page_count(p, 0);
> + set_page_count_as_frozen(p);
> }
>
> adjust_managed_page_count(page, nr_pages);
> } else {
> for (loop = 0; loop < nr_pages; loop++, p++) {
> __ClearPageReserved(p);
> - set_page_count(p, 0);
> + set_page_count_as_frozen(p);
> }
>
> /* memblock adjusts totalram_pages() manually. */
> --
> 2.43.0
>
Thread overview: 12+ messages
2026-04-20 8:01 [PATCH v2 0/2] mm: improve folio refcount scalability Gorbunov Ivan
2026-04-20 8:01 ` [PATCH v2 1/2] mm: drop page refcount zero state semantics Gorbunov Ivan
2026-04-23 18:07 ` Bjorn Helgaas [this message]
2026-04-23 19:32 ` Zi Yan
2026-04-20 8:01 ` [PATCH v2 2/2] mm: implement page refcount locking via dedicated bit Gorbunov Ivan
2026-04-23 18:24 ` Matthew Wilcox
2026-04-23 18:31 ` Linus Torvalds
2026-04-23 19:20 ` David Hildenbrand (Arm)
2026-04-23 19:37 ` Zi Yan
2026-04-20 10:07 ` [syzbot ci] Re: mm: improve folio refcount scalability syzbot ci
2026-04-20 12:29 ` Gorbunov Ivan
2026-04-20 13:21 ` syzbot ci