* [PATCH v2 1/2] mm: drop page refcount zero state semantics
2026-04-20 8:01 [PATCH v2 0/2] mm: improve folio refcount scalability Gorbunov Ivan
@ 2026-04-20 8:01 ` Gorbunov Ivan
2026-04-23 18:07 ` Bjorn Helgaas
2026-04-23 19:32 ` Zi Yan
2026-04-20 8:01 ` [PATCH v2 2/2] mm: implement page refcount locking via dedicated bit Gorbunov Ivan
2026-04-20 10:07 ` [syzbot ci] Re: mm: improve folio refcount scalability syzbot ci
2 siblings, 2 replies; 12+ messages in thread
From: Gorbunov Ivan @ 2026-04-20 8:01 UTC (permalink / raw)
To: gorbunov.ivan
Cc: david, Liam.Howlett, akpm, apopple, baolin.wang, gladyshev.ilya1,
harry.yoo, kirill, linux-kernel, linux-mm, lorenzo.stoakes,
mhocko, muchun.song, rppt, surenb, torvalds, vbabka, willy,
yuzhao, ziy, artem.kuzin
Right now the 'zero' state can be interpreted in two ways:
1) An unfrozen page which currently has no explicit owner
2) A frozen page

These states can be 'logically' distinguished by operations such as
page_ref_add(), page_ref_inc(), etc. In the first case we want the
counter to increase.

For example, one can write

	page = alloc_frozen_page(...);
	page_ref_inc(page);

but in the second state, increasing the counter of a frozen page should
not be valid at all.

Another reason for this change is our other patch ("mm: implement page
refcount locking via dedicated bit"), in which frozen pages no longer
store 0 in the refcount.

This patch proposes two changes:
1) Drop the invariant that the value stored in the reference count of a
frozen page is 0. (The getters folio_ref_count()/page_ref_count() must
still return 0 for frozen pages.)
2) Allow modification operations such as page_ref_add() to be used only
on pages with owners.

We have audited the places where pages are allocated, and they are
always initialized via functions like set_page_count(page, 1). However,
for clarity, we have added a debug VM_BUG_ON() inside the modification
functions to ensure that they are called only on pages with owners. In
the future, these checks can be improved by replacing the operations
with their result-returning analogs, if needed.
Co-developed-by: Gladyshev Ilya <gorbunov.ivan@h-partners.com>
Signed-off-by: Gladyshev Ilya <gladyshev.ilya1@h-partners.com>
Signed-off-by: Gorbunov Ivan <gorbunov.ivan@h-partners.com>
---
drivers/pci/p2pdma.c | 2 +-
include/linux/page_ref.h | 17 +++++++++++++++++
kernel/liveupdate/kexec_handover.c | 2 +-
mm/hugetlb.c | 2 +-
mm/mm_init.c | 6 +++---
mm/page_alloc.c | 4 ++--
6 files changed, 25 insertions(+), 8 deletions(-)
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index e0f546166eb8..e060ae7e1644 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -158,7 +158,7 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
* because we don't want to trigger the
* p2pdma_folio_free() path.
*/
- set_page_count(page, 0);
+ set_page_count_as_frozen(page);
percpu_ref_put(ref);
return ret;
}
diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 94d3f0e71c06..a7a07b61d2ae 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -62,6 +62,11 @@ static inline void __page_ref_unfreeze(struct page *page, int v)
#endif
+static inline bool __page_count_is_frozen(int count)
+{
+ return count == 0;
+}
+
static inline int page_ref_count(const struct page *page)
{
return atomic_read(&page->_refcount);
@@ -115,8 +120,14 @@ static inline void init_page_count(struct page *page)
set_page_count(page, 1);
}
+static inline void set_page_count_as_frozen(struct page *page)
+{
+ set_page_count(page, 0);
+}
+
static inline void page_ref_add(struct page *page, int nr)
{
+ VM_BUG_ON(__page_count_is_frozen(page_count(page)));
atomic_add(nr, &page->_refcount);
if (page_ref_tracepoint_active(page_ref_mod))
__page_ref_mod(page, nr);
@@ -129,6 +140,7 @@ static inline void folio_ref_add(struct folio *folio, int nr)
static inline void page_ref_sub(struct page *page, int nr)
{
+ VM_BUG_ON(__page_count_is_frozen(page_count(page)));
atomic_sub(nr, &page->_refcount);
if (page_ref_tracepoint_active(page_ref_mod))
__page_ref_mod(page, -nr);
@@ -142,6 +154,7 @@ static inline void folio_ref_sub(struct folio *folio, int nr)
static inline int folio_ref_sub_return(struct folio *folio, int nr)
{
int ret = atomic_sub_return(nr, &folio->_refcount);
+ VM_BUG_ON(__page_count_is_frozen(ret + nr));
if (page_ref_tracepoint_active(page_ref_mod_and_return))
__page_ref_mod_and_return(&folio->page, -nr, ret);
@@ -150,6 +163,7 @@ static inline int folio_ref_sub_return(struct folio *folio, int nr)
static inline void page_ref_inc(struct page *page)
{
+ VM_BUG_ON(__page_count_is_frozen(page_count(page)));
atomic_inc(&page->_refcount);
if (page_ref_tracepoint_active(page_ref_mod))
__page_ref_mod(page, 1);
@@ -162,6 +176,7 @@ static inline void folio_ref_inc(struct folio *folio)
static inline void page_ref_dec(struct page *page)
{
+ VM_BUG_ON(__page_count_is_frozen(page_count(page)));
atomic_dec(&page->_refcount);
if (page_ref_tracepoint_active(page_ref_mod))
__page_ref_mod(page, -1);
@@ -189,6 +204,7 @@ static inline int folio_ref_sub_and_test(struct folio *folio, int nr)
static inline int page_ref_inc_return(struct page *page)
{
int ret = atomic_inc_return(&page->_refcount);
+ VM_BUG_ON(__page_count_is_frozen(ret - 1));
if (page_ref_tracepoint_active(page_ref_mod_and_return))
__page_ref_mod_and_return(page, 1, ret);
@@ -217,6 +233,7 @@ static inline int folio_ref_dec_and_test(struct folio *folio)
static inline int page_ref_dec_return(struct page *page)
{
int ret = atomic_dec_return(&page->_refcount);
+ VM_BUG_ON(__page_count_is_frozen(ret + 1));
if (page_ref_tracepoint_active(page_ref_mod_and_return))
__page_ref_mod_and_return(page, -1, ret);
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index b64f36a45296..36c21f3d8250 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -390,7 +390,7 @@ static void kho_init_folio(struct page *page, unsigned int order)
/* For higher order folios, tail pages get a page count of zero. */
for (unsigned long i = 1; i < nr_pages; i++)
- set_page_count(page + i, 0);
+ set_page_count_as_frozen(page + i);
if (order > 0)
prep_compound_page(page, order);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1d41fa3dd43e..b364fda29111 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3186,7 +3186,7 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
for (pfn = head_pfn + start_page_number; pfn < end_pfn; page++, pfn++) {
__init_single_page(page, pfn, zone, nid);
prep_compound_tail(page, &folio->page, order);
- set_page_count(page, 0);
+ set_page_count_as_frozen(page);
}
}
diff --git a/mm/mm_init.c b/mm/mm_init.c
index cec7bb758bdd..e4ec672a9f51 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1066,7 +1066,7 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
case MEMORY_DEVICE_PRIVATE:
case MEMORY_DEVICE_COHERENT:
case MEMORY_DEVICE_PCI_P2PDMA:
- set_page_count(page, 0);
+ set_page_count_as_frozen(page);
break;
case MEMORY_DEVICE_GENERIC:
@@ -1112,7 +1112,7 @@ static void __ref memmap_init_compound(struct page *head,
__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
prep_compound_tail(page, head, order);
- set_page_count(page, 0);
+ set_page_count_as_frozen(page);
}
prep_compound_head(head, order);
}
@@ -2250,7 +2250,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
do {
__ClearPageReserved(p);
- set_page_count(p, 0);
+ set_page_count_as_frozen(p);
} while (++p, --i);
init_pageblock_migratetype(page, MIGRATE_CMA, false);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 65e702fade61..27734cf795da 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1639,14 +1639,14 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
for (loop = 0; loop < nr_pages; loop++, p++) {
VM_WARN_ON_ONCE(PageReserved(p));
__ClearPageOffline(p);
- set_page_count(p, 0);
+ set_page_count_as_frozen(p);
}
adjust_managed_page_count(page, nr_pages);
} else {
for (loop = 0; loop < nr_pages; loop++, p++) {
__ClearPageReserved(p);
- set_page_count(p, 0);
+ set_page_count_as_frozen(p);
}
/* memblock adjusts totalram_pages() manually. */
--
2.43.0
^ permalink raw reply related [flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] mm: drop page refcount zero state semantics
2026-04-20 8:01 ` [PATCH v2 1/2] mm: drop page refcount zero state semantics Gorbunov Ivan
@ 2026-04-23 18:07 ` Bjorn Helgaas
2026-04-23 19:32 ` Zi Yan
1 sibling, 0 replies; 12+ messages in thread
From: Bjorn Helgaas @ 2026-04-23 18:07 UTC (permalink / raw)
To: Gorbunov Ivan
Cc: david, Liam.Howlett, akpm, apopple, baolin.wang, gladyshev.ilya1,
harry.yoo, kirill, linux-kernel, linux-mm, lorenzo.stoakes,
mhocko, muchun.song, rppt, surenb, torvalds, vbabka, willy,
yuzhao, ziy, artem.kuzin
On Mon, Apr 20, 2026 at 08:01:18AM +0000, Gorbunov Ivan wrote:
> Right now the 'zero' state can be interpreted in two ways:
> 1) An unfrozen page which currently has no explicit owner
> 2) A frozen page
>
> These states can be 'logically' distinguished by operations such as
> page_ref_add(), page_ref_inc(), etc. In the first case we want the
> counter to increase.
>
> For example, one can write
>
> 	page = alloc_frozen_page(...);
> 	page_ref_inc(page);
>
> but in the second state, increasing the counter of a frozen page should
> not be valid at all.
>
> Another reason for this change is our other patch ("mm: implement page
> refcount locking via dedicated bit"), in which frozen pages no longer
> store 0 in the refcount.
>
> This patch proposes two changes:
> 1) Drop the invariant that the value stored in the reference count of a
> frozen page is 0. (The getters folio_ref_count()/page_ref_count() must
> still return 0 for frozen pages.)
> 2) Allow modification operations such as page_ref_add() to be used only
> on pages with owners.
>
> We have audited the places where pages are allocated, and they are
> always initialized via functions like set_page_count(page, 1). However,
> for clarity, we have added a debug VM_BUG_ON() inside the modification
> functions to ensure that they are called only on pages with owners. In
> the future, these checks can be improved by replacing the operations
> with their result-returning analogs, if needed.
>
> Co-developed-by: Gladyshev Ilya <gorbunov.ivan@h-partners.com>
> Signed-off-by: Gladyshev Ilya <gladyshev.ilya1@h-partners.com>
> Signed-off-by: Gorbunov Ivan <gorbunov.ivan@h-partners.com>
No opinion about the rest of the content, but the p2pdma.c change
looks like a no-op, so:
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # p2pdma.c
You might consider rewrapping this commit log to fit in 75 columns or
so, as the log for the second patch does.
> ---
> drivers/pci/p2pdma.c | 2 +-
> include/linux/page_ref.h | 17 +++++++++++++++++
> kernel/liveupdate/kexec_handover.c | 2 +-
> mm/hugetlb.c | 2 +-
> mm/mm_init.c | 6 +++---
> mm/page_alloc.c | 4 ++--
> 6 files changed, 25 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index e0f546166eb8..e060ae7e1644 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -158,7 +158,7 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
> * because we don't want to trigger the
> * p2pdma_folio_free() path.
> */
> - set_page_count(page, 0);
> + set_page_count_as_frozen(page);
> percpu_ref_put(ref);
> return ret;
> }
> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> index 94d3f0e71c06..a7a07b61d2ae 100644
> --- a/include/linux/page_ref.h
> +++ b/include/linux/page_ref.h
> @@ -62,6 +62,11 @@ static inline void __page_ref_unfreeze(struct page *page, int v)
>
> #endif
>
> +static inline bool __page_count_is_frozen(int count)
> +{
> + return count == 0;
> +}
> +
> static inline int page_ref_count(const struct page *page)
> {
> return atomic_read(&page->_refcount);
> @@ -115,8 +120,14 @@ static inline void init_page_count(struct page *page)
> set_page_count(page, 1);
> }
>
> +static inline void set_page_count_as_frozen(struct page *page)
> +{
> + set_page_count(page, 0);
> +}
> +
> static inline void page_ref_add(struct page *page, int nr)
> {
> + VM_BUG_ON(__page_count_is_frozen(page_count(page)));
> atomic_add(nr, &page->_refcount);
> if (page_ref_tracepoint_active(page_ref_mod))
> __page_ref_mod(page, nr);
> @@ -129,6 +140,7 @@ static inline void folio_ref_add(struct folio *folio, int nr)
>
> static inline void page_ref_sub(struct page *page, int nr)
> {
> + VM_BUG_ON(__page_count_is_frozen(page_count(page)));
> atomic_sub(nr, &page->_refcount);
> if (page_ref_tracepoint_active(page_ref_mod))
> __page_ref_mod(page, -nr);
> @@ -142,6 +154,7 @@ static inline void folio_ref_sub(struct folio *folio, int nr)
> static inline int folio_ref_sub_return(struct folio *folio, int nr)
> {
> int ret = atomic_sub_return(nr, &folio->_refcount);
> + VM_BUG_ON(__page_count_is_frozen(ret + nr));
>
> if (page_ref_tracepoint_active(page_ref_mod_and_return))
> __page_ref_mod_and_return(&folio->page, -nr, ret);
> @@ -150,6 +163,7 @@ static inline int folio_ref_sub_return(struct folio *folio, int nr)
>
> static inline void page_ref_inc(struct page *page)
> {
> + VM_BUG_ON(__page_count_is_frozen(page_count(page)));
> atomic_inc(&page->_refcount);
> if (page_ref_tracepoint_active(page_ref_mod))
> __page_ref_mod(page, 1);
> @@ -162,6 +176,7 @@ static inline void folio_ref_inc(struct folio *folio)
>
> static inline void page_ref_dec(struct page *page)
> {
> + VM_BUG_ON(__page_count_is_frozen(page_count(page)));
> atomic_dec(&page->_refcount);
> if (page_ref_tracepoint_active(page_ref_mod))
> __page_ref_mod(page, -1);
> @@ -189,6 +204,7 @@ static inline int folio_ref_sub_and_test(struct folio *folio, int nr)
> static inline int page_ref_inc_return(struct page *page)
> {
> int ret = atomic_inc_return(&page->_refcount);
> + VM_BUG_ON(__page_count_is_frozen(ret - 1));
>
> if (page_ref_tracepoint_active(page_ref_mod_and_return))
> __page_ref_mod_and_return(page, 1, ret);
> @@ -217,6 +233,7 @@ static inline int folio_ref_dec_and_test(struct folio *folio)
> static inline int page_ref_dec_return(struct page *page)
> {
> int ret = atomic_dec_return(&page->_refcount);
> + VM_BUG_ON(__page_count_is_frozen(ret + 1));
>
> if (page_ref_tracepoint_active(page_ref_mod_and_return))
> __page_ref_mod_and_return(page, -1, ret);
> diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
> index b64f36a45296..36c21f3d8250 100644
> --- a/kernel/liveupdate/kexec_handover.c
> +++ b/kernel/liveupdate/kexec_handover.c
> @@ -390,7 +390,7 @@ static void kho_init_folio(struct page *page, unsigned int order)
>
> /* For higher order folios, tail pages get a page count of zero. */
> for (unsigned long i = 1; i < nr_pages; i++)
> - set_page_count(page + i, 0);
> + set_page_count_as_frozen(page + i);
>
> if (order > 0)
> prep_compound_page(page, order);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 1d41fa3dd43e..b364fda29111 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3186,7 +3186,7 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
> for (pfn = head_pfn + start_page_number; pfn < end_pfn; page++, pfn++) {
> __init_single_page(page, pfn, zone, nid);
> prep_compound_tail(page, &folio->page, order);
> - set_page_count(page, 0);
> + set_page_count_as_frozen(page);
> }
> }
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index cec7bb758bdd..e4ec672a9f51 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1066,7 +1066,7 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
> case MEMORY_DEVICE_PRIVATE:
> case MEMORY_DEVICE_COHERENT:
> case MEMORY_DEVICE_PCI_P2PDMA:
> - set_page_count(page, 0);
> + set_page_count_as_frozen(page);
> break;
>
> case MEMORY_DEVICE_GENERIC:
> @@ -1112,7 +1112,7 @@ static void __ref memmap_init_compound(struct page *head,
>
> __init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
> prep_compound_tail(page, head, order);
> - set_page_count(page, 0);
> + set_page_count_as_frozen(page);
> }
> prep_compound_head(head, order);
> }
> @@ -2250,7 +2250,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
>
> do {
> __ClearPageReserved(p);
> - set_page_count(p, 0);
> + set_page_count_as_frozen(p);
> } while (++p, --i);
>
> init_pageblock_migratetype(page, MIGRATE_CMA, false);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 65e702fade61..27734cf795da 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1639,14 +1639,14 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
> for (loop = 0; loop < nr_pages; loop++, p++) {
> VM_WARN_ON_ONCE(PageReserved(p));
> __ClearPageOffline(p);
> - set_page_count(p, 0);
> + set_page_count_as_frozen(p);
> }
>
> adjust_managed_page_count(page, nr_pages);
> } else {
> for (loop = 0; loop < nr_pages; loop++, p++) {
> __ClearPageReserved(p);
> - set_page_count(p, 0);
> + set_page_count_as_frozen(p);
> }
>
> /* memblock adjusts totalram_pages() manually. */
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] mm: drop page refcount zero state semantics
2026-04-20 8:01 ` [PATCH v2 1/2] mm: drop page refcount zero state semantics Gorbunov Ivan
2026-04-23 18:07 ` Bjorn Helgaas
@ 2026-04-23 19:32 ` Zi Yan
1 sibling, 0 replies; 12+ messages in thread
From: Zi Yan @ 2026-04-23 19:32 UTC (permalink / raw)
To: Gorbunov Ivan
Cc: david, Liam.Howlett, akpm, apopple, baolin.wang, gladyshev.ilya1,
harry.yoo, kirill, linux-kernel, linux-mm, lorenzo.stoakes,
mhocko, muchun.song, rppt, surenb, torvalds, vbabka, willy,
yuzhao, artem.kuzin
On 20 Apr 2026, at 4:01, Gorbunov Ivan wrote:
> Right now the 'zero' state can be interpreted in two ways:
> 1) An unfrozen page which currently has no explicit owner
> 2) A frozen page
>
> These states can be 'logically' distinguished by operations such as
> page_ref_add(), page_ref_inc(), etc. In the first case we want the
> counter to increase.
>
> For example, one can write
>
> 	page = alloc_frozen_page(...);
> 	page_ref_inc(page);
>
> but in the second state, increasing the counter of a frozen page should
> not be valid at all.
>
> Another reason for this change is our other patch ("mm: implement page
> refcount locking via dedicated bit"), in which frozen pages no longer
> store 0 in the refcount.
>
> This patch proposes two changes:
> 1) Drop the invariant that the value stored in the reference count of a
> frozen page is 0. (The getters folio_ref_count()/page_ref_count() must
> still return 0 for frozen pages.)
> 2) Allow modification operations such as page_ref_add() to be used only
> on pages with owners.
Should we also ban calling set_page_count() anywhere except from
init_page_count() and set_page_count_as_frozen()? That way no one can
manipulate the page refcount arbitrarily. The same applies to
folio_set_count().

All set_page_count(..., 0) calls are converted to set_page_count_as_frozen().
All set_page_count(..., 1) calls are converted to init_page_count().
We might need an init_folio_count() for the folio_set_count(..., 1) users,
and then folio_set_count() would have no other use.
I see set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1); from
mm/page_frag_cache.c and it could be converted to page_ref_unfreeze(),
since above it there is free_frozen_page(), which makes me think the
page is frozen.
>
> We have audited the places where pages are allocated, and they are
> always initialized via functions like set_page_count(page, 1). However,
> for clarity, we have added a debug VM_BUG_ON() inside the modification
> functions to ensure that they are called only on pages with owners. In
> the future, these checks can be improved by replacing the operations
> with their result-returning analogs, if needed.
>
> Co-developed-by: Gladyshev Ilya <gorbunov.ivan@h-partners.com>
> Signed-off-by: Gladyshev Ilya <gladyshev.ilya1@h-partners.com>
> Signed-off-by: Gorbunov Ivan <gorbunov.ivan@h-partners.com>
> ---
> drivers/pci/p2pdma.c | 2 +-
> include/linux/page_ref.h | 17 +++++++++++++++++
> kernel/liveupdate/kexec_handover.c | 2 +-
> mm/hugetlb.c | 2 +-
> mm/mm_init.c | 6 +++---
> mm/page_alloc.c | 4 ++--
> 6 files changed, 25 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index e0f546166eb8..e060ae7e1644 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -158,7 +158,7 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
> * because we don't want to trigger the
> * p2pdma_folio_free() path.
> */
> - set_page_count(page, 0);
> + set_page_count_as_frozen(page);
This should be page_ref_dec(page), since the comment says “We don’t use put_page()”,
meaning the code is intended to drop a ref and set_page_count(page, 0) is
just a shortcut.
> percpu_ref_put(ref);
> return ret;
> }
> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> index 94d3f0e71c06..a7a07b61d2ae 100644
> --- a/include/linux/page_ref.h
> +++ b/include/linux/page_ref.h
> @@ -62,6 +62,11 @@ static inline void __page_ref_unfreeze(struct page *page, int v)
>
> #endif
>
> +static inline bool __page_count_is_frozen(int count)
> +{
> + return count == 0;
> +}
> +
> static inline int page_ref_count(const struct page *page)
> {
> return atomic_read(&page->_refcount);
> @@ -115,8 +120,14 @@ static inline void init_page_count(struct page *page)
> set_page_count(page, 1);
> }
>
> +static inline void set_page_count_as_frozen(struct page *page)
> +{
> + set_page_count(page, 0);
> +}
> +
> static inline void page_ref_add(struct page *page, int nr)
> {
> + VM_BUG_ON(__page_count_is_frozen(page_count(page)));
> atomic_add(nr, &page->_refcount);
> if (page_ref_tracepoint_active(page_ref_mod))
> __page_ref_mod(page, nr);
> @@ -129,6 +140,7 @@ static inline void folio_ref_add(struct folio *folio, int nr)
>
> static inline void page_ref_sub(struct page *page, int nr)
> {
> + VM_BUG_ON(__page_count_is_frozen(page_count(page)));
> atomic_sub(nr, &page->_refcount);
> if (page_ref_tracepoint_active(page_ref_mod))
> __page_ref_mod(page, -nr);
> @@ -142,6 +154,7 @@ static inline void folio_ref_sub(struct folio *folio, int nr)
> static inline int folio_ref_sub_return(struct folio *folio, int nr)
> {
> int ret = atomic_sub_return(nr, &folio->_refcount);
> + VM_BUG_ON(__page_count_is_frozen(ret + nr));
>
> if (page_ref_tracepoint_active(page_ref_mod_and_return))
> __page_ref_mod_and_return(&folio->page, -nr, ret);
> @@ -150,6 +163,7 @@ static inline int folio_ref_sub_return(struct folio *folio, int nr)
>
> static inline void page_ref_inc(struct page *page)
> {
> + VM_BUG_ON(__page_count_is_frozen(page_count(page)));
> atomic_inc(&page->_refcount);
> if (page_ref_tracepoint_active(page_ref_mod))
> __page_ref_mod(page, 1);
> @@ -162,6 +176,7 @@ static inline void folio_ref_inc(struct folio *folio)
>
> static inline void page_ref_dec(struct page *page)
> {
> + VM_BUG_ON(__page_count_is_frozen(page_count(page)));
> atomic_dec(&page->_refcount);
> if (page_ref_tracepoint_active(page_ref_mod))
> __page_ref_mod(page, -1);
> @@ -189,6 +204,7 @@ static inline int folio_ref_sub_and_test(struct folio *folio, int nr)
> static inline int page_ref_inc_return(struct page *page)
> {
> int ret = atomic_inc_return(&page->_refcount);
> + VM_BUG_ON(__page_count_is_frozen(ret - 1));
>
> if (page_ref_tracepoint_active(page_ref_mod_and_return))
> __page_ref_mod_and_return(page, 1, ret);
> @@ -217,6 +233,7 @@ static inline int folio_ref_dec_and_test(struct folio *folio)
> static inline int page_ref_dec_return(struct page *page)
> {
> int ret = atomic_dec_return(&page->_refcount);
> + VM_BUG_ON(__page_count_is_frozen(ret + 1));
>
> if (page_ref_tracepoint_active(page_ref_mod_and_return))
> __page_ref_mod_and_return(page, -1, ret);
VM_WARN_ON_ONCE() might be better?
<snip>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 65e702fade61..27734cf795da 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1639,14 +1639,14 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
> for (loop = 0; loop < nr_pages; loop++, p++) {
> VM_WARN_ON_ONCE(PageReserved(p));
> __ClearPageOffline(p);
> - set_page_count(p, 0);
> + set_page_count_as_frozen(p);
> }
>
> adjust_managed_page_count(page, nr_pages);
> } else {
> for (loop = 0; loop < nr_pages; loop++, p++) {
> __ClearPageReserved(p);
> - set_page_count(p, 0);
> + set_page_count_as_frozen(p);
> }
>
> /* memblock adjusts totalram_pages() manually. */
> --
Not sure about these two: they freeze p, and p goes into buddy frozen,
without an unfreeze back to refcount == 0. But at the beginning, you said
there are two states:
states:
1) Unfrozen page which right now has no explicit owner
2) Frozen page
1) means free pages in buddy. But the above change brings frozen pages
into buddy, mixing the two states into one. Is that intended?
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2 2/2] mm: implement page refcount locking via dedicated bit
2026-04-20 8:01 [PATCH v2 0/2] mm: improve folio refcount scalability Gorbunov Ivan
2026-04-20 8:01 ` [PATCH v2 1/2] mm: drop page refcount zero state semantics Gorbunov Ivan
@ 2026-04-20 8:01 ` Gorbunov Ivan
2026-04-23 18:24 ` Matthew Wilcox
2026-04-23 19:37 ` Zi Yan
2026-04-20 10:07 ` [syzbot ci] Re: mm: improve folio refcount scalability syzbot ci
2 siblings, 2 replies; 12+ messages in thread
From: Gorbunov Ivan @ 2026-04-20 8:01 UTC (permalink / raw)
To: gorbunov.ivan
Cc: david, Liam.Howlett, akpm, apopple, baolin.wang, gladyshev.ilya1,
harry.yoo, kirill, linux-kernel, linux-mm, lorenzo.stoakes,
mhocko, muchun.song, rppt, surenb, torvalds, vbabka, willy,
yuzhao, ziy, artem.kuzin
From: Gladyshev Ilya <gladyshev.ilya1@h-partners.com>
The current atomic-based page refcount implementation treats a zero
counter as dead and requires a compare-and-swap loop in folio_try_get()
to prevent incrementing a dead refcount. This CAS loop acts as a
serialization point and can become a significant bottleneck during
high-frequency file read operations.

This patch introduces PAGEREF_FROZEN_BIT to distinguish between a
(temporary) zero refcount and a locked (dead/frozen) state. Because
incrementing the counter no longer affects its locked/unlocked state,
it is possible to use an optimistic atomic_add_return() in
page_ref_add_unless_zero() that operates independently of the locked
bit. The locked state is handled after the increment attempt,
eliminating the need for the CAS loop.

If the locked state is detected after atomic_add(), the refcount is
reset with a CAS loop, eliminating the theoretical possibility of
overflow.
Co-developed-by: Gorbunov Ivan <gorbunov.ivan@h-partners.com>
Signed-off-by: Gorbunov Ivan <gorbunov.ivan@h-partners.com>
Signed-off-by: Gladyshev Ilya <gladyshev.ilya1@h-partners.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
---
include/linux/page-flags.h | 13 +++++++++++++
include/linux/page_ref.h | 28 ++++++++++++++++++++++++----
2 files changed, 37 insertions(+), 4 deletions(-)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0e03d816e8b9..b3e3da91a90a 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -196,6 +196,19 @@ enum pageflags {
#define PAGEFLAGS_MASK ((1UL << NR_PAGEFLAGS) - 1)
+/* Most significant bit in page refcount */
+#define PAGEREF_FROZEN_BIT BIT(31)
+
+/* Page reference counter can be in three logical states,
+ * which are described below with their value representation:
+ * state | value
+ * (1) safe with owners | 1...INT_MAX
+ * (2) safe with no owners | 0
+ * (3) frozen | INT_MIN...-1
+ *
+ * State (2) can occur only temporarily inside dec_and_test.
+ */
+
#ifndef __GENERATING_BOUNDS_H
/*
diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index a7a07b61d2ae..32194e953674 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -64,12 +64,17 @@ static inline void __page_ref_unfreeze(struct page *page, int v)
static inline bool __page_count_is_frozen(int count)
{
- return count == 0;
+ return (count & PAGEREF_FROZEN_BIT) != 0;
}
static inline int page_ref_count(const struct page *page)
{
- return atomic_read(&page->_refcount);
+ int val = atomic_read(&page->_refcount);
+
+ if (unlikely(val & PAGEREF_FROZEN_BIT))
+ return 0;
+
+ return val;
}
/**
@@ -191,6 +196,9 @@ static inline int page_ref_sub_and_test(struct page *page, int nr)
{
int ret = atomic_sub_and_test(nr, &page->_refcount);
+ if (ret)
+ ret = !atomic_cmpxchg_relaxed(&page->_refcount, 0, PAGEREF_FROZEN_BIT);
+
if (page_ref_tracepoint_active(page_ref_mod_and_test))
__page_ref_mod_and_test(page, -nr, ret);
return ret;
@@ -220,6 +228,9 @@ static inline int page_ref_dec_and_test(struct page *page)
{
int ret = atomic_dec_and_test(&page->_refcount);
+ if (ret)
+ ret = !atomic_cmpxchg_relaxed(&page->_refcount, 0, PAGEREF_FROZEN_BIT);
+
if (page_ref_tracepoint_active(page_ref_mod_and_test))
__page_ref_mod_and_test(page, -1, ret);
return ret;
@@ -245,9 +256,18 @@ static inline int folio_ref_dec_return(struct folio *folio)
return page_ref_dec_return(&folio->page);
}
+#define _PAGEREF_FROZEN_LIMIT ((1 << 30) | PAGEREF_FROZEN_BIT)
+
static inline bool page_ref_add_unless_zero(struct page *page, int nr)
{
- bool ret = atomic_add_unless(&page->_refcount, nr, 0);
+ bool ret;
+ int val = atomic_add_return(nr, &page->_refcount);
+ /* See PAGEREF_FROZEN_BIT declaration in page-flags.h for details */
+ ret = !(val & PAGEREF_FROZEN_BIT);
+
+ /* Undo atomic_add() if counter is locked and scary big */
+ while (unlikely((unsigned int)val >= _PAGEREF_FROZEN_LIMIT))
+ val = atomic_cmpxchg_relaxed(&page->_refcount, val, PAGEREF_FROZEN_BIT);
if (page_ref_tracepoint_active(page_ref_mod_unless))
__page_ref_mod_unless(page, nr, ret);
@@ -282,7 +302,7 @@ static inline bool folio_ref_try_add(struct folio *folio, int count)
static inline int page_ref_freeze(struct page *page, int count)
{
- int ret = likely(atomic_cmpxchg(&page->_refcount, count, 0) == count);
+ int ret = likely(atomic_cmpxchg(&page->_refcount, count, PAGEREF_FROZEN_BIT) == count);
if (page_ref_tracepoint_active(page_ref_freeze))
__page_ref_freeze(page, count, ret);
--
2.43.0
^ permalink raw reply related [flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] mm: implement page refcount locking via dedicated bit
2026-04-20 8:01 ` [PATCH v2 2/2] mm: implement page refcount locking via dedicated bit Gorbunov Ivan
@ 2026-04-23 18:24 ` Matthew Wilcox
2026-04-23 18:31 ` Linus Torvalds
2026-04-23 19:20 ` David Hildenbrand (Arm)
2026-04-23 19:37 ` Zi Yan
1 sibling, 2 replies; 12+ messages in thread
From: Matthew Wilcox @ 2026-04-23 18:24 UTC (permalink / raw)
To: Gorbunov Ivan
Cc: david, Liam.Howlett, akpm, apopple, baolin.wang, gladyshev.ilya1,
harry.yoo, kirill, linux-kernel, linux-mm, lorenzo.stoakes,
mhocko, muchun.song, rppt, surenb, torvalds, vbabka, yuzhao, ziy,
artem.kuzin
On Mon, Apr 20, 2026 at 08:01:19AM +0000, Gorbunov Ivan wrote:
> The current atomic-based page refcount implementation treats a zero
> counter as dead and requires a compare-and-swap loop in folio_try_get()
> to prevent incrementing a dead refcount. This CAS loop acts as a
> serialization point and can become a significant bottleneck during
> high-frequency file read operations.
If the file read is high-frequency, then for the page refcount to be
the bottleneck, they must be small reads? Have you looked at this
patch?
https://lore.kernel.org/all/20251017141536.577466-1-kirill@shutemov.name/
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/2] mm: implement page refcount locking via dedicated bit
2026-04-23 18:24 ` Matthew Wilcox
@ 2026-04-23 18:31 ` Linus Torvalds
2026-04-23 19:20 ` David Hildenbrand (Arm)
1 sibling, 0 replies; 12+ messages in thread
From: Linus Torvalds @ 2026-04-23 18:31 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Gorbunov Ivan, david, Liam.Howlett, akpm, apopple, baolin.wang,
gladyshev.ilya1, harry.yoo, kirill, linux-kernel, linux-mm,
lorenzo.stoakes, mhocko, muchun.song, rppt, surenb, vbabka,
yuzhao, ziy, artem.kuzin
On Thu, 23 Apr 2026 at 11:25, Matthew Wilcox <willy@infradead.org> wrote:
>
> If the file read is high-frequency, then for the page refcount to be
> the bottleneck, they must be small reads? Have you looked at this
> patch?
>
> https://lore.kernel.org/all/20251017141536.577466-1-kirill@shutemov.name/
I think the impetus for this was partly that; see the original cover letter at
https://lore.kernel.org/all/cover.1766145604.git.gladyshev.ilya1@h-partners.com/
quoting:
>> This patch optimizes small file read performance and overall folio refcount
>> scalability by refactoring page_ref_add_unless [core of folio_try_get].
>> This is alternative approach to previous attempts to fix small read
>> performance by avoiding refcount bumps [1][2].
where that [2] is that link to Kirill's patch.
Linus
^ permalink raw reply [flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] mm: implement page refcount locking via dedicated bit
2026-04-23 18:24 ` Matthew Wilcox
2026-04-23 18:31 ` Linus Torvalds
@ 2026-04-23 19:20 ` David Hildenbrand (Arm)
1 sibling, 0 replies; 12+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-23 19:20 UTC (permalink / raw)
To: Matthew Wilcox, Gorbunov Ivan
Cc: Liam.Howlett, akpm, apopple, baolin.wang, gladyshev.ilya1,
harry.yoo, kirill, linux-kernel, linux-mm, lorenzo.stoakes,
mhocko, muchun.song, rppt, surenb, torvalds, vbabka, yuzhao, ziy,
artem.kuzin
On 4/23/26 20:24, Matthew Wilcox wrote:
> On Mon, Apr 20, 2026 at 08:01:19AM +0000, Gorbunov Ivan wrote:
>> The current atomic-based page refcount implementation treats zero
>> counter as dead and requires a compare-and-swap loop in folio_try_get()
>> to prevent incrementing a dead refcount. This CAS loop acts as a
>> serialization point and can become a significant bottleneck during
>> high-frequency file read operations.
>
> If the file read is high-frequency, then for the page refcount to be
> the bottleneck, they must be small reads? Have you looked at this
> patch?
>
> https://lore.kernel.org/all/20251017141536.577466-1-kirill@shutemov.name/
I hate that with passion. It reminds me of load_unaligned_zeropad(). Which I
hate with passion.
(somewhere on my todo list is investigating how to get rid of that whole
load_unaligned_zeropad machinery)
--
Cheers,
David
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/2] mm: implement page refcount locking via dedicated bit
2026-04-20 8:01 ` [PATCH v2 2/2] mm: implement page refcount locking via dedicated bit Gorbunov Ivan
2026-04-23 18:24 ` Matthew Wilcox
@ 2026-04-23 19:37 ` Zi Yan
1 sibling, 0 replies; 12+ messages in thread
From: Zi Yan @ 2026-04-23 19:37 UTC (permalink / raw)
To: Gorbunov Ivan
Cc: david, Liam.Howlett, akpm, apopple, baolin.wang, gladyshev.ilya1,
harry.yoo, kirill, linux-kernel, linux-mm, lorenzo.stoakes,
mhocko, muchun.song, rppt, surenb, torvalds, vbabka, willy,
yuzhao, artem.kuzin
On 20 Apr 2026, at 4:01, Gorbunov Ivan wrote:
> From: Gladyshev Ilya <gladyshev.ilya1@h-partners.com>
>
> The current atomic-based page refcount implementation treats zero
> counter as dead and requires a compare-and-swap loop in folio_try_get()
> to prevent incrementing a dead refcount. This CAS loop acts as a
> serialization point and can become a significant bottleneck during
> high-frequency file read operations.
>
> This patch introduces PAGEREF_FROZEN_BIT to distinguish between a
> (temporary) zero refcount and a locked (dead/frozen) state. Because now
> incrementing the counter doesn't affect its locked/unlocked state, it is
> possible to use an optimistic atomic_add_return() in
> page_ref_add_unless_zero() that operates independently of the locked bit.
> The locked state is handled after the increment attempt, eliminating the
> need for the CAS loop.
>
> If locked state is detected after atomic_add(), pageref counter will be
> reset with CAS loop, eliminating theoretical possibility of overflow.
>
> Co-developed-by: Gorbunov Ivan <gorbunov.ivan@h-partners.com>
> Signed-off-by: Gorbunov Ivan <gorbunov.ivan@h-partners.com>
> Signed-off-by: Gladyshev Ilya <gladyshev.ilya1@h-partners.com>
> Acked-by: Linus Torvalds <torvalds@linuxfoundation.org>
>
> ---
> include/linux/page-flags.h | 13 +++++++++++++
> include/linux/page_ref.h | 28 ++++++++++++++++++++++++----
> 2 files changed, 37 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 0e03d816e8b9..b3e3da91a90a 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -196,6 +196,19 @@ enum pageflags {
>
> #define PAGEFLAGS_MASK ((1UL << NR_PAGEFLAGS) - 1)
>
> +/* Most significant bit in page refcount */
> +#define PAGEREF_FROZEN_BIT BIT(31)
> +
> +/* Page reference counter can be in 3 logical states,
> + * which are described below with their value representation
> + * state | value
> + * (1) safe with owners | 1...INT_MAX
> + * (2) safe with no owners | 0
> + * (3) frozen | INT_MIN....-1
> + *
> + * State (2) can occur only temporarily, inside dec_and_test.
> + */
> +
> #ifndef __GENERATING_BOUNDS_H
>
> /*
> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> index a7a07b61d2ae..32194e953674 100644
> --- a/include/linux/page_ref.h
> +++ b/include/linux/page_ref.h
> @@ -64,12 +64,17 @@ static inline void __page_ref_unfreeze(struct page *page, int v)
>
> static inline bool __page_count_is_frozen(int count)
> {
> - return count == 0;
> + return count > 0 && !((count & PAGEREF_FROZEN_BIT) != 0);
> }
>
> static inline int page_ref_count(const struct page *page)
> {
> - return atomic_read(&page->_refcount);
> + int val = atomic_read(&page->_refcount);
> +
> + if (unlikely(val & PAGEREF_FROZEN_BIT))
> + return 0;
page_expected_state() (called by free_page_is_bad()) checks page_ref_count() == 0.
This change alone means frozen page can be leaked into buddy. But I think
buddy pages should have ->_refcount to be 0, since the comment above says
(2) safe with no owners | 0
> +
> + return val;
> }
>
> /**
> @@ -191,6 +196,9 @@ static inline int page_ref_sub_and_test(struct page *page, int nr)
> {
> int ret = atomic_sub_and_test(nr, &page->_refcount);
>
> + if (ret)
> + ret = !atomic_cmpxchg_relaxed(&page->_refcount, 0, PAGEREF_FROZEN_BIT);
> +
> if (page_ref_tracepoint_active(page_ref_mod_and_test))
> __page_ref_mod_and_test(page, -nr, ret);
> return ret;
> @@ -220,6 +228,9 @@ static inline int page_ref_dec_and_test(struct page *page)
> {
> int ret = atomic_dec_and_test(&page->_refcount);
>
> + if (ret)
> + ret = !atomic_cmpxchg_relaxed(&page->_refcount, 0, PAGEREF_FROZEN_BIT);
> +
> if (page_ref_tracepoint_active(page_ref_mod_and_test))
> __page_ref_mod_and_test(page, -1, ret);
> return ret;
> @@ -245,9 +256,18 @@ static inline int folio_ref_dec_return(struct folio *folio)
> return page_ref_dec_return(&folio->page);
> }
>
> +#define _PAGEREF_FROZEN_LIMIT ((1 << 30) | PAGEREF_FROZEN_BIT)
> +
> static inline bool page_ref_add_unless_zero(struct page *page, int nr)
> {
> - bool ret = atomic_add_unless(&page->_refcount, nr, 0);
> + bool ret = false;
> + int val = atomic_add_return(nr, &page->_refcount);
> + // See PAGEREF_FROZEN_BIT declaration in page-flags.h for details
> + ret = !(val & PAGEREF_FROZEN_BIT);
> +
> + /* Undo atomic_add() if counter is locked and scary big */
> + while (unlikely((unsigned int)val >= _PAGEREF_FROZEN_LIMIT))
> + val = atomic_cmpxchg_relaxed(&page->_refcount, val, PAGEREF_FROZEN_BIT);
>
> if (page_ref_tracepoint_active(page_ref_mod_unless))
> __page_ref_mod_unless(page, nr, ret);
> @@ -282,7 +302,7 @@ static inline bool folio_ref_try_add(struct folio *folio, int count)
>
> static inline int page_ref_freeze(struct page *page, int count)
> {
> - int ret = likely(atomic_cmpxchg(&page->_refcount, count, 0) == count);
> + int ret = likely(atomic_cmpxchg(&page->_refcount, count, PAGEREF_FROZEN_BIT) == count);
>
> if (page_ref_tracepoint_active(page_ref_freeze))
> __page_ref_freeze(page, count, ret);
> --
> 2.43.0
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 12+ messages in thread
* [syzbot ci] Re: mm: improve folio refcount scalability
2026-04-20 8:01 [PATCH v2 0/2] mm: improve folio refcount scalability Gorbunov Ivan
2026-04-20 8:01 ` [PATCH v2 1/2] mm: drop page refcount zero state semantics Gorbunov Ivan
2026-04-20 8:01 ` [PATCH v2 2/2] mm: implement page refcount locking via dedicated bit Gorbunov Ivan
@ 2026-04-20 10:07 ` syzbot ci
2026-04-20 12:29 ` Gorbunov Ivan
2 siblings, 1 reply; 12+ messages in thread
From: syzbot ci @ 2026-04-20 10:07 UTC (permalink / raw)
To: akpm, apopple, artem.kuzin, baolin.wang, david, gladyshev.ilya1,
gorbunov.ivan, harry.yoo, kirill, liam.howlett, linux-kernel,
linux-mm, lorenzo.stoakes, mhocko, muchun.song, rppt, surenb,
torvalds, vbabka, willy, yuzhao, ziy
Cc: syzbot, syzkaller-bugs
syzbot ci has tested the following series
[v2] mm: improve folio refcount scalability
https://lore.kernel.org/all/cover.1776350895.git.gorbunov.ivan@h-partners.com
* [PATCH v2 1/2] mm: drop page refcount zero state semantics
* [PATCH v2 2/2] mm: implement page refcount locking via dedicated bit
and found the following issue:
kernel BUG in get_page_bootmem
Full report is available here:
https://ci.syzbot.org/series/eb14b73a-c461-4be5-b5af-91864e939f4c
***
kernel BUG in get_page_bootmem
tree: mm-new
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/akpm/mm.git
base: f4279f87cd6c82ebdaccdc56f38e7b80ca7fcc03
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/60ced5f4-8c33-43ea-a4ee-92d9b2b8f949/config
ACPI: HPET id: 0x8086a201 base: 0xfed00000
CPU topo: Max. logical packages: 2
CPU topo: Max. logical nodes: 1
CPU topo: Num. nodes per package: 1
CPU topo: Max. logical dies: 2
CPU topo: Max. dies per package: 1
CPU topo: Max. threads per core: 1
CPU topo: Num. cores per package: 1
CPU topo: Num. threads per package: 1
CPU topo: Allowing 2 present CPUs plus 0 hotplug CPUs
kvm-guest: APIC: eoi() replaced with kvm_guest_apic_eoi_write()
PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff]
PM: hibernation: Registered nosave memory: [mem 0x0009f000-0x000fffff]
PM: hibernation: Registered nosave memory: [mem 0x7ffdf000-0xffffffff]
[gap 0xc0000000-0xfed1bfff] available for PCI devices
Booting paravirtualized kernel on KVM
clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
Zone ranges:
DMA [mem 0x0000000000001000-0x0000000000ffffff]
DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
Normal [mem 0x0000000100000000-0x000000023fffffff]
Device empty
Movable zone start for each node
Early memory node ranges
node 0: [mem 0x0000000000001000-0x000000000009efff]
node 0: [mem 0x0000000000100000-0x000000007ffdefff]
node 0: [mem 0x0000000100000000-0x0000000160000fff]
node 1: [mem 0x0000000160001000-0x000000023fffffff]
Initmem setup node 0 [mem 0x0000000000001000-0x0000000160000fff]
Initmem setup node 1 [mem 0x0000000160001000-0x000000023fffffff]
On node 0, zone DMA: 1 pages in unavailable ranges
On node 0, zone DMA: 97 pages in unavailable ranges
On node 0, zone Normal: 33 pages in unavailable ranges
setup_percpu: NR_CPUS:8 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:2
percpu: Embedded 71 pages/cpu s250120 r8192 d32504 u2097152
kvm-guest: PV spinlocks disabled, no host support
Kernel command line: earlyprintk=serial net.ifnames=0 sysctl.kernel.hung_task_all_cpu_backtrace=1 ima_policy=tcb nf-conntrack-ftp.ports=20000 nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000 nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000 binder.debug_mask=0 rcupdate.rcu_expedited=1 rcupdate.rcu_cpu_stall_cputime=1 no_hash_pointers page_owner=on sysctl.vm.nr_hugepages=4 sysctl.vm.nr_overcommit_hugepages=4 secretmem.enable=1 sysctl.max_rcu_stall_to_panic=1 msr.allow_writes=off coredump_filter=0xffff root=/dev/sda console=ttyS0 vsyscall=native numa=fake=2 kvm-intel.nested=1 spec_store_bypass_disable=prctl nopcid vivid.n_devs=64 vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2 netrom.nr_ndevs=32 rose.rose_ndevs=32 smp.csd_lock_timeout=100000 watchdog_thresh=55 workqueue.watchdog_thresh=140 sysctl.net.core.netdev_unregister_timeout_secs=140 dummy_hcd.num=32 max_loop=32 nbds_max=32 \
Kernel command line: comedi.comedi_num_legacy_minors=4 panic_on_warn=1 root=/dev/sda console=ttyS0 root=/dev/sda1
Unknown kernel command line parameters "nbds_max=32", will be passed to user space.
printk: log buffer data + meta data: 262144 + 917504 = 1179648 bytes
software IO TLB: area num 2.
Fallback order for Node 0: 0 1
Fallback order for Node 1: 1 0
Built 2 zonelists, mobility grouping on. Total pages: 1834877
Policy zone: Normal
mem auto-init: stack:all(zero), heap alloc:on, heap free:off
stackdepot: allocating hash table via alloc_large_system_hash
stackdepot hash table entries: 1048576 (order: 12, 16777216 bytes, linear)
stackdepot: allocating space for 8192 stack pools via memblock
------------[ cut here ]------------
kernel BUG at ./include/linux/page_ref.h:171!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted syzkaller #0 PREEMPT(undef)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:get_page_bootmem+0x188/0x190
Code: 86 ff 90 0f 0b e8 98 52 86 ff 90 0f 0b e8 90 52 86 ff 48 89 df 48 c7 c6 00 e4 dd 8b e8 51 d7 e8 fe 90 0f 0b e8 79 52 86 ff 90 <0f> 0b 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 0000:ffffffff8e407e50 EFLAGS: 00010093
RAX: ffffffff823f42b7 RBX: ffffea00057ffec0 RCX: ffffffff8e494ec0
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001
RBP: 0000000000000001 R08: ffffea00057ffef7 R09: 1ffffd4000afffde
R10: dffffc0000000000 R11: fffff94000afffdf R12: dffffc0000000000
R13: 0000000000000000 R14: ffffea00057ffef4 R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff88818de62000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88823ffff000 CR3: 000000000e54c000 CR4: 00000000000000b0
Call Trace:
<TASK>
register_page_bootmem_info_node+0x88/0x410
register_page_bootmem_info+0x77/0xc0
mem_init+0x5a/0xb0
mm_core_init+0x79/0xb0
start_kernel+0x15a/0x3d0
x86_64_start_reservations+0x24/0x30
x86_64_start_kernel+0x143/0x1c0
common_startup_64+0x13e/0x147
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:get_page_bootmem+0x188/0x190
Code: 86 ff 90 0f 0b e8 98 52 86 ff 90 0f 0b e8 90 52 86 ff 48 89 df 48 c7 c6 00 e4 dd 8b e8 51 d7 e8 fe 90 0f 0b e8 79 52 86 ff 90 <0f> 0b 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 0000:ffffffff8e407e50 EFLAGS: 00010093
RAX: ffffffff823f42b7 RBX: ffffea00057ffec0 RCX: ffffffff8e494ec0
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001
RBP: 0000000000000001 R08: ffffea00057ffef7 R09: 1ffffd4000afffde
R10: dffffc0000000000 R11: fffff94000afffdf R12: dffffc0000000000
R13: 0000000000000000 R14: ffffea00057ffef4 R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff88818de62000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88823ffff000 CR3: 000000000e54c000 CR4: 00000000000000b0
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).
The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [syzbot ci] Re: mm: improve folio refcount scalability
2026-04-20 10:07 ` [syzbot ci] Re: mm: improve folio refcount scalability syzbot ci
@ 2026-04-20 12:29 ` Gorbunov Ivan
2026-04-20 13:21 ` syzbot ci
0 siblings, 1 reply; 12+ messages in thread
From: Gorbunov Ivan @ 2026-04-20 12:29 UTC (permalink / raw)
To: syzbot ci, akpm, apopple, artem.kuzin, baolin.wang, david,
gladyshev.ilya1, harry.yoo, kirill, liam.howlett, linux-kernel,
linux-mm, lorenzo.stoakes, mhocko, muchun.song, rppt, surenb,
torvalds, vbabka, willy, yuzhao, ziy
Cc: syzbot, syzkaller-bugs
Apologies to all. The logic in the debug check was accidentally inverted
during a rebase.
#syz test
diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 32194e953674..ca6e43b0cf95 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -64,7 +64,7 @@ static inline void __page_ref_unfreeze(struct page
*page, int v)
static inline bool __page_count_is_frozen(int count)
{
- return count > 0 && !((count & PAGEREF_FROZEN_BIT) != 0);
+ return count & PAGEREF_FROZEN_BIT;
}
static inline int page_ref_count(const struct page *page)
On 4/20/2026 1:07 PM, syzbot ci wrote:
> syzbot ci has tested the following series
>
> [v2] mm: improve folio refcount scalability
> https://lore.kernel.org/all/cover.1776350895.git.gorbunov.ivan@h-partners.com
> * [PATCH v2 1/2] mm: drop page refcount zero state semantics
> * [PATCH v2 2/2] mm: implement page refcount locking via dedicated bit
>
> and found the following issue:
> kernel BUG in get_page_bootmem
>
> Full report is available here:
> https://ci.syzbot.org/series/eb14b73a-c461-4be5-b5af-91864e939f4c
>
> [...]
^ permalink raw reply related [flat|nested] 12+ messages in thread

* [syzbot ci] Re: mm: improve folio refcount scalability
2026-04-20 12:29 ` Gorbunov Ivan
@ 2026-04-20 13:21 ` syzbot ci
0 siblings, 0 replies; 12+ messages in thread
From: syzbot ci @ 2026-04-20 13:21 UTC (permalink / raw)
To: gorbunov.ivan, akpm, apopple, artem.kuzin, baolin.wang, david,
gladyshev.ilya1, harry.yoo, kirill, liam.howlett, linux-kernel,
linux-mm, lorenzo.stoakes, mhocko, muchun.song, rppt, surenb,
syzbot, syzkaller-bugs, torvalds, vbabka, willy, yuzhao, ziy
Cc: syzbot, syzkaller-bugs
syzbot ci has tested the suggested fix patch on top of the following series:
[v2] mm: improve folio refcount scalability
https://lore.kernel.org/all/cover.1776350895.git.gorbunov.ivan@h-partners.com
Patch: https://ci.syzbot.org/jobs/1f75dd6a-7a6f-4420-ae4b-67a071622e07/patch
The patch testing request could not be completed:
Testing failed due to an infrastructure error.
Testing results:
* [build 0] Build Patched: error
Full report is available here:
https://ci.syzbot.org/session/0e12b11a-0902-43fb-b549-6c0cc5ae45eb
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
^ permalink raw reply [flat|nested] 12+ messages in thread