From: Aboorva Devarajan <aboorvad@linux.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	gourry@gourry.net, mhocko@suse.com, david@kernel.org
Cc: vbabka@suse.cz, surenb@google.com, jackmanb@google.com,
	hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/page_alloc: make percpu_pagelist_high_fraction reads lock-free
Date: Mon, 08 Dec 2025 23:00:46 +0530
Message-ID: <d35eca2bbdf8675c43d528571bb61c7520e669cb.camel@linux.ibm.com>
In-Reply-To: <20251201094112.07eb1e588b6da2ee70c4641d@linux-foundation.org>

On Mon, 2025-12-01 at 09:41 -0800, Andrew Morton wrote:
> On Mon,  1 Dec 2025 11:30:09 +0530 Aboorva Devarajan <aboorvad@linux.ibm.com> wrote:
> 
> > When page isolation loops indefinitely during memory offline, reading
> > /proc/sys/vm/percpu_pagelist_high_fraction blocks on pcp_batch_high_lock,
> > causing hung task warnings.
> 
> That's pretty bad behavior.
> 
> I wonder if there are other problems which can be caused by this
> lengthy hold time.
> 
> It would be better to address the lengthy hold time rather that having
> to work around it in one impacted site.


Sorry for the delayed response; I spent some time reproducing this issue.


I've encountered this lengthy hold time several times during memory hot-unplug, with
operations hanging indefinitely (20+ hours). It occurs intermittently and has several
different failure signatures. Here's one example where isolation fails continuously on
a single slab page:

..
[83310.373699] page dumped because: isolation failed
[83310.373704] failed to isolate pfn 4dc68
[83310.373708] page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dc68
[83310.373714] flags: 0x23ffffe00000000(node=2|zone=0|lastcpupid=0x1fffff)
[83310.373722] page_type: f5(slab)
[83310.373727] raw: 023ffffe00000000 c0000028e001fa00 5deadbeef0000100 5deadbeef0000122
[83310.373735] raw: 0000000000000000 0000000001e101e1 00000002f5000000 0000000000000000
[83310.373741] page dumped because: isolation failed
[83310.373749] failed to isolate pfn 4dc68
[83310.373753] page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dc68
[83310.373760] flags: 0x23ffffe00000000(node=2|zone=0|lastcpupid=0x1fffff)
[83310.373767] page_type: f5(slab)
[83310.373770] raw: 023ffffe00000000 c0000028e001fa00 5deadbeef0000100 5deadbeef0000122
[83310.373774] raw: 0000000000000000 0000000001e101e1 00000002f5000000 0000000000000000
[83310.373778] page dumped because: isolation failed
[83310.373788] failed to isolate pfn 4dc68
[83310.373791] page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dc68
[83310.373794] flags: 0x23ffffe00000000(node=2|zone=0|lastcpupid=0x1fffff)
[83310.373797] page_type: f5(slab)
[83310.373799] raw: 023ffffe00000000 c0000028e001fa00 5deadbeef0000100 5deadbeef0000122
[83310.373803] raw: 0000000000000000 0000000001e101e1 00000002f5000000 0000000000000000
[83310.373809] page dumped because: isolation failed
[83315.383370] do_migrate_range: 1098409 callbacks suppressed
[83315.383377] failed to isolate pfn 4dc68
[83315.383406] page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dc68
[83315.383411] flags: 0x23ffffe00000000(node=2|zone=0|lastcpupid=0x1fffff)
[83315.383416] page_type: f5(slab)
[83315.383420] raw: 023ffffe00000000 c0000028e001fa00 5deadbeef0000100 5deadbeef0000122
[83315.383423] raw: 0000000000000000 0000000001e101e1 00000002f5000000 0000000000000000
[83315.383426] page dumped because: isolation failed
[83315.383431] failed to isolate pfn 4dc68
[83315.383433] page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dc68
[83315.383442] flags: 0x23ffffe00000000(node=2|zone=0|lastcpupid=0x1fffff)
[83315.383448] page_type: f5(slab)
[83315.383454] raw: 023ffffe00000000 c0000028e001fa00 5deadbeef0000100 5deadbeef0000122
[83315.383462] raw: 0000000000000000 0000000001e101e1 00000002f5000000 0000000000000000
[83315.383470] page dumped because: isolation failed
...


Given the following statement in the documentation, should this behavior be considered
expected?

From Documentation/admin-guide/mm/memory-hotplug.rst:
"Further, memory offlining might retry for a long time (or even forever), until
aborted by the user."


There's also a TODO in the code that confirms this issue:

mm/memory_hotplug.c
/*
 * TODO: fatal migration failures should bail
 * out
 */
do_migrate_range(pfn, end_pfn);


A possible improvement would be to add a retry limit or timeout for pages that repeatedly
fail isolation, returning -EBUSY after N attempts instead of looping indefinitely on
unmovable pages. This would make the behavior more predictable.
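
A minimal userspace sketch of the bounded-retry idea, not kernel code. The names
MAX_ISOLATE_RETRIES, isolate_fn, never_isolates(), always_isolates() and
isolate_with_retry_limit() are all hypothetical, and the retry count is an
arbitrary placeholder; the right limit (or timeout) is an open question.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical cap; a real limit or timeout would need discussion. */
#define MAX_ISOLATE_RETRIES 5

/* Hypothetical stand-in for one isolation attempt on a pfn. */
typedef bool (*isolate_fn)(unsigned long pfn);

/* Stub: models a page that never isolates, like the slab page in the log. */
static bool never_isolates(unsigned long pfn)
{
	(void)pfn;
	return false;
}

/* Stub: models a page that isolates on the first attempt. */
static bool always_isolates(unsigned long pfn)
{
	(void)pfn;
	return true;
}

/*
 * Retry isolation a bounded number of times.
 * Returns 0 on success, -EBUSY once the retry budget is exhausted.
 */
static int isolate_with_retry_limit(unsigned long pfn, isolate_fn try_isolate)
{
	for (int attempt = 0; attempt < MAX_ISOLATE_RETRIES; attempt++) {
		if (try_isolate(pfn))
			return 0;
	}
	return -EBUSY;
}
```

With this shape, a page that keeps failing makes the caller give up with -EBUSY
after a fixed number of attempts, while pages that isolate normally are unaffected.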


-----


In addition to the above, I've also seen test_pages_isolated() return -EBUSY at the final
isolation check, continuously, for the same pageblock:

int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
			enum pb_isolate_mode mode)
{
	...

	/* Check all pages are free or marked as ISOLATED */
	zone = page_zone(page);
	spin_lock_irqsave(&zone->lock, flags);
	pfn = __test_page_isolated_in_pageblock(start_pfn, end_pfn, mode);
	spin_unlock_irqrestore(&zone->lock, flags);

	ret = pfn < end_pfn ? -EBUSY : 0;
	...

out:
	...
	return ret;
}

When __test_page_isolated_in_pageblock() encounters a page that is not PageBuddy,
PageHWPoison, or PageOffline with a refcount of 0, it stops and returns that pfn. Because
that pfn is below end_pfn, test_pages_isolated() returns -EBUSY.
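
The scan described above can be modelled in userspace roughly as follows. This is
a simplified sketch and not the kernel source: struct model_page,
model_test_isolated() and model_result() are hypothetical names, the page state
is reduced to a few illustrative fields, and the model assumes the memory-offline
isolation mode so the mode argument is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-in for struct page; fields are illustrative only. */
struct model_page {
	bool buddy;         /* free page held by the buddy allocator */
	unsigned int order; /* buddy order, meaningful only when buddy is set */
	bool hwpoison;
	bool offline;
	int count;          /* reference count */
};

/*
 * Walk [pfn, end_pfn) and return the first pfn that is not acceptable,
 * or a value >= end_pfn if every page passed.
 */
static unsigned long model_test_isolated(const struct model_page *pages,
					 unsigned long pfn,
					 unsigned long end_pfn)
{
	while (pfn < end_pfn) {
		const struct model_page *page = &pages[pfn];

		if (page->buddy)
			pfn += 1UL << page->order; /* skip the whole free block */
		else if (page->hwpoison)
			pfn++;
		else if (page->offline && page->count == 0)
			pfn++;
		else
			break; /* page still in use: stop here */
	}
	return pfn;
}

/* Mirrors the final check in test_pages_isolated(). */
static int model_result(unsigned long pfn, unsigned long end_pfn)
{
	return pfn < end_pfn ? -EBUSY : 0;
}
```

In this model a single in-use page anywhere in the range is enough to make every
call fail, which matches the repeated -EBUSY seen for the same pageblock.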


I'll work on capturing more traces for this failure scenario and follow up.

> 
> > Make procfs reads lock-free since percpu_pagelist_high_fraction is a simple
> > integer with naturally atomic reads, writers still serialize via the mutex.
> > 
> > This prevents hung task warnings when reading the procfs file during
> > long-running memory offline operations.
> > 
> > ...
> > 
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -6611,11 +6611,14 @@ static int percpu_pagelist_high_fraction_sysctl_handler(const struct ctl_table *
> >  	int old_percpu_pagelist_high_fraction;
> >  	int ret;
> >  
> > +	if (!write)
> > +		return proc_dointvec_minmax(table, write, buffer, length, ppos);
> > +
> >  	mutex_lock(&pcp_batch_high_lock);
> >  	old_percpu_pagelist_high_fraction = percpu_pagelist_high_fraction;
> >  
> >  	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
> > -	if (!write || ret < 0)
> > +	if (ret < 0)
> >  		goto out;
> >  
> >  	/* Sanity checking to avoid pcp imbalance */
> 
> That being said, I'll grab the patch and shall put a cc:stable on it,
> see what people think about this hold-time issue.

Thanks.


Regards,
Aboorva


