linux-mm.kvack.org archive mirror
* [RFC PATCH] mm: Drain PCP during direct reclaim
@ 2025-06-06  6:59 Wupeng Ma
  2025-06-06 11:19 ` Johannes Weiner
  2025-06-11  7:55 ` Raghavendra K T
  0 siblings, 2 replies; 4+ messages in thread
From: Wupeng Ma @ 2025-06-06  6:59 UTC (permalink / raw)
  To: akpm, vbabka
  Cc: surenb, jackmanb, hannes, ziy, wangkefeng.wang, mawupeng1,
	linux-mm, linux-kernel

Memory retained in Per-CPU Pages (PCP) caches can prevent hugepage
allocations from succeeding despite sufficient free system memory. This
occurs because:
1. Hugepage allocations don't actively trigger PCP draining
2. Direct reclaim path fails to trigger drain_all_pages() when:
   a) All zone pages are free/hugetlb (!did_some_progress)
   b) Compaction skips due to costly order watermarks (COMPACT_SKIPPED)

Reproduction:
  - Allocate a page and free it via put_page() so it is released to
    the pcp list
  - Observe hugepage reservation failure

Solution:
  Actively drain the PCP lists during direct reclaim for memory
  allocations. This increases the allocation success rate by making
  stranded pages available to allocations of any order.

Verification:
  This issue can be reproduced easily in a movable zone with the
  following steps:

w/o this patch
  # numactl -m 2 dd if=/dev/urandom of=/dev/shm/testfile bs=4k count=64
  # rm -f /dev/shm/testfile
  # sync
  # echo 3 > /proc/sys/vm/drop_caches
  # echo 2048 > /sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages
  # cat /sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages
    2029

w/ this patch
  # numactl -m 2 dd if=/dev/urandom of=/dev/shm/testfile bs=4k count=64
  # rm -f /dev/shm/testfile
  # sync
  # echo 3 > /proc/sys/vm/drop_caches
  # echo 2048 > /sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages
  # cat /sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages
    2047

Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
---
 mm/page_alloc.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2ef3c07266b3..464f2e48651e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4137,28 +4137,22 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 {
 	struct page *page = NULL;
 	unsigned long pflags;
-	bool drained = false;
 
 	psi_memstall_enter(&pflags);
 	*did_some_progress = __perform_reclaim(gfp_mask, order, ac);
-	if (unlikely(!(*did_some_progress)))
-		goto out;
-
-retry:
-	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
+	if (likely(*did_some_progress))
+		page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
 
 	/*
 	 * If an allocation failed after direct reclaim, it could be because
 	 * pages are pinned on the per-cpu lists or in high alloc reserves.
 	 * Shrink them and try again
 	 */
-	if (!page && !drained) {
+	if (!page) {
 		unreserve_highatomic_pageblock(ac, false);
 		drain_all_pages(NULL);
-		drained = true;
-		goto retry;
+		page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
 	}
-out:
 	psi_memstall_leave(&pflags);
 
 	return page;
-- 
2.43.0




* Re: [RFC PATCH] mm: Drain PCP during direct reclaim
  2025-06-06  6:59 [RFC PATCH] mm: Drain PCP during direct reclaim Wupeng Ma
@ 2025-06-06 11:19 ` Johannes Weiner
  2025-06-10  9:18   ` mawupeng
  2025-06-11  7:55 ` Raghavendra K T
  1 sibling, 1 reply; 4+ messages in thread
From: Johannes Weiner @ 2025-06-06 11:19 UTC (permalink / raw)
  To: Wupeng Ma
  Cc: akpm, vbabka, surenb, jackmanb, ziy, wangkefeng.wang, linux-mm,
	linux-kernel

On Fri, Jun 06, 2025 at 02:59:30PM +0800, Wupeng Ma wrote:
> Memory retained in Per-CPU Pages (PCP) caches can prevent hugepage
> allocations from succeeding despite sufficient free system memory. This
> occurs because:
> 1. Hugepage allocations don't actively trigger PCP draining
> 2. Direct reclaim path fails to trigger drain_all_pages() when:
>    a) All zone pages are free/hugetlb (!did_some_progress)
>    b) Compaction skips due to costly order watermarks (COMPACT_SKIPPED)

This doesn't sound quite right. Direct reclaim skips when compaction
is suitable. Compaction says COMPACT_SKIPPED when it *isn't* suitable.

So if direct reclaim didn't drain, presumably compaction ran but
returned COMPLETE or PARTIAL_SKIPPED because the freelist checks in
__compact_finished() never succeed due to the pcp?

> @@ -4137,28 +4137,22 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
>  {
>  	struct page *page = NULL;
>  	unsigned long pflags;
> -	bool drained = false;
>  
>  	psi_memstall_enter(&pflags);
>  	*did_some_progress = __perform_reclaim(gfp_mask, order, ac);
> -	if (unlikely(!(*did_some_progress)))
> -		goto out;
> -
> -retry:
> -	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
> +	if (likely(*did_some_progress))
> +		page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
>  
>  	/*
>  	 * If an allocation failed after direct reclaim, it could be because
>  	 * pages are pinned on the per-cpu lists or in high alloc reserves.
>  	 * Shrink them and try again
>  	 */
> -	if (!page && !drained) {
> +	if (!page) {
>  		unreserve_highatomic_pageblock(ac, false);
>  		drain_all_pages(NULL);
> -		drained = true;
> -		goto retry;
> +		page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);

This seems like the wrong place to fix the issue.

Kcompactd has a drain_all_pages() call. Move that to compact_zone(),
so that it also applies to the try_to_compact_pages() path?



* Re: [RFC PATCH] mm: Drain PCP during direct reclaim
  2025-06-06 11:19 ` Johannes Weiner
@ 2025-06-10  9:18   ` mawupeng
  0 siblings, 0 replies; 4+ messages in thread
From: mawupeng @ 2025-06-10  9:18 UTC (permalink / raw)
  To: hannes
  Cc: mawupeng1, akpm, vbabka, surenb, jackmanb, ziy, wangkefeng.wang,
	linux-mm, linux-kernel



On 2025/6/6 19:19, Johannes Weiner wrote:
> On Fri, Jun 06, 2025 at 02:59:30PM +0800, Wupeng Ma wrote:
>> Memory retained in Per-CPU Pages (PCP) caches can prevent hugepage
>> allocations from succeeding despite sufficient free system memory. This
>> occurs because:
>> 1. Hugepage allocations don't actively trigger PCP draining
>> 2. Direct reclaim path fails to trigger drain_all_pages() when:
>>    a) All zone pages are free/hugetlb (!did_some_progress)
>>    b) Compaction skips due to costly order watermarks (COMPACT_SKIPPED)
> 
> This doesn't sound quite right. Direct reclaim skips when compaction
> is suitable. Compaction says COMPACT_SKIPPED when it *isn't* suitable.
> 
> So if direct reclaim didn't drain, presumably compaction ran but
> returned COMPLETE or PARTIAL_SKIPPED because the freelist checks in
> __compact_finished() never succeed due to the pcp?

Yes, compaction does run; however, since all pages in this movable node
are free or in the pcp, there is no way for compaction to reclaim a page.

> 
>> @@ -4137,28 +4137,22 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
>>  {
>>  	struct page *page = NULL;
>>  	unsigned long pflags;
>> -	bool drained = false;
>>  
>>  	psi_memstall_enter(&pflags);
>>  	*did_some_progress = __perform_reclaim(gfp_mask, order, ac);
>> -	if (unlikely(!(*did_some_progress)))
>> -		goto out;
>> -
>> -retry:
>> -	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
>> +	if (likely(*did_some_progress))
>> +		page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
>>  
>>  	/*
>>  	 * If an allocation failed after direct reclaim, it could be because
>>  	 * pages are pinned on the per-cpu lists or in high alloc reserves.
>>  	 * Shrink them and try again
>>  	 */
>> -	if (!page && !drained) {
>> +	if (!page) {
>>  		unreserve_highatomic_pageblock(ac, false);
>>  		drain_all_pages(NULL);
>> -		drained = true;
>> -		goto retry;
>> +		page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
> 
> This seems like the wrong place to fix the issue.
> 
> Kcompactd has a drain_all_pages() call. Move that to compact_zone(),
> so that it also applies to the try_to_compact_pages() path?

Since there are no pages isolated during isolate_migratepages(), isn't
it strange to drain the pcp there?

* Re: [RFC PATCH] mm: Drain PCP during direct reclaim
  2025-06-06  6:59 [RFC PATCH] mm: Drain PCP during direct reclaim Wupeng Ma
  2025-06-06 11:19 ` Johannes Weiner
@ 2025-06-11  7:55 ` Raghavendra K T
  1 sibling, 0 replies; 4+ messages in thread
From: Raghavendra K T @ 2025-06-11  7:55 UTC (permalink / raw)
  To: Wupeng Ma, akpm, vbabka
  Cc: surenb, jackmanb, hannes, ziy, wangkefeng.wang, linux-mm,
	linux-kernel, Nikhil Dhama, Huang, Ying, Rao, Bharata Bhasker

++
On 6/6/2025 12:29 PM, Wupeng Ma wrote:
> Memory retained in Per-CPU Pages (PCP) caches can prevent hugepage
> allocations from succeeding despite sufficient free system memory. This
> occurs because:
> 1. Hugepage allocations don't actively trigger PCP draining
> 2. Direct reclaim path fails to trigger drain_all_pages() when:
>     a) All zone pages are free/hugetlb (!did_some_progress)
>     b) Compaction skips due to costly order watermarks (COMPACT_SKIPPED)
> 
> Reproduction:
>    - Allocate a page and free it via put_page() so it is released to
>      the pcp list
>    - Observe hugepage reservation failure
> 
> Solution:
>    Actively drain the PCP lists during direct reclaim for memory
>    allocations. This increases the allocation success rate by making
>    stranded pages available to allocations of any order.
> 
> Verification:
>    This issue can be reproduced easily in a movable zone with the
>    following steps:
> 
> w/o this patch
>    # numactl -m 2 dd if=/dev/urandom of=/dev/shm/testfile bs=4k count=64
>    # rm -f /dev/shm/testfile
>    # sync
>    # echo 3 > /proc/sys/vm/drop_caches
>    # echo 2048 > /sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages
>    # cat /sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages
>      2029
> 
> w/ this patch
>    # numactl -m 2 dd if=/dev/urandom of=/dev/shm/testfile bs=4k count=64
>    # rm -f /dev/shm/testfile
>    # sync
>    # echo 3 > /proc/sys/vm/drop_caches
>    # echo 2048 > /sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages
>    # cat /sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages
>      2047
> 

Hello Wupeng Ma,

Can you also post iperf/netperf results for this patch in the future?

Thanks and Regards
- Raghu



