Re: [PATCH v8 6/7] mm, folio_zero_user: support clearing page ranges

From: Ankur Arora <ankur.a.arora@oracle.com>
To: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
	akpm@linux-foundation.org, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
	mjguzik@gmail.com, luto@kernel.org, peterz@infradead.org,
	acme@kernel.org, namhyung@kernel.org, tglx@linutronix.de,
	willy@infradead.org, raghavendra.kt@amd.com,
	boris.ostrovsky@oracle.com, konrad.wilk@oracle.com
Subject: Re: [PATCH v8 6/7] mm, folio_zero_user: support clearing page ranges
Date: Mon, 10 Nov 2025 22:24:25 -0800
Message-ID: <875xbhaxye.fsf@oracle.com>
In-Reply-To: <93b2f5eb-362c-49b7-9d90-01d250c9b6ff@kernel.org>


David Hildenbrand (Red Hat) <david@kernel.org> writes:

> On 10.11.25 08:20, Ankur Arora wrote:
>> David Hildenbrand (Red Hat) <david@kernel.org> writes:
>>
>>> On 27.10.25 21:21, Ankur Arora wrote:
>>>> Clear contiguous page ranges in folio_zero_user() instead of clearing
>>>> a page-at-a-time. This enables CPU specific optimizations based on
>>>> the length of the region.
>>>> Operating on arbitrarily large regions can lead to high preemption
>>>> latency under cooperative preemption models. So, limit the worst
>>>> case preemption latency via architecture specified PAGE_CONTIG_NR
>>>> units.
>>>> The resultant performance depends on the kinds of optimizations
>>>> available to the CPU for the region being cleared. Two classes of
>>>> optimizations:
>>>>     - clearing iteration costs can be amortized over a range larger
>>>>       than a single page.
>>>>     - cacheline allocation elision (seen on AMD Zen models).
>>>> Testing a demand fault workload shows an improved baseline from the
>>>> first optimization and a larger improvement when the region being
>>>> cleared is large enough for the second optimization.
>>>> AMD Milan (EPYC 7J13, boost=0, region=64GB on the local NUMA node):
>>>>    $ perf bench mem map -p $pg-sz -f demand -s 64GB -l 5
>>>>                       page-at-a-time     contiguous clearing      change
>>>>                     (GB/s  +- %stdev)     (GB/s  +- %stdev)
>>>>      pg-sz=2MB       12.92  +- 2.55%        17.03  +-  0.70%       + 31.8%  preempt=*
>>>>      pg-sz=1GB       17.14  +- 2.27%        18.04  +-  1.05% [#]   +  5.2%  preempt=none|voluntary
>>>>      pg-sz=1GB       17.26  +- 1.24%        42.17  +-  4.21%       +144.3%  preempt=full|lazy
>>>> [#] AMD Milan uses a threshold of LLC-size (~32MB) for eliding cacheline
>>>> allocation, which is larger than ARCH_PAGE_CONTIG_NR, so
>>>> preempt=none|voluntary sees no improvement for pg-sz=1GB.
>>>> Also, as mentioned earlier, the baseline improvement is not specific to
>>>> AMD Zen platforms. Intel Icelakex (pg-sz=2MB|1GB) sees an improvement
>>>> similar to that of the Milan pg-sz=2MB workload above (~30%).
>>>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>>>> Reviewed-by: Raghavendra K T <raghavendra.kt@amd.com>
>>>> Tested-by: Raghavendra K T <raghavendra.kt@amd.com>
>>>> ---
>>>>    include/linux/mm.h |  6 ++++++
>>>>    mm/memory.c        | 42 +++++++++++++++++++++---------------------
>>>>    2 files changed, 27 insertions(+), 21 deletions(-)
>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>> index ecbcb76df9de..02db84667f97 100644
>>>> --- a/include/linux/mm.h
>>>> +++ b/include/linux/mm.h
>>>> @@ -3872,6 +3872,12 @@ static inline void clear_page_guard(struct zone *zone, struct page *page,
>>>>    				unsigned int order) {}
>>>>    #endif	/* CONFIG_DEBUG_PAGEALLOC */
>>>>    +#ifndef ARCH_PAGE_CONTIG_NR
>>>> +#define PAGE_CONTIG_NR	1
>>>> +#else
>>>> +#define PAGE_CONTIG_NR	ARCH_PAGE_CONTIG_NR
>>>> +#endif
>>>
>>> The name is a bit misleading. We need something that tells us that this is for
>>> batch-processing (clearing? maybe later copying?) contig pages. Likely spelling
>>> out that this is for the non-preemptible case only.
>>>
>>> I assume we can drop the "CONTIG", just like clear_pages() doesn't contain it
>>> etc.
>>>
>>> CLEAR_PAGES_NON_PREEMPT_BATCH
>>>
>>> PROCESS_PAGES_NON_PREEMPT_BATCH
>> I think this version is clearer. And would be viable for copying as well.
>>
>>> Can you remind me again why this is arch specific, and why the default is 1
>>> instead of, say 2,4,8 ... ?
>> So, the only use for this value is to decide a reasonable frequency
>> for calling cond_resched() when operating on hugepages.
>> And the idea was the arch was best placed to have a reasonably safe
>> value based on the expected spread of bandwidths it might see across
>> uarchs. And the default choice of 1 was to keep it close to what we
>> have now.
>> Thinking about it now though, maybe it is better to instead do this
>> in common code. We could have two sets of defines,
>> PROCESS_PAGES_NON_PREEMPT_BATCH_{LARGE,SMALL}, the first for archs
>> that define __HAVE_ARCH_CLEAR_PAGES and the second, without.
>
> Right, avoiding this dependency on arch code would be nice.
>
> Also, it feels like something we can later optimize for archs without
> __HAVE_ARCH_CLEAR_PAGES in common code.

That makes sense. Will keep the default where it is (1) and just get
rid of the arch dependency.
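
For illustration only, the batching idea discussed above can be sketched
in userspace C. This is not the code from the patch: clear_pages_batched()
and SKETCH_PAGE_SIZE are hypothetical names, memset() stands in for the
kernel's page clearing, and sched_yield() stands in for cond_resched().
The batch argument plays the role of PROCESS_PAGES_NON_PREEMPT_BATCH, so
a value of 1 corresponds to the current page-at-a-time behaviour.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Zero npages pages starting at base, at most batch pages at a time,
 * with a voluntary yield after each batch. Returns the number of
 * yield points reached, i.e. ceil(npages / batch).
 */
static unsigned long clear_pages_batched(void *base, unsigned long npages,
					 unsigned long batch)
{
	unsigned char *p = base;
	unsigned long done = 0, yields = 0;

	assert(batch > 0);

	while (done < npages) {
		unsigned long n = npages - done;

		if (n > batch)
			n = batch;

		/* One contiguous clear covering n pages. */
		memset(p + done * SKETCH_PAGE_SIZE, 0, n * SKETCH_PAGE_SIZE);
		done += n;

		/* Stand-in for cond_resched() between batches. */
		sched_yield();
		yields++;
	}

	return yields;
}
```

A larger batch means fewer yield points and longer contiguous clears, at
the cost of a longer worst-case stretch between opportunities to
reschedule, which is the trade-off the define is meant to bound.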

--
ankur
