From: Ankur Arora <ankur.a.arora@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
david@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
hpa@zytor.com, mingo@redhat.com, mjguzik@gmail.com,
luto@kernel.org, peterz@infradead.org, acme@kernel.org,
namhyung@kernel.org, tglx@linutronix.de, willy@infradead.org,
raghavendra.kt@amd.com, boris.ostrovsky@oracle.com,
konrad.wilk@oracle.com, paulmck@kernel.org
Subject: Re: [PATCH v7 13/16] mm: memory: support clearing page ranges
Date: Wed, 17 Sep 2025 21:54:00 -0700
Message-ID: <87frckgy3b.fsf@oracle.com>
In-Reply-To: <20250917144418.25cb9117d64b32cb0c7908d9@linux-foundation.org>

[ Added Paul McKenney. ]

Andrew Morton <akpm@linux-foundation.org> writes:

> On Wed, 17 Sep 2025 08:24:15 -0700 Ankur Arora <ankur.a.arora@oracle.com> wrote:
>
>> Change folio_zero_user() to clear contiguous page ranges instead of
>> clearing using the current page-at-a-time approach. Exposing the largest
>> feasible length can be useful in enabling processors to optimize based
>> on extent.
>
> This patch is something which MM developers might care to take a closer
> look at.
>
>> However, clearing in large chunks can have two problems:
>>
>> - cache locality when clearing small folios (< MAX_ORDER_NR_PAGES)
>> (larger folios don't have any expectation of cache locality).
>>
>> - preemption latency when clearing large folios.
>>
>> Handle the first by splitting the clearing in three parts: the
>> faulting page and its immediate locality, its left and right
>> regions; with the local neighbourhood cleared last.
>
> Has this optimization been shown to be beneficial?

So, this was mostly meant to be defensive. The current code does a
rather extensive left-right dance around the faulting page via
c6ddfb6c58 ("mm, clear_huge_page: move order algorithm into a separate
function") and I wanted to keep the cache-hot property for the region
closest to the address touched by the user.

But no, I haven't run any tests showing that it helps.

> If so, are you able to share some measurements?

From some quick kernel builds (with THP) I do see a consistent
difference of a few seconds (1% worse) if I remove this optimization.
(I'm not sure right now why it is worse -- my expectation was that we
would see more cache misses, but I see pretty similar cache numbers.)

But let me do a more careful test and report back.

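
For reference, the three-part ordering from the changelog quoted above
(left region, right region, then the neighbourhood of the faulting page
last) can be sketched in userspace roughly as below. This is only an
illustration under assumptions: the names sketch_range,
sketch_clear_range() and sketch_zero_split(), the radius parameter and
the 4K page size are all hypothetical and are not taken from the patch,
and memset() stands in for the kernel's page-clearing primitive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, illustration only */

/* Half-open page-index range [start, end). Hypothetical helper type. */
struct sketch_range {
	size_t start;
	size_t end;
};

/* Zero the pages covered by r; an empty range is a no-op. */
static void sketch_clear_range(unsigned char *base, struct sketch_range r)
{
	if (r.end > r.start)
		memset(base + r.start * SKETCH_PAGE_SIZE, 0,
		       (r.end - r.start) * SKETCH_PAGE_SIZE);
}

/*
 * Split [0, npages) around fault_idx into left, right and local ranges,
 * then clear them in that order so the pages nearest the fault are
 * written last. The chosen ranges are returned in out[] for inspection.
 */
static void sketch_zero_split(unsigned char *base, size_t npages,
			      size_t fault_idx, size_t radius,
			      struct sketch_range out[3])
{
	size_t lo, hi;

	assert(fault_idx < npages);

	/* Clamp the local neighbourhood to the bounds of the buffer. */
	lo = fault_idx > radius ? fault_idx - radius : 0;
	hi = npages - fault_idx - 1 > radius ? fault_idx + radius + 1 : npages;

	out[0] = (struct sketch_range){ 0, lo };	/* left */
	out[1] = (struct sketch_range){ hi, npages };	/* right */
	out[2] = (struct sketch_range){ lo, hi };	/* local, cleared last */

	sketch_clear_range(base, out[0]);
	sketch_clear_range(base, out[1]);
	sketch_clear_range(base, out[2]);
}
```

With 16 pages, a fault at page 5 and a radius of 2, this clears pages
[0, 3), then [8, 16), and finally [3, 8).
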
> If not, maybe it should be removed?
>
>> ...
>>
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -7021,40 +7021,80 @@ static inline int process_huge_page(
>> return 0;
>> }
>>
>> -static void clear_gigantic_page(struct folio *folio, unsigned long addr_hint,
>> - unsigned int nr_pages)
>> +/*
>> + * Clear contiguous pages chunking them up when running under
>> + * non-preemptible models.
>> + */
>> +static void clear_contig_highpages(struct page *page, unsigned long addr,
>> + unsigned int npages)
>
> Called "_highpages" because it wraps clear_user_highpages(). It really
> should be called clear_contig_user_highpages() ;) (Not serious)

Or maybe clear_user_contig_highpages(), so when we get rid of HIGHMEM,
the _highpages could just be chopped off :D.

>> {
>> - unsigned long addr = ALIGN_DOWN(addr_hint, folio_size(folio));
>> - int i;
>> + unsigned int i, count, unit;
>>
>> - might_sleep();
>> - for (i = 0; i < nr_pages; i++) {
>> + unit = preempt_model_preemptible() ? npages : PAGE_CONTIG_NR;
>
> Almost nothing uses preempt_model_preemptible() and I'm not usefully
> familiar with it. Will this check avoid all softlockup/rcu/etc
> detections in all situations (ie, configs)?

IMO, yes. The code invoked under preempt_model_preemptible() will boil
down to a single interruptible REP STOSB which might execute over
an extent of 1GB (with the last patch). From prior experiments, I know
that irqs are able to interrupt this. And, I /think/ that is a sufficient
condition for avoiding RCU stalls/softlockups etc.

Also, when we were discussing lazy preemption (which Thomas had
suggested as a way to handle scenarios like this or long running Xen
hypercalls etc) this seemed like a scenario that didn't need any extra
handling for CONFIG_PREEMPT.

We did need 83b28cfe79 ("rcu: handle quiescent states for PREEMPT_RCU=n,
PREEMPT_COUNT=y") for CONFIG_PREEMPT_LAZY but AFAICS this should be safe.

Anyway, let me think about your "all configs" point (though only the
ones which can have some flavour of hugetlb).

Also, I would like the x86 folks' opinion on this. And, maybe Paul
McKenney's, just to make sure I'm not missing something on the RCU side.

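
To make the chunking concrete, here is a rough userspace sketch of the
unit selection in the quoted hunk. Only the "unit = preemptible ? npages
: PAGE_CONTIG_NR" choice comes from the diff above; the loop shape, the
value 8 for SKETCH_CONTIG_NR, and the sketch_cond_resched() counter
(standing in for a resched point between chunks) are assumptions made
for illustration, with memset() in place of the real clearing primitive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, illustration only */
#define SKETCH_CONTIG_NR 8u	/* hypothetical stand-in for PAGE_CONTIG_NR */

/* Hypothetical resched hook; it only counts how often it is reached. */
static unsigned int sketch_resched_points;

static void sketch_cond_resched(void)
{
	sketch_resched_points++;
}

/*
 * Clear npages pages starting at base. Under a preemptible model the
 * whole extent goes to the clearing primitive in one call; otherwise
 * it is cut into SKETCH_CONTIG_NR-page units with a resched point
 * after each unit.
 */
static void sketch_clear_contig(unsigned char *base, unsigned int npages,
				int preemptible)
{
	unsigned int unit = preemptible ? npages : SKETCH_CONTIG_NR;
	unsigned int i, count;

	assert(base != NULL);

	for (i = 0; i < npages; i += count) {
		/* The last unit may be shorter than the others. */
		count = npages - i < unit ? npages - i : unit;
		memset(base + (size_t)i * SKETCH_PAGE_SIZE, 0,
		       (size_t)count * SKETCH_PAGE_SIZE);
		sketch_cond_resched();
	}
}
```

In this sketch, 20 pages in the non-preemptible case are cleared as
8 + 8 + 4 with three resched points, while the preemptible case issues
a single 20-page clear.
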
Thanks for the comments.
--
ankur