Re: [PATCH v10 7/8] mm, folio_zero_user: support clearing page ranges

From: Ankur Arora <ankur.a.arora@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
	david@kernel.org, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, mingo@redhat.com, mjguzik@gmail.com,
	luto@kernel.org, peterz@infradead.org, tglx@linutronix.de,
	willy@infradead.org, raghavendra.kt@amd.com, chleroy@kernel.org,
	ioworker0@gmail.com, boris.ostrovsky@oracle.com,
	konrad.wilk@oracle.com
Subject: Re: [PATCH v10 7/8] mm, folio_zero_user: support clearing page ranges
Date: Wed, 17 Dec 2025 00:48:50 -0800
Message-ID: <87fr994hot.fsf@oracle.com>
In-Reply-To: <20251216071250.e49ecf7490acf7f377dbfdc0@linux-foundation.org>


Andrew Morton <akpm@linux-foundation.org> writes:

> On Mon, 15 Dec 2025 22:49:25 -0800 Ankur Arora <ankur.a.arora@oracle.com> wrote:
>
>> >>  [#] Notice that we perform much better with preempt=full|lazy. As
>> >>   mentioned above, preemptible models not needing explicit invocations
>> >>   of cond_resched() allow clearing of the full extent (1GB) as a
>> >>   single unit.
>> >>   In comparison the maximum extent used for preempt=none|voluntary is
>> >>   PROCESS_PAGES_NON_PREEMPT_BATCH (8MB).
>> >>
>> >>   The larger extent allows the processor to elide cacheline
>> >>   allocation (on Milan the threshold is LLC-size=32MB.)
>> >
>> > It is this?
>>
>> Yeah I think so. For size >= 32MB, the microcode can really just elide
>> cacheline allocation, and with the foreknowledge of the extent can perhaps
>> optimize on cache coherence traffic (this last one is my speculation).
>>
>> On cacheline allocation elision, compare the L1-dcache-load in the two versions
>> below:
>>
>> pg-sz=1GB:
>>   -  9,250,034,512      cycles                           #    2.418 GHz                         ( +-  0.43% )  (46.16%)
>>   -    544,878,976      instructions                     #    0.06  insn per cycle
>>   -  2,331,332,516      L1-dcache-loads                  #  609.471 M/sec                       ( +-  0.03% )  (46.16%)
>>   -  1,075,122,960      L1-dcache-load-misses            #   46.12% of all L1-dcache accesses   ( +-  0.01% )  (46.15%)
>>
>>   +  3,688,681,006      cycles                           #    2.420 GHz                         ( +-  3.48% )  (46.01%)
>>   +     10,979,121      instructions                     #    0.00  insn per cycle
>>   +     31,829,258      L1-dcache-loads                  #   20.881 M/sec                       ( +-  4.92% )  (46.34%)
>>   +     13,677,295      L1-dcache-load-misses            #   42.97% of all L1-dcache accesses   ( +-  6.15% )  (46.32%)
>>
>
> That says L1 d-cache loads went from 600 million/sec down to 20
> million/sec when using 32MB chunks?

Sorry, should have mentioned that that run was with preempt=full/lazy.
For those the chunk size is the whole page (GB page in that case).

The context for 32MB was that that's the LLC-size for these systems.
And, from observed behaviour the cacheline allocation elision
optimization only happens when the chunk size used is larger than that.
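
To make the chunking concrete, here is a rough model of how the extent
cleared between reschedule points depends on the preemption model. This
is only an illustration, not the kernel code: clear_extents() is a
made-up name, and the 8MB value is the PROCESS_PAGES_NON_PREEMPT_BATCH
figure quoted above.

```python
# Illustrative model only; this is not the kernel implementation.
MB = 1 << 20
GB = 1 << 30

# Value quoted in the thread for PROCESS_PAGES_NON_PREEMPT_BATCH.
NON_PREEMPT_BATCH = 8 * MB


def clear_extents(page_size, preempt_model):
    """Return the sizes of the extents cleared between reschedule points.

    Hypothetical helper: preemptible models (full, lazy) clear the whole
    page as one unit, the others are capped at NON_PREEMPT_BATCH.
    """
    if preempt_model in ("full", "lazy"):
        return [page_size]
    batch = min(page_size, NON_PREEMPT_BATCH)
    full, rest = divmod(page_size, batch)
    return [batch] * full + ([rest] if rest else [])


if __name__ == "__main__":
    for model in ("full", "none"):
        extents = clear_extents(GB, model)
        print(model, len(extents), "extent(s) of", extents[0] // MB, "MB")
```

With a 1GB page, preempt=none ends up with 128 extents of 8MB each, none
of which is larger than the 32MB LLC, while preempt=full|lazy gets a
single 1GB extent.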

> Do you know what happens to preemption latency if you increase that
> chunk size from 8MB to 32MB?

So, I gathered some numbers on a Zen4/Genoa system. The ones above are
from Zen3/Milan.

region-sz=64GB, loop-count=3 (total region-size=3*64GB):

                                Bandwidth    L1-dcache-loads

    pg-sz=2MB, batch-sz= 8MB   25.10 GB/s    6,745,859,855  # 2.00 L1-dcache-loads/64B
       # pg-sz=2MB for context

    pg-sz=1GB, batch-sz= 8MB   26.88 GB/s    6,469,900,728  # 2.00 L1-dcache-loads/64B
    pg-sz=1GB, batch-sz=32MB   38.69 GB/s    2,559,249,546  # 0.79 L1-dcache-loads/64B
    pg-sz=1GB, batch-sz=64MB   46.91 GB/s      919,539,544  # 0.28 L1-dcache-loads/64B

    pg-sz=1GB, batch-sz= 1GB   58.68 GB/s       79,458,439  # 0.024 L1-dcache-loads/64B

All of these are for preempt=none, and with boost=0. (With boost=1 the
BW increases by ~25%.)
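
For anyone wanting to rederive the loads/64B column, a small sketch
follows. It assumes GB here means 2^30 bytes and that the counter covers
all three passes over the 64GB region. The variable and function names
are mine and purely illustrative.

```python
# Sketch: L1-dcache-loads per 64-byte cacheline cleared.
# Assumption: "GB" in the table means 2**30 bytes.
GIB = 1 << 30
CACHELINE = 64

total_bytes = 3 * 64 * GIB            # loop-count=3, region-sz=64GB
cachelines = total_bytes // CACHELINE

# L1-dcache-loads for the pg-sz=1GB rows, keyed by batch size.
l1_loads = {
    "8MB": 6_469_900_728,
    "32MB": 2_559_249_546,
    "64MB": 919_539_544,
    "1GB": 79_458_439,
}


def loads_per_line(loads):
    """Average number of L1-dcache-loads per cacheline cleared."""
    return loads / cachelines


if __name__ == "__main__":
    for batch, loads in l1_loads.items():
        print(f"batch-sz={batch:>4}: {loads_per_line(loads):.3f} loads/64B")
```

The ratios drop from about two loads per cacheline at 8MB to well under
one at 32MB and beyond, which is the change in behaviour the table shows.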

So, I wasn't quite right about the LLC-size=32MB being the threshold for
this optimization. There is a change in behaviour at that point but it
does improve beyond that.
(Ideally this threshold would be a processor MSR. That way we could
use this for 2MB pages as well. Oh well.)

> At 42GB/sec, 32MB will take less than a
> millisecond, yes?  I'm not aware of us really having any latency
> targets in these preemption modes, but 1 millisecond sounds pretty
> good.

Agreed. The only complaint threshold I see is 100ms (default value of
sysctl_resched_latency_warn_ms) which is pretty far from ~1ms.
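
A back-of-the-envelope estimate of the time spent in one batch, assuming
the bandwidth is sustained for the whole batch and that GB/s is decimal
(10^9 bytes/s). batch_latency_ms() is an illustrative helper, not
anything in the series.

```python
# Sketch: time to clear one batch at a given sustained bandwidth.
# Assumptions: GB/s is 10**9 bytes/s, MB is 2**20 bytes.
MB = 1 << 20

# Default of sysctl_resched_latency_warn_ms mentioned above.
RESCHED_LATENCY_WARN_MS = 100


def batch_latency_ms(batch_bytes, bandwidth_gb_per_s):
    """Milliseconds needed to clear batch_bytes without rescheduling."""
    return batch_bytes / (bandwidth_gb_per_s * 1e9) * 1e3


if __name__ == "__main__":
    # Bandwidths taken from the preempt=none table above.
    for batch_mb, bw in ((8, 26.88), (32, 38.69), (64, 46.91)):
        ms = batch_latency_ms(batch_mb * MB, bw)
        print(f"batch-sz={batch_mb:>2}MB @ {bw} GB/s: {ms:.2f} ms "
              f"(warn threshold {RESCHED_LATENCY_WARN_MS} ms)")
```

At the measured 38.69 GB/s a 32MB batch comes in below a millisecond, so
it would take a slowdown of roughly two orders of magnitude to approach
the warning threshold.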

And having a threshold of 32MB might benefit other applications since
we won't be discarding their cachelines in favour of filling up the
cache with zeroes.

I think the only problem cases might be slow uarchs and workloads where
the memory bus is saturated, either of which might dilate the preemption
latency.

And, even if the operation takes say ~20ms, that should still leave us
with a reasonably large margin.
(And, any latency-sensitive users are probably not running with
preempt=none/voluntary.)

--
ankur
