Re: [PATCH v3 4/4] x86/folio_zero_user: multi-page clearing

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Ankur Arora <ankur.a.arora@oracle.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
	mingo@redhat.com, luto@kernel.org, peterz@infradead.org,
	paulmck@kernel.org, rostedt@goodmis.org, tglx@linutronix.de,
	willy@infradead.org, jon.grimm@amd.com, bharata@amd.com,
	raghavendra.kt@amd.com, boris.ostrovsky@oracle.com,
	konrad.wilk@oracle.com
Subject: Re: [PATCH v3 4/4] x86/folio_zero_user: multi-page clearing
Date: Mon, 14 Apr 2025 23:36:48 -0700	[thread overview]
Message-ID: <87jz7mx75r.fsf@oracle.com> (raw)
In-Reply-To: <Z_yzshvBmYiPrxU0@gmail.com>


Ingo Molnar <mingo@kernel.org> writes:

> * Ankur Arora <ankur.a.arora@oracle.com> wrote:
>
>> clear_pages_rep(), clear_pages_erms() use string instructions to zero
>> memory. When operating on more than a single page, we can use these
>> more effectively by explicitly advertising the region-size to the
>> processor, which can use that as a hint to optimize the clearing
>> (ex. by eliding cacheline allocation.)
>>
>> As a secondary benefit, string instructions are typically microcoded,
>> and working with larger regions helps amortize the cost of the decode.
>
> Not just the decoding, but also iterations around page-sized chunks are
> not cheap these days: there's various compiler generated mitigations
> and other overhead that applies on a typical kernel, and using larger
> sizes amortizes that per-page-iteration setup cost.

Thanks. Yeah, I was completely forgetting that even the cost of returns
has gone up in the mitigation era :D.

Is retbleed the one you were alluding to or there might be others that
would apply here as well?

>> When zeroing the 2MB page, maximize spatial locality by clearing in
>> three sections: the faulting page and its immediate neighbourhood, the
>> left and the right regions, with the local neighbourhood cleared last.
>
> s/zeroing the 2MB page
>  /zeroing a 2MB page
>
>
>> It's not entirely clear why the performance for pg-sz=2MB improves.
>> We decode fewer instructions and the hardware prefetcher can do a
>> better job, but the perf stats for both of those aren't convincing
>> enough to the extent of ~30%.
>
> s/why the performance
>  /why performance
>
>> For both page-sizes, Icelakex, behaves similarly to Milan pg-sz=2MB: we
>> see a drop in cycles but there's no drop in cacheline allocation.
>
> s/Icelakex, behaves similarly
>  /Icelakex behaves similarly

Ack to all of the above.

>> Performance for preempt=none|voluntary remains unchanged.
>
> CONFIG_PREEMPT_VOLUNTARY=y is the default on a number of major
> distributions, such as Ubuntu, and a lot of enterprise distro kernels -
> and this patch does nothing for them, for no good reason.
> So could you please provide a sensible size granularity cutoff of 16MB
> or so on non-preemptible kernels, instead of this weird build-time
> all-or-nothing binary cutoff based on preemption modes?

So, the reason for associating this with preemption modes was in part not
the difficulty of deciding a sensible granularity cutoff.

I had done a variety of chunking for an earlier version which was a bit
of a mess:
https://lore.kernel.org/lkml/20220606203725.1313715-11-ankur.a.arora@oracle.com/.

Fixed size chunking should be straight-forward enough. However, 16MB is
around 1.6ms if you zero at 10GBps. And, longer if you are on older
hardware.

> On preempt=full/lazy the granularity limit would be infinite.
>
> I.e the only code dependent on the preemption mode should be the size
> cutoff/limit.
> On full/lazy preemption the code would, ideally, compile to something
> close to your current code.

Yeah, agree.

>> +obj-$(CONFIG_PREEMPTION)	+= memory.o
>
>> +#ifndef CONFIG_HIGHMEM
>> +/*
>> + * folio_zero_user_preemptible(): multi-page clearing variant of folio_zero_user().
>
> We don't care much about HIGHMEM these days I suppose, but this
> dependency still feels wrong. Is this a stealth dependency on x86-64,
> trying to avoid a new arch Kconfig for this new API, right? ;-)

Alas nothing so crafty :). HIGHMEM means that we need to map pages in a
hugepage folio via kmap_local_page() -- so cannot treat a hugepage folio
as continguous memory and thus cannot use REP; STOS on it.

I guess the CONFIG_HIGHMEM condition clearly warrants a comment.

--
ankur

next prev parent reply	other threads:[~2025-04-15  6:37 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-14  3:46 [PATCH v3 0/4] mm/folio_zero_user: add multi-page clearing Ankur Arora
2025-04-14  3:46 ` [PATCH v3 1/4] x86/clear_page: extend clear_page*() for " Ankur Arora
2025-04-14  6:32   ` Ingo Molnar
2025-04-14 11:02     ` Peter Zijlstra
2025-04-14 11:14       ` Ingo Molnar
2025-04-14 19:46       ` Ankur Arora
2025-04-14 22:26       ` Mateusz Guzik
2025-04-15  6:14         ` Ankur Arora
2025-04-15  8:22           ` Mateusz Guzik
2025-04-15 20:01             ` Ankur Arora
2025-04-15 20:32               ` Mateusz Guzik
2025-04-14 19:52     ` Ankur Arora
2025-04-14 20:09       ` Matthew Wilcox
2025-04-15 21:59         ` Ankur Arora
2025-04-14  3:46 ` [PATCH v3 2/4] x86/clear_page: add clear_pages() Ankur Arora
2025-04-14  3:46 ` [PATCH v3 3/4] huge_page: allow arch override for folio_zero_user() Ankur Arora
2025-04-14  3:46 ` [PATCH v3 4/4] x86/folio_zero_user: multi-page clearing Ankur Arora
2025-04-14  6:53   ` Ingo Molnar
2025-04-14 21:21     ` Ankur Arora
2025-04-14  7:05   ` Ingo Molnar
2025-04-15  6:36     ` Ankur Arora [this message]
2025-04-22  6:36     ` Raghavendra K T
2025-04-22 19:14       ` Ankur Arora
2025-04-15 10:16   ` Mateusz Guzik
2025-04-15 21:46     ` Ankur Arora
2025-04-15 22:01       ` Mateusz Guzik
2025-04-16  4:46         ` Ankur Arora
2025-04-17 14:06           ` Mateusz Guzik
2025-04-14  5:34 ` [PATCH v3 0/4] mm/folio_zero_user: add " Ingo Molnar
2025-04-14 19:30   ` Ankur Arora
2025-04-14  6:36 ` Ingo Molnar
2025-04-14 19:19   ` Ankur Arora
2025-04-15 19:10 ` Zi Yan
2025-04-22 19:32   ` Ankur Arora
2025-04-22  6:23 ` Raghavendra K T
2025-04-22 19:22   ` Ankur Arora
2025-04-23  8:12     ` Raghavendra K T
2025-04-23  9:18       ` Raghavendra K T

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87jz7mx75r.fsf@oracle.com \
    --to=ankur.a.arora@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=bharata@amd.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=jon.grimm@amd.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=raghavendra.kt@amd.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.