From: Mateusz Guzik <mjguzik@gmail.com>
To: Ankur Arora <ankur.a.arora@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
akpm@linux-foundation.org, david@kernel.org, bp@alien8.de,
dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
luto@kernel.org, peterz@infradead.org, tglx@linutronix.de,
willy@infradead.org, raghavendra.kt@amd.com,
boris.ostrovsky@oracle.com, konrad.wilk@oracle.com
Subject: Re: [PATCH v9 4/7] x86/mm: Simplify clear_page_*
Date: Wed, 26 Nov 2025 11:01:43 +0100
Message-ID: <CAGudoHFgDEEBgQK5PrEUAJsb=iFpsT5OJ8+7W8PV0CGNePR4JQ@mail.gmail.com>
In-Reply-To: <20251121202352.494700-5-ankur.a.arora@oracle.com>
On Fri, Nov 21, 2025 at 9:24 PM Ankur Arora <ankur.a.arora@oracle.com> wrote:
> + * Switch between three implementations of page clearing based on CPU
> + * capabilities:
> + *
> + * - __clear_pages_unrolled(): the oldest, slowest and universally
> + * supported method. Zeroes via 8-byte MOV instructions unrolled 8x
> + * to write a 64-byte cacheline in each loop iteration.
> + *
> + * - "REP; STOSQ": really old CPUs had crummy REP implementations.
> + * Vendor CPU setup code sets 'REP_GOOD' on CPUs where REP can be
> + * trusted. The instruction writes 8-byte per REP iteration but
> + * CPUs can internally batch these together and do larger writes.
> + *
> + * - "REP; STOSB": CPUs that enumerate 'ERMS' have an improved STOS
> + * implementation that is less picky about alignment and where
> + * STOSB (1-byte at a time) is actually faster than STOSQ (8-bytes
> + * at a time.)
> + *
I think this is somewhat odd commentary in this context.
The note about "crummy REP implementations" belongs in the description
of __clear_pages_unrolled, as it justifies that routine's existence (I
think the routine would be best whacked btw, but I'm not going to argue
about it in this thread).
The description of STOSQ notes that the CPU can do more than 8 bytes at
a time, while the description of STOSB makes no such clarification.
At the same time, the note about being less picky about alignment has
no significance in the context of page clearing, as pages are, well,
page aligned.
There is a fucky real-world problem with ERMS worth noting: there are
hypervisor setups out there which *hide* the bit by default (no,
really, see Proxmox for example -- you get a bare-bones pre-ERMS
cpuid).
With all this in mind, modulo poor grammar on my end, I would suggest
something like this:
<quote>
There are 3 variants implemented:
- REP; STOSB: used if the CPU supports "Enhanced REP MOVSB/STOSB" (aka
ERMS), which is true for the majority of microarchitectures today
- REP; STOSQ: fallback if the ERMS bit is not present
- __clear_pages_unrolled: code for CPUs which are determined to have
poor REP support; only concerns long-obsolete uarchs.
Warning: some hypervisors are configured to expose a very limited set
of capabilities in the guest, filtering out ERMS even if present. As
such, the STOSQ variant is still in active use on some setups even when
the hardware does not need it.
</quote>
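To make the intended precedence concrete, here is a rough userspace
sketch of the decision described above. All names in it
(clear_variant, pick_clear_variant, the CLEAR_* constants) are made up
for illustration and do not exist in the kernel. The two booleans stand
in for the 'ERMS' and 'REP_GOOD' feature bits. As far as I understand,
the real code picks the variant by patching instructions at boot rather
than by calling a function like this, so treat it as a model of the
logic only.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only; not kernel identifiers. */
enum clear_variant {
	CLEAR_UNROLLED,  /* __clear_pages_unrolled(): 8-byte MOVs, unrolled 8x */
	CLEAR_REP_STOSQ, /* "REP; STOSQ": fallback when ERMS is not visible */
	CLEAR_REP_STOSB, /* "REP; STOSB": used when ERMS is enumerated */
};

/*
 * Model of the selection order: ERMS wins if it is visible, then
 * REP_GOOD, and the unrolled loop is the last resort.
 */
enum clear_variant pick_clear_variant(bool has_erms, bool has_rep_good)
{
	if (has_erms)
		return CLEAR_REP_STOSB;
	if (has_rep_good)
		return CLEAR_REP_STOSQ;
	return CLEAR_UNROLLED;
}
```

The hypervisor case from the warning corresponds to
pick_clear_variant(false, true): the hardware may well have ERMS, but
the guest cannot see the bit and lands on the STOSQ variant.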