Re: [PATCH v10 8/8] mm: folio_zero_user: cache neighbouring pages

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "David Hildenbrand (Red Hat)" <david@kernel.org>
To: Ankur Arora <ankur.a.arora@oracle.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org
Cc: akpm@linux-foundation.org, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
	mjguzik@gmail.com, luto@kernel.org, peterz@infradead.org,
	tglx@linutronix.de, willy@infradead.org, raghavendra.kt@amd.com,
	chleroy@kernel.org, ioworker0@gmail.com,
	boris.ostrovsky@oracle.com, konrad.wilk@oracle.com
Subject: Re: [PATCH v10 8/8] mm: folio_zero_user: cache neighbouring pages
Date: Thu, 18 Dec 2025 08:49:41 +0100	[thread overview]
Message-ID: <b34ce151-45d5-4074-a869-de93b2bcdef0@kernel.org> (raw)
In-Reply-To: <20251215204922.475324-9-ankur.a.arora@oracle.com>

On 12/15/25 21:49, Ankur Arora wrote:
> folio_zero_user() does straight zeroing without caring about
> temporal locality for caches.
> 
> This replaced commit c6ddfb6c5890 ("mm, clear_huge_page: move order
> algorithm into a separate function") where we cleared a page at a
> time converging to the faulting page from the left and the right.
> 
> To retain limited temporal locality, split the clearing in three
> parts: the faulting page and its immediate neighbourhood, and, the
> remaining regions on the left and the right. The local neighbourhood
> will be cleared last.
> Do this only when zeroing small folios (< MAX_ORDER_NR_PAGES) since
> there isn't much expectation of cache locality for large folios.
> 
> Performance
> ===
> 
> AMD Genoa (EPYC 9J14, cpus=2 sockets * 96 cores * 2 threads,
>    memory=2.2 TB, L1d= 16K/thread, L2=512K/thread, L3=2MB/thread)
> 
> anon-w-seq (vm-scalability):
>                              stime                  utime
> 
>    page-at-a-time      1654.63 ( +- 3.84% )     811.00 ( +- 3.84% )
>    contiguous clearing 1602.86 ( +- 3.00% )     970.75 ( +- 4.68% )
>    neighbourhood-last  1630.32 ( +- 2.73% )     886.37 ( +- 5.19% )
> 
> Both stime and utime respond in expected ways. stime drops for both
> contiguous clearing (-3.14%) and neighbourhood-last (-1.46%)
> approaches. However, utime increases for both contiguous clearing
> (+19.7%) and neighbourhood-last (+9.28%).
> 
> In part this is because anon-w-seq runs with 384 processes zeroing
> anonymously mapped memory which they then access sequentially. As
> such this is likely an uncommon pattern where the memory bandwidth
> is saturated while also being cache limited because we access the
> entire region.
> 
> Kernel make workload (make -j 12 bzImage):
> 
>                              stime                  utime
> 
>    page-at-a-time       138.16 ( +- 0.31% )    1015.11 ( +- 0.05% )
>    contiguous clearing  133.42 ( +- 0.90% )    1013.49 ( +- 0.05% )
>    neighbourhood-last   131.20 ( +- 0.76% )    1011.36 ( +- 0.07% )
> 
> For make the utime stays relatively flat with an up to 4.9% improvement
> in the stime.

Nice evaluation!

> 
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> Reviewed-by: Raghavendra K T <raghavendra.kt@amd.com>
> Tested-by: Raghavendra K T <raghavendra.kt@amd.com>
> ---
>   mm/memory.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 42 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 974c48db6089..d22348b95227 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -7268,13 +7268,53 @@ static void clear_contig_highpages(struct page *page, unsigned long addr,
>    * @addr_hint: The address accessed by the user or the base address.
>    *
>    * Uses architectural support to clear page ranges.
> + *
> + * Clearing of small folios (< MAX_ORDER_NR_PAGES) is split in three parts:
> + * pages in the immediate locality of the faulting page, and its left, right
> + * regions; the local neighbourhood is cleared last in order to keep cache
> + * lines of the faulting region hot.
> + *
> + * For larger folios we assume that there is no expectation of cache locality
> + * and just do a straight zero.

Just wondering: why not do the same thing here as well? Probably 
shouldn't hurt and would get rid of some code?

>    */
>   void folio_zero_user(struct folio *folio, unsigned long addr_hint)
>   {
>   	unsigned long base_addr = ALIGN_DOWN(addr_hint, folio_size(folio));

While at it you could turn that const as well.

> +	const long fault_idx = (addr_hint - base_addr) / PAGE_SIZE;
> +	const struct range pg = DEFINE_RANGE(0, folio_nr_pages(folio) - 1);
> +	const int width = 2; /* number of pages cleared last on either side */

Is "width" really the right terminology? (the way you describe it, it's 
more like diameter?)

Wondering whether we should turn that into a define to make it clearer 
that we are dealing with a magic value.

Speaking of magic values, why 2 and not 3? :)

> +	struct range r[3];
> +	int i;
>   
> -	clear_contig_highpages(folio_page(folio, 0),
> -				base_addr, folio_nr_pages(folio));
> +	if (folio_nr_pages(folio) > MAX_ORDER_NR_PAGES) {
> +		clear_contig_highpages(folio_page(folio, 0),
> +				       base_addr, folio_nr_pages(folio));
> +		return;
> +	}
> +
> +	/*
> +	 * Faulting page and its immediate neighbourhood. Cleared at the end to
> +	 * ensure it sticks around in the cache.
> +	 */
> +	r[2] = DEFINE_RANGE(clamp_t(s64, fault_idx - width, pg.start, pg.end),
> +			    clamp_t(s64, fault_idx + width, pg.start, pg.end));
> +
> +	/* Region to the left of the fault */
> +	r[1] = DEFINE_RANGE(pg.start,
> +			    clamp_t(s64, r[2].start-1, pg.start-1, r[2].start));

"start-1" -> "start - 1" etc.

> +
> +	/* Region to the right of the fault: always valid for the common fault_idx=0 case. */
> +	r[0] = DEFINE_RANGE(clamp_t(s64, r[2].end+1, r[2].end, pg.end+1),
> +			    pg.end);

Same here.

> +
> +	for (i = 0; i <= 2; i++) {

Can we use ARRAY_SIZE instead of "2" ?

> +		unsigned int npages = range_len(&r[i]);
> +		struct page *page = folio_page(folio, r[i].start);
> +		unsigned long addr = base_addr + folio_page_idx(folio, page) * PAGE_SIZE;

Can't you compute that from r[i].start) instead? The folio_page_idx() 
seems avoidable unless I am missing something.

Could make npages and addr const.

const unsigned long addr = base_addr + r[i].start * PAGE_SIZE;
const unsigned int npages = range_len(&r[i]);
struct page *page = folio_page(folio, r[i].start);

> +
> +		if (npages > 0)
> +			clear_contig_highpages(page, addr, npages);
> +	}
>   }
>   
>   static int copy_user_gigantic_page(struct folio *dst, struct folio *src,


-- 
Cheers

David

next prev parent reply	other threads:[~2025-12-18  7:49 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-15 20:49 [PATCH v10 0/8] mm: folio_zero_user: clear contiguous pages Ankur Arora
2025-12-15 20:49 ` [PATCH v10 1/8] treewide: provide a generic clear_user_page() variant Ankur Arora
2025-12-18  7:11   ` David Hildenbrand (Red Hat)
2025-12-18 19:31     ` Ankur Arora
2025-12-15 20:49 ` [PATCH v10 2/8] highmem: introduce clear_user_highpages() Ankur Arora
2025-12-15 20:49 ` [PATCH v10 3/8] mm: introduce clear_pages() and clear_user_pages() Ankur Arora
2025-12-15 20:49 ` [PATCH v10 4/8] highmem: do range clearing in clear_user_highpages() Ankur Arora
2025-12-18  7:15   ` David Hildenbrand (Red Hat)
2025-12-18 20:01     ` Ankur Arora
2025-12-15 20:49 ` [PATCH v10 5/8] x86/mm: Simplify clear_page_* Ankur Arora
2025-12-15 20:49 ` [PATCH v10 6/8] x86/clear_page: Introduce clear_pages() Ankur Arora
2025-12-18  7:22   ` David Hildenbrand (Red Hat)
2025-12-15 20:49 ` [PATCH v10 7/8] mm, folio_zero_user: support clearing page ranges Ankur Arora
2025-12-16  2:44   ` Andrew Morton
2025-12-16  6:49     ` Ankur Arora
2025-12-16 15:12       ` Andrew Morton
2025-12-17  8:48         ` Ankur Arora
2025-12-17 18:54           ` Andrew Morton
2025-12-17 19:51             ` Ankur Arora
2025-12-17 20:26               ` Andrew Morton
2025-12-18  0:51                 ` Ankur Arora
2025-12-18  7:36   ` David Hildenbrand (Red Hat)
2025-12-18 20:16     ` Ankur Arora
2025-12-15 20:49 ` [PATCH v10 8/8] mm: folio_zero_user: cache neighbouring pages Ankur Arora
2025-12-18  7:49   ` David Hildenbrand (Red Hat) [this message]
2025-12-18 21:01     ` Ankur Arora
2025-12-18 21:23       ` Ankur Arora
2025-12-23 10:11         ` David Hildenbrand (Red Hat)
2025-12-16  2:48 ` [PATCH v10 0/8] mm: folio_zero_user: clear contiguous pages Andrew Morton
2025-12-16  5:04   ` Ankur Arora
2025-12-18  7:38     ` David Hildenbrand (Red Hat)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b34ce151-45d5-4074-a869-de93b2bcdef0@kernel.org \
    --to=david@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=ankur.a.arora@oracle.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=chleroy@kernel.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=ioworker0@gmail.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=mjguzik@gmail.com \
    --cc=peterz@infradead.org \
    --cc=raghavendra.kt@amd.com \
    --cc=tglx@linutronix.de \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.