Re: [PATCH v3 4/4] x86/folio_zero_user: multi-page clearing

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Ingo Molnar <mingo@kernel.org>
To: Ankur Arora <ankur.a.arora@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
	mingo@redhat.com, luto@kernel.org, peterz@infradead.org,
	paulmck@kernel.org, rostedt@goodmis.org, tglx@linutronix.de,
	willy@infradead.org, jon.grimm@amd.com, bharata@amd.com,
	raghavendra.kt@amd.com, boris.ostrovsky@oracle.com,
	konrad.wilk@oracle.com
Subject: Re: [PATCH v3 4/4] x86/folio_zero_user: multi-page clearing
Date: Mon, 14 Apr 2025 08:53:02 +0200	[thread overview]
Message-ID: <Z_ywzkEEqUOMHcO0@gmail.com> (raw)
In-Reply-To: <20250414034607.762653-5-ankur.a.arora@oracle.com>


* Ankur Arora <ankur.a.arora@oracle.com> wrote:

> clear_pages_rep(), clear_pages_erms() use string instructions to zero
> memory. When operating on more than a single page, we can use these
> more effectively by explicitly advertising the region-size to the
> processor, which can use that as a hint to optimize the clearing
> (ex. by eliding cacheline allocation.)

> +#ifndef CONFIG_HIGHMEM
> +/*
> + * folio_zero_user_preemptible(): multi-page clearing variant of folio_zero_user().
> + *
> + * Taking inspiration from the common code variant, we split the zeroing in
> + * three parts: left of the fault, right of the fault, and up to 5 pages
> + * in the immediate neighbourhood of the target page.
> + *
> + * Cleared in that order to keep cache lines of the target region hot.
> + *
> + * For gigantic pages, there is no expectation of cache locality so just do a
> + * straight zero.
> + */
> +void folio_zero_user_preemptible(struct folio *folio, unsigned long addr_hint)
> +{
> +	unsigned long base_addr = ALIGN_DOWN(addr_hint, folio_size(folio));
> +	const long fault_idx = (addr_hint - base_addr) / PAGE_SIZE;
> +	const struct range pg = DEFINE_RANGE(0, folio_nr_pages(folio) - 1);
> +	int width = 2; /* pages cleared last on either side */
> +	struct range r[3];
> +	int i;
> +
> +	if (folio_nr_pages(folio) > MAX_ORDER_NR_PAGES) {
> +		clear_pages(page_address(folio_page(folio, 0)), folio_nr_pages(folio));

> +			clear_pages(page_address(folio_page(folio, r[i].start)), len);

So the _user postfix naming is super confusing here and elsewhere in 
this series.

clear_page(), and by extension the clear_pages() interface you extended 
it to, fundamentally only works on kernel addresses:

/*
 * Zero a page.
 * %rdi - page
 */
SYM_TYPED_FUNC_START(clear_page_rep)
        movl $4096/8,%ecx
        xorl %eax,%eax
        rep stosq
        RET

Note the absolute lack of fault & exception handling.

But folio_zero_user*() uses the kernel-space variants of page clearing 
AFAICT (contrary to the naming):

void folio_zero_user(struct folio *folio, unsigned long addr_hint)
{
        unsigned int nr_pages = folio_nr_pages(folio);

        if (unlikely(nr_pages > MAX_ORDER_NR_PAGES))
                clear_gigantic_page(folio, addr_hint, nr_pages);
        else
                process_huge_page(addr_hint, nr_pages, clear_subpage, folio);
}


static void clear_gigantic_page(struct folio *folio, unsigned long addr_hint,
                                unsigned int nr_pages)
{
        unsigned long addr = ALIGN_DOWN(addr_hint, folio_size(folio));
        int i;
         
        might_sleep();
        for (i = 0; i < nr_pages; i++) {
                cond_resched();
                clear_user_highpage(folio_page(folio, i), addr + i * PAGE_SIZE);
        }       
}

Which on x86 is simply mapped into a kernel-memory interface:

static inline void clear_user_page(void *page, unsigned long vaddr,
                                   struct page *pg)
{
        clear_page(page);
}

So at minimum this is a misnomer and a confusing mixture of user/kernel 
interface names on an epic scale that TBH should be cleaned up first 
before extended...

> +out:
> +	/* Explicitly invoke cond_resched() to handle any live patching necessary. */
> +	cond_resched();

What again?

Thanks,

	Ingo

next prev parent reply	other threads:[~2025-04-14  6:53 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-14  3:46 [PATCH v3 0/4] mm/folio_zero_user: add multi-page clearing Ankur Arora
2025-04-14  3:46 ` [PATCH v3 1/4] x86/clear_page: extend clear_page*() for " Ankur Arora
2025-04-14  6:32   ` Ingo Molnar
2025-04-14 11:02     ` Peter Zijlstra
2025-04-14 11:14       ` Ingo Molnar
2025-04-14 19:46       ` Ankur Arora
2025-04-14 22:26       ` Mateusz Guzik
2025-04-15  6:14         ` Ankur Arora
2025-04-15  8:22           ` Mateusz Guzik
2025-04-15 20:01             ` Ankur Arora
2025-04-15 20:32               ` Mateusz Guzik
2025-04-14 19:52     ` Ankur Arora
2025-04-14 20:09       ` Matthew Wilcox
2025-04-15 21:59         ` Ankur Arora
2025-04-14  3:46 ` [PATCH v3 2/4] x86/clear_page: add clear_pages() Ankur Arora
2025-04-14  3:46 ` [PATCH v3 3/4] huge_page: allow arch override for folio_zero_user() Ankur Arora
2025-04-14  3:46 ` [PATCH v3 4/4] x86/folio_zero_user: multi-page clearing Ankur Arora
2025-04-14  6:53   ` Ingo Molnar [this message]
2025-04-14 21:21     ` Ankur Arora
2025-04-14  7:05   ` Ingo Molnar
2025-04-15  6:36     ` Ankur Arora
2025-04-22  6:36     ` Raghavendra K T
2025-04-22 19:14       ` Ankur Arora
2025-04-15 10:16   ` Mateusz Guzik
2025-04-15 21:46     ` Ankur Arora
2025-04-15 22:01       ` Mateusz Guzik
2025-04-16  4:46         ` Ankur Arora
2025-04-17 14:06           ` Mateusz Guzik
2025-04-14  5:34 ` [PATCH v3 0/4] mm/folio_zero_user: add " Ingo Molnar
2025-04-14 19:30   ` Ankur Arora
2025-04-14  6:36 ` Ingo Molnar
2025-04-14 19:19   ` Ankur Arora
2025-04-15 19:10 ` Zi Yan
2025-04-22 19:32   ` Ankur Arora
2025-04-22  6:23 ` Raghavendra K T
2025-04-22 19:22   ` Ankur Arora
2025-04-23  8:12     ` Raghavendra K T
2025-04-23  9:18       ` Raghavendra K T

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z_ywzkEEqUOMHcO0@gmail.com \
    --to=mingo@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=ankur.a.arora@oracle.com \
    --cc=bharata@amd.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=jon.grimm@amd.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=raghavendra.kt@amd.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.