Re: [PATCH v3 2/3] xen/mm: allow deferred scrub of physmap populate allocated pages

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	Anthony PERARD <anthony.perard@vates.tech>,
	Michal Orzel <michal.orzel@amd.com>,
	Julien Grall <julien@xen.org>,
	Stefano Stabellini <sstabellini@kernel.org>,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH v3 2/3] xen/mm: allow deferred scrub of physmap populate allocated pages
Date: Tue, 27 Jan 2026 11:40:17 +0100	[thread overview]
Message-ID: <aXiVeAQFUMQkIK1_@Mac.lan> (raw)
In-Reply-To: <69a64a89-af3f-47fe-8998-a3ff76e9484e@suse.com>

On Mon, Jan 26, 2026 at 12:14:35PM +0100, Jan Beulich wrote:
> On 22.01.2026 18:38, Roger Pau Monne wrote:
> > Physmap population has the need to use pages as big as possible to reduce
> > p2m shattering.  However that triggers issues when big enough pages are not
> > yet scrubbed, and so scrubbing must be done at allocation time.  On some
> > scenarios with added contention the watchdog can trigger:
> > 
> > Watchdog timer detects that CPU55 is stuck!
> > ----[ Xen-4.17.5-21  x86_64  debug=n  Not tainted ]----
> > CPU:    55
> > RIP:    e008:[<ffff82d040204c4a>] clear_page_sse2+0x1a/0x30
> > RFLAGS: 0000000000000202   CONTEXT: hypervisor (d0v12)
> > [...]
> > Xen call trace:
> >    [<ffff82d040204c4a>] R clear_page_sse2+0x1a/0x30
> >    [<ffff82d04022a121>] S clear_domain_page+0x11/0x20
> >    [<ffff82d04022c170>] S common/page_alloc.c#alloc_heap_pages+0x400/0x5a0
> >    [<ffff82d04022d4a7>] S alloc_domheap_pages+0x67/0x180
> >    [<ffff82d040226f9f>] S common/memory.c#populate_physmap+0x22f/0x3b0
> >    [<ffff82d040228ec8>] S do_memory_op+0x728/0x1970
> > 
> > Introduce a mechanism to preempt page scrubbing in populate_physmap().  It
> > relies on stashing the dirty page in the domain struct temporarily to
> > preempt to guest context, so the scrubbing can resume when the domain
> > re-enters the hypercall.  The added deferral mechanism will only be used for
> > domain construction, and is designed to be used with a single threaded
> > domain builder.  If the toolstack makes concurrent calls to
> > XENMEM_populate_physmap for the same target domain it will trash stashed
> > pages, resulting in slow domain physmap population.
> > 
> > Note a similar issue is present in increase reservation.  However that
> > hypercall is likely to only be used once the domain is already running and
> > the known implementations use 4K pages. It will be deal with in a separate
> > patch using a different approach, that will also take care of the
> > allocation in populate_physmap() once the domain is running.
> > 
> > Fixes: 74d2e11ccfd2 ("mm: Scrub pages in alloc_heap_pages() if needed")
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Changes since v2:
> >  - Introduce FREE_DOMHEAP_PAGE{,S}().
> >  - Remove j local counter.
> >  - Free page pending scrub in domain_kill() also.
> 
> Yet still not right in domain_unpause_by_systemcontroller() as well. I.e. a
> toolstack action is still needed after the crash to make the memory usable
> again. If you made ...

Oh, I've misread your previous reply and it seemed to me your
preference was to do it in domain_kill().

> > @@ -1286,6 +1293,19 @@ int domain_kill(struct domain *d)
> >          rspin_barrier(&d->domain_lock);
> >          argo_destroy(d);
> >          vnuma_destroy(d->vnuma);
> > +        /*
> > +         * Attempt to free any pages pending scrub early.  Toolstack can still
> > +         * trigger populate_physmap() operations at this point, and hence a
> > +         * final cleanup must be done in _domain_destroy().
> > +         */
> > +        rspin_lock(&d->page_alloc_lock);
> > +        if ( d->pending_scrub )
> > +        {
> > +            FREE_DOMHEAP_PAGES(d->pending_scrub, d->pending_scrub_order);
> > +            d->pending_scrub_order = 0;
> > +            d->pending_scrub_index = 0;
> > +        }
> > +        rspin_unlock(&d->page_alloc_lock);
> 
> ... this into a small helper function (usable even from _domain_destroy(),
> as locking being used doesn't matter there), it would have negligible
> footprint there.
> 
> As to the comment, not being a native speaker it still feels to me as if
> moving "early" earlier (after "free") might help parsing of the 1st sentence.

I could also drop "early" completely from the sentence.  I've moved
the comment at the top of the newly introduced helper and reworded it
as:

/*
 * Called multiple times during domain destruction, to attempt to early free
 * any stashed pages to be scrubbed.  The call from _domain_destroy() is done
 * when the toolstack can no longer stash any pages.
 */

Let me know if that's OK.

> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -159,6 +159,66 @@ static void increase_reservation(struct memop_args *a)
> >      a->nr_done = i;
> >  }
> >  
> > +/*
> > + * Temporary storage for a domain assigned page that's not been fully scrubbed.
> > + * Stored pages must be domheap ones.
> > + *
> > + * The stashed page can be freed at any time by Xen, the caller must pass the
> > + * order and NUMA node requirement to the fetch function to ensure the
> > + * currently stashed page matches it's requirements.
> > + */
> > +static void stash_allocation(struct domain *d, struct page_info *page,
> > +                             unsigned int order, unsigned int scrub_index)
> > +{
> > +    rspin_lock(&d->page_alloc_lock);
> > +
> > +    /*
> > +     * Drop any stashed allocation to accommodated the current one.  This
> > +     * interface is designed to be used for single-threaded domain creation.
> > +     */
> > +    if ( d->pending_scrub )
> > +        free_domheap_pages(d->pending_scrub, d->pending_scrub_order);
> 
> Didn't you indicate you'd move the freeing ...
> 
> > +    d->pending_scrub_index = scrub_index;
> > +    d->pending_scrub_order = order;
> > +    d->pending_scrub = page;
> > +
> > +    rspin_unlock(&d->page_alloc_lock);
> > +}
> > +
> > +static struct page_info *get_stashed_allocation(struct domain *d,
> > +                                                unsigned int order,
> > +                                                nodeid_t node,
> > +                                                unsigned int *scrub_index)
> > +{
> 
> ... into this function?

I could add freeing to get_stashed_allocation(), but it seems
pointless, because the freeing in stash_allocation() will have to stay
to deal with concurrent callers.  Even if a context frees the stashed
page in get_stashed_allocation() there's no guarantee the field will
still be free when stash_allocation() is called, as another concurrent
thread might have stashed a page in the meantime.

I think it's best to consistently do it only in stash_allocation(), as
that's clearer.

Thanks, Roger.

next prev parent reply	other threads:[~2026-01-27 10:40 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-22 17:38 [PATCH v3 0/3] xen/mm: limit in-place scrubbing Roger Pau Monne
2026-01-22 17:38 ` [PATCH v3 1/3] xen/mm: enforce SCRUB_DEBUG checks for MEMF_no_scrub allocations Roger Pau Monne
2026-01-22 17:38 ` [PATCH v3 2/3] xen/mm: allow deferred scrub of physmap populate allocated pages Roger Pau Monne
2026-01-26 11:14   ` Jan Beulich
2026-01-27 10:40     ` Roger Pau Monné [this message]
2026-01-27 11:06       ` Jan Beulich
2026-01-27 15:01         ` Roger Pau Monné
2026-01-27 15:49           ` Jan Beulich
2026-01-22 17:38 ` [PATCH v3 3/3] xen/mm: limit non-scrubbed allocations to a specific order Roger Pau Monne
2026-01-26 11:21   ` Jan Beulich
2026-01-27 10:45     ` Roger Pau Monné

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aXiVeAQFUMQkIK1_@Mac.lan \
    --to=roger.pau@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=anthony.perard@vates.tech \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=michal.orzel@amd.com \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.