* [PATCH 0/2] xen/mm: fix fallout from populate_physmap() deferred scrub change
@ 2026-03-25 10:08 Roger Pau Monne
2026-03-25 10:08 ` [PATCH 1/2] xen/mm: don't unconditionally clear PGC_need_scrub in alloc_heap_pages() Roger Pau Monne
2026-03-25 10:08 ` [PATCH 2/2] xen/mm: do not assign pages to a domain until they are scrubbed Roger Pau Monne
0 siblings, 2 replies; 12+ messages in thread
From: Roger Pau Monne @ 2026-03-25 10:08 UTC (permalink / raw)
To: xen-devel
Cc: Roger Pau Monne, Andrew Cooper, Anthony PERARD, Michal Orzel,
Jan Beulich, Julien Grall, Stefano Stabellini
Hello,
Two fixes for the populate_physmap() deferred scrubbing changes.
Thanks, Roger.
Roger Pau Monne (2):
xen/mm: don't unconditionally clear PGC_need_scrub in
alloc_heap_pages()
xen/mm: do not assign pages to a domain until they are scrubbed
xen/common/memory.c | 9 ++++++++-
xen/common/page_alloc.c | 39 +++++++++++++++++++++++++++++----------
xen/include/xen/mm.h | 2 ++
3 files changed, 39 insertions(+), 11 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/2] xen/mm: don't unconditionally clear PGC_need_scrub in alloc_heap_pages()
2026-03-25 10:08 [PATCH 0/2] xen/mm: fix fallout from populate_physmap() deferred scrub change Roger Pau Monne
@ 2026-03-25 10:08 ` Roger Pau Monne
2026-03-25 13:37 ` Jan Beulich
2026-03-25 10:08 ` [PATCH 2/2] xen/mm: do not assign pages to a domain until they are scrubbed Roger Pau Monne
1 sibling, 1 reply; 12+ messages in thread
From: Roger Pau Monne @ 2026-03-25 10:08 UTC (permalink / raw)
To: xen-devel
Cc: Roger Pau Monne, Andrew Cooper, Anthony PERARD, Michal Orzel,
Jan Beulich, Julien Grall, Stefano Stabellini, Ayden Bottos
alloc_heap_pages() will unconditionally clear PGC_need_scrub, even when
MEMF_no_scrub is requested. This is kind of expected as otherwise some
callers will assert on seeing non-expected flags set on the count_info
field.
Introduce a new MEMF bit to signal to alloc_heap_pages() that non-scrubbed
pages should keep the PGC_need_scrub bit set. This fixes returning dirty
pages from alloc_domheap_pages() without the PGC_need_scrub bit set for
populate_physmap() to consume.
With the above change alloc_domheap_pages() needs an adjustment to cope
with allocated pages possibly having the PGC_need_scrub set.
Fixes: 83a784a15b47 ("xen/mm: allow deferred scrub of physmap populate allocated pages")
Reported-by: Ayden Bottos <aydenbottos12@gmail.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
This issue was initially reported to the Xen Security Team, and it did turn
out to not require an XSA only because the code hasn't been part of any
release, otherwise an XSA would have been issued.
The Security Team would like to thanks Ayden for the prompt report.
In the scrubbing loop in alloc_heap_pages() i should better be unsigned
long.
---
xen/common/memory.c | 3 ++-
xen/common/page_alloc.c | 31 ++++++++++++++++++++++---------
xen/include/xen/mm.h | 2 ++
3 files changed, 26 insertions(+), 10 deletions(-)
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 918510f287a0..f0ff1311881c 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -345,7 +345,8 @@ static void populate_physmap(struct memop_args *a)
unsigned int scrub_start = 0;
unsigned int memflags =
a->memflags | (d->creation_finished ? 0
- : MEMF_no_scrub);
+ : (MEMF_no_scrub |
+ MEMF_keep_scrub));
nodeid_t node =
(a->memflags & MEMF_exact_node) ? MEMF_get_node(a->memflags)
: NUMA_NO_NODE;
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 588b5b99cbc7..1316dfbd15ee 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -989,6 +989,8 @@ static struct page_info *alloc_heap_pages(
ASSERT(zone_lo <= zone_hi);
ASSERT(zone_hi < NR_ZONES);
+ ASSERT(!(memflags & MEMF_keep_scrub) || (memflags & MEMF_no_scrub));
+
if ( unlikely(order > MAX_ORDER) )
return NULL;
@@ -1110,17 +1112,26 @@ static struct page_info *alloc_heap_pages(
{
bool cold = d && d != current->domain;
- for ( i = 0; i < (1U << order); i++ )
+ if ( !(memflags & MEMF_no_scrub) )
{
- if ( test_and_clear_bit(_PGC_need_scrub, &pg[i].count_info) )
+ for ( i = 0; i < (1U << order); i++ )
{
- if ( !(memflags & MEMF_no_scrub) )
+ if ( test_and_clear_bit(_PGC_need_scrub, &pg[i].count_info) )
+ {
scrub_one_page(&pg[i], cold);
-
- dirty_cnt++;
+ dirty_cnt++;
+ }
+ else
+ check_one_page(&pg[i]);
}
- else if ( !(memflags & MEMF_no_scrub) )
- check_one_page(&pg[i]);
+ }
+ else
+ {
+ for ( i = 0; i < (1U << order); i++ )
+ if ( (memflags & MEMF_keep_scrub)
+ ? test_bit(_PGC_need_scrub, &pg[i].count_info)
+ : test_and_clear_bit(_PGC_need_scrub, &pg[i].count_info) )
+ dirty_cnt++;
}
if ( dirty_cnt )
@@ -2696,8 +2707,10 @@ struct page_info *alloc_domheap_pages(
for ( i = 0; i < (1UL << order); i++ )
{
- ASSERT(!pg[i].count_info);
- pg[i].count_info = PGC_extra;
+ ASSERT(!(pg[i].count_info &
+ ~((memflags & MEMF_keep_scrub) ? PGC_need_scrub
+ : 0UL)));
+ pg[i].count_info |= PGC_extra;
}
}
if ( assign_page(pg, order, d, memflags) )
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index d80bfba6d393..0639fc0d21fb 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -208,6 +208,8 @@ struct npfec {
#define MEMF_no_refcount (1U<<_MEMF_no_refcount)
#define _MEMF_populate_on_demand 1
#define MEMF_populate_on_demand (1U<<_MEMF_populate_on_demand)
+#define _MEMF_keep_scrub 2
+#define MEMF_keep_scrub (1U<<_MEMF_keep_scrub)
#define _MEMF_no_dma 3
#define MEMF_no_dma (1U<<_MEMF_no_dma)
#define _MEMF_exact_node 4
--
2.51.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/2] xen/mm: do not assign pages to a domain until they are scrubbed
2026-03-25 10:08 [PATCH 0/2] xen/mm: fix fallout from populate_physmap() deferred scrub change Roger Pau Monne
2026-03-25 10:08 ` [PATCH 1/2] xen/mm: don't unconditionally clear PGC_need_scrub in alloc_heap_pages() Roger Pau Monne
@ 2026-03-25 10:08 ` Roger Pau Monne
2026-03-25 14:56 ` Jan Beulich
1 sibling, 1 reply; 12+ messages in thread
From: Roger Pau Monne @ 2026-03-25 10:08 UTC (permalink / raw)
To: xen-devel
Cc: Roger Pau Monne, Andrew Cooper, Anthony PERARD, Michal Orzel,
Jan Beulich, Julien Grall, Stefano Stabellini
Assigning pages to a domain make them the possible target of hypercalls
like XENMEM_decrease_reservation ahead of such pages being scrubbed in
populate_physmap() when the guest is running in PV mode. This might allow
pages to be freed ahead of being scrubbed for example, as a stubdomain
already running could target them by guessing their MFNs. It's also
possible other action could set the page type ahead of scrubbing, which
would be problematic.
Prevent the pages pending scrub from being assigned to the domain, and only
do the assign once the scrubbing has finished. This has the disadvantage
that the allocated pages will be removed from the free pool, but not yet
accounted towards the domain consumed page quota. However there can only
be one stashed page in that state, and it's maximum size is bounded by the
memop-max-order option. This is not too different from the current logic,
where assigning pages to a domain (and thus checking whether such domain
doesn't overflow it's quota) is also done after the memory has been
allocated and removed from the pool of free pages.
Fixes: 83a784a15b47 ("xen/mm: allow deferred scrub of physmap populate allocated pages")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
I've attempted various different ways to solve this, but they all ended up
being impossible.
* Prevent non-scrubbed pages from getting extra refcounts (iow: make
get_page() fail for them). This seemed nice, but the cleanup using
put_page_alloc_ref() was impossible as non-scrubbed pages would return
failure in get_page(), and so I couldn't take the extra reference ahead
of calling put_page_alloc_ref().
* Disallow XENMEM_decrease_reservation until the domain has finished
creation would fix the issue of pages being freed while pending scrub,
but it's not clear there might be other usages that would be problematic,
as get_page() on non-scrubbed pages would still return success.
---
xen/common/memory.c | 6 ++++++
xen/common/page_alloc.c | 8 +++++++-
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/xen/common/memory.c b/xen/common/memory.c
index f0ff1311881c..1ad4b51c5b02 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -388,6 +388,12 @@ static void populate_physmap(struct memop_args *a)
goto out;
}
}
+
+ if ( assign_page(page, a->extent_order, d, memflags) )
+ {
+ free_domheap_pages(page, a->extent_order);
+ goto out;
+ }
}
if ( unlikely(a->memflags & MEMF_no_tlbflush) )
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 1316dfbd15ee..b72a74c705ba 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2699,7 +2699,13 @@ struct page_info *alloc_domheap_pages(
memflags, d)) == NULL)) )
return NULL;
- if ( d && !(memflags & MEMF_no_owner) )
+ /*
+ * Don't add pages with the PGC_need_scrub bit set to the domain, the
+ * caller must clean the bit and then manually call assign_pages().
+ * Otherwise pages with the PGC_need_scrub would be reachable using
+ * get_page().
+ */
+ if ( d && !(memflags & MEMF_no_owner) && !(memflags & MEMF_keep_scrub) )
{
if ( memflags & MEMF_no_refcount )
{
--
2.51.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] xen/mm: don't unconditionally clear PGC_need_scrub in alloc_heap_pages()
2026-03-25 10:08 ` [PATCH 1/2] xen/mm: don't unconditionally clear PGC_need_scrub in alloc_heap_pages() Roger Pau Monne
@ 2026-03-25 13:37 ` Jan Beulich
2026-03-25 14:15 ` Ayden Bottos
0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2026-03-25 13:37 UTC (permalink / raw)
To: Roger Pau Monne
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Stefano Stabellini, Ayden Bottos, xen-devel
On 25.03.2026 11:08, Roger Pau Monne wrote:
> alloc_heap_pages() will unconditionally clear PGC_need_scrub, even when
> MEMF_no_scrub is requested. This is kind of expected as otherwise some
> callers will assert on seeing non-expected flags set on the count_info
> field.
>
> Introduce a new MEMF bit to signal to alloc_heap_pages() that non-scrubbed
> pages should keep the PGC_need_scrub bit set. This fixes returning dirty
> pages from alloc_domheap_pages() without the PGC_need_scrub bit set for
> populate_physmap() to consume.
>
> With the above change alloc_domheap_pages() needs an adjustment to cope
> with allocated pages possibly having the PGC_need_scrub set.
>
> Fixes: 83a784a15b47 ("xen/mm: allow deferred scrub of physmap populate allocated pages")
> Reported-by: Ayden Bottos <aydenbottos12@gmail.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
with one nit (minor request) at the bottom.
> ---
> This issue was initially reported to the Xen Security Team, and it did turn
> out to not require an XSA only because the code hasn't been part of any
> release, otherwise an XSA would have been issued.
>
> The Security Team would like to thanks Ayden for the prompt report.
>
> In the scrubbing loop in alloc_heap_pages() i should better be unsigned
> long.
This issue is wider than just that function. As long as MAX_ORDER <= BITS_PER_INT,
I think we could have all such loops consistently use unsigned int induction
variables. But of course switching to unsigned long would be okay as well, just
perhaps a little less efficient on (at least) x86. My main wish would be for all
of those variables to be consistent in type (and hence all involved literal
number suffixes also being consistently U or UL).
> --- a/xen/include/xen/mm.h
> +++ b/xen/include/xen/mm.h
> @@ -208,6 +208,8 @@ struct npfec {
> #define MEMF_no_refcount (1U<<_MEMF_no_refcount)
> #define _MEMF_populate_on_demand 1
> #define MEMF_populate_on_demand (1U<<_MEMF_populate_on_demand)
> +#define _MEMF_keep_scrub 2
> +#define MEMF_keep_scrub (1U<<_MEMF_keep_scrub)
> #define _MEMF_no_dma 3
> #define MEMF_no_dma (1U<<_MEMF_no_dma)
> #define _MEMF_exact_node 4
Irrespective of all the similar issues in surrounding code, may I ask that << be
surrounded by blanks in the new addition, to conform to ./CODING_STYLE?
As an aside, I wonder whether we really need the separate _MEMF_keep_scrub, but
the same likely applies to most other _MEMF_*.
Jan
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] xen/mm: don't unconditionally clear PGC_need_scrub in alloc_heap_pages()
2026-03-25 13:37 ` Jan Beulich
@ 2026-03-25 14:15 ` Ayden Bottos
0 siblings, 0 replies; 12+ messages in thread
From: Ayden Bottos @ 2026-03-25 14:15 UTC (permalink / raw)
To: Jan Beulich
Cc: Roger Pau Monne, Andrew Cooper, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini, xen-devel
This looks good to me. I would also add a brief comment in mm.h to
make the contract clearer for future callers: MEMF_keep_scrub is an
internal allocator flag and only valid together with MEMF_no_scrub.
On Thu, Mar 26, 2026 at 12:37 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 25.03.2026 11:08, Roger Pau Monne wrote:
> > alloc_heap_pages() will unconditionally clear PGC_need_scrub, even when
> > MEMF_no_scrub is requested. This is kind of expected as otherwise some
> > callers will assert on seeing non-expected flags set on the count_info
> > field.
> >
> > Introduce a new MEMF bit to signal to alloc_heap_pages() that non-scrubbed
> > pages should keep the PGC_need_scrub bit set. This fixes returning dirty
> > pages from alloc_domheap_pages() without the PGC_need_scrub bit set for
> > populate_physmap() to consume.
> >
> > With the above change alloc_domheap_pages() needs an adjustment to cope
> > with allocated pages possibly having the PGC_need_scrub set.
> >
> > Fixes: 83a784a15b47 ("xen/mm: allow deferred scrub of physmap populate allocated pages")
> > Reported-by: Ayden Bottos <aydenbottos12@gmail.com>
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> with one nit (minor request) at the bottom.
>
> > ---
> > This issue was initially reported to the Xen Security Team, and it did turn
> > out to not require an XSA only because the code hasn't been part of any
> > release, otherwise an XSA would have been issued.
> >
> > The Security Team would like to thanks Ayden for the prompt report.
> >
> > In the scrubbing loop in alloc_heap_pages() i should better be unsigned
> > long.
>
> This issue is wider than just that function. As long as MAX_ORDER <= BITS_PER_INT,
> I think we could have all such loops consistently use unsigned int induction
> variables. But of course switching to unsigned long would be okay as well, just
> perhaps a little less efficient on (at least) x86. My main wish would be for all
> of those variables to be consistent in type (and hence all involved literal
> number suffixes also being consistently U or UL).
>
> > --- a/xen/include/xen/mm.h
> > +++ b/xen/include/xen/mm.h
> > @@ -208,6 +208,8 @@ struct npfec {
> > #define MEMF_no_refcount (1U<<_MEMF_no_refcount)
> > #define _MEMF_populate_on_demand 1
> > #define MEMF_populate_on_demand (1U<<_MEMF_populate_on_demand)
> > +#define _MEMF_keep_scrub 2
> > +#define MEMF_keep_scrub (1U<<_MEMF_keep_scrub)
> > #define _MEMF_no_dma 3
> > #define MEMF_no_dma (1U<<_MEMF_no_dma)
> > #define _MEMF_exact_node 4
>
> Irrespective of all the similar issues in surrounding code, may I ask that << be
> surrounded by blanks in the new addition, to conform to ./CODING_STYLE?
>
> As an aside, I wonder whether we really need the separate _MEMF_keep_scrub, but
> the same likely applies to most other _MEMF_*.
>
> Jan
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] xen/mm: do not assign pages to a domain until they are scrubbed
2026-03-25 10:08 ` [PATCH 2/2] xen/mm: do not assign pages to a domain until they are scrubbed Roger Pau Monne
@ 2026-03-25 14:56 ` Jan Beulich
2026-03-25 16:26 ` Roger Pau Monné
0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2026-03-25 14:56 UTC (permalink / raw)
To: Roger Pau Monne
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Stefano Stabellini, xen-devel
On 25.03.2026 11:08, Roger Pau Monne wrote:
> ---
> I've attempted various different ways to solve this, but they all ended up
> being impossible.
>
> * Prevent non-scrubbed pages from getting extra refcounts (iow: make
> get_page() fail for them). This seemed nice, but the cleanup using
> put_page_alloc_ref() was impossible as non-scrubbed pages would return
> failure in get_page(), and so I couldn't take the extra reference ahead
> of calling put_page_alloc_ref().
A special-case variant of get_page() could be introduced, but maybe that
would still be overly fragile.
When we discussed this, what I had proposed didn't require use of get_page()
though. assign_pages() would install two general references (plus one type
ref for PGT_writable) in this special case. To free, you'd call
put_page_alloc_ref() followed by put_page_and_type().
That said, the patch here is still less intrusive than I feared, so I'm not
asking to re-work this again.
> * Disallow XENMEM_decrease_reservation until the domain has finished
> creation would fix the issue of pages being freed while pending scrub,
> but it's not clear there might be other usages that would be problematic,
> as get_page() on non-scrubbed pages would still return success.
I agree this is of concern.
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -388,6 +388,12 @@ static void populate_physmap(struct memop_args *a)
> goto out;
> }
> }
> +
> + if ( assign_page(page, a->extent_order, d, memflags) )
> + {
> + free_domheap_pages(page, a->extent_order);
The pages don't have an owner set yet, so that function will go straight
to free_heap_pages(), needlessly passing "true" as last argument. Correct,
but (for large pages, which the stashing is about) highly inefficient.
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -2699,7 +2699,13 @@ struct page_info *alloc_domheap_pages(
> memflags, d)) == NULL)) )
> return NULL;
>
> - if ( d && !(memflags & MEMF_no_owner) )
> + /*
> + * Don't add pages with the PGC_need_scrub bit set to the domain, the
> + * caller must clean the bit and then manually call assign_pages().
> + * Otherwise pages with the PGC_need_scrub would be reachable using
> + * get_page().
> + */
How about replacing the latter "with the PGC_need_scrub" by "still subject
to scrubbing"?
> + if ( d && !(memflags & MEMF_no_owner) && !(memflags & MEMF_keep_scrub) )
> {
> if ( memflags & MEMF_no_refcount )
> {
This no-refcount code isn't repeated at the new call site of assign_page().
It's not needed there, yes, but wouldn't we better allow this to be taken
care of right here, moving the MEMF_keep_scrub check immediately ahead of
the call to assign_page()?
Otherwise should we reject (much earlier) MEMF_no_refcount used together
with MEMF_keep_scrub?
Jan
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] xen/mm: do not assign pages to a domain until they are scrubbed
2026-03-25 14:56 ` Jan Beulich
@ 2026-03-25 16:26 ` Roger Pau Monné
2026-03-25 16:54 ` Roger Pau Monné
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Roger Pau Monné @ 2026-03-25 16:26 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Stefano Stabellini, xen-devel
On Wed, Mar 25, 2026 at 03:56:05PM +0100, Jan Beulich wrote:
> On 25.03.2026 11:08, Roger Pau Monne wrote:
> > ---
> > I've attempted various different ways to solve this, but they all ended up
> > being impossible.
> >
> > * Prevent non-scrubbed pages from getting extra refcounts (iow: make
> > get_page() fail for them). This seemed nice, but the cleanup using
> > put_page_alloc_ref() was impossible as non-scrubbed pages would return
> > failure in get_page(), and so I couldn't take the extra reference ahead
> > of calling put_page_alloc_ref().
>
> A special-case variant of get_page() could be introduced, but maybe that
> would still be overly fragile.
It seemed too much complexity (and risk), just to deal with this
scenario.
> When we discussed this, what I had proposed didn't require use of get_page()
> though. assign_pages() would install two general references (plus one type
> ref for PGT_writable) in this special case. To free, you'd call
> put_page_alloc_ref() followed by put_page_and_type().
Doesn't that risk under flowing the page counter if there's a parallel
call to decrease_reservation() against this MFN before?
How would the freeing done in populate_physmap() (in case of
concurrent calls) know whether already scrubbed pages have had it's
PGC_allocated bit dropped?
> That said, the patch here is still less intrusive than I feared, so I'm not
> asking to re-work this again.
Thanks.
> > * Disallow XENMEM_decrease_reservation until the domain has finished
> > creation would fix the issue of pages being freed while pending scrub,
> > but it's not clear there might be other usages that would be problematic,
> > as get_page() on non-scrubbed pages would still return success.
>
> I agree this is of concern.
>
> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -388,6 +388,12 @@ static void populate_physmap(struct memop_args *a)
> > goto out;
> > }
> > }
> > +
> > + if ( assign_page(page, a->extent_order, d, memflags) )
> > + {
> > + free_domheap_pages(page, a->extent_order);
>
> The pages don't have an owner set yet, so that function will go straight
> to free_heap_pages(), needlessly passing "true" as last argument. Correct,
> but (for large pages, which the stashing is about) highly inefficient.
My bad, I was sure I was using the same freeing function as
alloc_domheap_pages() on failure to assign, but I clearly wasn't. I
will switch to using free_heap_pages().
> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -2699,7 +2699,13 @@ struct page_info *alloc_domheap_pages(
> > memflags, d)) == NULL)) )
> > return NULL;
> >
> > - if ( d && !(memflags & MEMF_no_owner) )
> > + /*
> > + * Don't add pages with the PGC_need_scrub bit set to the domain, the
> > + * caller must clean the bit and then manually call assign_pages().
> > + * Otherwise pages with the PGC_need_scrub would be reachable using
> > + * get_page().
> > + */
>
> How about replacing the latter "with the PGC_need_scrub" by "still subject
> to scrubbing"?
Sure.
> > + if ( d && !(memflags & MEMF_no_owner) && !(memflags & MEMF_keep_scrub) )
> > {
> > if ( memflags & MEMF_no_refcount )
> > {
>
> This no-refcount code isn't repeated at the new call site of assign_page().
> It's not needed there, yes, but wouldn't we better allow this to be taken
> care of right here, moving the MEMF_keep_scrub check immediately ahead of
> the call to assign_page()?
>
> Otherwise should we reject (much earlier) MEMF_no_refcount used together
> with MEMF_keep_scrub?
hm, I was likely too focus on the specific use-case of
populate_physmap(), which is the only user of MEMF_keep_scrub. I've
noticed the MEMF_no_refcount bits here, but since populate_physmap()
never uses that flag I just left it as-is.
Will see about adjusting it.
Thanks, Roger.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] xen/mm: do not assign pages to a domain until they are scrubbed
2026-03-25 16:26 ` Roger Pau Monné
@ 2026-03-25 16:54 ` Roger Pau Monné
2026-03-26 8:18 ` Jan Beulich
2026-03-25 16:55 ` Jan Beulich
2026-03-26 8:35 ` Jan Beulich
2 siblings, 1 reply; 12+ messages in thread
From: Roger Pau Monné @ 2026-03-25 16:54 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Stefano Stabellini, xen-devel
On Wed, Mar 25, 2026 at 05:26:01PM +0100, Roger Pau Monné wrote:
> On Wed, Mar 25, 2026 at 03:56:05PM +0100, Jan Beulich wrote:
> > On 25.03.2026 11:08, Roger Pau Monne wrote:
> > > * Disallow XENMEM_decrease_reservation until the domain has finished
> > > creation would fix the issue of pages being freed while pending scrub,
> > > but it's not clear there might be other usages that would be problematic,
> > > as get_page() on non-scrubbed pages would still return success.
> >
> > I agree this is of concern.
> >
> > > --- a/xen/common/memory.c
> > > +++ b/xen/common/memory.c
> > > @@ -388,6 +388,12 @@ static void populate_physmap(struct memop_args *a)
> > > goto out;
> > > }
> > > }
> > > +
> > > + if ( assign_page(page, a->extent_order, d, memflags) )
> > > + {
> > > + free_domheap_pages(page, a->extent_order);
> >
> > The pages don't have an owner set yet, so that function will go straight
> > to free_heap_pages(), needlessly passing "true" as last argument. Correct,
> > but (for large pages, which the stashing is about) highly inefficient.
>
> My bad, I was sure I was using the same freeing function as
> alloc_domheap_pages() on failure to assign, but I clearly wasn't. I
> will switch to using free_heap_pages().
Coming back to this, I can export free_heap_pages(), but then the call
would also unconditionally have need_scrub == true, as the pages have
been allocated without scrubbing. So I don't see much benefit of
using free_heap_pages(), the more that it requires making such
function global then.
Thanks, Roger.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] xen/mm: do not assign pages to a domain until they are scrubbed
2026-03-25 16:26 ` Roger Pau Monné
2026-03-25 16:54 ` Roger Pau Monné
@ 2026-03-25 16:55 ` Jan Beulich
2026-03-26 8:35 ` Jan Beulich
2 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2026-03-25 16:55 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Stefano Stabellini, xen-devel
On 25.03.2026 17:26, Roger Pau Monné wrote:
> On Wed, Mar 25, 2026 at 03:56:05PM +0100, Jan Beulich wrote:
>> On 25.03.2026 11:08, Roger Pau Monne wrote:
>>> --- a/xen/common/memory.c
>>> +++ b/xen/common/memory.c
>>> @@ -388,6 +388,12 @@ static void populate_physmap(struct memop_args *a)
>>> goto out;
>>> }
>>> }
>>> +
>>> + if ( assign_page(page, a->extent_order, d, memflags) )
>>> + {
>>> + free_domheap_pages(page, a->extent_order);
>>
>> The pages don't have an owner set yet, so that function will go straight
>> to free_heap_pages(), needlessly passing "true" as last argument. Correct,
>> but (for large pages, which the stashing is about) highly inefficient.
>
> My bad, I was sure I was using the same freeing function as
> alloc_domheap_pages() on failure to assign, but I clearly wasn't. I
> will switch to using free_heap_pages().
Well, not so much your bad, but likely a result of free_heap_pages()
being static in page_alloc.c.
Jan
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] xen/mm: do not assign pages to a domain until they are scrubbed
2026-03-25 16:54 ` Roger Pau Monné
@ 2026-03-26 8:18 ` Jan Beulich
2026-03-26 8:55 ` Roger Pau Monné
0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2026-03-26 8:18 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Stefano Stabellini, xen-devel
On 25.03.2026 17:54, Roger Pau Monné wrote:
> On Wed, Mar 25, 2026 at 05:26:01PM +0100, Roger Pau Monné wrote:
>> On Wed, Mar 25, 2026 at 03:56:05PM +0100, Jan Beulich wrote:
>>> On 25.03.2026 11:08, Roger Pau Monne wrote:
>>>> * Disallow XENMEM_decrease_reservation until the domain has finished
>>>> creation would fix the issue of pages being freed while pending scrub,
>>>> but it's not clear there might be other usages that would be problematic,
>>>> as get_page() on non-scrubbed pages would still return success.
>>>
>>> I agree this is of concern.
>>>
>>>> --- a/xen/common/memory.c
>>>> +++ b/xen/common/memory.c
>>>> @@ -388,6 +388,12 @@ static void populate_physmap(struct memop_args *a)
>>>> goto out;
>>>> }
>>>> }
>>>> +
>>>> + if ( assign_page(page, a->extent_order, d, memflags) )
>>>> + {
>>>> + free_domheap_pages(page, a->extent_order);
>>>
>>> The pages don't have an owner set yet, so that function will go straight
>>> to free_heap_pages(), needlessly passing "true" as last argument. Correct,
>>> but (for large pages, which the stashing is about) highly inefficient.
>>
>> My bad, I was sure I was using the same freeing function as
>> alloc_domheap_pages() on failure to assign, but I clearly wasn't. I
>> will switch to using free_heap_pages().
>
> Coming back to this, I can export free_heap_pages(), but then the call
> would also unconditionally have need_scrub == true, as the pages have
> been allocated without scrubbing.
But the assign_page() call is here to have the scrubbing done ahead of
it, so re-scrubbing after freeing shouldn't be necessary?
Jan
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] xen/mm: do not assign pages to a domain until they are scrubbed
2026-03-25 16:26 ` Roger Pau Monné
2026-03-25 16:54 ` Roger Pau Monné
2026-03-25 16:55 ` Jan Beulich
@ 2026-03-26 8:35 ` Jan Beulich
2 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2026-03-26 8:35 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Stefano Stabellini, xen-devel
On 25.03.2026 17:26, Roger Pau Monné wrote:
> On Wed, Mar 25, 2026 at 03:56:05PM +0100, Jan Beulich wrote:
>> On 25.03.2026 11:08, Roger Pau Monne wrote:
>>> ---
>>> I've attempted various different ways to solve this, but they all ended up
>>> being impossible.
>>>
>>> * Prevent non-scrubbed pages from getting extra refcounts (iow: make
>>> get_page() fail for them). This seemed nice, but the cleanup using
>>> put_page_alloc_ref() was impossible as non-scrubbed pages would return
>>> failure in get_page(), and so I couldn't take the extra reference ahead
>>> of calling put_page_alloc_ref().
>>
>> A special-case variant of get_page() could be introduced, but maybe that
>> would still be overly fragile.
>
> It seemed too much complexity (and risk), just to deal with this
> scenario.
>
>> When we discussed this, what I had proposed didn't require use of get_page()
>> though. assign_pages() would install two general references (plus one type
>> ref for PGT_writable) in this special case. To free, you'd call
>> put_page_alloc_ref() followed by put_page_and_type().
>
> Doesn't that risk under flowing the page counter if there's a parallel
> call to decrease_reservation() against this MFN before?
>
> How would the freeing done in populate_physmap() (in case of
> concurrent calls) know whether already scrubbed pages have had it's
> PGC_allocated bit dropped?
In that case put_page_alloc_ref() simply does nothing. That's why we have
this wrapper: To avoid open-coding the same check in many places.
Jan
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] xen/mm: do not assign pages to a domain until they are scrubbed
2026-03-26 8:18 ` Jan Beulich
@ 2026-03-26 8:55 ` Roger Pau Monné
0 siblings, 0 replies; 12+ messages in thread
From: Roger Pau Monné @ 2026-03-26 8:55 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Stefano Stabellini, xen-devel
On Thu, Mar 26, 2026 at 09:18:14AM +0100, Jan Beulich wrote:
> On 25.03.2026 17:54, Roger Pau Monné wrote:
> > On Wed, Mar 25, 2026 at 05:26:01PM +0100, Roger Pau Monné wrote:
> >> On Wed, Mar 25, 2026 at 03:56:05PM +0100, Jan Beulich wrote:
> >>> On 25.03.2026 11:08, Roger Pau Monne wrote:
> >>>> * Disallow XENMEM_decrease_reservation until the domain has finished
> >>>> creation would fix the issue of pages being freed while pending scrub,
> >>>> but it's not clear there might be other usages that would be problematic,
> >>>> as get_page() on non-scrubbed pages would still return success.
> >>>
> >>> I agree this is of concern.
> >>>
> >>>> --- a/xen/common/memory.c
> >>>> +++ b/xen/common/memory.c
> >>>> @@ -388,6 +388,12 @@ static void populate_physmap(struct memop_args *a)
> >>>> goto out;
> >>>> }
> >>>> }
> >>>> +
> >>>> + if ( assign_page(page, a->extent_order, d, memflags) )
> >>>> + {
> >>>> + free_domheap_pages(page, a->extent_order);
> >>>
> >>> The pages don't have an owner set yet, so that function will go straight
> >>> to free_heap_pages(), needlessly passing "true" as last argument. Correct,
> >>> but (for large pages, which the stashing is about) highly inefficient.
> >>
> >> My bad, I was sure I was using the same freeing function as
> >> alloc_domheap_pages() on failure to assign, but I clearly wasn't. I
> >> will switch to using free_heap_pages().
> >
> > Coming back to this, I can export free_heap_pages(), but then the call
> > would also unconditionally have need_scrub == true, as the pages have
> > been allocated without scrubbing.
>
> But the assign_page() call is here to have the scrubbing done ahead of
> it, so re-scrubbing after freeing shouldn't be necessary?
I think I've done what you suggested in patch 3 of the v2. For the
call here, yes, we could entirely avoid the scrubbing. For the other
free_domheap_pages() calls in stash_allocation() and
get_stashed_allocation() respectively we need to be more careful as
some pages will still be pending scrub.
Thanks, Roger.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2026-03-26 8:55 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-25 10:08 [PATCH 0/2] xen/mm: fix fallout from populate_physmap() deferred scrub change Roger Pau Monne
2026-03-25 10:08 ` [PATCH 1/2] xen/mm: don't unconditionally clear PGC_need_scrub in alloc_heap_pages() Roger Pau Monne
2026-03-25 13:37 ` Jan Beulich
2026-03-25 14:15 ` Ayden Bottos
2026-03-25 10:08 ` [PATCH 2/2] xen/mm: do not assign pages to a domain until they are scrubbed Roger Pau Monne
2026-03-25 14:56 ` Jan Beulich
2026-03-25 16:26 ` Roger Pau Monné
2026-03-25 16:54 ` Roger Pau Monné
2026-03-26 8:18 ` Jan Beulich
2026-03-26 8:55 ` Roger Pau Monné
2026-03-25 16:55 ` Jan Beulich
2026-03-26 8:35 ` Jan Beulich
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.