* [PATCH v2 1/7] xen/page_alloc: Simplify domain_adjust_tot_pages for future changes
From: Bernhard Kaindl @ 2025-08-16 11:19 UTC
To: xen-devel
Cc: Bernhard Kaindl, Andrew Cooper, Anthony PERARD, Michal Orzel,
Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini, Alejandro Vallejo
domain_adjust_tot_pages() is used to update the domain's total pages
after allocating and freeing memory.

Simplify how it handles the case where more memory was allocated than
the domain had left to claim. This helps the existing single-node
claims and, even more so, the multi-node claims to come.

Replace the if/else with min() so that the outstanding claims are never
reduced by more than the domain had left to claim.

When domain memory is freed, the claim is left unchanged, so this code
only reduces claims after an allocation, and min() is sufficient.
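A simplified model of the new bookkeeping may help to see why min() suffices. It is a sketch, not Xen code: the globals dom_outstanding and global_outstanding are hypothetical stand-ins for d->outstanding_pages and outstanding_claims, and heap_lock is omitted.

```c
#include <assert.h>

/*
 * Simplified model of the claim bookkeeping in domain_adjust_tot_pages().
 * Plain globals stand in for d->outstanding_pages and the global
 * outstanding_claims; locking is omitted.  Names are illustrative only.
 */
static unsigned long dom_outstanding;    /* models d->outstanding_pages */
static unsigned long global_outstanding; /* models outstanding_claims */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

/* Returns the amount by which the claims were reduced. */
static unsigned long adjust_claims(long pages)
{
    unsigned long adjustment;

    /* Freeing (pages <= 0) or no claim left: claims stay unchanged. */
    if ( !dom_outstanding || pages <= 0 )
        return 0;

    /* Never reduce the claim by more than what is left of it. */
    adjustment = min_ul(dom_outstanding, (unsigned long)pages);
    dom_outstanding -= adjustment;
    global_outstanding -= adjustment;
    return adjustment;
}
```

Allocating more than the remaining claim simply zeroes it, which is exactly what the removed if/else branch did explicitly.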
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
Cc: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
---
xen/common/page_alloc.c | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index ec9dec365e..e1ac22b9ed 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -510,8 +510,14 @@ static unsigned long avail_heap_pages(
     return free_pages;
 }

+/*
+ * Update the total number of pages and outstanding claims of a domain.
+ * - When pages were freed, we do not increase outstanding claims.
+ */
 unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
 {
+    unsigned long adjustment;
+
     ASSERT(rspin_is_locked(&d->page_alloc_lock));
     d->tot_pages += pages;

@@ -519,23 +525,22 @@ unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
      * can test d->outstanding_pages race-free because it can only change
      * if d->page_alloc_lock and heap_lock are both held, see also
      * domain_set_outstanding_pages below
+     *
+     * If the domain has no outstanding claims (or we freed pages instead),
+     * we don't update outstanding claims and skip the claims adjustment.
      */
     if ( !d->outstanding_pages || pages <= 0 )
         goto out;

     spin_lock(&heap_lock);
     BUG_ON(outstanding_claims < d->outstanding_pages);
-    if ( d->outstanding_pages < pages )
-    {
-        /* `pages` exceeds the domain's outstanding count. Zero it out. */
-        outstanding_claims -= d->outstanding_pages;
-        d->outstanding_pages = 0;
-    }
-    else
-    {
-        outstanding_claims -= pages;
-        d->outstanding_pages -= pages;
-    }
+    /*
+     * Reduce claims by outstanding claims or pages (whichever is smaller):
+     * If allocated > outstanding, reduce the claims only by outstanding pages.
+     */
+    adjustment = min(d->outstanding_pages, (unsigned int)pages);
+    d->outstanding_pages -= adjustment;
+    outstanding_claims -= adjustment;
     spin_unlock(&heap_lock);

 out:
--
2.43.0
* Re: [PATCH v2 1/7] xen/page_alloc: Simplify domain_adjust_tot_pages for future changes
From: Jan Beulich @ 2025-08-26 7:59 UTC
To: Bernhard Kaindl
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monné, Stefano Stabellini, Alejandro Vallejo,
xen-devel
On 16.08.2025 13:19, Bernhard Kaindl wrote:
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -510,8 +510,14 @@ static unsigned long avail_heap_pages(
>      return free_pages;
>  }
>
> +/*
> + * Update the total number of pages and outstanding claims of a domain.
> + * - When pages were freed, we do not increase outstanding claims.
> + */

If you add such a comment at all, please make it complete: there's
also an update to the global "outstanding_claims" in here.

>  unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
>  {
> +    unsigned long adjustment;
> +
>      ASSERT(rspin_is_locked(&d->page_alloc_lock));
>      d->tot_pages += pages;
>
> @@ -519,23 +525,22 @@ unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
>       * can test d->outstanding_pages race-free because it can only change
>       * if d->page_alloc_lock and heap_lock are both held, see also
>       * domain_set_outstanding_pages below
> +     *
> +     * If the domain has no outstanding claims (or we freed pages instead),
> +     * we don't update outstanding claims and skip the claims adjustment.
>       */
>      if ( !d->outstanding_pages || pages <= 0 )
>          goto out;
>
>      spin_lock(&heap_lock);
>      BUG_ON(outstanding_claims < d->outstanding_pages);
> -    if ( d->outstanding_pages < pages )
> -    {
> -        /* `pages` exceeds the domain's outstanding count. Zero it out. */
> -        outstanding_claims -= d->outstanding_pages;
> -        d->outstanding_pages = 0;
> -    }
> -    else
> -    {
> -        outstanding_claims -= pages;
> -        d->outstanding_pages -= pages;
> -    }
> +    /*
> +     * Reduce claims by outstanding claims or pages (whichever is smaller):
> +     * If allocated > outstanding, reduce the claims only by outstanding pages.
> +     */
> +    adjustment = min(d->outstanding_pages, (unsigned int)pages);

This would be all fine if there wasn't the cast. It's only a latent problem,
yes, but I think we still would better avoid introducing such. Imo this wants
to be
adjustment = min_t(unsigned long, d->outstanding_pages, pages);
or the equivalent
adjustment = min(d->outstanding_pages + 0UL, pages + 0UL);
(personally I'd prefer the latter despite its odd look, for not involving
any casts).
Jan
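To illustrate the latent problem with the cast, the following sketch compares it with the `+ 0UL` form. It assumes an LP64 build (64-bit long, 32-bit int), and the helper names are hypothetical. With `pages` above UINT_MAX, the cast drops the upper 32 bits, so the claim would be reduced by far less than was allocated. A single adjustment of that size is not expected in practice, which is why the problem is only latent.

```c
#include <assert.h>
#include <limits.h>

/*
 * Illustrates the latent problem with narrowing `pages` to unsigned int
 * before taking the minimum.  Assumes an LP64 target (64-bit long, 32-bit
 * int).  Function names are hypothetical.
 */
static unsigned long min_with_cast(unsigned long outstanding, long pages)
{
    unsigned int p = (unsigned int)pages;   /* drops the upper 32 bits */

    return outstanding < p ? outstanding : p;
}

static unsigned long min_without_cast(unsigned long outstanding, long pages)
{
    unsigned long p = pages + 0UL;          /* converts, no truncation */

    return outstanding < p ? outstanding : p;
}
```

For pages = 2^32 + 1 and an outstanding claim of 10, the cast version yields 1 while the cast-free version yields 10.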
* [PATCH v2 2/7] xen/page_alloc: Remove `claim` from domain_set_outstanding_pages()
From: Bernhard Kaindl @ 2025-08-16 11:19 UTC
To: xen-devel
Cc: Bernhard Kaindl, Andrew Cooper, Anthony PERARD, Michal Orzel,
Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini, Alejandro Vallejo
With a single global count for the claims, it is easy to subtract
domain_tot_pages() from the claim so that the number given in the
hypercall is the real reservation of the domain. This is the current
behaviour.

However, a later patch introduces node-specific claims, and those
interact very poorly with such a scheme. Since accounting for
domain_tot_pages() in one case but not the other seems strictly worse
than not accounting for it at all (which is at least consistent), this
patch stops subtracting domain_tot_pages() from the claim and instead
checks that claimed memory plus allocated memory doesn't exceed
d->max_pages.

Arguably it's also clearer for the caller if the amount of claimed
memory matches the requested claim. The xl/libxenguest code never
updates an existing claim: it stakes a claim, allocates all domain
memory, cancels any leftover claim, finishes building the domain and
unpauses it.
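A sketch of the old and new checks may make the semantic change concrete. This is a model under simplifying assumptions, not the patch itself: tot, max and free_unclaimed are hypothetical parameters standing in for domain_tot_pages(d), d->max_pages and total_avail_pages - outstanding_claims, and locking and the existing-claim check are left out.

```c
#include <assert.h>
#include <errno.h>

/*
 * Simplified models of the checks in domain_set_outstanding_pages()
 * before and after this patch.  On success the resulting claim is
 * stored in *claim.
 */
static int claim_old(unsigned long tot, unsigned long max,
                     unsigned long free_unclaimed, unsigned long pages,
                     unsigned long *claim)
{
    if ( pages <= tot || pages > max )
        return -EINVAL;
    if ( pages - tot > free_unclaimed )   /* claim excludes allocated pages */
        return -ENOMEM;
    *claim = pages - tot;
    return 0;
}

static int claim_new(unsigned long tot, unsigned long max,
                     unsigned long free_unclaimed, unsigned long pages,
                     unsigned long *claim)
{
    if ( tot + pages > max )              /* allocated + claimed <= max */
        return -EINVAL;
    if ( pages > free_unclaimed )
        return -ENOMEM;
    *claim = pages;                       /* claim is taken as requested */
    return 0;
}
```

For a domain that already holds 100 pages, a request for 600 used to stake a claim of 500; with this patch it stakes 600.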
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
---
xen/common/page_alloc.c | 19 ++++++-------------
1 file changed, 6 insertions(+), 13 deletions(-)
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index e1ac22b9ed..7e90b9cc1e 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -550,7 +550,7 @@ out:
 int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
 {
     int ret = -ENOMEM;
-    unsigned long claim, avail_pages;
+    unsigned long avail_pages;

     /*
      * take the domain's page_alloc_lock, else all d->tot_page adjustments
@@ -576,28 +576,21 @@ int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
         goto out;
     }

-    /* disallow a claim not exceeding domain_tot_pages() or above max_pages */
-    if ( (pages <= domain_tot_pages(d)) || (pages > d->max_pages) )
+    /* Don't claim past max_pages */
+    if ( (domain_tot_pages(d) + pages) > d->max_pages )
     {
         ret = -EINVAL;
         goto out;
     }

     /* how much memory is available? */
-    avail_pages = total_avail_pages;
+    avail_pages = total_avail_pages - outstanding_claims;

-    avail_pages -= outstanding_claims;
-
-    /*
-     * Note, if domain has already allocated memory before making a claim
-     * then the claim must take domain_tot_pages() into account
-     */
-    claim = pages - domain_tot_pages(d);
-    if ( claim > avail_pages )
+    if ( pages > avail_pages )
         goto out;

     /* yay, claim fits in available memory, stake the claim, success! */
-    d->outstanding_pages = claim;
+    d->outstanding_pages = pages;
     outstanding_claims += d->outstanding_pages;
     ret = 0;

--
2.43.0
* Re: [PATCH v2 2/7] xen/page_alloc: Remove `claim` from domain_set_outstanding_pages()
From: Jan Beulich @ 2025-08-26 8:20 UTC
To: Bernhard Kaindl
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monné, Stefano Stabellini, Alejandro Vallejo,
xen-devel
On 16.08.2025 13:19, Bernhard Kaindl wrote:
> With a single global count for the claims it is easy to substract
> domain_tot_pages() from the claim so the number given in the hypercall
> is the real reservation of the domain. This is the current behaviour.
>
> However, a later patch introduces node-specific claims and those interact
> very poorly with such a scheme. Since accounting domain_tot_pages() in
> one case but not the other seems strictly worse than not accounting them
> at all (which is at least consistent), this patch stops substracting
> tot_pages from the claim and instead checks that claimed memory +
> allocated memory don't exceed max_mem.
>
> Arguably it's also clearer for the caller to align the amount of claimed
> memory with that of the requested claim. xl/libxenguest code never updated
> an existing claim: It stakes a claim, allocates all domain memory, cancels
> a possible leftover claim, finishes building the domain and unpauses it.
>
> Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
> Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
Is this order (and the lack of From:) correct? A patch of the same title was
submitted by Alejandro at some point. Additionally the cover letter lists
this one patch as the sole Alejandro-only one. I'm also uncertain if you may
freely alter the original S-o-b, which was still having his @cloud.com email
address afaict.
> ---
> xen/common/page_alloc.c | 19 ++++++-------------
> 1 file changed, 6 insertions(+), 13 deletions(-)
From eyeballing both patches nothing has changed. That would support the
tagging as Alejandro-only in the cover letter, but it also means review
comments weren't addressed. Such non-addressing would, however, require a
verbal reply to those review comments, which I can't find any record of.
Instead in a reply to Roger's comments Alejandro indicated that there
indeed was an oversight on his part. My separate comment wasn't replied to
at all.
Jan
* [PATCH v2 3/7] xen/page_alloc: Add static per-NUMA-node counts of free pages
From: Bernhard Kaindl @ 2025-08-16 11:19 UTC
To: xen-devel
Cc: Bernhard Kaindl, Andrew Cooper, Anthony PERARD, Michal Orzel,
Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini, Alejandro Vallejo
The static per-NUMA-node count of free pages is the sum of the free
pages in all zones of a node. Keeping it avoids recomputing that sum
frequently in the following patches, which introduce per-NUMA-node
claims.
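A toy model of the invariant the new counter has to keep: per_node_avail[node] equals the sum of avail[node][zone] over all zones. NR_NODES, NR_ZONES and the model_* helpers are assumptions for illustration, and locking is omitted.

```c
#include <assert.h>

/*
 * Toy model of the bookkeeping: avail[node][zone] counts free pages per
 * zone, and per_node_avail caches their per-node sum so later code can
 * read it without looping over all zones.  Sizes and names are
 * illustrative assumptions.
 */
#define NR_NODES 2
#define NR_ZONES 4

static unsigned long avail[NR_NODES][NR_ZONES];
static unsigned long per_node_avail[NR_NODES];
static unsigned long total_avail;

static void model_free(unsigned int node, unsigned int zone, unsigned int order)
{
    avail[node][zone] += 1UL << order;
    per_node_avail[node] += 1UL << order;
    total_avail += 1UL << order;
}

static void model_alloc(unsigned int node, unsigned int zone,
                        unsigned long request)
{
    assert(avail[node][zone] >= request);
    avail[node][zone] -= request;
    assert(per_node_avail[node] >= request); /* catches under-accounting */
    per_node_avail[node] -= request;
    total_avail -= request;
}

/* The invariant the cached counter must maintain. */
static int node_sum_matches(unsigned int node)
{
    unsigned long sum = 0;

    for ( unsigned int zone = 0; zone < NR_ZONES; zone++ )
        sum += avail[node][zone];
    return sum == per_node_avail[node];
}
```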
---
Changed since v1:
- Added ASSERT(per_node_avail_pages[node] >= request) as requested by
  Roger during review. Note: it directly follows
  ASSERT(avail[node][zone] >= request);
  which already validates request, so the new assertion checks that
  per_node_avail_pages[node] has not been under-accounted.
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
---
xen/common/page_alloc.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 7e90b9cc1e..43de9296fd 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -486,6 +486,9 @@ static unsigned long node_need_scrub[MAX_NUMNODES];
 static unsigned long *avail[MAX_NUMNODES];
 static long total_avail_pages;

+/* Per-NUMA-node counts of free pages */
+static unsigned long per_node_avail_pages[MAX_NUMNODES];
+
 static DEFINE_SPINLOCK(heap_lock);
 static long outstanding_claims; /* total outstanding claims by all domains */

@@ -1066,6 +1069,8 @@ static struct page_info *alloc_heap_pages(

     ASSERT(avail[node][zone] >= request);
     avail[node][zone] -= request;
+    ASSERT(per_node_avail_pages[node] >= request);
+    per_node_avail_pages[node] -= request;
     total_avail_pages -= request;
     ASSERT(total_avail_pages >= 0);

@@ -1226,6 +1231,8 @@ static int reserve_offlined_page(struct page_info *head)
             continue;

         avail[node][zone]--;
+        ASSERT(per_node_avail_pages[node] > 0);
+        per_node_avail_pages[node]--;
         total_avail_pages--;
         ASSERT(total_avail_pages >= 0);

@@ -1550,6 +1557,7 @@ static void free_heap_pages(
     }

     avail[node][zone] += 1 << order;
+    per_node_avail_pages[node] += 1 << order;
     total_avail_pages += 1 << order;
     if ( need_scrub )
     {
--
2.43.0
* Re: [PATCH v2 3/7] xen/page_alloc: Add static per-NUMA-node counts of free pages
From: Jan Beulich @ 2025-08-26 8:27 UTC
To: Bernhard Kaindl
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monné, Stefano Stabellini, Alejandro Vallejo,
xen-devel
On 16.08.2025 13:19, Bernhard Kaindl wrote:
> The static per-NUMA-node count of free pages is the sum of free memory
> in all zones of a node. It's an optimisation to avoid doing that operation
> frequently in the following patches that introduce per-NUMA-node claims.
>
> ---
> Changed since v1:
> - Added ASSERT(per_node_avail_pages[node] >= request) as requested
> during review by Roger: Comment by me: As we have
> ASSERT(avail[node][zone] >= request);
> directly before it, request is already valid, so this checks
> that per_node_avail_pages[node] is not mis-accounted too low.
Okay, this addresses Roger's comment. What about mine, though?
> Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
> Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
Apart from similar concerns as for patch 2, these come too late. They
wouldn't end up in the commit message, due to the earlier --- separator.
Given the problems with the first three patches I'm going to stop review
here, expecting a tidied v3 to be submitted with all prior review
comments addressed (verbally or by respective adjustments).
Jan
* [PATCH v2 4/7] xen/page_alloc: Add node argument to domain_{adjust_tot_pages,set_outstanding_pages}()
From: Bernhard Kaindl @ 2025-08-16 11:19 UTC
To: xen-devel
Cc: Bernhard Kaindl, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini,
Tamas K Lengyel, Alejandro Vallejo
domain_adjust_tot_pages() adjusts the total pages of a domain after
memory is allocated or freed. When allocating, it also reduces the
domain's outstanding claims by the pages allocated.

With the new node argument, callers pass the node from which the pages
were allocated. When pages are freed, callers simply pass NUMA_NO_NODE,
since freeing never updates the outstanding claims anyway.

domain_set_outstanding_pages() sets the amount of outstanding claims of
a domain. Callers pass the node on which to stake a claim, or
NUMA_NO_NODE for a host-wide claim.

No functional change, as neither function uses the new argument yet.
This prepares for, and simplifies, the next patch, which introduces
per-node claim counts.
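The resulting calling convention, sketched with stub types. MODEL_NO_NODE, the page_nid[] array and the helper names are hypothetical stand-ins for NUMA_NO_NODE, page_to_nid() and the real callers; as in the patch, the node argument is only passed along, not yet used for claims.

```c
#include <assert.h>

/* Hypothetical stand-ins, not Xen definitions. */
typedef unsigned char nodeid_t;
#define MODEL_NO_NODE ((nodeid_t)0xff)

static nodeid_t last_node_seen = MODEL_NO_NODE;

/* The node argument is only recorded; like the patch, nothing uses it yet. */
static unsigned long adjust_tot_pages(unsigned long *tot, nodeid_t node,
                                      long pages)
{
    last_node_seen = node;
    *tot += pages;
    return *tot;
}

/* Allocation path: report the node the (contiguous) pages came from. */
static unsigned long assign(unsigned long *tot, const nodeid_t *page_nid,
                            unsigned long nr)
{
    nodeid_t node = page_nid[0];

    assert(node == page_nid[nr - 1]); /* mirrors the ASSERT in assign_pages() */
    return adjust_tot_pages(tot, node, (long)nr);
}

/* Free path: claims are never re-credited, so no node is needed. */
static unsigned long release(unsigned long *tot, unsigned long nr)
{
    return adjust_tot_pages(tot, MODEL_NO_NODE, -(long)nr);
}
```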
Changed since v1:
- Fixed the indentation of the line containing '-dec_count));'
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
---
xen/arch/x86/mm.c | 3 ++-
xen/arch/x86/mm/mem_sharing.c | 4 ++--
xen/common/domain.c | 2 +-
xen/common/grant_table.c | 4 ++--
xen/common/memory.c | 6 ++++--
xen/common/page_alloc.c | 17 ++++++++++++-----
xen/include/xen/mm.h | 6 ++++--
7 files changed, 27 insertions(+), 15 deletions(-)
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e7fd56c7ce..effc67c6ba 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4444,7 +4444,8 @@ int steal_page(
     page_list_del(page, &d->page_list);

     /* Unlink from original owner. */
-    if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) )
+    if ( !(memflags & MEMF_no_refcount) &&
+         !domain_adjust_tot_pages(d, NUMA_NO_NODE, -1) )
         drop_dom_ref = true;

     nrspin_unlock(&d->page_alloc_lock);
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index da28266ef0..2551c0d86e 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -720,7 +720,7 @@ static int page_make_sharable(struct domain *d,
     if ( !validate_only )
     {
         page_set_owner(page, dom_cow);
-        drop_dom_ref = !domain_adjust_tot_pages(d, -1);
+        drop_dom_ref = !domain_adjust_tot_pages(d, NUMA_NO_NODE, -1);
         page_list_del(page, &d->page_list);
     }

@@ -766,7 +766,7 @@ static int page_make_private(struct domain *d, struct page_info *page)
     ASSERT(page_get_owner(page) == dom_cow);
     page_set_owner(page, d);

-    if ( domain_adjust_tot_pages(d, 1) == 1 )
+    if ( domain_adjust_tot_pages(d, page_to_nid(page), 1) == 1 )
         get_knownalive_domain(d);
     page_list_add_tail(page, &d->page_list);
     nrspin_unlock(&d->page_alloc_lock);
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 5241a1629e..1beadb05e1 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1239,7 +1239,7 @@ int domain_kill(struct domain *d)
         rspin_barrier(&d->domain_lock);
         argo_destroy(d);
         vnuma_destroy(d->vnuma);
-        domain_set_outstanding_pages(d, 0);
+        domain_set_outstanding_pages(d, NUMA_NO_NODE, 0);
         /* fallthrough */
     case DOMDYING_dying:
         rc = domain_teardown(d);
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index cf131c43a1..8fea75dbb2 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -2405,7 +2405,7 @@ gnttab_transfer(
         }

         /* Okay, add the page to 'e'. */
-        if ( unlikely(domain_adjust_tot_pages(e, 1) == 1) )
+        if ( unlikely(domain_adjust_tot_pages(e, page_to_nid(page), 1) == 1) )
             get_knownalive_domain(e);

         /*
@@ -2431,7 +2431,7 @@ gnttab_transfer(
              * page in the page total
              */
             nrspin_lock(&e->page_alloc_lock);
-            drop_dom_ref = !domain_adjust_tot_pages(e, -1);
+            drop_dom_ref = !domain_adjust_tot_pages(e, NUMA_NO_NODE, -1);
             nrspin_unlock(&e->page_alloc_lock);

             if ( okay /* i.e. e->is_dying due to the surrounding if() */ )
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 3688e6dd50..b8cf4bd23d 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -775,7 +775,8 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)

                 nrspin_lock(&d->page_alloc_lock);
                 drop_dom_ref = (dec_count &&
-                                !domain_adjust_tot_pages(d, -dec_count));
+                                !domain_adjust_tot_pages(d, NUMA_NO_NODE,
+                                                         -dec_count));
                 nrspin_unlock(&d->page_alloc_lock);

                 if ( drop_dom_ref )
@@ -1682,7 +1683,8 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         rc = xsm_claim_pages(XSM_PRIV, d);

         if ( !rc )
-            rc = domain_set_outstanding_pages(d, reservation.nr_extents);
+            rc = domain_set_outstanding_pages(d, NUMA_NO_NODE,
+                                              reservation.nr_extents);

         rcu_unlock_domain(d);

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 43de9296fd..e8ba21dc46 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -517,7 +517,8 @@ static unsigned long avail_heap_pages(
  * Update the total number of pages and outstanding claims of a domain.
  * - When pages were freed, we do not increase outstanding claims.
  */
-unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
+unsigned long domain_adjust_tot_pages(struct domain *d, nodeid_t node,
+                                      long pages)
 {
     unsigned long adjustment;

@@ -550,7 +551,8 @@ out:
     return d->tot_pages;
 }

-int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
+int domain_set_outstanding_pages(struct domain *d, nodeid_t node,
+                                 unsigned long pages)
 {
     int ret = -ENOMEM;
     unsigned long avail_pages;
@@ -2620,6 +2622,8 @@ int assign_pages(

     if ( !(memflags & MEMF_no_refcount) )
     {
+        nodeid_t node = page_to_nid(&pg[0]);
+
         if ( unlikely(d->tot_pages + nr < nr) )
         {
             gprintk(XENLOG_INFO,
@@ -2631,7 +2635,9 @@ int assign_pages(
             goto out;
         }

-        if ( unlikely(domain_adjust_tot_pages(d, nr) == nr) )
+        ASSERT(node == page_to_nid(&pg[nr - 1]));
+
+        if ( unlikely(domain_adjust_tot_pages(d, node, nr) == nr) )
             get_knownalive_domain(d);
     }

@@ -2764,7 +2770,8 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
                 }
             }

-            drop_dom_ref = !domain_adjust_tot_pages(d, -(1 << order));
+            drop_dom_ref = !domain_adjust_tot_pages(d, NUMA_NO_NODE,
+                                                    -(1 << order));

             rspin_unlock(&d->page_alloc_lock);

@@ -2970,7 +2977,7 @@ void free_domstatic_page(struct page_info *page)

     arch_free_heap_page(d, page);

-    drop_dom_ref = !domain_adjust_tot_pages(d, -1);
+    drop_dom_ref = !domain_adjust_tot_pages(d, NUMA_NO_NODE, -1);

     unprepare_staticmem_pages(page, 1, scrub_debug);

diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 93c037d618..f977e73b1c 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -65,6 +65,7 @@
 #include <xen/compiler.h>
 #include <xen/mm-frame.h>
 #include <xen/mm-types.h>
+#include <xen/numa.h>
 #include <xen/types.h>
 #include <xen/list.h>
 #include <xen/spinlock.h>
@@ -130,8 +131,9 @@ mfn_t xen_map_to_mfn(unsigned long va);
 int populate_pt_range(unsigned long virt, unsigned long nr_mfns);
 /* Claim handling */
 unsigned long __must_check domain_adjust_tot_pages(struct domain *d,
-    long pages);
-int domain_set_outstanding_pages(struct domain *d, unsigned long pages);
+    nodeid_t node, long pages);
+int domain_set_outstanding_pages(struct domain *d, nodeid_t node,
+                                 unsigned long pages);
 void get_outstanding_claims(uint64_t *free_pages, uint64_t *outstanding_pages);

 /* Domain suballocator. These functions are *not* interrupt-safe.*/
--
2.43.0
* [PATCH v2 5/7] xen/page_alloc: Create per-node outstanding claims
From: Bernhard Kaindl @ 2025-08-16 11:19 UTC
To: xen-devel
Cc: Bernhard Kaindl, Andrew Cooper, Anthony PERARD, Michal Orzel,
Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini, Marcus Granado, Alejandro Vallejo
Extend domain_set_outstanding_pages() to allow staking claims on a
specific NUMA node instead of host-wide: a node-specific claim is
recorded in d->outstanding_pages, with the new field d->claim_node set
to that node rather than NUMA_NO_NODE.

We use the most straightforward implementation to minimise the amount
of change in this commit and in the rest of the series. The next
series, which converts the claims handling to multi-node claims, will
replace this with a different structure. This keeps the commit focused
on the central challenge of the new type of claim and leaves multi-node
claims for the next series.

Also extend get_free_buddy() where it iterates round-robin over nodes:
for allocations that don't request a specific node, make it skip NUMA
nodes that do not have enough unclaimed memory left.
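A simplified model of how staking and consuming a node-specific claim could look with these counters. It is a sketch under assumptions: struct dom, NO_NODE and NR_NODES are stand-ins for struct domain, NUMA_NO_NODE and MAX_NUMNODES, and locking, the max_pages check and the existing-claim case are omitted.

```c
#include <assert.h>
#include <errno.h>

/* Stand-ins for Xen's types and constants (illustrative only). */
#define NR_NODES 2
#define NO_NODE  (-1)

struct dom {
    unsigned long outstanding_pages;
    int claim_node;            /* NO_NODE for a host-wide claim */
};

static unsigned long total_avail;
static unsigned long outstanding_claims;
static unsigned long per_node_avail[NR_NODES];
static unsigned long per_node_claims[NR_NODES];

static int set_claim(struct dom *d, int node, unsigned long pages)
{
    if ( pages > total_avail - outstanding_claims )
        return -ENOMEM;         /* host-wide check always applies */
    if ( node != NO_NODE &&
         pages > per_node_avail[node] - per_node_claims[node] )
        return -ENOMEM;         /* plus the per-node check for node claims */

    d->outstanding_pages = pages;
    d->claim_node = node;
    outstanding_claims += pages;
    if ( node != NO_NODE )
        per_node_claims[node] += pages;
    return 0;
}

/* Called after `pages` were allocated on `node` for d. */
static void consume_claim(struct dom *d, int node, unsigned long pages)
{
    unsigned long adj;

    /* Allocations from other nodes don't consume a node-specific claim. */
    if ( !d->outstanding_pages ||
         (d->claim_node != NO_NODE && d->claim_node != node) )
        return;

    adj = d->outstanding_pages < pages ? d->outstanding_pages : pages;
    d->outstanding_pages -= adj;
    outstanding_claims -= adj;
    if ( d->claim_node != NO_NODE )
        per_node_claims[d->claim_node] -= adj;
}
```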
---
Changes since v1:
- Join all conditions into a single if clause
- Improve the function description and comments
- Use const when passing struct domain when applicable
- Renamed pernode_oc[] to per_node_outstanding_claims[]
- Reject invalid node IDs in domain_set_outstanding_pages()
- Use nodeid_t instead of unsigned int for the claim_node field.
- Removed dependency on MEMF_exact_node (checked in get_free_buddy())
- Added awareness for honoring NUMA claims to get_free_buddy()
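The extra skip condition in get_free_buddy() can be reduced to a standalone predicate for illustration. The parameter names are hypothetical; host_wide_request models req_node == NUMA_NO_NODE, and the sketch assumes a node's claims never exceed its free pages (otherwise the unsigned subtraction would wrap).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the condition added to get_free_buddy()'s per-node check.
 * Assumes node_claims <= node_avail.
 */
static bool skip_node(bool host_wide_request, unsigned long outstanding_claims,
                      unsigned long node_avail, unsigned long node_claims,
                      unsigned int order)
{
    /* Node-specific requests and hosts without claims: no extra skip. */
    if ( !host_wide_request || !outstanding_claims )
        return false;

    /* Skip the node if its unclaimed memory can't hold 2^order pages. */
    return node_avail - node_claims < (1UL << order);
}
```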
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
Signed-off-by: Marcus Granado <marcus.granado@cloud.com>
Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
---
xen/common/page_alloc.c | 37 +++++++++++++++++++++++++++++++++++--
xen/include/xen/sched.h | 1 +
2 files changed, 36 insertions(+), 2 deletions(-)
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index e8ba21dc46..63ecd74dcc 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -491,6 +491,7 @@ static unsigned long per_node_avail_pages[MAX_NUMNODES];

 static DEFINE_SPINLOCK(heap_lock);
 static long outstanding_claims; /* total outstanding claims by all domains */
+static unsigned long per_node_outstanding_claims[MAX_NUMNODES];

 static unsigned long avail_heap_pages(
     unsigned int zone_lo, unsigned int zone_hi, unsigned int node)
@@ -532,8 +533,12 @@ unsigned long domain_adjust_tot_pages(struct domain *d, nodeid_t node,
      *
      * If the domain has no outstanding claims (or we freed pages instead),
      * we don't update outstanding claims and skip the claims adjustment.
+     *
+     * Also don't update outstanding claims when the domain has node-specific
+     * claims, but the memory allocation was from a different NUMA node.
      */
-    if ( !d->outstanding_pages || pages <= 0 )
+    if ( !d->outstanding_pages || pages <= 0 ||
+         (d->claim_node != NUMA_NO_NODE && d->claim_node != node) )
         goto out;

     spin_lock(&heap_lock);
@@ -544,6 +549,8 @@ unsigned long domain_adjust_tot_pages(struct domain *d, nodeid_t node,
      */
     adjustment = min(d->outstanding_pages, (unsigned int)pages);
     d->outstanding_pages -= adjustment;
+    if ( d->claim_node != NUMA_NO_NODE ) /* adjust the static per-node claims */
+        per_node_outstanding_claims[d->claim_node] -= adjustment;
     outstanding_claims -= adjustment;
     spin_unlock(&heap_lock);

@@ -557,6 +564,9 @@ int domain_set_outstanding_pages(struct domain *d, nodeid_t node,
     int ret = -ENOMEM;
     unsigned long avail_pages;

+    if ( node != NUMA_NO_NODE && !node_online(node) )
+        return -EINVAL;
+
     /*
      * take the domain's page_alloc_lock, else all d->tot_page adjustments
      * must always take the global heap_lock rather than only in the much
@@ -569,6 +579,10 @@ int domain_set_outstanding_pages(struct domain *d, nodeid_t node,
     if ( pages == 0 )
     {
         outstanding_claims -= d->outstanding_pages;
+
+        if ( d->claim_node != NUMA_NO_NODE )
+            per_node_outstanding_claims[d->claim_node] -= d->outstanding_pages;
+
         d->outstanding_pages = 0;
         ret = 0;
         goto out;
@@ -591,12 +605,26 @@ int domain_set_outstanding_pages(struct domain *d, nodeid_t node,
     /* how much memory is available? */
     avail_pages = total_avail_pages - outstanding_claims;

+    /* This check can't be skipped for the NUMA case, or we may overclaim */
     if ( pages > avail_pages )
         goto out;

+    if ( node != NUMA_NO_NODE )
+    {
+        avail_pages = per_node_avail_pages[node] - per_node_outstanding_claims[node];
+
+        if ( pages > avail_pages )
+            goto out;
+    }
+
     /* yay, claim fits in available memory, stake the claim, success! */
     d->outstanding_pages = pages;
     outstanding_claims += d->outstanding_pages;
+    d->claim_node = node;
+
+    if ( node != NUMA_NO_NODE )
+        per_node_outstanding_claims[node] += pages;
+
     ret = 0;

 out:
@@ -934,7 +962,12 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
         zone = zone_hi;
         do {
             /* Check if target node can support the allocation. */
-            if ( !avail[node] || (avail[node][zone] < (1UL << order)) )
+            if ( !avail[node] || (avail[node][zone] < (1UL << order)) ||
+                 /* For host-wide allocations, skip nodes without enough
+                  * unclaimed memory. */
+                 (req_node == NUMA_NO_NODE && outstanding_claims &&
+                  ((per_node_avail_pages[node] -
+                    per_node_outstanding_claims[node]) < (1UL << order))) )
                 continue;

             /* Find smallest order which can satisfy the request. */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index fd5c9f9333..9535ed7a6a 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -406,6 +406,7 @@ struct domain

     unsigned int max_pages; /* maximum value for domain_tot_pages() */
     unsigned int extra_pages; /* pages not included in domain_tot_pages() */
+    nodeid_t claim_node; /* NUMA_NO_NODE for host-wide claims */
 #ifdef CONFIG_MEM_SHARING
     atomic_t shr_pages; /* shared pages */
 #endif
--
2.43.0
* [PATCH v2 6/7] xen/page_alloc: Check per-node claims in alloc_heap_pages()
From: Bernhard Kaindl @ 2025-08-16 11:19 UTC
To: xen-devel
Cc: Bernhard Kaindl, Andrew Cooper, Anthony PERARD, Michal Orzel,
Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini, Alejandro Vallejo
Extend the claim checks in alloc_heap_pages() to NUMA claims.
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
---
Changes since v1:
- No longer require MEMF_exact_node in memflags for using claims
- If the NUMA node is not passed in memflags, determine the node whose
claim to consume from the claim itself (d->claim_node) and confirm it
against the domain's d->node_affinity, which is where get_free_buddy()
will allocate from. This also eases the conversion to multi-node claims,
as memflags is inherently single-node.
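The decision made by can_alloc() can be modelled outside the hypervisor. The following is a simplified, hypothetical sketch, not the patch's code: it reduces d->node_affinity to a single affinity_node field, replaces memflags by explicit node/no_refcount arguments, and uses made-up counters (1000 free pages host-wide, 500 per node, a 400-page claim on node 0).

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins (assumed, mirroring Xen's u8 nodeid_t). */
typedef unsigned char nodeid_t;
#define NUMA_NO_NODE 0xFF
#define NR_NODES     2

/* Example heap state in pages: 400 pages are claimed on node 0. */
static unsigned long total_avail_pages = 1000;
static unsigned long outstanding_claims = 400;
static unsigned long per_node_avail_pages[NR_NODES] = { 500, 500 };
static unsigned long per_node_outstanding_claims[NR_NODES] = { 400, 0 };

/* Hypothetical domain: node_affinity reduced to a single node. */
struct dom {
    nodeid_t claim_node;            /* NUMA_NO_NODE for a host-wide claim */
    nodeid_t affinity_node;         /* NUMA_NO_NODE if several nodes */
    unsigned long outstanding_pages;
};

/* Model of can_alloc(): node and no_refcount replace memflags. */
static bool can_alloc_model(const struct dom *d, nodeid_t node,
                            bool no_refcount, unsigned long request)
{
    /* No node requested: use the claimed node if it is the affinity. */
    if ( node == NUMA_NO_NODE && d && d->claim_node != NUMA_NO_NODE &&
         d->affinity_node == d->claim_node )
        node = d->claim_node;

    assert(node == NUMA_NO_NODE || node < NR_NODES);

    /* Enough unclaimed memory host-wide and, if node-specific, on the node? */
    if ( outstanding_claims + request <= total_avail_pages &&
         (node == NUMA_NO_NODE ||
          per_node_outstanding_claims[node] + request <=
          per_node_avail_pages[node]) )
        return true;

    /* Otherwise only a refcounted alloc by the claiming domain may proceed. */
    return d && d->claim_node == node && d->outstanding_pages >= request &&
           !no_refcount;
}
```

With these numbers, a domain without a claim is refused 200 pages on node 0 (only 100 unclaimed pages remain there) but may take 100. The claiming domain, whose affinity is node 0, gets 300 pages without naming a node, unless no_refcount is set.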
---
xen/common/page_alloc.c | 46 ++++++++++++++++++++++++++++++++++++++---
1 file changed, 43 insertions(+), 3 deletions(-)
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 63ecd74dcc..12e1d6a049 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1027,6 +1027,48 @@ static void init_free_page_fields(struct page_info *pg)
page_set_owner(pg, NULL);
}
+/*
+ * Check if a heap allocation is allowed (helper for alloc_heap_pages)
+ */
+static bool can_alloc(const struct domain *d, unsigned int memflags,
+ unsigned long request)
+{
+ nodeid_t node = MEMF_get_node(memflags);
+
+ /*
+ * If memflags don't specify a node, get_free_buddy() uses d->node_affinity.
+ * When that affinity is exactly the claimed node, let the allocation use
+ * (and consume) the claim on that node:
+ */
+ if ( node == NUMA_NO_NODE && d && d->claim_node != NUMA_NO_NODE )
+ {
+ nodemask_t claim_node = nodemask_of_node(d->claim_node);
+
+ if ( nodes_equal(d->node_affinity, claim_node) )
+ node = d->claim_node;
+ }
+
+ if ( outstanding_claims + request <= total_avail_pages && /* host-wide, */
+ (node == NUMA_NO_NODE || /* if the alloc is node-specific, then also */
+ per_node_outstanding_claims[node] + request <= /* check per-node */
+ per_node_avail_pages[node]) )
+ return true;
+
+ /*
+ * The requested allocation can only be satisfied by outstanding claims.
+ * Claimed memory is considered unavailable unless the request
+ * is made by a domain with sufficient unclaimed pages.
+ *
+ * Only allow if the allocation matches the available claims of the domain.
+ * For host-wide allocs and claims, node == d->claim_node == NUMA_NO_NODE.
+ *
+ * Only refcounted allocs attributed to domains may have been claimed:
+ * Not refcounted allocs cannot consume claimed memory.
+ */
+ return d && d->claim_node == node && d->outstanding_pages >= request &&
+ !(memflags & MEMF_no_refcount);
+}
+
/* Allocate 2^@order contiguous pages. */
static struct page_info *alloc_heap_pages(
unsigned int zone_lo, unsigned int zone_hi,
@@ -1057,9 +1099,7 @@ static struct page_info *alloc_heap_pages(
* Claimed memory is considered unavailable unless the request
* is made by a domain with sufficient unclaimed pages.
*/
- if ( (outstanding_claims + request > total_avail_pages) &&
- ((memflags & MEMF_no_refcount) ||
- !d || d->outstanding_pages < request) )
+ if ( !can_alloc(d, memflags, request) )
{
spin_unlock(&heap_lock);
return NULL;
--
2.43.0
* [PATCH v2 7/7] xen: New API to claim memory for a domain using XEN_DOMCTL_claim_memory
2025-08-16 11:19 [PATCH v2 0/7] xen/page_alloc: Add NUMA-node specific memory claims Bernhard Kaindl
` (5 preceding siblings ...)
2025-08-16 11:19 ` [PATCH v2 6/7] xen/page_alloc: Check per-node claims in alloc_heap_pages() Bernhard Kaindl
@ 2025-08-16 11:19 ` Bernhard Kaindl
2025-08-18 8:28 ` Christian Lindig
2025-08-26 8:07 ` [PATCH v2 0/7] xen/page_alloc: Add NUMA-node specific memory claims Jan Beulich
7 siblings, 1 reply; 13+ messages in thread
From: Bernhard Kaindl @ 2025-08-16 11:19 UTC
To: xen-devel
Cc: Bernhard Kaindl, Daniel P. Smith, Anthony PERARD, Andrew Cooper,
Michal Orzel, Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini, Juergen Gross, Christian Lindig, David Scott,
Alejandro Vallejo
Add the new hypercall requested during the review of the v1 series,
designed so that multi-node claims will not require changing the API.
The hypercall receives an array of claims, intended to be one claim per
NUMA node, but currently limited to a single claim. Extending the NUMA
claims management to update the claims for multiple NUMA nodes of a
domain at once is deferred to the next series.
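The single-claim dispatch that claim_memory() performs can be sketched in isolation. The types, NO_NODE and set_claim() below are hypothetical simplifications of memory_claim_t and domain_set_outstanding_pages(); in particular, returning -EINVAL when a claim already exists is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NO_NODE (~0u)   /* hypothetical host-wide marker */

/* Simplified copy of the patch's memory_claim_t. */
struct claim {
    unsigned int node;       /* NUMA node, or NO_NODE for a host claim */
    unsigned long nr_pages;  /* number of pages to claim */
};

/* Hypothetical per-domain claim state. */
struct dom_claim {
    unsigned int node;
    unsigned long outstanding;
};

/* Stand-in for domain_set_outstanding_pages(); -EINVAL is assumed. */
static int set_claim(struct dom_claim *d, unsigned int node,
                     unsigned long pages)
{
    if ( pages && d->outstanding )
        return -EINVAL;          /* no updates: cancel the claim first */

    d->node = node;
    d->outstanding = pages;
    return 0;
}

/* Dispatch on the number of claims, as claim_memory() does. */
static int claim_memory_model(struct dom_claim *d, uint32_t nr_claims,
                              const struct claim *claims)
{
    assert(nr_claims == 0 || claims != NULL);

    switch ( nr_claims )
    {
    case 0:                      /* cancel any existing claim */
        return set_claim(d, NO_NODE, 0);
    case 1:                      /* single-node (or host-wide) claim */
        return set_claim(d, claims[0].node, claims[0].nr_pages);
    default:                     /* multi-node claims: next series */
        return -EOPNOTSUPP;
    }
}
```

Since nr_claims is unsigned here, a negative count cannot reach the switch: -1 converts to UINT32_MAX and falls into the default branch.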
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
Cc: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
---
tools/flask/policy/modules/dom0.te | 1 +
tools/flask/policy/modules/xen.if | 1 +
tools/include/xenctrl.h | 4 +++
tools/libs/ctrl/xc_domain.c | 42 +++++++++++++++++++++++++++++
tools/ocaml/libs/xc/xenctrl.ml | 9 +++++++
tools/ocaml/libs/xc/xenctrl.mli | 9 +++++++
tools/ocaml/libs/xc/xenctrl_stubs.c | 21 +++++++++++++++
xen/common/domain.c | 30 +++++++++++++++++++++
xen/common/domctl.c | 8 ++++++
xen/include/public/domctl.h | 17 ++++++++++++
xen/include/xen/domain.h | 2 ++
xen/xsm/flask/hooks.c | 3 +++
xen/xsm/flask/policy/access_vectors | 2 ++
13 files changed, 149 insertions(+)
diff --git a/tools/flask/policy/modules/dom0.te b/tools/flask/policy/modules/dom0.te
index ad2b4f9ea7..8801cb24f2 100644
--- a/tools/flask/policy/modules/dom0.te
+++ b/tools/flask/policy/modules/dom0.te
@@ -105,6 +105,7 @@ allow dom0_t dom0_t:domain2 {
get_cpu_policy
dt_overlay
get_domain_state
+ claim_memory
};
allow dom0_t dom0_t:resource {
add
diff --git a/tools/flask/policy/modules/xen.if b/tools/flask/policy/modules/xen.if
index ef7d8f438c..8e2dceb505 100644
--- a/tools/flask/policy/modules/xen.if
+++ b/tools/flask/policy/modules/xen.if
@@ -98,6 +98,7 @@ define(`create_domain_common', `
vuart_op
set_llc_colors
get_domain_state
+ claim_memory
};
allow $1 $2:security check_context;
allow $1 $2:shadow enable;
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 4955981231..1059629d94 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -2660,6 +2660,10 @@ int xc_domain_set_llc_colors(xc_interface *xch, uint32_t domid,
const uint32_t *llc_colors,
uint32_t num_llc_colors);
+int xc_domain_claim_memory(xc_interface *xch, uint32_t domid,
+ uint32_t nr_claims,
+ const memory_claim_t *claims);
+
#if defined(__arm__) || defined(__aarch64__)
int xc_dt_overlay(xc_interface *xch, void *overlay_fdt,
uint32_t overlay_fdt_size, uint8_t overlay_op);
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index 2ddc3f4f42..370917d877 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -2229,6 +2229,48 @@ out:
return ret;
}
+
+/*
+ * Claim memory for a domain. A domain can have only one type of claim
+ * (host-wide or NUMA-node specific).
+ * If the number of claims is 0, existing claims are cancelled.
+ * Updating claims is not supported; cancel the existing claim first.
+ *
+ * Allocations consume the outstanding claim. If not enough unclaimed
+ * memory is free, they can only be satisfied from the remaining claim.
+ */
+int xc_domain_claim_memory(xc_interface *xch, uint32_t domid,
+ uint32_t nr_claims,
+ const memory_claim_t *claims)
+{
+ struct xen_domctl domctl = {
+ .cmd = XEN_DOMCTL_claim_memory,
+ .domain = domid,
+ .u.claim_memory.nr_claims = nr_claims,
+ };
+ int ret;
+ DECLARE_HYPERCALL_BUFFER(memory_claim_t, buffer);
+
+ /* Claims are passed as an array so multi-node claims need no API change */
+ if ( nr_claims )
+ {
+ size_t bytes = sizeof(memory_claim_t) * nr_claims;
+
+ buffer = xc_hypercall_buffer_alloc(xch, buffer, bytes);
+ if ( buffer == NULL )
+ {
+ PERROR("Could not allocate memory for xc_domain_claim_memory");
+ return -1;
+ }
+ memcpy(buffer, claims, bytes);
+ set_xen_guest_handle(domctl.u.claim_memory.claims, buffer);
+ }
+
+ ret = do_domctl(xch, &domctl);
+ xc_hypercall_buffer_free(xch, buffer);
+ return ret;
+}
+
/*
* Local variables:
* mode: C
diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
index 7e1aabad6c..cb1c18481b 100644
--- a/tools/ocaml/libs/xc/xenctrl.ml
+++ b/tools/ocaml/libs/xc/xenctrl.ml
@@ -369,6 +369,15 @@ external domain_deassign_device: handle -> domid -> (int * int * int * int) -> u
external domain_test_assign_device: handle -> domid -> (int * int * int * int) -> bool
= "stub_xc_domain_test_assign_device"
+type claim =
+ {
+ node: int;
+ nr_pages: int64;
+ }
+
+external domain_claim_memory: handle -> domid -> int -> claim array -> unit
+ = "stub_xc_domain_claim_memory"
+
external version: handle -> version = "stub_xc_version_version"
external version_compile_info: handle -> compile_info
= "stub_xc_version_compile_info"
diff --git a/tools/ocaml/libs/xc/xenctrl.mli b/tools/ocaml/libs/xc/xenctrl.mli
index f44dba61ae..32786a5f7f 100644
--- a/tools/ocaml/libs/xc/xenctrl.mli
+++ b/tools/ocaml/libs/xc/xenctrl.mli
@@ -296,6 +296,15 @@ external domain_deassign_device: handle -> domid -> (int * int * int * int) -> u
external domain_test_assign_device: handle -> domid -> (int * int * int * int) -> bool
= "stub_xc_domain_test_assign_device"
+type claim =
+ {
+ node: int;
+ nr_pages: int64;
+ }
+
+external domain_claim_memory: handle -> domid -> int -> claim array -> unit
+ = "stub_xc_domain_claim_memory"
+
external version : handle -> version = "stub_xc_version_version"
external version_compile_info : handle -> compile_info
= "stub_xc_version_compile_info"
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index b51fd66788..c27e6c4683 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1424,6 +1424,27 @@ CAMLprim value stub_xc_watchdog(value xch_val, value domid, value timeout)
CAMLreturn(Val_int(ret));
}
+/* Claim memory for a domain. See xc_domain_claim_memory() for details. */
+CAMLprim value stub_xc_domain_claim_memory(value xch_val, value domid,
+ value num_claims, value desc)
+{
+ CAMLparam4(xch_val, domid, num_claims, desc);
+ xc_interface *xch = xch_of_val(xch_val);
+ int i, retval, nr_claims = Int_val(num_claims);
+ memory_claim_t claim[nr_claims];
+
+ for (i = 0; i < nr_claims; i++) {
+ claim[i].node = Int_val(Field(Field(desc, i), 0));
+ claim[i].nr_pages = Int64_val(Field(Field(desc, i), 1));
+ }
+
+ retval = xc_domain_claim_memory(xch, Int_val(domid), nr_claims, claim);
+ if (retval < 0)
+ failwith_xc(xch);
+
+ CAMLreturn(Val_unit);
+}
+
/*
* Local variables:
* indent-tabs-mode: t
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 1beadb05e1..dcfad4ab15 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -267,6 +267,36 @@ int get_domain_state(struct xen_domctl_get_domain_state *info, struct domain *d,
return rc;
}
+/* XEN_DOMCTL_claim_memory: Claim an amount of memory for a domain */
+int claim_memory(struct domain *d, const struct xen_domctl_claim_memory *uinfo)
+{
+ memory_claim_t claim;
+ int rc;
+
+ switch ( uinfo->nr_claims )
+ {
+ case 0:
+ /* Cancel existing claim. */
+ rc = domain_set_outstanding_pages(d, 0, 0);
+ break;
+
+ case 1:
+ /* Only single node claims supported at the moment. */
+ if ( copy_from_guest(&claim, uinfo->claims, 1) )
+ return -EFAULT;
+
+ rc = domain_set_outstanding_pages(d, claim.node,
+ claim.nr_pages);
+ break;
+
+ default:
+ rc = -EOPNOTSUPP;
+ break;
+ }
+
+ return rc;
+}
+
static void __domain_finalise_shutdown(struct domain *d)
{
struct vcpu *v;
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index f2a7caaf85..e7576ae00b 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -894,6 +894,14 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
ret = get_domain_state(&op->u.get_domain_state, d, &op->domain);
break;
+ case XEN_DOMCTL_claim_memory:
+ ret = xsm_claim_pages(XSM_PRIV, d);
+ if ( ret )
+ break;
+
+ ret = claim_memory(d, &op->u.claim_memory);
+ break;
+
default:
ret = arch_do_domctl(op, d, u_domctl);
break;
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 0c75d9d27f..5e924abd85 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1273,6 +1273,21 @@ struct xen_domctl_get_domain_state {
uint64_t unique_id; /* Unique domain identifier. */
};
+struct xen_memory_claim {
+ unsigned int node; /* NUMA node, XC_NUMA_NO_NODE for a host claim */
+ unsigned long nr_pages; /* Number of pages to claim */
+};
+typedef struct xen_memory_claim memory_claim_t;
+DEFINE_XEN_GUEST_HANDLE(memory_claim_t);
+
+/* XEN_DOMCTL_claim_memory: Claim an amount of memory for a domain */
+struct xen_domctl_claim_memory {
+ /* IN: array of memory claims */
+ XEN_GUEST_HANDLE_64(memory_claim_t) claims;
+ /* IN: number of claims */
+ unsigned int nr_claims;
+};
+
struct xen_domctl {
/* Stable domctl ops: interface_version is required to be 0. */
uint32_t cmd;
@@ -1365,6 +1380,7 @@ struct xen_domctl {
#define XEN_DOMCTL_gsi_permission 88
#define XEN_DOMCTL_set_llc_colors 89
#define XEN_DOMCTL_get_domain_state 90 /* stable interface */
+#define XEN_DOMCTL_claim_memory 91
#define XEN_DOMCTL_gdbsx_guestmemio 1000
#define XEN_DOMCTL_gdbsx_pausevcpu 1001
#define XEN_DOMCTL_gdbsx_unpausevcpu 1002
@@ -1433,6 +1449,7 @@ struct xen_domctl {
#endif
struct xen_domctl_set_llc_colors set_llc_colors;
struct xen_domctl_get_domain_state get_domain_state;
+ struct xen_domctl_claim_memory claim_memory;
uint8_t pad[128];
} u;
};
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index e10baf2615..bd5f37bd64 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -192,4 +192,6 @@ extern bool vmtrace_available;
extern bool vpmu_is_available;
+int claim_memory(struct domain *d, const struct xen_domctl_claim_memory *uinfo);
+
#endif /* __XEN_DOMAIN_H__ */
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index b0308e1b26..6b2535b666 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -853,6 +853,9 @@ static int cf_check flask_domctl(struct domain *d, unsigned int cmd,
case XEN_DOMCTL_set_llc_colors:
return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__SET_LLC_COLORS);
+ case XEN_DOMCTL_claim_memory:
+ return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__CLAIM_MEMORY);
+
default:
return avc_unknown_permission("domctl", cmd);
}
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 51a1577a66..87338b5c2a 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -259,6 +259,8 @@ class domain2
set_llc_colors
# XEN_DOMCTL_get_domain_state
get_domain_state
+# XEN_DOMCTL_claim_memory
+ claim_memory
}
# Similar to class domain, but primarily contains domctls related to HVM domains
--
2.43.0
* Re: [PATCH v2 7/7] xen: New API to claim memory for a domain using XEN_DOMCTL_claim_memory
2025-08-16 11:19 ` [PATCH v2 7/7] xen: New API to claim memory for a domain using XEN_DOMCTL_claim_memory Bernhard Kaindl
@ 2025-08-18 8:28 ` Christian Lindig
0 siblings, 0 replies; 13+ messages in thread
From: Christian Lindig @ 2025-08-18 8:28 UTC
To: Bernhard Kaindl
Cc: xen-devel, Daniel P. Smith, Anthony PERARD, Andrew Cooper,
Michal Orzel, Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini, Juergen Gross, Christian Lindig, David Scott,
Alejandro Vallejo
Acked-by: Christian Lindig <christian.lindig@cloud.com>
> On 16 Aug 2025, at 12:19, Bernhard Kaindl <bernhard.kaindl@cloud.com> wrote:
>
> Add the new hypercall requested during the review of the v1 series
> do not require changing the API for multi-node claims.
>
> The hypercall receives a number of claims, intented to be one claim per
> NUMA node, and limited to one claim for now. The changes to update the
> NUMA claims management to handle updating the claims for multiple
> NUMA nodes of a domain at once are deferred to the next series.
>
> Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
> Cc: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
> ---
> +
> + /* Use an array to not need changes for multi-node claims in the future */
> + if ( nr_claims )
> + {
> + size_t bytes = sizeof(memory_claim_t) * nr_claims;
> +
> + buffer = xc_hypercall_buffer_alloc(xch, buffer, bytes);
> + if ( buffer == NULL )
> + {
> + PERROR("Could not allocate memory for xc_domain_claim_memory");
> + return -1;
> + }
> + memcpy(buffer, claims, bytes);
> + set_xen_guest_handle(domctl.u.claim_memory.claims, buffer);
> + }
> +
> + ret = do_domctl(xch, &domctl);
> + xc_hypercall_buffer_free(xch, buffer);
> + return ret;
> +}
Should this be "if (nr_claims > 0)" or have an assertion against negative values?
— C
* Re: [PATCH v2 0/7] xen/page_alloc: Add NUMA-node specific memory claims
2025-08-16 11:19 [PATCH v2 0/7] xen/page_alloc: Add NUMA-node specific memory claims Bernhard Kaindl
` (6 preceding siblings ...)
2025-08-16 11:19 ` [PATCH v2 7/7] xen: New API to claim memory for a domain using XEN_DOMCTL_claim_memory Bernhard Kaindl
@ 2025-08-26 8:07 ` Jan Beulich
7 siblings, 0 replies; 13+ messages in thread
From: Jan Beulich @ 2025-08-26 8:07 UTC
To: Bernhard Kaindl
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monné, Stefano Stabellini, Tamas K Lengyel,
Daniel P. Smith, Juergen Gross, Christian Lindig, David Scott,
Alejandro Vallejo, xen-devel
On 16.08.2025 13:19, Bernhard Kaindl wrote:
> Xen supports claiming an amount of memory ahead of allocating it to
> ensure that the memory for the domain is available for allocation.
>
> On NUMA hosts, the same assurance is needed on a per-NUMA-node basis
> to ensure optimal placement of domain memory on the correct NUMA node:
>
> Add per-NUMA-node claims and add a new Hypercall to claim memory for
> a domain using XEN_DOMCTL_claim_memory and xc_domain_claim_memory().
>
> As we will implement multi-node claims as well, we updated the design
> to be flexible for multi-node claims, so that a 2nd series can build
> upon it without changing the hypercall API.
>
> Bernhard Kaindl (6):
> xen/page_alloc: Simplify domain_adjust_tot_pages for future changes
> xen: New API to claim memory for a domain using XEN_DOMCTL_claim_memory
>
> Alejandro Vallejo (1):
> xen/page_alloc: Remove `claim` from domain_set_outstanding_pages()
>
> Alejandro Vallejo and Bernhard Kaindl (5):
> xen/page_alloc: Add static per-NUMA-node counts of free pages
> xen: Add node argument to
> domain_{adjust_tot_pages,set_outstanding_pages}()
> xen/page_alloc.c: Create per-node outstanding claims
> xen/page_alloc: Check per-node claims in alloc_heap_pages()
>
> tools/flask/policy/modules/dom0.te | 1 +
> tools/flask/policy/modules/xen.if | 1 +
> tools/include/xenctrl.h | 4 +
> tools/libs/ctrl/xc_domain.c | 42 ++++++++
> tools/ocaml/libs/xc/xenctrl.ml | 9 ++
> tools/ocaml/libs/xc/xenctrl.mli | 9 ++
> tools/ocaml/libs/xc/xenctrl_stubs.c | 21 ++++
> xen/arch/x86/mm.c | 3 +-
> xen/arch/x86/mm/mem_sharing.c | 4 +-
> xen/common/domain.c | 32 +++++-
> xen/common/domctl.c | 8 ++
> xen/common/grant_table.c | 4 +-
> xen/common/memory.c | 6 +-
> xen/common/page_alloc.c | 154 ++++++++++++++++++++++------
> xen/include/public/domctl.h | 17 +++
> xen/include/xen/domain.h | 2 +
> xen/include/xen/mm.h | 6 +-
> xen/include/xen/sched.h | 1 +
> xen/xsm/flask/hooks.c | 3 +
> xen/xsm/flask/policy/access_vectors | 2 +
> 20 files changed, 285 insertions(+), 44 deletions(-)
Having looked at only patch 1 so far, it already becomes clear that revision
information is lacking here. This is more important than usual for this series
because (a) a patch with a title pretty similar to that of patch 1 here was
submitted by (and meanwhile committed for) Alejandro and (b) you picked up
earlier work by him. In fact I first thought you lost his S-o-b on patch 1.
That would have been easily clarified by indicating in the patch that it is
new in v2.
Jan