* [PATCH v4 0/10] xen: Add NUMA-aware memory claims for domains
@ 2026-02-26 14:29 Bernhard Kaindl
2026-02-26 14:29 ` [PATCH v4 01/10] xen/page_alloc: Extract code for consuming claims into inline function Bernhard Kaindl
` (10 more replies)
0 siblings, 11 replies; 35+ messages in thread
From: Bernhard Kaindl @ 2026-02-26 14:29 UTC
To: xen-devel
Cc: Bernhard Kaindl, Andrew Cooper, Anthony PERARD, Michal Orzel,
Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini, Daniel P. Smith, Juergen Gross,
Christian Lindig, David Scott
This series introduces NUMA-aware memory claims. Xen reserves claimed memory
and hands it out only to allocations for the domain that holds the claim.
The new hypercall API is designed to support staking claims on multiple NUMA
nodes for a domain. It provides a foundation that can be extended to support
multi-node claims without changing the hypercall API.
Patch Structure:
1. xen/page_alloc: Extract claim consumption on allocation into static inline
2. xen/page_alloc: Add per-node free page counts; make counters unsigned long
3. xen/page_alloc: Add the implementation of NUMA-node-specific claims
4. xen/page_alloc: Consolidate per-node counters into avail[node][maxzone+x]
(optional: transparent, no functional change, not strictly needed)
5. xen/domain: Add the XEN_DOMCTL_claim_memory hypercall handler
6. xsm/flask: Add a Flask security policy for the new hypercall
7. libs/ctrl/xc: Add the libxenctrl API xc_domain_claim_memory()
8. ocaml/libx/xc: Add the OCaml binding for xc_domain_claim_memory()
9. tools/tests: Add testing per-node claims and claims protection
10. doc/guest-guide: Add comprehensive API documentation
The updated guest-guide is deployed here for reviewing the created output:
https://bernhardk-xen-review.readthedocs.io/v4.22-claims.v4/guest-guide/
Changes in v4:
- The logic for adjusting claimed pages on allocation has been completely
reworked to align with recent upstream changes. (Roger Pau Monné)
- The check for node memory availability has been replaced with a corrected
implementation. (Marcus Granado, Roger Pau Monné, Bernhard Kaindl)
- The new hypercall API patch has been refactored and split into separate
patches for the DOMCTL, Flask policy, and libxenctrl implementation.
- Added initial tests and Sphinx documentation for the new API.
- With the improvements and the rebase onto upstream changes, this series has
changed substantially. Reviewing it as a whole is recommended over an
incremental review.
Credits:
- Alejandro Vallejo developed the initial version
- Roger Pau Monné updated the implementation and upstreamed key improvements
- Marcus Granado contributed analysis and suggestions during development
- Bernhard Kaindl developed the new domctl API, the extended tests and
documentation, and the refactored handler for consuming claims on allocation.
Comments and feedback welcome.
Bernhard Kaindl (10):
xen/page_alloc: Extract code for consuming claims into inline function
xen/page_alloc: Optimize getting per-NUMA-node free page counts
xen/page_alloc: Implement NUMA-node-specific claims
xen/page_alloc: Consolidate per-node counters into avail[] array
xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness
xsm/flask: Add XEN_DOMCTL_claim_memory to flask
tools/lib/ctrl/xc: Add xc_domain_claim_memory() to libxenctrl
tools/ocaml/libs/xc: add OCaml domain_claim_memory binding
tools/tests: Update the claims test to test claim_memory hypercall
docs/guest-guide: document the memory claim hypercalls
.readthedocs.yaml | 13 +-
docs/conf.py | 6 +-
.../dom/DOMCTL_claim_memory-classes.mmd | 51 ++++
.../dom/DOMCTL_claim_memory-seqdia.mmd | 23 ++
.../dom/DOMCTL_claim_memory-workflow.mmd | 23 ++
docs/guest-guide/dom/DOMCTL_claim_memory.rst | 125 ++++++++
docs/guest-guide/dom/index.rst | 14 +
docs/guest-guide/index.rst | 23 ++
docs/guest-guide/mem/XENMEM_claim_pages.rst | 68 +++++
docs/guest-guide/mem/index.rst | 12 +
docs/hypervisor-guide/index.rst | 5 +
docs/hypervisor-guide/mm/claims.rst | 114 +++++++
docs/hypervisor-guide/mm/index.rst | 10 +
tools/flask/policy/modules/dom0.te | 1 +
tools/flask/policy/modules/xen.if | 1 +
tools/include/xenctrl.h | 4 +
tools/libs/ctrl/xc_domain.c | 27 ++
tools/ocaml/libs/xc/xenctrl.ml | 11 +
tools/ocaml/libs/xc/xenctrl.mli | 11 +
tools/ocaml/libs/xc/xenctrl_stubs.c | 43 +++
tools/tests/mem-claim/test-mem-claim.c | 277 ++++++++++++++++--
xen/common/domain.c | 32 +-
xen/common/domctl.c | 9 +
xen/common/memory.c | 3 +-
xen/common/page_alloc.c | 254 +++++++++++++---
xen/include/public/domctl.h | 38 +++
xen/include/xen/domain.h | 2 +
xen/include/xen/mm.h | 4 +-
xen/include/xen/sched.h | 1 +
xen/xsm/flask/hooks.c | 3 +
xen/xsm/flask/policy/access_vectors | 2 +
31 files changed, 1134 insertions(+), 76 deletions(-)
create mode 100644 docs/guest-guide/dom/DOMCTL_claim_memory-classes.mmd
create mode 100644 docs/guest-guide/dom/DOMCTL_claim_memory-seqdia.mmd
create mode 100644 docs/guest-guide/dom/DOMCTL_claim_memory-workflow.mmd
create mode 100644 docs/guest-guide/dom/DOMCTL_claim_memory.rst
create mode 100644 docs/guest-guide/dom/index.rst
create mode 100644 docs/guest-guide/mem/XENMEM_claim_pages.rst
create mode 100644 docs/guest-guide/mem/index.rst
create mode 100644 docs/hypervisor-guide/mm/claims.rst
create mode 100644 docs/hypervisor-guide/mm/index.rst
--
2.39.5
* [PATCH v4 01/10] xen/page_alloc: Extract code for consuming claims into inline function
2026-02-26 14:29 [PATCH v4 0/10] xen: Add NUMA-aware memory claims for domains Bernhard Kaindl
@ 2026-02-26 14:29 ` Bernhard Kaindl
2026-03-04 16:20 ` Jan Beulich
2026-03-05 8:21 ` Roger Pau Monné
2026-02-26 14:29 ` [PATCH v4 02/10] xen/page_alloc: Optimize getting per-NUMA-node free page counts Bernhard Kaindl
` (9 subsequent siblings)
10 siblings, 2 replies; 35+ messages in thread
From: Bernhard Kaindl @ 2026-02-26 14:29 UTC
To: xen-devel
Cc: Bernhard Kaindl, Andrew Cooper, Anthony PERARD, Michal Orzel,
Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini
Refactor the claims consumption code into static inline helpers in
preparation for node claims. This lays the groundwork for adding the
consumption of NUMA claims to it.
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
---
xen/common/page_alloc.c | 56 +++++++++++++++++++++++------------------
1 file changed, 31 insertions(+), 25 deletions(-)
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 588b5b99cbc7..6f7f30c64605 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -518,6 +518,34 @@ unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
return d->tot_pages;
}
+/* Release outstanding claims on the domain, host and later also node */
+static inline
+void release_outstanding_claims(struct domain *d, unsigned long release)
+{
+ ASSERT(spin_is_locked(&heap_lock));
+ BUG_ON(outstanding_claims < release);
+ outstanding_claims -= release;
+ d->outstanding_pages -= release;
+}
+
+/*
+ * Consume outstanding claimed pages when allocating pages for a domain.
+ * NB. The alloc could (in principle) fail in assign_pages() afterwards. In that
+ * case, the consumption is not reversed, but as claims are used only during
+ * domain build and d is destroyed if the build fails, this has no significance.
+ */
+static inline
+void consume_outstanding_claims(struct domain *d, unsigned long allocation)
+{
+ if ( !d || !d->outstanding_pages )
+ return;
+ ASSERT(spin_is_locked(&heap_lock));
+
+ /* Of course, the domain can only release up its outstanding claims */
+ allocation = min(allocation, d->outstanding_pages + 0UL);
+ release_outstanding_claims(d, allocation);
+}
+
int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
{
int ret = -ENOMEM;
@@ -535,8 +563,7 @@ int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
/* pages==0 means "unset" the claim. */
if ( pages == 0 )
{
- outstanding_claims -= d->outstanding_pages;
- d->outstanding_pages = 0;
+ release_outstanding_claims(d, d->outstanding_pages);
ret = 0;
goto out;
}
@@ -1048,29 +1075,8 @@ static struct page_info *alloc_heap_pages(
total_avail_pages -= request;
ASSERT(total_avail_pages >= 0);
- if ( d && d->outstanding_pages && !(memflags & MEMF_no_refcount) )
- {
- /*
- * Adjust claims in the same locked region where total_avail_pages is
- * adjusted, not doing so would lead to a window where the amount of
- * free memory (avail - claimed) would be incorrect.
- *
- * Note that by adjusting the claimed amount here it's possible for
- * pages to fail to be assigned to the claiming domain while already
- * having been subtracted from d->outstanding_pages. Such claimed
- * amount is then lost, as the pages that fail to be assigned to the
- * domain are freed without replenishing the claim. This is fine given
- * claims are only to be used during physmap population as part of
- * domain build, and any failure in assign_pages() there will result in
- * the domain being destroyed before creation is finished. Losing part
- * of the claim makes no difference.
- */
- unsigned long outstanding = min(d->outstanding_pages + 0UL, request);
-
- BUG_ON(outstanding > outstanding_claims);
- outstanding_claims -= outstanding;
- d->outstanding_pages -= outstanding;
- }
+ if ( !(memflags & MEMF_no_refcount) )
+ consume_outstanding_claims(d, request);
check_low_mem_virq();
--
2.39.5
* [PATCH v4 02/10] xen/page_alloc: Optimize getting per-NUMA-node free page counts
2026-02-26 14:29 [PATCH v4 0/10] xen: Add NUMA-aware memory claims for domains Bernhard Kaindl
2026-02-26 14:29 ` [PATCH v4 01/10] xen/page_alloc: Extract code for consuming claims into inline function Bernhard Kaindl
@ 2026-02-26 14:29 ` Bernhard Kaindl
2026-03-04 16:31 ` Jan Beulich
2026-02-26 14:29 ` [PATCH v4 03/10] xen/page_alloc: Implement NUMA-node-specific claims Bernhard Kaindl
` (8 subsequent siblings)
10 siblings, 1 reply; 35+ messages in thread
From: Bernhard Kaindl @ 2026-02-26 14:29 UTC
To: xen-devel
Cc: Bernhard Kaindl, Andrew Cooper, Anthony PERARD, Michal Orzel,
Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini, Alejandro Vallejo
From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Add per-node free page counters (node_avail_pages[]), protected by
heap_lock, updated in real-time in lockstep with total_avail_pages
as pages are allocated and freed.
This replaces the avail_heap_pages() loop over all online nodes and
zones in avail_node_heap_pages() with a direct O(1) array lookup,
making it efficient to get the total free pages for a given NUMA node.
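To illustrate the lockstep bookkeeping described above, here is a minimal
standalone sketch, not the Xen code itself. All names (sketch_*, zone_free,
node_free, total_free) and the array sizes are hypothetical, and locking is
omitted, whereas the patch performs every update under heap_lock.

```c
#include <assert.h>

#define SKETCH_NR_NODES 4 /* hypothetical node count */
#define SKETCH_NR_ZONES 8 /* hypothetical zone count */

/* Per-zone, per-node and global free page counters (illustration only). */
static unsigned long zone_free[SKETCH_NR_NODES][SKETCH_NR_ZONES];
static unsigned long node_free[SKETCH_NR_NODES];
static unsigned long total_free;

/* Freeing pages updates all three counters together. */
static void sketch_free_pages(unsigned int node, unsigned int zone,
                              unsigned long pages)
{
    zone_free[node][zone] += pages;
    node_free[node] += pages;
    total_free += pages;
}

/* Allocating pages checks before subtracting, as the counters are unsigned. */
static int sketch_alloc_pages(unsigned int node, unsigned int zone,
                              unsigned long pages)
{
    if ( zone_free[node][zone] < pages )
        return -1;

    assert(node_free[node] >= pages);
    assert(total_free >= pages);
    zone_free[node][zone] -= pages;
    node_free[node] -= pages;
    total_free -= pages;
    return 0;
}

/* Old approach: sum all zones of the node on every query. */
static unsigned long sketch_node_free_slow(unsigned int node)
{
    unsigned long sum = 0;
    unsigned int zone;

    for ( zone = 0; zone < SKETCH_NR_ZONES; zone++ )
        sum += zone_free[node][zone];
    return sum;
}

/* New approach: a direct O(1) lookup of the maintained per-node counter. */
static unsigned long sketch_node_free_fast(unsigned int node)
{
    return node_free[node];
}
```

As long as every update goes through these two paths, the slow and the fast
query agree, which is the invariant the ASSERTs in the patch guard.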
The per-node counts are currently provided using sysctl for NUMA
placement decisions of domain builders and monitoring, and for
debugging with the debug-key 'u' to print NUMA info to the printk buffer.
They will also be used for checking if a NUMA node may be able to
satisfy a NUMA-node-specific allocation by comparing node availability
against node-specific claims before looking for pages in the zones
of the node.
Also change total_avail_pages and outstanding_claims to unsigned long:
Those never become negative (we protect that with ASSERT/BUG_ON already),
and converting them to unsigned long makes that explicit, and also
fixes signed/unsigned comparison warnings.
This only needs moving the ASSERT to before the subtraction.
See the previous commit moving the BUG_ON for outstanding_claims.
This lays the groundwork for implementing per-node claims.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
---
xen/common/page_alloc.c | 36 +++++++++++++++++++++++++++++++-----
1 file changed, 31 insertions(+), 5 deletions(-)
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 6f7f30c64605..2176cb113fe2 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -483,11 +483,32 @@ static heap_by_zone_and_order_t *_heap[MAX_NUMNODES];
static unsigned long node_need_scrub[MAX_NUMNODES];
+/* avail[node][zone] is the number of free pages on that node and zone. */
static unsigned long *avail[MAX_NUMNODES];
-static long total_avail_pages;
+/* Global available pages, updated in real-time, protected by heap_lock */
+static unsigned long total_avail_pages;
+/* The global heap lock, protecting access to the heap and related structures */
static DEFINE_SPINLOCK(heap_lock);
-static long outstanding_claims; /* total outstanding claims by all domains */
+
+/*
+ * Per-node count of available pages, protected by heap_lock, updated in
+ * lockstep with total_avail_pages as pages are allocated and freed.
+ *
+ * Each entry holds the sum of avail[node][zone] across all zones, used for
+ * efficiently checking node-local availability for allocation requests.
+ * Also provided via sysctl for NUMA placement decisions of domain builders
+ * and monitoring, and logged with debug-key 'u' for NUMA debugging.
+ *
+ * Maintaining this under heap_lock does not reduce scalability, as the
+ * allocator is already serialized on it. The accessor macro abstracts the
+ * storage to ease future changes (e.g. moving to per-node lock granularity).
+ */
+#define node_avail_pages(node) (node_avail_pages[node])
+static unsigned long node_avail_pages[MAX_NUMNODES];
+
+/* total outstanding claims by all domains */
+static unsigned long outstanding_claims;
static unsigned long avail_heap_pages(
unsigned int zone_lo, unsigned int zone_hi, unsigned int node)
@@ -1072,8 +1093,10 @@ static struct page_info *alloc_heap_pages(
ASSERT(avail[node][zone] >= request);
avail[node][zone] -= request;
+ ASSERT(node_avail_pages(node) >= request);
+ node_avail_pages(node) -= request;
+ ASSERT(total_avail_pages >= request);
total_avail_pages -= request;
- ASSERT(total_avail_pages >= 0);
if ( !(memflags & MEMF_no_refcount) )
consume_outstanding_claims(d, request);
@@ -1235,8 +1258,10 @@ static int reserve_offlined_page(struct page_info *head)
continue;
avail[node][zone]--;
+ ASSERT(node_avail_pages(node) > 0);
+ node_avail_pages(node)--;
+ ASSERT(total_avail_pages > 0);
total_avail_pages--;
- ASSERT(total_avail_pages >= 0);
page_list_add_tail(cur_head,
test_bit(_PGC_broken, &cur_head->count_info) ?
@@ -1559,6 +1584,7 @@ static void free_heap_pages(
}
avail[node][zone] += 1 << order;
+ node_avail_pages(node) += 1 << order;
total_avail_pages += 1 << order;
if ( need_scrub )
{
@@ -2816,7 +2842,7 @@ unsigned long avail_domheap_pages_region(
unsigned long avail_node_heap_pages(unsigned int nodeid)
{
- return avail_heap_pages(MEMZONE_XEN, NR_ZONES -1, nodeid);
+ return node_avail_pages(nodeid);
}
--
2.39.5
* [PATCH v4 03/10] xen/page_alloc: Implement NUMA-node-specific claims
2026-02-26 14:29 [PATCH v4 0/10] xen: Add NUMA-aware memory claims for domains Bernhard Kaindl
2026-02-26 14:29 ` [PATCH v4 01/10] xen/page_alloc: Extract code for consuming claims into inline function Bernhard Kaindl
2026-02-26 14:29 ` [PATCH v4 02/10] xen/page_alloc: Optimize getting per-NUMA-node free page counts Bernhard Kaindl
@ 2026-02-26 14:29 ` Bernhard Kaindl
2026-03-05 10:53 ` Jan Beulich
2026-02-26 14:29 ` [PATCH v4 04/10] xen/page_alloc: Consolidate per-node counters into avail[] array Bernhard Kaindl
` (7 subsequent siblings)
10 siblings, 1 reply; 35+ messages in thread
From: Bernhard Kaindl @ 2026-02-26 14:29 UTC
To: xen-devel
Cc: Bernhard Kaindl, Andrew Cooper, Anthony PERARD, Michal Orzel,
Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini, Alejandro Vallejo, Marcus Granado
Extend the domain memory claims infrastructure to optionally target
a NUMA node, preserving backward compatibility for existing callers.
Based on the design by Alejandro Vallejo with critical design changes
by Roger Pau Monné and me, including suggestions by Marcus Granado.
Overview:
- Add tracking of per-node claims
- Add tracking of the node of a claim in d->claim_node
- Add per-node claims to domain_set_outstanding_pages()
- Add per-node claims to consume_outstanding_claims()
- Add per-node claims to release_outstanding_claims()
- Add protection of per-node claims to get_free_buddy()
- Update host claim protection to include both claimed and free pages
Helper functions for claims:
- available_after_claims() gives the pages available after outstanding claims
- host_allocatable_request() updates the check for global memory to combine
d->outstanding_pages with the unclaimed free pages when permitting an allocation.
- node_allocatable_request() is used in get_free_buddy() to enforce
per-node claim protection and skip to the next node if insufficient.
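The check shared by the two *_allocatable_request() helpers can be sketched
in isolation as follows. This is a simplified illustration, not the code of
the patch: sketch_allocatable() and its parameters are hypothetical names,
and the caller is assumed to pass own_claim as 0 whenever the requester's
claim does not apply (no domain, MEMF_no_refcount, or, for the node check,
a claim on a different node).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * free_pages: free pages at the level being checked (host or node)
 * claimed:    outstanding claims of all domains at that level
 * own_claim:  the requester's own outstanding claim, 0 if not applicable
 * request:    number of pages requested
 */
static bool sketch_allocatable(unsigned long free_pages,
                               unsigned long claimed,
                               unsigned long own_claim,
                               unsigned long request)
{
    unsigned long unclaimed;

    /* Claims never exceed free memory; the patch enforces this by BUG_ON. */
    assert(claimed <= free_pages);
    unclaimed = free_pages - claimed;

    if ( unclaimed >= request )
        return true; /* The unclaimed pages alone are sufficient */

    /* Otherwise the requester may additionally draw on its own claim */
    return request <= unclaimed + own_claim;
}
```

With 100 free pages of which 80 are claimed, a requester without a claim can
get at most 20 pages, while a requester holding 30 of those claimed pages
can get up to 50.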
Cross-node claim preservation (alloc_node != d->claim_node):
- When allocating with alloc_node != d->claim_node, preserve the claim
unless d would exceed d->max_pages, in which case consume just enough
to stay within d->max_pages to not book excess memory to the domain.
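The arithmetic of this rule can be sketched as a pure function. This is an
illustration under assumptions, not the patch code: the function name and
its parameters are hypothetical, and claim_covers_node stands for "the claim
is host-wide or is on the node the allocation came from".

```c
#include <stdbool.h>

/* Return how many claimed pages an allocation releases from the claim. */
static unsigned long sketch_claim_to_release(unsigned long tot_pages,
                                             unsigned long max_pages,
                                             unsigned long outstanding,
                                             unsigned long allocation,
                                             bool claim_covers_node)
{
    unsigned long booked, excess;

    if ( outstanding == 0 )
        return 0; /* Nothing claimed, nothing to release */

    /* A domain can release at most its outstanding claim */
    if ( allocation > outstanding )
        allocation = outstanding;

    if ( claim_covers_node )
        return allocation; /* Regular case: the allocation consumes the claim */

    /* Cross-node case: pages already owned + this allocation + the claim */
    booked = tot_pages + allocation + outstanding;
    if ( booked <= max_pages )
        return 0; /* Within max_pages: keep the claim untouched */

    /* Release only the part that would exceed max_pages */
    excess = booked - max_pages;
    return allocation < excess ? allocation : excess;
}
```

For example, with tot_pages = 50, max_pages = 100 and a claim of 48 pages, a
cross-node allocation of 5 pages books 103 pages in total, so 3 pages are
released from the claim and 45 stay reserved on the claimed node.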
Update the existing callers of domain_set_outstanding_pages() (domain_kill
and XENMEM_claim_pages) to pass NUMA_NO_NODE for backward compatibility.
This lays the groundwork for a NUMA claims hypercall.
Suggested-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
Suggested-by: Marcus Granado <marcus.granado@citrix.com>
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
---
xen/common/domain.c | 3 +-
xen/common/memory.c | 3 +-
xen/common/page_alloc.c | 147 ++++++++++++++++++++++++++++++++++++----
xen/include/xen/mm.h | 4 +-
xen/include/xen/sched.h | 1 +
5 files changed, 140 insertions(+), 18 deletions(-)
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 2e46207d2db0..e7861259a2b3 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -944,6 +944,7 @@ struct domain *domain_create(domid_t domid,
spin_lock_init(&d->node_affinity_lock);
d->node_affinity = NODE_MASK_ALL;
d->auto_node_affinity = 1;
+ d->claim_node = NUMA_NO_NODE;
spin_lock_init(&d->shutdown_lock);
d->shutdown_code = SHUTDOWN_CODE_INVALID;
@@ -1311,7 +1312,7 @@ int domain_kill(struct domain *d)
rspin_barrier(&d->domain_lock);
argo_destroy(d);
vnuma_destroy(d->vnuma);
- domain_set_outstanding_pages(d, 0);
+ domain_set_outstanding_pages(d, 0, NUMA_NO_NODE);
/* fallthrough */
case DOMDYING_dying:
rc = domain_teardown(d);
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 918510f287a0..85e242ad9e61 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1798,7 +1798,8 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
rc = -EINVAL;
if ( !rc )
- rc = domain_set_outstanding_pages(d, reservation.nr_extents);
+ rc = domain_set_outstanding_pages(d, reservation.nr_extents,
+ NUMA_NO_NODE);
rcu_unlock_domain(d);
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 2176cb113fe2..6fc7d4cb9d40 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -488,7 +488,10 @@ static unsigned long *avail[MAX_NUMNODES];
/* Global available pages, updated in real-time, protected by heap_lock */
static unsigned long total_avail_pages;
-/* The global heap lock, protecting access to the heap and related structures */
+/*
+ * The global heap lock, protecting access to the heap and related structures
+ * It protects the heap and claims, d->outstanding_pages and d->claim_node
+ */
static DEFINE_SPINLOCK(heap_lock);
/*
@@ -510,6 +513,71 @@ static unsigned long node_avail_pages[MAX_NUMNODES];
/* total outstanding claims by all domains */
static unsigned long outstanding_claims;
+/*
+ * Per-node accessor for outstanding claims, protected by heap_lock, updated
+ * in lockstep with the global outstanding_claims and d->outstanding_pages
+ * in domain_set_outstanding_pages() and release_outstanding_claims().
+ *
+ * node_outstanding_claims(node) is used to determine the outstanding claims on
+ * a node, which are subtracted from the node's available pages to determine if
+ * a request can be satisfied without violating the node's memory availability.
+ */
+#define node_outstanding_claims(node) (node_outstanding_claims[node])
+/* total outstanding claims by all domains on node */
+static unsigned long node_outstanding_claims[MAX_NUMNODES];
+
+/* Return available pages after subtracting claimed pages */
+static inline unsigned long available_after_claims(unsigned long avail_pages,
+ unsigned long claims)
+{
+ BUG_ON(claims > avail_pages);
+ return avail_pages - claims; /* Due to the BUG_ON, it cannot be negative */
+}
+
+/* Answer if host-level memory and claims permit this request to proceed */
+static inline bool host_allocatable_request(const struct domain *d,
+ unsigned int memflags,
+ unsigned long request)
+{
+ unsigned long allocatable_pages;
+
+ ASSERT(spin_is_locked(&heap_lock));
+
+ allocatable_pages = available_after_claims(total_avail_pages,
+ outstanding_claims);
+ if ( allocatable_pages >= request )
+ return true; /* The not claimed pages are enough to proceed */
+
+ if ( !d || (memflags & MEMF_no_refcount) )
+ return false; /* Claims are not available for this allocation */
+
+ /* The domain's claims are available, return true if sufficient */
+ return request <= allocatable_pages + d->outstanding_pages;
+}
+
+/* Answer if node-level memory and claims permit this request to proceed */
+static inline bool node_allocatable_request(const struct domain *d,
+ unsigned int memflags,
+ unsigned long request,
+ nodeid_t node)
+{
+ unsigned long allocatable_pages;
+
+ ASSERT(spin_is_locked(&heap_lock));
+ ASSERT(node < MAX_NUMNODES);
+
+ allocatable_pages = available_after_claims(node_avail_pages(node),
+ node_outstanding_claims(node));
+ if ( allocatable_pages >= request )
+ return true; /* The not claimed pages are enough to proceed */
+
+ if ( !d || (memflags & MEMF_no_refcount) || (node != d->claim_node) )
+ return false; /* Claims are not available for this allocation */
+
+ /* The domain's claims are available, return true if sufficient */
+ return request <= allocatable_pages + d->outstanding_pages;
+}
+
static unsigned long avail_heap_pages(
unsigned int zone_lo, unsigned int zone_hi, unsigned int node)
{
@@ -539,14 +607,23 @@ unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
return d->tot_pages;
}
-/* Release outstanding claims on the domain, host and later also node */
+/* Release outstanding claims on the domain, host and node */
static inline
void release_outstanding_claims(struct domain *d, unsigned long release)
{
ASSERT(spin_is_locked(&heap_lock));
BUG_ON(outstanding_claims < release);
outstanding_claims -= release;
+
+ if ( d->claim_node != NUMA_NO_NODE )
+ {
+ BUG_ON(node_outstanding_claims(d->claim_node) < release);
+ node_outstanding_claims(d->claim_node) -= release;
+ }
d->outstanding_pages -= release;
+
+ if ( d->outstanding_pages == 0 )
+ d->claim_node = NUMA_NO_NODE; /* Clear if no outstanding pages left */
}
/*
@@ -556,7 +633,8 @@ void release_outstanding_claims(struct domain *d, unsigned long release)
* domain build and d is destroyed if the build fails, this has no significance.
*/
static inline
-void consume_outstanding_claims(struct domain *d, unsigned long allocation)
+void consume_outstanding_claims(struct domain *d, unsigned long allocation,
+ nodeid_t alloc_node)
{
if ( !d || !d->outstanding_pages )
return;
@@ -564,14 +642,41 @@ void consume_outstanding_claims(struct domain *d, unsigned long allocation)
/* Of course, the domain can only release up its outstanding claims */
allocation = min(allocation, d->outstanding_pages + 0UL);
+
+ if ( d->claim_node != NUMA_NO_NODE && d->claim_node != alloc_node )
+ {
+ /*
+ * The domain has a claim on a node, but the alloc is on a different
+ * node. If it would exceed the domain's max_pages, reduce the claim
+ * up to the excess over max_pages so we don't reduce the claim more
+ * than we have to to honor the max_pages limit.
+ */
+ unsigned long booked_pages = domain_tot_pages(d) + allocation +
+ d->outstanding_pages;
+ if ( booked_pages <= d->max_pages )
+ return; /* booked is within max_pages, no excess, keep the claim */
+
+ /* Excess detected, release the exceeding pages from the claimed node */
+ allocation = min(allocation, booked_pages - d->max_pages);
+ }
release_outstanding_claims(d, allocation);
}
-int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
+/*
+ * Update outstanding claims for the domain. Note: The node is passed as an
+ * unsigned int to allow checking for overflow above the uint8_t nodeid_t limit.
+ */
+int domain_set_outstanding_pages(struct domain *d, unsigned long pages,
+ unsigned int node)
{
int ret = -ENOMEM;
unsigned long claim, avail_pages;
+ /* When releasing a claim, the node must be NUMA_NO_NODE (it is not used) */
+ if ( pages == 0 && node != NUMA_NO_NODE )
+ return -EINVAL;
+ if ( node != NUMA_NO_NODE && (node >= MAX_NUMNODES || !node_online(node)) )
+ return -ENOENT;
/*
* Two locks are needed here:
* - d->page_alloc_lock: protects accesses to d->{tot,max,extra}_pages.
@@ -604,9 +709,12 @@ int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
}
/* how much memory is available? */
- avail_pages = total_avail_pages;
-
- avail_pages -= outstanding_claims;
+ if ( node == NUMA_NO_NODE )
+ avail_pages = available_after_claims(total_avail_pages,
+ outstanding_claims);
+ else
+ avail_pages = available_after_claims(node_avail_pages(node),
+ node_outstanding_claims(node));
/*
* Note, if domain has already allocated memory before making a claim
@@ -619,6 +727,11 @@ int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
/* yay, claim fits in available memory, stake the claim, success! */
d->outstanding_pages = claim;
outstanding_claims += d->outstanding_pages;
+ if ( node != NUMA_NO_NODE )
+ {
+ node_outstanding_claims(node) += claim;
+ d->claim_node = node;
+ }
ret = 0;
out:
@@ -953,6 +1066,13 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
*/
for ( ; ; )
{
+ /*
+ * Claimed memory is considered unavailable unless the request
+ * is made by a domain with sufficient unclaimed pages.
+ */
+ if ( !node_allocatable_request(d, memflags, (1UL << order), node) )
+ goto try_next_node;
+
zone = zone_hi;
do {
/* Check if target node can support the allocation. */
@@ -982,6 +1102,8 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
}
} while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
+ try_next_node:
+ /* If MEMF_exact_node was passed, we may not skip to a different node */
if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE )
return NULL;
@@ -1042,13 +1164,8 @@ static struct page_info *alloc_heap_pages(
spin_lock(&heap_lock);
- /*
- * Claimed memory is considered unavailable unless the request
- * is made by a domain with sufficient unclaimed pages.
- */
- if ( (outstanding_claims + request > total_avail_pages) &&
- ((memflags & MEMF_no_refcount) ||
- !d || d->outstanding_pages < request) )
+ /* Proceed if host-level memory and claims permit this request to proceed */
+ if ( !host_allocatable_request(d, memflags, request) )
{
spin_unlock(&heap_lock);
return NULL;
@@ -1099,7 +1216,7 @@ static struct page_info *alloc_heap_pages(
total_avail_pages -= request;
if ( !(memflags & MEMF_no_refcount) )
- consume_outstanding_claims(d, request);
+ consume_outstanding_claims(d, request, node);
check_low_mem_virq();
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index d80bfba6d393..6e589a5b6389 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -65,6 +65,7 @@
#include <xen/compiler.h>
#include <xen/mm-frame.h>
#include <xen/mm-types.h>
+#include <xen/numa.h>
#include <xen/types.h>
#include <xen/list.h>
#include <xen/spinlock.h>
@@ -131,7 +132,8 @@ int populate_pt_range(unsigned long virt, unsigned long nr_mfns);
/* Claim handling */
unsigned long __must_check domain_adjust_tot_pages(struct domain *d,
long pages);
-int domain_set_outstanding_pages(struct domain *d, unsigned long pages);
+int domain_set_outstanding_pages(struct domain *d, unsigned long pages,
+ unsigned int node);
void get_outstanding_claims(uint64_t *free_pages, uint64_t *outstanding_pages);
/* Domain suballocator. These functions are *not* interrupt-safe.*/
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 40a35fc15c65..7f1654afbc7c 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -421,6 +421,7 @@ struct domain
unsigned int outstanding_pages;
unsigned int max_pages; /* maximum value for domain_tot_pages() */
unsigned int extra_pages; /* pages not included in domain_tot_pages() */
+ nodeid_t claim_node; /* NUMA_NO_NODE for host-wide claims */
#ifdef CONFIG_MEM_SHARING
atomic_t shr_pages; /* shared pages */
--
2.39.5
* [PATCH v4 04/10] xen/page_alloc: Consolidate per-node counters into avail[] array
2026-02-26 14:29 [PATCH v4 0/10] xen: Add NUMA-aware memory claims for domains Bernhard Kaindl
` (2 preceding siblings ...)
2026-02-26 14:29 ` [PATCH v4 03/10] xen/page_alloc: Implement NUMA-node-specific claims Bernhard Kaindl
@ 2026-02-26 14:29 ` Bernhard Kaindl
2026-02-26 14:29 ` [PATCH v4 05/10] xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness Bernhard Kaindl
` (6 subsequent siblings)
10 siblings, 0 replies; 35+ messages in thread
From: Bernhard Kaindl @ 2026-02-26 14:29 UTC
To: xen-devel
Cc: Bernhard Kaindl, Andrew Cooper, Anthony PERARD, Michal Orzel,
Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini
Replace the static node_avail_pages[] and node_outstanding_claims[]
arrays with two extra entries in the per-node avail[] array:
avail[node][AVAIL_NODE_TOTAL] - total free pages on this node
avail[node][AVAIL_NODE_CLAIMS] - outstanding claims on this node
This eliminates two MAX_NUMNODES-sized static arrays by extending the
dynamically allocated avail[] from NR_ZONES to NR_AVAIL_ENTRIES
(NR_ZONES + 2) per node. The node_avail_pages() and
node_outstanding_claims() accessor macros now index into avail[],
keeping all call sites unchanged.
Placing the per-node totals and claims adjacent to the per-zone
counters also improves cache locality: the allocator already touches
avail[node][zone] in get_free_buddy(), so the node-level counters
checked by node_allocatable_request() are likely warm in the same
cache line, avoiding the extra line fetch the separate static arrays
would have required.
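The resulting layout can be sketched outside of Xen as follows. This is only
an illustration: the SKETCH_* constants mirror the AVAIL_NODE_TOTAL,
AVAIL_NODE_CLAIMS and NR_AVAIL_ENTRIES macros of this patch, but the zone
count of 4 and the calloc()-based allocation are assumptions made to keep
the example self-contained.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_NR_ZONES    4                     /* hypothetical zone count */
#define SKETCH_NODE_TOTAL  SKETCH_NR_ZONES       /* total free pages slot */
#define SKETCH_NODE_CLAIMS (SKETCH_NR_ZONES + 1) /* outstanding claims slot */
#define SKETCH_NR_ENTRIES  (SKETCH_NR_ZONES + 2) /* zones + 2 extra slots */

/* One contiguous, zero-initialised counter array per node. */
static unsigned long *sketch_alloc_node_counters(void)
{
    return calloc(SKETCH_NR_ENTRIES, sizeof(unsigned long));
}

/* Unclaimed free pages of a node, read from the two extra slots. */
static unsigned long sketch_unclaimed(const unsigned long *avail_node)
{
    assert(avail_node[SKETCH_NODE_CLAIMS] <= avail_node[SKETCH_NODE_TOTAL]);
    return avail_node[SKETCH_NODE_TOTAL] - avail_node[SKETCH_NODE_CLAIMS];
}
```

Because the two extra slots directly follow the per-zone counters, a reader
of avail_node[zone] and a reader of the node totals touch neighbouring
memory, which is the locality argument made above.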
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
---
xen/common/page_alloc.c | 33 +++++++++++++++++++++------------
1 file changed, 21 insertions(+), 12 deletions(-)
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 6fc7d4cb9d40..e844c0ecf637 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -485,6 +485,18 @@ static unsigned long node_need_scrub[MAX_NUMNODES];
/* avail[node][zone] is the number of free pages on that node and zone. */
static unsigned long *avail[MAX_NUMNODES];
+/*
+ * The avail[] array has NR_ZONES entries for per-zone free page counts,
+ * plus two extra entries above NR_ZONES:
+ * avail[node][AVAIL_NODE_TOTAL] - total free pages on this node
+ * avail[node][AVAIL_NODE_CLAIMS] - outstanding claims on this node
+ * This replaces the former static node_avail_pages[] and
+ * node_outstanding_claims[] arrays.
+ */
+#define AVAIL_NODE_TOTAL NR_ZONES
+#define AVAIL_NODE_CLAIMS (NR_ZONES + 1)
+#define NR_AVAIL_ENTRIES (NR_ZONES + 2)
+
/* Global available pages, updated in real-time, protected by heap_lock */
static unsigned long total_avail_pages;
@@ -507,8 +519,7 @@ static DEFINE_SPINLOCK(heap_lock);
* allocator is already serialized on it. The accessor macro abstracts the
* storage to ease future changes (e.g. moving to per-node lock granularity).
*/
-#define node_avail_pages(node) (node_avail_pages[node])
-static unsigned long node_avail_pages[MAX_NUMNODES];
+#define node_avail_pages(node) (avail[node][AVAIL_NODE_TOTAL])
/* total outstanding claims by all domains */
static unsigned long outstanding_claims;
@@ -522,9 +533,7 @@ static unsigned long outstanding_claims;
* a node, which are subtracted from the node's available pages to determine if
* a request can be satisfied without violating the node's memory availability.
*/
-#define node_outstanding_claims(node) (node_outstanding_claims[node])
-/* total outstanding claims by all domains on node */
-static unsigned long node_outstanding_claims[MAX_NUMNODES];
+#define node_outstanding_claims(node) (avail[node][AVAIL_NODE_CLAIMS])
/* Return available pages after subtracting claimed pages */
static inline unsigned long available_after_claims(unsigned long avail_pages,
@@ -762,9 +771,9 @@ static unsigned long init_node_heap(int node, unsigned long mfn,
{
/* First node to be discovered has its heap metadata statically alloced. */
static heap_by_zone_and_order_t _heap_static;
- static unsigned long avail_static[NR_ZONES];
+ static unsigned long avail_static[NR_AVAIL_ENTRIES];
unsigned long needed = (sizeof(**_heap) +
- sizeof(**avail) * NR_ZONES +
+ sizeof(**avail) * NR_AVAIL_ENTRIES +
PAGE_SIZE - 1) >> PAGE_SHIFT;
int i, j;
@@ -782,7 +791,7 @@ static unsigned long init_node_heap(int node, unsigned long mfn,
{
_heap[node] = mfn_to_virt(mfn + nr - needed);
avail[node] = mfn_to_virt(mfn + nr - 1) +
- PAGE_SIZE - sizeof(**avail) * NR_ZONES;
+ PAGE_SIZE - sizeof(**avail) * NR_AVAIL_ENTRIES;
}
else if ( nr >= needed &&
arch_mfns_in_directmap(mfn, needed) &&
@@ -791,7 +800,7 @@ static unsigned long init_node_heap(int node, unsigned long mfn,
{
_heap[node] = mfn_to_virt(mfn);
avail[node] = mfn_to_virt(mfn + needed - 1) +
- PAGE_SIZE - sizeof(**avail) * NR_ZONES;
+ PAGE_SIZE - sizeof(**avail) * NR_AVAIL_ENTRIES;
*use_tail = false;
}
else if ( get_order_from_bytes(sizeof(**_heap)) ==
@@ -800,18 +809,18 @@ static unsigned long init_node_heap(int node, unsigned long mfn,
_heap[node] = alloc_xenheap_pages(get_order_from_pages(needed), 0);
BUG_ON(!_heap[node]);
avail[node] = (void *)_heap[node] + (needed << PAGE_SHIFT) -
- sizeof(**avail) * NR_ZONES;
+ sizeof(**avail) * NR_AVAIL_ENTRIES;
needed = 0;
}
else
{
_heap[node] = xmalloc(heap_by_zone_and_order_t);
- avail[node] = xmalloc_array(unsigned long, NR_ZONES);
+ avail[node] = xmalloc_array(unsigned long, NR_AVAIL_ENTRIES);
BUG_ON(!_heap[node] || !avail[node]);
needed = 0;
}
- memset(avail[node], 0, NR_ZONES * sizeof(long));
+ memset(avail[node], 0, NR_AVAIL_ENTRIES * sizeof(long));
for ( i = 0; i < NR_ZONES; i++ )
for ( j = 0; j <= MAX_ORDER; j++ )
--
2.39.5
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v4 05/10] xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness
2026-02-26 14:29 [PATCH v4 0/10] xen: Add NUMA-aware memory claims for domains Bernhard Kaindl
` (3 preceding siblings ...)
2026-02-26 14:29 ` [PATCH v4 04/10] xen/page_alloc: Consolidate per-node counters into avail[] array Bernhard Kaindl
@ 2026-02-26 14:29 ` Bernhard Kaindl
2026-02-26 21:19 ` Teddy Astie
` (3 more replies)
2026-02-26 14:29 ` [PATCH v4 06/10] xsm/flask: Add XEN_DOMCTL_claim_memory to flask Bernhard Kaindl
` (5 subsequent siblings)
10 siblings, 4 replies; 35+ messages in thread
From: Bernhard Kaindl @ 2026-02-26 14:29 UTC (permalink / raw)
To: xen-devel
Cc: Bernhard Kaindl, Andrew Cooper, Anthony PERARD, Michal Orzel,
Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini, Daniel P. Smith
Add a DOMCTL handler for claiming memory with NUMA awareness. It
rejects claims when LLC coloring is enabled, because the colored
allocator does not handle claims, and translates the public constant
XEN_DOMCTL_CLAIM_MEMORY_NO_NODE to the internal NUMA_NO_NODE.
The request is forwarded to domain_set_outstanding_pages() for the
actual claim processing. The handler uses the same XSM hook as the
legacy XENMEM_claim_pages hypercall.
While the underlying infrastructure currently supports only a single
claim, the public hypercall interface is designed to be extensible for
multiple claims in the future without breaking the API.
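The order of the checks can be sketched as standalone C. This is a
model under stated assumptions, not the handler itself: the function
check_claim_request(), the struct claim mirror and the value used for
INTERNAL_NO_NODE are hypothetical stand-ins, and the copy from guest
memory is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CLAIM_NO_NODE    0xFFFFFFFFU /* mirrors XEN_DOMCTL_CLAIM_MEMORY_NO_NODE */
#define INTERNAL_NO_NODE 0xFFU       /* stand-in for NUMA_NO_NODE; value assumed */

/* Local mirror of struct xen_memory_claim for this sketch. */
struct claim {
    uint64_t pages;
    uint32_t node;
    uint32_t pad;
};

/*
 * Hypothetical model of the validation order in claim_memory():
 * LLC coloring first, then the request header, then the claim itself.
 */
static int check_claim_request(bool llc_coloring, bool dying,
                               uint32_t nr_claims, uint32_t req_pad,
                               struct claim *c)
{
    assert(c != NULL);

    if ( llc_coloring )
        return -EOPNOTSUPP;     /* colored allocator ignores claims */

    if ( req_pad || nr_claims != 1 || dying )
        return -EINVAL;         /* only a single claim is accepted for now */

    if ( c->pad )
        return -EINVAL;         /* padding must be zero on input */

    if ( c->node == CLAIM_NO_NODE )
        c->node = INTERNAL_NO_NODE; /* host-wide claim */

    return 0;
}
```

Rejecting nr_claims != 1 up front is what lets a later version accept
larger arrays without changing the structure layout.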
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
---
xen/common/domain.c | 29 ++++++++++++++++++++++++++++
xen/common/domctl.c | 9 +++++++++
xen/include/public/domctl.h | 38 +++++++++++++++++++++++++++++++++++++
xen/include/xen/domain.h | 2 ++
4 files changed, 78 insertions(+)
diff --git a/xen/common/domain.c b/xen/common/domain.c
index e7861259a2b3..ac1b091f5574 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -268,6 +268,35 @@ int get_domain_state(struct xen_domctl_get_domain_state *info, struct domain *d,
return rc;
}
+/* Claim memory for a domain or reset the claim */
+int claim_memory(struct domain *d, const struct xen_domctl_claim_memory *uinfo)
+{
+ memory_claim_t claim;
+
+ /* alloc_color_heap_page() does not handle claims, so reject LLC coloring */
+ if ( llc_coloring_enabled )
+ return -EOPNOTSUPP;
+ /*
+ * We only support single claims at the moment, and if the domain is
+ * dying (d->is_dying is set), its claims have already been released
+ */
+ if ( uinfo->pad || uinfo->nr_claims != 1 || d->is_dying )
+ return -EINVAL;
+
+ if ( copy_from_guest(&claim, uinfo->claims, 1) )
+ return -EFAULT;
+
+ if ( claim.pad )
+ return -EINVAL;
+
+ /* Convert the API tag for a host-wide claim to the NUMA_NO_NODE constant */
+ if ( claim.node == XEN_DOMCTL_CLAIM_MEMORY_NO_NODE )
+ claim.node = NUMA_NO_NODE;
+
+ /* NB. domain_set_outstanding_pages() has the checks to validate its args */
+ return domain_set_outstanding_pages(d, claim.pages, claim.node);
+}
+
static void __domain_finalise_shutdown(struct domain *d)
{
struct vcpu *v;
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 29a7726d32d0..9e858f631aaf 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -868,6 +868,15 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
ret = get_domain_state(&op->u.get_domain_state, d, &op->domain);
break;
+ case XEN_DOMCTL_claim_memory:
+ /* Use the same XSM hook as XENMEM_claim_pages */
+ ret = xsm_claim_pages(XSM_PRIV, d);
+ if ( ret )
+ break;
+
+ ret = claim_memory(d, &op->u.claim_memory);
+ break;
+
default:
ret = arch_do_domctl(op, d, u_domctl);
break;
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 8f6708c0a7cd..610806c8b6e0 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1276,6 +1276,42 @@ struct xen_domctl_get_domain_state {
uint64_t unique_id; /* Unique domain identifier. */
};
+/*
+ * XEN_DOMCTL_claim_memory
+ *
+ * Claim memory for a guest domain. The claimed memory is converted into actual
+ * memory pages by allocating it. Except for the option to pass claims for
+ * multiple NUMA nodes, the semantics are based on host-wide claims as
+ * provided by XENMEM_claim_pages, and are identical for host-wide claims.
+ *
+ * The initial implementation supports a claim for the host or a NUMA node, but
+ * using an array, the API is designed to be extensible to support more claims.
+ */
+struct xen_memory_claim {
+ uint64_aligned_t pages; /* Amount of pages to be allotted to the domain */
+ uint32_t node; /* NUMA node, or XEN_DOMCTL_CLAIM_MEMORY_NO_NODE for host */
+ uint32_t pad; /* padding for alignment, set to 0 on input */
+};
+typedef struct xen_memory_claim memory_claim_t;
+#define XEN_DOMCTL_CLAIM_MEMORY_NO_NODE 0xFFFFFFFF /* No node: host claim */
+
+/* Use XEN_NODE_CLAIM_INIT to initialize a memory_claim_t structure */
+#define XEN_NODE_CLAIM_INIT(_pages, _node) { \
+ .pages = (_pages), \
+ .node = (_node), \
+ .pad = 0 \
+}
+DEFINE_XEN_GUEST_HANDLE(memory_claim_t);
+
+struct xen_domctl_claim_memory {
+ /* IN: array of struct xen_memory_claim */
+ XEN_GUEST_HANDLE_64(memory_claim_t) claims;
+ /* IN: number of claims in the claims array handle. See the claims field. */
+ uint32_t nr_claims;
+#define XEN_DOMCTL_MAX_CLAIMS UINT8_MAX /* More claims require changes in Xen */
+ uint32_t pad; /* padding for alignment, set it to 0 */
+};
+
struct xen_domctl {
/* Stable domctl ops: interface_version is required to be 0. */
uint32_t cmd;
@@ -1368,6 +1404,7 @@ struct xen_domctl {
#define XEN_DOMCTL_gsi_permission 88
#define XEN_DOMCTL_set_llc_colors 89
#define XEN_DOMCTL_get_domain_state 90 /* stable interface */
+#define XEN_DOMCTL_claim_memory 91
#define XEN_DOMCTL_gdbsx_guestmemio 1000
#define XEN_DOMCTL_gdbsx_pausevcpu 1001
#define XEN_DOMCTL_gdbsx_unpausevcpu 1002
@@ -1436,6 +1473,7 @@ struct xen_domctl {
#endif
struct xen_domctl_set_llc_colors set_llc_colors;
struct xen_domctl_get_domain_state get_domain_state;
+ struct xen_domctl_claim_memory claim_memory;
uint8_t pad[128];
} u;
};
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index 93c0fd00c1d7..79e8932c4530 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -193,4 +193,6 @@ extern bool vmtrace_available;
extern bool vpmu_is_available;
+int claim_memory(struct domain *d, const struct xen_domctl_claim_memory *uinfo);
+
#endif /* __XEN_DOMAIN_H__ */
--
2.39.5
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v4 06/10] xsm/flask: Add XEN_DOMCTL_claim_memory to flask
2026-02-26 14:29 [PATCH v4 0/10] xen: Add NUMA-aware memory claims for domains Bernhard Kaindl
` (4 preceding siblings ...)
2026-02-26 14:29 ` [PATCH v4 05/10] xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness Bernhard Kaindl
@ 2026-02-26 14:29 ` Bernhard Kaindl
2026-03-05 13:42 ` Jan Beulich
2026-02-26 14:29 ` [PATCH v4 07/10] tools/lib/ctrl/xc: Add xc_domain_claim_memory() to libxenctrl Bernhard Kaindl
` (4 subsequent siblings)
10 siblings, 1 reply; 35+ messages in thread
From: Bernhard Kaindl @ 2026-02-26 14:29 UTC (permalink / raw)
To: xen-devel; +Cc: Bernhard Kaindl, Daniel P. Smith, Anthony PERARD
Add a Flask security policy for the new XEN_DOMCTL_claim_memory hypercall
introduced in the previous commit. When Flask is enabled, this permission
controls whether a domain can stake memory claims for another domain.
The permission is granted to:
- dom0_t: Dom0 needs this to claim memory for guest domains
- create_domain_common: Domain builders need this during domain creation
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
---
tools/flask/policy/modules/dom0.te | 1 +
tools/flask/policy/modules/xen.if | 1 +
xen/xsm/flask/hooks.c | 3 +++
xen/xsm/flask/policy/access_vectors | 2 ++
4 files changed, 7 insertions(+)
diff --git a/tools/flask/policy/modules/dom0.te b/tools/flask/policy/modules/dom0.te
index d30edf8be1fb..f5c330d01cec 100644
--- a/tools/flask/policy/modules/dom0.te
+++ b/tools/flask/policy/modules/dom0.te
@@ -103,6 +103,7 @@ allow dom0_t dom0_t:domain2 {
get_cpu_policy
dt_overlay
get_domain_state
+ claim_memory
};
allow dom0_t dom0_t:resource {
add
diff --git a/tools/flask/policy/modules/xen.if b/tools/flask/policy/modules/xen.if
index ef7d8f438c65..8e2dceb505cd 100644
--- a/tools/flask/policy/modules/xen.if
+++ b/tools/flask/policy/modules/xen.if
@@ -98,6 +98,7 @@ define(`create_domain_common', `
vuart_op
set_llc_colors
get_domain_state
+ claim_memory
};
allow $1 $2:security check_context;
allow $1 $2:shadow enable;
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index b250b2706535..0cc04ada82a9 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -820,6 +820,9 @@ static int cf_check flask_domctl(struct domain *d, unsigned int cmd,
case XEN_DOMCTL_set_llc_colors:
return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__SET_LLC_COLORS);
+ case XEN_DOMCTL_claim_memory:
+ return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__CLAIM_MEMORY);
+
default:
return avc_unknown_permission("domctl", cmd);
}
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index ce907d50a45e..2c9337f7a145 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -255,6 +255,8 @@ class domain2
set_llc_colors
# XEN_DOMCTL_get_domain_state
get_domain_state
+# XEN_DOMCTL_claim_memory
+ claim_memory
}
# Similar to class domain, but primarily contains domctls related to HVM domains
--
2.39.5
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v4 07/10] tools/lib/ctrl/xc: Add xc_domain_claim_memory() to libxenctrl
2026-02-26 14:29 [PATCH v4 0/10] xen: Add NUMA-aware memory claims for domains Bernhard Kaindl
` (5 preceding siblings ...)
2026-02-26 14:29 ` [PATCH v4 06/10] xsm/flask: Add XEN_DOMCTL_claim_memory to flask Bernhard Kaindl
@ 2026-02-26 14:29 ` Bernhard Kaindl
2026-02-26 14:29 ` [PATCH v4 08/10] tools/ocaml/libs/xc: add OCaml domain_claim_memory binding Bernhard Kaindl
` (3 subsequent siblings)
10 siblings, 0 replies; 35+ messages in thread
From: Bernhard Kaindl @ 2026-02-26 14:29 UTC (permalink / raw)
To: xen-devel; +Cc: Bernhard Kaindl, Anthony PERARD, Juergen Gross
Add a libxc function for the new XEN_DOMCTL_claim_memory hypercall.
It supports node-specific claims and host-wide claims.
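A caller might build the two kinds of claims roughly as follows. The
sketch mirrors the public definitions locally so that it compiles
without Xen headers; host_wide_claim() and node_claim() are
hypothetical helpers, not part of libxenctrl.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirrors of the definitions added to public/domctl.h. */
struct xen_memory_claim {
    uint64_t pages; /* number of pages to claim */
    uint32_t node;  /* NUMA node, or the NO_NODE tag for a host-wide claim */
    uint32_t pad;   /* must be zero */
};
typedef struct xen_memory_claim memory_claim_t;

#define XEN_DOMCTL_CLAIM_MEMORY_NO_NODE 0xFFFFFFFFU
#define XEN_NODE_CLAIM_INIT(_pages, _node) { \
    .pages = (_pages),                       \
    .node = (_node),                         \
    .pad = 0                                 \
}

/* The hypercall passes an array of these, so the size should be fixed. */
_Static_assert(sizeof(memory_claim_t) == 16, "unexpected claim layout");

/* Hypothetical helper: a claim that is not tied to any NUMA node. */
static memory_claim_t host_wide_claim(uint64_t pages)
{
    memory_claim_t c =
        XEN_NODE_CLAIM_INIT(pages, XEN_DOMCTL_CLAIM_MEMORY_NO_NODE);
    return c;
}

/* Hypothetical helper: a claim against one specific NUMA node. */
static memory_claim_t node_claim(uint64_t pages, uint32_t node)
{
    memory_claim_t c = XEN_NODE_CLAIM_INIT(pages, node);
    return c;
}
```

With the real headers, such a claim would be passed as
xc_domain_claim_memory(xch, domid, 1, &claim), following the prototype
added to xenctrl.h in this patch.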
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
---
tools/include/xenctrl.h | 4 ++++
tools/libs/ctrl/xc_domain.c | 27 +++++++++++++++++++++++++++
2 files changed, 31 insertions(+)
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index d5dbf69c8968..a0a9f2143b32 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -2659,6 +2659,10 @@ int xc_domain_set_llc_colors(xc_interface *xch, uint32_t domid,
const uint32_t *llc_colors,
uint32_t num_llc_colors);
+int xc_domain_claim_memory(xc_interface *xch, uint32_t domid,
+ uint32_t nr_claims,
+ memory_claim_t *claims);
+
#if defined(__arm__) || defined(__aarch64__)
int xc_dt_overlay(xc_interface *xch, void *overlay_fdt,
uint32_t overlay_fdt_size, uint8_t overlay_op);
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index 01c0669c8863..685efc03d295 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1070,6 +1070,33 @@ int xc_domain_remove_from_physmap(xc_interface *xch,
return xc_memory_op(xch, XENMEM_remove_from_physmap, &xrfp, sizeof(xrfp));
}
+/* Claim the guest memory for a domain before starting the domain build */
+int xc_domain_claim_memory(xc_interface *xch,
+ uint32_t domid,
+ uint32_t nr_claims,
+ memory_claim_t *claims)
+{
+ struct xen_domctl domctl = {};
+ DECLARE_HYPERCALL_BOUNCE(claims, sizeof(*claims) * nr_claims,
+ XC_HYPERCALL_BUFFER_BOUNCE_IN);
+ int ret;
+
+ if ( xc_hypercall_bounce_pre(xch, claims) )
+ return -1;
+
+ domctl.cmd = XEN_DOMCTL_claim_memory;
+ domctl.domain = domid;
+ domctl.u.claim_memory.nr_claims = nr_claims;
+ set_xen_guest_handle(domctl.u.claim_memory.claims, claims);
+
+ ret = do_domctl(xch, &domctl);
+
+ xc_hypercall_bounce_post(xch, claims);
+
+ return ret;
+}
+
+/* Legacy function for claiming pages, replaced by xc_domain_claim_memory() */
int xc_domain_claim_pages(xc_interface *xch,
uint32_t domid,
unsigned long nr_pages)
--
2.39.5
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v4 08/10] tools/ocaml/libs/xc: add OCaml domain_claim_memory binding
2026-02-26 14:29 [PATCH v4 0/10] xen: Add NUMA-aware memory claims for domains Bernhard Kaindl
` (6 preceding siblings ...)
2026-02-26 14:29 ` [PATCH v4 07/10] tools/lib/ctrl/xc: Add xc_domain_claim_memory() to libxenctrl Bernhard Kaindl
@ 2026-02-26 14:29 ` Bernhard Kaindl
2026-02-26 14:29 ` [PATCH v4 09/10] tools/tests: Update the claims test to test claim_memory hypercall Bernhard Kaindl
` (2 subsequent siblings)
10 siblings, 0 replies; 35+ messages in thread
From: Bernhard Kaindl @ 2026-02-26 14:29 UTC (permalink / raw)
To: xen-devel; +Cc: Bernhard Kaindl, Christian Lindig, David Scott, Anthony PERARD
Add OCaml bindings for xc_domain_claim_memory(), for using the
XEN_DOMCTL_claim_memory hypercall from OCaml. This allows OCaml
toolstacks to place NUMA-aware memory claims for domains as well
as host-wide claims.
tools/ocaml/libs/xc/xenctrl.ml/mli:
- Add claim record type and domain_claim_memory external.
tools/ocaml/libs/xc/xenctrl_stubs.c:
- Validate claim count and arguments.
- Marshal the OCaml claim array to memory_claim_t[].
- Map node = -1 to XEN_DOMCTL_CLAIM_MEMORY_NO_NODE.
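The validation and mapping done by the stub can be modelled without
the OCaml runtime. In this sketch, marshal_claim() and the struct
claim mirror are hypothetical names; the real stub raises an OCaml
exception where this model returns -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLAIM_NO_NODE 0xFFFFFFFFU /* mirrors XEN_DOMCTL_CLAIM_MEMORY_NO_NODE */

/* Local mirror of memory_claim_t for this sketch. */
struct claim {
    uint64_t pages;
    uint32_t node;
    uint32_t pad;
};

/*
 * Hypothetical model of the per-element marshalling: OCaml hands over
 * signed values, so negative pages and nodes below -1 are rejected,
 * and node == -1 becomes the host-wide tag.
 * Returns 0 on success, -1 on invalid input.
 */
static int marshal_claim(int64_t pages, int32_t node, struct claim *out)
{
    assert(out != NULL);

    if ( pages < 0 || node < -1 )
        return -1;

    out->pages = (uint64_t)pages;
    out->node = (node == -1) ? CLAIM_NO_NODE : (uint32_t)node;
    out->pad = 0;

    return 0;
}
```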
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
---
tools/ocaml/libs/xc/xenctrl.ml | 11 ++++++++
tools/ocaml/libs/xc/xenctrl.mli | 11 ++++++++
tools/ocaml/libs/xc/xenctrl_stubs.c | 43 +++++++++++++++++++++++++++++
3 files changed, 65 insertions(+)
diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
index 97108b9d861a..a1a05dcaede3 100644
--- a/tools/ocaml/libs/xc/xenctrl.ml
+++ b/tools/ocaml/libs/xc/xenctrl.ml
@@ -370,6 +370,17 @@ external domain_deassign_device: handle -> domid -> (int * int * int * int) -> u
external domain_test_assign_device: handle -> domid -> (int * int * int * int) -> bool
= "stub_xc_domain_test_assign_device"
+(* OCaml binding for xc_domain_claim_memory(): claim pages for a domain,
+ optionally per NUMA node (node = -1 means no specific node). *)
+
+type claim =
+ {
+ pages: int64; (* Number of pages to claim *)
+ node: int32; (* NUMA node ID, or -1 for no specific node *)
+ }
+external domain_claim_memory: handle -> domid -> claim array -> unit
+ = "stub_xc_domain_claim_memory"
+
external version: handle -> version = "stub_xc_version_version"
external version_compile_info: handle -> compile_info
= "stub_xc_version_compile_info"
diff --git a/tools/ocaml/libs/xc/xenctrl.mli b/tools/ocaml/libs/xc/xenctrl.mli
index 9fccb2c2c287..1781c89258fe 100644
--- a/tools/ocaml/libs/xc/xenctrl.mli
+++ b/tools/ocaml/libs/xc/xenctrl.mli
@@ -297,6 +297,17 @@ external domain_deassign_device: handle -> domid -> (int * int * int * int) -> u
external domain_test_assign_device: handle -> domid -> (int * int * int * int) -> bool
= "stub_xc_domain_test_assign_device"
+(* OCaml binding for xc_domain_claim_memory(): claim pages for a domain,
+ optionally per NUMA node (node = -1 means no specific node). *)
+
+type claim =
+ {
+ pages: int64; (* Number of pages to claim *)
+ node: int32; (* NUMA node ID, or -1 for no specific node *)
+ }
+external domain_claim_memory: handle -> domid -> claim array -> unit
+ = "stub_xc_domain_claim_memory"
+
external version : handle -> version = "stub_xc_version_version"
external version_compile_info : handle -> compile_info
= "stub_xc_version_compile_info"
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index c55f73b265b2..a77d7dac58e8 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1435,6 +1435,49 @@ CAMLprim value stub_xc_watchdog(value xch_val, value domid, value timeout)
CAMLreturn(Val_int(ret));
}
+CAMLprim value stub_xc_domain_claim_memory(value xch_val, value domid,
+ value claims)
+{
+ CAMLparam3(xch_val, domid, claims);
+ xc_interface *xch = xch_of_val(xch_val);
+ mlsize_t nr_claims = Wosize_val(claims);
+ memory_claim_t *claim;
+ int retval;
+
+ if (nr_claims > XEN_DOMCTL_MAX_CLAIMS)
+ caml_invalid_argument("domain_claim_memory: too many claims");
+
+ claim = calloc(nr_claims, sizeof(*claim));
+ if (claim == NULL && nr_claims != 0)
+ caml_raise_out_of_memory();
+
+ for (mlsize_t i = 0; i < nr_claims; i++) {
+ value claim_rec = Field(claims, i);
+ int64_t pages = Int64_val(Field(claim_rec, 0));
+ int32_t node = Int32_val(Field(claim_rec, 1));
+ uint32_t c_node;
+
+ if (pages < 0 || node < -1 ) {
+ free(claim);
+ caml_invalid_argument("domain_claim_memory: invalid pages or node");
+ }
+
+ if (node == -1)
+ c_node = XEN_DOMCTL_CLAIM_MEMORY_NO_NODE;
+ else
+ c_node = node;
+
+ claim[i] = (memory_claim_t)XEN_NODE_CLAIM_INIT((uint64_t)pages, c_node);
+ }
+
+ retval = xc_domain_claim_memory(xch, Int_val(domid), nr_claims, claim);
+ free(claim);
+ if (retval < 0)
+ failwith_xc(xch);
+
+ CAMLreturn(Val_unit);
+}
+
/*
* Local variables:
* indent-tabs-mode: t
--
2.39.5
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v4 09/10] tools/tests: Update the claims test to test claim_memory hypercall
2026-02-26 14:29 [PATCH v4 0/10] xen: Add NUMA-aware memory claims for domains Bernhard Kaindl
` (7 preceding siblings ...)
2026-02-26 14:29 ` [PATCH v4 08/10] tools/ocaml/libs/xc: add OCaml domain_claim_memory binding Bernhard Kaindl
@ 2026-02-26 14:29 ` Bernhard Kaindl
2026-02-26 14:29 ` [PATCH v4 10/10] docs/guest-guide: document the memory claim hypercalls Bernhard Kaindl
2026-03-04 16:07 ` [PATCH v4 0/10] xen: Add NUMA-aware memory claims for domains Jan Beulich
10 siblings, 0 replies; 35+ messages in thread
From: Bernhard Kaindl @ 2026-02-26 14:29 UTC (permalink / raw)
To: xen-devel; +Cc: Bernhard Kaindl, Anthony PERARD
Extend the existing mem-claim test to verify both the legacy
XENMEM_claim_pages and the new XEN_DOMCTL_claim_memory hypercalls.
It tests both host-wide claims (NUMA_NO_NODE) and node-specific
claims (assuming at least one NUMA node, node 0, is present)
to ensure the new infrastructure works correctly.
It also checks the protection of host- and node-claims against
allocations without sufficient, specific claims.
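The protection check depends on how large the claim is relative to the
free memory. A standalone sketch of that arithmetic, with
claim_size_for_test() as a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper modelling the sizing used by the test: claim all
 * free pages except (request_pages - 1), so that an allocation of
 * request_pages by a domain without a claim cannot be satisfied from
 * the unclaimed remainder.
 * Returns 0 on success, -1 if there is too little free memory.
 */
static int claim_size_for_test(unsigned long free_pages, unsigned int order,
                               unsigned long *claim_pages)
{
    const unsigned long request_pages = 1UL << order;

    assert(claim_pages != NULL);

    if ( free_pages <= request_pages + 1 )
        return -1;

    *claim_pages = free_pages - request_pages + 1;

    return 0;
}
```

With 10000 free pages and order 9 (512 pages), the claim is 9489 pages
and 511 pages stay unclaimed, one short of what the second domain asks
for.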
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
---
tools/tests/mem-claim/test-mem-claim.c | 277 +++++++++++++++++++++++--
1 file changed, 254 insertions(+), 23 deletions(-)
diff --git a/tools/tests/mem-claim/test-mem-claim.c b/tools/tests/mem-claim/test-mem-claim.c
index ad038e45d188..a98d3e43ff54 100644
--- a/tools/tests/mem-claim/test-mem-claim.c
+++ b/tools/tests/mem-claim/test-mem-claim.c
@@ -2,6 +2,7 @@
#include <err.h>
#include <errno.h>
#include <inttypes.h>
+#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
@@ -20,10 +21,13 @@ static unsigned int nr_failures;
#define MB_PAGES(x) (MB(x) / XC_PAGE_SIZE)
+#define CLAIM_TEST_ORDER 9 /* 2M */
+
static xc_interface *xch;
static uint32_t domid = DOMID_INVALID;
static xc_physinfo_t physinfo;
+static unsigned int claim_test_node;
static struct xen_domctl_createdomain create = {
.flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
@@ -38,10 +42,138 @@ static struct xen_domctl_createdomain create = {
},
};
-static void run_tests(void)
+typedef int (*claim_fn_t)(xc_interface *xch, uint32_t domid,
+ unsigned long pages);
+
+/* Wrapper function to test claiming memory using xc_domain_claim_pages. */
+static int wrap_claim_pages(xc_interface *xch,
+ uint32_t domid,
+ unsigned long pages)
+{
+ return xc_domain_claim_pages(xch, domid, pages);
+}
+
+/* Wrapper function to test claiming memory using xc_domain_claim_memory. */
+static int wrap_claim_memory(xc_interface *xch,
+ uint32_t domid,
+ unsigned long pages)
+{
+ memory_claim_t claim[] = {
+ XEN_NODE_CLAIM_INIT(pages, XEN_DOMCTL_CLAIM_MEMORY_NO_NODE)
+ };
+
+ return xc_domain_claim_memory(xch, domid, 1, claim);
+}
+
+/* Wrapper to test claiming memory using xc_domain_claim_memory on a NUMA node */
+static int wrap_claim_memory_node(xc_interface *xch,
+ uint32_t domid,
+ unsigned long pages)
{
int rc;
+ memory_claim_t claims[UINT8_MAX + 1] = {}; /* + 1 to test overflow check */
+
+ /* claim with a node that is not present */
+ claims[0] = (memory_claim_t)XEN_NODE_CLAIM_INIT(pages, physinfo.nr_nodes);
+ /* Check the return value of claiming memory on an invalid node */
+ rc = xc_domain_claim_memory(xch, domid, 1, claims);
+ if ( rc != -1 || errno != ENOENT )
+ {
+ fail("Expected claim on invalid node to fail with ENOENT\n");
+ return rc;
+ }
+ /*
+ * Check the return value of claiming on two nodes (not yet implemented)
+ * and that the valid claim is rejected when nr_claims > 1. We expect that
+ * the API will reject the call due to exceeding nr_claims before it checks
+ * the validity of the node(s), so we expect EINVAL rather than ENOENT.
+ */
+ rc = xc_domain_claim_memory(xch, domid, 2, claims);
+ if ( rc != -1 || errno != EINVAL )
+ {
+ fail("Expected nr_claims == 2 to fail with EINVAL (for now)\n");
+ return rc;
+
+ }
+ /* Likewise check with nr_claims > UINT8_MAX to test overflow */
+ rc = xc_domain_claim_memory(xch, domid, UINT8_MAX + 1, claims);
+ if ( rc != -1 || errno != EINVAL )
+ {
+ fail("Expected nr_claims = UINT8_MAX + 1 to fail with EINVAL\n");
+ return rc;
+ }
+ /* Likewise check with a node of UINT8_MAX + 1 to test overflow */
+ claims[0].node = UINT8_MAX + 1;
+ rc = xc_domain_claim_memory(xch, domid, 1, claims);
+ if ( rc != -1 || errno != ENOENT )
+ {
+ fail("Expected node == UINT8_MAX + 1 to fail with ENOENT\n");
+ return rc;
+ }
+ /* Test with pages exceeding INT32_MAX to check overflow */
+ claims[0] = (memory_claim_t)XEN_NODE_CLAIM_INIT((unsigned)INT32_MAX + 1, 0);
+ rc = xc_domain_claim_memory(xch, domid, 1, claims);
+ if ( rc != -1 || errno != ENOMEM )
+ {
+ fail("Expected ENOMEM with pages > INT32_MAX\n");
+ return rc;
+ }
+ /* Test with pad not set to zero */
+ claims[0] = (memory_claim_t)XEN_NODE_CLAIM_INIT(pages, claim_test_node);
+ claims[0].pad = 1;
+ rc = xc_domain_claim_memory(xch, domid, 1, claims);
+ if ( rc != -1 || errno != EINVAL )
+ {
+ fail("Expected EINVAL with pad not set to zero\n");
+ return rc;
+ }
+
+ /* Pass a valid claim for the selected node and continue the test */
+ claims[0] = (memory_claim_t)XEN_NODE_CLAIM_INIT(pages, claim_test_node);
+ return xc_domain_claim_memory(xch, domid, 1, claims);
+}
+
+static int get_node_free_pages(unsigned int node, unsigned long *free_pages)
+{
+ int rc;
+ unsigned int num_nodes = 0;
+ xc_meminfo_t *meminfo;
+
+ rc = xc_numainfo(xch, &num_nodes, NULL, NULL);
+ if ( rc )
+ return rc;
+
+ if ( node >= num_nodes )
+ {
+ errno = EINVAL;
+ return -1;
+ }
+
+ meminfo = calloc(num_nodes, sizeof(*meminfo));
+ if ( !meminfo )
+ return -1;
+
+ rc = xc_numainfo(xch, &num_nodes, meminfo, NULL);
+ if ( rc )
+ goto out;
+
+ *free_pages = meminfo[node].memfree / XC_PAGE_SIZE;
+
+ out:
+ free(meminfo);
+ return rc;
+}
+
+static void run_test(claim_fn_t claim_call_wrapper, const char *claim_name,
+ bool host_wide_claim)
+{
+ int rc;
+ uint64_t free_heap_bytes;
+ unsigned long free_pages, claim_pages;
+ const unsigned long request_pages = 1UL << CLAIM_TEST_ORDER;
+
+ printf(" Testing %s\n", claim_name);
/*
* Check that the system is quiescent. Outstanding claims is a global
* field.
@@ -51,7 +183,7 @@ static void run_tests(void)
return fail("Failed to obtain physinfo: %d - %s\n",
errno, strerror(errno));
- printf("Free pages: %"PRIu64", Oustanding claims: %"PRIu64"\n",
+ printf("Free pages: %"PRIu64", Outstanding claims: %"PRIu64"\n",
physinfo.free_pages, physinfo.outstanding_pages);
if ( physinfo.outstanding_pages )
@@ -98,13 +230,30 @@ static void run_tests(void)
return fail(" Unexpected outstanding claim of %"PRIu64" pages\n",
physinfo.outstanding_pages);
- /*
- * Set a claim for 4M. This should be the only claim in the system, and
- * show up globally.
- */
- rc = xc_domain_claim_pages(xch, domid, MB_PAGES(4));
+ rc = xc_availheap(xch, 0, 0, host_wide_claim ? -1 : (int)claim_test_node,
+ &free_heap_bytes);
if ( rc )
- return fail(" Failed to claim 4M of RAM: %d - %s\n",
+ return fail(" Failed to query available heap: %d - %s\n",
+ errno, strerror(errno));
+
+ free_pages = free_heap_bytes / XC_PAGE_SIZE;
+ if ( !host_wide_claim )
+ {
+ rc = get_node_free_pages(claim_test_node, &free_pages);
+ if ( rc )
+ return fail(" Failed to query free pages on node %u: %d - %s\n",
+ claim_test_node, errno, strerror(errno));
+ }
+
+ if ( free_pages <= request_pages + 1 )
+ return fail(" Not enough free pages (%lu) to test %s claim enforcement\n",
+ free_pages, host_wide_claim ? "host-wide" : "node");
+
+ claim_pages = free_pages - request_pages + 1;
+
+ rc = claim_call_wrapper(xch, domid, claim_pages);
+ if ( rc )
+ return fail(" Failed to claim calculated RAM amount: %d - %s\n",
errno, strerror(errno));
rc = xc_physinfo(xch, &physinfo);
@@ -112,17 +261,51 @@ static void run_tests(void)
return fail(" Failed to obtain physinfo: %d - %s\n",
errno, strerror(errno));
- if ( physinfo.outstanding_pages != MB_PAGES(4) )
- return fail(" Expected claim to be 4M, got %"PRIu64" pages\n",
- physinfo.outstanding_pages);
+ if ( physinfo.outstanding_pages != claim_pages )
+ return fail(" Expected claim to be %lu pages, got %"PRIu64" pages\n",
+ claim_pages, physinfo.outstanding_pages);
+
+ {
+ uint32_t other_domid = DOMID_INVALID;
+ xen_pfn_t other_ram[] = { 0 };
+ unsigned int memflags = host_wide_claim ? 0 : XENMEMF_exact_node(claim_test_node);
+
+ rc = xc_domain_create(xch, &other_domid, &create);
+ if ( rc )
+ return fail(" Second domain create failure: %d - %s\n",
+ errno, strerror(errno));
+
+ rc = xc_domain_setmaxmem(xch, other_domid, -1);
+ if ( rc )
+ {
+ fail(" Failed to set maxmem for second domain: %d - %s\n",
+ errno, strerror(errno));
+ goto destroy_other;
+ }
+
+ rc = xc_domain_populate_physmap_exact(
+ xch, other_domid, ARRAY_SIZE(other_ram), CLAIM_TEST_ORDER,
+ memflags, other_ram);
+ if ( rc == 0 )
+ fail(" Expected %s claim to block second-domain allocation\n",
+ host_wide_claim ? "host-wide" : "node");
+
+ destroy_other:
+ rc = xc_domain_destroy(xch, other_domid);
+ if ( rc )
+ return fail(" Failed to destroy second domain: %d - %s\n",
+ errno, strerror(errno));
+ }
/*
- * Allocate 2M of RAM to the domain. This should be deducted from global
- * claim.
+ * Allocate one CLAIM_TEST_ORDER chunk to the domain. This should reduce
+ * the outstanding claim by request_pages. For node claims, request memory
+ * from the claimed node.
*/
xen_pfn_t ram[] = { 0 };
rc = xc_domain_populate_physmap_exact(
- xch, domid, ARRAY_SIZE(ram), 9 /* Order 2M */, 0, ram);
+ xch, domid, ARRAY_SIZE(ram), CLAIM_TEST_ORDER,
+ host_wide_claim ? 0 : XENMEMF_node(claim_test_node), ram);
if ( rc )
return fail(" Failed to populate physmap domain: %d - %s\n",
errno, strerror(errno));
@@ -132,9 +315,9 @@ static void run_tests(void)
return fail(" Failed to obtain physinfo: %d - %s\n",
errno, strerror(errno));
- if ( physinfo.outstanding_pages != MB_PAGES(2) )
- return fail(" Expected claim to be 2M, got %"PRIu64" pages\n",
- physinfo.outstanding_pages);
+ if ( physinfo.outstanding_pages != claim_pages - request_pages )
+ return fail(" Expected claim to be %lu pages, got %"PRIu64" pages\n",
+ claim_pages - request_pages, physinfo.outstanding_pages);
/*
* Destroying the domain should release the outstanding 2M claim.
@@ -161,6 +344,8 @@ static void run_tests(void)
int main(int argc, char **argv)
{
int rc;
+ unsigned int num_nodes = 0;
+ xc_meminfo_t *meminfo = NULL;
printf("Memory claims tests\n");
@@ -169,14 +354,60 @@ int main(int argc, char **argv)
if ( !xch )
err(1, "xc_interface_open");
- run_tests();
+ rc = xc_numainfo(xch, &num_nodes, NULL, NULL);
+ if ( rc || !num_nodes )
+ err(1, "xc_numainfo");
+
+ meminfo = calloc(num_nodes, sizeof(*meminfo));
+ if ( !meminfo )
+ err(1, "calloc");
- if ( domid != DOMID_INVALID )
+ rc = xc_numainfo(xch, &num_nodes, meminfo, NULL);
+ if ( rc )
+ err(1, "xc_numainfo");
+
+ claim_test_node = 0;
+ for ( unsigned int i = 1; i < num_nodes; i++ )
{
- rc = xc_domain_destroy(xch, domid);
- if ( rc )
- fail(" Failed to destroy domain: %d - %s\n",
- errno, strerror(errno));
+ if ( meminfo[i].memfree > meminfo[claim_test_node].memfree )
+ claim_test_node = i;
+ }
+
+ free(meminfo);
+
+ struct {
+ claim_fn_t fn;
+ const char *name;
+ bool host_wide;
+ } tests[] = {
+ {
+ .fn = wrap_claim_pages,
+ .name = "xc_domain_claim_pages",
+ .host_wide = true,
+ },
+ {
+ .fn = wrap_claim_memory,
+ .name = "xc_domain_claim_memory",
+ .host_wide = true,
+ },
+ {
+ .fn = wrap_claim_memory_node,
+ .name = "xc_domain_claim_memory_node",
+ .host_wide = false,
+ },
+ };
+ size_t num_tests = sizeof(tests) / sizeof(tests[0]);
+ for ( size_t i = 0; i < num_tests; i++ )
+ {
+ run_test(tests[i].fn, tests[i].name, tests[i].host_wide);
+ if ( domid != DOMID_INVALID )
+ {
+ rc = xc_domain_destroy(xch, domid);
+ if ( rc )
+ fail(" Failed to destroy domain: %d - %s\n",
+ errno, strerror(errno));
+ domid = DOMID_INVALID;
+ }
}
return !!nr_failures;
--
2.39.5
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v4 10/10] docs/guest-guide: document the memory claim hypercalls
2026-02-26 14:29 [PATCH v4 0/10] xen: Add NUMA-aware memory claims for domains Bernhard Kaindl
` (8 preceding siblings ...)
2026-02-26 14:29 ` [PATCH v4 09/10] tools/tests: Update the claims test to test claim_memory hypercall Bernhard Kaindl
@ 2026-02-26 14:29 ` Bernhard Kaindl
2026-03-04 16:07 ` [PATCH v4 0/10] xen: Add NUMA-aware memory claims for domains Jan Beulich
10 siblings, 0 replies; 35+ messages in thread
From: Bernhard Kaindl @ 2026-02-26 14:29 UTC (permalink / raw)
To: xen-devel
Cc: Bernhard Kaindl, Andrew Cooper, Anthony PERARD, Michal Orzel,
Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini
Add guest-guide documentation for Xen’s memory-claim mechanism and
the two hypercalls for it to the docs:
- The legacy XENMEM_claim_pages (only for global host-wide claims)
- The new XEN_DOMCTL_claim_memory which adds NUMA-aware claims
Also document the implementation of claims in the hypervisor.
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
---
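Note for reviewers (not part of the commit): the claim checks described in
docs/hypervisor-guide/mm/claims.rst can be modelled in plain C as below. This
is only a minimal sketch under stated assumptions: struct heap_model, struct
domain_model, NO_NODE, host_allocatable() and node_allocatable() are
hypothetical names made up for illustration, the counters stand in for the
allocator state protected by heap_lock, and the sketch assumes that a per-node
claim is also accounted in the global outstanding_claims counter.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES 2
#define NO_NODE  (-1)   /* hypothetical stand-in for NUMA_NO_NODE */

/* Hypothetical model of the allocator counters (heap_lock assumed held). */
struct heap_model {
    unsigned long total_avail_pages;
    unsigned long outstanding_claims;               /* sum of all claims */
    unsigned long node_avail_pages[NR_NODES];
    unsigned long node_outstanding_claims[NR_NODES];
};

/* Hypothetical model of the per-domain claim state. */
struct domain_model {
    unsigned long outstanding_pages;  /* claimed, but not yet allocated */
    int claim_node;                   /* NO_NODE for a host-wide claim */
};

/* Host-wide check: other domains' claims are unavailable, our own is not. */
static bool host_allocatable(const struct heap_model *h,
                             const struct domain_model *d,
                             unsigned long request)
{
    unsigned long avail = h->total_avail_pages - h->outstanding_claims
                          + d->outstanding_pages;

    return request <= avail;
}

/* Per-node check: add back our claim only when it is on this node. */
static bool node_allocatable(const struct heap_model *h,
                             const struct domain_model *d,
                             int node, unsigned long request)
{
    unsigned long avail = h->node_avail_pages[node]
                          - h->node_outstanding_claims[node]
                          + (node == d->claim_node ? d->outstanding_pages : 0);

    return request <= avail;
}
```

With 60 free pages on node 0 and a 50-page claim held there by domain A, a
domain without a claim can take at most 10 pages from node 0 in this model,
while domain A can still allocate its full claim from that node.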
.readthedocs.yaml | 13 +-
docs/conf.py | 6 +-
.../dom/DOMCTL_claim_memory-classes.mmd | 51 +++++++
.../dom/DOMCTL_claim_memory-seqdia.mmd | 23 ++++
.../dom/DOMCTL_claim_memory-workflow.mmd | 23 ++++
docs/guest-guide/dom/DOMCTL_claim_memory.rst | 125 ++++++++++++++++++
docs/guest-guide/dom/index.rst | 14 ++
docs/guest-guide/index.rst | 23 ++++
docs/guest-guide/mem/XENMEM_claim_pages.rst | 68 ++++++++++
docs/guest-guide/mem/index.rst | 12 ++
docs/hypervisor-guide/index.rst | 5 +
docs/hypervisor-guide/mm/claims.rst | 114 ++++++++++++++++
docs/hypervisor-guide/mm/index.rst | 10 ++
13 files changed, 485 insertions(+), 2 deletions(-)
create mode 100644 docs/guest-guide/dom/DOMCTL_claim_memory-classes.mmd
create mode 100644 docs/guest-guide/dom/DOMCTL_claim_memory-seqdia.mmd
create mode 100644 docs/guest-guide/dom/DOMCTL_claim_memory-workflow.mmd
create mode 100644 docs/guest-guide/dom/DOMCTL_claim_memory.rst
create mode 100644 docs/guest-guide/dom/index.rst
create mode 100644 docs/guest-guide/mem/XENMEM_claim_pages.rst
create mode 100644 docs/guest-guide/mem/index.rst
create mode 100644 docs/hypervisor-guide/mm/claims.rst
create mode 100644 docs/hypervisor-guide/mm/index.rst
diff --git a/.readthedocs.yaml b/.readthedocs.yaml
index d3aff7662ebf..3be7334c7527 100644
--- a/.readthedocs.yaml
+++ b/.readthedocs.yaml
@@ -8,11 +8,22 @@ build:
tools:
python: "latest"
+ nodejs: "20"
jobs:
post_install:
+ # Required for rendering the mermaid diagrams in the offline
+ # documentation (PDF & ePub) formats.
+ - npm install -g @mermaid-js/mermaid-cli
# Instead of needing a separate requirements.txt
- - python -m pip install --upgrade --no-cache-dir sphinx-rtd-theme
+ - >
+ python -m pip install --upgrade --no-cache-dir sphinx-rtd-theme
+ sphinxcontrib-mermaid
sphinx:
configuration: docs/conf.py
+
+# Build PDF & ePub
+formats:
+ - epub
+ - pdf
\ No newline at end of file
diff --git a/docs/conf.py b/docs/conf.py
index 2fb8bafe6589..9316202d3318 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -61,7 +61,11 @@ needs_sphinx = '1.4'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
-extensions = []
+extensions = ['sphinxcontrib.mermaid']
+
+mermaid_init_js = """
+mermaid.initialize({ theme: 'Neo', startOnLoad: true });
+"""
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
diff --git a/docs/guest-guide/dom/DOMCTL_claim_memory-classes.mmd b/docs/guest-guide/dom/DOMCTL_claim_memory-classes.mmd
new file mode 100644
index 000000000000..1406a4919442
--- /dev/null
+++ b/docs/guest-guide/dom/DOMCTL_claim_memory-classes.mmd
@@ -0,0 +1,51 @@
+%% SPDX-License-Identifier: CC-BY-4.0
+classDiagram
+
+class xen_domctl {
+ +uint32_t cmd
+ +uint32_t interface_version
+ +uint32_t domain
+ +xen_domctl_claim_memory
+}
+
+class xen_domctl_claim_memory {
+ +memory_claim_t* claims
+ +uint32_t nr_claims
+ +uint32_t pad
+}
+
+class memory_claim_t {
+ +uint64_aligned_t pages
+ +uint32_t node
+ +uint32_t pad
+}
+
+class xc_domain_claim_memory["xc_domain_claim_memory()"] {
+ +xc_interface* xch
+ +uint32_t domid
+ +uint32_t nr_claims
+ +memory_claim_t* claims
+}
+
+class page_alloc_globals["xen/common/page_alloc.c"] {
+ +unsigned long outstanding_claims
+ +unsigned long node_outstanding_claims[]
+}
+
+class claim["DOMCTL_claim_memory"] {
+ +int claim_memory(d, uinfo)
+ +int domain_set_outstanding_pages(d, pages, node)
+}
+
+class domain["struct domain"] {
+ +unsigned_int outstanding_pages
+ +nodeid_t claim_node
+}
+
+xen_domctl_claim_memory o--> memory_claim_t
+xen_domctl o--> xen_domctl_claim_memory
+xc_domain_claim_memory ..> xen_domctl : populates
+xc_domain_claim_memory ..> claim : calls via <tt>do_domctl()</tt>
+claim ..> xen_domctl_claim_memory : reads
+claim ..> domain : sets
+claim ..> page_alloc_globals : updates outstanding claims
diff --git a/docs/guest-guide/dom/DOMCTL_claim_memory-seqdia.mmd b/docs/guest-guide/dom/DOMCTL_claim_memory-seqdia.mmd
new file mode 100644
index 000000000000..05d688c59f13
--- /dev/null
+++ b/docs/guest-guide/dom/DOMCTL_claim_memory-seqdia.mmd
@@ -0,0 +1,23 @@
+%% SPDX-License-Identifier: CC-BY-4.0
+sequenceDiagram
+
+actor DomainBuilder
+participant OcamlStub as OCaml stub for<br>xc_domain<br>claim_memory
+participant Libxc as xc_domain<br>claim_memory
+participant Domctl as XEN_DOMCTL<br>claim_memory
+#participant DomainLogic as claim_memory
+participant Alloc as domain<br>set<br>outstanding_pages
+
+DomainBuilder->>OcamlStub: claims
+OcamlStub->>OcamlStub: marshall claims -----> OCaml to C
+OcamlStub->>Libxc: claims
+
+Libxc->>Domctl: do_domctl
+
+Domctl->>Domctl: copy_from_guest(claim)
+Domctl->>Domctl: validate claim
+Domctl->>Alloc: set<br>outstanding_pages
+Alloc-->>Domctl: result
+Domctl-->>Libxc: rc
+Libxc-->>OcamlStub: rc
+OcamlStub-->>DomainBuilder: claim_result
\ No newline at end of file
diff --git a/docs/guest-guide/dom/DOMCTL_claim_memory-workflow.mmd b/docs/guest-guide/dom/DOMCTL_claim_memory-workflow.mmd
new file mode 100644
index 000000000000..372f2bb7a616
--- /dev/null
+++ b/docs/guest-guide/dom/DOMCTL_claim_memory-workflow.mmd
@@ -0,0 +1,23 @@
+%% SPDX-License-Identifier: CC-BY-4.0
+sequenceDiagram
+
+participant Toolstack
+participant Xen
+participant NUMA Node memory
+
+Toolstack->>Xen: XEN_DOMCTL_createdomain
+Toolstack->>Xen: XEN_DOMCTL_max_mem(max_pages)
+
+Toolstack->>Xen: XEN_DOMCTL_claim_memory(pages, node)
+Xen->>NUMA Node memory: Claim pages on node
+Xen-->>Toolstack: Claim granted
+
+Toolstack->>Xen: XEN_DOMCTL_set_nodeaffinity(node)
+
+loop Populate domain memory
+ Toolstack->>Xen: XENMEM_populate_physmap(memflags:node)
+ Xen->>NUMA Node memory: alloc from claimed node
+end
+
+Toolstack->>Xen: XEN_DOMCTL_claim_memory(0, NO_NODE)
+Xen-->>Toolstack: Remaining claims released
diff --git a/docs/guest-guide/dom/DOMCTL_claim_memory.rst b/docs/guest-guide/dom/DOMCTL_claim_memory.rst
new file mode 100644
index 000000000000..8be37585f02a
--- /dev/null
+++ b/docs/guest-guide/dom/DOMCTL_claim_memory.rst
@@ -0,0 +1,125 @@
+.. SPDX-License-Identifier: CC-BY-4.0
+.. _XEN_DOMCTL_claim_memory:
+
+XEN_DOMCTL_claim_memory
+=======================
+
+This **domctl** command allows a privileged domain to stake a memory claim for
+a domain in the same way as :ref:`XENMEM_claim_pages`, but with support for
+NUMA-aware memory claims.
+
+A claim entry with a node value of ``XEN_DOMCTL_CLAIM_MEMORY_NO_NODE`` stakes
+a claim for host memory, exactly like :ref:`XENMEM_claim_pages` does.
+
+NUMA-aware memory claims
+------------------------
+
+Memory locality is an important factor for performance in NUMA systems.
+Allocating memory close to the CPU that will use it can reduce latency
+and improve overall performance.
+
+By claiming memory on specific NUMA nodes, toolstacks can ensure that they
+will be able to allocate memory for the domain on those nodes. This is
+particularly beneficial for workloads that are sensitive to memory latency,
+such as in-memory databases.
+
+**Note:** The ABI supports multiple claims for future expansion. At the moment,
+Xen accepts a single claim entry (either a NUMA-aware or host-wide claim).
+
+Implementation notes
+--------------------
+
+As described in :ref:`XENMEM_claim_pages`, Xen keeps track of the number
+of claimed pages in the domain's ``d->outstanding_pages`` counter.
+
+Xen declares a NUMA-aware claim by assigning ``d->claim_node`` to a NUMA node,
+which declares that ``d->outstanding_pages`` is claimed on ``d->claim_node``.
+
+See :ref:`hypervisor-guide` > :ref:`memory_management` > :ref:`memory_claims`
+for more details on the implementation: how claims are handled by the buddy
+allocator, and how a toolstack can populate the memory of a domain from the
+claimed node, even if it needs to wait for scrubbing to complete.
+
+Used functions & data structures
+--------------------------------
+
+This diagram illustrates the key functions and data structures involved in the
+implementation of the ``domctl`` hypercall command ``XEN_DOMCTL_claim_memory``:
+
+.. mermaid:: DOMCTL_claim_memory-classes.mmd
+ :caption: Diagram: Function and data relationships of XEN_DOMCTL_claim_memory
+
+Call sequence diagram
+---------------------
+
+The following sequence diagram illustrates the call flow for claiming memory
+for a domain using this hypercall command from an OCaml toolstack:
+
+.. mermaid:: DOMCTL_claim_memory-seqdia.mmd
+ :caption: Sequence diagram: Call flow for claiming memory for a domain
+
+Claim workflow
+--------------
+
+The following diagram illustrates a workflow for claiming and populating memory:
+
+.. mermaid:: DOMCTL_claim_memory-workflow.mmd
+ :caption: Workflow diagram: Claiming and populating memory for a domain
+
+API example (libxc)
+-------------------
+The following example demonstrates how a toolstack can claim memory before
+building the domain and then release the claim once the memory population
+is complete.
+
+Note: ``memory_claim_t`` contains padding to allow for future expansion.
+Thus, the structure must be zero-initialised to ensure forward compatibility.
+This can be achieved in one of three ways: by using the
+``XEN_NODE_CLAIM_INIT`` macro, which sets the pages and node fields and
+zeroes the padding; by zero-initialising the entire structure before setting
+the fields; or by using a compound literal with designated initialisers,
+which zero-initialises every field that is not named.
+
+.. code-block:: C
+
+ #include <xenctrl.h>
+
+ int claim_guest_memory(xc_interface *xch, uint32_t domid,
+ uint64_t pages)
+ {
+ memory_claim_t claim[] = {
+ /*
+ * Example 1:
+ * Uses the ``XEN_NODE_CLAIM_INIT`` macro to zero-initialise the padding
+ * and set the pages and node fields for a NUMA-aware claim on node 0.
+ */
+ XEN_NODE_CLAIM_INIT(pages, 0) /* Claim memory on NUMA node 0 */
+ };
+
+ /* Claim memory from NUMA node 0 for the domain build. */
+ return xc_domain_claim_memory(xch, domid, 1, claim);
+ }
+
+ int release_claim(xc_interface *xch, uint32_t domid)
+ {
+ memory_claim_t claim[] = {
+ /*
+ * Example 2:
+ * Uses a compound literal with designated initialisers to set the
+ * fields to release the claim while zero-initialising the rest
+ * of the structure for forward compatibility.
+ */
+ (memory_claim_t){
+ /*
+ * pages == 0 releases any outstanding claim.
+ * The node field is not used in this case, but must be set to
+ * XEN_DOMCTL_CLAIM_MEMORY_NO_NODE for forward compatibility.
+ */
+ .pages = 0,
+ .node = XEN_DOMCTL_CLAIM_MEMORY_NO_NODE,
+ }
+ };
+
+ /* Release any remaining claim once population is done. */
+ return xc_domain_claim_memory(xch, domid, 1, claim);
+ }
diff --git a/docs/guest-guide/dom/index.rst b/docs/guest-guide/dom/index.rst
new file mode 100644
index 000000000000..445ccf599047
--- /dev/null
+++ b/docs/guest-guide/dom/index.rst
@@ -0,0 +1,14 @@
+.. SPDX-License-Identifier: CC-BY-4.0
+
+Domctl Hypercall
+================
+
+Through domctl hypercalls, toolstacks in privileged domains can perform
+operations related to domain management. This includes operations such as
+creating, destroying, and modifying domains, as well as querying domain
+information.
+
+.. toctree::
+ :maxdepth: 2
+
+ DOMCTL_claim_memory
diff --git a/docs/guest-guide/index.rst b/docs/guest-guide/index.rst
index 5455c67479cf..d9611cd7504d 100644
--- a/docs/guest-guide/index.rst
+++ b/docs/guest-guide/index.rst
@@ -3,6 +3,29 @@
Guest documentation
===================
+Xen exposes a set of hypercalls that allow domains and toolstacks in
+privileged contexts (such as Dom0) to request services from the hypervisor.
+
+Through these hypercalls, privileged domains can perform privileged operations
+such as querying system information, memory and domain management,
+and enabling inter-domain communication via shared memory and event channels.
+
+These hypercalls are documented in the following sections, grouped by their
+functionality. Each section provides an overview of the hypercalls, their
+parameters, and examples of how to use them.
+
+Hypercall API documentation
+---------------------------
+
+.. toctree::
+ :maxdepth: 2
+
+ dom/index
+ mem/index
+
+Hypercall ABI documentation
+---------------------------
+
.. toctree::
:maxdepth: 2
diff --git a/docs/guest-guide/mem/XENMEM_claim_pages.rst b/docs/guest-guide/mem/XENMEM_claim_pages.rst
new file mode 100644
index 000000000000..7d465d2a87fe
--- /dev/null
+++ b/docs/guest-guide/mem/XENMEM_claim_pages.rst
@@ -0,0 +1,68 @@
+.. SPDX-License-Identifier: CC-BY-4.0
+.. _XENMEM_claim_pages:
+
+XENMEM_claim_pages
+==================
+
+This **xenmem** command allows a privileged guest to stake a memory claim for a
+domain, in the same way as :ref:`XEN_DOMCTL_claim_memory`, but without support for
+NUMA-aware memory claims.
+
+Memory claims in Xen
+--------------------
+
+The Xen hypervisor maintains a counter of outstanding pages for each domain,
+which tracks the pages claimed, but not yet allocated, for that domain.
+
+If the outstanding pages counter is zero, this hypercall allows a privileged
+guest to stake a claim for a specified number of pages of system memory for the
+domain.
+
+If the claim is successful, Xen updates the domain's outstanding pages counter
+to reflect the new claim. Xen then allocates from the pool of claimed memory
+only for domains that hold a claim for this memory.
+
+A domain builder (toolstack in a privileged domain) building the domain can then
+allocate the guest memory for the domain, which converts the outstanding claim
+into actual memory of the new domain, backed by physical pages.
+
+Note that Xen computes the resulting claim relative to the pages already
+allocated for the domain: the **pages** argument of this hypercall is absolute
+and must correspond to the total number of pages expected to be allocated for
+the domain, not an increment on top of the already allocated pages.
+
+Memory allocations by Xen for the domain also consume the claim, so toolstacks
+should stake a claim that is larger than the guest memory requirement to
+account for Xen's own memory usage. The exact amount of extra memory required
+depends on the configuration and features used by the domain, the host
+architecture and the features enabled by the Xen hypervisor on the host.
+
+Life-cycle of a claim
+---------------------
+
+The domain's maximum memory limit must be set prior to staking a claim, as
+the sum of the already allocated pages and the claim must be within that limit.
+
+To release the claim after the domain build is complete, call this hypercall
+command with the pages argument set to zero. This releases any remaining claim.
+``libxenguest`` does this after the guest memory has been allocated for the
+domain, and Xen also does this when it kills the domain.
+
+API example (libxc)
+-------------------
+The following example demonstrates how a toolstack can claim memory before
+building the domain and then release the claim once the memory population
+is complete.
+
+.. code-block:: C
+
+ #include <xenctrl.h>
+ ...
+ /* Claim memory for the domain build. */
+ int ret = xc_domain_claim_pages(xch, domid, nr_pages);
+
+ /* Build the domain and allocate memory for it. */
+ ...
+
+ /* Release any remaining claim after populating the domain memory. */
+ ret = xc_domain_claim_pages(xch, domid, 0);
diff --git a/docs/guest-guide/mem/index.rst b/docs/guest-guide/mem/index.rst
new file mode 100644
index 000000000000..dabd1fd0153e
--- /dev/null
+++ b/docs/guest-guide/mem/index.rst
@@ -0,0 +1,12 @@
+.. SPDX-License-Identifier: CC-BY-4.0
+
+Memctl Hypercall
+================
+
+The memctl hypercall interface allows guests to perform various control
+operations related to memory management.
+
+.. toctree::
+ :maxdepth: 2
+
+ XENMEM_claim_pages
diff --git a/docs/hypervisor-guide/index.rst b/docs/hypervisor-guide/index.rst
index 520fe01554ab..fef35a1ac4fe 100644
--- a/docs/hypervisor-guide/index.rst
+++ b/docs/hypervisor-guide/index.rst
@@ -1,12 +1,17 @@
.. SPDX-License-Identifier: CC-BY-4.0
+.. _hypervisor-guide:
Hypervisor documentation
========================
+See :ref:`memory_claims` for more details on the implementation of the claims
+mechanism in the Hypervisor and its interaction with the buddy allocator.
+
.. toctree::
:maxdepth: 2
code-coverage
+ mm/index
x86/index
arm/index
\ No newline at end of file
diff --git a/docs/hypervisor-guide/mm/claims.rst b/docs/hypervisor-guide/mm/claims.rst
new file mode 100644
index 000000000000..97eb8a68fb1e
--- /dev/null
+++ b/docs/hypervisor-guide/mm/claims.rst
@@ -0,0 +1,114 @@
+.. SPDX-License-Identifier: CC-BY-4.0
+.. _memory_claims:
+
+Memory Claims
+=============
+
+Overview
+--------
+
+Xen's page allocator supports a **claims** mechanism that allows a domain
+builder to reserve memory before allocation begins, preventing concurrent
+allocations from exhausting available pages mid-build.
+A claim can be global (host-wide) or target a specific NUMA node, ensuring
+that a domain's memory is allocated locally on the same node as its vCPUs.
+
+The host-wide claims check subtracts global claims from total available pages.
+If the domain has claims, its ``d->outstanding_pages`` are added back as
+available (simplified pseudo-code):
+
+.. code:: C
+
+ ASSERT(spin_is_locked(&heap_lock));
+ unsigned long global_avail = total_avail_pages - outstanding_claims
+ + d->outstanding_pages;
+ return alloc_request <= global_avail;
+
+Similarly, the per-node check enforces node-level claims by subtracting
+outstanding node claims from available node pages, and adding back the domain's
+claim if allocating from the claimed node:
+
+.. code:: C
+
+ ASSERT(spin_is_locked(&heap_lock));
+ unsigned long avail = node_avail_pages(node)
+ - node_outstanding_claims(node)
+ + (node == d->claim_node ? d->outstanding_pages : 0);
+ return alloc_request <= avail;
+
+Simplified pseudo-code for the claims checks in the buddy allocator:
+
+.. code:: C
+
+ struct page_info *get_free_buddy(order, memflags, d) {
+ for ( ; ; ) {
+ node = preferred_node_or_next_node();
+ if (!node_allocatable_request(d, memflags, 1 << order, node))
+ goto try_next_node;
+ /* Find a zone on this node with a suitable buddy */
+ for (zone = highest_zone; zone >= lowest_zone; zone--)
+ for (j = order; j <= MAX_ORDER; j++)
+ if (pg = remove_head(&heap(node, zone, j)))
+ return pg;
+ try_next_node:
+ if (req_node != NUMA_NO_NODE && memflags & MEMF_exact_node)
+ return NULL;
+ /* Fall back to the next node and repeat. */
+ }
+ }
+
+ struct page_info *alloc_heap_pages(d, order, memflags) {
+ if (!host_allocatable_request(d, memflags, 1 << order))
+ return NULL;
+ pg = get_free_buddy(order, memflags, d);
+ if (!pg) /* Retry allowing unscrubbed pages */
+ pg = get_free_buddy(order, memflags|MEMF_no_scrub, d);
+ if (!pg)
+ return NULL;
+ if (pg has dirty pages)
+ scrub_dirty_pages(pg);
+ return pg;
+ }
+
+.. note:: The first ``get_free_buddy()`` pass skips unscrubbed pages and may
+ fall back to other nodes. With ``memflags & MEMF_exact_node``, no fallback
+ occurs, so the first pass may return ``NULL``.
+ The 2nd pass with ``MEMF_no_scrub`` will consider the unscrubbed pages.
+ ``alloc_heap_pages()`` then scrubs them before returning, guaranteeing the
+ domain gets the desired node-local pages even when scrubbing is pending.
+
+ Therefore, toolstacks should set ``MEMF_exact_node`` in ``memflags`` when
+ allocating for a domain with a NUMA-aware claim, for example by passing
+ ``XENMEMF_exact_node(node)``.
+
+ For efficient scrubbing, toolstacks might want to run domain builds
+ pinned on a CPU of the target NUMA node, so that pages on that node are
+ scrubbed without cross-node traffic and with lower latency.
+
+Data Structures
+---------------
+
+The following diagram shows the relationships between global, per-node,
+and per-domain claim counters, all protected by the global ``heap_lock``.
+
+.. mermaid::
+
+ graph TB
+ subgraph "Protected by the heap_lock"
+ direction TB
+ Global --Sum of--> Per-node
+ Per-node --Sum of--> Per-domain
+ end
+ subgraph Per-domain
+ direction LR
+ claim_node["d->claim_node"]
+ claim_node --claims on--> outstanding_pages["d->outstanding_pages"]
+ end
+ subgraph Per-node
+ direction LR
+ node_outstanding_claims--constrains-->node_avail_pages
+ end
+ subgraph Global
+ direction LR
+ outstanding_claims--constrains-->total_avail_pages
+ end
diff --git a/docs/hypervisor-guide/mm/index.rst b/docs/hypervisor-guide/mm/index.rst
new file mode 100644
index 000000000000..9b5d60e3181a
--- /dev/null
+++ b/docs/hypervisor-guide/mm/index.rst
@@ -0,0 +1,10 @@
+.. SPDX-License-Identifier: CC-BY-4.0
+.. _memory_management:
+
+Memory Management
+=================
+
+.. toctree::
+ :maxdepth: 2
+
+ claims
--
2.39.5
* Re: [PATCH v4 05/10] xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness
2026-02-26 14:29 ` [PATCH v4 05/10] xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness Bernhard Kaindl
@ 2026-02-26 21:19 ` Teddy Astie
2026-02-26 23:16 ` Bernhard Kaindl
2026-03-05 11:31 ` Jan Beulich
` (2 subsequent siblings)
3 siblings, 1 reply; 35+ messages in thread
From: Teddy Astie @ 2026-02-26 21:19 UTC (permalink / raw)
To: Bernhard Kaindl, xen-devel
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
Daniel P. Smith
On 26/02/2026 at 15:54, Bernhard Kaindl wrote:
> Add a DOMCTL handler for claiming memory with NUMA awareness. It
> rejects claims when LLC coloring (does not support claims) is enabled
> and translates the public constant to the internal NUMA_NO_NODE.
>
> The request is forwarded to domain_set_outstanding_pages() for the
> actual claim processing. The handler uses the same XSM hook as the
> legacy XENMEM_claim_pages hypercall.
>
> While the underlying infrastructure currently supports only a single
> claim, the public hypercall interface is designed to be extensible for
> multiple claims in the future without breaking the API.
>
I'm not sure about the idea of introducing a new hypercall for this
operation. Though I may be missing some context about the reasons of
introducing a new hypercall.
XENMEM_claim_pages doesn't have actual support for NUMA, but the
hypercall interface seems to define it (e.g you can pass
XENMEMF_exact_node(n) to mem_flags). Would it be preferable instead to
make XENMEM_claim_pages aware of NUMA-related XENMEMF flags ?
> Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
> ---
> xen/common/domain.c | 29 ++++++++++++++++++++++++++++
> xen/common/domctl.c | 9 +++++++++
> xen/include/public/domctl.h | 38 +++++++++++++++++++++++++++++++++++++
> xen/include/xen/domain.h | 2 ++
> 4 files changed, 78 insertions(+)
>
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index e7861259a2b3..ac1b091f5574 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -268,6 +268,35 @@ int get_domain_state(struct xen_domctl_get_domain_state *info, struct domain *d,
> return rc;
> }
>
> +/* Claim memory for a domain or reset the claim */
> +int claim_memory(struct domain *d, const struct xen_domctl_claim_memory *uinfo)
> +{
> + memory_claim_t claim;
> +
> + /* alloc_color_heap_page() does not handle claims, so reject LLC coloring */
> + if ( llc_coloring_enabled )
> + return -EOPNOTSUPP;
> + /*
> + * We only support single claims at the moment, and if the domain is
> + * dying (d->is_dying is set), its claims have already been released
> + */
> + if ( uinfo->pad || uinfo->nr_claims != 1 || d->is_dying )
> + return -EINVAL;
> +
> + if ( copy_from_guest(&claim, uinfo->claims, 1) )
> + return -EFAULT;
> +
> + if ( claim.pad )
> + return -EINVAL;
> +
> + /* Convert the API tag for a host-wide claim to the NUMA_NO_NODE constant */
> + if ( claim.node == XEN_DOMCTL_CLAIM_MEMORY_NO_NODE )
> + claim.node = NUMA_NO_NODE;
> +
> + /* NB. domain_set_outstanding_pages() has the checks to validate its args */
> + return domain_set_outstanding_pages(d, claim.pages, claim.node);
> +}
> +
> static void __domain_finalise_shutdown(struct domain *d)
> {
> struct vcpu *v;
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index 29a7726d32d0..9e858f631aaf 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -868,6 +868,15 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> ret = get_domain_state(&op->u.get_domain_state, d, &op->domain);
> break;
>
> + case XEN_DOMCTL_claim_memory:
> + /* Use the same XSM hook as XENMEM_claim_pages */
> + ret = xsm_claim_pages(XSM_PRIV, d);
> + if ( ret )
> + break;
> +
> + ret = claim_memory(d, &op->u.claim_memory);
> + break;
> +
> default:
> ret = arch_do_domctl(op, d, u_domctl);
> break;
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 8f6708c0a7cd..610806c8b6e0 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -1276,6 +1276,42 @@ struct xen_domctl_get_domain_state {
> uint64_t unique_id; /* Unique domain identifier. */
> };
>
> +/*
> + * XEN_DOMCTL_claim_memory
> + *
> + * Claim memory for a guest domain. The claimed memory is converted into actual
> + * memory pages by allocating it. Except for the option to pass claims for
> + * multiple NUMA nodes, the semantics are based on host-wide claims as
> + * provided by XENMEM_claim_pages, and are identical for host-wide claims.
> + *
> + * The initial implementation supports a claim for the host or a NUMA node, but
> + * using an array, the API is designed to be extensible to support more claims.
> + */
> +struct xen_memory_claim {
> + uint64_aligned_t pages; /* Amount of pages to be allotted to the domain */
> + uint32_t node; /* NUMA node, or XEN_DOMCTL_CLAIM_MEMORY_NO_NODE for host */
> + uint32_t pad; /* padding for alignment, set to 0 on input */
> +};
> +typedef struct xen_memory_claim memory_claim_t;
> +#define XEN_DOMCTL_CLAIM_MEMORY_NO_NODE 0xFFFFFFFF /* No node: host claim */
> +
> +/* Use XEN_NODE_CLAIM_INIT to initialize a memory_claim_t structure */
> +#define XEN_NODE_CLAIM_INIT(_pages, _node) { \
> + .pages = (_pages), \
> + .node = (_node), \
> + .pad = 0 \
> +}
> +DEFINE_XEN_GUEST_HANDLE(memory_claim_t);
> +
> +struct xen_domctl_claim_memory {
> + /* IN: array of struct xen_memory_claim */
> + XEN_GUEST_HANDLE_64(memory_claim_t) claims;
> + /* IN: number of claims in the claims array handle. See the claims field. */
> + uint32_t nr_claims;
> +#define XEN_DOMCTL_MAX_CLAIMS UINT8_MAX /* More claims require changes in Xen */
> + uint32_t pad; /* padding for alignment, set it to 0 */
> +};
> +
> struct xen_domctl {
> /* Stable domctl ops: interface_version is required to be 0. */
> uint32_t cmd;
> @@ -1368,6 +1404,7 @@ struct xen_domctl {
> #define XEN_DOMCTL_gsi_permission 88
> #define XEN_DOMCTL_set_llc_colors 89
> #define XEN_DOMCTL_get_domain_state 90 /* stable interface */
> +#define XEN_DOMCTL_claim_memory 91
> #define XEN_DOMCTL_gdbsx_guestmemio 1000
> #define XEN_DOMCTL_gdbsx_pausevcpu 1001
> #define XEN_DOMCTL_gdbsx_unpausevcpu 1002
> @@ -1436,6 +1473,7 @@ struct xen_domctl {
> #endif
> struct xen_domctl_set_llc_colors set_llc_colors;
> struct xen_domctl_get_domain_state get_domain_state;
> + struct xen_domctl_claim_memory claim_memory;
> uint8_t pad[128];
> } u;
> };
> diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
> index 93c0fd00c1d7..79e8932c4530 100644
> --- a/xen/include/xen/domain.h
> +++ b/xen/include/xen/domain.h
> @@ -193,4 +193,6 @@ extern bool vmtrace_available;
>
> extern bool vpmu_is_available;
>
> +int claim_memory(struct domain *d, const struct xen_domctl_claim_memory *uinfo);
> +
> #endif /* __XEN_DOMAIN_H__ */
Teddy
--
| Vates
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
* RE: [PATCH v4 05/10] xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness
2026-02-26 21:19 ` Teddy Astie
@ 2026-02-26 23:16 ` Bernhard Kaindl
2026-02-27 9:39 ` Teddy Astie
0 siblings, 1 reply; 35+ messages in thread
From: Bernhard Kaindl @ 2026-02-26 23:16 UTC (permalink / raw)
To: Teddy Astie, xen-devel@lists.xenproject.org
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monne, Stefano Stabellini,
Daniel P. Smith
On 26/02/2026 at 22:19, Teddy Astie wrote:
> On 26/02/2026 at 15:54, Bernhard Kaindl wrote:
>> Add a DOMCTL handler for claiming memory with NUMA awareness. It
>> rejects claims when LLC coloring (does not support claims) is enabled
>> and translates the public constant to the internal NUMA_NO_NODE.
>>
>> The request is forwarded to domain_set_outstanding_pages() for the
>> actual claim processing. The handler uses the same XSM hook as the
>> legacy XENMEM_claim_pages hypercall.
>>
>> While the underlying infrastructure currently supports only a single
>> claim, the public hypercall interface is designed to be extensible for
>> multiple claims in the future without breaking the API.
> I'm not sure about the idea of introducing a new hypercall for this
> operation. Though I may be missing some context about the reasons of
> introducing a new hypercall.
>
> XENMEM_claim_pages doesn't have actual support for NUMA, but the
> hypercall interface seems to define it (e.g you can pass
> XENMEMF_exact_node(n) to mem_flags). Would it be preferable instead to
> make XENMEM_claim_pages aware of NUMA-related XENMEMF flags ?
Hello Teddy,
Thank you for your review — much appreciated.
Updating the do_memory_op(XENMEM_claim_pages) handler to accept a node
parameter, as you suggested, is indeed a practical way to retrofit this
feature into existing Xen builds. That’s also the approach we took in
v1 of this series:
* https://lists.xenproject.org/archives/html/xen-devel/2025-03/msg01127.html
* https://patchew.org/Xen/20250314172502.53498-1-alejandro.vallejo@cloud.com/
We are currently using this approach also in the XS9 Public Preview:
* https://www.xenserver.com/downloads/xs9-preview
That said, during review, Roger Pau Monné suggested that for upstream
inclusion, we should introduce a new hypercall API with support for
multi-node claims, even if the initial infrastructure only handles
a single node. See:
* https://lists.xenproject.org/archives/html/xen-devel/2025-06/msg00484.html
He raised the concern that the current interface effectively constrains
domains to be allocated from one node at a time, or to sequence claims
across nodes, which undermines the purpose of claims.
Instead, he proposed that the hypercall interface would ideally allow
making multi-node claims atomically, rather than requiring multiple
calls with rollback in case of failure.
I favour Roger’s position as well: I think we should aim for a clean
and extensible interface that supports claims across multiple nodes
in a single call. Otherwise, we risk having to introduce yet another
hypercall later when a real-world scenario requires multi-node claims.
On the implementation side, a reliable first-come, first-served mechanism
for multi-node claims will require serialisation in the central claim path.
Currently, the global heap_lock provides that protection, and it would
naturally cover the creation of a multi-node claim under a single lock,
ensuring atomicity and consistent behaviour.
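To illustrate what "atomic" would mean here, below is a minimal user-space sketch of an all-or-nothing multi-node claim under one lock. It is not code from this series: `NR_NODES`, `free_pages`, `claimed_pages`, `struct node_claim` and `claim_nodes()` are hypothetical names, and a pthread mutex stands in for heap_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_NODES 4 /* hypothetical node count for this model */

/* Hypothetical model state; these are not Xen's variables. */
static unsigned long free_pages[NR_NODES] = { 100, 100, 100, 100 };
static unsigned long claimed_pages[NR_NODES];
static pthread_mutex_t heap_mutex = PTHREAD_MUTEX_INITIALIZER;

struct node_claim {
    unsigned int node;
    unsigned long pages;
};

/* Stake all claims or none: validate every entry first, then commit. */
static bool claim_nodes(const struct node_claim *claims, size_t nr)
{
    unsigned long wanted[NR_NODES] = { 0 };
    bool ok = true;
    size_t i;

    pthread_mutex_lock(&heap_mutex);

    /* Pass 1: accumulate the per-node demand and check that it fits. */
    for ( i = 0; ok && i < nr; i++ )
    {
        unsigned int n = claims[i].node;

        if ( n >= NR_NODES )
            ok = false;
        else
        {
            /* Unclaimed pages still left after earlier entries of this call */
            unsigned long room = free_pages[n] - claimed_pages[n] - wanted[n];

            assert(claimed_pages[n] + wanted[n] <= free_pages[n]);
            if ( claims[i].pages > room )
                ok = false;
            else
                wanted[n] += claims[i].pages;
        }
    }

    /* Pass 2: commit only if every entry passed, so no rollback is needed. */
    for ( i = 0; ok && i < NR_NODES; i++ )
        claimed_pages[i] += wanted[i];

    pthread_mutex_unlock(&heap_mutex);

    return ok;
}
```

Because validation and commit happen in the same locked region, a competing claim cannot slip in between them, and a failing entry leaves all counters untouched.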
Thanks again for the review and feedback!
Best regards / Bien cordialement / Saludos / Liebe Grüße,
With warm greetings from Vienna/Austria,
Bernhard
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v4 05/10] xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness
2026-02-26 23:16 ` Bernhard Kaindl
@ 2026-02-27 9:39 ` Teddy Astie
2026-02-27 18:16 ` Bernhard Kaindl
0 siblings, 1 reply; 35+ messages in thread
From: Teddy Astie @ 2026-02-27 9:39 UTC (permalink / raw)
To: Bernhard Kaindl, xen-devel
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monne, Stefano Stabellini,
Daniel P. Smith
Le 27/02/2026 à 00:21, Bernhard Kaindl a écrit :
> On 26/02/2026 à 22:19, Teddy Astie a écrit :
>> Le 26/02/2026 à 15:54, Bernhard Kaindl a écrit :
>>> Add a DOMCTL handler for claiming memory with NUMA awareness. It
>>> rejects claims when LLC coloring (does not support claims) is enabled
>>> and translates the public constant to the internal NUMA_NO_NODE.
>>>
>>> The request is forwarded to domain_set_outstanding_pages() for the
>>> actual claim processing. The handler uses the same XSM hook as the
>>> legacy XENMEM_claim_pages hypercall.
>>>
>>> While the underlying infrastructure currently supports only a single
>>> claim, the public hypercall interface is designed to be extensible for
>>> multiple claims in the future without breaking the API.
>> I'm not sure about the idea of introducing a new hypercall for this
>> operation. Though I may be missing some context about the reasons of
>> introducing a new hypercall.
>>
>> XENMEM_claim_pages doesn't have actual support for NUMA, but the
>> hypercall interface seems to define it (e.g you can pass
>> XENMEMF_exact_node(n) to mem_flags). Would it be preferable instead to
>> make XENMEM_claim_pages aware of NUMA-related XENMEMF flags ?
>
> Hello Teddy,
>
> Thank you for your review — much appreciated.
>
> Updating the do_memory_op(XENMEM_claim_pages) handler to accept a node
> parameter, as you suggested, is indeed a practical way to retrofit this
> feature into existing Xen builds. That’s also the approach we took in
> v1 of this series:
>
> * https://lists.xenproject.org/archives/html/xen-devel/2025-03/msg01127.html
> * https://patchew.org/Xen/20250314172502.53498-1-alejandro.vallejo@cloud.com/
>
> We are currently using this approach also in the XS9 Public Preview:
>
> * https://www.xenserver.com/downloads/xs9-preview
>
> That said, during review, Roger Pau Monné suggested that for upstream
> inclusion, we should introduce a new hypercall API with support for
> multi-node claims, even if the initial infrastructure only handles
> a single node. See:
>
> * https://lists.xenproject.org/archives/html/xen-devel/2025-06/msg00484.html
>
> He raised the concern that the current interface effectively constrains
> domains to be allocated from one node at a time, or to sequence claims
> across nodes, which undermines the purpose of claims.
>
> Instead, he proposed that the hypercall interface would ideally allow
> making multi-node claims atomically, rather than requiring multiple
> calls with rollback in case of failure.
>
> I favour Roger’s position as well: I think we should aim for a clean
> and extensible interface that supports claims across multiple nodes
> in a single call. Otherwise, we risk having to introduce yet another
> hypercall later when a real-world scenario requires multi-node claims.
>
> On the implementation side, a reliable first-come, first-served mechanism
> for multi-node claims will require serialisation in the central claim path.
> Currently, the global heap_lock provides that protection, and it would
> naturally cover the creation of a multi-node claim under a single lock,
> ensuring atomicity and consistent behaviour.
>
Ok thanks.
Should we state that the old interface is "deprecated" (somehow), and
that people should take a look at XEN_DOMCTL_claim_memory instead,
especially if they need a NUMA-aware interface ?
That could be a note on the XENMEM_claim_pages hypercall.
> Thanks again for the review and feedback!
> Best regards / Bien cordialement / Saludos / Liebe Grüße,
>
> With warm greetings from Vienna/Austria,
> Bernhard
Teddy
--
Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH v4 05/10] xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness
2026-02-27 9:39 ` Teddy Astie
@ 2026-02-27 18:16 ` Bernhard Kaindl
0 siblings, 0 replies; 35+ messages in thread
From: Bernhard Kaindl @ 2026-02-27 18:16 UTC (permalink / raw)
To: Teddy Astie, xen-devel@lists.xenproject.org
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monne, Stefano Stabellini,
Daniel P. Smith
> Should we state that the old interface is "deprecated" (somehow), and that
> people should take a look at XEN_DOMCTL_claim_memory instead, especially if
> they need a NUMA-aware interface ?
> That could be a note on the XENMEM_claim_pages hypercall.
Yes. People looking at the then-obsolete XENMEM_claim_pages interface
should be referred to the new hypercall by such a note.
In preparation for a follow-up, I have appended an initial patch that adds
such notes (maybe also in libxc, memory_op and the OCaml bindings) to refer
people to the new hypercall interface.
Best, Bernhard
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -569,6 +569,15 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);
* for 10, only 7 additional pages are claimed.
*
* Caller must be privileged or the hypercall fails.
+ *
+ * Note: This hypercall is deprecated in favour of XEN_DOMCTL_claim_memory,
+ * which provides the same claim semantics described above, can be used as a
+ * drop-in replacement, and is extended for NUMA-node-specific claims.
+ * This hypercall should not be used by new code.
+ *
+ * See the following documentation pages for more information:
+ * docs/guest-guide/dom/DOMCTL_claim_memory.rst
+ * docs/guest-guide/mem/XENMEM_claim_pages.rst
*/
#define XENMEM_claim_pages 24
--- a/docs/guest-guide/mem/XENMEM_claim_pages.rst
+++ b/docs/guest-guide/mem/XENMEM_claim_pages.rst
@@ -5,8 +5,9 @@ XENMEM_claim_pages
==================
This **xenmem** command allows a privileged guest to stake a memory claim for a
-domain, identical to :ref:`XEN_DOMCTL_claim_memory`, but without support for
-NUMA-aware memory claims.
+domain, identical to :ref:`XEN_DOMCTL_claim_memory`, which is extended for
+NUMA-aware claims. XENMEM_claim_pages should not be used for new code and is
+deprecated. :ref:`XEN_DOMCTL_claim_memory` provides the same claims semantics.
Memory claims in Xen
--------------------
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v4 0/10] xen: Add NUMA-aware memory claims for domains
2026-02-26 14:29 [PATCH v4 0/10] xen: Add NUMA-aware memory claims for domains Bernhard Kaindl
` (9 preceding siblings ...)
2026-02-26 14:29 ` [PATCH v4 10/10] docs/guest-guide: document the memory claim hypercalls Bernhard Kaindl
@ 2026-03-04 16:07 ` Jan Beulich
2026-03-04 17:27 ` Bernhard Kaindl
10 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2026-03-04 16:07 UTC (permalink / raw)
To: Bernhard Kaindl
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monné, Stefano Stabellini, Daniel P. Smith,
Juergen Gross, Christian Lindig, David Scott, xen-devel
On 26.02.2026 15:29, Bernhard Kaindl wrote:
> Credits:
>
> - Alejandro Vallejo developed the initial version
> - Roger Pau Monné updated the implementation and upstreamed key improvements
> - Marcus Granado contributed analysis and suggestions during development
Despite any of this, ...
> - Bernhard Kaindl developed the new domctl API, extended tests and documentation
> and developed the refactored handler for consuming claims on allocation.
>
> Comments and feedback welcome.
>
> Bernhard Kaindl (10):
> xen/page_alloc: Extract code for consuming claims into inline function
> xen/page_alloc: Optimize getting per-NUMA-node free page counts
> xen/page_alloc: Implement NUMA-node-specific claims
> xen/page_alloc: Consolidate per-node counters into avail[] array
> xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness
> xsm/flask: Add XEN_DOMCTL_claim_memory to flask
> tools/lib/ctrl/xc: Add xc_domain_claim_memory() to libxenctrl
> tools/ocaml/libs/xc: add OCaml domain_claim_memory binding
> tools/tests: Update the claims test to test claim_memory hypercall
> docs/guest-guide: document the memory claim hypercalls
... only a single patch has an S-o-b other than yours. Is this a correct
representation of authorship?
Jan
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v4 01/10] xen/page_alloc: Extract code for consuming claims into inline function
2026-02-26 14:29 ` [PATCH v4 01/10] xen/page_alloc: Extract code for consuming claims into inline function Bernhard Kaindl
@ 2026-03-04 16:20 ` Jan Beulich
2026-03-04 18:04 ` Bernhard Kaindl
2026-03-05 8:21 ` Roger Pau Monné
1 sibling, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2026-03-04 16:20 UTC (permalink / raw)
To: Bernhard Kaindl
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monné, Stefano Stabellini, xen-devel
On 26.02.2026 15:29, Bernhard Kaindl wrote:
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -518,6 +518,34 @@ unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
> return d->tot_pages;
> }
>
> +/* Release outstanding claims on the domain, host and later also node */
> +static inline
Generally we prefer to avoid "inline" in .c files. This is better left to the
compiler. Furthermore while we have a few examples of this kind of line split,
it's clearly not the preferred form. You'll find ample well-formed static
functions in this one source file alone.
> +void release_outstanding_claims(struct domain *d, unsigned long release)
> +{
> + ASSERT(spin_is_locked(&heap_lock));
> + BUG_ON(outstanding_claims < release);
> + outstanding_claims -= release;
> + d->outstanding_pages -= release;
> +}
> +
> +/*
> + * Consume outstanding claimed pages when allocating pages for a domain.
> + * NB. The alloc could (in principle) fail in assign_pages() afterwards. In that
> + * case, the consumption is not reversed, but as claims are used only during
> + * domain build and d is destroyed if the build fails, this has no significance.
> + */
> +static inline
> +void consume_outstanding_claims(struct domain *d, unsigned long allocation)
> +{
> + if ( !d || !d->outstanding_pages )
> + return;
> + ASSERT(spin_is_locked(&heap_lock));
Why is this not the first thing in the function?
> @@ -1048,29 +1075,8 @@ static struct page_info *alloc_heap_pages(
> total_avail_pages -= request;
> ASSERT(total_avail_pages >= 0);
>
> - if ( d && d->outstanding_pages && !(memflags & MEMF_no_refcount) )
> - {
> - /*
> - * Adjust claims in the same locked region where total_avail_pages is
> - * adjusted, not doing so would lead to a window where the amount of
> - * free memory (avail - claimed) would be incorrect.
> - *
> - * Note that by adjusting the claimed amount here it's possible for
> - * pages to fail to be assigned to the claiming domain while already
> - * having been subtracted from d->outstanding_pages. Such claimed
> - * amount is then lost, as the pages that fail to be assigned to the
> - * domain are freed without replenishing the claim. This is fine given
> - * claims are only to be used during physmap population as part of
> - * domain build, and any failure in assign_pages() there will result in
> - * the domain being destroyed before creation is finished. Losing part
> - * of the claim makes no difference.
> - */
Much of this comment is lost. Parts have been moved, but I think another part
(in particular the first paragraph) wants to be retained here. Plus in general
when rearranging code it is best to take the original commentary as is (typo
or factual corrections of course included as necessary).
Jan
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v4 02/10] xen/page_alloc: Optimize getting per-NUMA-node free page counts
2026-02-26 14:29 ` [PATCH v4 02/10] xen/page_alloc: Optimize getting per-NUMA-node free page counts Bernhard Kaindl
@ 2026-03-04 16:31 ` Jan Beulich
2026-03-04 18:21 ` Bernhard Kaindl
0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2026-03-04 16:31 UTC (permalink / raw)
To: Bernhard Kaindl
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monné, Stefano Stabellini, xen-devel
On 26.02.2026 15:29, Bernhard Kaindl wrote:
> From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>
> Add per-node free page counters (node_avail_pages[]), protected by
> heap_lock, updated in real-time in lockstep with total_avail_pages
> as pages are allocated and freed.
>
> This replaces the avail_heap_pages() loop over all online nodes and
> zones in avail_node_heap_pages() with a direct O(1) array lookup,
> making it efficient to get the total free pages for a given NUMA node.
>
> The per-node counts are currently provided using sysctl for NUMA
> placement decisions of domain builders and monitoring, and for
> debugging with the debug-key 'u' to print NUMA info to the printk buffer.
>
> They will also be used for checking if a NUMA node may be able to
> satisfy a NUMA-node-specific allocation by comparing node availability
> against node-specific claims before looking for pages in the zones
> of the node.
>
> Also change total_avail_pages and outstanding_claims to unsigned long:
>
> Those never become negative (we protect that with ASSERT/BUG_ON already),
> and converting them to unsigned long makes that explicit, and also
> fixes signed/unsigned comparison warnings.
This wants to be a separate commit. It hasn't got anything to do in here.
> This only needs moving the ASSERT to before the subtraction.
> See the previous commit moving the BUG_ON for outstanding_claims.
Please can you avoid such statements? You won't know in which order the
patches are committed: Patch 01 may go in weeks or months before patch
02.
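For readers following the signedness point: once a counter is unsigned, a ">= 0" check after the subtraction is always true, so the bounds check has to come first. A minimal sketch with hypothetical names (`avail_pages`, `take_pages()`), not the code from the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an unsigned free-page counter. */
static unsigned long avail_pages = 8;

/*
 * Take pages from the counter. The comparison must precede the subtraction:
 * an unsigned counter wraps around instead of going negative, so testing
 * "avail_pages >= 0" afterwards could never fail.
 */
static bool take_pages(unsigned long request)
{
    if ( request > avail_pages )
        return false; /* would wrap around, refuse */

    avail_pages -= request;

    return true;
}
```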
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -483,11 +483,32 @@ static heap_by_zone_and_order_t *_heap[MAX_NUMNODES];
>
> static unsigned long node_need_scrub[MAX_NUMNODES];
>
> +/* avail[node][zone] is the number of free pages on that node and zone. */
> static unsigned long *avail[MAX_NUMNODES];
> -static long total_avail_pages;
> +/* Global available pages, updated in real-time, protected by heap_lock */
> +static unsigned long total_avail_pages;
>
> +/* The global heap lock, protecting access to the heap and related structures */
> static DEFINE_SPINLOCK(heap_lock);
> -static long outstanding_claims; /* total outstanding claims by all domains */
> +
> +/*
> + * Per-node count of available pages, protected by heap_lock, updated in
> + * lockstep with total_avail_pages as pages are allocated and freed.
> + *
> + * Each entry holds the sum of avail[node][zone] across all zones, used for
> + * efficiently checking node-local availability for allocation requests.
> + * Also provided via sysctl for NUMA placement decisions of domain builders
> + * and monitoring, and logged with debug-key 'u' for NUMA debugging.
> + *
> + * Maintaining this under heap_lock does not reduce scalability, as the
> + * allocator is already serialized on it. The accessor macro abstracts the
> + * storage to ease future changes (e.g. moving to per-node lock granularity).
> + */
> +#define node_avail_pages(node) (node_avail_pages[node])
This isn't really needed when ...
> +static unsigned long node_avail_pages[MAX_NUMNODES];
... it's a static array anyway. Plus you may want to talk to Andrew regarding
the use of such a macro as an lvalue.
> +/* total outstanding claims by all domains */
> +static unsigned long outstanding_claims;
As you touch it, comment style wants correcting.
Jan
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH v4 0/10] xen: Add NUMA-aware memory claims for domains
2026-03-04 16:07 ` [PATCH v4 0/10] xen: Add NUMA-aware memory claims for domains Jan Beulich
@ 2026-03-04 17:27 ` Bernhard Kaindl
0 siblings, 0 replies; 35+ messages in thread
From: Bernhard Kaindl @ 2026-03-04 17:27 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monne, Stefano Stabellini, Daniel P. Smith,
Juergen Gross, Christian Lindig, David Scott,
xen-devel@lists.xenproject.org
Jan Beulich wrote:
> > - Alejandro Vallejo developed the initial version
> > - Roger Pau Monné updated the implementation and upstreamed key improvements
> > - Marcus Granado contributed analysis and suggestions during development
>
> Despite any of this, ...
[...]
> ... only a single patch has an S-o-b other than yours. Is this a correct
> representation of authorship?
Patch 3 should have S-o-bs by Roger and Alejandro in the commit, I will fix it.
Thanks for the catch,
Bernhard
PS: Details of the patches:
I'll also add Requested-by: Roger Pau Monné to the hypercall he requested to
implement for this series.
Here is the breakdown of contributions:
1. xen/page_alloc: Extract code for consuming claims into inline function
- By me as preparation to avoid duplicated code.
2. xen/page_alloc: Optimize getting per-NUMA-node free page counts
- Has S-o-b by Alejandro,
- Extracted into a separate patch for more focussed review
- Use node_avail_pages[node] also for avail_node_heap_pages(node)
- Use unsigned (to be factored into a separate commit per your review)
3. xen/page_alloc: Implement NUMA-node-specific claims
- Thanks for the catch, I will fix the Suggested-by tags to be S-o-b's.
4. xen/page_alloc: Consolidate per-node counters into avail[] array
- I'll remove it from the series, skip its review.
Not needed, and it missed initializing nodes without any memory.
5. xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness
6. xsm/flask: Add XEN_DOMCTL_claim_memory to flask
7. tools/lib/ctrl/xc: Add xc_domain_claim_memory() to libxenctrl
8. tools/ocaml/libs/xc: add OCaml domain_claim_memory binding
9. tools/tests: Update the claims test to test claim_memory hypercall
10. docs/guest-guide: document the memory claim hypercalls
- These are the patches for the new hypercall interface requested by Roger,
I'll add a Requested-by: Roger Pau Monné to the API interface patches.
They are of course based on Xen code, but not on patches of somebody else.
Bernhard
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH v4 01/10] xen/page_alloc: Extract code for consuming claims into inline function
2026-03-04 16:20 ` Jan Beulich
@ 2026-03-04 18:04 ` Bernhard Kaindl
0 siblings, 0 replies; 35+ messages in thread
From: Bernhard Kaindl @ 2026-03-04 18:04 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monne, Stefano Stabellini,
xen-devel@lists.xenproject.org
> > +static inline
>
> Generally we prefer to avoid "inline" in .c files. This is better left to the
> compiler. Furthermore while we have a few examples of this kind of line split,
> it's clearly not the preferred form. You'll find ample well-formed static
> functions in this one source file alone.
Ok, I will look for the preferred form.
> > +void consume_outstanding_claims(struct domain *d, unsigned long allocation)
> > +{
> > + if ( !d || !d->outstanding_pages )
> > + return;
> > + ASSERT(spin_is_locked(&heap_lock));
>
> Why is this not the first thing in the function?
Thanks, will move it up.
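A rough user-space model of how the two helpers could look with both remarks applied (plain static, declaration on one line, lock assertion first). This is only a sketch: `heap_locked` is a hypothetical stand-in for spin_is_locked(&heap_lock), assert() replaces ASSERT/BUG_ON, and the model's `struct domain` carries just the one field needed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of the one field used here; not Xen's struct domain. */
struct domain {
    unsigned long outstanding_pages;
};

static bool heap_locked; /* stand-in for spin_is_locked(&heap_lock) */
static unsigned long outstanding_claims;

/* Release outstanding claims on the domain and the host. */
static void release_outstanding_claims(struct domain *d, unsigned long release)
{
    assert(heap_locked);
    assert(outstanding_claims >= release);
    outstanding_claims -= release;
    d->outstanding_pages -= release;
}

/* Consume outstanding claimed pages when allocating pages for a domain. */
static void consume_outstanding_claims(struct domain *d,
                                       unsigned long allocation)
{
    /* Checked first, so that the early return below is covered as well. */
    assert(heap_locked);

    if ( !d || !d->outstanding_pages )
        return;

    /* The domain can only release up to its outstanding claims. */
    if ( allocation > d->outstanding_pages )
        allocation = d->outstanding_pages;

    release_outstanding_claims(d, allocation);
}
```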
> > @@ -1048,29 +1075,8 @@ static struct page_info *alloc_heap_pages(
> > total_avail_pages -= request;
> > ASSERT(total_avail_pages >= 0);
> >
> > - if ( d && d->outstanding_pages && !(memflags & MEMF_no_refcount) )
> > - {
> > - /*
> > - * Adjust claims in the same locked region where total_avail_pages
[...]
>
> Much of this comment is lost. Parts have been moved, but I think another part
> (in particular the first paragraph) wants to be retained here. Plus in general
> when rearranging code it is best to take the original commentary as is (typo
> or factual corrections of course included as necessary).
Ack, thanks, indeed, it is a good idea to keep this in place to inform readers
of the importance of having claims release and avail counter updates in the
same locked region. I'll retain the first paragraph here and maybe only move
the 2nd part of the comment out of the alloc_heap_pages code flow.
Bernhard
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH v4 02/10] xen/page_alloc: Optimize getting per-NUMA-node free page counts
2026-03-04 16:31 ` Jan Beulich
@ 2026-03-04 18:21 ` Bernhard Kaindl
2026-03-05 7:22 ` Jan Beulich
0 siblings, 1 reply; 35+ messages in thread
From: Bernhard Kaindl @ 2026-03-04 18:21 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monne, Stefano Stabellini,
xen-devel@lists.xenproject.org
Jan Beulich <jbeulich@suse.com> wrote:
> On 26.02.2026 15:29, Bernhard Kaindl wrote:
> >
> > Also change total_avail_pages and outstanding_claims to unsigned long:
> >
> > Those never become negative (we protect that with ASSERT/BUG_ON already),
> > and converting them to unsigned long makes that explicit, and also
> > fixes signed/unsigned comparison warnings.
>
> This wants to be a separate commit. It hasn't got anything to do in here.
Ok.
> > This only needs moving the ASSERT to before the subtraction.
> > See the previous commit moving the BUG_ON for outstanding_claims.
>
> Please can you avoid such statements? You won't know in which order the
> patches are committed: Patch 01 may go in weeks or months before patch
> 02.
Thanks, ok, will remove.
- NB. I do think the first 3 commits should best be applied in one go.
> > +#define node_avail_pages(node) (node_avail_pages[node])
>
> This isn't really needed when ...
>
> > +static unsigned long node_avail_pages[MAX_NUMNODES];
>
> ... it's a static array anyway. Plus you may want to talk to Andrew regarding
> the use of such a macro as an lvalue.
Ok. It was only a controlled, local accessor in this file to support moving
it to another storage variable, but I'll omit the accessor macro(s) then.
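Without the accessor, the per-node counter is simply indexed directly and updated next to the global one. A small sketch of that lockstep accounting, assuming hypothetical helper names (`account_free()`, `account_alloc()`, `node_free_pages()`) and a model node count; locking is omitted here:

```c
#include <assert.h>

#define NR_NODES 2 /* hypothetical node count for this model */

static unsigned long total_avail_pages;
static unsigned long node_avail_pages[NR_NODES];

/* Pages returned to the heap: bump the node and the global counter together. */
static void account_free(unsigned int node, unsigned long pages)
{
    assert(node < NR_NODES);
    node_avail_pages[node] += pages;
    total_avail_pages += pages;
}

/* Pages taken from the heap: check first, then subtract from both counters. */
static void account_alloc(unsigned int node, unsigned long pages)
{
    assert(node < NR_NODES);
    assert(node_avail_pages[node] >= pages);
    node_avail_pages[node] -= pages;
    total_avail_pages -= pages;
}

/* O(1) lookup instead of summing all zones of the node. */
static unsigned long node_free_pages(unsigned int node)
{
    assert(node < NR_NODES);
    return node_avail_pages[node];
}
```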
> > +/* total outstanding claims by all domains */
> > +static unsigned long outstanding_claims;
>
> As you touch it, comment style wants correcting.
I guess you mean to uppercase the 1st letter of the comment. Will do.
Bernhard
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v4 02/10] xen/page_alloc: Optimize getting per-NUMA-node free page counts
2026-03-04 18:21 ` Bernhard Kaindl
@ 2026-03-05 7:22 ` Jan Beulich
0 siblings, 0 replies; 35+ messages in thread
From: Jan Beulich @ 2026-03-05 7:22 UTC (permalink / raw)
To: Bernhard Kaindl
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monne, Stefano Stabellini,
xen-devel@lists.xenproject.org
On 04.03.2026 19:21, Bernhard Kaindl wrote:
> Jan Beulich <jbeulich@suse.com> wrote:
>> On 26.02.2026 15:29, Bernhard Kaindl wrote:
>>> This only needs moving the ASSERT to before the subtraction.
>>> See the previous commit moving the BUG_ON for outstanding_claims.
>>
>> Please can you avoid such statements? You won't know in which order the
>> patches are committed: Patch 01 may go in weeks or months before patch
>> 02.
>
> Thanks, ok, will remove.
>
> - NB. I do think the first 3 commits should best be applied in one go.
Such would want stating in the cover letter and in all affected patches
(outside of the commit message area of course). Preferably with a
reason (it's not quite clear to me, I have to admit, but then I also
haven't looked at patch 3 so far).
Jan
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v4 01/10] xen/page_alloc: Extract code for consuming claims into inline function
2026-02-26 14:29 ` [PATCH v4 01/10] xen/page_alloc: Extract code for consuming claims into inline function Bernhard Kaindl
2026-03-04 16:20 ` Jan Beulich
@ 2026-03-05 8:21 ` Roger Pau Monné
1 sibling, 0 replies; 35+ messages in thread
From: Roger Pau Monné @ 2026-03-05 8:21 UTC (permalink / raw)
To: Bernhard Kaindl
Cc: xen-devel, Andrew Cooper, Anthony PERARD, Michal Orzel,
Jan Beulich, Julien Grall, Stefano Stabellini
On Thu, Feb 26, 2026 at 02:29:15PM +0000, Bernhard Kaindl wrote:
> Refactor the claims consumption code in preparation for node-claims.
> Lays the groundwork for adding the consumption of NUMA claims to it.
>
> Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
> ---
> xen/common/page_alloc.c | 56 +++++++++++++++++++++++------------------
> 1 file changed, 31 insertions(+), 25 deletions(-)
>
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 588b5b99cbc7..6f7f30c64605 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -518,6 +518,34 @@ unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
> return d->tot_pages;
> }
>
> +/* Release outstanding claims on the domain, host and later also node */
> +static inline
> +void release_outstanding_claims(struct domain *d, unsigned long release)
> +{
> + ASSERT(spin_is_locked(&heap_lock));
> + BUG_ON(outstanding_claims < release);
> + outstanding_claims -= release;
> + d->outstanding_pages -= release;
> +}
> +
> +/*
> + * Consume outstanding claimed pages when allocating pages for a domain.
> + * NB. The alloc could (in principle) fail in assign_pages() afterwards. In that
> + * case, the consumption is not reversed, but as claims are used only during
> + * domain build and d is destroyed if the build fails, this has no significance.
> + */
> +static inline
> +void consume_outstanding_claims(struct domain *d, unsigned long allocation)
> +{
> + if ( !d || !d->outstanding_pages )
> + return;
> + ASSERT(spin_is_locked(&heap_lock));
> +
> + /* Of course, the domain can only release up its outstanding claims */
> + allocation = min(allocation, d->outstanding_pages + 0UL);
> + release_outstanding_claims(d, allocation);
> +}
> +
> int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
> {
> int ret = -ENOMEM;
> @@ -535,8 +563,7 @@ int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
> /* pages==0 means "unset" the claim. */
> if ( pages == 0 )
> {
> - outstanding_claims -= d->outstanding_pages;
> - d->outstanding_pages = 0;
> + release_outstanding_claims(d, d->outstanding_pages);
> ret = 0;
> goto out;
> }
> @@ -1048,29 +1075,8 @@ static struct page_info *alloc_heap_pages(
> total_avail_pages -= request;
> ASSERT(total_avail_pages >= 0);
>
> - if ( d && d->outstanding_pages && !(memflags & MEMF_no_refcount) )
> - {
> - /*
> - * Adjust claims in the same locked region where total_avail_pages is
> - * adjusted, not doing so would lead to a window where the amount of
> - * free memory (avail - claimed) would be incorrect.
As Jan mentioned, you really need to keep this part of the comment.
Claims had been broken since its introduction because the above was
not respected, and that resulted in the accounting for free pages
being transiently incorrect while an allocation was taking place.
Thanks, Roger.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v4 03/10] xen/page_alloc: Implement NUMA-node-specific claims
2026-02-26 14:29 ` [PATCH v4 03/10] xen/page_alloc: Implement NUMA-node-specific claims Bernhard Kaindl
@ 2026-03-05 10:53 ` Jan Beulich
2026-03-05 13:12 ` Bernhard Kaindl
0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2026-03-05 10:53 UTC (permalink / raw)
To: Bernhard Kaindl
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monné, Stefano Stabellini, Marcus Granado,
xen-devel, Alejandro Vallejo
On 26.02.2026 15:29, Bernhard Kaindl wrote:
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -944,6 +944,7 @@ struct domain *domain_create(domid_t domid,
> spin_lock_init(&d->node_affinity_lock);
> d->node_affinity = NODE_MASK_ALL;
> d->auto_node_affinity = 1;
> + d->claim_node = NUMA_NO_NODE;
If, as the cover letter says, the new domctl is going to allow claiming from
multiple nodes in one go, why would this new field still be necessary?
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -488,7 +488,10 @@ static unsigned long *avail[MAX_NUMNODES];
> /* Global available pages, updated in real-time, protected by heap_lock */
> static unsigned long total_avail_pages;
>
> -/* The global heap lock, protecting access to the heap and related structures */
> +/*
> + * The global heap lock, protecting access to the heap and related structures
> + * It protects the heap and claims, d->outstanding_pages and d->claim_node
> + */
> static DEFINE_SPINLOCK(heap_lock);
Nit: Comment style.
> @@ -510,6 +513,71 @@ static unsigned long node_avail_pages[MAX_NUMNODES];
> /* total outstanding claims by all domains */
> static unsigned long outstanding_claims;
>
> +/*
> + * Per-node accessor for outstanding claims, protected by heap_lock, updated
> + * in lockstep with the global outstanding_claims and d->outstanding_pages
> + * in domain_set_outstanding_pages() and release_outstanding_claims().
> + *
> + * node_outstanding_claims(node) is used to determine the outstanding claims on
> + * a node, which are subtracted from the node's available pages to determine if
> + * a request can be satisfied without violating the node's memory availability.
> + */
> +#define node_outstanding_claims(node) (node_outstanding_claims[node])
See the comment on the earlier patch regarding such a wrapper.
> +/* total outstanding claims by all domains on node */
> +static unsigned long node_outstanding_claims[MAX_NUMNODES];
How come this is being added, rather than it replacing outstanding_claims?
> +/* Return available pages after subtracting claimed pages */
> +static inline unsigned long available_after_claims(unsigned long avail_pages,
> + unsigned long claims)
> +{
> + BUG_ON(claims > avail_pages);
> + return avail_pages - claims; /* Due to the BUG_ON, it cannot be negative */
> +}
A helper for a simple subtraction?
> +/* Answer if host-level memory and claims permit this request to proceed */
> +static inline bool host_allocatable_request(const struct domain *d,
> + unsigned int memflags,
> + unsigned long request)
> +{
> + unsigned long allocatable_pages;
> +
> + ASSERT(spin_is_locked(&heap_lock));
> +
> + allocatable_pages = available_after_claims(total_avail_pages,
> + outstanding_claims);
> + if ( allocatable_pages >= request )
> + return true; /* The not claimed pages are enough to proceed */
> +
> + if ( !d || (memflags & MEMF_no_refcount) )
> + return false; /* Claims are not available for this allocation */
> +
> + /* The domain's claims are available, return true if sufficient */
> + return request <= allocatable_pages + d->outstanding_pages;
> +}
This only uses variables which existed before, i.e. there's nothing NUMA-ish
in here. What's the deal?
> +/* Answer if node-level memory and claims permit this request to proceed */
> +static inline bool node_allocatable_request(const struct domain *d,
> + unsigned int memflags,
> + unsigned long request,
> + nodeid_t node)
> +{
> + unsigned long allocatable_pages;
> +
> + ASSERT(spin_is_locked(&heap_lock));
> + ASSERT(node < MAX_NUMNODES);
> +
> + allocatable_pages = available_after_claims(node_avail_pages(node),
> + node_outstanding_claims(node));
> + if ( allocatable_pages >= request )
> + return true; /* The not claimed pages are enough to proceed */
> +
> + if ( !d || (memflags & MEMF_no_refcount) || (node != d->claim_node) )
> + return false; /* Claims are not available for this allocation */
> +
> + /* The domain's claims are available, return true if sufficient */
> + return request <= allocatable_pages + d->outstanding_pages;
> +}
And this is the NUMA counterpart, almost identical in the basic logic. If
(for whatever reason) both are really needed, I think it should at least be
considered to fold them (with NUMA_NO_NODE indicating the non-NUMA intent).
In fact the node != d->claim_node would probably also apply to the non-NUMA
variant (as d->claim_node != NUMA_NO_NODE).
As to the comments in both functions, personally I think
s/not claimed/unclaimed/ would be slightly more logical to follow.
In any event, the first of these functions looks like it could be split out
in a separate, earlier patch. Then (as per above) ideally here that function
would simply be extended to become NUMA-capable.
> @@ -539,14 +607,23 @@ unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
> return d->tot_pages;
> }
>
> -/* Release outstanding claims on the domain, host and later also node */
> +/* Release outstanding claims on the domain, host and node */
> static inline
> void release_outstanding_claims(struct domain *d, unsigned long release)
> {
> ASSERT(spin_is_locked(&heap_lock));
> BUG_ON(outstanding_claims < release);
> outstanding_claims -= release;
> +
> + if ( d->claim_node != NUMA_NO_NODE )
> + {
> + BUG_ON(node_outstanding_claims(d->claim_node) < release);
> + node_outstanding_claims(d->claim_node) -= release;
> + }
> d->outstanding_pages -= release;
> +
> + if ( d->outstanding_pages == 0 )
> + d->claim_node = NUMA_NO_NODE; /* Clear if no outstanding pages left */
I fear I don't understand this. If the domain has claims on other nodes,
why would it be switched back to non-NUMA claims?
> @@ -564,14 +642,41 @@ void consume_outstanding_claims(struct domain *d, unsigned long allocation)
>
> /* Of course, the domain can only release up its outstanding claims */
> allocation = min(allocation, d->outstanding_pages + 0UL);
> +
> + if ( d->claim_node != NUMA_NO_NODE && d->claim_node != alloc_node )
> + {
> + /*
> + * The domain has a claim on a node, but the alloc is on a different
> + * node. If it would exceed the domain's max_pages, reduce the claim
> + * up to the excess over max_pages so we don't reduce the claim more
> + * than we have to to honor the max_pages limit.
> + */
> + unsigned long booked_pages = domain_tot_pages(d) + allocation +
> + d->outstanding_pages;
> + if ( booked_pages <= d->max_pages )
> + return; /* booked is within max_pages, no excess, keep the claim */
> +
> + /* Excess detected, release the exceeding pages from the claimed node */
> + allocation = min(allocation, booked_pages - d->max_pages);
> + }
> release_outstanding_claims(d, allocation);
Please can there be another blank line above this one?
Why is the NUMA_NO_NODE case excluded from the adjustment made? That's odd in
itself, but particularly with release_outstanding_claims() possibly switching a
domain to NUMA_NO_NODE. Plus the caller looks to be passing in the actual node
memory was taken from, not what the original request said (which is specifically
relevant when the request named no particular node).
> }
>
> -int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
> +/*
> + * Update outstanding claims for the domain. Note: The node is passed as an
> + * unsigned int to allow checking for overflow above the uint8_t nodeid_t limit.
> + */
> +int domain_set_outstanding_pages(struct domain *d, unsigned long pages,
> + unsigned int node)
> {
> int ret = -ENOMEM;
> unsigned long claim, avail_pages;
>
> + /* When releasing a claim, the node must be NUMA_NO_NODE (it is not used) */
Why would this be?
> + if ( pages == 0 && node != NUMA_NO_NODE )
> + return -EINVAL;
> + if ( node != NUMA_NO_NODE && (node >= MAX_NUMNODES || !node_online(node)) )
> + return -ENOENT;
> /*
Again, can there please be a blank line after each of the if()s?
> @@ -982,6 +1102,8 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
> }
> } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
>
> + try_next_node:
> + /* If MEMF_exact_node was passed, we may not skip to a different node */
> if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE )
> return NULL;
As per this, ...
> @@ -1042,13 +1164,8 @@ static struct page_info *alloc_heap_pages(
>
> spin_lock(&heap_lock);
>
> - /*
> - * Claimed memory is considered unavailable unless the request
> - * is made by a domain with sufficient unclaimed pages.
> - */
> - if ( (outstanding_claims + request > total_avail_pages) &&
> - ((memflags & MEMF_no_refcount) ||
> - !d || d->outstanding_pages < request) )
> + /* Proceed if host-level memory and claims permit this request to proceed */
> + if ( !host_allocatable_request(d, memflags, request) )
... in the MEMF_exact_node case I see little reason to check the global value
here.
Jan
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v4 05/10] xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness
2026-02-26 14:29 ` [PATCH v4 05/10] xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness Bernhard Kaindl
2026-02-26 21:19 ` Teddy Astie
@ 2026-03-05 11:31 ` Jan Beulich
2026-04-14 15:17 ` Bernhard Kaindl
2026-03-05 12:38 ` Roger Pau Monné
2026-03-05 12:44 ` Jan Beulich
3 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2026-03-05 11:31 UTC (permalink / raw)
To: Bernhard Kaindl
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monné, Stefano Stabellini, Daniel P. Smith,
xen-devel
On 26.02.2026 15:29, Bernhard Kaindl wrote:
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -268,6 +268,35 @@ int get_domain_state(struct xen_domctl_get_domain_state *info, struct domain *d,
> return rc;
> }
>
> +/* Claim memory for a domain or reset the claim */
> +int claim_memory(struct domain *d, const struct xen_domctl_claim_memory *uinfo)
static in domctl.c? Otherwise with Penny's work to make domctl optional this
would be unreachable code.
> +{
> + memory_claim_t claim;
> +
> + /* alloc_color_heap_page() does not handle claims, so reject LLC coloring */
> + if ( llc_coloring_enabled )
> + return -EOPNOTSUPP;
> + /*
> + * We only support single claims at the moment, and if the domain is
> + * dying (d->is_dying is set), its claims have already been released
> + */
> + if ( uinfo->pad || uinfo->nr_claims != 1 || d->is_dying )
> + return -EINVAL;
As already alluded to in reply to patch 03, I can't help the impression that
usage of this sub-op with multiple entries would be quite different (i.e. it
would be not only the implementation in Xen that changes). I'm therefore
pretty uncertain whether taking it with this restriction is going to make
much sense.
> + if ( copy_from_guest(&claim, uinfo->claims, 1) )
> + return -EFAULT;
> +
> + if ( claim.pad )
> + return -EINVAL;
> +
> + /* Convert the API tag for a host-wide claim to the NUMA_NO_NODE constant */
> + if ( claim.node == XEN_DOMCTL_CLAIM_MEMORY_NO_NODE )
> + claim.node = NUMA_NO_NODE;
What about the incoming claim.node being NUMA_NO_NODE? Imo the range checking
the previous patch adds to domain_set_outstanding_pages() wants to move here,
at which point the function's new parameter could be properly nodeid_t.
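To illustrate the suggestion above, here is a minimal sketch of translating and range-checking the node in the handler, so that the internal function can take a proper nodeid_t. The helper name claim_node_from_api() is hypothetical, the constants are stand-ins chosen for the sketch (MAX_NUMNODES in particular is an assumed value), and the -ENOENT return merely mirrors the check quoted from the previous patch. The node_online() check is left out here.

```c
#include <errno.h>
#include <stdint.h>

typedef uint8_t nodeid_t;

#define NUMA_NO_NODE      0xFFu        /* stand-in for the internal constant */
#define MAX_NUMNODES      64u          /* assumed value, for the sketch only */
#define CLAIM_NO_NODE_TAG 0xFFFFFFFFu  /* mirrors XEN_DOMCTL_CLAIM_MEMORY_NO_NODE */

/* Hypothetical helper: convert the 32-bit API node into a nodeid_t. */
static int claim_node_from_api(uint32_t api_node, nodeid_t *node)
{
    if ( api_node == CLAIM_NO_NODE_TAG )
    {
        *node = NUMA_NO_NODE;          /* host-wide claim */
        return 0;
    }

    /* Rejects an incoming 0xFF as well, as long as MAX_NUMNODES <= 0xFF. */
    if ( api_node >= MAX_NUMNODES )
        return -ENOENT;

    *node = (nodeid_t)api_node;
    return 0;
}
```

With the check done at this point, a raw NUMA_NO_NODE value coming from the caller is no longer silently treated as a host-wide claim; only the dedicated API tag is.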
> + /* NB. domain_set_outstanding_pages() has the checks to validate its args */
> + return domain_set_outstanding_pages(d, claim.pages, claim.node);
> +}
There's no copying back of the result. When this is extended to allow more
than one entry, what's the plan towards dealing with partial success? Needing
to roll back may be unwieldy.
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -1276,6 +1276,42 @@ struct xen_domctl_get_domain_state {
> uint64_t unique_id; /* Unique domain identifier. */
> };
>
> +/*
> + * XEN_DOMCTL_claim_memory
> + *
> + * Claim memory for a guest domain. The claimed memory is converted into actual
> + * memory pages by allocating it. Except for the option to pass claims for
> + * multiple NUMA nodes, the semantics are based on host-wide claims as
> + * provided by XENMEM_claim_pages, and are identical for host-wide claims.
> + *
> + * The initial implementation supports a claim for the host or a NUMA node, but
> + * using an array, the API is designed to be extensible to support more claims.
> + */
> +struct xen_memory_claim {
> + uint64_aligned_t pages; /* Amount of pages to be allotted to the domain */
> + uint32_t node; /* NUMA node, or XEN_DOMCTL_CLAIM_MEMORY_NO_NODE for host */
> + uint32_t pad; /* padding for alignment, set to 0 on input */
This isn't for alignment; it's there to make the padding explicit.
> +};
> +typedef struct xen_memory_claim memory_claim_t;
> +#define XEN_DOMCTL_CLAIM_MEMORY_NO_NODE 0xFFFFFFFF /* No node: host claim */
Misra demands a U suffix here.
"host claim" (in the comment) also is ambiguous. Per-node claims also affect
the host. Maybe "host wide" or "global"?
> +/* Use XEN_NODE_CLAIM_INIT to initialize a memory_claim_t structure */
> +#define XEN_NODE_CLAIM_INIT(_pages, _node) { \
> + .pages = (_pages), \
> + .node = (_node), \
> + .pad = 0 \
> +}
While only a macro, it's still not C89, and hence may want offering only as
an extension. Also .pad doesn't need explicitly specifying, does it? If you
provide such a macro, identifiers used also need to strictly conform to the
C spec (IOW leading underscores aren't permitted).
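For illustration only, a sketch of what an initializer macro without designated initializers and without leading-underscore parameter names could look like. The struct and macro names below are hypothetical stand-ins, and <stdint.h> types are used only to keep the sketch compilable on its own; the positional form relies on the field order pages, node, pad.

```c
#include <stdint.h>

/* Stand-in for struct xen_memory_claim, same field order as in the patch. */
struct memory_claim_sketch {
    uint64_t pages;
    uint32_t node;
    uint32_t pad;
};

/*
 * Positional initializer: no designators, no reserved identifiers.
 * The trailing 0 for pad could be dropped, since members without an
 * initializer are zero-initialized anyway.
 */
#define CLAIM_INIT_SKETCH(pages, node) { (pages), (node), 0 }
```

The trade-off is that a positional initializer silently breaks if the field order ever changes, which is less of a concern for a structure in a public interface header.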
> +DEFINE_XEN_GUEST_HANDLE(memory_claim_t);
This wants to move up next to the typedef.
> +struct xen_domctl_claim_memory {
> + /* IN: array of struct xen_memory_claim */
> + XEN_GUEST_HANDLE_64(memory_claim_t) claims;
> + /* IN: number of claims in the claims array handle. See the claims field. */
> + uint32_t nr_claims;
Is repeating the word "claim" necessary / useful here?
> +#define XEN_DOMCTL_MAX_CLAIMS UINT8_MAX /* More claims require changes in Xen */
> + uint32_t pad; /* padding for alignment, set it to 0 */
Same comment as on the other pad field.
> @@ -1368,6 +1404,7 @@ struct xen_domctl {
> #define XEN_DOMCTL_gsi_permission 88
> #define XEN_DOMCTL_set_llc_colors 89
> #define XEN_DOMCTL_get_domain_state 90 /* stable interface */
> +#define XEN_DOMCTL_claim_memory 91
Seeing the adjacent comment, did you consider making this new sub-op a stable one
as well?
Jan
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v4 05/10] xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness
2026-02-26 14:29 ` [PATCH v4 05/10] xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness Bernhard Kaindl
2026-02-26 21:19 ` Teddy Astie
2026-03-05 11:31 ` Jan Beulich
@ 2026-03-05 12:38 ` Roger Pau Monné
2026-03-05 12:44 ` Jan Beulich
3 siblings, 0 replies; 35+ messages in thread
From: Roger Pau Monné @ 2026-03-05 12:38 UTC (permalink / raw)
To: Bernhard Kaindl
Cc: xen-devel, Andrew Cooper, Anthony PERARD, Michal Orzel,
Jan Beulich, Julien Grall, Stefano Stabellini, Daniel P. Smith
On Thu, Feb 26, 2026 at 02:29:19PM +0000, Bernhard Kaindl wrote:
> Add a DOMCTL handler for claiming memory with NUMA awareness. It
> rejects claims when LLC coloring (does not support claims) is enabled
> and translates the public constant to the internal NUMA_NO_NODE.
>
> The request is forwarded to domain_set_outstanding_pages() for the
> actual claim processing. The handler uses the same XSM hook as the
> legacy XENMEM_claim_pages hypercall.
>
> While the underlying infrastructure currently supports only a single
> claim, the public hypercall interface is designed to be extensible for
> multiple claims in the future without breaking the API.
>
> Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
> ---
> xen/common/domain.c | 29 ++++++++++++++++++++++++++++
> xen/common/domctl.c | 9 +++++++++
> xen/include/public/domctl.h | 38 +++++++++++++++++++++++++++++++++++++
> xen/include/xen/domain.h | 2 ++
> 4 files changed, 78 insertions(+)
>
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index e7861259a2b3..ac1b091f5574 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -268,6 +268,35 @@ int get_domain_state(struct xen_domctl_get_domain_state *info, struct domain *d,
> return rc;
> }
>
> +/* Claim memory for a domain or reset the claim */
> +int claim_memory(struct domain *d, const struct xen_domctl_claim_memory *uinfo)
> +{
> + memory_claim_t claim;
> +
> + /* alloc_color_heap_page() does not handle claims, so reject LLC coloring */
> + if ( llc_coloring_enabled )
> + return -EOPNOTSUPP;
> + /*
> + * We only support single claims at the moment, and if the domain is
> + * dying (d->is_dying is set), its claims have already been released
> + */
> + if ( uinfo->pad || uinfo->nr_claims != 1 || d->is_dying )
Iff we can move forward with this single node claim implementation,
the return code for uinfo->nr_claims != 1 needs to be -EOPNOTSUPP, to
differentiate the hypervisor not supporting the operation from an
error in the input parameters. That check needs to be moved into
the previous if condition.
If the domain is dying we could also return -ESRCH, so that we can
differentiate the different error paths from the return code of the
hypercall.
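A minimal sketch of the check ordering described above, so that each failure mode maps to its own error code. The struct and function names are hypothetical, and the two booleans stand in for llc_coloring_enabled and d->is_dying; this is not the handler from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the relevant fields of struct xen_domctl_claim_memory. */
struct claim_request_sketch {
    uint32_t nr_claims;
    uint32_t pad;
};

static int check_claim_request(const struct claim_request_sketch *req,
                               bool llc_coloring, bool domain_dying)
{
    /* Not supported by this hypervisor: LLC coloring, or more than one claim. */
    if ( llc_coloring || req->nr_claims != 1 )
        return -EOPNOTSUPP;

    /* Malformed input. */
    if ( req->pad )
        return -EINVAL;

    /* The domain is going away; its claims have already been released. */
    if ( domain_dying )
        return -ESRCH;

    return 0;
}
```

A toolstack can then tell "this Xen cannot do multi-node claims" apart from "I passed garbage" without guessing.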
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v4 05/10] xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness
2026-02-26 14:29 ` [PATCH v4 05/10] xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness Bernhard Kaindl
` (2 preceding siblings ...)
2026-03-05 12:38 ` Roger Pau Monné
@ 2026-03-05 12:44 ` Jan Beulich
3 siblings, 0 replies; 35+ messages in thread
From: Jan Beulich @ 2026-03-05 12:44 UTC (permalink / raw)
To: Bernhard Kaindl
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monné, Stefano Stabellini, Daniel P. Smith,
xen-devel
On 26.02.2026 15:29, Bernhard Kaindl wrote:
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -868,6 +868,15 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> ret = get_domain_state(&op->u.get_domain_state, d, &op->domain);
> break;
>
> + case XEN_DOMCTL_claim_memory:
> + /* Use the same XSM hook as XENMEM_claim_pages */
> + ret = xsm_claim_pages(XSM_PRIV, d);
> + if ( ret )
> + break;
> +
> + ret = claim_memory(d, &op->u.claim_memory);
> + break;
This needs accompanying by a change to xsm/flask/hooks.c:flask_domctl().
Jan
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH v4 03/10] xen/page_alloc: Implement NUMA-node-specific claims
2026-03-05 10:53 ` Jan Beulich
@ 2026-03-05 13:12 ` Bernhard Kaindl
2026-03-05 13:36 ` Jan Beulich
0 siblings, 1 reply; 35+ messages in thread
From: Bernhard Kaindl @ 2026-03-05 13:12 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monne, Stefano Stabellini, Marcus Granado,
xen-devel@lists.xenproject.org, Alejandro Vallejo
Jan Beulich wrote:
> > + d->claim_node = NUMA_NO_NODE;
>
> If, as the cover letter says, the new domctl is going to allow claiming from
> multiple nodes in one go, why would this new field still be necessary?
Roger requested the domctl API to allow claiming from multiple nodes in one go
and he specified that we should focus on getting the implementation for one
node-specific claim done first before we dive into multi-node claims code.
- Instead of adding/linking an array of claims to struct domain, we can keep
using d->outstanding_pages for the single-node claim.
- There are numerous comments and questions for this minimal implementation.
If we'd add multi-node claims to it, this review may become even more complex.
- The single-node claims backend contains the infrastructure and multi-node
claims would be an extension on top of that infrastructure.
We've a thread where I added the links to the past discussion here:
https://lists.xenproject.org/archives/html/xen-devel/2026-02/msg01403.html
> > +static unsigned long node_outstanding_claims[MAX_NUMNODES];
>
> How come this is being added, rather than it replacing outstanding_claims?
The global outstanding_claims variable counts the host-level claimed pages.
It has the sum of all host-level claims that are not specific to a NUMA node
and also the sum of all node-specific claims (see more on that in an answer
to another question further below).
If we were to replace it, we'd not have the outstanding_claims counter,
which would result in not supporting global claims anymore in Xen.
If a toolstack wanted to claim more memory than a single NUMA node
has, someone would have to loop over all NUMA nodes with enough memory
and split the claim across a number of per-NUMA-node claims.
- This would be less flexible than what we have with it: With it, we can
still support host-level claims without those claims being placed on
specific NUMA nodes.
- This allows for many domains to be built in parallel where some
domains are built with claims specific to particular NUMA nodes, while
allowing the other domains to be built dynamically at allocation time
from any remaining memory wherever that memory remains available.
- There are use cases where the memory of some domains shall be spread
across NUMA nodes to have the memory bandwidth of those NUMA nodes
available to the individual processes in the guest domain, but which still
want the assurance of host-level claims for claimed memory when
constructing many domains in parallel on a host.
> > +/* Return available pages after subtracting claimed pages */
> > +static inline unsigned long available_after_claims(unsigned long avail_pages,
> > + unsigned long claims)
> > +{
> > + BUG_ON(claims > avail_pages);
> > + return avail_pages - claims; /* Due to the BUG_ON, it cannot be negative */
> > +}
>
> A helper for a simple subtraction?
It is about not having to repeat the BUG_ON(claims > avail_pages) everywhere.
Also, the name of the helper makes clear what the result of the expression is,
so when using it, the flow of the code is more natural to understand.
Having to repeat the BUG_ON() everywhere would make the code less readable.
The BUG_ON() is a useful guardrail when refactoring: it trips if you broke the code.
> > +/* Answer if host-level memory and claims permit this request to proceed */
> > +static inline bool host_allocatable_request(const struct domain *d,
> > + unsigned int memflags,
> > + unsigned long request)
> > +{
> > + unsigned long allocatable_pages;
> > +
> > + ASSERT(spin_is_locked(&heap_lock));
> > +
> > + allocatable_pages = available_after_claims(total_avail_pages,
> > + outstanding_claims);
> > + if ( allocatable_pages >= request )
> > + return true; /* The not claimed pages are enough to proceed */
> > +
> > + if ( !d || (memflags & MEMF_no_refcount) )
> > + return false; /* Claims are not available for this allocation */
> > +
> > + /* The domain's claims are available, return true if sufficient */
> > + return request <= allocatable_pages + d->outstanding_pages;
> > +}
>
> This only uses variables which existed before, i.e. there's nothing NUMA-ish
> in here. What's the deal?
The deal is that, to take unclaimed memory beyond the remaining claims
into account when deciding whether the host has usable memory for a domain
with a claim, the needed if-expression would be quite complicated to understand.
Putting this logic into an if-expression without extracting it into
a function would bloat the flow of alloc_heap_pages(), especially if one
wants to keep the comments. I'm not sure if this is a good idea.
> > +/* Answer if node-level memory and claims permit this request to proceed */
> > +static inline bool node_allocatable_request(const struct domain *d,
> > + unsigned int memflags,
> > + unsigned long request,
> > + nodeid_t node)
> > +{
> > + unsigned long allocatable_pages;
> > +
> > + ASSERT(spin_is_locked(&heap_lock));
> > + ASSERT(node < MAX_NUMNODES);
> > +
> > + allocatable_pages = available_after_claims(node_avail_pages(node),
> > + node_outstanding_claims(node));
> > + if ( allocatable_pages >= request )
> > + return true; /* The not claimed pages are enough to proceed */
> > +
> > + if ( !d || (memflags & MEMF_no_refcount) || (node != d->claim_node) )
> > + return false; /* Claims are not available for this allocation */
> > +
> > + /* The domain's claims are available, return true if sufficient */
> > + return request <= allocatable_pages + d->outstanding_pages;
> > +}
>
> And this is the NUMA counterpart, almost identical in the basic logic. If
Yes, and this is intentional: The same simple check just with per-node claims.
> (for whatever reason) both are really needed, I think it should at least be
The reason is that staking claims is the easy part. Protecting claims is the
(a bit) tricky part, which is what such functions are helping with.
> considered to fold them (with NUMA_NO_NODE indicating the non-NUMA intent).
> In fact the node != d->claim_node would probably also apply to the non-NUMA
> variant (as d->claim_node != NUMA_NO_NODE).
That's not true: The sum of all node-specific claims should also be part
of the global host-level outstanding_claims counter, so the host-level check
of whether an allocation may proceed also covers whether node-specific
claims would (in theory) allow a host-level alloc to proceed.
That means that for the host-level claims protection check for a domain,
the claims of the domain must be accounted, even if they are node-specific.
Thus, `node != d->claim_node` does not apply if `node == NUMA_NO_NODE`.
Yes, I should merge the functions, but this check is only for node-claims.
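A rough sketch of how the two helpers could be folded under that constraint, with NUMA_NO_NODE selecting the host-level counters and the claim_node comparison applied only to node-level checks. Everything here is a stand-in for illustration: the plain arrays replace node_avail_pages()/node_outstanding_claims(), assert() replaces BUG_ON()/ASSERT(), the heap_lock assertion is omitted, and the MAX_NUMNODES and MEMF_no_refcount values are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t nodeid_t;

#define NUMA_NO_NODE     0xFFu      /* stand-in for the internal constant */
#define MAX_NUMNODES     4u         /* assumed, small for the sketch */
#define MEMF_no_refcount (1u << 0)  /* assumed bit position */

struct domain {
    unsigned long outstanding_pages;
    nodeid_t claim_node;
};

unsigned long total_avail_pages, outstanding_claims;
unsigned long node_avail[MAX_NUMNODES], node_claims[MAX_NUMNODES];

static unsigned long available_after_claims(unsigned long avail,
                                            unsigned long claims)
{
    assert(claims <= avail);
    return avail - claims;
}

/* node == NUMA_NO_NODE requests the host-level check. */
static bool allocatable_request(const struct domain *d, unsigned int memflags,
                                unsigned long request, nodeid_t node)
{
    unsigned long allocatable;

    if ( node == NUMA_NO_NODE )
        allocatable = available_after_claims(total_avail_pages,
                                             outstanding_claims);
    else
    {
        assert(node < MAX_NUMNODES);
        allocatable = available_after_claims(node_avail[node],
                                             node_claims[node]);
    }

    if ( allocatable >= request )
        return true;   /* The unclaimed pages are enough to proceed */

    if ( !d || (memflags & MEMF_no_refcount) )
        return false;  /* Claims are not available for this allocation */

    /* Only a node-level check requires the claim to be on that node. */
    if ( node != NUMA_NO_NODE && node != d->claim_node )
        return false;

    return request <= allocatable + d->outstanding_pages;
}
```

The host-level path deliberately counts a node-specific claim of the domain too, since the global counter contains the sum of all node-specific claims.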
> As to the comments in both functions, personally I think
> s/not claimed/unclaimed/ would be slightly more logical to follow.
>
> In any event, the first of these functions looks like it could be split out
> in a separate, earlier patch. Then (as per above) ideally here that function
> would simply be extended to become NUMA-capable.
Yes indeed, good.
> > + if ( d->outstanding_pages == 0 )
> > + d->claim_node = NUMA_NO_NODE; /* Clear if no outstanding pages left */
>
> I fear I don't understand this. If the domain has claims on other nodes,
> why would it be switched back to non-NUMA claims?
( d->outstanding_pages == 0 ) means d has no claim left: None at all.
As there is none left, nothing is switched; the claims are reset to the no-claim state.
> > @@ -564,14 +642,41 @@ void consume_outstanding_claims(struct domain *d, unsigned long allocation)
> >
> > /* Of course, the domain can only release up its outstanding claims */
> > allocation = min(allocation, d->outstanding_pages + 0UL);
> > +
> > + if ( d->claim_node != NUMA_NO_NODE && d->claim_node != alloc_node )
> > + {
> > + /*
> > + * The domain has a claim on a node, but the alloc is on a different
> > + * node. If it would exceed the domain's max_pages, reduce the claim
> > + * up to the excess over max_pages so we don't reduce the claim more
> > + * than we have to to honor the max_pages limit.
> > + */
> > + unsigned long booked_pages = domain_tot_pages(d) + allocation +
> > + d->outstanding_pages;
> > + if ( booked_pages <= d->max_pages )
> > + return; /* booked is within max_pages, no excess, keep the claim */
> > +
> > + /* Excess detected, release the exceeding pages from the claimed node */
> > + allocation = min(allocation, booked_pages - d->max_pages);
> > + }
> > release_outstanding_claims(d, allocation);
>
> Please can there be another blank line above this one?
>
> Why is the adjustment made excluded for the NUMA_NO_NODE case? That's odd in
When the domain's claim is global, the allocation just consumes from this claim.
> itself, but particularly with release_outstanding_claims() possibly switching a
> domain to NUMA_NO_NODE.
As commented above, it's not switching with zero claims, it just resets the state.
> Plus the caller looks to be passing in the actual node
> memory was taken from, not what the original request said (which is
> specifically relevant when the request named no particular node).
For this check, the target of the initial allocation request does not matter.
What matters here is that for whatever reason when an allocation ends up not
being made on a NUMA node the domain has a node-specific claim on, and this
allocation would exceed the claimed + allocated memory beyond d->max_pages, we
should release and consume the size of claims that would exceed d->max_pages.
There is a bug here that I have already fixed in my draft for a v5 series: It needs to
use the size of the full allocation to calculate the excess beyond d->max_pages,
not the min() of the allocation size and d->outstanding_pages. Will be fixed.
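For reference, the arithmetic of the quoted v4 hunk can be modelled with plain integers as below. This reproduces the behaviour as posted, not the v5 correction mentioned above; the function name is hypothetical and the four parameters stand in for domain_tot_pages(d), d->outstanding_pages, d->max_pages and the allocation size.

```c
/*
 * Pages to release from a node-specific claim when an allocation landed
 * on a different node (v4 logic as quoted in this thread).
 */
static unsigned long off_node_claim_release(unsigned long tot_pages,
                                            unsigned long outstanding,
                                            unsigned long max_pages,
                                            unsigned long allocation)
{
    unsigned long booked;

    /* The domain can release at most its outstanding claims. */
    if ( allocation > outstanding )
        allocation = outstanding;

    booked = tot_pages + allocation + outstanding;
    if ( booked <= max_pages )
        return 0;  /* within max_pages, keep the claim */

    /* Release only the excess over max_pages, capped by the allocation. */
    return allocation < booked - max_pages ? allocation : booked - max_pages;
}
```

For example, with 88 pages allocated, 10 claimed, max_pages of 100 and a 4-page off-node allocation, booked is 102 and only 2 pages of the claim are released.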
> > + /* When releasing a claim, the node must be NUMA_NO_NODE (it is not used) */
>
> Why would this be?
I need to fix the comment: It should say "resetting". When resetting claims to 0,
which is done by libxenguest after it has completed populating the guest memory,
we are not passing a NUMA node and as this function rejects any updates of an
existing claim besides resetting it using 0, the reset shall always apply to all
claims of the domain, independent of which claims it has (even multi-node claims).
Roger added this check to make it clear that we expect NUMA_NO_NODE with this call.
> > if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE )
> > return NULL;
>
> As per this, ...
>
> > @@ -1042,13 +1164,8 @@ static struct page_info *alloc_heap_pages(
> >
> > spin_lock(&heap_lock);
> >
> > - /*
> > - * Claimed memory is considered unavailable unless the request
> > - * is made by a domain with sufficient unclaimed pages.
> > - */
> > - if ( (outstanding_claims + request > total_avail_pages) &&
> > - ((memflags & MEMF_no_refcount) ||
> > - !d || d->outstanding_pages < request) )
> > + /* Proceed if host-level memory and claims permit this request to proceed */
> > + if ( !host_allocatable_request(d, memflags, request) )
>
> ... in the MEMF_exact_node case I see little reason to check the global value here.
Ack, if (memflags & MEMF_exact_node), we can skip the host-wide check indeed,
which would be an optimisation as we'd not have to look at the host-level counters.
> Jan
Thanks for this review, I'll apply those changes!
Bernhard
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v4 03/10] xen/page_alloc: Implement NUMA-node-specific claims
2026-03-05 13:12 ` Bernhard Kaindl
@ 2026-03-05 13:36 ` Jan Beulich
2026-03-05 14:54 ` Bernhard Kaindl
0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2026-03-05 13:36 UTC (permalink / raw)
To: Bernhard Kaindl
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monne, Stefano Stabellini, Marcus Granado,
xen-devel@lists.xenproject.org, Alejandro Vallejo
On 05.03.2026 14:12, Bernhard Kaindl wrote:
> Jan Beulich wrote:
>>> + d->claim_node = NUMA_NO_NODE;
>>
>> If, as the cover letter says, the new domctl is going to allow claiming from
>> multiple nodes in one go, why would this new field still be necessary?
>
> Roger requested the domctl API to allow claiming from multiple nodes in one go
> and he specified that we should focus on getting the implementation for one
> node-specific claim done first before we dive into multi-node claims code.
>
> - Instead of adding/linking an array of claims to struct domain, we can keep
> using d->outstanding_pages for the single-node claim.
>
> - There are numerous comments and questions for this minimal implementation.
> If we'd add multi-node claims to it, this review may become even more complex.
>
> - The single-node claims backend contains the infrastructure and multi-node
> claims would be an extension on top of that infrastructure.
What is at the very least needed is an outline of how multi-node claims are
intended to work. This is because what you do here needs to fit that scheme.
Which in turn I think is going to be difficult when for a domain more memory
is needed than any single node can supply. Hence why I think that you may
not be able to get away with just single-node claims, no matter that this
of course complicates things.
It's also not quite clear to me how multiple successive claims against
distinct nodes would work (which isn't all that different from a multi-node
claim).
Thinking of it, interaction with the existing mem-op also wants clarifying.
Imo only one of the two ought to be usable on a single domain.
>>> +static unsigned long node_outstanding_claims[MAX_NUMNODES];
>>
>> How come this is being added, rather than it replacing outstanding_claims?
>
> The global outstanding_claims variable counts the host-level claimed pages.
>
> It has the sum of all host-level claims that are not specific to a NUMA node
> and also the sum of all node-specific claims (see more on that in an answer
> to another question further below).
>
> If we were to replace it, we'd not have the outstanding_claims counter,
> which would result in not supporting global claims anymore in Xen.
>
> If a toolstack wanted to claim more memory than a single NUMA node
> has, someone would have to loop over all NUMA nodes with enough memory
> and split the claim across a number of per-NUMA-node claims.
>
> - This would be less flexible than what we have with it: With it, we can
> still support host-level claims without those claims being placed on
> specific NUMA nodes.
>
> - This allows for many domains to be built in parallel where some
> domains are built with claims specific to particular NUMA nodes, while
> allowing the other domains to be built dynamically at allocation time
> from any remaining memory wherever that memory remains available.
>
> - There are use cases where the memory of some domains shall be spread
> across NUMA nodes to have the memory bandwidth of those NUMA nodes
> available to the individual processes in the guest domain, but which still
> want the assurance of host-level claims for claimed memory when
> constructing many domains in parallel on a host.
All fine, but why is this written down only in a reply to review comments,
rather than right in the patch description?
>>> +/* Return available pages after subtracting claimed pages */
>>> +static inline unsigned long available_after_claims(unsigned long avail_pages,
>>> + unsigned long claims)
>>> +{
>>> + BUG_ON(claims > avail_pages);
>>> + return avail_pages - claims; /* Due to the BUG_ON, it cannot be negative */
>>> +}
>>
>> A helper for a simple subtraction?
>
> It is about not having to repeat the BUG_ON(claims > avail_pages) everywhere.
Which in turn I should have said I question. Imo this is supposed to be an
ASSERT(), not a BUG_ON().
> Also, the name of the helper makes clear what the result of the expression is,
> so when using it, the flow of the code is more natural to understand.
>
> Having to repeat the BUG_ON() everywhere would make the code less readable.
> The BUG_ON() is good when refactoring as a guardrail when you broke the code.
I'm not quite sure there.
>>> +/* Answer if host-level memory and claims permit this request to proceed */
>>> +static inline bool host_allocatable_request(const struct domain *d,
>>> + unsigned int memflags,
>>> + unsigned long request)
>>> +{
>>> + unsigned long allocatable_pages;
>>> +
>>> + ASSERT(spin_is_locked(&heap_lock));
>>> +
>>> + allocatable_pages = available_after_claims(total_avail_pages,
>>> + outstanding_claims);
>>> + if ( allocatable_pages >= request )
>>> + return true; /* The not claimed pages are enough to proceed */
>>> +
>>> + if ( !d || (memflags & MEMF_no_refcount) )
>>> + return false; /* Claims are not available for this allocation */
>>> +
>>> + /* The domain's claims are available, return true if sufficient */
>>> + return request <= allocatable_pages + d->outstanding_pages;
>>> +}
>>
>> This only uses variables which existed before, i.e. there's nothing NUMA-ish
>> in here. What's the deal?
>
> The deal is that, to take unclaimed memory beyond the remaining claims
> into account when deciding whether the host has usable memory for a domain
> with a claim, the needed if-expression would be quite complicated to understand.
> Putting this logic into an if-expression without extracting it into
> a function would bloat the flow of alloc_heap_pages(), especially if one
> wants to keep the comments. I'm not sure if this is a good idea.
I guess I don't really follow: Right here all you do is transform a complex
if() into one that calls this function, with no functional difference. This
function isn't changed by subsequent patches. Hence what's the concern?
That said, I don't mind breaking it out, but as said - as a separate change,
and then with its NUMA counterpart preferably folded in.
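For reference, the decision taken by the quoted helper can be modelled outside of Xen. The following is a minimal standalone sketch, not the patch itself: struct claim_state and the domain_claim / may_use_claim parameters are illustrative assumptions (in Xen the counters are globals protected by heap_lock and the claim lives in d->outstanding_pages), and the ASSERT() variant suggested above is modelled with assert().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the host counters (globals under heap_lock in Xen). */
struct claim_state {
    unsigned long total_avail_pages;   /* free pages on the host */
    unsigned long outstanding_claims;  /* sum of all unconsumed claims */
};

/* Free pages not covered by any claim; claims must never exceed free pages. */
static unsigned long available_after_claims(unsigned long avail_pages,
                                            unsigned long claims)
{
    assert(claims <= avail_pages);
    return avail_pages - claims;
}

/*
 * May a request for 'request' pages proceed?
 * 'domain_claim' is the requesting domain's own outstanding claim;
 * 'may_use_claim' is false for anonymous or MEMF_no_refcount allocations.
 */
static bool host_allocatable_request(const struct claim_state *s,
                                     unsigned long domain_claim,
                                     bool may_use_claim,
                                     unsigned long request)
{
    unsigned long unclaimed = available_after_claims(s->total_avail_pages,
                                                     s->outstanding_claims);

    if ( unclaimed >= request )
        return true;            /* unclaimed pages alone suffice */

    if ( !may_use_claim )
        return false;           /* the claim is not usable here */

    return request <= unclaimed + domain_claim;
}
```

With 100 free pages and 80 of them claimed, a request for 20 pages succeeds for anyone, while a request for 70 pages succeeds only for a domain holding at least 50 of the claimed pages.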
Jan
* Re: [PATCH v4 06/10] xsm/flask: Add XEN_DOMCTL_claim_memory to flask
2026-02-26 14:29 ` [PATCH v4 06/10] xsm/flask: Add XEN_DOMCTL_claim_memory to flask Bernhard Kaindl
@ 2026-03-05 13:42 ` Jan Beulich
0 siblings, 0 replies; 35+ messages in thread
From: Jan Beulich @ 2026-03-05 13:42 UTC (permalink / raw)
To: Bernhard Kaindl; +Cc: Daniel P. Smith, Anthony PERARD, xen-devel
On 26.02.2026 15:29, Bernhard Kaindl wrote:
> Add a Flask security policy for the new XEN_DOMCTL_claim_memory hypercall
> introduced in the previous commit. When Flask is enabled, this permission
> controls whether a domain can stake memory claims for another domain.
>
> The permission is granted to:
> - dom0_t: Dom0 needs this to claim memory for guest domains
> - create_domain_common: Domain builders need this during domain creation
>
> Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
> ---
> tools/flask/policy/modules/dom0.te | 1 +
> tools/flask/policy/modules/xen.if | 1 +
> xen/xsm/flask/hooks.c | 3 +++
> xen/xsm/flask/policy/access_vectors | 2 ++
> 4 files changed, 7 insertions(+)
Oh, here's the missing XSM/Flask change. First - this cannot come after the
introduction of the sub-op. If it can be split and come first, fine. Else it
needs to be folded in.
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -820,6 +820,9 @@ static int cf_check flask_domctl(struct domain *d, unsigned int cmd,
> case XEN_DOMCTL_set_llc_colors:
> return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__SET_LLC_COLORS);
>
> + case XEN_DOMCTL_claim_memory:
> + return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__CLAIM_MEMORY);
You don't need two XSM checks, I don't think. As you use xsm_claim_pages(),
all you need to do here should be to add a case label to the "These have
individual XSM hooks (common/domctl.c)" block.
Jan
* RE: [PATCH v4 03/10] xen/page_alloc: Implement NUMA-node-specific claims
2026-03-05 13:36 ` Jan Beulich
@ 2026-03-05 14:54 ` Bernhard Kaindl
2026-03-05 17:00 ` Jan Beulich
0 siblings, 1 reply; 35+ messages in thread
From: Bernhard Kaindl @ 2026-03-05 14:54 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monne, Stefano Stabellini, Marcus Granado,
xen-devel@lists.xenproject.org, Alejandro Vallejo
Jan Beulich wrote:
> On 05.03.2026 14:12, Bernhard Kaindl wrote:
> >
> > Roger requested the domctl API to allow claiming from multiple nodes in one go
> > and he specified that we should focus on getting the implementation for one
> > node-specific claim done first before we dive into multi-node claims code.
> >
> > - Instead of adding/linking an array of claims to struct domain, we can keep
> > using d->outstanding_pages for the single-node claim.
> >
> > - There are numerous comments and questions for this minimal implementation.
> > If we'd add multi-node claims to it, this review may become even more complex.
> >
> > - The single-node claims backend contains the infrastructure and multi-node
> > claims would be an extension on top of that infrastructure.
>
> What is at the very least needed is an outline of how multi-node claims are
> intended to work. This is because what you do here needs to fit that scheme.
> Which in turn I think is going to be difficult when for a domain more memory
> is needed than any single node can supply. Hence why I think that you may
> not be able to get away with just single-node claims, no matter that this
> of course complicates things.
>
> It's also not quite clear to me how multiple successive claims against
> distinct nodes would work (which isn't all that different from a multi-node
> claim).
>
> Thinking of it, interaction with the existing mem-op also wants clarifying.
> Imo only one of the two ought to be usable on a single domain.
Yes, correct. As implemented by Xen in domain_set_outstanding_claims(),
Xen claims work very differently from something like an allocation:
For example, when you allocate, you get memory, and when you repeat,
you have a bigger allocation.
But Xen claims in domain_set_outstanding_claims() don't work like that:
- When a domain has a claim, domain_set_outstanding_claims() only allows
to reset the claim to 0, nothing else. A second, or changed claim is not
possible. I think this was intentional:
- domain_set_outstanding_claims() rejects increasing/reducing a claim:
A claim is designed to be made by domain build when the size of the
domain is known. There is no tweaking it afterwards: The needed pages
shall be claimed by the domain builder before the domain is built.
Note: The claims are not only consumed when populating guest memory:
Claims are also (at least attempted to be) consumed when Xen needs to
allocate memory for other resources of the domain. For this reason,
the domain builder needs to add some headroom for allocations done by
Xen for creating the domain.
When the domain builder has finished building the domain, it is expected
to reset the claim to release any not consumed headroom it added.
- If a domain already has memory when the domain builder stakes a claim
for completing the build of the domain, the outstanding_claims are set
to the target value of the claim call, minus domain_tot_pages(d), so
already allocated memory does not contribute to a bigger total booking.
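The rules above can be summarised in a small standalone model. This is a sketch of the described behaviour only, not the Xen implementation: struct model_domain, model_set_claim() and the free_unclaimed parameter are names made up for illustration, and the error codes as well as the rejection of a target that does not exceed the already allocated pages are assumptions.

```c
#include <errno.h>

/* Hypothetical, simplified stand-in for the relevant fields of struct domain. */
struct model_domain {
    unsigned long tot_pages;          /* pages already allocated to the domain */
    unsigned long outstanding_pages;  /* unconsumed part of the claim */
};

/*
 * Model of the set-once semantics described above.
 * 'free_unclaimed' points at the host's free pages not covered by any claim.
 */
static int model_set_claim(struct model_domain *d, unsigned long pages,
                           unsigned long *free_unclaimed)
{
    unsigned long claim;

    if ( pages == 0 )
    {
        /* Reset: release whatever headroom was not consumed. */
        *free_unclaimed += d->outstanding_pages;
        d->outstanding_pages = 0;
        return 0;
    }

    if ( d->outstanding_pages )
        return -EINVAL;               /* no second or changed claim */

    if ( pages <= d->tot_pages )
        return -EINVAL;               /* nothing left to claim (assumed policy) */

    /* Already allocated memory does not count towards the booking. */
    claim = pages - d->tot_pages;
    if ( claim > *free_unclaimed )
        return -ENOMEM;

    d->outstanding_pages = claim;
    *free_unclaimed -= claim;
    return 0;
}
```

In this model, a domain that already owns 10 pages and claims a target of 50 books 40 pages; any further non-zero call fails until the claim has been reset with 0.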
For NUMA claims and global host-level claims, it is similar:
A NUMA node-specific claim is implicitly also added to the global
host-level outstanding_claims of the host, as node-specific memory
is also part of the host's memory, so the host-level claims protection
does not have to also check for node-specific claims:
The effect of a host-level claim is also given when you make a node-level claim.
When a domain has one kind of claim, it does not make a lot of sense to then
later add a differently sized claim for another target. As described in
how domain_set_outstanding_claims() is implemented, a domain builder stakes
a claim once, then builds the domain, then resets it, and that's all there is to it.
For example, Xapi toolstack and libxenguest have calls to claim memory,
but in any given configuration, only the first actor to claim memory for
a domain is the one who defines the claim: No mixing, changing, updating.
It makes things clear that the initial creator did make the claim.
Similar for multi-node claims:
Roger described how he wants this API to work here:
https://lists.xenproject.org/archives/html/xen-devel/2025-06/msg00484.html
(Before, he said that with multiple calls, it would be awkward, with partial
claims and rollback, and I want to add that it would run diametrically counter
to the original claims design of not allowing multiple calls)
> Ideally, we would need to introduce a new hypercall that allows making
> claims from multiple nodes in a single locked region, as to ensure
> success or failure in an atomic way.
In the locked region (inside heap_lock), we can check the claims requests
against existing claims and memory of the affected nodes and determine if
the claim call is a go or a no-go. If it is a go, we update all counters
which are all protected by the heap_lock and are done.
There is no partial success or failure. It will be atomic, like Roger asked.
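A validate-then-commit shape is one way to get this all-or-nothing behaviour. The sketch below is an illustration under assumptions, not the v5/v6 code: MODEL_NR_NODES, struct model_heap, struct model_claim and model_claim_set() are hypothetical names, the caller is assumed to hold the lock corresponding to heap_lock, and each node is assumed to appear at most once in a request.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_NR_NODES 4  /* hypothetical node count */

struct model_heap {
    unsigned long node_avail[MODEL_NR_NODES];   /* free pages per node */
    unsigned long node_claims[MODEL_NR_NODES];  /* claimed pages per node */
    unsigned long outstanding_claims;           /* host-wide sum of claims */
};

struct model_claim {
    unsigned int node;
    unsigned long pages;
};

/* Caller is assumed to hold the heap lock for the whole call. */
static int model_claim_set(struct model_heap *h,
                           const struct model_claim *c, size_t nr)
{
    size_t i;

    /* Pass 1: validate every entry; no counter is modified yet. */
    for ( i = 0; i < nr; i++ )
    {
        if ( c[i].node >= MODEL_NR_NODES )
            return -EINVAL;
        if ( c[i].pages > h->node_avail[c[i].node] - h->node_claims[c[i].node] )
            return -ENOMEM;
    }

    /* Pass 2: commit; this pass cannot fail, so no rollback is needed. */
    for ( i = 0; i < nr; i++ )
    {
        h->node_claims[c[i].node] += c[i].pages;
        h->outstanding_claims += c[i].pages;  /* node claims count globally too */
    }

    return 0;
}
```

Because all checks finish before the first counter is touched, a failing entry leaves every counter unchanged, which is the atomicity property asked for.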
With this, as I understand it, I think I should create a design specification
for how claims are designed in Xen and how the claims design can be
extended to support atomic multi-node claims (without rollbacks/concurrency
issues).
I started describing how Xen implements claims in /docs/hypervisor-guide here:
https://bernhardk-xen-review.readthedocs.io/node-claims/hypervisor-guide/mm/claims.html
I'd add these new clarifications to this description then, I think.
To communicate the plan of how multi-node claims would work,
as described by Roger, I'd suggest I'd add a design document
for multi-node claims, modelled after the Hyperlaunch design
document found in the docs.
Once that design is approved, we should have a clear shared
understanding of them before we'd be looking at implementation.
Bernhard
* Re: [PATCH v4 03/10] xen/page_alloc: Implement NUMA-node-specific claims
2026-03-05 14:54 ` Bernhard Kaindl
@ 2026-03-05 17:00 ` Jan Beulich
0 siblings, 0 replies; 35+ messages in thread
From: Jan Beulich @ 2026-03-05 17:00 UTC (permalink / raw)
To: Bernhard Kaindl
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monne, Stefano Stabellini, Marcus Granado,
xen-devel@lists.xenproject.org, Alejandro Vallejo
On 05.03.2026 15:54, Bernhard Kaindl wrote:
> Jan Beulich wrote:
>> On 05.03.2026 14:12, Bernhard Kaindl wrote:
>>>
>>> Roger requested the domctl API to allow claiming from multiple nodes in one go
>>> and he specified that we should focus on getting the implementation for one
>>> node-specific claim done first before we dive into multi-node claims code.
>>>
>>> - Instead of adding/linking an array of claims to struct domain, we can keep
>>> using d->outstanding_pages for the single-node claim.
>>>
>>> - There are numerous comments and questions for this minimal implementation.
>>> If we'd add multi-node claims to it, this review may become even more complex.
>>>
>>> - The single-node claims backend contains the infrastructure and multi-node
>>> claims would be an extension on top of that infrastructure.
>>
>> What is at the very least needed is an outline of how multi-node claims are
>> intended to work. This is because what you do here needs to fit that scheme.
>> Which in turn I think is going to be difficult when for a domain more memory
>> is needed than any single node can supply. Hence why I think that you may
>> not be able to get away with just single-node claims, no matter that this
>> of course complicates things.
>>
>> It's also not quite clear to me how multiple successive claims against
>> distinct nodes would work (which isn't all that different from a multi-node
>> claim).
>>
>> Thinking of it, interaction with the existing mem-op also wants clarifying.
>> Imo only one of the two ought to be usable on a single domain.
>
> Yes, correct. As implemented by Xen in domain_set_outstanding_claims(),
> Xen claims work very different from something like an allocation:
>
> For example, when you allocate, you get memory, and when you repeat,
> you have a bigger allocation.
>
> But Xen claims in domain_set_outstanding_claims() don't work like that:
>
> - When a domain has a claim, domain_set_outstanding_claims() only allows
> to reset the claim to 0, nothing else. A second, or changed claim is not
> possible. I think this was intentional:
>
> - domain_set_outstanding_claims() rejects increasing/reducing a claim:
>
> A claim is designed to be made by domain build when the size of the
> domain is known. There is no tweaking it afterwards: The needed pages
> shall be claimed by the domain builder before the domain is built.
>
> Note: The claims are not only consumed when populating guest memory:
> Claims are also (at least attempted to be) consumed when Xen needs to
> allocate memory for other resources of the domain. For this reason,
> the domain builder needs to add some headroom for allocations done by
> Xen for creating the domain.
>
> When the domain builder has finished building the domain, it is expected
> to reset the claim to release any not consumed headroom it added.
>
> - If a domain already has memory when the domain builder stakes a claim
> for completing the build of the domain, the outstanding_claims are set
> to the target value of the claim call, minus domain_tot_pages(d), so
> already allocated memory does not contribute to a bigger total booking.
>
> For NUMA claims and global host-level claims, it is similar:
>
> A NUMA node-specific claim is implicitly also added to the global
> host-level outstanding_claims of the host, as a Node-specific memory
> is also part of the host's memory, so the host-level claims protection
> does not have to also check for node-specific claims:
>
> The effect of host-level claim is also given when you make a node-level claim.
>
> When a domain one kind of claim, it does not make a lot of sense to then
> later add a differently sized claim for another target. Like described in
> how domain_set_outstanding_claims() is implemented, a domain builder stakes
> a claim once, then builds the domain, then resets it, and that's all to it.
>
> For example, Xapi toolstack and libxenguest have calls to claim memory,
> but in any given configuration, only the first actor to claim memory for
> a domain is the one who defines the claim: No mixing, changing, updating.
> It makes things clear that the initial creator did make the claim.
>
> Similar for multi-node claims:
>
> Roger described how he wants this API do work here:
> https://lists.xenproject.org/archives/html/xen-devel/2025-06/msg00484.html
Fits my understanding, but doesn't fit you limiting the new sub-op to a
single node. As said, if you introduce the new sub-op this way, I'd still
expect for a single domain to have claims across multiple nodes, and
that (preferably) whatever the caller does to achieve that will continue
to work once the restriction is lifted.
Yet I can't see you describe such a claims-on-multiple-nodes use case in
your reply above. And indeed to achieve that you'd need data layout
changes, in particular there then couldn't be any single d->claim_node.
>> Ideally, we would need to introduce a new hypercall that allows making
>> claims from multiple nodes in a single locked region, as to ensure
>> success or failure in an atomic way.
>
> In the locked region (inside heap_lock), we can check the claims requests
> against existing claims and memory of the affected nodes and determine if
> the claim call is a go or a no-go. If it is a go, we update all counters
> which are all protected by the heap_lock and are done.
Yet as per above, afaics you don't even have the needed data layout to
record two (or more) claims against distinct nodes.
Jan
* RE: [PATCH v4 05/10] xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness
2026-03-05 11:31 ` Jan Beulich
@ 2026-04-14 15:17 ` Bernhard Kaindl
2026-04-16 6:46 ` Jan Beulich
0 siblings, 1 reply; 35+ messages in thread
From: Bernhard Kaindl @ 2026-04-14 15:17 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monne, Stefano Stabellini, Daniel P. Smith,
xen-devel@lists.xenproject.org
Hi Jan,
I'm sorry for the late reply, I had many TODOs and a vacation meanwhile,
but I'm back with the fixes for the review.
Jan Beulich wrote:
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
[...]
> > +int claim_memory(struct domain *d, [...]
>
> static in domctl.c? Otherwise with Penny's work to make domctl optional this
> would be unreachable code.
Thanks, done: Moved it to domctl.c so that it is not compiled without MGMT_HYPERCALLS in v5/v6.
> > + if ( uinfo->pad || uinfo->nr_claims != 1 || d->is_dying )
> > + return -EINVAL;
>
> As already alluded to in reply to patch 03, I can't help the impression that
> usage of this sub-op with multiple entries would we quite different (i.e. it
> would be not only the implementation in Xen that changes). I'm therefore
> pretty uncertain whether taking it with this restriction is going to make
> much sense.
I submitted this sub-op to support multiple entries with v5/v6 now.
In v5/v6 these checks are updated to support multiple claims in the claim set.
For clarity, I renamed the .node of the individual claim entries to .target:
The target of a claim entry can also be a selector for a global claim
or a legacy claim, and the field has many bits for future use.
This wasn't needed, but I think it's clearer that the claim entry specifies a
target, which is where the claim entry is aimed at; it's not just a node.
> > + if ( claim.node == XEN_DOMCTL_CLAIM_MEMORY_NO_NODE )
> > + claim.node = NUMA_NO_NODE;
>
> What about the incoming claim.node being NUMA_NO_NODE? Imo the range checking
> the previous patch adds to domain_set_outstanding_pages() wants to move here,
> at which point the function's new parameter could be properly nodeid_t.
nodeid_t and NUMA_NO_NODE are (judging by the existing implementation) not
exposed in the public API to the control domain.
This separation is probably a good thing because it allows changing Xen internals
like nodeid_t and NUMA_NO_NODE if so desired without changing the public API.
NUMA_NO_NODE is defined as 0xFF and nodeid_t is u8. But that is just an
implementation detail of the hypervisor itself. If needed, we could change
the implementation, as this series could do.
The public struct xen_sysctl_numainfo and xen_sysctl_physinfo define num_nodes,
nr_nodes and max_node_id as uint32_t, for example. For type consistency, I opted
to define this public API as uint32_t as well and not expose internal types/values.
> > + return domain_set_outstanding_pages(d, claim.pages, claim.node);
> > +}
>
> There's no copying back of the result. When this is extended to allow more
> than one entry, what's the plan towards dealing with partial success? Needing
> to roll back may be unwieldy.
Roger described the core requirement I'm implementing:
> Ideally, we would need to introduce a new hypercall that allows
> making claims from multiple nodes in a single locked region,
> as to ensure success or failure in an atomic way.
-- Roger Pau Monné
Ref:
https://lists.xenproject.org/archives/html/xen-devel/2025-06/msg00484.html
As a result, we don't need to handle partial successes, so it's not needed.
> > +#define XEN_DOMCTL_CLAIM_[...] 0xFFFFFFFF /* No node: host claim */
>
> "host claim" (in the comment) also is ambiguous. Per-node claims also affect
> the host. Maybe "host wide" or "global"?
Thanks for this suggestion! I changed the term used everywhere to "global" in v5/6.
> > +/* Use XEN_NODE_CLAIM_INIT to initialize a memory_claim_t structure */
> > +#define XEN_NODE_CLAIM_INIT(_pages, _node) { \
> > + .pages = (_pages), \
> > + .node = (_node), \
> > + .pad = 0 \
> > +}
>
> While only a macro, it's still not C89, and hence may wants offering only as
> an extension. Also .pad doesn't need explicitly specifying, does it? If you
> provide such a macro, identifiers used also need to strictly conform to the
> C spec (IOW leading underscores aren't permitted).
Thanks, removed as not needed.
> > +DEFINE_XEN_GUEST_HANDLE(memory_claim_t);
>
> This wants to move up next to the typedef.
Thanks, done in v5/v6.
> > + /* IN: number of claims in the claims array handle. See the claims
>
> Is repeating the word "claim" necessary / useful here?
Thanks, fixed.
> > #define XEN_DOMCTL_get_domain_state 90 /* stable interface */
> > +#define XEN_DOMCTL_claim_memory 91
>
> Seeing the adjacent comment, did you consider making this new sub-op a stable
> one as well?
Thanks, I investigated making such a change, but I don't think it should be changed:
In short, XEN_DOMCTL_get_domain_state uses a fixed hypercall version of 0 and is
frozen because it needs to be used by a caller that must support multiple Xen
versions. Consequently, libxenctrl, which uses only the version-controlled
hypercalls, does not implement this hypercall.
That's not the designed use case of this hypercall:
The designed use is domain builders running in Dom0 which already
need to use the unstable (versioned) interfaces for building domains.
I think that calling this hypercall through libxenctrl, like the other
hypercalls the domain builders use, suits it better. Otherwise, the domain builders
would use a mix of version-controlled and frozen/stable hypercalls, which could
be confusing for API users and for future maintenance.
From the domain builders’ viewpoint, it is more consistent to expose
the claims hypercall in the same way as the other calls they use.
From my viewpoint, such frozen interfaces also have drawbacks: By providing
stable syscalls, Linux needs to maintain the old interface indefinitely, which
can be a maintenance burden and can limit the ability to make improvements or
changes to the interface in the future. Linux carries many syscall successor
families, e.g., oldstat, stat, newstat, stat64, fstatat, statx, with similar
examples including openat, openat2, clone3, dup3, waitid, mmap2, epoll_create1,
pselect6 and many more. Glibc hides that complexity from users by providing a
consistent API, but it still needs to maintain the old system calls for
compatibility. Xen's interface for Dom0 is not an OS kernel syscall interface.
In contrast, the versioned libxenctrl hypercalls allow for more flexibility and
evolution of the API while still providing a clear path to adopt new features.
The reserved fields and reserved bits in the structures of this hypercall allow
for many future extensions without breaking existing callers.
Thanks for your review of the v4 series so far,
and I'm looking forward to everyone's reviews of the v6 series:
[PATCH v2] docs: Draft Design Document for NUMA-aware claim sets
https://lists.xen.org/archives/html/xen-devel/2026-04/msg00569.html
https://patchwork.kernel.org/project/xen-devel/list/?series=1081047
[PATCH v6 0/7] xen/mm: Introduce NUMA-aware claim sets for domains
https://lists.xen.org/archives/html/xen-devel/2026-04/msg00587.html
https://patchwork.kernel.org/project/xen-devel/list/?series=1081139
Thanks, Bernhard
* Re: [PATCH v4 05/10] xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness
2026-04-14 15:17 ` Bernhard Kaindl
@ 2026-04-16 6:46 ` Jan Beulich
2026-04-16 23:48 ` Bernhard Kaindl
0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2026-04-16 6:46 UTC (permalink / raw)
To: Bernhard Kaindl
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monne, Stefano Stabellini, Daniel P. Smith,
xen-devel@lists.xenproject.org
On 14.04.2026 17:17, Bernhard Kaindl wrote:
> Jan Beulich wrote:
>>> --- a/xen/common/domain.c
>>> +++ b/xen/common/domain.c
> [...]
>>> +int claim_memory(struct domain *d, [...]
>>
>> static in domctl.c? Otherwise with Penny's work to make domctl optional this
>> would be unreachable code.
>
> Thanks, done: Moved it to domctl.c to be not compiled without MGMT_HYPERCALLS in v5/v6.
>
>>> + if ( uinfo->pad || uinfo->nr_claims != 1 || d->is_dying )
>>> + return -EINVAL;
>>
>> As already alluded to in reply to patch 03, I can't help the impression that
>> usage of this sub-op with multiple entries would we quite different (i.e. it
>> would be not only the implementation in Xen that changes). I'm therefore
>> pretty uncertain whether taking it with this restriction is going to make
>> much sense.
>
> I submitted this sub-op to support multiple entries with v5/v6 now.
>
> In v5/v6 these checks are updated to support multiple claims in the claim set.
> For clarity, I renamed the .node of the individual claim entries to .target:
>
> The target of a claim entry can also be a selector for a global claim
> or a legacy claim and the field have many bits for future use.
>
> This wasn't needed but I think it's clearer that the claim entry specifies a
> target which is where the claim entry is aimed at, it's not just only a node.
>
>>> + if ( claim.node == XEN_DOMCTL_CLAIM_MEMORY_NO_NODE )
>>> + claim.node = NUMA_NO_NODE;
>>
>> What about the incoming claim.node being NUMA_NO_NODE? Imo the range checking
>> the previous patch adds to domain_set_outstanding_pages() wants to move here,
>> at which point the function's new parameter could be properly nodeid_t.
>
> nodeid_t and NUMA_NO_NODE have (judging by the existing implementation) are not
> exposed in the public API to the control domain.
>
> This separation is probably a good thing because it allows to change Xen internals
> like nodeit_t and NUMA_NO_NODE if so desired without changing the public API.
>
> NUMA_NO_NODE is defined as 0xFF and nodeid_t is u8. But that is just an
> implementation detail of the Hypervisor itself. If needed, we could change
> the implementation like this series could do, if wanted.
You spell it all out here, but then you don't draw the conclusion that I was aiming
at: If someone passes in 0xff, that _should not_ be mistaken for NUMA_NO_NODE. Hence
for the time being you simply need to reject 0xff if you don't want to expose "no
specific node" exactly that way in the ABI. And indeed ...
> The public struct xen_sysctl_numainfo and xen_sysctl_physinfo define num_nodes,
> nr_nodes and max_node_id as uint32_t, for example. For type consistency, I opted
> to define this public API as uint32_t as well and not expose internal types/values.
... the proper representation there would then likely be 0xffffffff.
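The conversion at the ABI boundary could look roughly as follows. This is a sketch under assumptions: MODEL_CLAIM_NO_NODE, MODEL_MAX_NUMNODES and model_public_to_nodeid() are illustrative names that do not come from the series; only the value 0xFF for NUMA_NO_NODE and the u8 width of nodeid_t are taken from the discussion above.

```c
#include <errno.h>
#include <stdint.h>

typedef uint8_t nodeid_t;                 /* internal type, as stated above */
#define NUMA_NO_NODE        0xFF          /* internal sentinel, as stated above */
#define MODEL_MAX_NUMNODES  64u           /* hypothetical limit, below 0xFF */
#define MODEL_CLAIM_NO_NODE 0xFFFFFFFFu   /* hypothetical public sentinel */

/* Translate the public 32-bit node field into the internal nodeid_t. */
static int model_public_to_nodeid(uint32_t pub, nodeid_t *node)
{
    if ( pub == MODEL_CLAIM_NO_NODE )
    {
        *node = NUMA_NO_NODE;             /* "no specific node" was requested */
        return 0;
    }

    /* Rejects 0xFF as well, so that it is never mistaken for NUMA_NO_NODE. */
    if ( pub >= MODEL_MAX_NUMNODES )
        return -EINVAL;

    *node = (nodeid_t)pub;
    return 0;
}
```

Range checking before the narrowing cast keeps the internal sentinel out of the ABI: only the public 32-bit sentinel maps to NUMA_NO_NODE.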
Jan
* RE: [PATCH v4 05/10] xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness
2026-04-16 6:46 ` Jan Beulich
@ 2026-04-16 23:48 ` Bernhard Kaindl
0 siblings, 0 replies; 35+ messages in thread
From: Bernhard Kaindl @ 2026-04-16 23:48 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Anthony PERARD, Michal Orzel, Julien Grall,
Roger Pau Monne, Stefano Stabellini, Daniel P. Smith,
xen-devel@lists.xenproject.org
Hello Jan,
I only reply here to acknowledge your comment. This code is obsolete now:
It is historical with v6 now, where we have new code and different considerations
that would be off topic for this discussion on the obsolete v4 series (the single-
node interface doesn't exist in v6, the new implementation is multi-node)
> >>> + if ( claim.node == XEN_DOMCTL_CLAIM_MEMORY_NO_NODE )
> >>> + claim.node = NUMA_NO_NODE;
> >>
> >> What about the incoming claim.node being NUMA_NO_NODE? Imo the range checking
> >> the previous patch adds to domain_set_outstanding_pages() wants to move here,
> >> at which point the function's new parameter could be properly nodeid_t.
> >
> > nodeid_t and NUMA_NO_NODE have (judging by the existing implementation) are not
> > exposed in the public API to the control domain.
> >
> > This separation is probably a good thing because it allows to change Xen internals
> > like nodeit_t and NUMA_NO_NODE if so desired without changing the public API.
> >
> > NUMA_NO_NODE is defined as 0xFF and nodeid_t is u8. But that is just an
> > implementation detail of the Hypervisor itself. If needed, we could change
> > the implementation like this series could do, if wanted.
>
> You spell it all out here, but then you don't draw the conclusion that I was aiming
> at: If someone passes in 0xff, that _should not_ be mistaken for NUMA_NO_NODE. Hence
> for the time being you simply need to reject 0xff if you don't want to expose "no
> specific node" exactly that way in the ABI. And indeed ...
Ah, I misunderstood your comment, acknowledged.
Earlier reviews asked for node checking in domain_set_outstanding_pages(), which
should, as you suggested, have been moved there. But that's historical with v6 now,
where we have new code and different considerations.
cu, Bernhard
PS: I submitted the current design document to explain the reasoning behind the design, and v6 itself:
https://bernhard-xen.readthedocs.io/en/claim-sets-v2-design/designs/claims
https://lists.xenproject.org/archives/html/xen-devel/2026-04/msg00587.html
Thread overview: 35+ messages
2026-02-26 14:29 [PATCH v4 0/10] xen: Add NUMA-aware memory claims for domains Bernhard Kaindl
2026-02-26 14:29 ` [PATCH v4 01/10] xen/page_alloc: Extract code for consuming claims into inline function Bernhard Kaindl
2026-03-04 16:20 ` Jan Beulich
2026-03-04 18:04 ` Bernhard Kaindl
2026-03-05 8:21 ` Roger Pau Monné
2026-02-26 14:29 ` [PATCH v4 02/10] xen/page_alloc: Optimize getting per-NUMA-node free page counts Bernhard Kaindl
2026-03-04 16:31 ` Jan Beulich
2026-03-04 18:21 ` Bernhard Kaindl
2026-03-05 7:22 ` Jan Beulich
2026-02-26 14:29 ` [PATCH v4 03/10] xen/page_alloc: Implement NUMA-node-specific claims Bernhard Kaindl
2026-03-05 10:53 ` Jan Beulich
2026-03-05 13:12 ` Bernhard Kaindl
2026-03-05 13:36 ` Jan Beulich
2026-03-05 14:54 ` Bernhard Kaindl
2026-03-05 17:00 ` Jan Beulich
2026-02-26 14:29 ` [PATCH v4 04/10] xen/page_alloc: Consolidate per-node counters into avail[] array Bernhard Kaindl
2026-02-26 14:29 ` [PATCH v4 05/10] xen/domain: Add DOMCTL handler for claiming memory with NUMA awareness Bernhard Kaindl
2026-02-26 21:19 ` Teddy Astie
2026-02-26 23:16 ` Bernhard Kaindl
2026-02-27 9:39 ` Teddy Astie
2026-02-27 18:16 ` Bernhard Kaindl
2026-03-05 11:31 ` Jan Beulich
2026-04-14 15:17 ` Bernhard Kaindl
2026-04-16 6:46 ` Jan Beulich
2026-04-16 23:48 ` Bernhard Kaindl
2026-03-05 12:38 ` Roger Pau Monné
2026-03-05 12:44 ` Jan Beulich
2026-02-26 14:29 ` [PATCH v4 06/10] xsm/flask: Add XEN_DOMCTL_claim_memory to flask Bernhard Kaindl
2026-03-05 13:42 ` Jan Beulich
2026-02-26 14:29 ` [PATCH v4 07/10] tools/lib/ctrl/xc: Add xc_domain_claim_memory() to libxenctrl Bernhard Kaindl
2026-02-26 14:29 ` [PATCH v4 08/10] tools/ocaml/libs/xc: add OCaml domain_claim_memory binding Bernhard Kaindl
2026-02-26 14:29 ` [PATCH v4 09/10] tools/tests: Update the claims test to test claim_memory hypercall Bernhard Kaindl
2026-02-26 14:29 ` [PATCH v4 10/10] docs/guest-guide: document the memory claim hypercalls Bernhard Kaindl
2026-03-04 16:07 ` [PATCH v4 0/10] xen: Add NUMA-aware memory claims for domains Jan Beulich
2026-03-04 17:27 ` Bernhard Kaindl