[PATCH v3 0/2] xen/x86: Change stub page freeing to fix smt=0

* [PATCH v3 0/2] xen/x86: Change stub page freeing to fix smt=0
@ 2026-06-09  0:06 Jason Andryuk
  2026-06-09  0:06 ` [PATCH v3 1/2] xen/x86: Return virtual address from alloc_stub_page() Jason Andryuk
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jason Andryuk @ 2026-06-09  0:06 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Teddy Astie

This is a third approach to fixing the stub page handling that is
broken with !CONFIG_PV and smt=0.

There is a CPU-indexed stubs array and a NUMA node-indexed stub_nodes
array for allocating the stub buffers.

From v2, this patch
  xen/x86: Remove unneeded stub_page setting
is dropped as stub_page is removed as part of patch 2.

Jason Andryuk (2):
  xen/x86: Return virtual address from alloc_stub_page()
  xen/x86: Change stub page allocation/free

 xen/arch/x86/include/asm/stubs.h |   2 +-
 xen/arch/x86/setup.c             |   3 +-
 xen/arch/x86/smpboot.c           | 114 +++++++++++++++++++++----------
 3 files changed, 79 insertions(+), 40 deletions(-)

-- 
2.54.0




* [PATCH v3 1/2] xen/x86: Return virtual address from alloc_stub_page()
  2026-06-09  0:06 [PATCH v3 0/2] xen/x86: Change stub page freeing to fix smt=0 Jason Andryuk
@ 2026-06-09  0:06 ` Jason Andryuk
  2026-06-09  0:06 ` [PATCH v3 2/2] xen/x86: Change stub page allocation/free Jason Andryuk
  2026-06-10 11:54 ` [PATCH v3 0/2] xen/x86: Change stub page freeing to fix smt=0 Oleksii Kurochko
  2 siblings, 0 replies; 8+ messages in thread
From: Jason Andryuk @ 2026-06-09  0:06 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Teddy Astie

Currently alloc_stub_page() returns the virtual address of the mapped
stubs page, and the caller adds the per-CPU offset.  Make
alloc_stub_page() return the final address.  This is in preparation for
changing the stubs allocation where the offset will not be tied to the
CPU number.

The call to alloc_stub_page() in setup.c:__start_xen() did not add the
offset, as it is assumed to run on CPU0, whose offset is 0.

Rename the local variable stub_page to stub_va to reflect the value it
now holds.
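
As a rough illustration only (not part of the patch), the address arithmetic described above can be modelled in plain C.  The sizes below are assumed values for a 4 KiB page split into 32 buffers, and stub_final_va() is a hypothetical helper name; only the shape of STUB_BUF_CPU_OFFS() is taken from smpboot.c.

```c
#include <assert.h>

/* Assumed values for illustration; the real definitions live in Xen's headers. */
#define PAGE_SIZE       4096UL
#define STUB_BUF_SIZE   128UL
#define STUBS_PER_PAGE  (PAGE_SIZE / STUB_BUF_SIZE)

/* Same shape as the macro in smpboot.c: the slot is cpu modulo STUBS_PER_PAGE. */
#define STUB_BUF_CPU_OFFS(cpu) (((cpu) & (STUBS_PER_PAGE - 1)) * STUB_BUF_SIZE)

/*
 * Hypothetical helper: the computation the caller used to do itself and
 * that alloc_stub_page() performs after this patch.
 */
static unsigned long stub_final_va(unsigned long page_va, unsigned int cpu)
{
    /* The mapping is page granular, so the base must be page aligned. */
    assert((page_va & (PAGE_SIZE - 1)) == 0);

    /* A zero page_va signals failure and must stay zero. */
    return page_va ? page_va + STUB_BUF_CPU_OFFS(cpu) : 0;
}
```

With these assumed sizes, CPU 33 lands in slot 1 of its page, and a failed allocation still propagates as 0 rather than as a bare offset.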

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
 xen/arch/x86/smpboot.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index ff05955bae..d7619f534b 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -675,7 +675,7 @@ unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn)
     else if ( !*mfn )
         *mfn = mfn_x(page_to_mfn(pg));
 
-    return stub_va;
+    return stub_va ? stub_va + STUB_BUF_CPU_OFFS(cpu) : 0;
 }
 
 void cpu_exit_clear(unsigned int cpu)
@@ -1044,7 +1044,7 @@ static int cpu_smpboot_alloc(unsigned int cpu)
     unsigned int i, memflags = 0;
     nodeid_t node = cpu_to_node(cpu);
     seg_desc_t *gdt;
-    unsigned long stub_page;
+    unsigned long stub_va;
     int rc = -ENOMEM;
 
     if ( node != NUMA_NO_NODE )
@@ -1099,10 +1099,10 @@ static int cpu_smpboot_alloc(unsigned int cpu)
             break;
         }
     BUG_ON(i == cpu);
-    stub_page = alloc_stub_page(cpu, &per_cpu(stubs.mfn, cpu));
-    if ( !stub_page )
+    stub_va = alloc_stub_page(cpu, &per_cpu(stubs.mfn, cpu));
+    if ( !stub_va )
         goto out;
-    per_cpu(stubs.addr, cpu) = stub_page + STUB_BUF_CPU_OFFS(cpu);
+    per_cpu(stubs.addr, cpu) = stub_va;
 
     rc = setup_cpu_root_pgt(cpu);
     if ( rc )
-- 
2.54.0




* [PATCH v3 2/2] xen/x86: Change stub page allocation/free
  2026-06-09  0:06 [PATCH v3 0/2] xen/x86: Change stub page freeing to fix smt=0 Jason Andryuk
  2026-06-09  0:06 ` [PATCH v3 1/2] xen/x86: Return virtual address from alloc_stub_page() Jason Andryuk
@ 2026-06-09  0:06 ` Jason Andryuk
  2026-06-10 15:01   ` Roger Pau Monné
  2026-06-10 11:54 ` [PATCH v3 0/2] xen/x86: Change stub page freeing to fix smt=0 Oleksii Kurochko
  2 siblings, 1 reply; 8+ messages in thread
From: Jason Andryuk @ 2026-06-09  0:06 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Teddy Astie

Today the inline tracking of the stub page is problematic.  0xcc is used
to indicate an unused stub buffer, but it is also the fill value written
when a buffer is cleared, so an in-use buffer that still holds only the
fill value looks free.  A !CONFIG_PV build with smt=0 will bring up CPU0,
bring up CPU1, bring down CPU1, and free the stub page that is still in
use.  Subsequent CPU onlining can write to the re-used page.
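
The ambiguity can be reproduced with a small user-space model.  This is a sketch under assumptions, not Xen code: the buffer size is assumed, both function names are hypothetical, and the premise that CPU0's buffer still holds only the fill value on a !CONFIG_PV build is a reading of the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE      4096u
#define STUB_BUF_SIZE  128u                        /* assumed */
#define STUBS_PER_PAGE (PAGE_SIZE / STUB_BUF_SIZE)

/*
 * Model of the old free-path check: the page counts as unused when the
 * first byte of every buffer holds the 0xcc fill value.
 */
static bool page_looks_unused(const unsigned char *page)
{
    for ( unsigned int i = 0; i < STUBS_PER_PAGE; i++ )
        if ( page[i * STUB_BUF_SIZE] != 0xcc )
            return false;

    return true;
}

/*
 * Hypothetical scenario: CPU0 owns slot 0 but never wrote a stub into it.
 * CPU1 (slot 1) comes up, goes down, and has its buffer reset.
 * Returns whether the old check would then free the page.
 */
static bool old_check_frees_page_in_use(void)
{
    unsigned char page[PAGE_SIZE];

    memset(page, 0xcc, sizeof(page));         /* freshly allocated page */
    page[1 * STUB_BUF_SIZE] = 0x0f;           /* pretend CPU1 wrote a stub */
    memset(page + 1 * STUB_BUF_SIZE, 0xcc, STUB_BUF_SIZE); /* CPU1 offlined */

    /* Slot 0 is still assigned to CPU0, yet nothing in the page says so. */
    return page_looks_unused(page);
}
```

In this model the check reports the page as unused even though slot 0 is still assigned, which is why tracking ownership outside the page contents avoids the problem.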

The new approach uses a global, CPU-indexed array of stub pages.
However, to handle NUMA aware allocations, we cannot allocate all the
pages in advance because the NUMA information is not available.  Keep
track of 1 current page for each NUMA node, allocated on demand, and
allocate the stub buffers out of those pages.

The current NUMA-aware allocation opportunistically shares a page among
online CPUs of the same node within each group of 32 consecutive CPU
numbers.  The new approach allocates buffers densely within a NUMA node.
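
A minimal user-space sketch of the dense per-node scheme, for illustration only: malloc() stands in for alloc_domheap_page(), the sizes and the MODEL_* limits are assumed, and the struct and function names are hypothetical rather than the ones in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE       4096u
#define STUB_BUF_SIZE   128u                        /* assumed */
#define STUBS_PER_PAGE  (PAGE_SIZE / STUB_BUF_SIZE)
#define MODEL_NODES     4u                          /* hypothetical */
#define MODEL_CPUS      128u                        /* hypothetical */

struct slot { void *pg; unsigned int offset; };      /* one per CPU */
struct node_cursor { void *pg; unsigned int next; }; /* one per node */

static struct slot slots[MODEL_CPUS];
static struct node_cursor cursors[MODEL_NODES];

/*
 * Give `cpu` a buffer from its node's current page.  A CPU that already
 * has a buffer keeps it, which models re-onlining.  Returns 0 on success.
 */
static int model_assign(unsigned int cpu, unsigned int node)
{
    struct node_cursor *c;

    assert(cpu < MODEL_CPUS && node < MODEL_NODES);

    if ( slots[cpu].pg )
        return 0;

    c = &cursors[node];
    if ( !c->pg )
    {
        c->pg = malloc(PAGE_SIZE);   /* stand-in for alloc_domheap_page() */
        c->next = 0;
        if ( !c->pg )
            return -1;
    }

    slots[cpu].pg = c->pg;
    slots[cpu].offset = c->next++ * STUB_BUF_SIZE;

    /* Page exhausted: the next request on this node allocates a new one. */
    if ( c->next == STUBS_PER_PAGE )
    {
        c->pg = NULL;
        c->next = 0;
    }

    return 0;
}
```

Buffers are handed out in request order per node, so CPU numbers sharing a page need not be consecutive, and pages are intentionally never released in this model.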

Stub pages are no longer freed.  They remain referenced in the global
CPU-indexed array and are re-used if the CPU is re-onlined.

The stubs and stub_nodes arrays don't have an explicit lock.  During boot
they are accessed single-threaded.  During runtime, cpu_add_remove_lock
serializes access.

Fixes: 7a66ac8d1633 ("x86: move syscall trampolines off the stack")
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
I'm not sure how to test the NUMA part - I don't have a NUMA system.
Also, if NUMA is active, is a cpu node of NUMA_NO_NODE still possible?
I used the MAX_NUMNODES + 1 array sizing to handle that, but it's not
obvious to me if that is necessary.

Roger mentioned removing the per-cpu stubs.mfn.  We'd need to replace
that with exposing the stubs array for traps and the emulator.  I have
no idea if that will be an improvement and am looking for agreement on
this patch before attempting.
---
 xen/arch/x86/include/asm/stubs.h |   2 +-
 xen/arch/x86/setup.c             |   3 +-
 xen/arch/x86/smpboot.c           | 110 +++++++++++++++++++++----------
 3 files changed, 77 insertions(+), 38 deletions(-)

diff --git a/xen/arch/x86/include/asm/stubs.h b/xen/arch/x86/include/asm/stubs.h
index a520928e9a..9d776f81dd 100644
--- a/xen/arch/x86/include/asm/stubs.h
+++ b/xen/arch/x86/include/asm/stubs.h
@@ -32,6 +32,6 @@ struct stubs {
 };
 
 DECLARE_PER_CPU(struct stubs, stubs);
-unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn);
+unsigned long assign_stub_page(unsigned int cpu);
 
 #endif /* X86_ASM_STUBS_H */
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 19ee857abf..0cac94cbdb 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -2089,8 +2089,7 @@ void asmlinkage __init noreturn __start_xen(void)
 
     init_idle_domain();
 
-    this_cpu(stubs.addr) = alloc_stub_page(smp_processor_id(),
-                                           &this_cpu(stubs).mfn);
+    this_cpu(stubs.addr) = assign_stub_page(0);
     BUG_ON(!this_cpu(stubs.addr));
 
     bsp_traps_reinit(); /* Needs stubs allocated, must be before presmp_initcalls. */
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index d7619f534b..d9cd90389d 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -641,41 +641,96 @@ static int do_boot_cpu(int apicid, int cpu)
     return rc;
 }
 
-#define STUB_BUF_CPU_OFFS(cpu) (((cpu) & (STUBS_PER_PAGE - 1)) * STUB_BUF_SIZE)
+/*
+ * Indexed by CPU.  `pg` may be shared by up to STUBS_PER_PAGE CPUs.  Offset
+ * is the byte offset into the stub page for the CPU's stub buffer.
+ */
+struct stub_info {
+    struct page_info *pg;
+    unsigned int offset;
+};
+struct stub_info __read_mostly stubs[NR_CPUS];
 
-unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn)
+/*
+ * Index by NUMA node.
+ *
+ * `pg` is the current stub page for the node.
+ * `next` is the next available stub index (STUBS_PER_PAGE available).
+ *
+ * if `pg` is NULL, allocate a new one.
+ * if `pg` is !NULL, use `pg` and stub `next`
+ * When STUBS_PER_PAGE are all assigned, clear `pg` and `next`.
+ */
+struct stub_node {
+    struct page_info *pg;
+    unsigned int next;
+};
+struct stub_node stub_nodes[MAX_NUMNODES + 1];
+
+nodeid_t cpu_to_stub_node(unsigned int cpu)
 {
-    unsigned long stub_va;
+    nodeid_t node = cpu_to_node(cpu);
+
+    return node == NUMA_NO_NODE ? MAX_NUMNODES : node;
+}
+
+static struct page_info *alloc_stub_page(unsigned int cpu)
+{
+    nodeid_t node = cpu_to_stub_node(cpu);
+    unsigned int stub_idx;
     struct page_info *pg;
 
     BUILD_BUG_ON(STUBS_PER_PAGE & (STUBS_PER_PAGE - 1));
 
-    if ( *mfn )
-        pg = mfn_to_page(_mfn(*mfn));
-    else
+    if ( !stub_nodes[node].pg )
     {
-        nodeid_t node = cpu_to_node(cpu);
         unsigned int memflags = node != NUMA_NO_NODE ? MEMF_node(node) : 0;
 
-        pg = alloc_domheap_page(NULL, memflags);
-        if ( !pg )
-            return 0;
+        stub_nodes[node].pg = alloc_domheap_page(NULL, memflags);
+        stub_nodes[node].next = 0;
+
+        if ( !stub_nodes[node].pg )
+            return NULL;
 
-        unmap_domain_page(memset(__map_domain_page(pg), 0xcc, PAGE_SIZE));
+        unmap_domain_page(memset(__map_domain_page(stub_nodes[node].pg),
+                                 0xcc, PAGE_SIZE));
     }
 
+    stub_idx = stub_nodes[node].next++;
+    pg = stub_nodes[node].pg;
+    stubs[cpu].pg = stub_nodes[node].pg;
+    stubs[cpu].offset = stub_idx * STUB_BUF_SIZE;
+    if ( stub_nodes[node].next == STUBS_PER_PAGE )
+    {
+        stub_nodes[node].pg = NULL;
+        stub_nodes[node].next = 0;
+    }
+
+    return pg;
+}
+
+unsigned long assign_stub_page(unsigned int cpu)
+{
+    unsigned long stub_va;
+    struct page_info *pg = stubs[cpu].pg;
+
+    if ( !pg )
+        pg = alloc_stub_page(cpu);
+
+    if ( !pg )
+        return 0;
+
     stub_va = XEN_VIRT_END - FIXADDR_X_SIZE - (cpu + 1) * PAGE_SIZE;
     if ( map_pages_to_xen(stub_va, page_to_mfn(pg), 1,
                           PAGE_HYPERVISOR_RX | MAP_SMALL_PAGES) )
-    {
-        if ( !*mfn )
-            free_domheap_page(pg);
         stub_va = 0;
+    else
+    {
+        per_cpu(stubs.mfn, cpu) = mfn_x(page_to_mfn(pg));
+        stub_va += stubs[cpu].offset;
     }
-    else if ( !*mfn )
-        *mfn = mfn_x(page_to_mfn(pg));
 
-    return stub_va ? stub_va + STUB_BUF_CPU_OFFS(cpu) : 0;
+    return stub_va;
 }
 
 void cpu_exit_clear(unsigned int cpu)
@@ -990,19 +1045,12 @@ static void cpu_smpboot_free(unsigned int cpu, bool remove)
     {
         mfn_t mfn = _mfn(per_cpu(stubs.mfn, cpu));
         unsigned char *stub_page = map_domain_page(mfn);
-        unsigned int i;
 
-        memset(stub_page + STUB_BUF_CPU_OFFS(cpu), 0xcc, STUB_BUF_SIZE);
-        for ( i = 0; i < STUBS_PER_PAGE; ++i )
-            if ( stub_page[i * STUB_BUF_SIZE] != 0xcc )
-                break;
+        memset(stub_page + stubs[cpu].offset, 0xcc, STUB_BUF_SIZE);
         unmap_domain_page(stub_page);
         destroy_xen_mappings(per_cpu(stubs.addr, cpu) & PAGE_MASK,
                              (per_cpu(stubs.addr, cpu) | ~PAGE_MASK) + 1);
         per_cpu(stubs.addr, cpu) = 0;
-        per_cpu(stubs.mfn, cpu) = 0;
-        if ( i == STUBS_PER_PAGE )
-            free_domheap_page(mfn_to_page(mfn));
     }
 
     if ( IS_ENABLED(CONFIG_PV32) )
@@ -1041,7 +1089,7 @@ void *cpu_alloc_stack(unsigned int cpu)
 static int cpu_smpboot_alloc(unsigned int cpu)
 {
     struct cpu_info *info;
-    unsigned int i, memflags = 0;
+    unsigned int memflags = 0;
     nodeid_t node = cpu_to_node(cpu);
     seg_desc_t *gdt;
     unsigned long stub_va;
@@ -1091,15 +1139,7 @@ static int cpu_smpboot_alloc(unsigned int cpu)
     memcpy(per_cpu(idt, cpu), bsp_idt, sizeof(bsp_idt));
     disable_each_ist(per_cpu(idt, cpu));
 
-    for ( stub_page = 0, i = cpu & ~(STUBS_PER_PAGE - 1);
-          i < nr_cpu_ids && i <= (cpu | (STUBS_PER_PAGE - 1)); ++i )
-        if ( cpu_online(i) && cpu_to_node(i) == node )
-        {
-            per_cpu(stubs.mfn, cpu) = per_cpu(stubs.mfn, i);
-            break;
-        }
-    BUG_ON(i == cpu);
-    stub_va = alloc_stub_page(cpu, &per_cpu(stubs.mfn, cpu));
+    stub_va = assign_stub_page(cpu);
     if ( !stub_va )
         goto out;
     per_cpu(stubs.addr, cpu) = stub_va;
-- 
2.54.0




* Re: [PATCH v3 0/2] xen/x86: Change stub page freeing to fix smt=0
  2026-06-09  0:06 [PATCH v3 0/2] xen/x86: Change stub page freeing to fix smt=0 Jason Andryuk
  2026-06-09  0:06 ` [PATCH v3 1/2] xen/x86: Return virtual address from alloc_stub_page() Jason Andryuk
  2026-06-09  0:06 ` [PATCH v3 2/2] xen/x86: Change stub page allocation/free Jason Andryuk
@ 2026-06-10 11:54 ` Oleksii Kurochko
  2 siblings, 0 replies; 8+ messages in thread
From: Oleksii Kurochko @ 2026-06-10 11:54 UTC (permalink / raw)
  To: Jason Andryuk, xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Teddy Astie



On 6/9/26 2:06 AM, Jason Andryuk wrote:
> This is a third approach to fixing the stub page handling that is
> broken with !CONFIG_PV and smt=0.
> 
> There is a CPU-indexed stubs array and a NUMA node-indexed node_stubs
> for allocating the stub buffers.
> 
>  From v2, this patch
>    xen/x86: Remove unneeded stub_page setting
> is dropped as stub_page is removed as part of patch 2.
> 
> Jason Andryuk (2):
>    xen/x86: Return virtual address from alloc_stub_page()
>    xen/x86: Change stub page allocation/free
> 
>   xen/arch/x86/include/asm/stubs.h |   2 +-
>   xen/arch/x86/setup.c             |   3 +-
>   xen/arch/x86/smpboot.c           | 114 +++++++++++++++++++++----------
>   3 files changed, 79 insertions(+), 40 deletions(-)
> 

Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>

Thanks.

~ Oleksii



* Re: [PATCH v3 2/2] xen/x86: Change stub page allocation/free
  2026-06-09  0:06 ` [PATCH v3 2/2] xen/x86: Change stub page allocation/free Jason Andryuk
@ 2026-06-10 15:01   ` Roger Pau Monné
  2026-06-10 15:23     ` Jason Andryuk
  0 siblings, 1 reply; 8+ messages in thread
From: Roger Pau Monné @ 2026-06-10 15:01 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: xen-devel, Jan Beulich, Andrew Cooper, Teddy Astie

On Mon, Jun 08, 2026 at 08:06:38PM -0400, Jason Andryuk wrote:
> Today the inline tracking of the stub page is problematic.  0xcc is used
> to indicate unused, but it is also a "clear value."  A !CONFIG_PV build
> with smt=0 will bring up CPU0, bring up CPU1, bring down CPU1, and free
> the in-use stub page.  Subsequent CPU onlining can write to the re-used
> page.
> 
> The new approach uses a global, CPU-indexed array of stub pages.
> However, to handle NUMA aware allocations, we cannot allocate all the
> pages in advance because the NUMA information is not available.  Keep
> track of 1 current page for each NUMA node, allocated on demand, and
> allocate the stub buffers out of those pages.
> 
> The current NUMA allocation approach is opportunistic sharing among the
> groups of 32 processors.  The new approach will allocate buffers densely
> in a NUMA node.
> 
> stub pages are no longer freed.  They remain referenced in the global
> CPU-indexed array and are re-used if the CPU is re-onlined.
> 
> stubs and node_stubs don't have an explicit lock.  During boot they are
> accessed single threaded.  During runtime, &cpu_add_remove_lock
> serializes access.
> 
> Fixes: 7a66ac8d1633 ("x86: move syscall trampolines off the stack")
> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> ---
> I'm not sure how to test the NUMA part - I don't have an NUMA system.
> Also, if NUMA is active, is a cpu node of NUMA_NO_NODE still possible?
> I used the MAX_NUMNODES + 1 array sizing to handle that, but it's not
> obvious to me if that is necessary.
> 
> Roger mentioned removing the per-cpu stubs.mfn.  We'd need to replace
> that with exposing the stubs array for traps and the emulator.  I have
> no idea if that will be an improvement and am looking for agreement on
> this patch before attempting.
> ---
>  xen/arch/x86/include/asm/stubs.h |   2 +-
>  xen/arch/x86/setup.c             |   3 +-
>  xen/arch/x86/smpboot.c           | 110 +++++++++++++++++++++----------
>  3 files changed, 77 insertions(+), 38 deletions(-)
> 
> diff --git a/xen/arch/x86/include/asm/stubs.h b/xen/arch/x86/include/asm/stubs.h
> index a520928e9a..9d776f81dd 100644
> --- a/xen/arch/x86/include/asm/stubs.h
> +++ b/xen/arch/x86/include/asm/stubs.h
> @@ -32,6 +32,6 @@ struct stubs {
>  };
>  
>  DECLARE_PER_CPU(struct stubs, stubs);
> -unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn);
> +unsigned long assign_stub_page(unsigned int cpu);
>  
>  #endif /* X86_ASM_STUBS_H */
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index 19ee857abf..0cac94cbdb 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -2089,8 +2089,7 @@ void asmlinkage __init noreturn __start_xen(void)
>  
>      init_idle_domain();
>  
> -    this_cpu(stubs.addr) = alloc_stub_page(smp_processor_id(),
> -                                           &this_cpu(stubs).mfn);
> +    this_cpu(stubs.addr) = assign_stub_page(0);

Given stub pages are first used quite late in the boot process, the above
arrays would be better allocated dynamically using xvmalloc_array().

>      BUG_ON(!this_cpu(stubs.addr));
>  
>      bsp_traps_reinit(); /* Needs stubs allocated, must be before presmp_initcalls. */
> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
> index d7619f534b..d9cd90389d 100644
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -641,41 +641,96 @@ static int do_boot_cpu(int apicid, int cpu)
>      return rc;
>  }
>  
> -#define STUB_BUF_CPU_OFFS(cpu) (((cpu) & (STUBS_PER_PAGE - 1)) * STUB_BUF_SIZE)
> +/*
> + * Indexed by CPU.  `pg` may be shared by up to STUBS_PER_PAGE CPUs.  Offset
> + * is the byte offset into the stub page for the CPU's stub buffer.
> + */
> +struct stub_info {
> +    struct page_info *pg;
> +    unsigned int offset;
> +};
> +struct stub_info __read_mostly stubs[NR_CPUS];
>  
> -unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn)
> +/*
> + * Index by NUMA node.
> + *
> + * `pg` is the current stub page for the node.
> + * `next` is the next available stub index (STUBS_PER_PAGE available).
> + *
> + * if `pg` is NULL, allocate a new one.
> + * if `pg` is !NULL, use `pg` and stub `next`
> + * When STUBS_PER_PAGE are all assigned, clear `pg` and `next`.
> + */
> +struct stub_node {
> +    struct page_info *pg;
> +    unsigned int next;
> +};
> +struct stub_node stub_nodes[MAX_NUMNODES + 1];

I think we could get away with a single array, that uses the CPU as
the index and stores the physical address of the stub.

We could also simplify the allocation logic, assuming that CPUs
belonging to the same NUMA node are packed contiguously in the common
case.  I've given a try at this, and adjusted your original commit.  I
however only tested this in QEMU so far.  If you think it's OK I can
test it on XenRT and see how that goes.
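
For illustration, the "reuse the previous CPU's page" idea can be modelled in user space as below.  This is a sketch, not the diff that follows: the sizes are assumed, the page allocator is a fake that hands out addresses, all names are hypothetical, and the validity check on the previous CPU's entry is an addition of the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE       4096u
#define STUB_BUF_SIZE   128u           /* assumed */
#define MODEL_CPUS      64u            /* hypothetical */
#define INVALID_ADDR    UINT64_MAX     /* stand-in for INVALID_PADDR */

static uint64_t stub_addr[MODEL_CPUS];   /* per-CPU stub "physical" address */
static unsigned int cpu_node[MODEL_CPUS];
static uint64_t next_fake_page;          /* fake page allocator state */

static void model_init(void)
{
    for ( unsigned int i = 0; i < MODEL_CPUS; i++ )
    {
        stub_addr[i] = INVALID_ADDR;
        cpu_node[i] = 0;
    }
    next_fake_page = 0x100000;
}

static uint64_t model_assign(unsigned int cpu)
{
    assert(cpu < MODEL_CPUS);

    if ( stub_addr[cpu] != INVALID_ADDR )
        return stub_addr[cpu];           /* re-onlined CPU keeps its stub */

    /*
     * Share the previous CPU's page when it is on the same node, already
     * has a stub (check added by this sketch), and the next buffer does
     * not start on a page boundary.
     */
    if ( cpu && cpu_node[cpu] == cpu_node[cpu - 1] &&
         stub_addr[cpu - 1] != INVALID_ADDR &&
         ((stub_addr[cpu - 1] + STUB_BUF_SIZE) & (PAGE_SIZE - 1)) )
        stub_addr[cpu] = stub_addr[cpu - 1] + STUB_BUF_SIZE;
    else
    {
        stub_addr[cpu] = next_fake_page;
        next_fake_page += 2 * PAGE_SIZE; /* gap keeps pages non-contiguous */
    }

    return stub_addr[cpu];
}
```

Storing the full address per CPU means the page and the offset within it are both recoverable from a single array entry, which is what lets the second, node-indexed array go away.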

Sorry I took over the patch, I didn't want to force you into another
direction without knowing whether it would be OK, as it wasn't clear
to me this approach would be fine (it seems so, but it still needs
further testing).

One thing that would simplify the logic greatly, which Andrew brought
up, is forgoing the NUMA memory affinity for the allocated stub pages
and instead allocating and mapping them contiguously in both the
physical and the linear address spaces, so that you would find the VA
using:

XEN_VIRT_END - FIXADDR_X_SIZE - (cpu + 1) * STUB_BUF_SIZE

This would possibly allow simply populating the whole range up to
num_present_cpus() at boot and being done with it.  However that's a
bigger change that should likely be done after 4.22 is out.
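
A small sketch of what the contiguous layout would imply, for illustration only.  The two MODEL_* layout constants are placeholders rather than Xen's real values, the buffer size is assumed, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_VIRT_END        UINT64_C(0xffff82d080000000) /* placeholder */
#define MODEL_FIXADDR_X_SIZE  UINT64_C(0x2000)             /* placeholder */
#define PAGE_SIZE             UINT64_C(4096)
#define STUB_BUF_SIZE         UINT64_C(128)                /* assumed */

/* VA of a CPU's stub buffer when buffers are packed back to back. */
static uint64_t stub_va_contiguous(unsigned int cpu)
{
    return MODEL_VIRT_END - MODEL_FIXADDR_X_SIZE -
           ((uint64_t)cpu + 1) * STUB_BUF_SIZE;
}

/* Number of pages to populate at boot for nr_cpus buffers. */
static uint64_t stub_pages_needed(unsigned int nr_cpus)
{
    assert(nr_cpus > 0);

    return (nr_cpus * STUB_BUF_SIZE + PAGE_SIZE - 1) / PAGE_SIZE;
}
```

Under these assumptions neighbouring CPUs sit exactly one buffer apart and no per-CPU lookup table is needed, at the cost of giving up node-local placement.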

Thanks, Roger.
---
diff --git a/xen/arch/x86/include/asm/stubs.h b/xen/arch/x86/include/asm/stubs.h
index a520928e9a50..d575f1eb0631 100644
--- a/xen/arch/x86/include/asm/stubs.h
+++ b/xen/arch/x86/include/asm/stubs.h
@@ -32,6 +32,7 @@ struct stubs {
 };
 
 DECLARE_PER_CPU(struct stubs, stubs);
-unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn);
+unsigned long assign_stub_page(unsigned int cpu);
+void init_bsp_stub(void);
 
 #endif /* X86_ASM_STUBS_H */
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 4192edf635b6..cddf8806c877 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -2089,9 +2089,7 @@ void asmlinkage __init noreturn __start_xen(void)
 
     init_idle_domain();
 
-    this_cpu(stubs.addr) = alloc_stub_page(smp_processor_id(),
-                                           &this_cpu(stubs).mfn);
-    BUG_ON(!this_cpu(stubs.addr));
+    init_bsp_stub();
 
     bsp_traps_reinit(); /* Needs stubs allocated, must be before presmp_initcalls. */
 
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index b3045eac5b5e..dd0972a3025e 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -20,6 +20,7 @@
 #include <xen/serial.h>
 #include <xen/softirq.h>
 #include <xen/tasklet.h>
+#include <xen/xvmalloc.h>
 
 #include <asm/apic.h>
 #include <asm/cpuidle.h>
@@ -641,41 +642,61 @@ static int do_boot_cpu(int apicid, int cpu)
     return rc;
 }
 
-#define STUB_BUF_CPU_OFFS(cpu) (((cpu) & (STUBS_PER_PAGE - 1)) * STUB_BUF_SIZE)
+/* Dynamically allocated, indexed by CPU.  Store physical address of stubs. */
+static paddr_t *__ro_after_init stubs;
 
-unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn)
+unsigned long assign_stub_page(unsigned int cpu)
 {
     unsigned long stub_va;
-    struct page_info *pg;
+    paddr_t addr = stubs[cpu];
 
-    BUILD_BUG_ON(STUBS_PER_PAGE & (STUBS_PER_PAGE - 1));
-
-    if ( *mfn )
-        pg = mfn_to_page(_mfn(*mfn));
-    else
+    if ( addr == INVALID_PADDR )
     {
-        nodeid_t node = cpu_to_node(cpu);
-        unsigned int memflags = node != NUMA_NO_NODE ? MEMF_node(node) : 0;
+        nodeid_t nid = cpu_to_node(cpu);
 
-        pg = alloc_domheap_page(NULL, memflags);
-        if ( !pg )
-            return 0;
+        /*
+         * Attempt to use the same page as the previous CPU if possible,
+         * otherwise allocate a new one.
+         */
+        if ( cpu && nid == cpu_to_node(cpu - 1) &&
+             PAGE_OFFSET(stubs[cpu - 1] + STUB_BUF_SIZE) )
+            addr = stubs[cpu - 1] + STUB_BUF_SIZE;
+        else
+        {
+            struct page_info *pg = alloc_domheap_page(NULL, MEMF_node(nid));
 
-        unmap_domain_page(memset(__map_domain_page(pg), 0xcc, PAGE_SIZE));
+            if ( !pg )
+                return 0;
+            addr = page_to_maddr(pg);
+        }
+        stubs[cpu] = addr;
     }
 
     stub_va = XEN_VIRT_END - FIXADDR_X_SIZE - (cpu + 1) * PAGE_SIZE;
-    if ( map_pages_to_xen(stub_va, page_to_mfn(pg), 1,
+    if ( map_pages_to_xen(stub_va, maddr_to_mfn(addr), 1,
                           PAGE_HYPERVISOR_RX | MAP_SMALL_PAGES) )
-    {
-        if ( !*mfn )
-            free_domheap_page(pg);
-        stub_va = 0;
-    }
-    else if ( !*mfn )
-        *mfn = mfn_x(page_to_mfn(pg));
+        return 0;
+
+    per_cpu(stubs.mfn, cpu) = PFN_DOWN(addr);
+    return stub_va + PAGE_OFFSET(addr);
+}
+
+void __init init_bsp_stub(void)
+{
+    const unsigned int num_cpus = num_present_cpus();
+    unsigned int i;
+
+    ASSERT(!stubs);
+    stubs = xvmalloc_array(typeof(*stubs), num_cpus);
+    if ( !stubs )
+        panic("Unable to allocate stub array");
+
+    for ( i = 0; i < num_cpus; i++ )
+        stubs[i] = INVALID_PADDR;
 
-    return stub_va ? stub_va + STUB_BUF_CPU_OFFS(cpu) : 0;
+    this_cpu(stubs.addr) = assign_stub_page(0);
+    if ( !this_cpu(stubs.addr) )
+        panic("Unable to initialize BSP stub region");
 }
 
 void cpu_exit_clear(unsigned int cpu)
@@ -990,19 +1011,12 @@ static void cpu_smpboot_free(unsigned int cpu, bool remove)
     {
         mfn_t mfn = _mfn(per_cpu(stubs.mfn, cpu));
         unsigned char *stub_page = map_domain_page(mfn);
-        unsigned int i;
 
-        memset(stub_page + STUB_BUF_CPU_OFFS(cpu), 0xcc, STUB_BUF_SIZE);
-        for ( i = 0; i < STUBS_PER_PAGE; ++i )
-            if ( stub_page[i * STUB_BUF_SIZE] != 0xcc )
-                break;
+        memset(stub_page + PAGE_OFFSET(stubs[cpu]), 0xcc, STUB_BUF_SIZE);
         unmap_domain_page(stub_page);
         destroy_xen_mappings(per_cpu(stubs.addr, cpu) & PAGE_MASK,
                              (per_cpu(stubs.addr, cpu) | ~PAGE_MASK) + 1);
         per_cpu(stubs.addr, cpu) = 0;
-        per_cpu(stubs.mfn, cpu) = 0;
-        if ( i == STUBS_PER_PAGE )
-            free_domheap_page(mfn_to_page(mfn));
     }
 
     if ( IS_ENABLED(CONFIG_PV32) )
@@ -1041,7 +1055,7 @@ void *cpu_alloc_stack(unsigned int cpu)
 static int cpu_smpboot_alloc(unsigned int cpu)
 {
     struct cpu_info *info;
-    unsigned int i, memflags = 0;
+    unsigned int memflags = 0;
     nodeid_t node = cpu_to_node(cpu);
     seg_desc_t *gdt;
     unsigned long stub_va;
@@ -1092,15 +1106,7 @@ static int cpu_smpboot_alloc(unsigned int cpu)
     memcpy(per_cpu(idt, cpu), bsp_idt, sizeof(bsp_idt));
     disable_each_ist(per_cpu(idt, cpu));
 
-    for ( stub_page = 0, i = cpu & ~(STUBS_PER_PAGE - 1);
-          i < nr_cpu_ids && i <= (cpu | (STUBS_PER_PAGE - 1)); ++i )
-        if ( cpu_online(i) && cpu_to_node(i) == node )
-        {
-            per_cpu(stubs.mfn, cpu) = per_cpu(stubs.mfn, i);
-            break;
-        }
-    BUG_ON(i == cpu);
-    stub_va = alloc_stub_page(cpu, &per_cpu(stubs.mfn, cpu));
+    stub_va = assign_stub_page(cpu);
     if ( !stub_va )
         goto out;
     per_cpu(stubs.addr, cpu) = stub_va;



* Re: [PATCH v3 2/2] xen/x86: Change stub page allocation/free
  2026-06-10 15:01   ` Roger Pau Monné
@ 2026-06-10 15:23     ` Jason Andryuk
  2026-06-10 18:20       ` Roger Pau Monné
  2026-06-11 15:10       ` Jan Beulich
  0 siblings, 2 replies; 8+ messages in thread
From: Jason Andryuk @ 2026-06-10 15:23 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Jan Beulich, Andrew Cooper, Teddy Astie

On 2026-06-10 11:01, Roger Pau Monné wrote:
> On Mon, Jun 08, 2026 at 08:06:38PM -0400, Jason Andryuk wrote:
>> Today the inline tracking of the stub page is problematic.  0xcc is used
>> to indicate unused, but it is also a "clear value."  A !CONFIG_PV build
>> with smt=0 will bring up CPU0, bring up CPU1, bring down CPU1, and free
>> the in-use stub page.  Subsequent CPU onlining can write to the re-used
>> page.
>>
>> The new approach uses a global, CPU-indexed array of stub pages.
>> However, to handle NUMA aware allocations, we cannot allocate all the
>> pages in advance because the NUMA information is not available.  Keep
>> track of 1 current page for each NUMA node, allocated on demand, and
>> allocate the stub buffers out of those pages.
>>
>> The current NUMA allocation approach is opportunistic sharing among the
>> groups of 32 processors.  The new approach will allocate buffers densely
>> in a NUMA node.
>>
>> stub pages are no longer freed.  They remain referenced in the global
>> CPU-indexed array and are re-used if the CPU is re-onlined.
>>
>> stubs and node_stubs don't have an explicit lock.  During boot they are
>> accessed single threaded.  During runtime, &cpu_add_remove_lock
>> serializes access.
>>
>> Fixes: 7a66ac8d1633 ("x86: move syscall trampolines off the stack")
>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
>> ---
>> I'm not sure how to test the NUMA part - I don't have an NUMA system.
>> Also, if NUMA is active, is a cpu node of NUMA_NO_NODE still possible?
>> I used the MAX_NUMNODES + 1 array sizing to handle that, but it's not
>> obvious to me if that is necessary.
>>
>> Roger mentioned removing the per-cpu stubs.mfn.  We'd need to replace
>> that with exposing the stubs array for traps and the emulator.  I have
>> no idea if that will be an improvement and am looking for agreement on
>> this patch before attempting.
>> ---
>>   xen/arch/x86/include/asm/stubs.h |   2 +-
>>   xen/arch/x86/setup.c             |   3 +-
>>   xen/arch/x86/smpboot.c           | 110 +++++++++++++++++++++----------
>>   3 files changed, 77 insertions(+), 38 deletions(-)
>>
>> diff --git a/xen/arch/x86/include/asm/stubs.h b/xen/arch/x86/include/asm/stubs.h
>> index a520928e9a..9d776f81dd 100644
>> --- a/xen/arch/x86/include/asm/stubs.h
>> +++ b/xen/arch/x86/include/asm/stubs.h
>> @@ -32,6 +32,6 @@ struct stubs {
>>   };
>>   
>>   DECLARE_PER_CPU(struct stubs, stubs);
>> -unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn);
>> +unsigned long assign_stub_page(unsigned int cpu);
>>   
>>   #endif /* X86_ASM_STUBS_H */
>> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
>> index 19ee857abf..0cac94cbdb 100644
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -2089,8 +2089,7 @@ void asmlinkage __init noreturn __start_xen(void)
>>   
>>       init_idle_domain();
>>   
>> -    this_cpu(stubs.addr) = alloc_stub_page(smp_processor_id(),
>> -                                           &this_cpu(stubs).mfn);
>> +    this_cpu(stubs.addr) = assign_stub_page(0);
> 
> Given stub pages is first used quite late in the boot process, the above
> arrays would better be dynamically allocated using xvmalloc_array().

Ok.  At some point I intended to dynamically allocate.  But x86 doesn't 
have num_possible_cpus(), and I thought num_present_cpus() wouldn't have 
the correct value.  nr_cpu_ids seemed close to the value, but then I 
convinced myself NR_CPUS would be okay.

>>       BUG_ON(!this_cpu(stubs.addr));
>>   
>>       bsp_traps_reinit(); /* Needs stubs allocated, must be before presmp_initcalls. */
>> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
>> index d7619f534b..d9cd90389d 100644
>> --- a/xen/arch/x86/smpboot.c
>> +++ b/xen/arch/x86/smpboot.c
>> @@ -641,41 +641,96 @@ static int do_boot_cpu(int apicid, int cpu)
>>       return rc;
>>   }
>>   
>> -#define STUB_BUF_CPU_OFFS(cpu) (((cpu) & (STUBS_PER_PAGE - 1)) * STUB_BUF_SIZE)
>> +/*
>> + * Indexed by CPU.  `pg` may be shared by up to STUBS_PER_PAGE CPUs.  Offset
>> + * is the byte offset into the stub page for the CPU's stub buffer.
>> + */
>> +struct stub_info {
>> +    struct page_info *pg;
>> +    unsigned int offset;
>> +};
>> +struct stub_info __read_mostly stubs[NR_CPUS];
>>   
>> -unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn)
>> +/*
>> + * Index by NUMA node.
>> + *
>> + * `pg` is the current stub page for the node.
>> + * `next` is the next available stub index (STUBS_PER_PAGE available).
>> + *
>> + * if `pg` is NULL, allocate a new one.
>> + * if `pg` is !NULL, use `pg` and stub `next`
>> + * When STUBS_PER_PAGE are all assigned, clear `pg` and `next`.
>> + */
>> +struct stub_node {
>> +    struct page_info *pg;
>> +    unsigned int next;
>> +};
>> +struct stub_node stub_nodes[MAX_NUMNODES + 1];
> 
> I think we could get away with a single array, that uses the CPU as
> the index and stores the physical address of the stub.

Yes, this is a good idea.

> We could also simplify the allocation logic, assuming that CPUs
> belonging to the same NUMA node are packed contiguously in the common
> case.  I've given a try at this, and adjusted your original commit.  I
> however only tested this in QEMU so far.  If you think it's OK I can
> test it on XenRT and see how that goes.
> 
> Sorry I took over the patch, I didn't want to force you into another
> direction without knowing whether it would be OK, as it wasn't clear
> to me this approach would be fine (seem so, but still needs further
> testing).

No worries.  Thank you!

> One thing that would simplify the logic greatly, which Andrew brought
> up, is forgoing the NUMA memory affinity for the allocated stub pages, and
> allocating and mapping them contiguously in both the physical and the linear
> address spaces, so that you would find the VA using:
> 
> XEN_VIRT_END - FIXADDR_X_SIZE - (cpu + 1) * STUB_BUF_SIZE
> 
> This would possibly allow simply populating the whole range up to
> num_present_cpus() at boot and being done with it.  However that's a
> bigger change that should likely be done after 4.22 is out.

From your initial feedback, I intended to use a single array, but NUMA
quickly complicated that.
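
For reference, the linear-address formula quoted above can be sketched
as plain arithmetic.  This is only an illustration: the constant values
below are placeholders chosen for the example, not taken from Xen's
headers, and stub_va() is a hypothetical helper name.

```c
#include <stdint.h>

/* Placeholder values for illustration only; not Xen's real layout. */
#define XEN_VIRT_END    UINT64_C(0xffff82d080000000)
#define FIXADDR_X_SIZE  UINT64_C(0x1000)
#define STUB_BUF_SIZE   UINT64_C(128)

/*
 * Hypothetical helper: with stubs mapped contiguously below the
 * executable fixmap, a CPU's stub VA depends only on its index.
 */
static uint64_t stub_va(unsigned int cpu)
{
    return XEN_VIRT_END - FIXADDR_X_SIZE -
           ((uint64_t)cpu + 1) * STUB_BUF_SIZE;
}
```

With such a layout no per-CPU lookup table would be needed, at the cost
of giving up NUMA-local placement of the stub pages.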

> 
> Thanks, Roger.
> ---
> diff --git a/xen/arch/x86/include/asm/stubs.h b/xen/arch/x86/include/asm/stubs.h
> index a520928e9a50..d575f1eb0631 100644
> --- a/xen/arch/x86/include/asm/stubs.h
> +++ b/xen/arch/x86/include/asm/stubs.h
> @@ -32,6 +32,7 @@ struct stubs {
>   };
>   
>   DECLARE_PER_CPU(struct stubs, stubs);
> -unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn);
> +unsigned long assign_stub_page(unsigned int cpu);
> +void init_bsp_stub(void);

With init_bsp_stub(), assign_stub_page can be static and not exported.

>   
>   #endif /* X86_ASM_STUBS_H */

> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
> index b3045eac5b5e..dd0972a3025e 100644
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -20,6 +20,7 @@
>   #include <xen/serial.h>
>   #include <xen/softirq.h>
>   #include <xen/tasklet.h>
> +#include <xen/xvmalloc.h>
>   
>   #include <asm/apic.h>
>   #include <asm/cpuidle.h>
> @@ -641,41 +642,61 @@ static int do_boot_cpu(int apicid, int cpu)
>       return rc;
>   }
>   
> -#define STUB_BUF_CPU_OFFS(cpu) (((cpu) & (STUBS_PER_PAGE - 1)) * STUB_BUF_SIZE)
> +/* Dynamically allocated, indexed by CPU.  Store physical address of stubs. */
> +static paddr_t *__ro_after_init stubs;
>   
> -unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn)
> +unsigned long assign_stub_page(unsigned int cpu)
>   {
>       unsigned long stub_va;
> -    struct page_info *pg;
> +    paddr_t addr = stubs[cpu];
>   
> -    BUILD_BUG_ON(STUBS_PER_PAGE & (STUBS_PER_PAGE - 1));
> -
> -    if ( *mfn )
> -        pg = mfn_to_page(_mfn(*mfn));
> -    else
> +    if ( addr == INVALID_PADDR )
>       {
> -        nodeid_t node = cpu_to_node(cpu);
> -        unsigned int memflags = node != NUMA_NO_NODE ? MEMF_node(node) : 0;
> +        nodeid_t nid = cpu_to_node(cpu);
>   
> -        pg = alloc_domheap_page(NULL, memflags);
> -        if ( !pg )
> -            return 0;
> +        /*
> +         * Attempt to use the same page as the previous CPU if possible,
> +         * otherwise allocate a new one.
> +         */
> +        if ( cpu && nid == cpu_to_node(cpu - 1) &&
> +             PAGE_OFFSET(stubs[cpu - 1] + STUB_BUF_SIZE) )
     PAGE_OFFSET(stubs[cpu - 1] + STUB_BUF_SIZE)
is to ensure it remains inside the allocated stub page?

> +            addr = stubs[cpu - 1] + STUB_BUF_SIZE;
> +        else
> +        {
> +            struct page_info *pg = alloc_domheap_page(NULL, MEMF_node(nid));
>   

> @@ -1092,15 +1106,7 @@ static int cpu_smpboot_alloc(unsigned int cpu)
>       memcpy(per_cpu(idt, cpu), bsp_idt, sizeof(bsp_idt));
>       disable_each_ist(per_cpu(idt, cpu));
>   
> -    for ( stub_page = 0, i = cpu & ~(STUBS_PER_PAGE - 1);
> -          i < nr_cpu_ids && i <= (cpu | (STUBS_PER_PAGE - 1)); ++i )
> -        if ( cpu_online(i) && cpu_to_node(i) == node )
> -        {
> -            per_cpu(stubs.mfn, cpu) = per_cpu(stubs.mfn, i);

This loop tries hard to re-use the same page for a NUMA node.  My posted 
approach will densely allocate the stubs.  Your approach would re-use 
less, unless the CPUs are contiguous in a node.

This is just an observation.  I have no idea how NUMA nodes are 
allocated.  The "round robin" code in numa_init_array() made me worry 
that CPUs are more likely to be non-contiguous.

If you have NUMA systems in XenRT, I think it would be worthwhile to 
test.  Some printks to see how many pages are allocated would be useful.

Thanks,
Jason



* Re: [PATCH v3 2/2] xen/x86: Change stub page allocation/free
  2026-06-10 15:23     ` Jason Andryuk
@ 2026-06-10 18:20       ` Roger Pau Monné
  2026-06-11 15:10       ` Jan Beulich
  1 sibling, 0 replies; 8+ messages in thread
From: Roger Pau Monné @ 2026-06-10 18:20 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: xen-devel, Jan Beulich, Andrew Cooper, Teddy Astie

On Wed, Jun 10, 2026 at 11:23:46AM -0400, Jason Andryuk wrote:
> On 2026-06-10 11:01, Roger Pau Monné wrote:
> > On Mon, Jun 08, 2026 at 08:06:38PM -0400, Jason Andryuk wrote:
> > > Today the inline tracking of the stub page is problematic.  0xcc is used
> > > to indicate unused, but it is also a "clear value."  A !CONFIG_PV build
> > > with smt=0 will bring up CPU0, bring up CPU1, bring down CPU1, and free
> > > the in-use stub page.  Subsequent CPU onlining can write to the re-used
> > > page.
> > > 
> > > The new approach uses a global, CPU-indexed array of stub pages.
> > > However, to handle NUMA aware allocations, we cannot allocate all the
> > > pages in advance because the NUMA information is not available.  Keep
> > > track of 1 current page for each NUMA node, allocated on demand, and
> > > allocate the stub buffers out of those pages.
> > > 
> > > The current NUMA allocation approach is opportunistic sharing among the
> > > groups of 32 processors.  The new approach will allocate buffers densely
> > > in a NUMA node.
> > > 
> > > stub pages are no longer freed.  They remain referenced in the global
> > > CPU-indexed array and are re-used if the CPU is re-onlined.
> > > 
> > > stubs and stub_nodes don't have an explicit lock.  During boot they are
> > > accessed single threaded.  During runtime, &cpu_add_remove_lock
> > > serializes access.
> > > 
> > > Fixes: 7a66ac8d1633 ("x86: move syscall trampolines off the stack")
> > > Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> > > ---
> > > I'm not sure how to test the NUMA part - I don't have a NUMA system.
> > > Also, if NUMA is active, is a cpu node of NUMA_NO_NODE still possible?
> > > I used the MAX_NUMNODES + 1 array sizing to handle that, but it's not
> > > obvious to me if that is necessary.
> > > 
> > > Roger mentioned removing the per-cpu stubs.mfn.  We'd need to replace
> > > that with exposing the stubs array for traps and the emulator.  I have
> > > no idea if that will be an improvement and am looking for agreement on
> > > this patch before attempting.
> > > ---
> > >   xen/arch/x86/include/asm/stubs.h |   2 +-
> > >   xen/arch/x86/setup.c             |   3 +-
> > >   xen/arch/x86/smpboot.c           | 110 +++++++++++++++++++++----------
> > >   3 files changed, 77 insertions(+), 38 deletions(-)
> > > 
> > > diff --git a/xen/arch/x86/include/asm/stubs.h b/xen/arch/x86/include/asm/stubs.h
> > > index a520928e9a..9d776f81dd 100644
> > > --- a/xen/arch/x86/include/asm/stubs.h
> > > +++ b/xen/arch/x86/include/asm/stubs.h
> > > @@ -32,6 +32,6 @@ struct stubs {
> > >   };
> > >   DECLARE_PER_CPU(struct stubs, stubs);
> > > -unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn);
> > > +unsigned long assign_stub_page(unsigned int cpu);
> > >   #endif /* X86_ASM_STUBS_H */
> > > diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> > > index 19ee857abf..0cac94cbdb 100644
> > > --- a/xen/arch/x86/setup.c
> > > +++ b/xen/arch/x86/setup.c
> > > @@ -2089,8 +2089,7 @@ void asmlinkage __init noreturn __start_xen(void)
> > >       init_idle_domain();
> > > -    this_cpu(stubs.addr) = alloc_stub_page(smp_processor_id(),
> > > -                                           &this_cpu(stubs).mfn);
> > > +    this_cpu(stubs.addr) = assign_stub_page(0);
> > 
> > Given the stub pages are first used quite late in the boot process, the above
> > arrays would better be dynamically allocated using xvmalloc_array().
> 
> Ok.  At some point I intended to dynamically allocate.  But x86 doesn't have
> num_possible_cpus(), and I thought num_present_cpus() wouldn't have the
> correct value.  nr_cpu_ids seemed close to the value, but then I convinced
> myself NR_CPUS would be okay.

I will double check, but I think num_present_cpus() accounts for the
maximum number of online CPUs possible at any point.

> > diff --git a/xen/arch/x86/include/asm/stubs.h b/xen/arch/x86/include/asm/stubs.h
> > index a520928e9a50..d575f1eb0631 100644
> > --- a/xen/arch/x86/include/asm/stubs.h
> > +++ b/xen/arch/x86/include/asm/stubs.h
> > @@ -32,6 +32,7 @@ struct stubs {
> >   };
> >   DECLARE_PER_CPU(struct stubs, stubs);
> > -unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn);
> > +unsigned long assign_stub_page(unsigned int cpu);
> > +void init_bsp_stub(void);
> 
> With init_bsp_stub(), assign_stub_page can be static and not exported.

Oh, nice one.

> 
> >   #endif /* X86_ASM_STUBS_H */
> 
> > diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
> > index b3045eac5b5e..dd0972a3025e 100644
> > --- a/xen/arch/x86/smpboot.c
> > +++ b/xen/arch/x86/smpboot.c
> > @@ -20,6 +20,7 @@
> >   #include <xen/serial.h>
> >   #include <xen/softirq.h>
> >   #include <xen/tasklet.h>
> > +#include <xen/xvmalloc.h>
> >   #include <asm/apic.h>
> >   #include <asm/cpuidle.h>
> > @@ -641,41 +642,61 @@ static int do_boot_cpu(int apicid, int cpu)
> >       return rc;
> >   }
> > -#define STUB_BUF_CPU_OFFS(cpu) (((cpu) & (STUBS_PER_PAGE - 1)) * STUB_BUF_SIZE)
> > +/* Dynamically allocated, indexed by CPU.  Store physical address of stubs. */
> > +static paddr_t *__ro_after_init stubs;
> > -unsigned long alloc_stub_page(unsigned int cpu, unsigned long *mfn)
> > +unsigned long assign_stub_page(unsigned int cpu)
> >   {
> >       unsigned long stub_va;
> > -    struct page_info *pg;
> > +    paddr_t addr = stubs[cpu];
> > -    BUILD_BUG_ON(STUBS_PER_PAGE & (STUBS_PER_PAGE - 1));
> > -
> > -    if ( *mfn )
> > -        pg = mfn_to_page(_mfn(*mfn));
> > -    else
> > +    if ( addr == INVALID_PADDR )
> >       {
> > -        nodeid_t node = cpu_to_node(cpu);
> > -        unsigned int memflags = node != NUMA_NO_NODE ? MEMF_node(node) : 0;
> > +        nodeid_t nid = cpu_to_node(cpu);
> > -        pg = alloc_domheap_page(NULL, memflags);
> > -        if ( !pg )
> > -            return 0;
> > +        /*
> > +         * Attempt to use the same page as the previous CPU if possible,
> > +         * otherwise allocate a new one.
> > +         */
> > +        if ( cpu && nid == cpu_to_node(cpu - 1) &&
> > +             PAGE_OFFSET(stubs[cpu - 1] + STUB_BUF_SIZE) )
>     PAGE_OFFSET(stubs[cpu - 1] + STUB_BUF_SIZE)
> is to ensure it remains inside the allocated stub page?

Yup, if there's no offset it means we have consumed the full page, and
we need to allocate a new one (at least that was my intention).
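
To spell out the check, here is a minimal standalone sketch.  The
constants are assumptions for illustration (a 4 KiB page and a 128-byte
stub buffer, which gives the 32 stubs per page mentioned in the commit
message), and next_stub_fits() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; the real definitions live in Xen. */
#define PAGE_SIZE       UINT64_C(4096)
#define STUB_BUF_SIZE   UINT64_C(128)
#define PAGE_OFFSET(a)  ((a) & (PAGE_SIZE - 1))

/*
 * Hypothetical helper: true if the buffer following prev_stub still
 * lies within the same page.  A zero page offset means the address
 * wrapped onto the next page boundary, i.e. the page is fully used.
 */
static bool next_stub_fits(uint64_t prev_stub)
{
    return PAGE_OFFSET(prev_stub + STUB_BUF_SIZE) != 0;
}
```

With these values, the stub at offset 31 * 128 is the last one in its
page, so the CPU after it would get a freshly allocated page.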

> 
> > +            addr = stubs[cpu - 1] + STUB_BUF_SIZE;
> > +        else
> > +        {
> > +            struct page_info *pg = alloc_domheap_page(NULL, MEMF_node(nid));
> 
> > @@ -1092,15 +1106,7 @@ static int cpu_smpboot_alloc(unsigned int cpu)
> >       memcpy(per_cpu(idt, cpu), bsp_idt, sizeof(bsp_idt));
> >       disable_each_ist(per_cpu(idt, cpu));
> > -    for ( stub_page = 0, i = cpu & ~(STUBS_PER_PAGE - 1);
> > -          i < nr_cpu_ids && i <= (cpu | (STUBS_PER_PAGE - 1)); ++i )
> > -        if ( cpu_online(i) && cpu_to_node(i) == node )
> > -        {
> > -            per_cpu(stubs.mfn, cpu) = per_cpu(stubs.mfn, i);
> 
> This loop tries hard to re-use the same page for a NUMA node.  My posted
> approach will densely allocate the stubs.  Your approach would re-use less,
> unless the CPUs are contiguous in a node.

From what I saw on a couple of systems, CPUs are contiguous inside a
node, for example:

(XEN) [ 2384.850395] CPU0...39 -> NODE0
(XEN) [ 2384.850397] CPU40...79 -> NODE1

> This is just an observation.  I have no idea how NUMA nodes are allocated.
> The "round robin" code in numa_init_array() made me worry that CPUs are more
> likely to be non-contiguous.

Well, the approach here would still work for non-contiguously assigned
CPUs, it would just be sub-optimal.  We can adjust if we ever find
such a system, but I think it's unlikely.
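
As a rough way to quantify "sub-optimal", the sketch below counts how
many pages a share-with-the-previous-CPU policy would allocate for a
given CPU to node mapping.  It is a simplified model, not the patch
itself: it assumes CPUs are assigned in index order, 32 stubs per page,
and count_stub_pages() and node_of are hypothetical names.

```c
#include <stddef.h>

#define STUBS_PER_PAGE 32u  /* assumed, per "groups of 32" in the thread */

/*
 * Count the pages allocated when a CPU may only share a page with the
 * CPU immediately before it.  node_of[i] is the NUMA node of CPU i.
 */
static unsigned int count_stub_pages(const unsigned int *node_of,
                                     size_t ncpus)
{
    unsigned int pages = 0;
    unsigned int used = 0;  /* stubs consumed in the current page */

    for ( size_t cpu = 0; cpu < ncpus; cpu++ )
    {
        if ( cpu && node_of[cpu] == node_of[cpu - 1] &&
             used < STUBS_PER_PAGE )
            used++;         /* same node, room left: share the page */
        else
        {
            pages++;        /* node changed or page full: new page */
            used = 1;
        }
    }

    return pages;
}
```

For the 80-CPU layout above (CPU0...39 on node 0, CPU40...79 on node 1)
this model yields 4 pages, the same as dense per-node allocation.  If
the nodes alternated on every CPU it would yield one page per CPU.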

> If you have NUMA systems in XenRT, I think it would be worthwhile to test.
> Some printks to see how many pages are allocated would be useful.

Sure, I will give it a try across what we have in the lab.



> 
> Thanks,
> Jason



* Re: [PATCH v3 2/2] xen/x86: Change stub page allocation/free
  2026-06-10 15:23     ` Jason Andryuk
  2026-06-10 18:20       ` Roger Pau Monné
@ 2026-06-11 15:10       ` Jan Beulich
  1 sibling, 0 replies; 8+ messages in thread
From: Jan Beulich @ 2026-06-11 15:10 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: xen-devel, Andrew Cooper, Teddy Astie, Roger Pau Monné

On 10.06.2026 17:23, Jason Andryuk wrote:
> On 2026-06-10 11:01, Roger Pau Monné wrote:
>> On Mon, Jun 08, 2026 at 08:06:38PM -0400, Jason Andryuk wrote:
>>> --- a/xen/arch/x86/setup.c
>>> +++ b/xen/arch/x86/setup.c
>>> @@ -2089,8 +2089,7 @@ void asmlinkage __init noreturn __start_xen(void)
>>>         init_idle_domain();
>>>   -    this_cpu(stubs.addr) = alloc_stub_page(smp_processor_id(),
>>> -                                           &this_cpu(stubs).mfn);
>>> +    this_cpu(stubs.addr) = assign_stub_page(0);
>>
>> Given the stub pages are first used quite late in the boot process, the above
>> arrays would better be dynamically allocated using xvmalloc_array().
> 
> Ok.  At some point I intended to dynamically allocate.  But x86 doesn't have num_possible_cpus(), and I thought num_present_cpus() wouldn't have the correct value.  nr_cpu_ids seemed close to the value, but then I convinced myself NR_CPUS would be okay.

Not specific to this patch: Using NR_CPUS is almost never okay. It's a last
resort if you need a static upper bound. But NR_CPUS can be _much_ larger
than nr_cpu_ids, and hence arrays generally want dimensioning (allocating)
by using the latter.
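
A minimal standalone sketch of the difference, using plain malloc() as
a stand-in for Xen's allocators.  The NR_CPUS value, the INVALID_PADDR
definition and the alloc_stub_table() helper are all assumptions made
for the example.

```c
#include <stdint.h>
#include <stdlib.h>

#define NR_CPUS        4096u            /* hypothetical build-time cap */
#define INVALID_PADDR  (~UINT64_C(0))   /* stand-in definition */

/*
 * Hypothetical helper: allocate one slot per possible CPU id, sized by
 * the runtime count rather than NR_CPUS, with every slot unassigned.
 */
static uint64_t *alloc_stub_table(unsigned int nr_cpu_ids)
{
    uint64_t *tbl = malloc(nr_cpu_ids * sizeof(*tbl));

    if ( !tbl )
        return NULL;

    for ( unsigned int i = 0; i < nr_cpu_ids; i++ )
        tbl[i] = INVALID_PADDR;

    return tbl;
}
```

On an 8-CPU box this takes 64 bytes, whereas a static array of NR_CPUS
entries would take 32 KiB with the values assumed here.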

Jan



end of thread, other threads:[~2026-06-11 15:10 UTC | newest]

Thread overview: 8+ messages
2026-06-09  0:06 [PATCH v3 0/2] xen/x86: Change stub page freeing to fix smt=0 Jason Andryuk
2026-06-09  0:06 ` [PATCH v3 1/2] xen/x86: Return virtual address from alloc_stub_page() Jason Andryuk
2026-06-09  0:06 ` [PATCH v3 2/2] xen/x86: Change stub page allocation/free Jason Andryuk
2026-06-10 15:01   ` Roger Pau Monné
2026-06-10 15:23     ` Jason Andryuk
2026-06-10 18:20       ` Roger Pau Monné
2026-06-11 15:10       ` Jan Beulich
2026-06-10 11:54 ` [PATCH v3 0/2] xen/x86: Change stub page freeing to fix smt=0 Oleksii Kurochko
