All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/6] x86: Expose consistent topology to guests
@ 2024-05-29 14:32 Alejandro Vallejo
  2024-05-29 14:32 ` [PATCH v3 1/6] x86/vlapic: Move lapic migration checks to the check hooks Alejandro Vallejo
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Alejandro Vallejo @ 2024-05-29 14:32 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
	Roger Pau Monné, Anthony PERARD, Juergen Gross

v2 -> v3:

  (v2/patch2 and v2/patch4 are already committed)

  * Moved the vlapic check hook addition to v3/patch1
    * And created a check hook for the architectural state too for consistency.
  * Fixed migrations from Xen <= 4.13 by reconstructing the previous topology.
  * Correctly set the APIC ID after a policy change when vlapic is already in
    x2APIC mode.
  * Removed bogus assumption introduced in v1 and v2 on hvmloader about which
    8bit APIC IDs represent ids > 254. (it's "id % 0xff", not "min(id, 0xff)".
        * Used an x2apic flag check instead.
  * Various formatting adjustments.

v1 -> v2:

  * v1/patch 4 replaced by a different strategy (See patches 4 and 5 in v2):
      * Have hvmloader populate MADT with the real APIC IDs as read by the APs
        themselves rather than giving it knowledge on how to derive them.
  * Removed patches 2 and 3 in v1, as no longer relevant.
  * Split v1/patch6 in two parts ((a) creating the generator and (b) plugging it
    in) and use the generator in the unit tests of the vcpuid->apicid mapping
    function. Becomes patches 6 and 8 in v2.

  Patch 1: Same as v1/patch1.
  Patch 2: Header dependency cleanup in preparation for patch3.
  Patch 3: Adds vlapic_hidden check for the newly introduced reserved area.
  Patch 4: [hvmloader] Replaces INIT+SIPI+SIPI sequences with hypercalls.
  Patch 5: [hvmloader] Retrieve the per-CPU APIC IDs from the APs themselves.
  Patch 6: Split from v1/patch6.
  Patch 7: Logically matching v1/patch5, but using v2/patch6 for testing.
  Patch 8: Split from v1/patch6.

=== Original cover letter ===

Current topology handling is close to non-existent. As things stand, APIC
IDs are allocated through the apic_id=vcpu_id*2 relation without giving any
hints to the OS on how to parse the x2APIC ID of a given CPU and assuming
the guest will assume 2 threads per core.

This series involves bringing x2APIC IDs into the migration stream, so
older guests keep operating as they used to and enhancing Xen+toolstack so
new guests get topology information consistent with their x2APIC IDs. As a
side effect of this, x2APIC IDs are now packed and don't have (unless under
a pathological case) gaps.

Further work ought to allow combining this topology configurations with
gang-scheduling of guest hyperthreads into affine physical hyperthreads.
For the time being it purposefully keeps the configuration of "1 socket" +
"1 thread per core" + "1 core per vCPU".

Patch 1: Includes x2APIC IDs in the migration stream. This allows Xen to
         reconstruct the right x2APIC IDs on migrated-in guests, and
         future-proofs itself in the face of x2APIC ID derivation changes.
Patch 2: Minor refactor to expose xc_cpu_policy in libxl
Patch 3: Refactors xen/lib/x86 to work on non-Xen freestanding environments
         (e.g: hvmloader)
Patch 4: Remove old assumptions about vcpu_id<->apic_id relationship in hvmloader
Patch 5: Add logic to derive x2APIC IDs given a CPU policy and vCPU IDs
Patch 6: Includes a simple topology generator for toolstack so new guests
         have topologically consistent information in CPUID


Alejandro Vallejo (6):
  x86/vlapic: Move lapic migration checks to the check hooks
  xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
  tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves
  xen/lib: Add topology generator for x86
  xen/x86: Derive topologically correct x2APIC IDs from the policy
  xen/x86: Synthesise domain topologies

 tools/firmware/hvmloader/config.h        |   6 +-
 tools/firmware/hvmloader/hvmloader.c     |   4 +-
 tools/firmware/hvmloader/smp.c           |  54 ++++--
 tools/include/xen-tools/common-macros.h  |   5 +
 tools/libs/guest/xg_cpuid_x86.c          |  24 ++-
 tools/tests/cpu-policy/test-cpu-policy.c | 201 +++++++++++++++++++++++
 xen/arch/x86/cpu-policy.c                |   9 +-
 xen/arch/x86/cpuid.c                     |  14 +-
 xen/arch/x86/hvm/vlapic.c                | 122 ++++++++++----
 xen/arch/x86/include/asm/hvm/hvm.h       |   2 +
 xen/arch/x86/include/asm/hvm/vlapic.h    |   2 +
 xen/include/public/arch-x86/hvm/save.h   |   2 +
 xen/include/xen/lib/x86/cpu-policy.h     |  27 +++
 xen/lib/x86/policy.c                     | 167 +++++++++++++++++++
 xen/lib/x86/private.h                    |   4 +
 15 files changed, 587 insertions(+), 56 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3 1/6] x86/vlapic: Move lapic migration checks to the check hooks
  2024-05-29 14:32 [PATCH v3 0/6] x86: Expose consistent topology to guests Alejandro Vallejo
@ 2024-05-29 14:32 ` Alejandro Vallejo
  2024-05-30 10:07   ` Roger Pau Monné
  2024-05-29 14:32 ` [PATCH v3 2/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area Alejandro Vallejo
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Alejandro Vallejo @ 2024-05-29 14:32 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
	Roger Pau Monné

While doing this, factor out checks common to architectural and hidden state.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v3:
  * Moved from v2/patch3.
  * Added check hook for the architectural state as well.
  * Use domain_vcpu() rather than the previous open coded checks for vcpu range.
---
 xen/arch/x86/hvm/vlapic.c | 81 +++++++++++++++++++++++++--------------
 1 file changed, 53 insertions(+), 28 deletions(-)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 9cfc82666ae5..a0df62b5ec0a 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1553,60 +1553,85 @@ static void lapic_load_fixup(struct vlapic *vlapic)
                v, vlapic->loaded.id, vlapic->loaded.ldr, good_ldr);
 }
 
-static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
-{
-    unsigned int vcpuid = hvm_load_instance(h);
-    struct vcpu *v;
-    struct vlapic *s;
 
+static int lapic_check_common(const struct domain *d, unsigned int vcpuid)
+{
     if ( !has_vlapic(d) )
         return -ENODEV;
 
     /* Which vlapic to load? */
-    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
+    if ( !domain_vcpu(d, vcpuid) )
     {
         dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no apic%u\n",
                 d->domain_id, vcpuid);
         return -EINVAL;
     }
-    s = vcpu_vlapic(v);
 
-    if ( hvm_load_entry_zeroextend(LAPIC, h, &s->hw) != 0 )
+    return 0;
+}
+
+static int cf_check lapic_check_hidden(const struct domain *d,
+                                       hvm_domain_context_t *h)
+{
+    unsigned int vcpuid = hvm_load_instance(h);
+    struct hvm_hw_lapic s;
+    int rc;
+
+    if ( (rc = lapic_check_common(d, vcpuid)) )
+        return rc;
+
+    if ( hvm_load_entry_zeroextend(LAPIC, h, &s) != 0 )
+        return -ENODATA;
+
+    /* EN=0 with EXTD=1 is illegal */
+    if ( (s.apic_base_msr & (APIC_BASE_ENABLE | APIC_BASE_EXTD)) ==
+         APIC_BASE_EXTD )
         return -EINVAL;
 
+    return 0;
+}
+
+static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
+{
+    unsigned int vcpuid = hvm_load_instance(h);
+    struct vcpu *v = d->vcpu[vcpuid];
+    struct vlapic *s = vcpu_vlapic(v);
+
+    if ( hvm_load_entry_zeroextend(LAPIC, h, &s->hw) != 0 )
+        BUG();
+
     s->loaded.hw = 1;
     if ( s->loaded.regs )
         lapic_load_fixup(s);
 
-    if ( !(s->hw.apic_base_msr & APIC_BASE_ENABLE) &&
-         unlikely(vlapic_x2apic_mode(s)) )
-        return -EINVAL;
-
     hvm_update_vlapic_mode(v);
 
     return 0;
 }
 
-static int cf_check lapic_load_regs(struct domain *d, hvm_domain_context_t *h)
+static int cf_check lapic_check_regs(const struct domain *d,
+                                     hvm_domain_context_t *h)
 {
     unsigned int vcpuid = hvm_load_instance(h);
-    struct vcpu *v;
-    struct vlapic *s;
+    int rc;
 
-    if ( !has_vlapic(d) )
-        return -ENODEV;
+    if ( (rc = lapic_check_common(d, vcpuid)) )
+        return rc;
 
-    /* Which vlapic to load? */
-    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
-    {
-        dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no apic%u\n",
-                d->domain_id, vcpuid);
-        return -EINVAL;
-    }
-    s = vcpu_vlapic(v);
+    if ( !hvm_get_entry(LAPIC_REGS, h) )
+        return -ENODATA;
+
+    return 0;
+}
+
+static int cf_check lapic_load_regs(struct domain *d, hvm_domain_context_t *h)
+{
+    unsigned int vcpuid = hvm_load_instance(h);
+    struct vcpu *v = d->vcpu[vcpuid];
+    struct vlapic *s = vcpu_vlapic(v);
 
     if ( hvm_load_entry(LAPIC_REGS, h, s->regs) != 0 )
-        return -EINVAL;
+        BUG();
 
     s->loaded.id = vlapic_get_reg(s, APIC_ID);
     s->loaded.ldr = vlapic_get_reg(s, APIC_LDR);
@@ -1623,9 +1648,9 @@ static int cf_check lapic_load_regs(struct domain *d, hvm_domain_context_t *h)
     return 0;
 }
 
-HVM_REGISTER_SAVE_RESTORE(LAPIC, lapic_save_hidden, NULL,
+HVM_REGISTER_SAVE_RESTORE(LAPIC, lapic_save_hidden, lapic_check_hidden,
                           lapic_load_hidden, 1, HVMSR_PER_VCPU);
-HVM_REGISTER_SAVE_RESTORE(LAPIC_REGS, lapic_save_regs, NULL,
+HVM_REGISTER_SAVE_RESTORE(LAPIC_REGS, lapic_save_regs, lapic_check_regs,
                           lapic_load_regs, 1, HVMSR_PER_VCPU);
 
 int vlapic_init(struct vcpu *v)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 2/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
  2024-05-29 14:32 [PATCH v3 0/6] x86: Expose consistent topology to guests Alejandro Vallejo
  2024-05-29 14:32 ` [PATCH v3 1/6] x86/vlapic: Move lapic migration checks to the check hooks Alejandro Vallejo
@ 2024-05-29 14:32 ` Alejandro Vallejo
  2024-05-30 10:14   ` Roger Pau Monné
  2024-05-30 11:08   ` Andrew Cooper
  2024-05-29 14:32 ` [PATCH v3 3/6] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves Alejandro Vallejo
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 15+ messages in thread
From: Alejandro Vallejo @ 2024-05-29 14:32 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
	Roger Pau Monné

This allows the initial x2APIC ID to be sent on the migration stream. The
hardcoded mapping x2apic_id=2*vcpu_id is maintained for the time being.
Given the vlapic data is zero-extended on restore, fix up migrations from
hosts without the field by setting it to the old convention if zero.

x2APIC IDs are calculated from the CPU policy where the guest topology is
defined. For the time being, the function simply returns the old
relationship, but will eventually return results consistent with the
topology.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v3:
  * Added rsvd_zero check to the check hook (introduced in v3/patch1).
  * Set APIC ID properly during policy update if the APIC is already
    in x2apic mode, ensuring its LDR is updated too in that case.
  * Fixed typo in variable for x86_x2apic_id_from_vcpu_id().
    * Missed due to being mid-series.
  * Rewrote the comment on CPUID leaf 0xb.
  * Rewrote the comment on x86_x2apic_id_from_vcpu_id()
---
 xen/arch/x86/cpuid.c                   | 14 ++++-----
 xen/arch/x86/hvm/vlapic.c              | 41 ++++++++++++++++++++++++--
 xen/arch/x86/include/asm/hvm/hvm.h     |  2 ++
 xen/arch/x86/include/asm/hvm/vlapic.h  |  2 ++
 xen/include/public/arch-x86/hvm/save.h |  2 ++
 xen/include/xen/lib/x86/cpu-policy.h   |  9 ++++++
 xen/lib/x86/policy.c                   | 11 +++++++
 7 files changed, 70 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 7a38e032146a..ebcdbc5cbc5d 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -139,10 +139,9 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
         const struct cpu_user_regs *regs;
 
     case 0x1:
-        /* TODO: Rework topology logic. */
         res->b &= 0x00ffffffu;
         if ( is_hvm_domain(d) )
-            res->b |= (v->vcpu_id * 2) << 24;
+            res->b |= vlapic_x2apic_id(vcpu_vlapic(v)) << 24;
 
         /* TODO: Rework vPMU control in terms of toolstack choices. */
         if ( vpmu_available(v) &&
@@ -312,18 +311,15 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
 
     case 0xb:
         /*
-         * In principle, this leaf is Intel-only.  In practice, it is tightly
-         * coupled with x2apic, and we offer an x2apic-capable APIC emulation
-         * to guests on AMD hardware as well.
-         *
-         * TODO: Rework topology logic.
+         * Don't expose topology information to PV guests. Exposed on HVM
+         * along with x2APIC because they are tightly coupled.
          */
-        if ( p->basic.x2apic )
+        if ( is_hvm_domain(d) && p->basic.x2apic )
         {
             *(uint8_t *)&res->c = subleaf;
 
             /* Fix the x2APIC identifier. */
-            res->d = v->vcpu_id * 2;
+            res->d = vlapic_x2apic_id(vcpu_vlapic(v));
         }
         break;
 
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index a0df62b5ec0a..626a6258a4d4 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1072,7 +1072,7 @@ static uint32_t x2apic_ldr_from_id(uint32_t id)
 static void set_x2apic_id(struct vlapic *vlapic)
 {
     const struct vcpu *v = vlapic_vcpu(vlapic);
-    uint32_t apic_id = v->vcpu_id * 2;
+    uint32_t apic_id = vlapic->hw.x2apic_id;
     uint32_t apic_ldr = x2apic_ldr_from_id(apic_id);
 
     /*
@@ -1086,6 +1086,26 @@ static void set_x2apic_id(struct vlapic *vlapic)
     vlapic_set_reg(vlapic, APIC_LDR, apic_ldr);
 }
 
+void vlapic_cpu_policy_changed(struct vcpu *v)
+{
+    struct vlapic *vlapic = vcpu_vlapic(v);
+    const struct cpu_policy *cp = v->domain->arch.cpu_policy;
+
+    /*
+     * Don't override the initial x2APIC ID if we have migrated it or
+     * if the domain doesn't have vLAPIC at all.
+     */
+    if ( !has_vlapic(v->domain) || vlapic->loaded.hw )
+        return;
+
+    vlapic->hw.x2apic_id = x86_x2apic_id_from_vcpu_id(cp, v->vcpu_id);
+
+    if ( vlapic_x2apic_mode(vlapic) )
+        set_x2apic_id(vlapic); /* Set the APIC ID _and_ the LDR */
+    else
+        vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
+}
+
 int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
 {
     const struct cpu_policy *cp = v->domain->arch.cpu_policy;
@@ -1452,7 +1472,7 @@ void vlapic_reset(struct vlapic *vlapic)
     if ( v->vcpu_id == 0 )
         vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
 
-    vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
+    vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
     vlapic_do_init(vlapic);
 }
 
@@ -1520,6 +1540,16 @@ static void lapic_load_fixup(struct vlapic *vlapic)
     const struct vcpu *v = vlapic_vcpu(vlapic);
     uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
 
+    /*
+     * Loading record without hw.x2apic_id in the save stream, calculate using
+     * the traditional "vcpu_id * 2" relation. There's an implicit assumption
+     * that vCPU0 always has x2APIC0, which is true for the old relation, and
+     * still holds under the new x2APIC generation algorithm. While that case
+     * goes through the conditional it's benign because it still maps to zero.
+     */
+    if ( !vlapic->hw.x2apic_id )
+        vlapic->hw.x2apic_id = v->vcpu_id * 2;
+
     /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */
     if ( !vlapic_x2apic_mode(vlapic) ||
          (vlapic->loaded.ldr == good_ldr) )
@@ -1588,6 +1618,13 @@ static int cf_check lapic_check_hidden(const struct domain *d,
          APIC_BASE_EXTD )
         return -EINVAL;
 
+    /*
+     * Fail migrations from newer versions of Xen where
+     * rsvd_zero is interpreted as something else.
+     */
+    if ( s.rsvd_zero )
+        return -EINVAL;
+
     return 0;
 }
 
diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
index 1c01e22c8e62..746b4739f53f 100644
--- a/xen/arch/x86/include/asm/hvm/hvm.h
+++ b/xen/arch/x86/include/asm/hvm/hvm.h
@@ -16,6 +16,7 @@
 #include <asm/current.h>
 #include <asm/x86_emulate.h>
 #include <asm/hvm/asid.h>
+#include <asm/hvm/vlapic.h>
 
 struct pirq; /* needed by pi_update_irte */
 
@@ -448,6 +449,7 @@ static inline void hvm_update_guest_efer(struct vcpu *v)
 static inline void hvm_cpuid_policy_changed(struct vcpu *v)
 {
     alternative_vcall(hvm_funcs.cpuid_policy_changed, v);
+    vlapic_cpu_policy_changed(v);
 }
 
 static inline void hvm_set_tsc_offset(struct vcpu *v, uint64_t offset,
diff --git a/xen/arch/x86/include/asm/hvm/vlapic.h b/xen/arch/x86/include/asm/hvm/vlapic.h
index 2c4ff94ae7a8..34f23cd38a20 100644
--- a/xen/arch/x86/include/asm/hvm/vlapic.h
+++ b/xen/arch/x86/include/asm/hvm/vlapic.h
@@ -44,6 +44,7 @@
 #define vlapic_xapic_mode(vlapic)                               \
     (!vlapic_hw_disabled(vlapic) && \
      !((vlapic)->hw.apic_base_msr & APIC_BASE_EXTD))
+#define vlapic_x2apic_id(vlapic) ((vlapic)->hw.x2apic_id)
 
 /*
  * Generic APIC bitmap vector update & search routines.
@@ -107,6 +108,7 @@ int vlapic_ack_pending_irq(struct vcpu *v, int vector, bool force_ack);
 
 int  vlapic_init(struct vcpu *v);
 void vlapic_destroy(struct vcpu *v);
+void vlapic_cpu_policy_changed(struct vcpu *v);
 
 void vlapic_reset(struct vlapic *vlapic);
 
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 7ecacadde165..1c2ec669ffc9 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -394,6 +394,8 @@ struct hvm_hw_lapic {
     uint32_t             disabled; /* VLAPIC_xx_DISABLED */
     uint32_t             timer_divisor;
     uint64_t             tdt_msr;
+    uint32_t             x2apic_id;
+    uint32_t             rsvd_zero;
 };
 
 DECLARE_HVM_SAVE_TYPE(LAPIC, 5, struct hvm_hw_lapic);
diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
index d5e447e9dc06..392320b9adbe 100644
--- a/xen/include/xen/lib/x86/cpu-policy.h
+++ b/xen/include/xen/lib/x86/cpu-policy.h
@@ -542,6 +542,15 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
                                     const struct cpu_policy *guest,
                                     struct cpu_policy_errors *err);
 
+/**
+ * Calculates the x2APIC ID of a vCPU given a CPU policy
+ *
+ * @param p          CPU policy of the domain.
+ * @param id         vCPU ID of the vCPU.
+ * @returns x2APIC ID of the vCPU.
+ */
+uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id);
+
 #endif /* !XEN_LIB_X86_POLICIES_H */
 
 /*
diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
index f033d22785be..b70b22d55fcf 100644
--- a/xen/lib/x86/policy.c
+++ b/xen/lib/x86/policy.c
@@ -2,6 +2,17 @@
 
 #include <xen/lib/x86/cpu-policy.h>
 
+uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
+{
+    /*
+     * TODO: Derive x2APIC ID from the topology information inside `p`
+     *       rather than from the vCPU ID alone. This bodge is a temporary
+     *       measure until all infra is in place to retrieve or derive the
+     *       initial x2APIC ID from migrated domains.
+     */
+    return id * 2;
+}
+
 int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
                                     const struct cpu_policy *guest,
                                     struct cpu_policy_errors *err)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 3/6] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves
  2024-05-29 14:32 [PATCH v3 0/6] x86: Expose consistent topology to guests Alejandro Vallejo
  2024-05-29 14:32 ` [PATCH v3 1/6] x86/vlapic: Move lapic migration checks to the check hooks Alejandro Vallejo
  2024-05-29 14:32 ` [PATCH v3 2/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area Alejandro Vallejo
@ 2024-05-29 14:32 ` Alejandro Vallejo
  2024-05-29 16:16   ` Alejandro Vallejo
  2024-05-29 14:32 ` [PATCH v3 4/6] xen/lib: Add topology generator for x86 Alejandro Vallejo
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Alejandro Vallejo @ 2024-05-29 14:32 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
	Roger Pau Monné, Anthony PERARD

Make it so the APs expose their own APIC IDs in a LUT. We can use that LUT to
populate the MADT, decoupling the algorithm that relates CPU IDs and APIC IDs
from hvmloader.

While at this also remove ap_callin, as writing the APIC ID may serve the same
purpose.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v3:
  * Moved ACCESS_ONCE() to common-macros.h
  * 8bit APIC IDs clipping at 255 was a bogus assumption. Stop relying on it.
    * APIC ID taken from leaf 0xb instead when the x2apic feature is supported.
  * Assert APIC IDs read by the APs are never zero, as that's always the BSP.
  * Added comment about how CPU_TO_X2APICID serves as cross-vcpu synchronizer.
---
 tools/firmware/hvmloader/config.h       |  6 ++-
 tools/firmware/hvmloader/hvmloader.c    |  4 +-
 tools/firmware/hvmloader/smp.c          | 54 ++++++++++++++++++++-----
 tools/include/xen-tools/common-macros.h |  5 +++
 4 files changed, 56 insertions(+), 13 deletions(-)

diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
index cd716bf39245..213ac1f28e17 100644
--- a/tools/firmware/hvmloader/config.h
+++ b/tools/firmware/hvmloader/config.h
@@ -4,6 +4,8 @@
 #include <stdint.h>
 #include <stdbool.h>
 
+#include <xen/hvm/hvm_info_table.h>
+
 enum virtual_vga { VGA_none, VGA_std, VGA_cirrus, VGA_pt };
 extern enum virtual_vga virtual_vga;
 
@@ -48,8 +50,10 @@ extern uint8_t ioapic_version;
 
 #define IOAPIC_ID           0x01
 
+extern uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
+
 #define LAPIC_BASE_ADDRESS  0xfee00000
-#define LAPIC_ID(vcpu_id)   ((vcpu_id) * 2)
+#define LAPIC_ID(vcpu_id)   (CPU_TO_X2APICID[(vcpu_id)])
 
 #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
 #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index f8af88fabf24..5c02e8fc226a 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -341,11 +341,11 @@ int main(void)
 
     printf("CPU speed is %u MHz\n", get_cpu_mhz());
 
+    smp_initialise();
+
     apic_setup();
     pci_setup();
 
-    smp_initialise();
-
     perform_tests();
 
     if ( bios->bios_info_setup )
diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c
index 5d46eee1c5f4..f878d91898bf 100644
--- a/tools/firmware/hvmloader/smp.c
+++ b/tools/firmware/hvmloader/smp.c
@@ -29,7 +29,34 @@
 
 #include <xen/vcpu.h>
 
-static int ap_callin;
+/**
+ * Lookup table of x2APIC IDs.
+ *
+ * Each entry is populated its respective CPU as they come online. This is required
+ * for generating the MADT with minimal assumptions about ID relationships.
+ */
+uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
+
+/** Tristate about x2apic being supported. -1=unknown */
+static int has_x2apic = -1;
+
+static uint32_t read_apic_id(void)
+{
+    uint32_t apic_id;
+
+    if ( has_x2apic )
+        cpuid(0xb, NULL, NULL, NULL, &apic_id);
+    else
+    {
+        cpuid(1, NULL, &apic_id, NULL, NULL);
+        apic_id >>= 24;
+    }
+
+    /* Never called by cpu0, so should never return 0 */
+    ASSERT(!apic_id);
+
+    return apic_id;
+}
 
 static void __attribute__((regparm(1))) cpu_setup(unsigned int cpu)
 {
@@ -37,13 +64,17 @@ static void __attribute__((regparm(1))) cpu_setup(unsigned int cpu)
     cacheattr_init();
     printf("done.\n");
 
-    if ( !cpu ) /* Used on the BSP too */
+    /* The BSP exits early because its APIC ID is known to be zero */
+    if ( !cpu )
         return;
 
     wmb();
-    ap_callin = 1;
+    ACCESS_ONCE(CPU_TO_X2APICID[cpu]) = read_apic_id();
 
-    /* After this point, the BSP will shut us down. */
+    /*
+     * After this point the BSP will shut us down. A write to
+     * CPU_TO_X2APICID[cpu] signals the BSP to bring down `cpu`.
+     */
 
     for ( ;; )
         asm volatile ( "hlt" );
@@ -54,10 +85,6 @@ static void boot_cpu(unsigned int cpu)
     static uint8_t ap_stack[PAGE_SIZE] __attribute__ ((aligned (16)));
     static struct vcpu_hvm_context ap;
 
-    /* Initialise shared variables. */
-    ap_callin = 0;
-    wmb();
-
     /* Wake up the secondary processor */
     ap = (struct vcpu_hvm_context) {
         .mode = VCPU_HVM_MODE_32B,
@@ -90,10 +117,11 @@ static void boot_cpu(unsigned int cpu)
         BUG();
 
     /*
-     * Wait for the secondary processor to complete initialisation.
+     * Wait for the secondary processor to complete initialisation,
+     * which is signaled by its x2APIC ID being written to the LUT.
      * Do not touch shared resources meanwhile.
      */
-    while ( !ap_callin )
+    while ( !ACCESS_ONCE(CPU_TO_X2APICID[cpu]) )
         cpu_relax();
 
     /* Take the secondary processor offline. */
@@ -104,6 +132,12 @@ static void boot_cpu(unsigned int cpu)
 void smp_initialise(void)
 {
     unsigned int i, nr_cpus = hvm_info->nr_vcpus;
+    uint32_t ecx;
+
+    cpuid(1, NULL, NULL, &ecx, NULL);
+    has_x2apic = (ecx >> 21) & 1;
+    if ( has_x2apic )
+        printf("x2APIC supported\n");
 
     printf("Multiprocessor initialisation:\n");
     cpu_setup(0);
diff --git a/tools/include/xen-tools/common-macros.h b/tools/include/xen-tools/common-macros.h
index 60912225cb7a..336c6309d96e 100644
--- a/tools/include/xen-tools/common-macros.h
+++ b/tools/include/xen-tools/common-macros.h
@@ -108,4 +108,9 @@
 #define get_unaligned(ptr)      get_unaligned_t(typeof(*(ptr)), ptr)
 #define put_unaligned(val, ptr) put_unaligned_t(typeof(*(ptr)), val, ptr)
 
+#define __ACCESS_ONCE(x) ({                             \
+            (void)(typeof(x))0; /* Scalar typecheck. */ \
+            (volatile typeof(x) *)&(x); })
+#define ACCESS_ONCE(x) (*__ACCESS_ONCE(x))
+
 #endif	/* __XEN_TOOLS_COMMON_MACROS__ */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 4/6] xen/lib: Add topology generator for x86
  2024-05-29 14:32 [PATCH v3 0/6] x86: Expose consistent topology to guests Alejandro Vallejo
                   ` (2 preceding siblings ...)
  2024-05-29 14:32 ` [PATCH v3 3/6] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves Alejandro Vallejo
@ 2024-05-29 14:32 ` Alejandro Vallejo
  2024-05-29 14:32 ` [PATCH v3 5/6] xen/x86: Derive topologically correct x2APIC IDs from the policy Alejandro Vallejo
  2024-05-29 14:32 ` [PATCH v3 6/6] xen/x86: Synthesise domain topologies Alejandro Vallejo
  5 siblings, 0 replies; 15+ messages in thread
From: Alejandro Vallejo @ 2024-05-29 14:32 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
	Roger Pau Monné, Anthony PERARD

Add a helper to populate topology leaves in the cpu policy from
threads/core and cores/package counts. It's unit-tested in test-cpu-policy.c,
but it's not connected to the rest of the code yet.

Adds the ASSERT() macro to xen/lib/x86/private.h, as it was missing.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v3:
  * Style adjustments (linewraps, newlines...)
  * Slight refactor of the TOPO() macro in unit tests.
  * Reduce indentation of x86_topo_from_parts().
  * Remove "no functional change" from commit message.
  * Assert n!=0 in clz(n)
    * Which implied adding the ASSERT() macro to private.h
---
 tools/tests/cpu-policy/test-cpu-policy.c | 133 +++++++++++++++++++++++
 xen/include/xen/lib/x86/cpu-policy.h     |  16 +++
 xen/lib/x86/policy.c                     |  90 +++++++++++++++
 xen/lib/x86/private.h                    |   4 +
 4 files changed, 243 insertions(+)

diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c
index 301df2c00285..849d7cebaa7c 100644
--- a/tools/tests/cpu-policy/test-cpu-policy.c
+++ b/tools/tests/cpu-policy/test-cpu-policy.c
@@ -650,6 +650,137 @@ static void test_is_compatible_failure(void)
     }
 }
 
+static void test_topo_from_parts(void)
+{
+    static const struct test {
+        unsigned int threads_per_core;
+        unsigned int cores_per_pkg;
+        struct cpu_policy policy;
+    } tests[] = {
+        {
+            .threads_per_core = 3, .cores_per_pkg = 1,
+            .policy = {
+                .x86_vendor = X86_VENDOR_AMD,
+                .topo.subleaf = {
+                    { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
+                    { .nr_logical = 1, .level = 1, .type = 2, .id_shift = 2, },
+                },
+            },
+        },
+        {
+            .threads_per_core = 1, .cores_per_pkg = 3,
+            .policy = {
+                .x86_vendor = X86_VENDOR_AMD,
+                .topo.subleaf = {
+                    { .nr_logical = 1, .level = 0, .type = 1, .id_shift = 0, },
+                    { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, },
+                },
+            },
+        },
+        {
+            .threads_per_core = 7, .cores_per_pkg = 5,
+            .policy = {
+                .x86_vendor = X86_VENDOR_AMD,
+                .topo.subleaf = {
+                    { .nr_logical = 7, .level = 0, .type = 1, .id_shift = 3, },
+                    { .nr_logical = 5, .level = 1, .type = 2, .id_shift = 6, },
+                },
+            },
+        },
+        {
+            .threads_per_core = 2, .cores_per_pkg = 128,
+            .policy = {
+                .x86_vendor = X86_VENDOR_AMD,
+                .topo.subleaf = {
+                    { .nr_logical = 2, .level = 0, .type = 1, .id_shift = 1, },
+                    { .nr_logical = 128, .level = 1, .type = 2,
+                      .id_shift = 8, },
+                },
+            },
+        },
+        {
+            .threads_per_core = 3, .cores_per_pkg = 1,
+            .policy = {
+                .x86_vendor = X86_VENDOR_INTEL,
+                .topo.subleaf = {
+                    { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
+                    { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, },
+                },
+            },
+        },
+        {
+            .threads_per_core = 1, .cores_per_pkg = 3,
+            .policy = {
+                .x86_vendor = X86_VENDOR_INTEL,
+                .topo.subleaf = {
+                    { .nr_logical = 1, .level = 0, .type = 1, .id_shift = 0, },
+                    { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, },
+                },
+            },
+        },
+        {
+            .threads_per_core = 7, .cores_per_pkg = 5,
+            .policy = {
+                .x86_vendor = X86_VENDOR_INTEL,
+                .topo.subleaf = {
+                    { .nr_logical = 7, .level = 0, .type = 1, .id_shift = 3, },
+                    { .nr_logical = 35, .level = 1, .type = 2, .id_shift = 6, },
+                },
+            },
+        },
+        {
+            .threads_per_core = 2, .cores_per_pkg = 128,
+            .policy = {
+                .x86_vendor = X86_VENDOR_INTEL,
+                .topo.subleaf = {
+                    { .nr_logical = 2, .level = 0, .type = 1, .id_shift = 1, },
+                    { .nr_logical = 256, .level = 1, .type = 2,
+                      .id_shift = 8, },
+                },
+            },
+        },
+    };
+
+    printf("Testing topology synthesis from parts:\n");
+
+    for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i )
+    {
+        const struct test *t = &tests[i];
+        struct cpu_policy actual = { .x86_vendor = t->policy.x86_vendor };
+        int rc = x86_topo_from_parts(&actual, t->threads_per_core,
+                                     t->cores_per_pkg);
+
+        if ( rc || memcmp(&actual.topo, &t->policy.topo, sizeof(actual.topo)) )
+        {
+#define TOPO(n, f)  t->policy.topo.subleaf[(n)].f, actual.topo.subleaf[(n)].f
+            fail("FAIL[%d] - '%s %u t/c, %u c/p'\n",
+                 rc,
+                 x86_cpuid_vendor_to_str(t->policy.x86_vendor),
+                 t->threads_per_core, t->cores_per_pkg);
+            printf("  subleaf=%u  expected_n=%u actual_n=%u\n"
+                   "             expected_lvl=%u actual_lvl=%u\n"
+                   "             expected_type=%u actual_type=%u\n"
+                   "             expected_shift=%u actual_shift=%u\n",
+                   0,
+                   TOPO(0, nr_logical),
+                   TOPO(0, level),
+                   TOPO(0, type),
+                   TOPO(0, id_shift));
+
+            printf("  subleaf=%u  expected_n=%u actual_n=%u\n"
+                   "             expected_lvl=%u actual_lvl=%u\n"
+                   "             expected_type=%u actual_type=%u\n"
+                   "             expected_shift=%u actual_shift=%u\n",
+                   1,
+                   TOPO(1, nr_logical),
+                   TOPO(1, level),
+                   TOPO(1, type),
+                   TOPO(1, id_shift));
+#undef TOPO
+        }
+    }
+}
+
 int main(int argc, char **argv)
 {
     printf("CPU Policy unit tests\n");
@@ -667,6 +798,8 @@ int main(int argc, char **argv)
     test_is_compatible_success();
     test_is_compatible_failure();
 
+    test_topo_from_parts();
+
     if ( nr_failures )
         printf("Done: %u failures\n", nr_failures);
     else
diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
index 392320b9adbe..f5df18e9f77c 100644
--- a/xen/include/xen/lib/x86/cpu-policy.h
+++ b/xen/include/xen/lib/x86/cpu-policy.h
@@ -551,6 +551,22 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
  */
 uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id);
 
+/**
+ * Synthesise topology information in `p` given high-level constraints
+ *
+ * Topology is given in various fields accross several leaves, some of
+ * which are vendor-specific. This function uses the policy itself to
+ * derive such leaves from threads/core and cores/package.
+ *
+ * @param p                   CPU policy of the domain.
+ * @param threads_per_core    threads/core. Doesn't need to be a power of 2.
+ * @param cores_per_package   cores/package. Doesn't need to be a power of 2.
+ * @return                    0 on success; -errno on failure
+ */
+int x86_topo_from_parts(struct cpu_policy *p,
+                        unsigned int threads_per_core,
+                        unsigned int cores_per_pkg);
+
 #endif /* !XEN_LIB_X86_POLICIES_H */
 
 /*
diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
index b70b22d55fcf..7709736a2812 100644
--- a/xen/lib/x86/policy.c
+++ b/xen/lib/x86/policy.c
@@ -13,6 +13,96 @@ uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
     return id * 2;
 }
 
+static unsigned int order(unsigned int n)
+{
+    ASSERT(n); /* clz(0) is UB */
+
+    return 8 * sizeof(n) - __builtin_clz(n);
+}
+
+int x86_topo_from_parts(struct cpu_policy *p,
+                        unsigned int threads_per_core,
+                        unsigned int cores_per_pkg)
+{
+    unsigned int threads_per_pkg = threads_per_core * cores_per_pkg;
+    unsigned int apic_id_size;
+
+    if ( !p || !threads_per_core || !cores_per_pkg )
+        return -EINVAL;
+
+    p->basic.max_leaf = MAX(0xb, p->basic.max_leaf);
+
+    memset(p->topo.raw, 0, sizeof(p->topo.raw));
+
+    /* thread level */
+    p->topo.subleaf[0].nr_logical = threads_per_core;
+    p->topo.subleaf[0].id_shift = 0;
+    p->topo.subleaf[0].level = 0;
+    p->topo.subleaf[0].type = 1;
+    if ( threads_per_core > 1 )
+        p->topo.subleaf[0].id_shift = order(threads_per_core - 1);
+
+    /* core level */
+    p->topo.subleaf[1].nr_logical = cores_per_pkg;
+    if ( p->x86_vendor == X86_VENDOR_INTEL )
+        p->topo.subleaf[1].nr_logical = threads_per_pkg;
+    p->topo.subleaf[1].id_shift = p->topo.subleaf[0].id_shift;
+    p->topo.subleaf[1].level = 1;
+    p->topo.subleaf[1].type = 2;
+    if ( cores_per_pkg > 1 )
+        p->topo.subleaf[1].id_shift += order(cores_per_pkg - 1);
+
+    apic_id_size = p->topo.subleaf[1].id_shift;
+
+    /*
+     * Contrary to what the name might seem to imply. HTT is an enabler for
+     * SMP and there's no harm in setting it even with a single vCPU.
+     */
+    p->basic.htt = true;
+    p->basic.lppp = MIN(0xff, p->basic.lppp);
+
+    switch ( p->x86_vendor )
+    {
+    case X86_VENDOR_INTEL: {
+        struct cpuid_cache_leaf *sl = p->cache.subleaf;
+
+        for ( size_t i = 0; sl->type &&
+                            i < ARRAY_SIZE(p->cache.raw); i++, sl++ )
+        {
+            sl->cores_per_package = cores_per_pkg - 1;
+            sl->threads_per_cache = threads_per_core - 1;
+            if ( sl->type == 3 /* unified cache */ )
+                sl->threads_per_cache = threads_per_pkg - 1;
+        }
+        break;
+    }
+
+    case X86_VENDOR_AMD:
+    case X86_VENDOR_HYGON:
+        /* Expose p->basic.lppp */
+        p->extd.cmp_legacy = true;
+
+        /* Clip NC to the maximum value it can hold */
+        p->extd.nc = 0xff;
+        if ( threads_per_pkg <= 0xff )
+            p->extd.nc = threads_per_pkg - 1;
+
+        /* TODO: Expose leaf e1E */
+        p->extd.topoext = false;
+
+        /*
+         * Clip APIC ID to 8 bits, as that's what high core-count machines do.
+         *
+         * That's what AMD EPYC 9654 does with >256 CPUs.
+         */
+        p->extd.apic_id_size = MIN(8, apic_id_size);
+
+        break;
+    }
+
+    return 0;
+}
+
 int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
                                     const struct cpu_policy *guest,
                                     struct cpu_policy_errors *err)
diff --git a/xen/lib/x86/private.h b/xen/lib/x86/private.h
index 60bb82a400b7..2ec9dbee33c2 100644
--- a/xen/lib/x86/private.h
+++ b/xen/lib/x86/private.h
@@ -4,6 +4,7 @@
 #ifdef __XEN__
 
 #include <xen/bitops.h>
+#include <xen/bug.h>
 #include <xen/guest_access.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
@@ -17,6 +18,7 @@
 
 #else
 
+#include <assert.h>
 #include <errno.h>
 #include <inttypes.h>
 #include <stdbool.h>
@@ -28,6 +30,8 @@
 
 #include <xen-tools/common-macros.h>
 
+#define ASSERT(x) assert(x)
+
 static inline bool test_bit(unsigned int bit, const void *vaddr)
 {
     const char *addr = vaddr;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 5/6] xen/x86: Derive topologically correct x2APIC IDs from the policy
  2024-05-29 14:32 [PATCH v3 0/6] x86: Expose consistent topology to guests Alejandro Vallejo
                   ` (3 preceding siblings ...)
  2024-05-29 14:32 ` [PATCH v3 4/6] xen/lib: Add topology generator for x86 Alejandro Vallejo
@ 2024-05-29 14:32 ` Alejandro Vallejo
  2024-05-29 14:32 ` [PATCH v3 6/6] xen/x86: Synthesise domain topologies Alejandro Vallejo
  5 siblings, 0 replies; 15+ messages in thread
From: Alejandro Vallejo @ 2024-05-29 14:32 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
	Roger Pau Monné, Anthony PERARD

Implements the helper for mapping vcpu_id to x2apic_id given a valid
topology in a policy. The algo is written with the intention of extending
it to leaves 0x1f and extended 0x26 in the future.

Toolstack doesn't set leaf 0xb and the HVM default policy has it cleared,
so the leaf is not implemented. In that case, the new helper just returns
the legacy mapping.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v3:
  * Formatting adjustments.
  * Replace a conditional in x86_topo_from_parts() with MIN()
    * Was meant to happen in v2, but fell between the cracks.
  * Moved the `policy` variable to the inner scope so it's clean for every test.
  * Rewrote commit message to say "extended 0x26" rather than e26.
---
 tools/tests/cpu-policy/test-cpu-policy.c | 68 ++++++++++++++++++++++
 xen/include/xen/lib/x86/cpu-policy.h     |  2 +
 xen/lib/x86/policy.c                     | 73 ++++++++++++++++++++++--
 3 files changed, 138 insertions(+), 5 deletions(-)

diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c
index 849d7cebaa7c..e5f9b8f7ee39 100644
--- a/tools/tests/cpu-policy/test-cpu-policy.c
+++ b/tools/tests/cpu-policy/test-cpu-policy.c
@@ -781,6 +781,73 @@ static void test_topo_from_parts(void)
     }
 }
 
+static void test_x2apic_id_from_vcpu_id_success(void)
+{
+    static const struct test {
+        unsigned int vcpu_id;
+        unsigned int threads_per_core;
+        unsigned int cores_per_pkg;
+        uint32_t x2apic_id;
+        uint8_t x86_vendor;
+    } tests[] = {
+        {
+            .vcpu_id = 3, .threads_per_core = 3, .cores_per_pkg = 8,
+            .x2apic_id = 1 << 2,
+        },
+        {
+            .vcpu_id = 6, .threads_per_core = 3, .cores_per_pkg = 8,
+            .x2apic_id = 2 << 2,
+        },
+        {
+            .vcpu_id = 24, .threads_per_core = 3, .cores_per_pkg = 8,
+            .x2apic_id = 1 << 5,
+        },
+        {
+            .vcpu_id = 35, .threads_per_core = 3, .cores_per_pkg = 8,
+            .x2apic_id = (35 % 3) | (((35 / 3) % 8) << 2) | ((35 / 24) << 5),
+        },
+        {
+            .vcpu_id = 96, .threads_per_core = 7, .cores_per_pkg = 3,
+            .x2apic_id = (96 % 7) | (((96 / 7) % 3) << 3) | ((96 / 21) << 5),
+        },
+    };
+
+    const uint8_t vendors[] = {
+        X86_VENDOR_INTEL,
+        X86_VENDOR_AMD,
+        X86_VENDOR_CENTAUR,
+        X86_VENDOR_SHANGHAI,
+        X86_VENDOR_HYGON,
+    };
+
+    printf("Testing x2apic id from vcpu id success:\n");
+
+    /* Perform the test run on every vendor we know about */
+    for ( size_t i = 0; i < ARRAY_SIZE(vendors); ++i )
+    {
+        for ( size_t j = 0; j < ARRAY_SIZE(tests); ++j )
+        {
+            struct cpu_policy policy = { .x86_vendor = vendors[i] };
+            const struct test *t = &tests[j];
+            uint32_t x2apic_id;
+            int rc = x86_topo_from_parts(&policy, t->threads_per_core,
+                                         t->cores_per_pkg);
+
+            if ( rc ) {
+                fail("FAIL[%d] - 'x86_topo_from_parts() failed", rc);
+                continue;
+            }
+
+            x2apic_id = x86_x2apic_id_from_vcpu_id(&policy, t->vcpu_id);
+            if ( x2apic_id != t->x2apic_id )
+                fail("FAIL - '%s cpu%u %u t/c %u c/p'. bad x2apic_id: expected=%u actual=%u\n",
+                     x86_cpuid_vendor_to_str(policy.x86_vendor),
+                     t->vcpu_id, t->threads_per_core, t->cores_per_pkg,
+                     t->x2apic_id, x2apic_id);
+        }
+    }
+}
+
 int main(int argc, char **argv)
 {
     printf("CPU Policy unit tests\n");
@@ -799,6 +866,7 @@ int main(int argc, char **argv)
     test_is_compatible_failure();
 
     test_topo_from_parts();
+    test_x2apic_id_from_vcpu_id_success();
 
     if ( nr_failures )
         printf("Done: %u failures\n", nr_failures);
diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
index f5df18e9f77c..2cbc2726a861 100644
--- a/xen/include/xen/lib/x86/cpu-policy.h
+++ b/xen/include/xen/lib/x86/cpu-policy.h
@@ -545,6 +545,8 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
 /**
  * Calculates the x2APIC ID of a vCPU given a CPU policy
  *
+ * If the policy lacks leaf 0xb falls back to legacy mapping of apic_id=cpu*2
+ *
  * @param p          CPU policy of the domain.
  * @param id         vCPU ID of the vCPU.
  * @returns x2APIC ID of the vCPU.
diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
index 7709736a2812..239386b71769 100644
--- a/xen/lib/x86/policy.c
+++ b/xen/lib/x86/policy.c
@@ -2,15 +2,78 @@
 
 #include <xen/lib/x86/cpu-policy.h>
 
+static uint32_t parts_per_higher_scoped_level(const struct cpu_policy *p, size_t lvl)
+{
+    /*
+     * `nr_logical` reported by Intel is the number of THREADS contained in
+     * the next topological scope. For example, assuming a system with 2
+     * threads/core and 3 cores/module in a fully symmetric topology,
+     * `nr_logical` at the core level will report 6. Because it's reporting
+     * the number of threads in a module.
+     *
+     * On AMD/Hygon, nr_logical is already normalized by the higher scoped
+     * level (cores/complex, etc) so we can return it as-is.
+     */
+    if ( p->x86_vendor != X86_VENDOR_INTEL || !lvl )
+        return p->topo.subleaf[lvl].nr_logical;
+
+    return p->topo.subleaf[lvl].nr_logical / p->topo.subleaf[lvl - 1].nr_logical;
+}
+
 uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
 {
+    uint32_t shift = 0, x2apic_id = 0;
+
+    /* In the absence of topology leaves, fallback to traditional mapping */
+    if ( !p->topo.subleaf[0].type )
+        return id * 2;
+
     /*
-     * TODO: Derive x2APIC ID from the topology information inside `p`
-     *       rather than from the vCPU ID alone. This bodge is a temporary
-     *       measure until all infra is in place to retrieve or derive the
-     *       initial x2APIC ID from migrated domains.
+     * `id` means different things at different points of the algo
+     *
+     * At lvl=0: global thread_id (same as vcpu_id)
+     * At lvl=1: global core_id
+     * At lvl=2: global socket_id (actually complex_id in AMD, module_id
+     *                             in Intel, but the name is inconsequential)
+     *
+     *                 +--+
+     *            ____ |#0| ______           <= 1 socket
+     *           /     +--+       \+--+
+     *       __#0__              __|#1|__    <= 2 cores/socket
+     *      /   |  \        +--+/  +-|+  \
+     *    #0   #1   #2      |#3|    #4    #5 <= 3 threads/core
+     *                      +--+
+     *
+     * ... and so on. Global in this context means that it's a unique
+     * identifier for the whole topology, and not relative to the level
+     * it's in. For example, in the diagram shown above, we're looking at
+     * thread #3 in the global sense, though it's #0 within its core.
+     *
+     * Note that dividing a global thread_id by the number of threads per
+     * core returns the global core id that contains it. e.g: 0, 1 or 2
+     * divided by 3 returns core_id=0. 3, 4 or 5 divided by 3 returns core
+     * 1, and so on. An analogous argument holds for higher levels. This is
+     * the property we exploit to derive x2apic_id from vcpu_id.
+     *
+     * NOTE: `topo` is currently derived from leaf 0xb, which is bound to two
+     * levels, but once we track leaves 0x1f (or extended 0x26) there will be a
+     * few more. The algorithm is written to cope with that case.
      */
-    return id * 2;
+    for ( uint32_t i = 0; i < ARRAY_SIZE(p->topo.raw); i++ )
+    {
+        uint32_t nr_parts;
+
+        if ( !p->topo.subleaf[i].type )
+            /* sentinel subleaf */
+            break;
+
+        nr_parts = parts_per_higher_scoped_level(p, i);
+        x2apic_id |= (id % nr_parts) << shift;
+        id /= nr_parts;
+        shift = p->topo.subleaf[i].id_shift;
+    }
+
+    return (id << shift) | x2apic_id;
 }
 
 static unsigned int order(unsigned int n)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 6/6] xen/x86: Synthesise domain topologies
  2024-05-29 14:32 [PATCH v3 0/6] x86: Expose consistent topology to guests Alejandro Vallejo
                   ` (4 preceding siblings ...)
  2024-05-29 14:32 ` [PATCH v3 5/6] xen/x86: Derive topologically correct x2APIC IDs from the policy Alejandro Vallejo
@ 2024-05-29 14:32 ` Alejandro Vallejo
  5 siblings, 0 replies; 15+ messages in thread
From: Alejandro Vallejo @ 2024-05-29 14:32 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Anthony PERARD, Juergen Gross, Jan Beulich,
	Andrew Cooper, Roger Pau Monné

Expose sensible topologies in leaf 0xb. At the moment it synthesises non-HT
systems, in line with the previous code intent.

Leaf 0xb in the host policy is no longer zapped and the guest {max,def} policies
have their topology leaves zapped instead. The intent is for toolstack to
populate them. There's no current use for the topology information in the host
policy, but it makes no harm.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v3:
  * Formatting adjustments.
  * Restored previous topology logic and gated it through the "restore" variable
  * Print return code on topo generation failures.
  * Adjusted comment on wiping the topology in the guest policies.
  * Described the changes in topology zero-out in the commit message.
---
 tools/libs/guest/xg_cpuid_x86.c | 24 +++++++++++++++++++++++-
 xen/arch/x86/cpu-policy.c       |  9 ++++++---
 xen/lib/x86/policy.c            |  9 ++++++---
 3 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/tools/libs/guest/xg_cpuid_x86.c b/tools/libs/guest/xg_cpuid_x86.c
index 4453178100ad..6062dcab01ce 100644
--- a/tools/libs/guest/xg_cpuid_x86.c
+++ b/tools/libs/guest/xg_cpuid_x86.c
@@ -725,8 +725,16 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
         p->policy.basic.htt       = test_bit(X86_FEATURE_HTT, host_featureset);
         p->policy.extd.cmp_legacy = test_bit(X86_FEATURE_CMP_LEGACY, host_featureset);
     }
-    else
+    else if ( restore )
     {
+        /*
+         * Reconstruct the topology exposed on Xen <= 4.13. It makes very little
+         * sense, but it's what those guests saw so it's set in stone now.
+         *
+         * Guests from Xen 4.14 onwards carry their own CPUID leaves in the
+         * migration stream so they don't need special treatment.
+         */
+
         /*
          * Topology for HVM guests is entirely controlled by Xen.  For now, we
          * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
@@ -782,6 +790,20 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
             break;
         }
     }
+    else
+    {
+        /* TODO: Expose the ability to choose a custom topology for HVM/PVH */
+        unsigned int threads_per_core = 1;
+        unsigned int cores_per_pkg = di.max_vcpu_id + 1;
+
+        rc = x86_topo_from_parts(&p->policy, threads_per_core, cores_per_pkg);
+        if ( rc )
+        {
+            ERROR("Failed to generate topology: rc=%d t/c=%u c/p=%u",
+                  rc, threads_per_core, cores_per_pkg);
+            goto out;
+        }
+    }
 
     nr_leaves = ARRAY_SIZE(p->leaves);
     rc = x86_cpuid_copy_to_buffer(&p->policy, p->leaves, &nr_leaves);
diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index b96f4ee55cc4..ecbe98302df2 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -278,9 +278,6 @@ static void recalculate_misc(struct cpu_policy *p)
 
     p->basic.raw[0x8] = EMPTY_LEAF;
 
-    /* TODO: Rework topology logic. */
-    memset(p->topo.raw, 0, sizeof(p->topo.raw));
-
     p->basic.raw[0xc] = EMPTY_LEAF;
 
     p->extd.e1d &= ~CPUID_COMMON_1D_FEATURES;
@@ -628,6 +625,9 @@ static void __init calculate_pv_max_policy(void)
     recalculate_xstate(p);
 
     p->extd.raw[0xa] = EMPTY_LEAF; /* No SVM for PV guests. */
+
+    /* Wipe host topology. Populated by toolstack */
+    memset(p->topo.raw, 0, sizeof(p->topo.raw));
 }
 
 static void __init calculate_pv_def_policy(void)
@@ -791,6 +791,9 @@ static void __init calculate_hvm_max_policy(void)
 
     /* It's always possible to emulate CPUID faulting for HVM guests */
     p->platform_info.cpuid_faulting = true;
+
+    /* Wipe host topology. Populated by toolstack */
+    memset(p->topo.raw, 0, sizeof(p->topo.raw));
 }
 
 static void __init calculate_hvm_def_policy(void)
diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
index 239386b71769..01b9ed39d597 100644
--- a/xen/lib/x86/policy.c
+++ b/xen/lib/x86/policy.c
@@ -2,7 +2,8 @@
 
 #include <xen/lib/x86/cpu-policy.h>
 
-static uint32_t parts_per_higher_scoped_level(const struct cpu_policy *p, size_t lvl)
+static uint32_t parts_per_higher_scoped_level(const struct cpu_policy *p,
+                                              size_t lvl)
 {
     /*
      * `nr_logical` reported by Intel is the number of THREADS contained in
@@ -17,10 +18,12 @@ static uint32_t parts_per_higher_scoped_level(const struct cpu_policy *p, size_t
     if ( p->x86_vendor != X86_VENDOR_INTEL || !lvl )
         return p->topo.subleaf[lvl].nr_logical;
 
-    return p->topo.subleaf[lvl].nr_logical / p->topo.subleaf[lvl - 1].nr_logical;
+    return p->topo.subleaf[lvl].nr_logical /
+           p->topo.subleaf[lvl - 1].nr_logical;
 }
 
-uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
+uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p,
+                                    uint32_t id)
 {
     uint32_t shift = 0, x2apic_id = 0;
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 3/6] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves
  2024-05-29 14:32 ` [PATCH v3 3/6] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves Alejandro Vallejo
@ 2024-05-29 16:16   ` Alejandro Vallejo
  2024-05-29 16:42     ` Alejandro Vallejo
  0 siblings, 1 reply; 15+ messages in thread
From: Alejandro Vallejo @ 2024-05-29 16:16 UTC (permalink / raw)
  To: Xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Anthony PERARD

On 29/05/2024 15:32, Alejandro Vallejo wrote:
> +static uint32_t read_apic_id(void)
> +{
> +    uint32_t apic_id;
> +
> +    if ( has_x2apic )
> +        cpuid(0xb, NULL, NULL, NULL, &apic_id);
> +    else
> +    {
> +        cpuid(1, NULL, &apic_id, NULL, NULL);
> +        apic_id >>= 24;
> +    }
> +
> +    /* Never called by cpu0, so should never return 0 */
> +    ASSERT(!apic_id);

Bah! Typo. That's meant to be ASSERT(apic_id).

Cheers,
Alejandro


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 3/6] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves
  2024-05-29 16:16   ` Alejandro Vallejo
@ 2024-05-29 16:42     ` Alejandro Vallejo
  0 siblings, 0 replies; 15+ messages in thread
From: Alejandro Vallejo @ 2024-05-29 16:42 UTC (permalink / raw)
  To: Xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Anthony PERARD

On 29/05/2024 17:16, Alejandro Vallejo wrote:
> On 29/05/2024 15:32, Alejandro Vallejo wrote:
>> +static uint32_t read_apic_id(void)
>> +{
>> +    uint32_t apic_id;
>> +
>> +    if ( has_x2apic )
>> +        cpuid(0xb, NULL, NULL, NULL, &apic_id);
>> +    else
>> +    {
>> +        cpuid(1, NULL, &apic_id, NULL, NULL);
>> +        apic_id >>= 24;
>> +    }
>> +
>> +    /* Never called by cpu0, so should never return 0 */
>> +    ASSERT(!apic_id);
> 
> Bah! Typo. That's meant to be ASSERT(apic_id).
> 
> Cheers,
> Alejandro

And something else is broke, because I can't boot HVM guests anymore.
Thought we had HVM tests in gitlab, but apparently we don't.

https://gitlab.com/xen-project/people/agvallejo/xen/-/pipelines/1309033234

Closest I see is booting alpine, but that's a 1-vCPU PV case, which
wouldn't expose this.

I'll resend the series after weeding out the bugs.

Cheers,
Alejandro


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 1/6] x86/vlapic: Move lapic migration checks to the check hooks
  2024-05-29 14:32 ` [PATCH v3 1/6] x86/vlapic: Move lapic migration checks to the check hooks Alejandro Vallejo
@ 2024-05-30 10:07   ` Roger Pau Monné
  0 siblings, 0 replies; 15+ messages in thread
From: Roger Pau Monné @ 2024-05-30 10:07 UTC (permalink / raw)
  To: Alejandro Vallejo; +Cc: Xen-devel, Jan Beulich, Andrew Cooper

On Wed, May 29, 2024 at 03:32:30PM +0100, Alejandro Vallejo wrote:
> While doing this, factor out checks common to architectural and hidden state.
> 
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>

Reviewed-by: Roger PAu Monné <roger.pau@citrix.com>

With the BUG() possibly replaced with ASSERT_UNREACHABLE(),

> ---
> v3:
>   * Moved from v2/patch3.
>   * Added check hook for the architectural state as well.
>   * Use domain_vcpu() rather than the previous open coded checks for vcpu range.
> ---
>  xen/arch/x86/hvm/vlapic.c | 81 +++++++++++++++++++++++++--------------
>  1 file changed, 53 insertions(+), 28 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index 9cfc82666ae5..a0df62b5ec0a 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -1553,60 +1553,85 @@ static void lapic_load_fixup(struct vlapic *vlapic)
>                 v, vlapic->loaded.id, vlapic->loaded.ldr, good_ldr);
>  }
>  
> -static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
> -{
> -    unsigned int vcpuid = hvm_load_instance(h);
> -    struct vcpu *v;
> -    struct vlapic *s;
>  
> +static int lapic_check_common(const struct domain *d, unsigned int vcpuid)
> +{
>      if ( !has_vlapic(d) )
>          return -ENODEV;
>  
>      /* Which vlapic to load? */
> -    if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
> +    if ( !domain_vcpu(d, vcpuid) )
>      {
>          dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no apic%u\n",
>                  d->domain_id, vcpuid);

The message here is kind of misleading as printing apic%u makes it
look like it's an APIC ID, but it's a vCPU ID.  It would be best to
just print: "HVM restore: dom%d has no vCPU %u\n".

>          return -EINVAL;
>      }
> -    s = vcpu_vlapic(v);
>  
> -    if ( hvm_load_entry_zeroextend(LAPIC, h, &s->hw) != 0 )
> +    return 0;
> +}
> +
> +static int cf_check lapic_check_hidden(const struct domain *d,
> +                                       hvm_domain_context_t *h)
> +{
> +    unsigned int vcpuid = hvm_load_instance(h);
> +    struct hvm_hw_lapic s;
> +    int rc;
> +
> +    if ( (rc = lapic_check_common(d, vcpuid)) )
> +        return rc;

Nit: I don't like much to assign values inside of conditions, I would
rather do:

int rc = lapic_check_common(d, vcpuid);

if ( rc )
    return rc;

> +
> +    if ( hvm_load_entry_zeroextend(LAPIC, h, &s) != 0 )
> +        return -ENODATA;
> +
> +    /* EN=0 with EXTD=1 is illegal */
> +    if ( (s.apic_base_msr & (APIC_BASE_ENABLE | APIC_BASE_EXTD)) ==
> +         APIC_BASE_EXTD )
>          return -EINVAL;
>  
> +    return 0;
> +}
> +
> +static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
> +{
> +    unsigned int vcpuid = hvm_load_instance(h);
> +    struct vcpu *v = d->vcpu[vcpuid];

Not sure whether it's worth using domain_vcpu() here.  We have already
checked the vCPU is valid.

> +    struct vlapic *s = vcpu_vlapic(v);
> +
> +    if ( hvm_load_entry_zeroextend(LAPIC, h, &s->hw) != 0 )
> +        BUG();

I would use { ASSERT_UNREACHABLE(); return -EINVAL; } here, there's
IMO no strict reason to panic on non-debug builds.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
  2024-05-29 14:32 ` [PATCH v3 2/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area Alejandro Vallejo
@ 2024-05-30 10:14   ` Roger Pau Monné
  2024-05-30 11:08   ` Andrew Cooper
  1 sibling, 0 replies; 15+ messages in thread
From: Roger Pau Monné @ 2024-05-30 10:14 UTC (permalink / raw)
  To: Alejandro Vallejo; +Cc: Xen-devel, Jan Beulich, Andrew Cooper

On Wed, May 29, 2024 at 03:32:31PM +0100, Alejandro Vallejo wrote:
> This allows the initial x2APIC ID to be sent on the migration stream. The
> hardcoded mapping x2apic_id=2*vcpu_id is maintained for the time being.
> Given the vlapic data is zero-extended on restore, fix up migrations from
> hosts without the field by setting it to the old convention if zero.
> 
> x2APIC IDs are calculated from the CPU policy where the guest topology is
> defined. For the time being, the function simply returns the old
> relationship, but will eventually return results consistent with the
> topology.
> 
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
  2024-05-29 14:32 ` [PATCH v3 2/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area Alejandro Vallejo
  2024-05-30 10:14   ` Roger Pau Monné
@ 2024-05-30 11:08   ` Andrew Cooper
  2024-05-30 13:48     ` Alejandro Vallejo
  2024-05-31 10:31     ` Roger Pau Monné
  1 sibling, 2 replies; 15+ messages in thread
From: Andrew Cooper @ 2024-05-30 11:08 UTC (permalink / raw)
  To: Alejandro Vallejo, Xen-devel; +Cc: Jan Beulich, Roger Pau Monné

On 29/05/2024 3:32 pm, Alejandro Vallejo wrote:
> diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
> index f033d22785be..b70b22d55fcf 100644
> --- a/xen/lib/x86/policy.c
> +++ b/xen/lib/x86/policy.c
> @@ -2,6 +2,17 @@
>  
>  #include <xen/lib/x86/cpu-policy.h>
>  
> +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
> +{
> +    /*
> +     * TODO: Derive x2APIC ID from the topology information inside `p`
> +     *       rather than from the vCPU ID alone. This bodge is a temporary
> +     *       measure until all infra is in place to retrieve or derive the
> +     *       initial x2APIC ID from migrated domains.
> +     */
> +    return id * 2;
> +}
> +

I'm afraid it's nonsensical to try and derive x2APIC ID from a
policy+vcpu_id.

Take a step back, and think the data through.

A VM has:
* A unique APIC_ID for each vCPU
* Info in CPUID describing how to decompose the APIC_ID into topology

Right now, because this is all completely broken, we have:
* Hardcoded APIC_ID = vCPU_ID * 2
* Total nonsense in CPUID


When constructing a VM, the toolstack (given suitable admin
guidance/defaults) *must* choose both:
* The APIC_ID themselves
* The CPUID topo data to match

i.e. this series should be editing the toolstack's call to
xc_domain_hvm_setcontext().

It's not, because AFAICT you're depending on the migration compatibility
logic and inserting a new hardcoded assumption about symmetry of the layout.


The data flows we need are:

(New) create:
* Toolstack chooses both parts of topo information
* Xen needs a default, which reasonably can be APIC_ID=vCPU_ID when the
rest of the data flow has been cleaned up.  But this is needs to be
explicit in vcpu_create() and without reference to the policy.

And to be clear, it's fine for now for the toolstack to choose a
symmetric layout and pick appropriate APIC_IDs+CPUID for this, but it
needs to be the toolstack making this decision, not Xen inventing state
out of thin air based on the toolstack only giving half the information.

(New) migrate:
* Data from the stream, exactly as presented

(Compat) migrate:
* Synthesize the missing xapic_id field in LAPIC_REGs as APIC_ID=vCPU_ID
* 2.

I'm pretty sure this will be a net reduction in complexity in this
series.  It definitely reduces the Xen complexity.

~Andrew


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
  2024-05-30 11:08   ` Andrew Cooper
@ 2024-05-30 13:48     ` Alejandro Vallejo
  2024-05-30 15:47       ` Roger Pau Monné
  2024-05-31 10:31     ` Roger Pau Monné
  1 sibling, 1 reply; 15+ messages in thread
From: Alejandro Vallejo @ 2024-05-30 13:48 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel; +Cc: Jan Beulich, Roger Pau Monné

On 30/05/2024 12:08, Andrew Cooper wrote:
> On 29/05/2024 3:32 pm, Alejandro Vallejo wrote:
>> diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
>> index f033d22785be..b70b22d55fcf 100644
>> --- a/xen/lib/x86/policy.c
>> +++ b/xen/lib/x86/policy.c
>> @@ -2,6 +2,17 @@
>>  
>>  #include <xen/lib/x86/cpu-policy.h>
>>  
>> +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
>> +{
>> +    /*
>> +     * TODO: Derive x2APIC ID from the topology information inside `p`
>> +     *       rather than from the vCPU ID alone. This bodge is a temporary
>> +     *       measure until all infra is in place to retrieve or derive the
>> +     *       initial x2APIC ID from migrated domains.
>> +     */
>> +    return id * 2;
>> +}
>> +
> 
> I'm afraid it's nonsensical to try and derive x2APIC ID from a
> policy+vcpu_id.

That's debatable, and we clearly have different views, however...

> 
> Take a step back, and think the data through.
> 
> A VM has:
> * A unique APIC_ID for each vCPU
> * Info in CPUID describing how to decompose the APIC_ID into topology
> 
> Right now, because this is all completely broken, we have:
> * Hardcoded APIC_ID = vCPU_ID * 2
> * Total nonsense in CPUID
> 
> 
> When constructing a VM, the toolstack (given suitable admin
> guidance/defaults) *must* choose both:
> * The APIC_ID themselves
> * The CPUID topo data to match
> 
> i.e. this series should be editing the toolstack's call to
> xc_domain_hvm_setcontext().
> 
> It's not, because AFAICT you're depending on the migration compatibility
> logic and inserting a new hardcoded assumption about symmetry of the layout.
> 
> 
> The data flows we need are:
> 
> (New) create:
> * Toolstack chooses both parts of topo information
> * Xen needs a default, which reasonably can be APIC_ID=vCPU_ID when the
> rest of the data flow has been cleaned up.  But this is needs to be
> explicit in vcpu_create() and without reference to the policy.
> 
> And to be clear, it's fine for now for the toolstack to choose a
> symmetric layout and pick appropriate APIC_IDs+CPUID for this, but it
> needs to be the toolstack making this decision, not Xen inventing state
> out of thin air based on the toolstack only giving half the information.
> 
> (New) migrate:
> * Data from the stream, exactly as presented
> 
> (Compat) migrate:
> * Synthesize the missing xapic_id field in LAPIC_REGs as APIC_ID=vCPU_ID
> * 2.
> 
> I'm pretty sure this will be a net reduction in complexity in this
> series.  It definitely reduces the Xen complexity.
> 
> ~Andrew

... I didn't know toolstack could send hvmcontexts during non-migrated
domain creation. That's neat!

I was going to defend my approach (because it does make sense), but
there's an extra benefit from yours you didn't seem to notice. With the
x2apicid in the migration stream (patches 1 and parts of 2) it's not
only possible to set the APIC ID from toolstack per vCPU with the
contexts, but it would also allow toolstack to be responsible to
preinitialize all the APICs in x2apic mode when any of them is 255 or more.

I'll try to do that soon-ish. I suspect the pain points are going to be
making it work nicely as well on 1vCPU systems with no APIC (are those
expected to work?).

I'm not looking forward to re-testing all of this again...

Cheers,
Alejandro


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
  2024-05-30 13:48     ` Alejandro Vallejo
@ 2024-05-30 15:47       ` Roger Pau Monné
  0 siblings, 0 replies; 15+ messages in thread
From: Roger Pau Monné @ 2024-05-30 15:47 UTC (permalink / raw)
  To: Alejandro Vallejo; +Cc: Andrew Cooper, Xen-devel, Jan Beulich

On Thu, May 30, 2024 at 02:48:10PM +0100, Alejandro Vallejo wrote:
> I'll try to do that soon-ish. I suspect the pain points are going to be
> making it work nicely as well on 1vCPU systems with no APIC (are
> those expected to work?).

We do not allow creation of PVH/HVM domains without an emulated local
APIC, and I don't think we ever want to allow doing so (see
emulation_flags_ok()).

Thanks, Roger.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
  2024-05-30 11:08   ` Andrew Cooper
  2024-05-30 13:48     ` Alejandro Vallejo
@ 2024-05-31 10:31     ` Roger Pau Monné
  1 sibling, 0 replies; 15+ messages in thread
From: Roger Pau Monné @ 2024-05-31 10:31 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Alejandro Vallejo, Xen-devel, Jan Beulich

On Thu, May 30, 2024 at 12:08:26PM +0100, Andrew Cooper wrote:
> On 29/05/2024 3:32 pm, Alejandro Vallejo wrote:
> > diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
> > index f033d22785be..b70b22d55fcf 100644
> > --- a/xen/lib/x86/policy.c
> > +++ b/xen/lib/x86/policy.c
> > @@ -2,6 +2,17 @@
> >  
> >  #include <xen/lib/x86/cpu-policy.h>
> >  
> > +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
> > +{
> > +    /*
> > +     * TODO: Derive x2APIC ID from the topology information inside `p`
> > +     *       rather than from the vCPU ID alone. This bodge is a temporary
> > +     *       measure until all infra is in place to retrieve or derive the
> > +     *       initial x2APIC ID from migrated domains.
> > +     */
> > +    return id * 2;
> > +}
> > +
> 
> I'm afraid it's nonsensical to try and derive x2APIC ID from a
> policy+vcpu_id.
> 
> Take a step back, and think the data through.
> 
> A VM has:
> * A unique APIC_ID for each vCPU
> * Info in CPUID describing how to decompose the APIC_ID into topology
> 
> Right now, because this is all completely broken, we have:
> * Hardcoded APIC_ID = vCPU_ID * 2
> * Total nonsense in CPUID
> 
> 
> When constructing a VM, the toolstack (given suitable admin
> guidance/defaults) *must* choose both:
> * The APIC_ID themselves
> * The CPUID topo data to match
> 
> i.e. this series should be editing the toolstack's call to
> xc_domain_hvm_setcontext().
> 
> It's not, because AFAICT you're depending on the migration compatibility
> logic and inserting a new hardcoded assumption about symmetry of the layout.
> 
> 
> The data flows we need are:
> 
> (New) create:
> * Toolstack chooses both parts of topo information
> * Xen needs a default, which reasonably can be APIC_ID=vCPU_ID when the
> rest of the data flow has been cleaned up.  But this is needs to be
> explicit in vcpu_create() and without reference to the policy.

Doesn't using APIC_ID=vCPU_ID limits us to only being able to expose
certain typologies? (as vCPU IDs are contiguous).  For example
exposing a topology with 3 cores per package won't be possible?

Not saying it's a bad move to start this way, but if we want to
support exposing more exotic topology sooner or later we will need
some kind of logic that assigns the APIC IDs based on the knowledge of
the expected topology.  Whether is gets such knowledge from the CPU
policy or directly from the toolstack is another question.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-05-31 10:32 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-29 14:32 [PATCH v3 0/6] x86: Expose consistent topology to guests Alejandro Vallejo
2024-05-29 14:32 ` [PATCH v3 1/6] x86/vlapic: Move lapic migration checks to the check hooks Alejandro Vallejo
2024-05-30 10:07   ` Roger Pau Monné
2024-05-29 14:32 ` [PATCH v3 2/6] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area Alejandro Vallejo
2024-05-30 10:14   ` Roger Pau Monné
2024-05-30 11:08   ` Andrew Cooper
2024-05-30 13:48     ` Alejandro Vallejo
2024-05-30 15:47       ` Roger Pau Monné
2024-05-31 10:31     ` Roger Pau Monné
2024-05-29 14:32 ` [PATCH v3 3/6] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves Alejandro Vallejo
2024-05-29 16:16   ` Alejandro Vallejo
2024-05-29 16:42     ` Alejandro Vallejo
2024-05-29 14:32 ` [PATCH v3 4/6] xen/lib: Add topology generator for x86 Alejandro Vallejo
2024-05-29 14:32 ` [PATCH v3 5/6] xen/x86: Derive topologically correct x2APIC IDs from the policy Alejandro Vallejo
2024-05-29 14:32 ` [PATCH v3 6/6] xen/x86: Synthesise domain topologies Alejandro Vallejo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.