* [PATCH v2 0/8] x86: Expose consistent topology to guests
@ 2024-05-08 12:39 Alejandro Vallejo
2024-05-08 12:39 ` [PATCH v2 1/8] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area Alejandro Vallejo
` (7 more replies)
0 siblings, 8 replies; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-08 12:39 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD, Juergen Gross
v1 -> v2:
* v1/patch 4 replaced by a different strategy (See patches 4 and 5 in v2):
* Have hvmloader populate MADT with the real APIC IDs as read by the APs
themselves rather than giving it knowledge on how to derive them.
* Removed patches 2 and 3 in v1, as no longer relevant.
* Split v1/patch6 in two parts ((a) creating the generator and (b) plugging it
in) and use the generator in the unit tests of the vcpuid->apicid mapping
function. Becomes patches 6 and 8 in v2.
Patch 1: Same as v1/patch1.
Patch 2: Header dependency cleanup in preparation for patch3.
Patch 3: Adds vlapic_hidden check for the newly introduced reserved area.
Patch 4: [hvmloader] Replaces INIT+SIPI+SIPI sequences with hypercalls.
Patch 5: [hvmloader] Retrieve the per-CPU APIC IDs from the APs themselves.
Patch 6: Split from v1/patch6.
Patch 7: Logically matching v1/patch5, but using v2/patch6 for testing.
Patch 8: Split from v1/patch6.
=== Original cover letter ===
Current topology handling is close to non-existent. As things stand, APIC
IDs are allocated through the apic_id=vcpu_id*2 relation without giving any
hints to the OS on how to parse the x2APIC ID of a given CPU and assuming
the guest will assume 2 threads per core.
This series involves bringing x2APIC IDs into the migration stream, so
older guests keep operating as they used to and enhancing Xen+toolstack so
new guests get topology information consistent with their x2APIC IDs. As a
side effect of this, x2APIC IDs are now packed and don't have (unless under
a pathological case) gaps.
Further work ought to allow combining this topology configurations with
gang-scheduling of guest hyperthreads into affine physical hyperthreads.
For the time being it purposefully keeps the configuration of "1 socket" +
"1 thread per core" + "1 core per vCPU".
Patch 1: Includes x2APIC IDs in the migration stream. This allows Xen to
reconstruct the right x2APIC IDs on migrated-in guests, and
future-proofs itself in the face of x2APIC ID derivation changes.
Patch 2: Minor refactor to expose xc_cpu_policy in libxl
Patch 3: Refactors xen/lib/x86 to work on non-Xen freestanding environments
(e.g: hvmloader)
Patch 4: Remove old assumptions about vcpu_id<->apic_id relationship in hvmloader
Patch 5: Add logic to derive x2APIC IDs given a CPU policy and vCPU IDs
Patch 6: Includes a simple topology generator for toolstack so new guests
have topologically consistent information in CPUID
Alejandro Vallejo (8):
xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
xen/x86: Simplify header dependencies in x86/hvm
x86/vlapic: Move lapic_load_hidden migration checks to the check hook
tools/hvmloader: Wake APs with hypercalls and not with INIT+SIPI+SIPI
tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves
xen/lib: Add topology generator for x86
xen/x86: Derive topologically correct x2APIC IDs from the policy
xen/x86: Synthesise domain topologies
tools/firmware/hvmloader/config.h | 6 +-
tools/firmware/hvmloader/hvmloader.c | 4 +-
tools/firmware/hvmloader/smp.c | 151 +++++++++---------
tools/firmware/hvmloader/util.h | 5 +
tools/libs/guest/xg_cpuid_x86.c | 62 ++------
tools/tests/cpu-policy/test-cpu-policy.c | 191 +++++++++++++++++++++++
xen/arch/x86/cpu-policy.c | 9 +-
xen/arch/x86/cpuid.c | 15 +-
xen/arch/x86/hvm/irq.c | 6 +-
xen/arch/x86/hvm/vlapic.c | 75 +++++++--
xen/arch/x86/include/asm/hvm/hvm.h | 8 +
xen/arch/x86/include/asm/hvm/vlapic.h | 8 +-
xen/arch/x86/include/asm/hvm/vpt.h | 1 -
xen/include/public/arch-x86/hvm/save.h | 2 +
xen/include/xen/lib/x86/cpu-policy.h | 27 ++++
xen/lib/x86/policy.c | 160 +++++++++++++++++++
16 files changed, 562 insertions(+), 168 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 1/8] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
2024-05-08 12:39 [PATCH v2 0/8] x86: Expose consistent topology to guests Alejandro Vallejo
@ 2024-05-08 12:39 ` Alejandro Vallejo
2024-05-23 14:32 ` Roger Pau Monné
2024-05-08 12:39 ` [PATCH v2 2/8] xen/x86: Simplify header dependencies in x86/hvm Alejandro Vallejo
` (6 subsequent siblings)
7 siblings, 1 reply; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-08 12:39 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
Roger Pau Monné
This allows the initial x2APIC ID to be sent on the migration stream. The
hardcoded mapping x2apic_id=2*vcpu_id is maintained for the time being.
Given the vlapic data is zero-extended on restore, fix up migrations from
hosts without the field by setting it to the old convention if zero.
x2APIC IDs are calculated from the CPU policy where the guest topology is
defined. For the time being, the function simply returns the old
relationship, but will eventually return results consistent with the
topology.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v2:
* Removed usage of SET_xAPIC_ID().
* Restored previous logic when exposing leaf 0xb, and gate it for HVM only.
* Rewrote comment in lapic_load_fixup, including the implicit assumption.
* Moved vlapic_cpu_policy_changed() into hvm_cpuid_policy_changed())
* const-ified policy in vlapic_cpu_policy_changed()
---
xen/arch/x86/cpuid.c | 15 ++++---------
xen/arch/x86/hvm/vlapic.c | 30 ++++++++++++++++++++++++--
xen/arch/x86/include/asm/hvm/hvm.h | 1 +
xen/arch/x86/include/asm/hvm/vlapic.h | 2 ++
xen/include/public/arch-x86/hvm/save.h | 2 ++
xen/include/xen/lib/x86/cpu-policy.h | 9 ++++++++
xen/lib/x86/policy.c | 11 ++++++++++
7 files changed, 57 insertions(+), 13 deletions(-)
diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 7a38e032146a..242c21ec5bb6 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -139,10 +139,9 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
const struct cpu_user_regs *regs;
case 0x1:
- /* TODO: Rework topology logic. */
res->b &= 0x00ffffffu;
if ( is_hvm_domain(d) )
- res->b |= (v->vcpu_id * 2) << 24;
+ res->b |= vlapic_x2apic_id(vcpu_vlapic(v)) << 24;
/* TODO: Rework vPMU control in terms of toolstack choices. */
if ( vpmu_available(v) &&
@@ -311,19 +310,13 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
break;
case 0xb:
- /*
- * In principle, this leaf is Intel-only. In practice, it is tightly
- * coupled with x2apic, and we offer an x2apic-capable APIC emulation
- * to guests on AMD hardware as well.
- *
- * TODO: Rework topology logic.
- */
- if ( p->basic.x2apic )
+ /* Don't expose topology information to PV guests */
+ if ( is_hvm_domain(d) && p->basic.x2apic )
{
*(uint8_t *)&res->c = subleaf;
/* Fix the x2APIC identifier. */
- res->d = v->vcpu_id * 2;
+ res->d = vlapic_x2apic_id(vcpu_vlapic(v));
}
break;
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 05072a21bf38..61a96474006b 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1069,7 +1069,7 @@ static uint32_t x2apic_ldr_from_id(uint32_t id)
static void set_x2apic_id(struct vlapic *vlapic)
{
const struct vcpu *v = vlapic_vcpu(vlapic);
- uint32_t apic_id = v->vcpu_id * 2;
+ uint32_t apic_id = vlapic->hw.x2apic_id;
uint32_t apic_ldr = x2apic_ldr_from_id(apic_id);
/*
@@ -1083,6 +1083,22 @@ static void set_x2apic_id(struct vlapic *vlapic)
vlapic_set_reg(vlapic, APIC_LDR, apic_ldr);
}
+void vlapic_cpu_policy_changed(struct vcpu *v)
+{
+ struct vlapic *vlapic = vcpu_vlapic(v);
+ const struct cpu_policy *cp = v->domain->arch.cpu_policy;
+
+ /*
+ * Don't override the initial x2APIC ID if we have migrated it or
+ * if the domain doesn't have vLAPIC at all.
+ */
+ if ( !has_vlapic(v->domain) || vlapic->loaded.hw )
+ return;
+
+ vlapic->hw.x2apic_id = x86_x2apic_id_from_vcpu_id(cp, v->vcpu_id);
+ vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
+}
+
int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
{
const struct cpu_policy *cp = v->domain->arch.cpu_policy;
@@ -1449,7 +1465,7 @@ void vlapic_reset(struct vlapic *vlapic)
if ( v->vcpu_id == 0 )
vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
- vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
+ vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
vlapic_do_init(vlapic);
}
@@ -1514,6 +1530,16 @@ static void lapic_load_fixup(struct vlapic *vlapic)
const struct vcpu *v = vlapic_vcpu(vlapic);
uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
+ /*
+ * Loading record without hw.x2apic_id in the save stream, calculate using
+ * the traditional "vcpu_id * 2" relation. There's an implicit assumption
+ * that vCPU0 always has x2APIC0, which is true for the old relation, and
+ * still holds under the new x2APIC generation algorithm. While that case
+ * goes through the conditional it's benign because it still maps to zero.
+ */
+ if ( !vlapic->hw.x2apic_id )
+ vlapic->hw.x2apic_id = v->vcpu_id * 2;
+
/* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */
if ( !vlapic_x2apic_mode(vlapic) ||
(vlapic->loaded.ldr == good_ldr) )
diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
index 0c9e6f15645d..e1f0585d75a9 100644
--- a/xen/arch/x86/include/asm/hvm/hvm.h
+++ b/xen/arch/x86/include/asm/hvm/hvm.h
@@ -448,6 +448,7 @@ static inline void hvm_update_guest_efer(struct vcpu *v)
static inline void hvm_cpuid_policy_changed(struct vcpu *v)
{
alternative_vcall(hvm_funcs.cpuid_policy_changed, v);
+ vlapic_cpu_policy_changed(v);
}
static inline void hvm_set_tsc_offset(struct vcpu *v, uint64_t offset,
diff --git a/xen/arch/x86/include/asm/hvm/vlapic.h b/xen/arch/x86/include/asm/hvm/vlapic.h
index 88ef94524339..e8d41313abd3 100644
--- a/xen/arch/x86/include/asm/hvm/vlapic.h
+++ b/xen/arch/x86/include/asm/hvm/vlapic.h
@@ -44,6 +44,7 @@
#define vlapic_xapic_mode(vlapic) \
(!vlapic_hw_disabled(vlapic) && \
!((vlapic)->hw.apic_base_msr & APIC_BASE_EXTD))
+#define vlapic_x2apic_id(vlapic) ((vlapic)->hw.x2apic_id)
/*
* Generic APIC bitmap vector update & search routines.
@@ -107,6 +108,7 @@ int vlapic_ack_pending_irq(struct vcpu *v, int vector, bool force_ack);
int vlapic_init(struct vcpu *v);
void vlapic_destroy(struct vcpu *v);
+void vlapic_cpu_policy_changed(struct vcpu *v);
void vlapic_reset(struct vlapic *vlapic);
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 7ecacadde165..1c2ec669ffc9 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -394,6 +394,8 @@ struct hvm_hw_lapic {
uint32_t disabled; /* VLAPIC_xx_DISABLED */
uint32_t timer_divisor;
uint64_t tdt_msr;
+ uint32_t x2apic_id;
+ uint32_t rsvd_zero;
};
DECLARE_HVM_SAVE_TYPE(LAPIC, 5, struct hvm_hw_lapic);
diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
index d5e447e9dc06..392320b9adbe 100644
--- a/xen/include/xen/lib/x86/cpu-policy.h
+++ b/xen/include/xen/lib/x86/cpu-policy.h
@@ -542,6 +542,15 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
const struct cpu_policy *guest,
struct cpu_policy_errors *err);
+/**
+ * Calculates the x2APIC ID of a vCPU given a CPU policy
+ *
+ * @param p CPU policy of the domain.
+ * @param id vCPU ID of the vCPU.
+ * @returns x2APIC ID of the vCPU.
+ */
+uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id);
+
#endif /* !XEN_LIB_X86_POLICIES_H */
/*
diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
index f033d22785be..4cef658feeb8 100644
--- a/xen/lib/x86/policy.c
+++ b/xen/lib/x86/policy.c
@@ -2,6 +2,17 @@
#include <xen/lib/x86/cpu-policy.h>
+uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
+{
+ /*
+ * TODO: Derive x2APIC ID from the topology information inside `p`
+ * rather than from vCPU ID. This bodge is a temporary measure
+ * until all infra is in place to retrieve or derive the initial
+ * x2APIC ID from migrated domains.
+ */
+ return vcpu_id * 2;
+}
+
int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
const struct cpu_policy *guest,
struct cpu_policy_errors *err)
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 2/8] xen/x86: Simplify header dependencies in x86/hvm
2024-05-08 12:39 [PATCH v2 0/8] x86: Expose consistent topology to guests Alejandro Vallejo
2024-05-08 12:39 ` [PATCH v2 1/8] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area Alejandro Vallejo
@ 2024-05-08 12:39 ` Alejandro Vallejo
2024-05-22 9:33 ` Jan Beulich
2024-05-23 14:37 ` Roger Pau Monné
2024-05-08 12:39 ` [PATCH v2 3/8] x86/vlapic: Move lapic_load_hidden migration checks to the check hook Alejandro Vallejo
` (5 subsequent siblings)
7 siblings, 2 replies; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-08 12:39 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
Roger Pau Monné
Otherwise it's not possible to call functions described in hvm/vlapic.h from the
inline functions of hvm/hvm.h.
This is because a static inline in vlapic.h depends on hvm.h, and pulls it
transitively through vpt.h. The ultimate cause is having hvm.h included in any
of the "v*.h" headers, so break the cycle moving the guilty inline into hvm.h.
No functional change.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v2:
* New patch. Prereq to moving vlapic_cpu_policy_changed() onto hvm.h
---
xen/arch/x86/hvm/irq.c | 6 +++---
xen/arch/x86/hvm/vlapic.c | 4 ++--
xen/arch/x86/include/asm/hvm/hvm.h | 6 ++++++
xen/arch/x86/include/asm/hvm/vlapic.h | 6 ------
xen/arch/x86/include/asm/hvm/vpt.h | 1 -
5 files changed, 11 insertions(+), 12 deletions(-)
diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index 4a9fe82cbd8d..4f5479b12c98 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -512,13 +512,13 @@ struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v)
int vector;
/*
- * Always call vlapic_sync_pir_to_irr so that PIR is synced into IRR when
- * using posted interrupts. Note this is also done by
+ * Always call hvm_vlapic_sync_pir_to_irr so that PIR is synced into IRR
+ * when using posted interrupts. Note this is also done by
* vlapic_has_pending_irq but depending on which interrupts are pending
* hvm_vcpu_has_pending_irq will return early without calling
* vlapic_has_pending_irq.
*/
- vlapic_sync_pir_to_irr(v);
+ hvm_vlapic_sync_pir_to_irr(v);
if ( unlikely(v->arch.nmi_pending) )
return hvm_intack_nmi;
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 61a96474006b..8a244100009c 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -98,7 +98,7 @@ static void vlapic_clear_irr(int vector, struct vlapic *vlapic)
static int vlapic_find_highest_irr(struct vlapic *vlapic)
{
- vlapic_sync_pir_to_irr(vlapic_vcpu(vlapic));
+ hvm_vlapic_sync_pir_to_irr(vlapic_vcpu(vlapic));
return vlapic_find_highest_vector(&vlapic->regs->data[APIC_IRR]);
}
@@ -1516,7 +1516,7 @@ static int cf_check lapic_save_regs(struct vcpu *v, hvm_domain_context_t *h)
if ( !has_vlapic(v->domain) )
return 0;
- vlapic_sync_pir_to_irr(v);
+ hvm_vlapic_sync_pir_to_irr(v);
return hvm_save_entry(LAPIC_REGS, v->vcpu_id, h, vcpu_vlapic(v)->regs);
}
diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
index e1f0585d75a9..84911f3ebcb4 100644
--- a/xen/arch/x86/include/asm/hvm/hvm.h
+++ b/xen/arch/x86/include/asm/hvm/hvm.h
@@ -798,6 +798,12 @@ static inline void hvm_update_vlapic_mode(struct vcpu *v)
alternative_vcall(hvm_funcs.update_vlapic_mode, v);
}
+static inline void hvm_vlapic_sync_pir_to_irr(struct vcpu *v)
+{
+ if ( hvm_funcs.sync_pir_to_irr )
+ alternative_vcall(hvm_funcs.sync_pir_to_irr, v);
+}
+
#else /* CONFIG_HVM */
#define hvm_enabled false
diff --git a/xen/arch/x86/include/asm/hvm/vlapic.h b/xen/arch/x86/include/asm/hvm/vlapic.h
index e8d41313abd3..34f23cd38a20 100644
--- a/xen/arch/x86/include/asm/hvm/vlapic.h
+++ b/xen/arch/x86/include/asm/hvm/vlapic.h
@@ -139,10 +139,4 @@ bool vlapic_match_dest(
const struct vlapic *target, const struct vlapic *source,
int short_hand, uint32_t dest, bool dest_mode);
-static inline void vlapic_sync_pir_to_irr(struct vcpu *v)
-{
- if ( hvm_funcs.sync_pir_to_irr )
- alternative_vcall(hvm_funcs.sync_pir_to_irr, v);
-}
-
#endif /* __ASM_X86_HVM_VLAPIC_H__ */
diff --git a/xen/arch/x86/include/asm/hvm/vpt.h b/xen/arch/x86/include/asm/hvm/vpt.h
index feb0bf43f14b..0b92b286252d 100644
--- a/xen/arch/x86/include/asm/hvm/vpt.h
+++ b/xen/arch/x86/include/asm/hvm/vpt.h
@@ -11,7 +11,6 @@
#include <xen/timer.h>
#include <xen/list.h>
#include <xen/rwlock.h>
-#include <asm/hvm/hvm.h>
/*
* Abstract layer of periodic time, one short time.
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 3/8] x86/vlapic: Move lapic_load_hidden migration checks to the check hook
2024-05-08 12:39 [PATCH v2 0/8] x86: Expose consistent topology to guests Alejandro Vallejo
2024-05-08 12:39 ` [PATCH v2 1/8] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area Alejandro Vallejo
2024-05-08 12:39 ` [PATCH v2 2/8] xen/x86: Simplify header dependencies in x86/hvm Alejandro Vallejo
@ 2024-05-08 12:39 ` Alejandro Vallejo
2024-05-23 14:50 ` Roger Pau Monné
2024-05-23 18:58 ` Andrew Cooper
2024-05-08 12:39 ` [PATCH v2 4/8] tools/hvmloader: Wake APs with hypercalls and not with INIT+SIPI+SIPI Alejandro Vallejo
` (4 subsequent siblings)
7 siblings, 2 replies; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-08 12:39 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
Roger Pau Monné
While at it, add a check for the reserved field in the hidden save area.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v2:
* New patch. Addresses the missing check for rsvd_zero in v1.
---
xen/arch/x86/hvm/vlapic.c | 41 ++++++++++++++++++++++++++++-----------
1 file changed, 30 insertions(+), 11 deletions(-)
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 8a244100009c..2f06bff1b2cc 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1573,35 +1573,54 @@ static void lapic_load_fixup(struct vlapic *vlapic)
v, vlapic->loaded.id, vlapic->loaded.ldr, good_ldr);
}
-static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
+static int cf_check lapic_check_hidden(const struct domain *d,
+ hvm_domain_context_t *h)
{
unsigned int vcpuid = hvm_load_instance(h);
- struct vcpu *v;
- struct vlapic *s;
+ struct hvm_hw_lapic s;
if ( !has_vlapic(d) )
return -ENODEV;
/* Which vlapic to load? */
- if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
+ if ( vcpuid >= d->max_vcpus || d->vcpu[vcpuid] == NULL )
{
dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no apic%u\n",
d->domain_id, vcpuid);
return -EINVAL;
}
- s = vcpu_vlapic(v);
- if ( hvm_load_entry_zeroextend(LAPIC, h, &s->hw) != 0 )
+ if ( hvm_load_entry_zeroextend(LAPIC, h, &s) )
+ return -ENODATA;
+
+ /* EN=0 with EXTD=1 is illegal */
+ if ( (s.apic_base_msr & (APIC_BASE_ENABLE | APIC_BASE_EXTD)) ==
+ APIC_BASE_EXTD )
+ return -EINVAL;
+
+ /*
+ * Fail migrations from newer versions of Xen where
+ * rsvd_zero is interpreted as something else.
+ */
+ if ( s.rsvd_zero )
return -EINVAL;
+ return 0;
+}
+
+static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
+{
+ unsigned int vcpuid = hvm_load_instance(h);
+ struct vcpu *v = d->vcpu[vcpuid];
+ struct vlapic *s = vcpu_vlapic(v);
+
+ if ( hvm_load_entry_zeroextend(LAPIC, h, &s->hw) != 0 )
+ BUG();
+
s->loaded.hw = 1;
if ( s->loaded.regs )
lapic_load_fixup(s);
- if ( !(s->hw.apic_base_msr & APIC_BASE_ENABLE) &&
- unlikely(vlapic_x2apic_mode(s)) )
- return -EINVAL;
-
hvm_update_vlapic_mode(v);
return 0;
@@ -1643,7 +1662,7 @@ static int cf_check lapic_load_regs(struct domain *d, hvm_domain_context_t *h)
return 0;
}
-HVM_REGISTER_SAVE_RESTORE(LAPIC, lapic_save_hidden, NULL,
+HVM_REGISTER_SAVE_RESTORE(LAPIC, lapic_save_hidden, lapic_check_hidden,
lapic_load_hidden, 1, HVMSR_PER_VCPU);
HVM_REGISTER_SAVE_RESTORE(LAPIC_REGS, lapic_save_regs, NULL,
lapic_load_regs, 1, HVMSR_PER_VCPU);
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 4/8] tools/hvmloader: Wake APs with hypercalls and not with INIT+SIPI+SIPI
2024-05-08 12:39 [PATCH v2 0/8] x86: Expose consistent topology to guests Alejandro Vallejo
` (2 preceding siblings ...)
2024-05-08 12:39 ` [PATCH v2 3/8] x86/vlapic: Move lapic_load_hidden migration checks to the check hook Alejandro Vallejo
@ 2024-05-08 12:39 ` Alejandro Vallejo
2024-05-08 19:13 ` Andrew Cooper
2024-05-09 17:50 ` [PATCH 4.5/8] tools/hvmloader: Further simplify SMP setup Andrew Cooper
2024-05-08 12:39 ` [PATCH v2 5/8] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves Alejandro Vallejo
` (3 subsequent siblings)
7 siblings, 2 replies; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-08 12:39 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD
Removes a needless assembly entry point and simplifies the codebase by allowing
hvmloader to wake APs it doesn't know the APIC ID of.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v2:
* New patch. Replaces adding cpu policy to hvmloader in v1.
---
tools/firmware/hvmloader/smp.c | 111 +++++++++++++--------------------
1 file changed, 44 insertions(+), 67 deletions(-)
diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c
index 082b17f13818..a668f15d7e1f 100644
--- a/tools/firmware/hvmloader/smp.c
+++ b/tools/firmware/hvmloader/smp.c
@@ -22,88 +22,68 @@
#include "util.h"
#include "config.h"
#include "apic_regs.h"
+#include "hypercall.h"
-#define AP_BOOT_EIP 0x1000
-extern char ap_boot_start[], ap_boot_end[];
+#include <xen/asm/x86-defns.h>
+#include <xen/hvm/hvm_vcpu.h>
+
+#include <xen/vcpu.h>
static int ap_callin, ap_cpuid;
-asm (
- " .text \n"
- " .code16 \n"
- "ap_boot_start: .code16 \n"
- " mov %cs,%ax \n"
- " mov %ax,%ds \n"
- " lgdt gdt_desr-ap_boot_start\n"
- " xor %ax, %ax \n"
- " inc %ax \n"
- " lmsw %ax \n"
- " ljmpl $0x08,$1f \n"
- "gdt_desr: \n"
- " .word gdt_end - gdt - 1 \n"
- " .long gdt \n"
- "ap_boot_end: .code32 \n"
- "1: mov $0x10,%eax \n"
- " mov %eax,%ds \n"
- " mov %eax,%es \n"
- " mov %eax,%ss \n"
- " movl $stack_top,%esp \n"
- " movl %esp,%ebp \n"
- " call ap_start \n"
- "1: hlt \n"
- " jmp 1b \n"
- " \n"
- " .align 8 \n"
- "gdt: \n"
- " .quad 0x0000000000000000 \n"
- " .quad 0x00cf9a000000ffff \n" /* 0x08: Flat code segment */
- " .quad 0x00cf92000000ffff \n" /* 0x10: Flat data segment */
- "gdt_end: \n"
- " \n"
- " .bss \n"
- " .align 8 \n"
- "stack: \n"
- " .skip 0x4000 \n"
- "stack_top: \n"
- " .text \n"
- );
-
-void ap_start(void); /* non-static avoids unused-function compiler warning */
-/*static*/ void ap_start(void)
+static void ap_start(void)
{
printf(" - CPU%d ... ", ap_cpuid);
cacheattr_init();
printf("done.\n");
+
+ if ( !ap_cpuid )
+ return;
+
wmb();
ap_callin = 1;
-}
-static void lapic_wait_ready(void)
-{
- while ( lapic_read(APIC_ICR) & APIC_ICR_BUSY )
- cpu_relax();
+ while ( 1 )
+ asm volatile ( "hlt" );
}
static void boot_cpu(unsigned int cpu)
{
- unsigned int icr2 = SET_APIC_DEST_FIELD(LAPIC_ID(cpu));
+ static uint8_t ap_stack[4 * PAGE_SIZE] __attribute__ ((aligned (16)));
+ static struct vcpu_hvm_context ap;
/* Initialise shared variables. */
ap_cpuid = cpu;
- ap_callin = 0;
wmb();
- /* Wake up the secondary processor: INIT-SIPI-SIPI... */
- lapic_wait_ready();
- lapic_write(APIC_ICR2, icr2);
- lapic_write(APIC_ICR, APIC_DM_INIT);
- lapic_wait_ready();
- lapic_write(APIC_ICR2, icr2);
- lapic_write(APIC_ICR, APIC_DM_STARTUP | (AP_BOOT_EIP >> 12));
- lapic_wait_ready();
- lapic_write(APIC_ICR2, icr2);
- lapic_write(APIC_ICR, APIC_DM_STARTUP | (AP_BOOT_EIP >> 12));
- lapic_wait_ready();
+ /* Wake up the secondary processor */
+ ap = (struct vcpu_hvm_context) {
+ .mode = VCPU_HVM_MODE_32B,
+ .cpu_regs.x86_32 = {
+ .eip = (uint32_t)ap_start,
+ .esp = (uint32_t)ap_stack + ARRAY_SIZE(ap_stack),
+
+ /* Protected mode with MMU off */
+ .cr0 = X86_CR0_PE,
+
+ /* Prepopulate the GDT */
+ .cs_limit = -1U,
+ .ds_limit = -1U,
+ .ss_limit = -1U,
+ .es_limit = -1U,
+ .tr_limit = 0x67,
+ .cs_ar = 0xc9b,
+ .ds_ar = 0xc93,
+ .es_ar = 0xc93,
+ .ss_ar = 0xc93,
+ .tr_ar = 0x8b,
+ },
+ };
+
+ if ( hypercall_vcpu_op(VCPUOP_initialise, cpu, &ap) )
+ BUG();
+ if ( hypercall_vcpu_op(VCPUOP_up, cpu, NULL) )
+ BUG();
/*
* Wait for the secondary processor to complete initialisation.
@@ -113,17 +93,14 @@ static void boot_cpu(unsigned int cpu)
cpu_relax();
/* Take the secondary processor offline. */
- lapic_write(APIC_ICR2, icr2);
- lapic_write(APIC_ICR, APIC_DM_INIT);
- lapic_wait_ready();
+ if ( hypercall_vcpu_op(VCPUOP_down, cpu, NULL) )
+ BUG();
}
void smp_initialise(void)
{
unsigned int i, nr_cpus = hvm_info->nr_vcpus;
- memcpy((void *)AP_BOOT_EIP, ap_boot_start, ap_boot_end - ap_boot_start);
-
printf("Multiprocessor initialisation:\n");
ap_start();
for ( i = 1; i < nr_cpus; i++ )
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 5/8] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves
2024-05-08 12:39 [PATCH v2 0/8] x86: Expose consistent topology to guests Alejandro Vallejo
` (3 preceding siblings ...)
2024-05-08 12:39 ` [PATCH v2 4/8] tools/hvmloader: Wake APs with hypercalls and not with INIT+SIPI+SIPI Alejandro Vallejo
@ 2024-05-08 12:39 ` Alejandro Vallejo
2024-05-23 16:13 ` Roger Pau Monné
2024-05-24 7:21 ` Roger Pau Monné
2024-05-08 12:39 ` [PATCH v2 6/8] xen/lib: Add topology generator for x86 Alejandro Vallejo
` (2 subsequent siblings)
7 siblings, 2 replies; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-08 12:39 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD
Make it so the APs expose their own APIC IDs in a LUT. We can use that LUT to
populate the MADT, decoupling the algorithm that relates CPU IDs and APIC IDs
from hvmloader.
While at this also remove ap_callin, as writing the APIC ID may serve the same
purpose.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v2:
* New patch. Replaces adding cpu policy to hvmloader in v1.
---
tools/firmware/hvmloader/config.h | 6 ++++-
tools/firmware/hvmloader/hvmloader.c | 4 +--
tools/firmware/hvmloader/smp.c | 40 +++++++++++++++++++++++-----
tools/firmware/hvmloader/util.h | 5 ++++
xen/arch/x86/include/asm/hvm/hvm.h | 1 +
5 files changed, 47 insertions(+), 9 deletions(-)
diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
index c82adf6dc508..edf6fa9c908c 100644
--- a/tools/firmware/hvmloader/config.h
+++ b/tools/firmware/hvmloader/config.h
@@ -4,6 +4,8 @@
#include <stdint.h>
#include <stdbool.h>
+#include <xen/hvm/hvm_info_table.h>
+
enum virtual_vga { VGA_none, VGA_std, VGA_cirrus, VGA_pt };
extern enum virtual_vga virtual_vga;
@@ -49,8 +51,10 @@ extern uint8_t ioapic_version;
#define IOAPIC_ID 0x01
+extern uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
+
#define LAPIC_BASE_ADDRESS 0xfee00000
-#define LAPIC_ID(vcpu_id) ((vcpu_id) * 2)
+#define LAPIC_ID(vcpu_id) (CPU_TO_X2APICID[(vcpu_id)])
#define PCI_ISA_DEVFN 0x08 /* dev 1, fn 0 */
#define PCI_ISA_IRQ_MASK 0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index c58841e5b556..1eba92229925 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -342,11 +342,11 @@ int main(void)
printf("CPU speed is %u MHz\n", get_cpu_mhz());
+ smp_initialise();
+
apic_setup();
pci_setup();
- smp_initialise();
-
perform_tests();
if ( bios->bios_info_setup )
diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c
index a668f15d7e1f..4d75f239c2f5 100644
--- a/tools/firmware/hvmloader/smp.c
+++ b/tools/firmware/hvmloader/smp.c
@@ -29,7 +29,34 @@
#include <xen/vcpu.h>
-static int ap_callin, ap_cpuid;
+static int ap_cpuid;
+
+/**
+ * Lookup table of x2APIC IDs.
+ *
+ * Each entry is populated its respective CPU as they come online. This is required
+ * for generating the MADT with minimal assumptions about ID relationships.
+ */
+uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
+
+static uint32_t read_apic_id(void)
+{
+ uint32_t apic_id;
+
+ cpuid(1, NULL, &apic_id, NULL, NULL);
+ apic_id >>= 24;
+
+ /*
+ * APIC IDs over 255 are represented by 255 in leaf 1 and are meant to be
+ * read from topology leaves instead. Xen exposes x2APIC IDs in leaf 0xb,
+ * but only if the x2APIC feature is present. If there are that many CPUs
+ * it's guaranteed to be there so we can avoid checking for it specifically
+ */
+ if ( apic_id == 255 )
+ cpuid(0xb, NULL, NULL, NULL, &apic_id);
+
+ return apic_id;
+}
static void ap_start(void)
{
@@ -37,12 +64,12 @@ static void ap_start(void)
cacheattr_init();
printf("done.\n");
+ wmb();
+ ACCESS_ONCE(CPU_TO_X2APICID[ap_cpuid]) = read_apic_id();
+
if ( !ap_cpuid )
return;
- wmb();
- ap_callin = 1;
-
while ( 1 )
asm volatile ( "hlt" );
}
@@ -86,10 +113,11 @@ static void boot_cpu(unsigned int cpu)
BUG();
/*
- * Wait for the secondary processor to complete initialisation.
+ * Wait for the secondary processor to complete initialisation,
+ * which is signaled by its x2APIC ID being writted to the LUT.
* Do not touch shared resources meanwhile.
*/
- while ( !ap_callin )
+ while ( !ACCESS_ONCE(CPU_TO_X2APICID[cpu]) )
cpu_relax();
/* Take the secondary processor offline. */
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 14078bde1e30..51e9003bc615 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -23,6 +23,11 @@ enum {
#define __STR(...) #__VA_ARGS__
#define STR(...) __STR(__VA_ARGS__)
+#define __ACCESS_ONCE(x) ({ \
+ (void)(typeof(x))0; /* Scalar typecheck. */ \
+ (volatile typeof(x) *)&(x); })
+#define ACCESS_ONCE(x) (*__ACCESS_ONCE(x))
+
/* GDT selector values. */
#define SEL_CODE16 0x0008
#define SEL_DATA16 0x0010
diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
index 84911f3ebcb4..6c005f0b0b38 100644
--- a/xen/arch/x86/include/asm/hvm/hvm.h
+++ b/xen/arch/x86/include/asm/hvm/hvm.h
@@ -16,6 +16,7 @@
#include <asm/current.h>
#include <asm/x86_emulate.h>
#include <asm/hvm/asid.h>
+#include <asm/hvm/vlapic.h>
struct pirq; /* needed by pi_update_irte */
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 6/8] xen/lib: Add topology generator for x86
2024-05-08 12:39 [PATCH v2 0/8] x86: Expose consistent topology to guests Alejandro Vallejo
` (4 preceding siblings ...)
2024-05-08 12:39 ` [PATCH v2 5/8] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves Alejandro Vallejo
@ 2024-05-08 12:39 ` Alejandro Vallejo
2024-05-23 16:50 ` Roger Pau Monné
2024-05-08 12:39 ` [PATCH v2 7/8] xen/x86: Derive topologically correct x2APIC IDs from the policy Alejandro Vallejo
2024-05-08 12:39 ` [PATCH v2 8/8] xen/x86: Synthesise domain topologies Alejandro Vallejo
7 siblings, 1 reply; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-08 12:39 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD
Add a helper to populate topology leaves in the cpu policy from
threads/core and cores/package counts.
No functional change, as it's not connected to anything yet.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v2:
* New patch. Extracted from v1/patch6
---
tools/tests/cpu-policy/test-cpu-policy.c | 128 +++++++++++++++++++++++
xen/include/xen/lib/x86/cpu-policy.h | 16 +++
xen/lib/x86/policy.c | 86 +++++++++++++++
3 files changed, 230 insertions(+)
diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c
index 301df2c00285..0ba8c418b1b3 100644
--- a/tools/tests/cpu-policy/test-cpu-policy.c
+++ b/tools/tests/cpu-policy/test-cpu-policy.c
@@ -650,6 +650,132 @@ static void test_is_compatible_failure(void)
}
}
+static void test_topo_from_parts(void)
+{
+ static const struct test {
+ unsigned int threads_per_core;
+ unsigned int cores_per_pkg;
+ struct cpu_policy policy;
+ } tests[] = {
+ {
+ .threads_per_core = 3, .cores_per_pkg = 1,
+ .policy = {
+ .x86_vendor = X86_VENDOR_AMD,
+ .topo.subleaf = {
+ [0] = { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
+ [1] = { .nr_logical = 1, .level = 1, .type = 2, .id_shift = 2, },
+ },
+ },
+ },
+ {
+ .threads_per_core = 1, .cores_per_pkg = 3,
+ .policy = {
+ .x86_vendor = X86_VENDOR_AMD,
+ .topo.subleaf = {
+ [0] = { .nr_logical = 1, .level = 0, .type = 1, .id_shift = 0, },
+ [1] = { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, },
+ },
+ },
+ },
+ {
+ .threads_per_core = 7, .cores_per_pkg = 5,
+ .policy = {
+ .x86_vendor = X86_VENDOR_AMD,
+ .topo.subleaf = {
+ [0] = { .nr_logical = 7, .level = 0, .type = 1, .id_shift = 3, },
+ [1] = { .nr_logical = 5, .level = 1, .type = 2, .id_shift = 6, },
+ },
+ },
+ },
+ {
+ .threads_per_core = 2, .cores_per_pkg = 128,
+ .policy = {
+ .x86_vendor = X86_VENDOR_AMD,
+ .topo.subleaf = {
+ [0] = { .nr_logical = 2, .level = 0, .type = 1, .id_shift = 1, },
+ [1] = { .nr_logical = 128, .level = 1, .type = 2, .id_shift = 8, },
+ },
+ },
+ },
+ {
+ .threads_per_core = 3, .cores_per_pkg = 1,
+ .policy = {
+ .x86_vendor = X86_VENDOR_INTEL,
+ .topo.subleaf = {
+ [0] = { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
+ [1] = { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, },
+ },
+ },
+ },
+ {
+ .threads_per_core = 1, .cores_per_pkg = 3,
+ .policy = {
+ .x86_vendor = X86_VENDOR_INTEL,
+ .topo.subleaf = {
+ [0] = { .nr_logical = 1, .level = 0, .type = 1, .id_shift = 0, },
+ [1] = { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, },
+ },
+ },
+ },
+ {
+ .threads_per_core = 7, .cores_per_pkg = 5,
+ .policy = {
+ .x86_vendor = X86_VENDOR_INTEL,
+ .topo.subleaf = {
+ [0] = { .nr_logical = 7, .level = 0, .type = 1, .id_shift = 3, },
+ [1] = { .nr_logical = 35, .level = 1, .type = 2, .id_shift = 6, },
+ },
+ },
+ },
+ {
+ .threads_per_core = 2, .cores_per_pkg = 128,
+ .policy = {
+ .x86_vendor = X86_VENDOR_INTEL,
+ .topo.subleaf = {
+ [0] = { .nr_logical = 2, .level = 0, .type = 1, .id_shift = 1, },
+ [1] = { .nr_logical = 256, .level = 1, .type = 2, .id_shift = 8, },
+ },
+ },
+ },
+ };
+
+ printf("Testing topology synthesis from parts:\n");
+
+ for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i )
+ {
+ const struct test *t = &tests[i];
+ struct cpu_policy actual = { .x86_vendor = t->policy.x86_vendor };
+ int rc = x86_topo_from_parts(&actual, t->threads_per_core, t->cores_per_pkg);
+
+ if ( rc || memcmp(&actual.topo, &t->policy.topo, sizeof(actual.topo)) )
+ {
+#define TOPO(n) topo.subleaf[(n)]
+ fail("FAIL[%d] - '%s %u t/c, %u c/p'\n",
+ rc,
+ x86_cpuid_vendor_to_str(t->policy.x86_vendor),
+ t->threads_per_core, t->cores_per_pkg);
+ printf(" subleaf=%u expected_n=%u actual_n=%u\n"
+ " expected_lvl=%u actual_lvl=%u\n"
+ " expected_type=%u actual_type=%u\n"
+ " expected_shift=%u actual_shift=%u\n",
+ 0, t->policy.TOPO(0).nr_logical, actual.TOPO(0).nr_logical,
+ t->policy.TOPO(0).level, actual.TOPO(0).level,
+ t->policy.TOPO(0).type, actual.TOPO(0).type,
+ t->policy.TOPO(0).id_shift, actual.TOPO(0).id_shift);
+
+ printf(" subleaf=%u expected_n=%u actual_n=%u\n"
+ " expected_lvl=%u actual_lvl=%u\n"
+ " expected_type=%u actual_type=%u\n"
+ " expected_shift=%u actual_shift=%u\n",
+ 1, t->policy.TOPO(1).nr_logical, actual.TOPO(1).nr_logical,
+ t->policy.TOPO(1).level, actual.TOPO(1).level,
+ t->policy.TOPO(1).type, actual.TOPO(1).type,
+ t->policy.TOPO(1).id_shift, actual.TOPO(1).id_shift);
+#undef TOPO
+ }
+ }
+}
+
int main(int argc, char **argv)
{
printf("CPU Policy unit tests\n");
@@ -667,6 +793,8 @@ int main(int argc, char **argv)
test_is_compatible_success();
test_is_compatible_failure();
+ test_topo_from_parts();
+
if ( nr_failures )
printf("Done: %u failures\n", nr_failures);
else
diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
index 392320b9adbe..f5df18e9f77c 100644
--- a/xen/include/xen/lib/x86/cpu-policy.h
+++ b/xen/include/xen/lib/x86/cpu-policy.h
@@ -551,6 +551,22 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
*/
uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id);
+/**
+ * Synthesise topology information in `p` given high-level constraints
+ *
+ * Topology is given in various fields accross several leaves, some of
+ * which are vendor-specific. This function uses the policy itself to
+ * derive such leaves from threads/core and cores/package.
+ *
+ * @param p CPU policy of the domain.
+ * @param threads_per_core threads/core. Doesn't need to be a power of 2.
+ * @param cores_per_package cores/package. Doesn't need to be a power of 2.
+ * @return 0 on success; -errno on failure
+ */
+int x86_topo_from_parts(struct cpu_policy *p,
+ unsigned int threads_per_core,
+ unsigned int cores_per_pkg);
+
#endif /* !XEN_LIB_X86_POLICIES_H */
/*
diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
index 4cef658feeb8..d033ee5398dd 100644
--- a/xen/lib/x86/policy.c
+++ b/xen/lib/x86/policy.c
@@ -13,6 +13,92 @@ uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
return vcpu_id * 2;
}
+static unsigned int order(unsigned int n)
+{
+ return 8 * sizeof(n) - __builtin_clz(n);
+}
+
+int x86_topo_from_parts(struct cpu_policy *p,
+ unsigned int threads_per_core,
+ unsigned int cores_per_pkg)
+{
+ unsigned int threads_per_pkg = threads_per_core * cores_per_pkg;
+ unsigned int apic_id_size;
+
+ if ( !p || !threads_per_core || !cores_per_pkg )
+ return -EINVAL;
+
+ p->basic.max_leaf = MAX(0xb, p->basic.max_leaf);
+
+ memset(p->topo.raw, 0, sizeof(p->topo.raw));
+
+ /* thread level */
+ p->topo.subleaf[0].nr_logical = threads_per_core;
+ p->topo.subleaf[0].id_shift = 0;
+ p->topo.subleaf[0].level = 0;
+ p->topo.subleaf[0].type = 1;
+ if ( threads_per_core > 1 )
+ p->topo.subleaf[0].id_shift = order(threads_per_core - 1);
+
+ /* core level */
+ p->topo.subleaf[1].nr_logical = cores_per_pkg;
+ if ( p->x86_vendor == X86_VENDOR_INTEL )
+ p->topo.subleaf[1].nr_logical = threads_per_pkg;
+ p->topo.subleaf[1].id_shift = p->topo.subleaf[0].id_shift;
+ p->topo.subleaf[1].level = 1;
+ p->topo.subleaf[1].type = 2;
+ if ( cores_per_pkg > 1 )
+ p->topo.subleaf[1].id_shift += order(cores_per_pkg - 1);
+
+ apic_id_size = p->topo.subleaf[1].id_shift;
+
+ /*
+ * Contrary to what the name might seem to imply. HTT is an enabler for
+ * SMP and there's no harm in setting it even with a single vCPU.
+ */
+ p->basic.htt = true;
+ p->basic.lppp = MIN(0xff, p->basic.lppp);
+
+ switch ( p->x86_vendor )
+ {
+ case X86_VENDOR_INTEL: {
+ struct cpuid_cache_leaf *sl = p->cache.subleaf;
+ for ( size_t i = 0; sl->type &&
+ i < ARRAY_SIZE(p->cache.raw); i++, sl++ )
+ {
+ sl->cores_per_package = cores_per_pkg - 1;
+ sl->threads_per_cache = threads_per_core - 1;
+ if ( sl->type == 3 /* unified cache */ )
+ sl->threads_per_cache = threads_per_pkg - 1;
+ }
+ break;
+ }
+ case X86_VENDOR_AMD:
+ case X86_VENDOR_HYGON:
+ /* Expose p->basic.lppp */
+ p->extd.cmp_legacy = true;
+
+ /* Clip NC to the maximum value it can hold */
+ p->extd.nc = 0xff;
+ if ( threads_per_pkg <= 0xff )
+ p->extd.nc = threads_per_pkg - 1;
+
+ /* TODO: Expose leaf e1E */
+ p->extd.topoext = false;
+
+ /*
+ * Clip APIC ID to 8 bits, as that's what high core-count machines do
+ *
+ * That what AMD EPYC 9654 does with >256 CPUs
+ */
+ p->extd.apic_id_size = MIN(8, apic_id_size);
+
+ break;
+ }
+
+ return 0;
+}
+
int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
const struct cpu_policy *guest,
struct cpu_policy_errors *err)
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 7/8] xen/x86: Derive topologically correct x2APIC IDs from the policy
2024-05-08 12:39 [PATCH v2 0/8] x86: Expose consistent topology to guests Alejandro Vallejo
` (5 preceding siblings ...)
2024-05-08 12:39 ` [PATCH v2 6/8] xen/lib: Add topology generator for x86 Alejandro Vallejo
@ 2024-05-08 12:39 ` Alejandro Vallejo
2024-05-24 8:39 ` Roger Pau Monné
2024-05-08 12:39 ` [PATCH v2 8/8] xen/x86: Synthesise domain topologies Alejandro Vallejo
7 siblings, 1 reply; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-08 12:39 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD
Implements the helper for mapping vcpu_id to x2apic_id given a valid
topology in a policy. The algo is written with the intention of extending
it to leaves 0x1f and e26 in the future.
Toolstack doesn't set leaf 0xb and the HVM default policy has it cleared,
so the leaf is not implemented. In that case, the new helper just returns
the legacy mapping.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v2:
* const-ify the test definitions
* Cosmetic changes (newline + parameter name in prototype)
---
tools/tests/cpu-policy/test-cpu-policy.c | 63 ++++++++++++++++++++
xen/include/xen/lib/x86/cpu-policy.h | 2 +
xen/lib/x86/policy.c | 73 ++++++++++++++++++++++--
3 files changed, 133 insertions(+), 5 deletions(-)
diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c
index 0ba8c418b1b3..82a6aeb23317 100644
--- a/tools/tests/cpu-policy/test-cpu-policy.c
+++ b/tools/tests/cpu-policy/test-cpu-policy.c
@@ -776,6 +776,68 @@ static void test_topo_from_parts(void)
}
}
+static void test_x2apic_id_from_vcpu_id_success(void)
+{
+ static const struct test {
+ unsigned int vcpu_id;
+ unsigned int threads_per_core;
+ unsigned int cores_per_pkg;
+ uint32_t x2apic_id;
+ uint8_t x86_vendor;
+ } tests[] = {
+ {
+ .vcpu_id = 3, .threads_per_core = 3, .cores_per_pkg = 8,
+ .x2apic_id = 1 << 2,
+ },
+ {
+ .vcpu_id = 6, .threads_per_core = 3, .cores_per_pkg = 8,
+ .x2apic_id = 2 << 2,
+ },
+ {
+ .vcpu_id = 24, .threads_per_core = 3, .cores_per_pkg = 8,
+ .x2apic_id = 1 << 5,
+ },
+ {
+ .vcpu_id = 35, .threads_per_core = 3, .cores_per_pkg = 8,
+ .x2apic_id = (35 % 3) | (((35 / 3) % 8) << 2) | ((35 / 24) << 5),
+ },
+ {
+ .vcpu_id = 96, .threads_per_core = 7, .cores_per_pkg = 3,
+ .x2apic_id = (96 % 7) | (((96 / 7) % 3) << 3) | ((96 / 21) << 5),
+ },
+ };
+
+ const uint8_t vendors[] = {
+ X86_VENDOR_INTEL,
+ X86_VENDOR_AMD,
+ X86_VENDOR_CENTAUR,
+ X86_VENDOR_SHANGHAI,
+ X86_VENDOR_HYGON,
+ };
+
+ printf("Testing x2apic id from vcpu id success:\n");
+
+ /* Perform the test run on every vendor we know about */
+ for ( size_t i = 0; i < ARRAY_SIZE(vendors); ++i )
+ {
+ struct cpu_policy policy = { .x86_vendor = vendors[i] };
+ for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i )
+ {
+ const struct test *t = &tests[i];
+ uint32_t x2apic_id;
+ int rc = x86_topo_from_parts(&policy, t->threads_per_core, t->cores_per_pkg);
+
+ x2apic_id = x86_x2apic_id_from_vcpu_id(&policy, t->vcpu_id);
+ if ( rc || x2apic_id != t->x2apic_id )
+ fail("FAIL[%d] - '%s cpu%u %u t/c %u c/p'. bad x2apic_id: expected=%u actual=%u\n",
+ rc,
+ x86_cpuid_vendor_to_str(policy.x86_vendor),
+ t->vcpu_id, t->threads_per_core, t->cores_per_pkg,
+ t->x2apic_id, x2apic_id);
+ }
+ }
+}
+
int main(int argc, char **argv)
{
printf("CPU Policy unit tests\n");
@@ -794,6 +856,7 @@ int main(int argc, char **argv)
test_is_compatible_failure();
test_topo_from_parts();
+ test_x2apic_id_from_vcpu_id_success();
if ( nr_failures )
printf("Done: %u failures\n", nr_failures);
diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
index f5df18e9f77c..2cbc2726a861 100644
--- a/xen/include/xen/lib/x86/cpu-policy.h
+++ b/xen/include/xen/lib/x86/cpu-policy.h
@@ -545,6 +545,8 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
/**
* Calculates the x2APIC ID of a vCPU given a CPU policy
*
+ * If the policy lacks leaf 0xb falls back to legacy mapping of apic_id=cpu*2
+ *
* @param p CPU policy of the domain.
* @param id vCPU ID of the vCPU.
* @returns x2APIC ID of the vCPU.
diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
index d033ee5398dd..e498e32f8fd7 100644
--- a/xen/lib/x86/policy.c
+++ b/xen/lib/x86/policy.c
@@ -2,15 +2,78 @@
#include <xen/lib/x86/cpu-policy.h>
+static uint32_t parts_per_higher_scoped_level(const struct cpu_policy *p, size_t lvl)
+{
+ /*
+ * `nr_logical` reported by Intel is the number of THREADS contained in
+ * the next topological scope. For example, assuming a system with 2
+ * threads/core and 3 cores/module in a fully symmetric topology,
+ * `nr_logical` at the core level will report 6. Because it's reporting
+ * the number of threads in a module.
+ *
+ * On AMD/Hygon, nr_logical is already normalized by the higher scoped
+ * level (cores/complex, etc) so we can return it as-is.
+ */
+ if ( p->x86_vendor != X86_VENDOR_INTEL || !lvl )
+ return p->topo.subleaf[lvl].nr_logical;
+
+ return p->topo.subleaf[lvl].nr_logical / p->topo.subleaf[lvl - 1].nr_logical;
+}
+
uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
{
+ uint32_t shift = 0, x2apic_id = 0;
+
+ /* In the absence of topology leaves, fallback to traditional mapping */
+ if ( !p->topo.subleaf[0].type )
+ return id * 2;
+
/*
- * TODO: Derive x2APIC ID from the topology information inside `p`
- * rather than from vCPU ID. This bodge is a temporary measure
- * until all infra is in place to retrieve or derive the initial
- * x2APIC ID from migrated domains.
+ * `id` means different things at different points of the algo
+ *
+ * At lvl=0: global thread_id (same as vcpu_id)
+ * At lvl=1: global core_id
+ * At lvl=2: global socket_id (actually complex_id in AMD, module_id
+ * in Intel, but the name is inconsequential)
+ *
+ * +--+
+ * ____ |#0| ______ <= 1 socket
+ * / +--+ \+--+
+ * __#0__ __|#1|__ <= 2 cores/socket
+ * / | \ +--+/ +-|+ \
+ * #0 #1 #2 |#3| #4 #5 <= 3 threads/core
+ * +--+
+ *
+ * ... and so on. Global in this context means that it's a unique
+ * identifier for the whole topology, and not relative to the level
+ * it's in. For example, in the diagram shown above, we're looking at
+ * thread #3 in the global sense, though it's #0 within its core.
+ *
+ * Note that dividing a global thread_id by the number of threads per
+ * core returns the global core id that contains it. e.g: 0, 1 or 2
+ * divided by 3 returns core_id=0. 3, 4 or 5 divided by 3 returns core
+ * 1, and so on. An analogous argument holds for higher levels. This is
+ * the property we exploit to derive x2apic_id from vcpu_id.
+ *
+ * NOTE: `topo` is currently derived from leaf 0xb, which is bound to
+ * two levels, but once we track leaves 0x1f (or e26) there will be a
+ * few more. The algorithm is written to cope with that case.
*/
- return vcpu_id * 2;
+ for ( uint32_t i = 0; i < ARRAY_SIZE(p->topo.raw); i++ )
+ {
+ uint32_t nr_parts;
+
+ if ( !p->topo.subleaf[i].type )
+ /* sentinel subleaf */
+ break;
+
+ nr_parts = parts_per_higher_scoped_level(p, i);
+ x2apic_id |= (id % nr_parts) << shift;
+ id /= nr_parts;
+ shift = p->topo.subleaf[i].id_shift;
+ }
+
+ return (id << shift) | x2apic_id;
}
static unsigned int order(unsigned int n)
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 8/8] xen/x86: Synthesise domain topologies
2024-05-08 12:39 [PATCH v2 0/8] x86: Expose consistent topology to guests Alejandro Vallejo
` (6 preceding siblings ...)
2024-05-08 12:39 ` [PATCH v2 7/8] xen/x86: Derive topologically correct x2APIC IDs from the policy Alejandro Vallejo
@ 2024-05-08 12:39 ` Alejandro Vallejo
2024-05-24 8:58 ` Roger Pau Monné
7 siblings, 1 reply; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-08 12:39 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Anthony PERARD, Juergen Gross, Jan Beulich,
Andrew Cooper, Roger Pau Monné
Expose sensible topologies in leaf 0xb. At the moment it synthesises non-HT
systems, in line with the previous code intent.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v2:
* Zap the topology leaves of (pv/hvm)_(def/max)_policy rather than the host policy
---
tools/libs/guest/xg_cpuid_x86.c | 62 +++++----------------------------
xen/arch/x86/cpu-policy.c | 9 +++--
2 files changed, 15 insertions(+), 56 deletions(-)
diff --git a/tools/libs/guest/xg_cpuid_x86.c b/tools/libs/guest/xg_cpuid_x86.c
index 4453178100ad..8170769dbe43 100644
--- a/tools/libs/guest/xg_cpuid_x86.c
+++ b/tools/libs/guest/xg_cpuid_x86.c
@@ -584,7 +584,7 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
bool hvm;
xc_domaininfo_t di;
struct xc_cpu_policy *p = xc_cpu_policy_init();
- unsigned int i, nr_leaves = ARRAY_SIZE(p->leaves), nr_msrs = 0;
+ unsigned int nr_leaves = ARRAY_SIZE(p->leaves), nr_msrs = 0;
uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
uint32_t host_featureset[FEATURESET_NR_ENTRIES] = {};
uint32_t len = ARRAY_SIZE(host_featureset);
@@ -727,59 +727,15 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
}
else
{
- /*
- * Topology for HVM guests is entirely controlled by Xen. For now, we
- * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
- */
- p->policy.basic.htt = true;
- p->policy.extd.cmp_legacy = false;
-
- /*
- * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package.
- * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
- * overflow.
- */
- if ( !p->policy.basic.lppp )
- p->policy.basic.lppp = 2;
- else if ( !(p->policy.basic.lppp & 0x80) )
- p->policy.basic.lppp *= 2;
-
- switch ( p->policy.x86_vendor )
+ /* TODO: Expose the ability to choose a custom topology for HVM/PVH */
+ unsigned int threads_per_core = 1;
+ unsigned int cores_per_pkg = di.max_vcpu_id + 1;
+ rc = x86_topo_from_parts(&p->policy, threads_per_core, cores_per_pkg);
+ if ( rc )
{
- case X86_VENDOR_INTEL:
- for ( i = 0; (p->policy.cache.subleaf[i].type &&
- i < ARRAY_SIZE(p->policy.cache.raw)); ++i )
- {
- p->policy.cache.subleaf[i].cores_per_package =
- (p->policy.cache.subleaf[i].cores_per_package << 1) | 1;
- p->policy.cache.subleaf[i].threads_per_cache = 0;
- }
- break;
-
- case X86_VENDOR_AMD:
- case X86_VENDOR_HYGON:
- /*
- * Leaf 0x80000008 ECX[15:12] is ApicIdCoreSize.
- * Leaf 0x80000008 ECX[7:0] is NumberOfCores (minus one).
- * Update to reflect vLAPIC_ID = vCPU_ID * 2. But avoid
- * - overflow,
- * - going out of sync with leaf 1 EBX[23:16],
- * - incrementing ApicIdCoreSize when it's zero (which changes the
- * meaning of bits 7:0).
- *
- * UPDATE: I addition to avoiding overflow, some
- * proprietary operating systems have trouble with
- * apic_id_size values greater than 7. Limit the value to
- * 7 for now.
- */
- if ( p->policy.extd.nc < 0x7f )
- {
- if ( p->policy.extd.apic_id_size != 0 && p->policy.extd.apic_id_size < 0x7 )
- p->policy.extd.apic_id_size++;
-
- p->policy.extd.nc = (p->policy.extd.nc << 1) | 1;
- }
- break;
+ ERROR("Failed to generate topology: t/c=%u c/p=%u",
+ threads_per_core, cores_per_pkg);
+ goto out;
}
}
diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index 4b6d96276399..0ad871732ba0 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -278,9 +278,6 @@ static void recalculate_misc(struct cpu_policy *p)
p->basic.raw[0x8] = EMPTY_LEAF;
- /* TODO: Rework topology logic. */
- memset(p->topo.raw, 0, sizeof(p->topo.raw));
-
p->basic.raw[0xc] = EMPTY_LEAF;
p->extd.e1d &= ~CPUID_COMMON_1D_FEATURES;
@@ -621,6 +618,9 @@ static void __init calculate_pv_max_policy(void)
recalculate_xstate(p);
p->extd.raw[0xa] = EMPTY_LEAF; /* No SVM for PV guests. */
+
+ /* Wipe host topology. Toolstack is expected to synthesise a sensible one */
+ memset(p->topo.raw, 0, sizeof(p->topo.raw));
}
static void __init calculate_pv_def_policy(void)
@@ -773,6 +773,9 @@ static void __init calculate_hvm_max_policy(void)
/* It's always possible to emulate CPUID faulting for HVM guests */
p->platform_info.cpuid_faulting = true;
+
+ /* Wipe host topology. Toolstack is expected to synthesise a sensible one */
+ memset(p->topo.raw, 0, sizeof(p->topo.raw));
}
static void __init calculate_hvm_def_policy(void)
--
2.34.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 4/8] tools/hvmloader: Wake APs with hypercalls and not with INIT+SIPI+SIPI
2024-05-08 12:39 ` [PATCH v2 4/8] tools/hvmloader: Wake APs with hypercalls and not with INIT+SIPI+SIPI Alejandro Vallejo
@ 2024-05-08 19:13 ` Andrew Cooper
2024-05-09 11:04 ` Alejandro Vallejo
2024-05-09 17:50 ` [PATCH 4.5/8] tools/hvmloader: Further simplify SMP setup Andrew Cooper
1 sibling, 1 reply; 43+ messages in thread
From: Andrew Cooper @ 2024-05-08 19:13 UTC (permalink / raw)
To: Alejandro Vallejo, Xen-devel
Cc: Jan Beulich, Roger Pau Monné, Anthony PERARD
On 08/05/2024 1:39 pm, Alejandro Vallejo wrote:
> Removes a needless assembly entry point and simplifies the codebase by allowing
> hvmloader to wake APs it doesn't know the APIC ID of.
>
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
This is basically independent of the rest of the series, and it would be
good to pull it in separately. A few notes.
The commit message ought to note that this has a side effect of removing
an LMSW instruction which has fastpath at all on AMD CPUs, and requires
full emulation, and it gets rid of 13 vLAPIC emulations when 3
hypercalls would do.
The point being that this is borderline backport material, although it
does depend on the 32 vCPU bugfix.
> ---
> v2:
> * New patch. Replaces adding cpu policy to hvmloader in v1.
> ---
> tools/firmware/hvmloader/smp.c | 111 +++++++++++++--------------------
> 1 file changed, 44 insertions(+), 67 deletions(-)
>
> diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c
> index 082b17f13818..a668f15d7e1f 100644
> --- a/tools/firmware/hvmloader/smp.c
> +++ b/tools/firmware/hvmloader/smp.c
> @@ -22,88 +22,68 @@
> #include "util.h"
> #include "config.h"
> #include "apic_regs.h"
> +#include "hypercall.h"
>
> -#define AP_BOOT_EIP 0x1000
> -extern char ap_boot_start[], ap_boot_end[];
> +#include <xen/asm/x86-defns.h>
> +#include <xen/hvm/hvm_vcpu.h>
> +
> +#include <xen/vcpu.h>
>
> static int ap_callin, ap_cpuid;
>
> -asm (
> - " .text \n"
> - " .code16 \n"
> - "ap_boot_start: .code16 \n"
> - " mov %cs,%ax \n"
> - " mov %ax,%ds \n"
> - " lgdt gdt_desr-ap_boot_start\n"
> - " xor %ax, %ax \n"
> - " inc %ax \n"
> - " lmsw %ax \n"
> - " ljmpl $0x08,$1f \n"
> - "gdt_desr: \n"
> - " .word gdt_end - gdt - 1 \n"
> - " .long gdt \n"
> - "ap_boot_end: .code32 \n"
> - "1: mov $0x10,%eax \n"
> - " mov %eax,%ds \n"
> - " mov %eax,%es \n"
> - " mov %eax,%ss \n"
> - " movl $stack_top,%esp \n"
> - " movl %esp,%ebp \n"
> - " call ap_start \n"
> - "1: hlt \n"
> - " jmp 1b \n"
> - " \n"
> - " .align 8 \n"
> - "gdt: \n"
> - " .quad 0x0000000000000000 \n"
> - " .quad 0x00cf9a000000ffff \n" /* 0x08: Flat code segment */
> - " .quad 0x00cf92000000ffff \n" /* 0x10: Flat data segment */
> - "gdt_end: \n"
> - " \n"
> - " .bss \n"
> - " .align 8 \n"
> - "stack: \n"
> - " .skip 0x4000 \n"
> - "stack_top: \n"
> - " .text \n"
> - );
> -
> -void ap_start(void); /* non-static avoids unused-function compiler warning */
> -/*static*/ void ap_start(void)
> +static void ap_start(void)
> {
> printf(" - CPU%d ... ", ap_cpuid);
> cacheattr_init();
> printf("done.\n");
> +
> + if ( !ap_cpuid )
> + return;
> +
> wmb();
> ap_callin = 1;
/* After this point, the BSP will shut us down. */
> -}
>
> -static void lapic_wait_ready(void)
> -{
> - while ( lapic_read(APIC_ICR) & APIC_ICR_BUSY )
> - cpu_relax();
> + while ( 1 )
For this, we tend to favour "for ( ;; )".
> + asm volatile ( "hlt" );
> }
>
> static void boot_cpu(unsigned int cpu)
> {
> - unsigned int icr2 = SET_APIC_DEST_FIELD(LAPIC_ID(cpu));
> + static uint8_t ap_stack[4 * PAGE_SIZE] __attribute__ ((aligned (16)));
I know you're just copying what was there, but 4 pages is stupidly large
for something that needs about 4 stack slots.
4K is absolutely plenty.
> + static struct vcpu_hvm_context ap;
>
> /* Initialise shared variables. */
> ap_cpuid = cpu;
> - ap_callin = 0;
> wmb();
>
> - /* Wake up the secondary processor: INIT-SIPI-SIPI... */
> - lapic_wait_ready();
> - lapic_write(APIC_ICR2, icr2);
> - lapic_write(APIC_ICR, APIC_DM_INIT);
> - lapic_wait_ready();
> - lapic_write(APIC_ICR2, icr2);
> - lapic_write(APIC_ICR, APIC_DM_STARTUP | (AP_BOOT_EIP >> 12));
> - lapic_wait_ready();
> - lapic_write(APIC_ICR2, icr2);
> - lapic_write(APIC_ICR, APIC_DM_STARTUP | (AP_BOOT_EIP >> 12));
> - lapic_wait_ready();
> + /* Wake up the secondary processor */
> + ap = (struct vcpu_hvm_context) {
> + .mode = VCPU_HVM_MODE_32B,
> + .cpu_regs.x86_32 = {
> + .eip = (uint32_t)ap_start,
> + .esp = (uint32_t)ap_stack + ARRAY_SIZE(ap_stack),
Not that it really matters, but these want to be unsigned long casts.
> +
> + /* Protected mode with MMU off */
> + .cr0 = X86_CR0_PE,
> +
> + /* Prepopulate the GDT */
/* 32bit Flat Mode */
This isn't the GDT; it's the segment registers, but "Flat Mode" is the
more meaningful term to use.
I'm happy to fix all on commit.
~Andrew
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 4/8] tools/hvmloader: Wake APs with hypercalls and not with INIT+SIPI+SIPI
2024-05-08 19:13 ` Andrew Cooper
@ 2024-05-09 11:04 ` Alejandro Vallejo
0 siblings, 0 replies; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-09 11:04 UTC (permalink / raw)
To: Andrew Cooper, Xen-devel
Cc: Jan Beulich, Roger Pau Monné, Anthony PERARD
On 08/05/2024 20:13, Andrew Cooper wrote:
> On 08/05/2024 1:39 pm, Alejandro Vallejo wrote:
>> Removes a needless assembly entry point and simplifies the codebase by allowing
>> hvmloader to wake APs it doesn't know the APIC ID of.
>>
>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>
> This is basically independent of the rest of the series, and it would be
> good to pull it in separately. A few notes.
>
> The commit message ought to note that this has a side effect of removing
> an LMSW instruction which has fastpath at all on AMD CPUs, and requires
> full emulation, and it gets rid of 13 vLAPIC emulations when 3
> hypercalls would do.
>
> The point being that this is borderline backport material, although it
> does depend on the 32 vCPU bugfix.
>
>> ---
>> v2:
>> * New patch. Replaces adding cpu policy to hvmloader in v1.
>> ---
>> tools/firmware/hvmloader/smp.c | 111 +++++++++++++--------------------
>> 1 file changed, 44 insertions(+), 67 deletions(-)
>>
>> diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c
>> index 082b17f13818..a668f15d7e1f 100644
>> --- a/tools/firmware/hvmloader/smp.c
>> +++ b/tools/firmware/hvmloader/smp.c
>> @@ -22,88 +22,68 @@
>> #include "util.h"
>> #include "config.h"
>> #include "apic_regs.h"
>> +#include "hypercall.h"
>>
>> -#define AP_BOOT_EIP 0x1000
>> -extern char ap_boot_start[], ap_boot_end[];
>> +#include <xen/asm/x86-defns.h>
>> +#include <xen/hvm/hvm_vcpu.h>
>> +
>> +#include <xen/vcpu.h>
>>
>> static int ap_callin, ap_cpuid;
>>
>> -asm (
>> - " .text \n"
>> - " .code16 \n"
>> - "ap_boot_start: .code16 \n"
>> - " mov %cs,%ax \n"
>> - " mov %ax,%ds \n"
>> - " lgdt gdt_desr-ap_boot_start\n"
>> - " xor %ax, %ax \n"
>> - " inc %ax \n"
>> - " lmsw %ax \n"
>> - " ljmpl $0x08,$1f \n"
>> - "gdt_desr: \n"
>> - " .word gdt_end - gdt - 1 \n"
>> - " .long gdt \n"
>> - "ap_boot_end: .code32 \n"
>> - "1: mov $0x10,%eax \n"
>> - " mov %eax,%ds \n"
>> - " mov %eax,%es \n"
>> - " mov %eax,%ss \n"
>> - " movl $stack_top,%esp \n"
>> - " movl %esp,%ebp \n"
>> - " call ap_start \n"
>> - "1: hlt \n"
>> - " jmp 1b \n"
>> - " \n"
>> - " .align 8 \n"
>> - "gdt: \n"
>> - " .quad 0x0000000000000000 \n"
>> - " .quad 0x00cf9a000000ffff \n" /* 0x08: Flat code segment */
>> - " .quad 0x00cf92000000ffff \n" /* 0x10: Flat data segment */
>> - "gdt_end: \n"
>> - " \n"
>> - " .bss \n"
>> - " .align 8 \n"
>> - "stack: \n"
>> - " .skip 0x4000 \n"
>> - "stack_top: \n"
>> - " .text \n"
>> - );
>> -
>> -void ap_start(void); /* non-static avoids unused-function compiler warning */
>> -/*static*/ void ap_start(void)
>> +static void ap_start(void)
>> {
>> printf(" - CPU%d ... ", ap_cpuid);
>> cacheattr_init();
>> printf("done.\n");
>> +
>> + if ( !ap_cpuid )
>> + return;
>> +
>> wmb();
>> ap_callin = 1;
>
> /* After this point, the BSP will shut us down. */
>
>> -}
>>
>> -static void lapic_wait_ready(void)
>> -{
>> - while ( lapic_read(APIC_ICR) & APIC_ICR_BUSY )
>> - cpu_relax();
>> + while ( 1 )
>
> For this, we tend to favour "for ( ;; )".
>
>> + asm volatile ( "hlt" );
>> }
>>
>> static void boot_cpu(unsigned int cpu)
>> {
>> - unsigned int icr2 = SET_APIC_DEST_FIELD(LAPIC_ID(cpu));
>> + static uint8_t ap_stack[4 * PAGE_SIZE] __attribute__ ((aligned (16)));
>
> I know you're just copying what was there, but 4 pages is stupidly large
> for something that needs about 4 stack slots.
>
> 4K is absolutely plenty.
>
>> + static struct vcpu_hvm_context ap;
>>
>> /* Initialise shared variables. */
>> ap_cpuid = cpu;
>> - ap_callin = 0;
>> wmb();
>>
>> - /* Wake up the secondary processor: INIT-SIPI-SIPI... */
>> - lapic_wait_ready();
>> - lapic_write(APIC_ICR2, icr2);
>> - lapic_write(APIC_ICR, APIC_DM_INIT);
>> - lapic_wait_ready();
>> - lapic_write(APIC_ICR2, icr2);
>> - lapic_write(APIC_ICR, APIC_DM_STARTUP | (AP_BOOT_EIP >> 12));
>> - lapic_wait_ready();
>> - lapic_write(APIC_ICR2, icr2);
>> - lapic_write(APIC_ICR, APIC_DM_STARTUP | (AP_BOOT_EIP >> 12));
>> - lapic_wait_ready();
>> + /* Wake up the secondary processor */
>> + ap = (struct vcpu_hvm_context) {
>> + .mode = VCPU_HVM_MODE_32B,
>> + .cpu_regs.x86_32 = {
>> + .eip = (uint32_t)ap_start,
>> + .esp = (uint32_t)ap_stack + ARRAY_SIZE(ap_stack),
>
> Not that it really matters, but these want to be unsigned long casts.
>
>> +
>> + /* Protected mode with MMU off */
>> + .cr0 = X86_CR0_PE,
>> +
>> + /* Prepopulate the GDT */
>
> /* 32bit Flat Mode */
>
> This isn't the GDT; it's the segment registers, but "Flat Mode" is the
> more meaningful term to use.
>
>
> I'm happy to fix all on commit.
>
> ~Andrew
All sound ok to me.
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH 4.5/8] tools/hvmloader: Further simplify SMP setup
2024-05-08 12:39 ` [PATCH v2 4/8] tools/hvmloader: Wake APs with hypercalls and not with INIT+SIPI+SIPI Alejandro Vallejo
2024-05-08 19:13 ` Andrew Cooper
@ 2024-05-09 17:50 ` Andrew Cooper
2024-05-13 17:19 ` Alejandro Vallejo
2024-05-23 15:04 ` Roger Pau Monné
1 sibling, 2 replies; 43+ messages in thread
From: Andrew Cooper @ 2024-05-09 17:50 UTC (permalink / raw)
To: Xen-devel
Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
Alejandro Vallejo
Now that we're using hypercalls to start APs, we can replace the 'ap_cpuid'
global with a regular function parameter. This requires telling the compiler
that we'd like the parameter in a register rather than on the stack.
While adjusting, rename to cpu_setup(). It's always been used on the BSP,
making the name ap_start() specifically misleading.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Alejandro Vallejo <alejandro.vallejo@cloud.com>
This is a trick I found for XTF, not that I've completed the SMP support yet.
I realise it's cheating slightly WRT 4.19, but it came out of the middle of a
series targetted for 4.19.
---
tools/firmware/hvmloader/smp.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c
index 6ebf0b60faab..5d46eee1c5f4 100644
--- a/tools/firmware/hvmloader/smp.c
+++ b/tools/firmware/hvmloader/smp.c
@@ -29,15 +29,15 @@
#include <xen/vcpu.h>
-static int ap_callin, ap_cpuid;
+static int ap_callin;
-static void ap_start(void)
+static void __attribute__((regparm(1))) cpu_setup(unsigned int cpu)
{
- printf(" - CPU%d ... ", ap_cpuid);
+ printf(" - CPU%d ... ", cpu);
cacheattr_init();
printf("done.\n");
- if ( !ap_cpuid ) /* Used on the BSP too */
+ if ( !cpu ) /* Used on the BSP too */
return;
wmb();
@@ -55,7 +55,6 @@ static void boot_cpu(unsigned int cpu)
static struct vcpu_hvm_context ap;
/* Initialise shared variables. */
- ap_cpuid = cpu;
ap_callin = 0;
wmb();
@@ -63,9 +62,11 @@ static void boot_cpu(unsigned int cpu)
ap = (struct vcpu_hvm_context) {
.mode = VCPU_HVM_MODE_32B,
.cpu_regs.x86_32 = {
- .eip = (unsigned long)ap_start,
+ .eip = (unsigned long)cpu_setup,
.esp = (unsigned long)ap_stack + ARRAY_SIZE(ap_stack),
+ .eax = cpu,
+
/* Protected Mode, no paging. */
.cr0 = X86_CR0_PE,
@@ -105,7 +106,7 @@ void smp_initialise(void)
unsigned int i, nr_cpus = hvm_info->nr_vcpus;
printf("Multiprocessor initialisation:\n");
- ap_start();
+ cpu_setup(0);
for ( i = 1; i < nr_cpus; i++ )
boot_cpu(i);
}
base-commit: 53959cb8309919fc2f157a1c99e0af2ce280cb84
--
2.30.2
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH 4.5/8] tools/hvmloader: Further simplify SMP setup
2024-05-09 17:50 ` [PATCH 4.5/8] tools/hvmloader: Further simplify SMP setup Andrew Cooper
@ 2024-05-13 17:19 ` Alejandro Vallejo
2024-05-23 15:04 ` Roger Pau Monné
1 sibling, 0 replies; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-13 17:19 UTC (permalink / raw)
To: Andrew Cooper, Xen-devel; +Cc: Jan Beulich, Roger Pau Monné
On 09/05/2024 18:50, Andrew Cooper wrote:
> Now that we're using hypercalls to start APs, we can replace the 'ap_cpuid'
> global with a regular function parameter. This requires telling the compiler
> that we'd like the parameter in a register rather than on the stack.
>
> While adjusting, rename to cpu_setup(). It's always been used on the BSP,
> making the name ap_start() specifically misleading.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>
> This is a trick I found for XTF, not that I've completed the SMP support yet.
>
> I realise it's cheating slightly WRT 4.19, but it came out of the middle of a
> series targetted for 4.19.
> ---
> tools/firmware/hvmloader/smp.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c
> index 6ebf0b60faab..5d46eee1c5f4 100644
> --- a/tools/firmware/hvmloader/smp.c
> +++ b/tools/firmware/hvmloader/smp.c
> @@ -29,15 +29,15 @@
>
> #include <xen/vcpu.h>
>
> -static int ap_callin, ap_cpuid;
> +static int ap_callin;
>
> -static void ap_start(void)
> +static void __attribute__((regparm(1))) cpu_setup(unsigned int cpu)
I like it, but I'm not a fan of compiler attributes when there's sane
alternatives. We could pre-push the argument onto ap_stack to achieve
the same thing. As in, add a -4 offset to esp, and write "cpu" there.
*(uint32_t*)ap.cpu_regs.x86_32.esp) = cpu;
That said, this is a solution as good as any other and it's definitely
better than a global, so take it or leave it.
With or without the proposed alternative...
Reviewed-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> {
> - printf(" - CPU%d ... ", ap_cpuid);
> + printf(" - CPU%d ... ", cpu);
> cacheattr_init();
> printf("done.\n");
>
> - if ( !ap_cpuid ) /* Used on the BSP too */
> + if ( !cpu ) /* Used on the BSP too */
> return;
>
> wmb();
> @@ -55,7 +55,6 @@ static void boot_cpu(unsigned int cpu)
> static struct vcpu_hvm_context ap;
>
> /* Initialise shared variables. */
> - ap_cpuid = cpu;
> ap_callin = 0;
> wmb();
>
> @@ -63,9 +62,11 @@ static void boot_cpu(unsigned int cpu)
> ap = (struct vcpu_hvm_context) {
> .mode = VCPU_HVM_MODE_32B,
> .cpu_regs.x86_32 = {
> - .eip = (unsigned long)ap_start,
> + .eip = (unsigned long)cpu_setup,
> .esp = (unsigned long)ap_stack + ARRAY_SIZE(ap_stack),
>
> + .eax = cpu,
> +
> /* Protected Mode, no paging. */
> .cr0 = X86_CR0_PE,
>
> @@ -105,7 +106,7 @@ void smp_initialise(void)
> unsigned int i, nr_cpus = hvm_info->nr_vcpus;
>
> printf("Multiprocessor initialisation:\n");
> - ap_start();
> + cpu_setup(0);
> for ( i = 1; i < nr_cpus; i++ )
> boot_cpu(i);
> }
>
> base-commit: 53959cb8309919fc2f157a1c99e0af2ce280cb84
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 2/8] xen/x86: Simplify header dependencies in x86/hvm
2024-05-08 12:39 ` [PATCH v2 2/8] xen/x86: Simplify header dependencies in x86/hvm Alejandro Vallejo
@ 2024-05-22 9:33 ` Jan Beulich
2024-05-22 15:24 ` Alejandro Vallejo
2024-05-23 14:37 ` Roger Pau Monné
1 sibling, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2024-05-22 9:33 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Andrew Cooper, Roger Pau Monné, Xen-devel
On 08.05.2024 14:39, Alejandro Vallejo wrote:
> Otherwise it's not possible to call functions described in hvm/vlapic.h from the
> inline functions of hvm/hvm.h.
>
> This is because a static inline in vlapic.h depends on hvm.h, and pulls it
> transitively through vpt.h. The ultimate cause is having hvm.h included in any
> of the "v*.h" headers, so break the cycle moving the guilty inline into hvm.h.
>
> No functional change.
>
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
In principle:
Reviewed-by: Jan Beulich <jbeulich@suse.com>
But see below for one possible adjustment.
> ---
> v2:
> * New patch. Prereq to moving vlapic_cpu_policy_changed() onto hvm.h
That hook invocation living outside of hvm/hvm.h was an outlier anyway,
so even without the planned further work this is probably a good move.
> --- a/xen/arch/x86/include/asm/hvm/hvm.h
> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
> @@ -798,6 +798,12 @@ static inline void hvm_update_vlapic_mode(struct vcpu *v)
> alternative_vcall(hvm_funcs.update_vlapic_mode, v);
> }
>
> +static inline void hvm_vlapic_sync_pir_to_irr(struct vcpu *v)
> +{
> + if ( hvm_funcs.sync_pir_to_irr )
> + alternative_vcall(hvm_funcs.sync_pir_to_irr, v);
> +}
The hook doesn't have "vlapic" in its name. Therefore instead or prepending
hvm_ to the original name or the wrapper, how about replacing the vlapic_
that was there. That would then also fit better with the naming scheme used
for other hooks and their wrappers. Happy to adjust while committing, so
long as you don't disagree.
Jan
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 2/8] xen/x86: Simplify header dependencies in x86/hvm
2024-05-22 9:33 ` Jan Beulich
@ 2024-05-22 15:24 ` Alejandro Vallejo
0 siblings, 0 replies; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-22 15:24 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Cooper, Roger Pau Monné, Xen-devel
On 22/05/2024 10:33, Jan Beulich wrote:
> On 08.05.2024 14:39, Alejandro Vallejo wrote:
>> Otherwise it's not possible to call functions described in hvm/vlapic.h from the
>> inline functions of hvm/hvm.h.
>>
>> This is because a static inline in vlapic.h depends on hvm.h, and pulls it
>> transitively through vpt.h. The ultimate cause is having hvm.h included in any
>> of the "v*.h" headers, so break the cycle moving the guilty inline into hvm.h.
>>
>> No functional change.
>>
>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>
> In principle:
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> But see below for one possible adjustment.
>
>> ---
>> v2:
>> * New patch. Prereq to moving vlapic_cpu_policy_changed() onto hvm.h
>
> That hook invocation living outside of hvm/hvm.h was an outlier anyway,
> so even without the planned further work this is probably a good move.
>
>> --- a/xen/arch/x86/include/asm/hvm/hvm.h
>> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
>> @@ -798,6 +798,12 @@ static inline void hvm_update_vlapic_mode(struct vcpu *v)
>> alternative_vcall(hvm_funcs.update_vlapic_mode, v);
>> }
>>
>> +static inline void hvm_vlapic_sync_pir_to_irr(struct vcpu *v)
>> +{
>> + if ( hvm_funcs.sync_pir_to_irr )
>> + alternative_vcall(hvm_funcs.sync_pir_to_irr, v);
>> +}
>
> The hook doesn't have "vlapic" in its name. Therefore instead or prepending
> hvm_ to the original name or the wrapper, how about replacing the vlapic_
> that was there. That would then also fit better with the naming scheme used
> for other hooks and their wrappers. Happy to adjust while committing, so
> long as you don't disagree.
>
> Jan
Sounds reasonable. I wasn't sure whether vlapic was adding anything more
than a namespace prefix to the function name. Are you happy to adjust
that on commit?
If so, I'm good with it in the form you propose.
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/8] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
2024-05-08 12:39 ` [PATCH v2 1/8] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area Alejandro Vallejo
@ 2024-05-23 14:32 ` Roger Pau Monné
2024-05-24 10:58 ` Alejandro Vallejo
0 siblings, 1 reply; 43+ messages in thread
From: Roger Pau Monné @ 2024-05-23 14:32 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Xen-devel, Jan Beulich, Andrew Cooper
On Wed, May 08, 2024 at 01:39:20PM +0100, Alejandro Vallejo wrote:
> This allows the initial x2APIC ID to be sent on the migration stream. The
> hardcoded mapping x2apic_id=2*vcpu_id is maintained for the time being.
> Given the vlapic data is zero-extended on restore, fix up migrations from
> hosts without the field by setting it to the old convention if zero.
>
> x2APIC IDs are calculated from the CPU policy where the guest topology is
> defined. For the time being, the function simply returns the old
> relationship, but will eventually return results consistent with the
> topology.
>
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> ---
> v2:
> * Removed usage of SET_xAPIC_ID().
> * Restored previous logic when exposing leaf 0xb, and gate it for HVM only.
> * Rewrote comment in lapic_load_fixup, including the implicit assumption.
> * Moved vlapic_cpu_policy_changed() into hvm_cpuid_policy_changed())
> * const-ified policy in vlapic_cpu_policy_changed()
> ---
> xen/arch/x86/cpuid.c | 15 ++++---------
> xen/arch/x86/hvm/vlapic.c | 30 ++++++++++++++++++++++++--
> xen/arch/x86/include/asm/hvm/hvm.h | 1 +
> xen/arch/x86/include/asm/hvm/vlapic.h | 2 ++
> xen/include/public/arch-x86/hvm/save.h | 2 ++
> xen/include/xen/lib/x86/cpu-policy.h | 9 ++++++++
> xen/lib/x86/policy.c | 11 ++++++++++
> 7 files changed, 57 insertions(+), 13 deletions(-)
>
> diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
> index 7a38e032146a..242c21ec5bb6 100644
> --- a/xen/arch/x86/cpuid.c
> +++ b/xen/arch/x86/cpuid.c
> @@ -139,10 +139,9 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
> const struct cpu_user_regs *regs;
>
> case 0x1:
> - /* TODO: Rework topology logic. */
> res->b &= 0x00ffffffu;
> if ( is_hvm_domain(d) )
> - res->b |= (v->vcpu_id * 2) << 24;
> + res->b |= vlapic_x2apic_id(vcpu_vlapic(v)) << 24;
>
> /* TODO: Rework vPMU control in terms of toolstack choices. */
> if ( vpmu_available(v) &&
> @@ -311,19 +310,13 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
> break;
>
> case 0xb:
> - /*
> - * In principle, this leaf is Intel-only. In practice, it is tightly
> - * coupled with x2apic, and we offer an x2apic-capable APIC emulation
> - * to guests on AMD hardware as well.
> - *
> - * TODO: Rework topology logic.
> - */
> - if ( p->basic.x2apic )
> + /* Don't expose topology information to PV guests */
Not sure whether we want to keep part of the comment about exposing
x2APIC to guests even when x2APIC is not present in the host. I think
this code has changed and the comment is kind of stale now.
> + if ( is_hvm_domain(d) && p->basic.x2apic )
> {
> *(uint8_t *)&res->c = subleaf;
>
> /* Fix the x2APIC identifier. */
> - res->d = v->vcpu_id * 2;
> + res->d = vlapic_x2apic_id(vcpu_vlapic(v));
> }
> break;
>
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index 05072a21bf38..61a96474006b 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -1069,7 +1069,7 @@ static uint32_t x2apic_ldr_from_id(uint32_t id)
> static void set_x2apic_id(struct vlapic *vlapic)
> {
> const struct vcpu *v = vlapic_vcpu(vlapic);
> - uint32_t apic_id = v->vcpu_id * 2;
> + uint32_t apic_id = vlapic->hw.x2apic_id;
> uint32_t apic_ldr = x2apic_ldr_from_id(apic_id);
>
> /*
> @@ -1083,6 +1083,22 @@ static void set_x2apic_id(struct vlapic *vlapic)
> vlapic_set_reg(vlapic, APIC_LDR, apic_ldr);
> }
>
> +void vlapic_cpu_policy_changed(struct vcpu *v)
> +{
> + struct vlapic *vlapic = vcpu_vlapic(v);
> + const struct cpu_policy *cp = v->domain->arch.cpu_policy;
> +
> + /*
> + * Don't override the initial x2APIC ID if we have migrated it or
> + * if the domain doesn't have vLAPIC at all.
> + */
> + if ( !has_vlapic(v->domain) || vlapic->loaded.hw )
> + return;
> +
> + vlapic->hw.x2apic_id = x86_x2apic_id_from_vcpu_id(cp, v->vcpu_id);
> + vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
Nit: in case we decide to start APICs in x2APIC mode, might be good to
take this into account here and use vlapic_x2apic_mode(vlapic) to
select whether SET_xAPIC_ID() needs to be used or not:
vlapic_set_reg(vlapic, APIC_ID,
vlapic_x2apic_mode(vlapic) ? vlapic->hw.x2apic_id
: SET_xAPIC_ID(vlapic->hw.x2apic_id));
Or similar.
> +}
> +
> int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
> {
> const struct cpu_policy *cp = v->domain->arch.cpu_policy;
> @@ -1449,7 +1465,7 @@ void vlapic_reset(struct vlapic *vlapic)
> if ( v->vcpu_id == 0 )
> vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
>
> - vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
> + vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
> vlapic_do_init(vlapic);
> }
>
> @@ -1514,6 +1530,16 @@ static void lapic_load_fixup(struct vlapic *vlapic)
> const struct vcpu *v = vlapic_vcpu(vlapic);
> uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
>
> + /*
> + * Loading record without hw.x2apic_id in the save stream, calculate using
> + * the traditional "vcpu_id * 2" relation. There's an implicit assumption
> + * that vCPU0 always has x2APIC0, which is true for the old relation, and
> + * still holds under the new x2APIC generation algorithm. While that case
> + * goes through the conditional it's benign because it still maps to zero.
> + */
> + if ( !vlapic->hw.x2apic_id )
> + vlapic->hw.x2apic_id = v->vcpu_id * 2;
> +
> /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */
> if ( !vlapic_x2apic_mode(vlapic) ||
> (vlapic->loaded.ldr == good_ldr) )
> diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
> index 0c9e6f15645d..e1f0585d75a9 100644
> --- a/xen/arch/x86/include/asm/hvm/hvm.h
> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
> @@ -448,6 +448,7 @@ static inline void hvm_update_guest_efer(struct vcpu *v)
> static inline void hvm_cpuid_policy_changed(struct vcpu *v)
> {
> alternative_vcall(hvm_funcs.cpuid_policy_changed, v);
> + vlapic_cpu_policy_changed(v);
Note sure whether this call would better be placed in
cpu_policy_updated() inside the is_hvm_vcpu() conditional branch.
hvm_cpuid_policy_changed() &c are just wrappers around the hvm_funcs
hooks, pulling vlapic functions in there is likely to complicate the
header dependencies in the long term.
> }
>
> static inline void hvm_set_tsc_offset(struct vcpu *v, uint64_t offset,
> diff --git a/xen/arch/x86/include/asm/hvm/vlapic.h b/xen/arch/x86/include/asm/hvm/vlapic.h
> index 88ef94524339..e8d41313abd3 100644
> --- a/xen/arch/x86/include/asm/hvm/vlapic.h
> +++ b/xen/arch/x86/include/asm/hvm/vlapic.h
> @@ -44,6 +44,7 @@
> #define vlapic_xapic_mode(vlapic) \
> (!vlapic_hw_disabled(vlapic) && \
> !((vlapic)->hw.apic_base_msr & APIC_BASE_EXTD))
> +#define vlapic_x2apic_id(vlapic) ((vlapic)->hw.x2apic_id)
>
> /*
> * Generic APIC bitmap vector update & search routines.
> @@ -107,6 +108,7 @@ int vlapic_ack_pending_irq(struct vcpu *v, int vector, bool force_ack);
>
> int vlapic_init(struct vcpu *v);
> void vlapic_destroy(struct vcpu *v);
> +void vlapic_cpu_policy_changed(struct vcpu *v);
>
> void vlapic_reset(struct vlapic *vlapic);
>
> diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
> index 7ecacadde165..1c2ec669ffc9 100644
> --- a/xen/include/public/arch-x86/hvm/save.h
> +++ b/xen/include/public/arch-x86/hvm/save.h
> @@ -394,6 +394,8 @@ struct hvm_hw_lapic {
> uint32_t disabled; /* VLAPIC_xx_DISABLED */
> uint32_t timer_divisor;
> uint64_t tdt_msr;
> + uint32_t x2apic_id;
> + uint32_t rsvd_zero;
I think Jan requested for this field to be checked to be 0 for
incoming migrations, yet I don't see such check added.
> };
>
> DECLARE_HVM_SAVE_TYPE(LAPIC, 5, struct hvm_hw_lapic);
> diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
> index d5e447e9dc06..392320b9adbe 100644
> --- a/xen/include/xen/lib/x86/cpu-policy.h
> +++ b/xen/include/xen/lib/x86/cpu-policy.h
> @@ -542,6 +542,15 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
> const struct cpu_policy *guest,
> struct cpu_policy_errors *err);
>
> +/**
> + * Calculates the x2APIC ID of a vCPU given a CPU policy
> + *
> + * @param p CPU policy of the domain.
> + * @param id vCPU ID of the vCPU.
> + * @returns x2APIC ID of the vCPU.
> + */
> +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id);
> +
> #endif /* !XEN_LIB_X86_POLICIES_H */
>
> /*
> diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
> index f033d22785be..4cef658feeb8 100644
> --- a/xen/lib/x86/policy.c
> +++ b/xen/lib/x86/policy.c
> @@ -2,6 +2,17 @@
>
> #include <xen/lib/x86/cpu-policy.h>
>
> +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
> +{
> + /*
> + * TODO: Derive x2APIC ID from the topology information inside `p`
> + * rather than from vCPU ID. This bodge is a temporary measure
> + * until all infra is in place to retrieve or derive the initial
> + * x2APIC ID from migrated domains.
> + */
> + return vcpu_id * 2;
I don't think this builds?
Note the parameter is plain 'id' not 'vcpu_id'.
Thanks, Roger.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 2/8] xen/x86: Simplify header dependencies in x86/hvm
2024-05-08 12:39 ` [PATCH v2 2/8] xen/x86: Simplify header dependencies in x86/hvm Alejandro Vallejo
2024-05-22 9:33 ` Jan Beulich
@ 2024-05-23 14:37 ` Roger Pau Monné
2024-05-23 14:40 ` Jan Beulich
1 sibling, 1 reply; 43+ messages in thread
From: Roger Pau Monné @ 2024-05-23 14:37 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Xen-devel, Jan Beulich, Andrew Cooper
On Wed, May 08, 2024 at 01:39:21PM +0100, Alejandro Vallejo wrote:
> Otherwise it's not possible to call functions described in hvm/vlapic.h from the
> inline functions of hvm/hvm.h.
>
> This is because a static inline in vlapic.h depends on hvm.h, and pulls it
> transitively through vpt.h. The ultimate cause is having hvm.h included in any
> of the "v*.h" headers, so break the cycle moving the guilty inline into hvm.h.
>
> No functional change.
>
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
One cosmetic comment below.
> ---
> v2:
> * New patch. Prereq to moving vlapic_cpu_policy_changed() onto hvm.h
> ---
> xen/arch/x86/hvm/irq.c | 6 +++---
> xen/arch/x86/hvm/vlapic.c | 4 ++--
> xen/arch/x86/include/asm/hvm/hvm.h | 6 ++++++
> xen/arch/x86/include/asm/hvm/vlapic.h | 6 ------
> xen/arch/x86/include/asm/hvm/vpt.h | 1 -
> 5 files changed, 11 insertions(+), 12 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
> index 4a9fe82cbd8d..4f5479b12c98 100644
> --- a/xen/arch/x86/hvm/irq.c
> +++ b/xen/arch/x86/hvm/irq.c
> @@ -512,13 +512,13 @@ struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v)
> int vector;
>
> /*
> - * Always call vlapic_sync_pir_to_irr so that PIR is synced into IRR when
> - * using posted interrupts. Note this is also done by
> + * Always call hvm_vlapic_sync_pir_to_irr so that PIR is synced into IRR
> + * when using posted interrupts. Note this is also done by
> * vlapic_has_pending_irq but depending on which interrupts are pending
> * hvm_vcpu_has_pending_irq will return early without calling
> * vlapic_has_pending_irq.
> */
> - vlapic_sync_pir_to_irr(v);
> + hvm_vlapic_sync_pir_to_irr(v);
>
> if ( unlikely(v->arch.nmi_pending) )
> return hvm_intack_nmi;
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index 61a96474006b..8a244100009c 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -98,7 +98,7 @@ static void vlapic_clear_irr(int vector, struct vlapic *vlapic)
>
> static int vlapic_find_highest_irr(struct vlapic *vlapic)
> {
> - vlapic_sync_pir_to_irr(vlapic_vcpu(vlapic));
> + hvm_vlapic_sync_pir_to_irr(vlapic_vcpu(vlapic));
>
> return vlapic_find_highest_vector(&vlapic->regs->data[APIC_IRR]);
> }
> @@ -1516,7 +1516,7 @@ static int cf_check lapic_save_regs(struct vcpu *v, hvm_domain_context_t *h)
> if ( !has_vlapic(v->domain) )
> return 0;
>
> - vlapic_sync_pir_to_irr(v);
> + hvm_vlapic_sync_pir_to_irr(v);
>
> return hvm_save_entry(LAPIC_REGS, v->vcpu_id, h, vcpu_vlapic(v)->regs);
> }
> diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
> index e1f0585d75a9..84911f3ebcb4 100644
> --- a/xen/arch/x86/include/asm/hvm/hvm.h
> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
> @@ -798,6 +798,12 @@ static inline void hvm_update_vlapic_mode(struct vcpu *v)
> alternative_vcall(hvm_funcs.update_vlapic_mode, v);
> }
>
> +static inline void hvm_vlapic_sync_pir_to_irr(struct vcpu *v)
> +{
> + if ( hvm_funcs.sync_pir_to_irr )
> + alternative_vcall(hvm_funcs.sync_pir_to_irr, v);
Nit: for consistency the wrappers are usually named hvm_<hook_name>,
so in this case it would be hvm_sync_pir_to_irr(), or the hvm_funcs
field should be renamed to vlapic_sync_pir_to_irr.
Thanks, Roger.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 2/8] xen/x86: Simplify header dependencies in x86/hvm
2024-05-23 14:37 ` Roger Pau Monné
@ 2024-05-23 14:40 ` Jan Beulich
2024-05-23 14:52 ` Roger Pau Monné
0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2024-05-23 14:40 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: Xen-devel, Andrew Cooper, Alejandro Vallejo
On 23.05.2024 16:37, Roger Pau Monné wrote:
> On Wed, May 08, 2024 at 01:39:21PM +0100, Alejandro Vallejo wrote:
>> --- a/xen/arch/x86/include/asm/hvm/hvm.h
>> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
>> @@ -798,6 +798,12 @@ static inline void hvm_update_vlapic_mode(struct vcpu *v)
>> alternative_vcall(hvm_funcs.update_vlapic_mode, v);
>> }
>>
>> +static inline void hvm_vlapic_sync_pir_to_irr(struct vcpu *v)
>> +{
>> + if ( hvm_funcs.sync_pir_to_irr )
>> + alternative_vcall(hvm_funcs.sync_pir_to_irr, v);
>
> Nit: for consistency the wrappers are usually named hvm_<hook_name>,
> so in this case it would be hvm_sync_pir_to_irr(), or the hvm_funcs
> field should be renamed to vlapic_sync_pir_to_irr.
Funny you should mention that: See my earlier comment as well as what
was committed.
Jan
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 3/8] x86/vlapic: Move lapic_load_hidden migration checks to the check hook
2024-05-08 12:39 ` [PATCH v2 3/8] x86/vlapic: Move lapic_load_hidden migration checks to the check hook Alejandro Vallejo
@ 2024-05-23 14:50 ` Roger Pau Monné
2024-05-24 11:16 ` Alejandro Vallejo
2024-05-23 18:58 ` Andrew Cooper
1 sibling, 1 reply; 43+ messages in thread
From: Roger Pau Monné @ 2024-05-23 14:50 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Xen-devel, Jan Beulich, Andrew Cooper
On Wed, May 08, 2024 at 01:39:22PM +0100, Alejandro Vallejo wrote:
> While at it, add a check for the reserved field in the hidden save area.
>
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> ---
> v2:
> * New patch. Addresses the missing check for rsvd_zero in v1.
Oh, it would be better if this was done at the time when rsvd_zero is
introduced. I think this should be moved ahead of the series, so that
the patch that introduces rsvd_zero can add the check in
lapic_check_hidden().
> ---
> xen/arch/x86/hvm/vlapic.c | 41 ++++++++++++++++++++++++++++-----------
> 1 file changed, 30 insertions(+), 11 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index 8a244100009c..2f06bff1b2cc 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -1573,35 +1573,54 @@ static void lapic_load_fixup(struct vlapic *vlapic)
> v, vlapic->loaded.id, vlapic->loaded.ldr, good_ldr);
> }
>
> -static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
> +static int cf_check lapic_check_hidden(const struct domain *d,
> + hvm_domain_context_t *h)
> {
> unsigned int vcpuid = hvm_load_instance(h);
> - struct vcpu *v;
> - struct vlapic *s;
> + struct hvm_hw_lapic s;
>
> if ( !has_vlapic(d) )
> return -ENODEV;
>
> /* Which vlapic to load? */
> - if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
> + if ( vcpuid >= d->max_vcpus || d->vcpu[vcpuid] == NULL )
> {
> dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no apic%u\n",
> d->domain_id, vcpuid);
> return -EINVAL;
> }
> - s = vcpu_vlapic(v);
>
> - if ( hvm_load_entry_zeroextend(LAPIC, h, &s->hw) != 0 )
> + if ( hvm_load_entry_zeroextend(LAPIC, h, &s) )
Can't you use hvm_get_entry() to perform the sanity checks:
const struct hvm_hw_lapic *s = hvm_get_entry(LAPIC, h);
Thanks, Roger.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 2/8] xen/x86: Simplify header dependencies in x86/hvm
2024-05-23 14:40 ` Jan Beulich
@ 2024-05-23 14:52 ` Roger Pau Monné
0 siblings, 0 replies; 43+ messages in thread
From: Roger Pau Monné @ 2024-05-23 14:52 UTC (permalink / raw)
To: Jan Beulich; +Cc: Xen-devel, Andrew Cooper, Alejandro Vallejo
On Thu, May 23, 2024 at 04:40:06PM +0200, Jan Beulich wrote:
> On 23.05.2024 16:37, Roger Pau Monné wrote:
> > On Wed, May 08, 2024 at 01:39:21PM +0100, Alejandro Vallejo wrote:
> >> --- a/xen/arch/x86/include/asm/hvm/hvm.h
> >> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
> >> @@ -798,6 +798,12 @@ static inline void hvm_update_vlapic_mode(struct vcpu *v)
> >> alternative_vcall(hvm_funcs.update_vlapic_mode, v);
> >> }
> >>
> >> +static inline void hvm_vlapic_sync_pir_to_irr(struct vcpu *v)
> >> +{
> >> + if ( hvm_funcs.sync_pir_to_irr )
> >> + alternative_vcall(hvm_funcs.sync_pir_to_irr, v);
> >
> > Nit: for consistency the wrappers are usually named hvm_<hook_name>,
> > so in this case it would be hvm_sync_pir_to_irr(), or the hvm_funcs
> > field should be renamed to vlapic_sync_pir_to_irr.
>
> Funny you should mention that: See my earlier comment as well as what
> was committed.
Oh, sorry, didn't realize you already replied, adjusted and committed.
Thanks, Roger.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 4.5/8] tools/hvmloader: Further simplify SMP setup
2024-05-09 17:50 ` [PATCH 4.5/8] tools/hvmloader: Further simplify SMP setup Andrew Cooper
2024-05-13 17:19 ` Alejandro Vallejo
@ 2024-05-23 15:04 ` Roger Pau Monné
1 sibling, 0 replies; 43+ messages in thread
From: Roger Pau Monné @ 2024-05-23 15:04 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Xen-devel, Jan Beulich, Alejandro Vallejo
On Thu, May 09, 2024 at 06:50:57PM +0100, Andrew Cooper wrote:
> Now that we're using hypercalls to start APs, we can replace the 'ap_cpuid'
> global with a regular function parameter. This requires telling the compiler
> that we'd like the parameter in a register rather than on the stack.
>
> While adjusting, rename to cpu_setup(). It's always been used on the BSP,
> making the name ap_start() specifically misleading.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Thanks, Roger.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 5/8] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves
2024-05-08 12:39 ` [PATCH v2 5/8] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves Alejandro Vallejo
@ 2024-05-23 16:13 ` Roger Pau Monné
2024-05-24 15:16 ` Alejandro Vallejo
2024-05-24 7:21 ` Roger Pau Monné
1 sibling, 1 reply; 43+ messages in thread
From: Roger Pau Monné @ 2024-05-23 16:13 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD
On Wed, May 08, 2024 at 01:39:24PM +0100, Alejandro Vallejo wrote:
> Make it so the APs expose their own APIC IDs in a LUT. We can use that LUT to
> populate the MADT, decoupling the algorithm that relates CPU IDs and APIC IDs
> from hvmloader.
>
> While at this also remove ap_callin, as writing the APIC ID may serve the same
> purpose.
>
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> ---
> v2:
> * New patch. Replaces adding cpu policy to hvmloader in v1.
> ---
> tools/firmware/hvmloader/config.h | 6 ++++-
> tools/firmware/hvmloader/hvmloader.c | 4 +--
> tools/firmware/hvmloader/smp.c | 40 +++++++++++++++++++++++-----
> tools/firmware/hvmloader/util.h | 5 ++++
> xen/arch/x86/include/asm/hvm/hvm.h | 1 +
> 5 files changed, 47 insertions(+), 9 deletions(-)
>
> diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
> index c82adf6dc508..edf6fa9c908c 100644
> --- a/tools/firmware/hvmloader/config.h
> +++ b/tools/firmware/hvmloader/config.h
> @@ -4,6 +4,8 @@
> #include <stdint.h>
> #include <stdbool.h>
>
> +#include <xen/hvm/hvm_info_table.h>
> +
> enum virtual_vga { VGA_none, VGA_std, VGA_cirrus, VGA_pt };
> extern enum virtual_vga virtual_vga;
>
> @@ -49,8 +51,10 @@ extern uint8_t ioapic_version;
>
> #define IOAPIC_ID 0x01
>
> +extern uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
> +
> #define LAPIC_BASE_ADDRESS 0xfee00000
> -#define LAPIC_ID(vcpu_id) ((vcpu_id) * 2)
> +#define LAPIC_ID(vcpu_id) (CPU_TO_X2APICID[(vcpu_id)])
>
> #define PCI_ISA_DEVFN 0x08 /* dev 1, fn 0 */
> #define PCI_ISA_IRQ_MASK 0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
> diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
> index c58841e5b556..1eba92229925 100644
> --- a/tools/firmware/hvmloader/hvmloader.c
> +++ b/tools/firmware/hvmloader/hvmloader.c
> @@ -342,11 +342,11 @@ int main(void)
>
> printf("CPU speed is %u MHz\n", get_cpu_mhz());
>
> + smp_initialise();
> +
> apic_setup();
> pci_setup();
>
> - smp_initialise();
> -
> perform_tests();
>
> if ( bios->bios_info_setup )
> diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c
> index a668f15d7e1f..4d75f239c2f5 100644
> --- a/tools/firmware/hvmloader/smp.c
> +++ b/tools/firmware/hvmloader/smp.c
> @@ -29,7 +29,34 @@
>
> #include <xen/vcpu.h>
>
> -static int ap_callin, ap_cpuid;
> +static int ap_cpuid;
> +
> +/**
> + * Lookup table of x2APIC IDs.
> + *
> + * Each entry is populated its respective CPU as they come online. This is required
> + * for generating the MADT with minimal assumptions about ID relationships.
> + */
> +uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
> +
> +static uint32_t read_apic_id(void)
> +{
> + uint32_t apic_id;
> +
> + cpuid(1, NULL, &apic_id, NULL, NULL);
> + apic_id >>= 24;
> +
> + /*
> + * APIC IDs over 255 are represented by 255 in leaf 1 and are meant to be
> + * read from topology leaves instead. Xen exposes x2APIC IDs in leaf 0xb,
> + * but only if the x2APIC feature is present. If there are that many CPUs
> + * it's guaranteed to be there so we can avoid checking for it specifically
> + */
Maybe I'm missing something, but given the current code won't Xen just
return the low 8 bits from the x2APIC ID? I don't see any code in
guest_cpuid() that adjusts the IDs to be 255 when > 255.
> + if ( apic_id == 255 )
> + cpuid(0xb, NULL, NULL, NULL, &apic_id);
Won't the correct logic be to check if x2APIC is set in CPUID, and
then fetch the APIC ID from leaf 0xb, otherwise fallback to fetching
the APID ID from leaf 1?
> +
> + return apic_id;
> +}
>
> static void ap_start(void)
> {
> @@ -37,12 +64,12 @@ static void ap_start(void)
> cacheattr_init();
> printf("done.\n");
>
> + wmb();
> + ACCESS_ONCE(CPU_TO_X2APICID[ap_cpuid]) = read_apic_id();
A comment would be helpful here, that CPU_TO_X2APICID[ap_cpuid] is
used as synchronization that the AP has started.
You probably want to assert that read_apic_id() doesn't return 0,
otherwise we are skewed.
> +
> if ( !ap_cpuid )
> return;
>
> - wmb();
> - ap_callin = 1;
> -
> while ( 1 )
> asm volatile ( "hlt" );
> }
> @@ -86,10 +113,11 @@ static void boot_cpu(unsigned int cpu)
> BUG();
>
> /*
> - * Wait for the secondary processor to complete initialisation.
> + * Wait for the secondary processor to complete initialisation,
> + * which is signaled by its x2APIC ID being writted to the LUT.
> * Do not touch shared resources meanwhile.
> */
> - while ( !ap_callin )
> + while ( !ACCESS_ONCE(CPU_TO_X2APICID[cpu]) )
> cpu_relax();
As a further improvement, we could launch all APs in pararell, and use
a for loop to wait until all positions of the CPU_TO_X2APICID array
are set.
>
> /* Take the secondary processor offline. */
> diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
> index 14078bde1e30..51e9003bc615 100644
> --- a/tools/firmware/hvmloader/util.h
> +++ b/tools/firmware/hvmloader/util.h
> @@ -23,6 +23,11 @@ enum {
> #define __STR(...) #__VA_ARGS__
> #define STR(...) __STR(__VA_ARGS__)
>
> +#define __ACCESS_ONCE(x) ({ \
> + (void)(typeof(x))0; /* Scalar typecheck. */ \
> + (volatile typeof(x) *)&(x); })
> +#define ACCESS_ONCE(x) (*__ACCESS_ONCE(x))
Might be better for this to be placed in common-macros.h?
> +
> /* GDT selector values. */
> #define SEL_CODE16 0x0008
> #define SEL_DATA16 0x0010
> diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
> index 84911f3ebcb4..6c005f0b0b38 100644
> --- a/xen/arch/x86/include/asm/hvm/hvm.h
> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
> @@ -16,6 +16,7 @@
> #include <asm/current.h>
> #include <asm/x86_emulate.h>
> #include <asm/hvm/asid.h>
> +#include <asm/hvm/vlapic.h>
Is this a stray change? Otherwise I don't see why this need to be
part of this patch when the rest of the changes are exclusively to
hvmloader.
Thanks, Roger.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 6/8] xen/lib: Add topology generator for x86
2024-05-08 12:39 ` [PATCH v2 6/8] xen/lib: Add topology generator for x86 Alejandro Vallejo
@ 2024-05-23 16:50 ` Roger Pau Monné
2024-05-28 13:28 ` Alejandro Vallejo
0 siblings, 1 reply; 43+ messages in thread
From: Roger Pau Monné @ 2024-05-23 16:50 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD
On Wed, May 08, 2024 at 01:39:25PM +0100, Alejandro Vallejo wrote:
> Add a helper to populate topology leaves in the cpu policy from
> threads/core and cores/package counts.
>
> No functional change, as it's not connected to anything yet.
There is a functional change in test-cpu-policy.c.
Maybe the commit message needs to be updated to reflect the added
testing to test-cpu-policy.c using the newly introduced helper to
generate topologies?
>
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> ---
> v2:
> * New patch. Extracted from v1/patch6
> ---
> tools/tests/cpu-policy/test-cpu-policy.c | 128 +++++++++++++++++++++++
> xen/include/xen/lib/x86/cpu-policy.h | 16 +++
> xen/lib/x86/policy.c | 86 +++++++++++++++
> 3 files changed, 230 insertions(+)
>
> diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c
> index 301df2c00285..0ba8c418b1b3 100644
> --- a/tools/tests/cpu-policy/test-cpu-policy.c
> +++ b/tools/tests/cpu-policy/test-cpu-policy.c
> @@ -650,6 +650,132 @@ static void test_is_compatible_failure(void)
> }
> }
>
> +static void test_topo_from_parts(void)
> +{
> + static const struct test {
> + unsigned int threads_per_core;
> + unsigned int cores_per_pkg;
> + struct cpu_policy policy;
> + } tests[] = {
> + {
> + .threads_per_core = 3, .cores_per_pkg = 1,
> + .policy = {
> + .x86_vendor = X86_VENDOR_AMD,
> + .topo.subleaf = {
> + [0] = { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
> + [1] = { .nr_logical = 1, .level = 1, .type = 2, .id_shift = 2, },
> + },
> + },
> + },
> + {
> + .threads_per_core = 1, .cores_per_pkg = 3,
> + .policy = {
> + .x86_vendor = X86_VENDOR_AMD,
> + .topo.subleaf = {
> + [0] = { .nr_logical = 1, .level = 0, .type = 1, .id_shift = 0, },
> + [1] = { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, },
> + },
> + },
> + },
> + {
> + .threads_per_core = 7, .cores_per_pkg = 5,
> + .policy = {
> + .x86_vendor = X86_VENDOR_AMD,
> + .topo.subleaf = {
> + [0] = { .nr_logical = 7, .level = 0, .type = 1, .id_shift = 3, },
> + [1] = { .nr_logical = 5, .level = 1, .type = 2, .id_shift = 6, },
> + },
> + },
> + },
> + {
> + .threads_per_core = 2, .cores_per_pkg = 128,
> + .policy = {
> + .x86_vendor = X86_VENDOR_AMD,
> + .topo.subleaf = {
> + [0] = { .nr_logical = 2, .level = 0, .type = 1, .id_shift = 1, },
> + [1] = { .nr_logical = 128, .level = 1, .type = 2, .id_shift = 8, },
> + },
> + },
> + },
> + {
> + .threads_per_core = 3, .cores_per_pkg = 1,
> + .policy = {
> + .x86_vendor = X86_VENDOR_INTEL,
> + .topo.subleaf = {
> + [0] = { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
> + [1] = { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, },
> + },
> + },
> + },
> + {
> + .threads_per_core = 1, .cores_per_pkg = 3,
> + .policy = {
> + .x86_vendor = X86_VENDOR_INTEL,
> + .topo.subleaf = {
> + [0] = { .nr_logical = 1, .level = 0, .type = 1, .id_shift = 0, },
> + [1] = { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, },
> + },
> + },
> + },
> + {
> + .threads_per_core = 7, .cores_per_pkg = 5,
> + .policy = {
> + .x86_vendor = X86_VENDOR_INTEL,
> + .topo.subleaf = {
> + [0] = { .nr_logical = 7, .level = 0, .type = 1, .id_shift = 3, },
> + [1] = { .nr_logical = 35, .level = 1, .type = 2, .id_shift = 6, },
> + },
> + },
> + },
> + {
> + .threads_per_core = 2, .cores_per_pkg = 128,
> + .policy = {
> + .x86_vendor = X86_VENDOR_INTEL,
> + .topo.subleaf = {
> + [0] = { .nr_logical = 2, .level = 0, .type = 1, .id_shift = 1, },
> + [1] = { .nr_logical = 256, .level = 1, .type = 2, .id_shift = 8, },
You don't need the array index in the initialization:
.topo.subleaf = {
{ .nr_logical = 2, .level = 0, .type = 1, .id_shift = 1, },
{ .nr_logical = 256, .level = 1, .type = 2,
.id_shift = 8, },
}
And lines should be limited to 80 columns if possible.
> + },
> + },
> + },
> + };
> +
> + printf("Testing topology synthesis from parts:\n");
> +
> + for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i )
> + {
> + const struct test *t = &tests[i];
> + struct cpu_policy actual = { .x86_vendor = t->policy.x86_vendor };
> + int rc = x86_topo_from_parts(&actual, t->threads_per_core, t->cores_per_pkg);
> +
> + if ( rc || memcmp(&actual.topo, &t->policy.topo, sizeof(actual.topo)) )
> + {
> +#define TOPO(n) topo.subleaf[(n)]
> + fail("FAIL[%d] - '%s %u t/c, %u c/p'\n",
> + rc,
> + x86_cpuid_vendor_to_str(t->policy.x86_vendor),
> + t->threads_per_core, t->cores_per_pkg);
> + printf(" subleaf=%u expected_n=%u actual_n=%u\n"
> + " expected_lvl=%u actual_lvl=%u\n"
> + " expected_type=%u actual_type=%u\n"
> + " expected_shift=%u actual_shift=%u\n",
> + 0, t->policy.TOPO(0).nr_logical, actual.TOPO(0).nr_logical,
> + t->policy.TOPO(0).level, actual.TOPO(0).level,
> + t->policy.TOPO(0).type, actual.TOPO(0).type,
> + t->policy.TOPO(0).id_shift, actual.TOPO(0).id_shift);
> +
> + printf(" subleaf=%u expected_n=%u actual_n=%u\n"
> + " expected_lvl=%u actual_lvl=%u\n"
> + " expected_type=%u actual_type=%u\n"
> + " expected_shift=%u actual_shift=%u\n",
> + 1, t->policy.TOPO(1).nr_logical, actual.TOPO(1).nr_logical,
> + t->policy.TOPO(1).level, actual.TOPO(1).level,
> + t->policy.TOPO(1).type, actual.TOPO(1).type,
> + t->policy.TOPO(1).id_shift, actual.TOPO(1).id_shift);
> +#undef TOPO
Seeing the usage of the macro, maybe you could even do something like:
TOPO(n, f) t->policy.topo.subleaf[(n)].f, actual.topo.subleaf[(n)].f
This will limit a bit the repetition of the "t->policy..., actual..."
tuple.
> + }
> + }
> +}
> +
> int main(int argc, char **argv)
> {
> printf("CPU Policy unit tests\n");
> @@ -667,6 +793,8 @@ int main(int argc, char **argv)
> test_is_compatible_success();
> test_is_compatible_failure();
>
> + test_topo_from_parts();
> +
> if ( nr_failures )
> printf("Done: %u failures\n", nr_failures);
> else
> diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
> index 392320b9adbe..f5df18e9f77c 100644
> --- a/xen/include/xen/lib/x86/cpu-policy.h
> +++ b/xen/include/xen/lib/x86/cpu-policy.h
> @@ -551,6 +551,22 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
> */
> uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id);
>
> +/**
> + * Synthesise topology information in `p` given high-level constraints
> + *
> + * Topology is given in various fields accross several leaves, some of
> + * which are vendor-specific. This function uses the policy itself to
> + * derive such leaves from threads/core and cores/package.
> + *
> + * @param p CPU policy of the domain.
> + * @param threads_per_core threads/core. Doesn't need to be a power of 2.
> + * @param cores_per_package cores/package. Doesn't need to be a power of 2.
> + * @return 0 on success; -errno on failure
> + */
> +int x86_topo_from_parts(struct cpu_policy *p,
> + unsigned int threads_per_core,
> + unsigned int cores_per_pkg);
> +
> #endif /* !XEN_LIB_X86_POLICIES_H */
>
> /*
> diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
> index 4cef658feeb8..d033ee5398dd 100644
> --- a/xen/lib/x86/policy.c
> +++ b/xen/lib/x86/policy.c
> @@ -13,6 +13,92 @@ uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
> return vcpu_id * 2;
> }
>
> +static unsigned int order(unsigned int n)
> +{
> + return 8 * sizeof(n) - __builtin_clz(n);
Do we need to assert that n is not 0, otherwise the return of
__builtin_clz() is undefined.
I think the usage below doesn't pass 0 to __builtin_clz() in any case,
but better add the check IMO.
Is __builtin_clz() also available in all versions of GCC and CLANG
that we support? I have no idea when this was introduced.
> +}
> +
> +int x86_topo_from_parts(struct cpu_policy *p,
> + unsigned int threads_per_core,
> + unsigned int cores_per_pkg)
> +{
> + unsigned int threads_per_pkg = threads_per_core * cores_per_pkg;
> + unsigned int apic_id_size;
> +
> + if ( !p || !threads_per_core || !cores_per_pkg )
> + return -EINVAL;
> +
> + p->basic.max_leaf = MAX(0xb, p->basic.max_leaf);
> +
> + memset(p->topo.raw, 0, sizeof(p->topo.raw));
> +
> + /* thread level */
> + p->topo.subleaf[0].nr_logical = threads_per_core;
> + p->topo.subleaf[0].id_shift = 0;
> + p->topo.subleaf[0].level = 0;
> + p->topo.subleaf[0].type = 1;
> + if ( threads_per_core > 1 )
> + p->topo.subleaf[0].id_shift = order(threads_per_core - 1);
> +
> + /* core level */
> + p->topo.subleaf[1].nr_logical = cores_per_pkg;
> + if ( p->x86_vendor == X86_VENDOR_INTEL )
> + p->topo.subleaf[1].nr_logical = threads_per_pkg;
> + p->topo.subleaf[1].id_shift = p->topo.subleaf[0].id_shift;
> + p->topo.subleaf[1].level = 1;
> + p->topo.subleaf[1].type = 2;
> + if ( cores_per_pkg > 1 )
> + p->topo.subleaf[1].id_shift += order(cores_per_pkg - 1);
> +
> + apic_id_size = p->topo.subleaf[1].id_shift;
> +
> + /*
> + * Contrary to what the name might seem to imply. HTT is an enabler for
> + * SMP and there's no harm in setting it even with a single vCPU.
> + */
> + p->basic.htt = true;
> + p->basic.lppp = MIN(0xff, p->basic.lppp);
> +
> + switch ( p->x86_vendor )
> + {
> + case X86_VENDOR_INTEL: {
> + struct cpuid_cache_leaf *sl = p->cache.subleaf;
Newline please.
> + for ( size_t i = 0; sl->type &&
> + i < ARRAY_SIZE(p->cache.raw); i++, sl++ )
> + {
> + sl->cores_per_package = cores_per_pkg - 1;
> + sl->threads_per_cache = threads_per_core - 1;
> + if ( sl->type == 3 /* unified cache */ )
> + sl->threads_per_cache = threads_per_pkg - 1;
> + }
> + break;
> + }
Newline here also.
> + case X86_VENDOR_AMD:
> + case X86_VENDOR_HYGON:
> + /* Expose p->basic.lppp */
> + p->extd.cmp_legacy = true;
> +
> + /* Clip NC to the maximum value it can hold */
> + p->extd.nc = 0xff;
> + if ( threads_per_pkg <= 0xff )
> + p->extd.nc = threads_per_pkg - 1;
> +
> + /* TODO: Expose leaf e1E */
> + p->extd.topoext = false;
> +
> + /*
> + * Clip APIC ID to 8 bits, as that's what high core-count machines do
Overly long line? And missing full stop.
> + *
> + * That what AMD EPYC 9654 does with >256 CPUs
^ That's
Thanks, Roger.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 3/8] x86/vlapic: Move lapic_load_hidden migration checks to the check hook
2024-05-08 12:39 ` [PATCH v2 3/8] x86/vlapic: Move lapic_load_hidden migration checks to the check hook Alejandro Vallejo
2024-05-23 14:50 ` Roger Pau Monné
@ 2024-05-23 18:58 ` Andrew Cooper
2024-05-24 7:22 ` Roger Pau Monné
1 sibling, 1 reply; 43+ messages in thread
From: Andrew Cooper @ 2024-05-23 18:58 UTC (permalink / raw)
To: Alejandro Vallejo, Xen-devel; +Cc: Jan Beulich, Roger Pau Monné
On 08/05/2024 1:39 pm, Alejandro Vallejo wrote:
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index 8a244100009c..2f06bff1b2cc 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -1573,35 +1573,54 @@ static void lapic_load_fixup(struct vlapic *vlapic)
> v, vlapic->loaded.id, vlapic->loaded.ldr, good_ldr);
> }
>
> -static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
> +static int cf_check lapic_check_hidden(const struct domain *d,
> + hvm_domain_context_t *h)
> {
> unsigned int vcpuid = hvm_load_instance(h);
> - struct vcpu *v;
> - struct vlapic *s;
> + struct hvm_hw_lapic s;
>
> if ( !has_vlapic(d) )
> return -ENODEV;
>
> /* Which vlapic to load? */
> - if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
> + if ( vcpuid >= d->max_vcpus || d->vcpu[vcpuid] == NULL )
As you're editing this anyway, swap for
if ( !domain_vcpu(d, vcpuid) )
please.
> {
> dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no apic%u\n",
> d->domain_id, vcpuid);
> return -EINVAL;
> }
> - s = vcpu_vlapic(v);
>
> - if ( hvm_load_entry_zeroextend(LAPIC, h, &s->hw) != 0 )
> + if ( hvm_load_entry_zeroextend(LAPIC, h, &s) )
> + return -ENODATA;
> +
> + /* EN=0 with EXTD=1 is illegal */
> + if ( (s.apic_base_msr & (APIC_BASE_ENABLE | APIC_BASE_EXTD)) ==
> + APIC_BASE_EXTD )
> + return -EINVAL;
This is very insufficient auditing for the incoming value, but it turns
out that there's no nice logic for this at all.
As it's just a less obfuscated form of the logic from
lapic_load_hidden(), it's probably fine to stay as it is for now.
The major changes since this logic was written originally are that the
CPU policy correct (so we can reject EXTD on VMs which can't see
x2apic), and that we now prohibit VMs moving the xAPIC MMIO window away
from its default location (as this would require per-vCPU P2Ms in order
to virtualise properly.)
~Andrew
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 5/8] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves
2024-05-08 12:39 ` [PATCH v2 5/8] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves Alejandro Vallejo
2024-05-23 16:13 ` Roger Pau Monné
@ 2024-05-24 7:21 ` Roger Pau Monné
2024-05-24 15:15 ` Alejandro Vallejo
1 sibling, 1 reply; 43+ messages in thread
From: Roger Pau Monné @ 2024-05-24 7:21 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD
On Wed, May 08, 2024 at 01:39:24PM +0100, Alejandro Vallejo wrote:
> Make it so the APs expose their own APIC IDs in a LUT. We can use that LUT to
> populate the MADT, decoupling the algorithm that relates CPU IDs and APIC IDs
> from hvmloader.
>
> While at this also remove ap_callin, as writing the APIC ID may serve the same
> purpose.
>
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> ---
> v2:
> * New patch. Replaces adding cpu policy to hvmloader in v1.
> ---
> tools/firmware/hvmloader/config.h | 6 ++++-
> tools/firmware/hvmloader/hvmloader.c | 4 +--
> tools/firmware/hvmloader/smp.c | 40 +++++++++++++++++++++++-----
> tools/firmware/hvmloader/util.h | 5 ++++
> xen/arch/x86/include/asm/hvm/hvm.h | 1 +
> 5 files changed, 47 insertions(+), 9 deletions(-)
>
> diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
> index c82adf6dc508..edf6fa9c908c 100644
> --- a/tools/firmware/hvmloader/config.h
> +++ b/tools/firmware/hvmloader/config.h
> @@ -4,6 +4,8 @@
> #include <stdint.h>
> #include <stdbool.h>
>
> +#include <xen/hvm/hvm_info_table.h>
> +
> enum virtual_vga { VGA_none, VGA_std, VGA_cirrus, VGA_pt };
> extern enum virtual_vga virtual_vga;
>
> @@ -49,8 +51,10 @@ extern uint8_t ioapic_version;
>
> #define IOAPIC_ID 0x01
>
> +extern uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
> +
> #define LAPIC_BASE_ADDRESS 0xfee00000
> -#define LAPIC_ID(vcpu_id) ((vcpu_id) * 2)
> +#define LAPIC_ID(vcpu_id) (CPU_TO_X2APICID[(vcpu_id)])
>
> #define PCI_ISA_DEVFN 0x08 /* dev 1, fn 0 */
> #define PCI_ISA_IRQ_MASK 0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
> diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
> index c58841e5b556..1eba92229925 100644
> --- a/tools/firmware/hvmloader/hvmloader.c
> +++ b/tools/firmware/hvmloader/hvmloader.c
> @@ -342,11 +342,11 @@ int main(void)
>
> printf("CPU speed is %u MHz\n", get_cpu_mhz());
>
> + smp_initialise();
> +
> apic_setup();
> pci_setup();
>
> - smp_initialise();
> -
> perform_tests();
>
> if ( bios->bios_info_setup )
> diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c
> index a668f15d7e1f..4d75f239c2f5 100644
> --- a/tools/firmware/hvmloader/smp.c
> +++ b/tools/firmware/hvmloader/smp.c
> @@ -29,7 +29,34 @@
>
> #include <xen/vcpu.h>
>
> -static int ap_callin, ap_cpuid;
> +static int ap_cpuid;
> +
> +/**
> + * Lookup table of x2APIC IDs.
> + *
> + * Each entry is populated its respective CPU as they come online. This is required
> + * for generating the MADT with minimal assumptions about ID relationships.
> + */
> +uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
> +
> +static uint32_t read_apic_id(void)
> +{
> + uint32_t apic_id;
> +
> + cpuid(1, NULL, &apic_id, NULL, NULL);
> + apic_id >>= 24;
> +
> + /*
> + * APIC IDs over 255 are represented by 255 in leaf 1 and are meant to be
> + * read from topology leaves instead. Xen exposes x2APIC IDs in leaf 0xb,
> + * but only if the x2APIC feature is present. If there are that many CPUs
> + * it's guaranteed to be there so we can avoid checking for it specifically
> + */
> + if ( apic_id == 255 )
> + cpuid(0xb, NULL, NULL, NULL, &apic_id);
> +
> + return apic_id;
> +}
>
> static void ap_start(void)
> {
> @@ -37,12 +64,12 @@ static void ap_start(void)
> cacheattr_init();
> printf("done.\n");
>
> + wmb();
> + ACCESS_ONCE(CPU_TO_X2APICID[ap_cpuid]) = read_apic_id();
Further thinking about this: do we really need the wmb(), given the
usage of ACCESS_ONCE()?
wmb() is a compiler barrier, and the usage of volatile in
ACCESS_ONCE() should already prevent any compiler re-ordering.
Thanks, Roger.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 3/8] x86/vlapic: Move lapic_load_hidden migration checks to the check hook
2024-05-23 18:58 ` Andrew Cooper
@ 2024-05-24 7:22 ` Roger Pau Monné
0 siblings, 0 replies; 43+ messages in thread
From: Roger Pau Monné @ 2024-05-24 7:22 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Alejandro Vallejo, Xen-devel, Jan Beulich
On Thu, May 23, 2024 at 07:58:57PM +0100, Andrew Cooper wrote:
> On 08/05/2024 1:39 pm, Alejandro Vallejo wrote:
> > diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> > index 8a244100009c..2f06bff1b2cc 100644
> > --- a/xen/arch/x86/hvm/vlapic.c
> > +++ b/xen/arch/x86/hvm/vlapic.c
> > @@ -1573,35 +1573,54 @@ static void lapic_load_fixup(struct vlapic *vlapic)
> > v, vlapic->loaded.id, vlapic->loaded.ldr, good_ldr);
> > }
> >
> > -static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
> > +static int cf_check lapic_check_hidden(const struct domain *d,
> > + hvm_domain_context_t *h)
> > {
> > unsigned int vcpuid = hvm_load_instance(h);
> > - struct vcpu *v;
> > - struct vlapic *s;
> > + struct hvm_hw_lapic s;
> >
> > if ( !has_vlapic(d) )
> > return -ENODEV;
> >
> > /* Which vlapic to load? */
> > - if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
> > + if ( vcpuid >= d->max_vcpus || d->vcpu[vcpuid] == NULL )
>
> As you're editing this anyway, swap for
>
> if ( !domain_vcpu(d, vcpuid) )
>
> please.
>
> > {
> > dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no apic%u\n",
> > d->domain_id, vcpuid);
> > return -EINVAL;
> > }
> > - s = vcpu_vlapic(v);
> >
> > - if ( hvm_load_entry_zeroextend(LAPIC, h, &s->hw) != 0 )
> > + if ( hvm_load_entry_zeroextend(LAPIC, h, &s) )
> > + return -ENODATA;
> > +
> > + /* EN=0 with EXTD=1 is illegal */
> > + if ( (s.apic_base_msr & (APIC_BASE_ENABLE | APIC_BASE_EXTD)) ==
> > + APIC_BASE_EXTD )
> > + return -EINVAL;
>
> This is very insufficient auditing for the incoming value, but it turns
> out that there's no nice logic for this at all.
>
> As it's just a less obfuscated form of the logic from
> lapic_load_hidden(), it's probably fine to stay as it is for now.
>
> The major changes since this logic was written originally are that the
> CPU policy correct (so we can reject EXTD on VMs which can't see
> x2apic), and that we now prohibit VMs moving the xAPIC MMIO window away
> from its default location (as this would require per-vCPU P2Ms in order
> to virtualise properly.)
Since this is just migration of the existing checks I think keeping
them as-is is best. Adding new checks should be done in a followup
patch.
Thanks, Roger.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 7/8] xen/x86: Derive topologically correct x2APIC IDs from the policy
2024-05-08 12:39 ` [PATCH v2 7/8] xen/x86: Derive topologically correct x2APIC IDs from the policy Alejandro Vallejo
@ 2024-05-24 8:39 ` Roger Pau Monné
2024-05-24 17:03 ` Alejandro Vallejo
0 siblings, 1 reply; 43+ messages in thread
From: Roger Pau Monné @ 2024-05-24 8:39 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD
On Wed, May 08, 2024 at 01:39:26PM +0100, Alejandro Vallejo wrote:
> Implements the helper for mapping vcpu_id to x2apic_id given a valid
> topology in a policy. The algo is written with the intention of extending
> it to leaves 0x1f and e26 in the future.
Using 0x1f and e26 is kind of confusing. I would word as "0x1f and
extended leaf 0x26" to avoid confusion.
>
> Toolstack doesn't set leaf 0xb and the HVM default policy has it cleared,
> so the leaf is not implemented. In that case, the new helper just returns
> the legacy mapping.
>
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> ---
> v2:
> * const-ify the test definitions
> * Cosmetic changes (newline + parameter name in prototype)
> ---
> tools/tests/cpu-policy/test-cpu-policy.c | 63 ++++++++++++++++++++
> xen/include/xen/lib/x86/cpu-policy.h | 2 +
> xen/lib/x86/policy.c | 73 ++++++++++++++++++++++--
> 3 files changed, 133 insertions(+), 5 deletions(-)
>
> diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c
> index 0ba8c418b1b3..82a6aeb23317 100644
> --- a/tools/tests/cpu-policy/test-cpu-policy.c
> +++ b/tools/tests/cpu-policy/test-cpu-policy.c
> @@ -776,6 +776,68 @@ static void test_topo_from_parts(void)
> }
> }
>
> +static void test_x2apic_id_from_vcpu_id_success(void)
> +{
> + static const struct test {
> + unsigned int vcpu_id;
> + unsigned int threads_per_core;
> + unsigned int cores_per_pkg;
> + uint32_t x2apic_id;
> + uint8_t x86_vendor;
> + } tests[] = {
> + {
> + .vcpu_id = 3, .threads_per_core = 3, .cores_per_pkg = 8,
> + .x2apic_id = 1 << 2,
> + },
> + {
> + .vcpu_id = 6, .threads_per_core = 3, .cores_per_pkg = 8,
> + .x2apic_id = 2 << 2,
> + },
> + {
> + .vcpu_id = 24, .threads_per_core = 3, .cores_per_pkg = 8,
> + .x2apic_id = 1 << 5,
> + },
> + {
> + .vcpu_id = 35, .threads_per_core = 3, .cores_per_pkg = 8,
> + .x2apic_id = (35 % 3) | (((35 / 3) % 8) << 2) | ((35 / 24) << 5),
> + },
> + {
> + .vcpu_id = 96, .threads_per_core = 7, .cores_per_pkg = 3,
> + .x2apic_id = (96 % 7) | (((96 / 7) % 3) << 3) | ((96 / 21) << 5),
^ extra space (same above)
> + },
> + };
> +
> + const uint8_t vendors[] = {
> + X86_VENDOR_INTEL,
> + X86_VENDOR_AMD,
> + X86_VENDOR_CENTAUR,
> + X86_VENDOR_SHANGHAI,
> + X86_VENDOR_HYGON,
> + };
> +
> + printf("Testing x2apic id from vcpu id success:\n");
> +
> + /* Perform the test run on every vendor we know about */
> + for ( size_t i = 0; i < ARRAY_SIZE(vendors); ++i )
> + {
> + struct cpu_policy policy = { .x86_vendor = vendors[i] };
Newline.
> + for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i )
> + {
> + const struct test *t = &tests[i];
> + uint32_t x2apic_id;
> + int rc = x86_topo_from_parts(&policy, t->threads_per_core, t->cores_per_pkg);
Overly long line.
Won't it be better to define `policy` in this scope, so that for each
test you start with a clean policy, rather than having leftover data
from the previous test?
Also you could initialize x2apic_id at definition:
const struct test *t = &tests[j];
struct cpu_policy policy = { .x86_vendor = vendors[i] };
int rc = x86_topo_from_parts(&policy, t->threads_per_core, t->cores_per_pkg);
uint32_t x2apic_id = x86_x2apic_id_from_vcpu_id(&policy, t->vcpu_id);
> +
> + x2apic_id = x86_x2apic_id_from_vcpu_id(&policy, t->vcpu_id);
> + if ( rc || x2apic_id != t->x2apic_id )
> + fail("FAIL[%d] - '%s cpu%u %u t/c %u c/p'. bad x2apic_id: expected=%u actual=%u\n",
> + rc,
> + x86_cpuid_vendor_to_str(policy.x86_vendor),
> + t->vcpu_id, t->threads_per_core, t->cores_per_pkg,
> + t->x2apic_id, x2apic_id);
> + }
> + }
> +}
> +
> int main(int argc, char **argv)
> {
> printf("CPU Policy unit tests\n");
> @@ -794,6 +856,7 @@ int main(int argc, char **argv)
> test_is_compatible_failure();
>
> test_topo_from_parts();
> + test_x2apic_id_from_vcpu_id_success();
>
> if ( nr_failures )
> printf("Done: %u failures\n", nr_failures);
> diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
> index f5df18e9f77c..2cbc2726a861 100644
> --- a/xen/include/xen/lib/x86/cpu-policy.h
> +++ b/xen/include/xen/lib/x86/cpu-policy.h
> @@ -545,6 +545,8 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
> /**
> * Calculates the x2APIC ID of a vCPU given a CPU policy
> *
> + * If the policy lacks leaf 0xb falls back to legacy mapping of apic_id=cpu*2
> + *
> * @param p CPU policy of the domain.
> * @param id vCPU ID of the vCPU.
> * @returns x2APIC ID of the vCPU.
> diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
> index d033ee5398dd..e498e32f8fd7 100644
> --- a/xen/lib/x86/policy.c
> +++ b/xen/lib/x86/policy.c
> @@ -2,15 +2,78 @@
>
> #include <xen/lib/x86/cpu-policy.h>
>
> +static uint32_t parts_per_higher_scoped_level(const struct cpu_policy *p, size_t lvl)
> +{
> + /*
> + * `nr_logical` reported by Intel is the number of THREADS contained in
> + * the next topological scope. For example, assuming a system with 2
> + * threads/core and 3 cores/module in a fully symmetric topology,
> + * `nr_logical` at the core level will report 6. Because it's reporting
> + * the number of threads in a module.
> + *
> + * On AMD/Hygon, nr_logical is already normalized by the higher scoped
> + * level (cores/complex, etc) so we can return it as-is.
> + */
> + if ( p->x86_vendor != X86_VENDOR_INTEL || !lvl )
> + return p->topo.subleaf[lvl].nr_logical;
> +
> + return p->topo.subleaf[lvl].nr_logical / p->topo.subleaf[lvl - 1].nr_logical;
Line length here and in the function declaration.
> +}
> +
> uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
> {
> + uint32_t shift = 0, x2apic_id = 0;
> +
> + /* In the absence of topology leaves, fallback to traditional mapping */
> + if ( !p->topo.subleaf[0].type )
> + return id * 2;
> +
> /*
> - * TODO: Derive x2APIC ID from the topology information inside `p`
> - * rather than from vCPU ID. This bodge is a temporary measure
> - * until all infra is in place to retrieve or derive the initial
> - * x2APIC ID from migrated domains.
I'm a bit confused with this, the policy is domain wide, so we will
always need to pass the vCPU ID into x86_x2apic_id_from_vcpu_id()?
IOW: the x2APIC ID will always be derived from the vCPU ID.
Thanks, Roger.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 8/8] xen/x86: Synthesise domain topologies
2024-05-08 12:39 ` [PATCH v2 8/8] xen/x86: Synthesise domain topologies Alejandro Vallejo
@ 2024-05-24 8:58 ` Roger Pau Monné
2024-05-24 17:16 ` Alejandro Vallejo
0 siblings, 1 reply; 43+ messages in thread
From: Roger Pau Monné @ 2024-05-24 8:58 UTC (permalink / raw)
To: Alejandro Vallejo
Cc: Xen-devel, Anthony PERARD, Juergen Gross, Jan Beulich,
Andrew Cooper
On Wed, May 08, 2024 at 01:39:27PM +0100, Alejandro Vallejo wrote:
> Expose sensible topologies in leaf 0xb. At the moment it synthesises non-HT
> systems, in line with the previous code intent.
>
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> ---
> v2:
> * Zap the topology leaves of (pv/hvm)_(def/max)_policy rather than the host policy
> ---
> tools/libs/guest/xg_cpuid_x86.c | 62 +++++----------------------------
> xen/arch/x86/cpu-policy.c | 9 +++--
> 2 files changed, 15 insertions(+), 56 deletions(-)
>
> diff --git a/tools/libs/guest/xg_cpuid_x86.c b/tools/libs/guest/xg_cpuid_x86.c
> index 4453178100ad..8170769dbe43 100644
> --- a/tools/libs/guest/xg_cpuid_x86.c
> +++ b/tools/libs/guest/xg_cpuid_x86.c
> @@ -584,7 +584,7 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
> bool hvm;
> xc_domaininfo_t di;
> struct xc_cpu_policy *p = xc_cpu_policy_init();
> - unsigned int i, nr_leaves = ARRAY_SIZE(p->leaves), nr_msrs = 0;
> + unsigned int nr_leaves = ARRAY_SIZE(p->leaves), nr_msrs = 0;
> uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
> uint32_t host_featureset[FEATURESET_NR_ENTRIES] = {};
> uint32_t len = ARRAY_SIZE(host_featureset);
> @@ -727,59 +727,15 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
> }
> else
> {
> - /*
> - * Topology for HVM guests is entirely controlled by Xen. For now, we
> - * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
> - */
> - p->policy.basic.htt = true;
> - p->policy.extd.cmp_legacy = false;
> -
> - /*
> - * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package.
> - * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
> - * overflow.
> - */
> - if ( !p->policy.basic.lppp )
> - p->policy.basic.lppp = 2;
> - else if ( !(p->policy.basic.lppp & 0x80) )
> - p->policy.basic.lppp *= 2;
> -
> - switch ( p->policy.x86_vendor )
> + /* TODO: Expose the ability to choose a custom topology for HVM/PVH */
> + unsigned int threads_per_core = 1;
> + unsigned int cores_per_pkg = di.max_vcpu_id + 1;
Newline.
> + rc = x86_topo_from_parts(&p->policy, threads_per_core, cores_per_pkg);
I assume this generates the same topology as the current code, or will
the population of the leaves be different in some way?
> + if ( rc )
> {
> - case X86_VENDOR_INTEL:
> - for ( i = 0; (p->policy.cache.subleaf[i].type &&
> - i < ARRAY_SIZE(p->policy.cache.raw)); ++i )
> - {
> - p->policy.cache.subleaf[i].cores_per_package =
> - (p->policy.cache.subleaf[i].cores_per_package << 1) | 1;
> - p->policy.cache.subleaf[i].threads_per_cache = 0;
> - }
> - break;
> -
> - case X86_VENDOR_AMD:
> - case X86_VENDOR_HYGON:
> - /*
> - * Leaf 0x80000008 ECX[15:12] is ApicIdCoreSize.
> - * Leaf 0x80000008 ECX[7:0] is NumberOfCores (minus one).
> - * Update to reflect vLAPIC_ID = vCPU_ID * 2. But avoid
> - * - overflow,
> - * - going out of sync with leaf 1 EBX[23:16],
> - * - incrementing ApicIdCoreSize when it's zero (which changes the
> - * meaning of bits 7:0).
> - *
> - * UPDATE: I addition to avoiding overflow, some
> - * proprietary operating systems have trouble with
> - * apic_id_size values greater than 7. Limit the value to
> - * 7 for now.
> - */
> - if ( p->policy.extd.nc < 0x7f )
> - {
> - if ( p->policy.extd.apic_id_size != 0 && p->policy.extd.apic_id_size < 0x7 )
> - p->policy.extd.apic_id_size++;
> -
> - p->policy.extd.nc = (p->policy.extd.nc << 1) | 1;
> - }
> - break;
> + ERROR("Failed to generate topology: t/c=%u c/p=%u",
> + threads_per_core, cores_per_pkg);
Could you also print the error code?
> + goto out;
> }
> }
>
> diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
> index 4b6d96276399..0ad871732ba0 100644
> --- a/xen/arch/x86/cpu-policy.c
> +++ b/xen/arch/x86/cpu-policy.c
> @@ -278,9 +278,6 @@ static void recalculate_misc(struct cpu_policy *p)
>
> p->basic.raw[0x8] = EMPTY_LEAF;
>
> - /* TODO: Rework topology logic. */
> - memset(p->topo.raw, 0, sizeof(p->topo.raw));
> -
> p->basic.raw[0xc] = EMPTY_LEAF;
>
> p->extd.e1d &= ~CPUID_COMMON_1D_FEATURES;
> @@ -621,6 +618,9 @@ static void __init calculate_pv_max_policy(void)
> recalculate_xstate(p);
>
> p->extd.raw[0xa] = EMPTY_LEAF; /* No SVM for PV guests. */
> +
> + /* Wipe host topology. Toolstack is expected to synthesise a sensible one */
> + memset(p->topo.raw, 0, sizeof(p->topo.raw));
> }
>
> static void __init calculate_pv_def_policy(void)
> @@ -773,6 +773,9 @@ static void __init calculate_hvm_max_policy(void)
>
> /* It's always possible to emulate CPUID faulting for HVM guests */
> p->platform_info.cpuid_faulting = true;
> +
> + /* Wipe host topology. Toolstack is expected to synthesise a sensible one */
Line length.
/* Wipe host topology. Populated by toolstack. */
Would be OK IMO.
Thanks, Roger.
> + memset(p->topo.raw, 0, sizeof(p->topo.raw));
Note that currently the host policy also gets the topology leaves
cleared, is it intended to not clear them anymore after this change?
(as you only clear the leaves for the guest {max,def} policies)
Thanks, Roger.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/8] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
2024-05-23 14:32 ` Roger Pau Monné
@ 2024-05-24 10:58 ` Alejandro Vallejo
2024-05-24 11:15 ` Roger Pau Monné
0 siblings, 1 reply; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-24 10:58 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: Xen-devel, Jan Beulich, Andrew Cooper
On 23/05/2024 15:32, Roger Pau Monné wrote:
>> case 0xb:
>> - /*
>> - * In principle, this leaf is Intel-only. In practice, it is tightly
>> - * coupled with x2apic, and we offer an x2apic-capable APIC emulation
>> - * to guests on AMD hardware as well.
>> - *
>> - * TODO: Rework topology logic.
>> - */
>> - if ( p->basic.x2apic )
>> + /* Don't expose topology information to PV guests */
>
> Not sure whether we want to keep part of the comment about exposing
> x2APIC to guests even when x2APIC is not present in the host. I think
> this code has changed and the comment is kind of stale now.
The comment is definitely stale. Nowadays x2APIC is fully supported by
AMD, as is leaf 0xb. The fact we emulate the x2APIC seems hardly
relevant in a CPUID leaf about topology. I could keep a note showing...
/* Exposed alongside x2apic, as it's tightly coupled with it */
... although that's directly implied by the conditional.
>> +void vlapic_cpu_policy_changed(struct vcpu *v)
>> +{
>> + struct vlapic *vlapic = vcpu_vlapic(v);
>> + const struct cpu_policy *cp = v->domain->arch.cpu_policy;
>> +
>> + /*
>> + * Don't override the initial x2APIC ID if we have migrated it or
>> + * if the domain doesn't have vLAPIC at all.
>> + */
>> + if ( !has_vlapic(v->domain) || vlapic->loaded.hw )
>> + return;
>> +
>> + vlapic->hw.x2apic_id = x86_x2apic_id_from_vcpu_id(cp, v->vcpu_id);
>> + vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
>
> Nit: in case we decide to start APICs in x2APIC mode, might be good to
> take this into account here and use vlapic_x2apic_mode(vlapic) to
> select whether SET_xAPIC_ID() needs to be used or not:>
> vlapic_set_reg(vlapic, APIC_ID,
> vlapic_x2apic_mode(vlapic) ? vlapic->hw.x2apic_id
> : SET_xAPIC_ID(vlapic->hw.x2apic_id));
>
> Or similar.
>
I like it. Sure.
>> +}
>> +
>> int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
>> {
>> const struct cpu_policy *cp = v->domain->arch.cpu_policy;
>> @@ -1449,7 +1465,7 @@ void vlapic_reset(struct vlapic *vlapic)
>> if ( v->vcpu_id == 0 )
>> vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
>>
>> - vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
>> + vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
>> vlapic_do_init(vlapic);
>> }
>>
>> @@ -1514,6 +1530,16 @@ static void lapic_load_fixup(struct vlapic *vlapic)
>> const struct vcpu *v = vlapic_vcpu(vlapic);
>> uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
>>
>> + /*
>> + * Loading record without hw.x2apic_id in the save stream, calculate using
>> + * the traditional "vcpu_id * 2" relation. There's an implicit assumption
>> + * that vCPU0 always has x2APIC0, which is true for the old relation, and
>> + * still holds under the new x2APIC generation algorithm. While that case
>> + * goes through the conditional it's benign because it still maps to zero.
>> + */
>> + if ( !vlapic->hw.x2apic_id )
>> + vlapic->hw.x2apic_id = v->vcpu_id * 2;
>> +
>> /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */
>> if ( !vlapic_x2apic_mode(vlapic) ||
>> (vlapic->loaded.ldr == good_ldr) )
>> diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
>> index 0c9e6f15645d..e1f0585d75a9 100644
>> --- a/xen/arch/x86/include/asm/hvm/hvm.h
>> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
>> @@ -448,6 +448,7 @@ static inline void hvm_update_guest_efer(struct vcpu *v)
>> static inline void hvm_cpuid_policy_changed(struct vcpu *v)
>> {
>> alternative_vcall(hvm_funcs.cpuid_policy_changed, v);
>> + vlapic_cpu_policy_changed(v);
>
> Note sure whether this call would better be placed in
> cpu_policy_updated() inside the is_hvm_vcpu() conditional branch.
>
> hvm_cpuid_policy_changed() &c are just wrappers around the hvm_funcs
> hooks, pulling vlapic functions in there is likely to complicate the
> header dependencies in the long term.
>
That's how it was in v1 and I moved it in v2 answering one of Jan's
feedback points.
I don't mind either way.
>> }
>>
>> static inline void hvm_set_tsc_offset(struct vcpu *v, uint64_t offset,
>> diff --git a/xen/arch/x86/include/asm/hvm/vlapic.h b/xen/arch/x86/include/asm/hvm/vlapic.h
>> index 88ef94524339..e8d41313abd3 100644
>> --- a/xen/arch/x86/include/asm/hvm/vlapic.h
>> +++ b/xen/arch/x86/include/asm/hvm/vlapic.h
>> @@ -44,6 +44,7 @@
>> #define vlapic_xapic_mode(vlapic) \
>> (!vlapic_hw_disabled(vlapic) && \
>> !((vlapic)->hw.apic_base_msr & APIC_BASE_EXTD))
>> +#define vlapic_x2apic_id(vlapic) ((vlapic)->hw.x2apic_id)
>>
>> /*
>> * Generic APIC bitmap vector update & search routines.
>> @@ -107,6 +108,7 @@ int vlapic_ack_pending_irq(struct vcpu *v, int vector, bool force_ack);
>>
>> int vlapic_init(struct vcpu *v);
>> void vlapic_destroy(struct vcpu *v);
>> +void vlapic_cpu_policy_changed(struct vcpu *v);
>>
>> void vlapic_reset(struct vlapic *vlapic);
>>
>> diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
>> index 7ecacadde165..1c2ec669ffc9 100644
>> --- a/xen/include/public/arch-x86/hvm/save.h
>> +++ b/xen/include/public/arch-x86/hvm/save.h
>> @@ -394,6 +394,8 @@ struct hvm_hw_lapic {
>> uint32_t disabled; /* VLAPIC_xx_DISABLED */
>> uint32_t timer_divisor;
>> uint64_t tdt_msr;
>> + uint32_t x2apic_id;
>> + uint32_t rsvd_zero;
>
> I think Jan requested for this field to be checked to be 0 for
> incoming migrations, yet I don't see such check added.
>
It's on the next patch, when the checks are moved into the check hook.
Doing it all as part of the same patch seemed dirty. I guess I can
invert them.
>> };
>>
>> DECLARE_HVM_SAVE_TYPE(LAPIC, 5, struct hvm_hw_lapic);
>> diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
>> index d5e447e9dc06..392320b9adbe 100644
>> --- a/xen/include/xen/lib/x86/cpu-policy.h
>> +++ b/xen/include/xen/lib/x86/cpu-policy.h
>> @@ -542,6 +542,15 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
>> const struct cpu_policy *guest,
>> struct cpu_policy_errors *err);
>>
>> +/**
>> + * Calculates the x2APIC ID of a vCPU given a CPU policy
>> + *
>> + * @param p CPU policy of the domain.
>> + * @param id vCPU ID of the vCPU.
>> + * @returns x2APIC ID of the vCPU.
>> + */
>> +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id);
>> +
>> #endif /* !XEN_LIB_X86_POLICIES_H */
>>
>> /*
>> diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
>> index f033d22785be..4cef658feeb8 100644
>> --- a/xen/lib/x86/policy.c
>> +++ b/xen/lib/x86/policy.c
>> @@ -2,6 +2,17 @@
>>
>> #include <xen/lib/x86/cpu-policy.h>
>>
>> +uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
>> +{
>> + /*
>> + * TODO: Derive x2APIC ID from the topology information inside `p`
>> + * rather than from vCPU ID. This bodge is a temporary measure
>> + * until all infra is in place to retrieve or derive the initial
>> + * x2APIC ID from migrated domains.
>> + */
>> + return vcpu_id * 2;
>
> I don't think this builds?
>
> Note the parameter is plain 'id' not 'vcpu_id'.
>
> Thanks, Roger.
Bah. mid-patches after v1 don't get much attention. Ack.
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/8] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
2024-05-24 10:58 ` Alejandro Vallejo
@ 2024-05-24 11:15 ` Roger Pau Monné
0 siblings, 0 replies; 43+ messages in thread
From: Roger Pau Monné @ 2024-05-24 11:15 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Xen-devel, Jan Beulich, Andrew Cooper
On Fri, May 24, 2024 at 11:58:44AM +0100, Alejandro Vallejo wrote:
> On 23/05/2024 15:32, Roger Pau Monné wrote:
> >> case 0xb:
> >> - /*
> >> - * In principle, this leaf is Intel-only. In practice, it is tightly
> >> - * coupled with x2apic, and we offer an x2apic-capable APIC emulation
> >> - * to guests on AMD hardware as well.
> >> - *
> >> - * TODO: Rework topology logic.
> >> - */
> >> - if ( p->basic.x2apic )
> >> + /* Don't expose topology information to PV guests */
> >
> > Not sure whether we want to keep part of the comment about exposing
> > x2APIC to guests even when x2APIC is not present in the host. I think
> > this code has changed and the comment is kind of stale now.
>
> The comment is definitely stale. Nowadays x2APIC is fully supported by
> AMD, as is leaf 0xb. The fact we emulate the x2APIC seems hardly
> relevant in a CPUID leaf about topology. I could keep a note showing...
>
> /* Exposed alongside x2apic, as it's tightly coupled with it */
>
> ... although that's directly implied by the conditional.
Yeah, I haven't gone through the history of this file, but I bet at
some point before the introduction of CPUID policies we leaked (part
of) the host CPUID contents in here.
It's also no longer true that the leaf is Intel only.
I fine either adding your newly proposed comment, or leaving it as-is.
> >> +}
> >> +
> >> int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
> >> {
> >> const struct cpu_policy *cp = v->domain->arch.cpu_policy;
Le> >> @@ -1449,7 +1465,7 @@ void vlapic_reset(struct vlapic *vlapic)
> >> if ( v->vcpu_id == 0 )
> >> vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
> >>
> >> - vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
> >> + vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
> >> vlapic_do_init(vlapic);
> >> }
> >>
> >> @@ -1514,6 +1530,16 @@ static void lapic_load_fixup(struct vlapic *vlapic)
> >> const struct vcpu *v = vlapic_vcpu(vlapic);
> >> uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
> >>
> >> + /*
> >> + * Loading record without hw.x2apic_id in the save stream, calculate using
> >> + * the traditional "vcpu_id * 2" relation. There's an implicit assumption
> >> + * that vCPU0 always has x2APIC0, which is true for the old relation, and
> >> + * still holds under the new x2APIC generation algorithm. While that case
> >> + * goes through the conditional it's benign because it still maps to zero.
> >> + */
> >> + if ( !vlapic->hw.x2apic_id )
> >> + vlapic->hw.x2apic_id = v->vcpu_id * 2;
> >> +
> >> /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */
> >> if ( !vlapic_x2apic_mode(vlapic) ||
> >> (vlapic->loaded.ldr == good_ldr) )
> >> diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
> >> index 0c9e6f15645d..e1f0585d75a9 100644
> >> --- a/xen/arch/x86/include/asm/hvm/hvm.h
> >> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
> >> @@ -448,6 +448,7 @@ static inline void hvm_update_guest_efer(struct vcpu *v)
> >> static inline void hvm_cpuid_policy_changed(struct vcpu *v)
> >> {
> >> alternative_vcall(hvm_funcs.cpuid_policy_changed, v);
> >> + vlapic_cpu_policy_changed(v);
> >
> > Note sure whether this call would better be placed in
> > cpu_policy_updated() inside the is_hvm_vcpu() conditional branch.
> >
> > hvm_cpuid_policy_changed() &c are just wrappers around the hvm_funcs
> > hooks, pulling vlapic functions in there is likely to complicate the
> > header dependencies in the long term.
> >
>
> That's how it was in v1 and I moved it in v2 answering one of Jan's
> feedback points.
>
> I don't mind either way.
Oh (goes and reads Jan's reply to v1) I see. Let's leave it as-is
then.
>
> >> }
> >>
> >> static inline void hvm_set_tsc_offset(struct vcpu *v, uint64_t offset,
> >> diff --git a/xen/arch/x86/include/asm/hvm/vlapic.h b/xen/arch/x86/include/asm/hvm/vlapic.h
> >> index 88ef94524339..e8d41313abd3 100644
> >> --- a/xen/arch/x86/include/asm/hvm/vlapic.h
> >> +++ b/xen/arch/x86/include/asm/hvm/vlapic.h
> >> @@ -44,6 +44,7 @@
> >> #define vlapic_xapic_mode(vlapic) \
> >> (!vlapic_hw_disabled(vlapic) && \
> >> !((vlapic)->hw.apic_base_msr & APIC_BASE_EXTD))
> >> +#define vlapic_x2apic_id(vlapic) ((vlapic)->hw.x2apic_id)
> >>
> >> /*
> >> * Generic APIC bitmap vector update & search routines.
> >> @@ -107,6 +108,7 @@ int vlapic_ack_pending_irq(struct vcpu *v, int vector, bool force_ack);
> >>
> >> int vlapic_init(struct vcpu *v);
> >> void vlapic_destroy(struct vcpu *v);
> >> +void vlapic_cpu_policy_changed(struct vcpu *v);
> >>
> >> void vlapic_reset(struct vlapic *vlapic);
> >>
> >> diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
> >> index 7ecacadde165..1c2ec669ffc9 100644
> >> --- a/xen/include/public/arch-x86/hvm/save.h
> >> +++ b/xen/include/public/arch-x86/hvm/save.h
> >> @@ -394,6 +394,8 @@ struct hvm_hw_lapic {
> >> uint32_t disabled; /* VLAPIC_xx_DISABLED */
> >> uint32_t timer_divisor;
> >> uint64_t tdt_msr;
> >> + uint32_t x2apic_id;
> >> + uint32_t rsvd_zero;
> >
> > I think Jan requested for this field to be checked to be 0 for
> > incoming migrations, yet I don't see such check added.
> >
>
> It's on the next patch, when the checks are moved into the check hook.
> Doing it all as part of the same patch seemed dirty. I guess I can
> invert them.
Yeah, sorry, realized later. I think the introduction of the lapic
record checking wants to be the first patch in the series, so that
when you introduce the field here you can check it's zero.
Thanks, Roger.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 3/8] x86/vlapic: Move lapic_load_hidden migration checks to the check hook
2024-05-23 14:50 ` Roger Pau Monné
@ 2024-05-24 11:16 ` Alejandro Vallejo
2024-05-24 12:53 ` Roger Pau Monné
0 siblings, 1 reply; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-24 11:16 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: Xen-devel, Jan Beulich, Andrew Cooper
On 23/05/2024 15:50, Roger Pau Monné wrote:
> On Wed, May 08, 2024 at 01:39:22PM +0100, Alejandro Vallejo wrote:
>> While at it, add a check for the reserved field in the hidden save area.
>>
>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>> ---
>> v2:
>> * New patch. Addresses the missing check for rsvd_zero in v1.
>
> Oh, it would be better if this was done at the time when rsvd_zero is
> introduced. I think this should be moved ahead of the series, so that
> the patch that introduces rsvd_zero can add the check in
> lapic_check_hidden().
I'll give that a whirl.
>
>> ---
>> xen/arch/x86/hvm/vlapic.c | 41 ++++++++++++++++++++++++++++-----------
>> 1 file changed, 30 insertions(+), 11 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
>> index 8a244100009c..2f06bff1b2cc 100644
>> --- a/xen/arch/x86/hvm/vlapic.c
>> +++ b/xen/arch/x86/hvm/vlapic.c
>> @@ -1573,35 +1573,54 @@ static void lapic_load_fixup(struct vlapic *vlapic)
>> v, vlapic->loaded.id, vlapic->loaded.ldr, good_ldr);
>> }
>>
>> -static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
>> +static int cf_check lapic_check_hidden(const struct domain *d,
>> + hvm_domain_context_t *h)
>> {
>> unsigned int vcpuid = hvm_load_instance(h);
>> - struct vcpu *v;
>> - struct vlapic *s;
>> + struct hvm_hw_lapic s;
>>
>> if ( !has_vlapic(d) )
>> return -ENODEV;
>>
>> /* Which vlapic to load? */
>> - if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
>> + if ( vcpuid >= d->max_vcpus || d->vcpu[vcpuid] == NULL )
>> {
>> dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no apic%u\n",
>> d->domain_id, vcpuid);
>> return -EINVAL;
>> }
>> - s = vcpu_vlapic(v);
>>
>> - if ( hvm_load_entry_zeroextend(LAPIC, h, &s->hw) != 0 )
>> + if ( hvm_load_entry_zeroextend(LAPIC, h, &s) )
>
> Can't you use hvm_get_entry() to perform the sanity checks:
>
> const struct hvm_hw_lapic *s = hvm_get_entry(LAPIC, h);
>
> Thanks, Roger.
I don't think I can. Because the last field (rsvd_zero) might or might
not be there, so it needs to be zero-extended. Unless I misunderstood
what hvm_get_entry() is meant to do. It seems to check for exact sizes.
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 3/8] x86/vlapic: Move lapic_load_hidden migration checks to the check hook
2024-05-24 11:16 ` Alejandro Vallejo
@ 2024-05-24 12:53 ` Roger Pau Monné
0 siblings, 0 replies; 43+ messages in thread
From: Roger Pau Monné @ 2024-05-24 12:53 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Xen-devel, Jan Beulich, Andrew Cooper
On Fri, May 24, 2024 at 12:16:00PM +0100, Alejandro Vallejo wrote:
> On 23/05/2024 15:50, Roger Pau Monné wrote:
> > On Wed, May 08, 2024 at 01:39:22PM +0100, Alejandro Vallejo wrote:
> >> While at it, add a check for the reserved field in the hidden save area.
> >>
> >> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> >> ---
> >> v2:
> >> * New patch. Addresses the missing check for rsvd_zero in v1.
> >
> > Oh, it would be better if this was done at the time when rsvd_zero is
> > introduced. I think this should be moved ahead of the series, so that
> > the patch that introduces rsvd_zero can add the check in
> > lapic_check_hidden().
>
> I'll give that a whirl.
>
> >
> >> ---
> >> xen/arch/x86/hvm/vlapic.c | 41 ++++++++++++++++++++++++++++-----------
> >> 1 file changed, 30 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> >> index 8a244100009c..2f06bff1b2cc 100644
> >> --- a/xen/arch/x86/hvm/vlapic.c
> >> +++ b/xen/arch/x86/hvm/vlapic.c
> >> @@ -1573,35 +1573,54 @@ static void lapic_load_fixup(struct vlapic *vlapic)
> >> v, vlapic->loaded.id, vlapic->loaded.ldr, good_ldr);
> >> }
> >>
> >> -static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
> >> +static int cf_check lapic_check_hidden(const struct domain *d,
> >> + hvm_domain_context_t *h)
> >> {
> >> unsigned int vcpuid = hvm_load_instance(h);
> >> - struct vcpu *v;
> >> - struct vlapic *s;
> >> + struct hvm_hw_lapic s;
> >>
> >> if ( !has_vlapic(d) )
> >> return -ENODEV;
> >>
> >> /* Which vlapic to load? */
> >> - if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
> >> + if ( vcpuid >= d->max_vcpus || d->vcpu[vcpuid] == NULL )
> >> {
> >> dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no apic%u\n",
> >> d->domain_id, vcpuid);
> >> return -EINVAL;
> >> }
> >> - s = vcpu_vlapic(v);
> >>
> >> - if ( hvm_load_entry_zeroextend(LAPIC, h, &s->hw) != 0 )
> >> + if ( hvm_load_entry_zeroextend(LAPIC, h, &s) )
> >
> > Can't you use hvm_get_entry() to perform the sanity checks:
> >
> > const struct hvm_hw_lapic *s = hvm_get_entry(LAPIC, h);
> >
> > Thanks, Roger.
>
> I don't think I can. Because the last field (rsvd_zero) might or might
> not be there, so it needs to be zero-extended. Unless I misunderstood
> what hvm_get_entry() is meant to do. It seems to check for exact sizes.
Oh, indeed, hvm_get_entry() uses strict checking and will refuse to
return the entry if sizes don't match. There seems to be no way to
avoid the copy if we want to do this in a sane way.
Thanks, Roger.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 5/8] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves
2024-05-24 7:21 ` Roger Pau Monné
@ 2024-05-24 15:15 ` Alejandro Vallejo
2024-05-27 8:52 ` Roger Pau Monné
0 siblings, 1 reply; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-24 15:15 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD
On 24/05/2024 08:21, Roger Pau Monné wrote:
> On Wed, May 08, 2024 at 01:39:24PM +0100, Alejandro Vallejo wrote:
>> Make it so the APs expose their own APIC IDs in a LUT. We can use that LUT to
>> populate the MADT, decoupling the algorithm that relates CPU IDs and APIC IDs
>> from hvmloader.
>>
>> While at this also remove ap_callin, as writing the APIC ID may serve the same
>> purpose.
>>
>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>> ---
>> v2:
>> * New patch. Replaces adding cpu policy to hvmloader in v1.
>> ---
>> tools/firmware/hvmloader/config.h | 6 ++++-
>> tools/firmware/hvmloader/hvmloader.c | 4 +--
>> tools/firmware/hvmloader/smp.c | 40 +++++++++++++++++++++++-----
>> tools/firmware/hvmloader/util.h | 5 ++++
>> xen/arch/x86/include/asm/hvm/hvm.h | 1 +
>> 5 files changed, 47 insertions(+), 9 deletions(-)
>>
>> diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
>> index c82adf6dc508..edf6fa9c908c 100644
>> --- a/tools/firmware/hvmloader/config.h
>> +++ b/tools/firmware/hvmloader/config.h
>> @@ -4,6 +4,8 @@
>> #include <stdint.h>
>> #include <stdbool.h>
>>
>> +#include <xen/hvm/hvm_info_table.h>
>> +
>> enum virtual_vga { VGA_none, VGA_std, VGA_cirrus, VGA_pt };
>> extern enum virtual_vga virtual_vga;
>>
>> @@ -49,8 +51,10 @@ extern uint8_t ioapic_version;
>>
>> #define IOAPIC_ID 0x01
>>
>> +extern uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
>> +
>> #define LAPIC_BASE_ADDRESS 0xfee00000
>> -#define LAPIC_ID(vcpu_id) ((vcpu_id) * 2)
>> +#define LAPIC_ID(vcpu_id) (CPU_TO_X2APICID[(vcpu_id)])
>>
>> #define PCI_ISA_DEVFN 0x08 /* dev 1, fn 0 */
>> #define PCI_ISA_IRQ_MASK 0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
>> diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
>> index c58841e5b556..1eba92229925 100644
>> --- a/tools/firmware/hvmloader/hvmloader.c
>> +++ b/tools/firmware/hvmloader/hvmloader.c
>> @@ -342,11 +342,11 @@ int main(void)
>>
>> printf("CPU speed is %u MHz\n", get_cpu_mhz());
>>
>> + smp_initialise();
>> +
>> apic_setup();
>> pci_setup();
>>
>> - smp_initialise();
>> -
>> perform_tests();
>>
>> if ( bios->bios_info_setup )
>> diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c
>> index a668f15d7e1f..4d75f239c2f5 100644
>> --- a/tools/firmware/hvmloader/smp.c
>> +++ b/tools/firmware/hvmloader/smp.c
>> @@ -29,7 +29,34 @@
>>
>> #include <xen/vcpu.h>
>>
>> -static int ap_callin, ap_cpuid;
>> +static int ap_cpuid;
>> +
>> +/**
>> + * Lookup table of x2APIC IDs.
>> + *
>> + * Each entry is populated its respective CPU as they come online. This is required
>> + * for generating the MADT with minimal assumptions about ID relationships.
>> + */
>> +uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
>> +
>> +static uint32_t read_apic_id(void)
>> +{
>> + uint32_t apic_id;
>> +
>> + cpuid(1, NULL, &apic_id, NULL, NULL);
>> + apic_id >>= 24;
>> +
>> + /*
>> + * APIC IDs over 255 are represented by 255 in leaf 1 and are meant to be
>> + * read from topology leaves instead. Xen exposes x2APIC IDs in leaf 0xb,
>> + * but only if the x2APIC feature is present. If there are that many CPUs
>> + * it's guaranteed to be there so we can avoid checking for it specifically
>> + */
>> + if ( apic_id == 255 )
>> + cpuid(0xb, NULL, NULL, NULL, &apic_id);
>> +
>> + return apic_id;
>> +}
>>
>> static void ap_start(void)
>> {
>> @@ -37,12 +64,12 @@ static void ap_start(void)
>> cacheattr_init();
>> printf("done.\n");
>>
>> + wmb();
>> + ACCESS_ONCE(CPU_TO_X2APICID[ap_cpuid]) = read_apic_id();
>
> Further thinking about this: do we really need the wmb(), given the
> usage of ACCESS_ONCE()?
>
> wmb() is a compiler barrier, and the usage of volatile in
> ACCESS_ONCE() should already prevent any compiler re-ordering.
>
> Thanks, Roger.
volatile reads/writes cannot be reordered with other volatile
reads/writes, but volatile reads/writes can be reordered with
non-volatile reads/writes, AFAIR.
My intent here was to prevent the compiler omitting the write (though in
practice it didn't). I think the barrier is still required to prevent
reordering according to the spec.
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 5/8] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves
2024-05-23 16:13 ` Roger Pau Monné
@ 2024-05-24 15:16 ` Alejandro Vallejo
2024-05-27 8:55 ` Roger Pau Monné
0 siblings, 1 reply; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-24 15:16 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD
On 23/05/2024 17:13, Roger Pau Monné wrote:
> On Wed, May 08, 2024 at 01:39:24PM +0100, Alejandro Vallejo wrote:
>> Make it so the APs expose their own APIC IDs in a LUT. We can use that LUT to
>> populate the MADT, decoupling the algorithm that relates CPU IDs and APIC IDs
>> from hvmloader.
>>
>> While at this also remove ap_callin, as writing the APIC ID may serve the same
>> purpose.
>>
>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>> ---
>> v2:
>> * New patch. Replaces adding cpu policy to hvmloader in v1.
>> ---
>> tools/firmware/hvmloader/config.h | 6 ++++-
>> tools/firmware/hvmloader/hvmloader.c | 4 +--
>> tools/firmware/hvmloader/smp.c | 40 +++++++++++++++++++++++-----
>> tools/firmware/hvmloader/util.h | 5 ++++
>> xen/arch/x86/include/asm/hvm/hvm.h | 1 +
>> 5 files changed, 47 insertions(+), 9 deletions(-)
>>
>> diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
>> index c82adf6dc508..edf6fa9c908c 100644
>> --- a/tools/firmware/hvmloader/config.h
>> +++ b/tools/firmware/hvmloader/config.h
>> @@ -4,6 +4,8 @@
>> #include <stdint.h>
>> #include <stdbool.h>
>>
>> +#include <xen/hvm/hvm_info_table.h>
>> +
>> enum virtual_vga { VGA_none, VGA_std, VGA_cirrus, VGA_pt };
>> extern enum virtual_vga virtual_vga;
>>
>> @@ -49,8 +51,10 @@ extern uint8_t ioapic_version;
>>
>> #define IOAPIC_ID 0x01
>>
>> +extern uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
>> +
>> #define LAPIC_BASE_ADDRESS 0xfee00000
>> -#define LAPIC_ID(vcpu_id) ((vcpu_id) * 2)
>> +#define LAPIC_ID(vcpu_id) (CPU_TO_X2APICID[(vcpu_id)])
>>
>> #define PCI_ISA_DEVFN 0x08 /* dev 1, fn 0 */
>> #define PCI_ISA_IRQ_MASK 0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
>> diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
>> index c58841e5b556..1eba92229925 100644
>> --- a/tools/firmware/hvmloader/hvmloader.c
>> +++ b/tools/firmware/hvmloader/hvmloader.c
>> @@ -342,11 +342,11 @@ int main(void)
>>
>> printf("CPU speed is %u MHz\n", get_cpu_mhz());
>>
>> + smp_initialise();
>> +
>> apic_setup();
>> pci_setup();
>>
>> - smp_initialise();
>> -
>> perform_tests();
>>
>> if ( bios->bios_info_setup )
>> diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c
>> index a668f15d7e1f..4d75f239c2f5 100644
>> --- a/tools/firmware/hvmloader/smp.c
>> +++ b/tools/firmware/hvmloader/smp.c
>> @@ -29,7 +29,34 @@
>>
>> #include <xen/vcpu.h>
>>
>> -static int ap_callin, ap_cpuid;
>> +static int ap_cpuid;
>> +
>> +/**
>> + * Lookup table of x2APIC IDs.
>> + *
>> + * Each entry is populated its respective CPU as they come online. This is required
>> + * for generating the MADT with minimal assumptions about ID relationships.
>> + */
>> +uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
>> +
>> +static uint32_t read_apic_id(void)
>> +{
>> + uint32_t apic_id;
>> +
>> + cpuid(1, NULL, &apic_id, NULL, NULL);
>> + apic_id >>= 24;
>> +
>> + /*
>> + * APIC IDs over 255 are represented by 255 in leaf 1 and are meant to be
>> + * read from topology leaves instead. Xen exposes x2APIC IDs in leaf 0xb,
>> + * but only if the x2APIC feature is present. If there are that many CPUs
>> + * it's guaranteed to be there so we can avoid checking for it specifically
>> + */
>
> Maybe I'm missing something, but given the current code won't Xen just
> return the low 8 bits from the x2APIC ID? I don't see any code in
> guest_cpuid() that adjusts the IDs to be 255 when > 255.>
>> + if ( apic_id == 255 )
>> + cpuid(0xb, NULL, NULL, NULL, &apic_id);
>
> Won't the correct logic be to check if x2APIC is set in CPUID, and
> then fetch the APIC ID from leaf 0xb, otherwise fallback to fetching
> the APID ID from leaf 1?
I was pretty sure that was the behaviour of real HW, but clearly I was
wrong. Just checked on a beefy machine and that's indeed the low 8 bits,
just as Xen emulates. Got confused with the core count, that does clip
to 255.
Will adjust by explicitly checking for the x2apic_id bit.
>
>> +
>> + return apic_id;
>> +}
>>
>> static void ap_start(void)
>> {
>> @@ -37,12 +64,12 @@ static void ap_start(void)
>> cacheattr_init();
>> printf("done.\n");
>>
>> + wmb();
>> + ACCESS_ONCE(CPU_TO_X2APICID[ap_cpuid]) = read_apic_id();
>
> A comment would be helpful here, that CPU_TO_X2APICID[ap_cpuid] is
> used as synchronization that the AP has started.
>
> You probably want to assert that read_apic_id() doesn't return 0,
> otherwise we are skewed.
Not a bad idea. Sure
>
>> +
>> if ( !ap_cpuid )
>> return;
>>
>> - wmb();
>> - ap_callin = 1;
>> -
>> while ( 1 )
>> asm volatile ( "hlt" );
>> }
>> @@ -86,10 +113,11 @@ static void boot_cpu(unsigned int cpu)
>> BUG();
>>
>> /*
>> - * Wait for the secondary processor to complete initialisation.
>> + * Wait for the secondary processor to complete initialisation,
>> + * which is signaled by its x2APIC ID being writted to the LUT.
>> * Do not touch shared resources meanwhile.
>> */
>> - while ( !ap_callin )
>> + while ( !ACCESS_ONCE(CPU_TO_X2APICID[cpu]) )
>> cpu_relax();
>
> As a further improvement, we could launch all APs in pararell, and use
> a for loop to wait until all positions of the CPU_TO_X2APICID array
> are set.
I thought about it, but then we'd need locking for the prints as well,
or refactor things so only the BSP prints on success.
The time taken is truly negligible, so I reckon it's better left for
another patch.
>
>>
>> /* Take the secondary processor offline. */
>> diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
>> index 14078bde1e30..51e9003bc615 100644
>> --- a/tools/firmware/hvmloader/util.h
>> +++ b/tools/firmware/hvmloader/util.h
>> @@ -23,6 +23,11 @@ enum {
>> #define __STR(...) #__VA_ARGS__
>> #define STR(...) __STR(__VA_ARGS__)
>>
>> +#define __ACCESS_ONCE(x) ({ \
>> + (void)(typeof(x))0; /* Scalar typecheck. */ \
>> + (volatile typeof(x) *)&(x); })
>> +#define ACCESS_ONCE(x) (*__ACCESS_ONCE(x))
>
> Might be better for this to be placed in common-macros.h?
Sure.
>
>> +
>> /* GDT selector values. */
>> #define SEL_CODE16 0x0008
>> #define SEL_DATA16 0x0010
>> diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
>> index 84911f3ebcb4..6c005f0b0b38 100644
>> --- a/xen/arch/x86/include/asm/hvm/hvm.h
>> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
>> @@ -16,6 +16,7 @@
>> #include <asm/current.h>
>> #include <asm/x86_emulate.h>
>> #include <asm/hvm/asid.h>
>> +#include <asm/hvm/vlapic.h>
>
> Is this a stray change? Otherwise I don't see why this need to be
> part of this patch when the rest of the changes are exclusively to
> hvmloader.
>
> Thanks, Roger.
Should've been part of the vlapic_policy_update change. That line got
lost in the v1->v2 conversion. I just moved to where it logically
belongs now.
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 7/8] xen/x86: Derive topologically correct x2APIC IDs from the policy
2024-05-24 8:39 ` Roger Pau Monné
@ 2024-05-24 17:03 ` Alejandro Vallejo
2024-05-27 8:06 ` Roger Pau Monné
0 siblings, 1 reply; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-24 17:03 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD
On 24/05/2024 09:39, Roger Pau Monné wrote:
> On Wed, May 08, 2024 at 01:39:26PM +0100, Alejandro Vallejo wrote:
>> Implements the helper for mapping vcpu_id to x2apic_id given a valid
>> topology in a policy. The algo is written with the intention of extending
>> it to leaves 0x1f and e26 in the future.
>
> Using 0x1f and e26 is kind of confusing. I would word as "0x1f and
> extended leaf 0x26" to avoid confusion.
>
>>
>> Toolstack doesn't set leaf 0xb and the HVM default policy has it cleared,
>> so the leaf is not implemented. In that case, the new helper just returns
>> the legacy mapping.
>>
>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>> ---
>> v2:
>> * const-ify the test definitions
>> * Cosmetic changes (newline + parameter name in prototype)
>> ---
>> tools/tests/cpu-policy/test-cpu-policy.c | 63 ++++++++++++++++++++
>> xen/include/xen/lib/x86/cpu-policy.h | 2 +
>> xen/lib/x86/policy.c | 73 ++++++++++++++++++++++--
>> 3 files changed, 133 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c
>> index 0ba8c418b1b3..82a6aeb23317 100644
>> --- a/tools/tests/cpu-policy/test-cpu-policy.c
>> +++ b/tools/tests/cpu-policy/test-cpu-policy.c
>> @@ -776,6 +776,68 @@ static void test_topo_from_parts(void)
>> }
>> }
>>
>> +static void test_x2apic_id_from_vcpu_id_success(void)
>> +{
>> + static const struct test {
>> + unsigned int vcpu_id;
>> + unsigned int threads_per_core;
>> + unsigned int cores_per_pkg;
>> + uint32_t x2apic_id;
>> + uint8_t x86_vendor;
>> + } tests[] = {
>> + {
>> + .vcpu_id = 3, .threads_per_core = 3, .cores_per_pkg = 8,
>> + .x2apic_id = 1 << 2,
>> + },
>> + {
>> + .vcpu_id = 6, .threads_per_core = 3, .cores_per_pkg = 8,
>> + .x2apic_id = 2 << 2,
>> + },
>> + {
>> + .vcpu_id = 24, .threads_per_core = 3, .cores_per_pkg = 8,
>> + .x2apic_id = 1 << 5,
>> + },
>> + {
>> + .vcpu_id = 35, .threads_per_core = 3, .cores_per_pkg = 8,
>> + .x2apic_id = (35 % 3) | (((35 / 3) % 8) << 2) | ((35 / 24) << 5),
>> + },
>> + {
>> + .vcpu_id = 96, .threads_per_core = 7, .cores_per_pkg = 3,
>> + .x2apic_id = (96 % 7) | (((96 / 7) % 3) << 3) | ((96 / 21) << 5),
> ^ extra space (same above)
>
>> + },
>> + };
>> +
>> + const uint8_t vendors[] = {
>> + X86_VENDOR_INTEL,
>> + X86_VENDOR_AMD,
>> + X86_VENDOR_CENTAUR,
>> + X86_VENDOR_SHANGHAI,
>> + X86_VENDOR_HYGON,
>> + };
>> +
>> + printf("Testing x2apic id from vcpu id success:\n");
>> +
>> + /* Perform the test run on every vendor we know about */
>> + for ( size_t i = 0; i < ARRAY_SIZE(vendors); ++i )
>> + {
>> + struct cpu_policy policy = { .x86_vendor = vendors[i] };
>
> Newline.
Ack
>
>> + for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i )
>> + {
>> + const struct test *t = &tests[i];
>> + uint32_t x2apic_id;
>> + int rc = x86_topo_from_parts(&policy, t->threads_per_core, t->cores_per_pkg);
>
> Overly long line.
>
> Won't it be better to define `policy` in this scope, so that for each
> test you start with a clean policy, rather than having leftover data
> from the previous test?
The leftover data is overridden during setup by x86_topo_from_parts(),
but I can see the appeal. Sure.
>
> Also you could initialize x2apic_id at definition:
>
> const struct test *t = &tests[j];
> struct cpu_policy policy = { .x86_vendor = vendors[i] };
> int rc = x86_topo_from_parts(&policy, t->threads_per_core, t->cores_per_pkg);
> uint32_t x2apic_id = x86_x2apic_id_from_vcpu_id(&policy, t->vcpu_id);
Seeing this snippet I just realized there's a bug. The second loop
should use j rather than i. Ugh.
As for the initialization, I want to prevent feeding garbage into
x86_x2apic_id_from_vcpu_id(). For which there's an "if ( !rc )" missing
to gate the call.
I'll sort both of those.
>
>> +
>> + x2apic_id = x86_x2apic_id_from_vcpu_id(&policy, t->vcpu_id);
>> + if ( rc || x2apic_id != t->x2apic_id )
>> + fail("FAIL[%d] - '%s cpu%u %u t/c %u c/p'. bad x2apic_id: expected=%u actual=%u\n",
>> + rc,
>> + x86_cpuid_vendor_to_str(policy.x86_vendor),
>> + t->vcpu_id, t->threads_per_core, t->cores_per_pkg,
>> + t->x2apic_id, x2apic_id);
>> + }
>> + }
>> +}
>> +
>> int main(int argc, char **argv)
>> {
>> printf("CPU Policy unit tests\n");
>> @@ -794,6 +856,7 @@ int main(int argc, char **argv)
>> test_is_compatible_failure();
>>
>> test_topo_from_parts();
>> + test_x2apic_id_from_vcpu_id_success();
>>
>> if ( nr_failures )
>> printf("Done: %u failures\n", nr_failures);
>> diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
>> index f5df18e9f77c..2cbc2726a861 100644
>> --- a/xen/include/xen/lib/x86/cpu-policy.h
>> +++ b/xen/include/xen/lib/x86/cpu-policy.h
>> @@ -545,6 +545,8 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
>> /**
>> * Calculates the x2APIC ID of a vCPU given a CPU policy
>> *
>> + * If the policy lacks leaf 0xb falls back to legacy mapping of apic_id=cpu*2
>> + *
>> * @param p CPU policy of the domain.
>> * @param id vCPU ID of the vCPU.
>> * @returns x2APIC ID of the vCPU.
>> diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
>> index d033ee5398dd..e498e32f8fd7 100644
>> --- a/xen/lib/x86/policy.c
>> +++ b/xen/lib/x86/policy.c
>> @@ -2,15 +2,78 @@
>>
>> #include <xen/lib/x86/cpu-policy.h>
>>
>> +static uint32_t parts_per_higher_scoped_level(const struct cpu_policy *p, size_t lvl)
>> +{
>> + /*
>> + * `nr_logical` reported by Intel is the number of THREADS contained in
>> + * the next topological scope. For example, assuming a system with 2
>> + * threads/core and 3 cores/module in a fully symmetric topology,
>> + * `nr_logical` at the core level will report 6. Because it's reporting
>> + * the number of threads in a module.
>> + *
>> + * On AMD/Hygon, nr_logical is already normalized by the higher scoped
>> + * level (cores/complex, etc) so we can return it as-is.
>> + */
>> + if ( p->x86_vendor != X86_VENDOR_INTEL || !lvl )
>> + return p->topo.subleaf[lvl].nr_logical;
>> +
>> + return p->topo.subleaf[lvl].nr_logical / p->topo.subleaf[lvl - 1].nr_logical;
>
> Line length here and in the function declaration.
>
Ack
>> +}
>> +
>> uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
>> {
>> + uint32_t shift = 0, x2apic_id = 0;
>> +
>> + /* In the absence of topology leaves, fallback to traditional mapping */
>> + if ( !p->topo.subleaf[0].type )
>> + return id * 2;
>> +
>> /*
>> - * TODO: Derive x2APIC ID from the topology information inside `p`
>> - * rather than from vCPU ID. This bodge is a temporary measure
>> - * until all infra is in place to retrieve or derive the initial
>> - * x2APIC ID from migrated domains.
>
> I'm a bit confused with this, the policy is domain wide, so we will
> always need to pass the vCPU ID into x86_x2apic_id_from_vcpu_id()?
> IOW: the x2APIC ID will always be derived from the vCPU ID.
>
> Thanks, Roger.
The x2APIC ID is derived (after the series) from the vCPU ID _and_ the
topology information. The vCPU alone will work out in all cases because
it'll be cached in the vlapic hvm structure.
I guess the comment could be rewritten as "... rather than from the vCPU
ID alone..."
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 8/8] xen/x86: Synthesise domain topologies
2024-05-24 8:58 ` Roger Pau Monné
@ 2024-05-24 17:16 ` Alejandro Vallejo
2024-05-27 8:20 ` Roger Pau Monné
0 siblings, 1 reply; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-24 17:16 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Xen-devel, Anthony PERARD, Juergen Gross, Jan Beulich,
Andrew Cooper
On 24/05/2024 09:58, Roger Pau Monné wrote:
> On Wed, May 08, 2024 at 01:39:27PM +0100, Alejandro Vallejo wrote:
>> Expose sensible topologies in leaf 0xb. At the moment it synthesises non-HT
>> systems, in line with the previous code intent.
>>
>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>> ---
>> v2:
>> * Zap the topology leaves of (pv/hvm)_(def/max)_policy rather than the host policy
>> ---
>> tools/libs/guest/xg_cpuid_x86.c | 62 +++++----------------------------
>> xen/arch/x86/cpu-policy.c | 9 +++--
>> 2 files changed, 15 insertions(+), 56 deletions(-)
>>
>> diff --git a/tools/libs/guest/xg_cpuid_x86.c b/tools/libs/guest/xg_cpuid_x86.c
>> index 4453178100ad..8170769dbe43 100644
>> --- a/tools/libs/guest/xg_cpuid_x86.c
>> +++ b/tools/libs/guest/xg_cpuid_x86.c
>> @@ -584,7 +584,7 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
>> bool hvm;
>> xc_domaininfo_t di;
>> struct xc_cpu_policy *p = xc_cpu_policy_init();
>> - unsigned int i, nr_leaves = ARRAY_SIZE(p->leaves), nr_msrs = 0;
>> + unsigned int nr_leaves = ARRAY_SIZE(p->leaves), nr_msrs = 0;
>> uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
>> uint32_t host_featureset[FEATURESET_NR_ENTRIES] = {};
>> uint32_t len = ARRAY_SIZE(host_featureset);
>> @@ -727,59 +727,15 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
>> }
>> else
>> {
>> - /*
>> - * Topology for HVM guests is entirely controlled by Xen. For now, we
>> - * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
>> - */
>> - p->policy.basic.htt = true;
>> - p->policy.extd.cmp_legacy = false;
>> -
>> - /*
>> - * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package.
>> - * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
>> - * overflow.
>> - */
>> - if ( !p->policy.basic.lppp )
>> - p->policy.basic.lppp = 2;
>> - else if ( !(p->policy.basic.lppp & 0x80) )
>> - p->policy.basic.lppp *= 2;
>> -
>> - switch ( p->policy.x86_vendor )
>> + /* TODO: Expose the ability to choose a custom topology for HVM/PVH */
>> + unsigned int threads_per_core = 1;
>> + unsigned int cores_per_pkg = di.max_vcpu_id + 1;
>
> Newline.
ack
>
>> + rc = x86_topo_from_parts(&p->policy, threads_per_core, cores_per_pkg);
>
> I assume this generates the same topology as the current code, or will
> the population of the leaves be different in some way?
>
The current code does not populate 0xb. This generates a topology
consistent with the existing INTENDED topology. The actual APIC IDs will
be different though (because there's no skipping of odd values).
All the dance in patch 1 was to make this migrate-safe. The x2apic ID is
stored in the lapic hidden regs so differences with previous behaviour
don't matter.
IOW, The differences are:
* 0xb is exposed, whereas previously it wasn't
* APIC IDs are compacted such that new_apicid=old_apicid/2
* There's also a cleanup of the murkier paths to put the right core
counts in the right leaves (whereas previously it was bonkers)
>> + if ( rc )
>> {
>> - case X86_VENDOR_INTEL:
>> - for ( i = 0; (p->policy.cache.subleaf[i].type &&
>> - i < ARRAY_SIZE(p->policy.cache.raw)); ++i )
>> - {
>> - p->policy.cache.subleaf[i].cores_per_package =
>> - (p->policy.cache.subleaf[i].cores_per_package << 1) | 1;
>> - p->policy.cache.subleaf[i].threads_per_cache = 0;
>> - }
>> - break;
>> -
>> - case X86_VENDOR_AMD:
>> - case X86_VENDOR_HYGON:
>> - /*
>> - * Leaf 0x80000008 ECX[15:12] is ApicIdCoreSize.
>> - * Leaf 0x80000008 ECX[7:0] is NumberOfCores (minus one).
>> - * Update to reflect vLAPIC_ID = vCPU_ID * 2. But avoid
>> - * - overflow,
>> - * - going out of sync with leaf 1 EBX[23:16],
>> - * - incrementing ApicIdCoreSize when it's zero (which changes the
>> - * meaning of bits 7:0).
>> - *
>> - * UPDATE: I addition to avoiding overflow, some
>> - * proprietary operating systems have trouble with
>> - * apic_id_size values greater than 7. Limit the value to
>> - * 7 for now.
>> - */
>> - if ( p->policy.extd.nc < 0x7f )
>> - {
>> - if ( p->policy.extd.apic_id_size != 0 && p->policy.extd.apic_id_size < 0x7 )
>> - p->policy.extd.apic_id_size++;
>> -
>> - p->policy.extd.nc = (p->policy.extd.nc << 1) | 1;
>> - }
>> - break;
>> + ERROR("Failed to generate topology: t/c=%u c/p=%u",
>> + threads_per_core, cores_per_pkg);
>
> Could you also print the error code?
Sure
>
>> + goto out;
>> }
>> }
>>
>> diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
>> index 4b6d96276399..0ad871732ba0 100644
>> --- a/xen/arch/x86/cpu-policy.c
>> +++ b/xen/arch/x86/cpu-policy.c
>> @@ -278,9 +278,6 @@ static void recalculate_misc(struct cpu_policy *p)
>>
>> p->basic.raw[0x8] = EMPTY_LEAF;
>>
>> - /* TODO: Rework topology logic. */
>> - memset(p->topo.raw, 0, sizeof(p->topo.raw));
>> -
>> p->basic.raw[0xc] = EMPTY_LEAF;
>>
>> p->extd.e1d &= ~CPUID_COMMON_1D_FEATURES;
>> @@ -621,6 +618,9 @@ static void __init calculate_pv_max_policy(void)
>> recalculate_xstate(p);
>>
>> p->extd.raw[0xa] = EMPTY_LEAF; /* No SVM for PV guests. */
>> +
>> + /* Wipe host topology. Toolstack is expected to synthesise a sensible one */
>> + memset(p->topo.raw, 0, sizeof(p->topo.raw));
>> }
>>
>> static void __init calculate_pv_def_policy(void)
>> @@ -773,6 +773,9 @@ static void __init calculate_hvm_max_policy(void)
>>
>> /* It's always possible to emulate CPUID faulting for HVM guests */
>> p->platform_info.cpuid_faulting = true;
>> +
>> + /* Wipe host topology. Toolstack is expected to synthesise a sensible one */
>
> Line length.
>
> /* Wipe host topology. Populated by toolstack. */
>
> Would be OK IMO.
Ack
>
> Thanks, Roger.
>> + memset(p->topo.raw, 0, sizeof(p->topo.raw));
>
> Note that currently the host policy also gets the topology leaves
> cleared, is it intended to not clear them anymore after this change?
>
> (as you only clear the leaves for the guest {max,def} policies)
>
> Thanks, Roger.
It was like that originally in v1, I changed in v2 as part of feedback
from Jan.
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 7/8] xen/x86: Derive topologically correct x2APIC IDs from the policy
2024-05-24 17:03 ` Alejandro Vallejo
@ 2024-05-27 8:06 ` Roger Pau Monné
0 siblings, 0 replies; 43+ messages in thread
From: Roger Pau Monné @ 2024-05-27 8:06 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD
On Fri, May 24, 2024 at 06:03:22PM +0100, Alejandro Vallejo wrote:
> On 24/05/2024 09:39, Roger Pau Monné wrote:
> > On Wed, May 08, 2024 at 01:39:26PM +0100, Alejandro Vallejo wrote:
> >
> > Also you could initialize x2apic_id at definition:
> >
> > const struct test *t = &tests[j];
> > struct cpu_policy policy = { .x86_vendor = vendors[i] };
> > int rc = x86_topo_from_parts(&policy, t->threads_per_core, t->cores_per_pkg);
> > uint32_t x2apic_id = x86_x2apic_id_from_vcpu_id(&policy, t->vcpu_id);
>
> Seeing this snippet I just realized there's a bug. The second loop
> should use j rather than i. Ugh.
Well, you shadow the outer variable with the inner one, which makes it
still fine. Yet I don't like that shadowing much. I was going to
comment, but for the requested change you need to not shadow the outer
loop variable (in the example chunk I've used 'j' to signal the outer
loop index).
> >> +}
> >> +
> >> uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
> >> {
> >> + uint32_t shift = 0, x2apic_id = 0;
> >> +
> >> + /* In the absence of topology leaves, fallback to traditional mapping */
> >> + if ( !p->topo.subleaf[0].type )
> >> + return id * 2;
> >> +
> >> /*
> >> - * TODO: Derive x2APIC ID from the topology information inside `p`
> >> - * rather than from vCPU ID. This bodge is a temporary measure
> >> - * until all infra is in place to retrieve or derive the initial
> >> - * x2APIC ID from migrated domains.
> >
> > I'm a bit confused with this, the policy is domain wide, so we will
> > always need to pass the vCPU ID into x86_x2apic_id_from_vcpu_id()?
> > IOW: the x2APIC ID will always be derived from the vCPU ID.
> >
> > Thanks, Roger.
>
> The x2APIC ID is derived (after the series) from the vCPU ID _and_ the
> topology information. The vCPU alone will work out in all cases because
> it'll be cached in the vlapic hvm structure.
>
> I guess the comment could be rewritten as "... rather than from the vCPU
> ID alone..."
Yup, that's clearer :).
Thanks, Roger.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 8/8] xen/x86: Synthesise domain topologies
2024-05-24 17:16 ` Alejandro Vallejo
@ 2024-05-27 8:20 ` Roger Pau Monné
2024-05-28 14:06 ` Alejandro Vallejo
0 siblings, 1 reply; 43+ messages in thread
From: Roger Pau Monné @ 2024-05-27 8:20 UTC (permalink / raw)
To: Alejandro Vallejo
Cc: Xen-devel, Anthony PERARD, Juergen Gross, Jan Beulich,
Andrew Cooper
On Fri, May 24, 2024 at 06:16:01PM +0100, Alejandro Vallejo wrote:
> On 24/05/2024 09:58, Roger Pau Monné wrote:
> > On Wed, May 08, 2024 at 01:39:27PM +0100, Alejandro Vallejo wrote:
> >
> >> + rc = x86_topo_from_parts(&p->policy, threads_per_core, cores_per_pkg);
> >
> > I assume this generates the same topology as the current code, or will
> > the population of the leaves be different in some way?
> >
>
> The current code does not populate 0xb. This generates a topology
> consistent with the existing INTENDED topology. The actual APIC IDs will
> be different though (because there's no skipping of odd values).
>
> All the dance in patch 1 was to make this migrate-safe. The x2apic ID is
> stored in the lapic hidden regs so differences with previous behaviour
> don't matter.
What about systems without CPU policy in the migration stream, will
those also get restored as expected?
I think you likely need to check whether 'restore' is set and keep the
old logic in that case?
As otherwise migrated systems without a CPU policy will get the new
topology information instead of the old one?
> IOW, The differences are:
> * 0xb is exposed, whereas previously it wasn't
> * APIC IDs are compacted such that new_apicid=old_apicid/2
> * There's also a cleanup of the murkier paths to put the right core
> counts in the right leaves (whereas previously it was bonkers)
This needs to be in the commit message IMO.
> >
> > Note that currently the host policy also gets the topology leaves
> > cleared, is it intended to not clear them anymore after this change?
> >
> > (as you only clear the leaves for the guest {max,def} policies)
> >
> > Thanks, Roger.
>
> It was like that originally in v1, I changed in v2 as part of feedback
> from Jan.
I think that's fine, but this divergence from current behavior of
cleaning the topology for the host policy needs to be mentioned in
the commit message.
Thanks, Roger.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 5/8] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves
2024-05-24 15:15 ` Alejandro Vallejo
@ 2024-05-27 8:52 ` Roger Pau Monné
2024-05-28 12:35 ` Alejandro Vallejo
0 siblings, 1 reply; 43+ messages in thread
From: Roger Pau Monné @ 2024-05-27 8:52 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD
On Fri, May 24, 2024 at 04:15:34PM +0100, Alejandro Vallejo wrote:
> On 24/05/2024 08:21, Roger Pau Monné wrote:
> > On Wed, May 08, 2024 at 01:39:24PM +0100, Alejandro Vallejo wrote:
> >> Make it so the APs expose their own APIC IDs in a LUT. We can use that LUT to
> >> populate the MADT, decoupling the algorithm that relates CPU IDs and APIC IDs
> >> from hvmloader.
> >>
> >> While at this also remove ap_callin, as writing the APIC ID may serve the same
> >> purpose.
> >>
> >> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> >> ---
> >> v2:
> >> * New patch. Replaces adding cpu policy to hvmloader in v1.
> >> ---
> >> tools/firmware/hvmloader/config.h | 6 ++++-
> >> tools/firmware/hvmloader/hvmloader.c | 4 +--
> >> tools/firmware/hvmloader/smp.c | 40 +++++++++++++++++++++++-----
> >> tools/firmware/hvmloader/util.h | 5 ++++
> >> xen/arch/x86/include/asm/hvm/hvm.h | 1 +
> >> 5 files changed, 47 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
> >> index c82adf6dc508..edf6fa9c908c 100644
> >> --- a/tools/firmware/hvmloader/config.h
> >> +++ b/tools/firmware/hvmloader/config.h
> >> @@ -4,6 +4,8 @@
> >> #include <stdint.h>
> >> #include <stdbool.h>
> >>
> >> +#include <xen/hvm/hvm_info_table.h>
> >> +
> >> enum virtual_vga { VGA_none, VGA_std, VGA_cirrus, VGA_pt };
> >> extern enum virtual_vga virtual_vga;
> >>
> >> @@ -49,8 +51,10 @@ extern uint8_t ioapic_version;
> >>
> >> #define IOAPIC_ID 0x01
> >>
> >> +extern uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
> >> +
> >> #define LAPIC_BASE_ADDRESS 0xfee00000
> >> -#define LAPIC_ID(vcpu_id) ((vcpu_id) * 2)
> >> +#define LAPIC_ID(vcpu_id) (CPU_TO_X2APICID[(vcpu_id)])
> >>
> >> #define PCI_ISA_DEVFN 0x08 /* dev 1, fn 0 */
> >> #define PCI_ISA_IRQ_MASK 0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
> >> diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
> >> index c58841e5b556..1eba92229925 100644
> >> --- a/tools/firmware/hvmloader/hvmloader.c
> >> +++ b/tools/firmware/hvmloader/hvmloader.c
> >> @@ -342,11 +342,11 @@ int main(void)
> >>
> >> printf("CPU speed is %u MHz\n", get_cpu_mhz());
> >>
> >> + smp_initialise();
> >> +
> >> apic_setup();
> >> pci_setup();
> >>
> >> - smp_initialise();
> >> -
> >> perform_tests();
> >>
> >> if ( bios->bios_info_setup )
> >> diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c
> >> index a668f15d7e1f..4d75f239c2f5 100644
> >> --- a/tools/firmware/hvmloader/smp.c
> >> +++ b/tools/firmware/hvmloader/smp.c
> >> @@ -29,7 +29,34 @@
> >>
> >> #include <xen/vcpu.h>
> >>
> >> -static int ap_callin, ap_cpuid;
> >> +static int ap_cpuid;
> >> +
> >> +/**
> >> + * Lookup table of x2APIC IDs.
> >> + *
> >> + * Each entry is populated its respective CPU as they come online. This is required
> >> + * for generating the MADT with minimal assumptions about ID relationships.
> >> + */
> >> +uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
> >> +
> >> +static uint32_t read_apic_id(void)
> >> +{
> >> + uint32_t apic_id;
> >> +
> >> + cpuid(1, NULL, &apic_id, NULL, NULL);
> >> + apic_id >>= 24;
> >> +
> >> + /*
> >> + * APIC IDs over 255 are represented by 255 in leaf 1 and are meant to be
> >> + * read from topology leaves instead. Xen exposes x2APIC IDs in leaf 0xb,
> >> + * but only if the x2APIC feature is present. If there are that many CPUs
> >> + * it's guaranteed to be there so we can avoid checking for it specifically
> >> + */
> >> + if ( apic_id == 255 )
> >> + cpuid(0xb, NULL, NULL, NULL, &apic_id);
> >> +
> >> + return apic_id;
> >> +}
> >>
> >> static void ap_start(void)
> >> {
> >> @@ -37,12 +64,12 @@ static void ap_start(void)
> >> cacheattr_init();
> >> printf("done.\n");
> >>
> >> + wmb();
> >> + ACCESS_ONCE(CPU_TO_X2APICID[ap_cpuid]) = read_apic_id();
> >
> > Further thinking about this: do we really need the wmb(), given the
> > usage of ACCESS_ONCE()?
> >
> > wmb() is a compiler barrier, and the usage of volatile in
> > ACCESS_ONCE() should already prevent any compiler re-ordering.
> >
> > Thanks, Roger.
>
> volatile reads/writes cannot be reordered with other volatile
> reads/writes, but volatile reads/writes can be reordered with
> non-volatile reads/writes, AFAIR.
Right, I've read the C99 spec and it does indeed only guarantee the
state of volatile objects to be as expected at sequence points. I
think however this specific code would still be fine since the
function calls are considered to have side-effects, and those must all
be completed before the volatile access.
> My intent here was to prevent the compiler omitting the write (though in
> practice it didn't). I think the barrier is still required to prevent
> reordering according to the spec.
Yes, my understating is that you added the ACCESS_ONCE() to possibly
prevent the compiler from re-ordering the CPU_TO_X2APICID[ap_cpuid]) =
read_apic_id() after the 'hlt' loop?
AFAICT the expressions in the `for` statement are considered sequence
points, and the calling `read_apic_id()` could have side-effects, so
the compiler won't be able to elide or move the setting of
CPU_TO_X2APICID[] due to all side-effects of previous evaluations must
be complete at sequence points.
I'm fine with leaving the wmb() + ACCESS_ONCE().
Thanks, Roger.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 5/8] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves
2024-05-24 15:16 ` Alejandro Vallejo
@ 2024-05-27 8:55 ` Roger Pau Monné
0 siblings, 0 replies; 43+ messages in thread
From: Roger Pau Monné @ 2024-05-27 8:55 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD
On Fri, May 24, 2024 at 04:16:01PM +0100, Alejandro Vallejo wrote:
> On 23/05/2024 17:13, Roger Pau Monné wrote:
> > On Wed, May 08, 2024 at 01:39:24PM +0100, Alejandro Vallejo wrote:
> >> @@ -86,10 +113,11 @@ static void boot_cpu(unsigned int cpu)
> >> BUG();
> >>
> >> /*
> >> - * Wait for the secondary processor to complete initialisation.
> >> + * Wait for the secondary processor to complete initialisation,
> >> + * which is signaled by its x2APIC ID being writted to the LUT.
> >> * Do not touch shared resources meanwhile.
> >> */
> >> - while ( !ap_callin )
> >> + while ( !ACCESS_ONCE(CPU_TO_X2APICID[cpu]) )
> >> cpu_relax();
> >
> > As a further improvement, we could launch all APs in pararell, and use
> > a for loop to wait until all positions of the CPU_TO_X2APICID array
> > are set.
>
> I thought about it, but then we'd need locking for the prints as well,
> or refactor things so only the BSP prints on success.
Hm, I see, yes, we would likely need to refactor the printing a bit so
each AP only prints one line, and then add locking around the calls in
`cpu_setup()`.
> The time taken is truly negligible, so I reckon it's better left for
> another patch.
Oh, indeed, sorry if I made it look it should be part of this patch,
that wasn't my intention. Just something that might be worth looking
into doing in the future.
Thanks, Roger.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 5/8] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves
2024-05-27 8:52 ` Roger Pau Monné
@ 2024-05-28 12:35 ` Alejandro Vallejo
0 siblings, 0 replies; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-28 12:35 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD
On 27/05/2024 09:52, Roger Pau Monné wrote:
>> My intent here was to prevent the compiler omitting the write (though in
>> practice it didn't). I think the barrier is still required to prevent
>> reordering according to the spec.
>
> Yes, my understating is that you added the ACCESS_ONCE() to possibly
> prevent the compiler from re-ordering the CPU_TO_X2APICID[ap_cpuid]) =
> read_apic_id() after the 'hlt' loop?
Yes, but not only that. Also preventing the compiler from eliding the
write altogether on the basis that it's not read after within the
function. I have suffered that particular behaviour on older versions of
GCC and it stung very much.
>
> AFAICT the expressions in the `for` statement are considered sequence
> points, and the calling `read_apic_id()` could have side-effects, so
> the compiler won't be able to elide or move the setting of
> CPU_TO_X2APICID[] due to all side-effects of previous evaluations must
> be complete at sequence points.
> > I'm fine with leaving the wmb() + ACCESS_ONCE().
>
> Thanks, Roger.
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 6/8] xen/lib: Add topology generator for x86
2024-05-23 16:50 ` Roger Pau Monné
@ 2024-05-28 13:28 ` Alejandro Vallejo
0 siblings, 0 replies; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-28 13:28 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD
On 23/05/2024 17:50, Roger Pau Monné wrote:
> On Wed, May 08, 2024 at 01:39:25PM +0100, Alejandro Vallejo wrote:
>> Add a helper to populate topology leaves in the cpu policy from
>> threads/core and cores/package counts.
>>
>> No functional change, as it's not connected to anything yet.
>
> There is a functional change in test-cpu-policy.c.
>
> Maybe the commit message needs to be updated to reflect the added
> testing to test-cpu-policy.c using the newly introduced helper to
> generate topologies?
>
Sure to this and all formatting feedback below.
>>
>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>> ---
>> v2:
>> * New patch. Extracted from v1/patch6
>> ---
>> tools/tests/cpu-policy/test-cpu-policy.c | 128 +++++++++++++++++++++++
>> xen/include/xen/lib/x86/cpu-policy.h | 16 +++
>> xen/lib/x86/policy.c | 86 +++++++++++++++
>> 3 files changed, 230 insertions(+)
>>
>> diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c
>> index 301df2c00285..0ba8c418b1b3 100644
>> --- a/tools/tests/cpu-policy/test-cpu-policy.c
>> +++ b/tools/tests/cpu-policy/test-cpu-policy.c
>> @@ -650,6 +650,132 @@ static void test_is_compatible_failure(void)
>> }
>> }
>>
>> +static void test_topo_from_parts(void)
>> +{
>> + static const struct test {
>> + unsigned int threads_per_core;
>> + unsigned int cores_per_pkg;
>> + struct cpu_policy policy;
>> + } tests[] = {
>> + {
>> + .threads_per_core = 3, .cores_per_pkg = 1,
>> + .policy = {
>> + .x86_vendor = X86_VENDOR_AMD,
>> + .topo.subleaf = {
>> + [0] = { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
>> + [1] = { .nr_logical = 1, .level = 1, .type = 2, .id_shift = 2, },
>> + },
>> + },
>> + },
>> + {
>> + .threads_per_core = 1, .cores_per_pkg = 3,
>> + .policy = {
>> + .x86_vendor = X86_VENDOR_AMD,
>> + .topo.subleaf = {
>> + [0] = { .nr_logical = 1, .level = 0, .type = 1, .id_shift = 0, },
>> + [1] = { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, },
>> + },
>> + },
>> + },
>> + {
>> + .threads_per_core = 7, .cores_per_pkg = 5,
>> + .policy = {
>> + .x86_vendor = X86_VENDOR_AMD,
>> + .topo.subleaf = {
>> + [0] = { .nr_logical = 7, .level = 0, .type = 1, .id_shift = 3, },
>> + [1] = { .nr_logical = 5, .level = 1, .type = 2, .id_shift = 6, },
>> + },
>> + },
>> + },
>> + {
>> + .threads_per_core = 2, .cores_per_pkg = 128,
>> + .policy = {
>> + .x86_vendor = X86_VENDOR_AMD,
>> + .topo.subleaf = {
>> + [0] = { .nr_logical = 2, .level = 0, .type = 1, .id_shift = 1, },
>> + [1] = { .nr_logical = 128, .level = 1, .type = 2, .id_shift = 8, },
>> + },
>> + },
>> + },
>> + {
>> + .threads_per_core = 3, .cores_per_pkg = 1,
>> + .policy = {
>> + .x86_vendor = X86_VENDOR_INTEL,
>> + .topo.subleaf = {
>> + [0] = { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
>> + [1] = { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, },
>> + },
>> + },
>> + },
>> + {
>> + .threads_per_core = 1, .cores_per_pkg = 3,
>> + .policy = {
>> + .x86_vendor = X86_VENDOR_INTEL,
>> + .topo.subleaf = {
>> + [0] = { .nr_logical = 1, .level = 0, .type = 1, .id_shift = 0, },
>> + [1] = { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, },
>> + },
>> + },
>> + },
>> + {
>> + .threads_per_core = 7, .cores_per_pkg = 5,
>> + .policy = {
>> + .x86_vendor = X86_VENDOR_INTEL,
>> + .topo.subleaf = {
>> + [0] = { .nr_logical = 7, .level = 0, .type = 1, .id_shift = 3, },
>> + [1] = { .nr_logical = 35, .level = 1, .type = 2, .id_shift = 6, },
>> + },
>> + },
>> + },
>> + {
>> + .threads_per_core = 2, .cores_per_pkg = 128,
>> + .policy = {
>> + .x86_vendor = X86_VENDOR_INTEL,
>> + .topo.subleaf = {
>> + [0] = { .nr_logical = 2, .level = 0, .type = 1, .id_shift = 1, },
>> + [1] = { .nr_logical = 256, .level = 1, .type = 2, .id_shift = 8, },
>
> You don't need the array index in the initialization:
>
> .topo.subleaf = {
> { .nr_logical = 2, .level = 0, .type = 1, .id_shift = 1, },
> { .nr_logical = 256, .level = 1, .type = 2,
> .id_shift = 8, },
> }
>
> And lines should be limited to 80 columns if possible.
>
>> + },
>> + },
>> + },
>> + };
>> +
>> + printf("Testing topology synthesis from parts:\n");
>> +
>> + for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i )
>> + {
>> + const struct test *t = &tests[i];
>> + struct cpu_policy actual = { .x86_vendor = t->policy.x86_vendor };
>> + int rc = x86_topo_from_parts(&actual, t->threads_per_core, t->cores_per_pkg);
>> +
>> + if ( rc || memcmp(&actual.topo, &t->policy.topo, sizeof(actual.topo)) )
>> + {
>> +#define TOPO(n) topo.subleaf[(n)]
>> + fail("FAIL[%d] - '%s %u t/c, %u c/p'\n",
>> + rc,
>> + x86_cpuid_vendor_to_str(t->policy.x86_vendor),
>> + t->threads_per_core, t->cores_per_pkg);
>> + printf(" subleaf=%u expected_n=%u actual_n=%u\n"
>> + " expected_lvl=%u actual_lvl=%u\n"
>> + " expected_type=%u actual_type=%u\n"
>> + " expected_shift=%u actual_shift=%u\n",
>> + 0, t->policy.TOPO(0).nr_logical, actual.TOPO(0).nr_logical,
>> + t->policy.TOPO(0).level, actual.TOPO(0).level,
>> + t->policy.TOPO(0).type, actual.TOPO(0).type,
>> + t->policy.TOPO(0).id_shift, actual.TOPO(0).id_shift);
>> +
>> + printf(" subleaf=%u expected_n=%u actual_n=%u\n"
>> + " expected_lvl=%u actual_lvl=%u\n"
>> + " expected_type=%u actual_type=%u\n"
>> + " expected_shift=%u actual_shift=%u\n",
>> + 1, t->policy.TOPO(1).nr_logical, actual.TOPO(1).nr_logical,
>> + t->policy.TOPO(1).level, actual.TOPO(1).level,
>> + t->policy.TOPO(1).type, actual.TOPO(1).type,
>> + t->policy.TOPO(1).id_shift, actual.TOPO(1).id_shift);
>> +#undef TOPO
>
> Seeing the usage of the macro, maybe you could even do something like:
>
> TOPO(n, f) t->policy.topo.subleaf[(n)].f, actual.topo.subleaf[(n)].f
>
> This will limit a bit the repetition of the "t->policy..., actual..."
> tuple.
>
Hm. Sure, but bear in mind it ends up looking rather cryptic.
>> + }
>> + }
>> +}
>> +
>> int main(int argc, char **argv)
>> {
>> printf("CPU Policy unit tests\n");
>> @@ -667,6 +793,8 @@ int main(int argc, char **argv)
>> test_is_compatible_success();
>> test_is_compatible_failure();
>>
>> + test_topo_from_parts();
>> +
>> if ( nr_failures )
>> printf("Done: %u failures\n", nr_failures);
>> else
>> diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
>> index 392320b9adbe..f5df18e9f77c 100644
>> --- a/xen/include/xen/lib/x86/cpu-policy.h
>> +++ b/xen/include/xen/lib/x86/cpu-policy.h
>> @@ -551,6 +551,22 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
>> */
>> uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id);
>>
>> +/**
>> + * Synthesise topology information in `p` given high-level constraints
>> + *
>> + * Topology is given in various fields accross several leaves, some of
>> + * which are vendor-specific. This function uses the policy itself to
>> + * derive such leaves from threads/core and cores/package.
>> + *
>> + * @param p CPU policy of the domain.
>> + * @param threads_per_core threads/core. Doesn't need to be a power of 2.
>> + * @param cores_per_package cores/package. Doesn't need to be a power of 2.
>> + * @return 0 on success; -errno on failure
>> + */
>> +int x86_topo_from_parts(struct cpu_policy *p,
>> + unsigned int threads_per_core,
>> + unsigned int cores_per_pkg);
>> +
>> #endif /* !XEN_LIB_X86_POLICIES_H */
>>
>> /*
>> diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
>> index 4cef658feeb8..d033ee5398dd 100644
>> --- a/xen/lib/x86/policy.c
>> +++ b/xen/lib/x86/policy.c
>> @@ -13,6 +13,92 @@ uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
>> return vcpu_id * 2;
>> }
>>
>> +static unsigned int order(unsigned int n)
>> +{
>> + return 8 * sizeof(n) - __builtin_clz(n);
>
> Do we need to assert that n is not 0, otherwise the return of
> __builtin_clz() is undefined.
>
> I think the usage below doesn't pass 0 to __builtin_clz() in any case,
> but better add the check IMO.
I doesn't, but asserting sanity sounds good.
>
> Is __builtin_clz() also available in all versions of GCC and CLANG
> that we support? I have no idea when this was introduced.
>
Works on GCC 4.1.2 and Clang 3.5 according to godbolt.
>> +}
>> +
>> +int x86_topo_from_parts(struct cpu_policy *p,
>> + unsigned int threads_per_core,
>> + unsigned int cores_per_pkg)
>> +{
>> + unsigned int threads_per_pkg = threads_per_core * cores_per_pkg;
>> + unsigned int apic_id_size;
>> +
>> + if ( !p || !threads_per_core || !cores_per_pkg )
>> + return -EINVAL;
>> +
>> + p->basic.max_leaf = MAX(0xb, p->basic.max_leaf);
>> +
>> + memset(p->topo.raw, 0, sizeof(p->topo.raw));
>> +
>> + /* thread level */
>> + p->topo.subleaf[0].nr_logical = threads_per_core;
>> + p->topo.subleaf[0].id_shift = 0;
>> + p->topo.subleaf[0].level = 0;
>> + p->topo.subleaf[0].type = 1;
>> + if ( threads_per_core > 1 )
>> + p->topo.subleaf[0].id_shift = order(threads_per_core - 1);
>> +
>> + /* core level */
>> + p->topo.subleaf[1].nr_logical = cores_per_pkg;
>> + if ( p->x86_vendor == X86_VENDOR_INTEL )
>> + p->topo.subleaf[1].nr_logical = threads_per_pkg;
>> + p->topo.subleaf[1].id_shift = p->topo.subleaf[0].id_shift;
>> + p->topo.subleaf[1].level = 1;
>> + p->topo.subleaf[1].type = 2;
>> + if ( cores_per_pkg > 1 )
>> + p->topo.subleaf[1].id_shift += order(cores_per_pkg - 1);
>> +
>> + apic_id_size = p->topo.subleaf[1].id_shift;
>> +
>> + /*
>> + * Contrary to what the name might seem to imply. HTT is an enabler for
>> + * SMP and there's no harm in setting it even with a single vCPU.
>> + */
>> + p->basic.htt = true;
>> + p->basic.lppp = MIN(0xff, p->basic.lppp);
>> +
>> + switch ( p->x86_vendor )
>> + {
>> + case X86_VENDOR_INTEL: {
>> + struct cpuid_cache_leaf *sl = p->cache.subleaf;
>
> Newline please.
>
>> + for ( size_t i = 0; sl->type &&
>> + i < ARRAY_SIZE(p->cache.raw); i++, sl++ )
>> + {
>> + sl->cores_per_package = cores_per_pkg - 1;
>> + sl->threads_per_cache = threads_per_core - 1;
>> + if ( sl->type == 3 /* unified cache */ )
>> + sl->threads_per_cache = threads_per_pkg - 1;
>> + }
>> + break;
>> + }
>
> Newline here also.
>
>> + case X86_VENDOR_AMD:
>> + case X86_VENDOR_HYGON:
>> + /* Expose p->basic.lppp */
>> + p->extd.cmp_legacy = true;
>> +
>> + /* Clip NC to the maximum value it can hold */
>> + p->extd.nc = 0xff;
>> + if ( threads_per_pkg <= 0xff )
>> + p->extd.nc = threads_per_pkg - 1;
>> +
>> + /* TODO: Expose leaf e1E */
>> + p->extd.topoext = false;
>> +
>> + /*
>> + * Clip APIC ID to 8 bits, as that's what high core-count machines do
>
> Overly long line? And missing full stop.
I've reduced the indentation of the case statement to align them to the
switch. The line fits afterwards.
>
>> + *
>> + * That what AMD EPYC 9654 does with >256 CPUs
> ^ That's
>
> Thanks, Roger.
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 8/8] xen/x86: Synthesise domain topologies
2024-05-27 8:20 ` Roger Pau Monné
@ 2024-05-28 14:06 ` Alejandro Vallejo
0 siblings, 0 replies; 43+ messages in thread
From: Alejandro Vallejo @ 2024-05-28 14:06 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Xen-devel, Anthony PERARD, Juergen Gross, Jan Beulich,
Andrew Cooper
On 27/05/2024 09:20, Roger Pau Monné wrote:
> On Fri, May 24, 2024 at 06:16:01PM +0100, Alejandro Vallejo wrote:
>> On 24/05/2024 09:58, Roger Pau Monné wrote:
>>> On Wed, May 08, 2024 at 01:39:27PM +0100, Alejandro Vallejo wrote:
>>>
>>>> + rc = x86_topo_from_parts(&p->policy, threads_per_core, cores_per_pkg);
>>>
>>> I assume this generates the same topology as the current code, or will
>>> the population of the leaves be different in some way?
>>>
>>
>> The current code does not populate 0xb. This generates a topology
>> consistent with the existing INTENDED topology. The actual APIC IDs will
>> be different though (because there's no skipping of odd values).
>>
>> All the dance in patch 1 was to make this migrate-safe. The x2apic ID is
>> stored in the lapic hidden regs so differences with previous behaviour
>> don't matter.
>
> What about systems without CPU policy in the migration stream, will
> those also get restored as expected?
>
> I think you likely need to check whether 'restore' is set and keep the
> old logic in that case?
>
> As otherwise migrated systems without a CPU policy will get the new
> topology information instead of the old one?
Bah. I hoped the x2apic ID restoration would mean I could get away with
removing all that junk, but migrations from Xen v.StoneAge do cause
mayhem. And it'd be very bizarre because the new topology leaves would
not reflect the existing x2apic IDs either.
I'll condense that blob of nonsense with the old scheme into a separate
function so we can easily deprecate it in the future.
>
>> IOW, The differences are:
>> * 0xb is exposed, whereas previously it wasn't
>> * APIC IDs are compacted such that new_apicid=old_apicid/2
>> * There's also a cleanup of the murkier paths to put the right core
>> counts in the right leaves (whereas previously it was bonkers)
>
> This needs to be in the commit message IMO.
>
Sure.
>>>
>>> Note that currently the host policy also gets the topology leaves
>>> cleared, is it intended to not clear them anymore after this change?
>>>
>>> (as you only clear the leaves for the guest {max,def} policies)
>>>
>>> Thanks, Roger.
>>
>> It was like that originally in v1, I changed in v2 as part of feedback
>> from Jan.
>
> I think that's fine, but this divergence from current behavior of
> cleaning the topology for the host policy needs to be mentioned in
> the commit message.
>
> Thanks, Roger.
Sure.
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 43+ messages in thread
end of thread, other threads:[~2024-05-28 14:07 UTC | newest]
Thread overview: 43+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-08 12:39 [PATCH v2 0/8] x86: Expose consistent topology to guests Alejandro Vallejo
2024-05-08 12:39 ` [PATCH v2 1/8] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area Alejandro Vallejo
2024-05-23 14:32 ` Roger Pau Monné
2024-05-24 10:58 ` Alejandro Vallejo
2024-05-24 11:15 ` Roger Pau Monné
2024-05-08 12:39 ` [PATCH v2 2/8] xen/x86: Simplify header dependencies in x86/hvm Alejandro Vallejo
2024-05-22 9:33 ` Jan Beulich
2024-05-22 15:24 ` Alejandro Vallejo
2024-05-23 14:37 ` Roger Pau Monné
2024-05-23 14:40 ` Jan Beulich
2024-05-23 14:52 ` Roger Pau Monné
2024-05-08 12:39 ` [PATCH v2 3/8] x86/vlapic: Move lapic_load_hidden migration checks to the check hook Alejandro Vallejo
2024-05-23 14:50 ` Roger Pau Monné
2024-05-24 11:16 ` Alejandro Vallejo
2024-05-24 12:53 ` Roger Pau Monné
2024-05-23 18:58 ` Andrew Cooper
2024-05-24 7:22 ` Roger Pau Monné
2024-05-08 12:39 ` [PATCH v2 4/8] tools/hvmloader: Wake APs with hypercalls and not with INIT+SIPI+SIPI Alejandro Vallejo
2024-05-08 19:13 ` Andrew Cooper
2024-05-09 11:04 ` Alejandro Vallejo
2024-05-09 17:50 ` [PATCH 4.5/8] tools/hvmloader: Further simplify SMP setup Andrew Cooper
2024-05-13 17:19 ` Alejandro Vallejo
2024-05-23 15:04 ` Roger Pau Monné
2024-05-08 12:39 ` [PATCH v2 5/8] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves Alejandro Vallejo
2024-05-23 16:13 ` Roger Pau Monné
2024-05-24 15:16 ` Alejandro Vallejo
2024-05-27 8:55 ` Roger Pau Monné
2024-05-24 7:21 ` Roger Pau Monné
2024-05-24 15:15 ` Alejandro Vallejo
2024-05-27 8:52 ` Roger Pau Monné
2024-05-28 12:35 ` Alejandro Vallejo
2024-05-08 12:39 ` [PATCH v2 6/8] xen/lib: Add topology generator for x86 Alejandro Vallejo
2024-05-23 16:50 ` Roger Pau Monné
2024-05-28 13:28 ` Alejandro Vallejo
2024-05-08 12:39 ` [PATCH v2 7/8] xen/x86: Derive topologically correct x2APIC IDs from the policy Alejandro Vallejo
2024-05-24 8:39 ` Roger Pau Monné
2024-05-24 17:03 ` Alejandro Vallejo
2024-05-27 8:06 ` Roger Pau Monné
2024-05-08 12:39 ` [PATCH v2 8/8] xen/x86: Synthesise domain topologies Alejandro Vallejo
2024-05-24 8:58 ` Roger Pau Monné
2024-05-24 17:16 ` Alejandro Vallejo
2024-05-27 8:20 ` Roger Pau Monné
2024-05-28 14:06 ` Alejandro Vallejo
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.