* [PATCH for 4.19 v4 01/10] tools/hvmloader: Fix non-deterministic cpuid()
2024-06-26 16:28 [PATCH for-4.19 v4 00/10] x86: Expose consistent topology to guests Alejandro Vallejo
@ 2024-06-26 16:28 ` Alejandro Vallejo
2024-06-26 16:43 ` Andrew Cooper
2024-06-26 16:28 ` [PATCH for 4.19 v4 02/10] x86/vlapic: Move lapic migration checks to the check hooks Alejandro Vallejo
` (8 subsequent siblings)
9 siblings, 1 reply; 17+ messages in thread
From: Alejandro Vallejo @ 2024-06-26 16:28 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD, Oleksii Kurochko
hvmloader's cpuid() implementation deviates from Xen's in that the value passed
on ecx is unspecified. This means that when used on leaves that implement
subleaves it's unspecified which one you get; though it's more than likely an
invalid one.
Import Xen's implementation so there are no surprises.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
This is a fix for a latent bug. Should go into 4.19.
v4
* New patch
---
tools/firmware/hvmloader/util.c | 9 ---------
tools/firmware/hvmloader/util.h | 27 ++++++++++++++++++++++++---
2 files changed, 24 insertions(+), 12 deletions(-)
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index c34f077b38e3..d3b3f9038e64 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -267,15 +267,6 @@ memcmp(const void *s1, const void *s2, unsigned n)
return 0;
}
-void
-cpuid(uint32_t idx, uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx)
-{
- asm volatile (
- "cpuid"
- : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
- : "0" (idx) );
-}
-
static const char hex_digits[] = "0123456789abcdef";
/* Write a two-character hex representation of 'byte' to digits[].
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index deb823a892ef..3ad7c4f6d6a2 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -184,9 +184,30 @@ int uart_exists(uint16_t uart_base);
int lpt_exists(uint16_t lpt_base);
int hpet_exists(unsigned long hpet_base);
-/* Do cpuid instruction, with operation 'idx' */
-void cpuid(uint32_t idx, uint32_t *eax, uint32_t *ebx,
- uint32_t *ecx, uint32_t *edx);
+/* Some CPUID calls want 'count' to be placed in ecx */
+static inline void cpuid_count(
+ uint32_t op,
+ uint32_t count,
+ uint32_t *eax,
+ uint32_t *ebx,
+ uint32_t *ecx,
+ uint32_t *edx)
+{
+ asm volatile ( "cpuid"
+ : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
+ : "0" (op), "c" (count) );
+}
+
+/* Generic CPUID function (subleaf 0) */
+static inline void cpuid(
+ uint32_t leaf,
+ uint32_t *eax,
+ uint32_t *ebx,
+ uint32_t *ecx,
+ uint32_t *edx)
+{
+ cpuid_count(leaf, 0, eax, ebx, ecx, edx);
+}
/* Read the TSC register. */
static inline uint64_t rdtsc(void)
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH for 4.19 v4 01/10] tools/hvmloader: Fix non-deterministic cpuid()
2024-06-26 16:28 ` [PATCH for 4.19 v4 01/10] tools/hvmloader: Fix non-deterministic cpuid() Alejandro Vallejo
@ 2024-06-26 16:43 ` Andrew Cooper
2024-06-26 16:52 ` Alejandro Vallejo
2024-06-27 9:48 ` oleksii.kurochko
0 siblings, 2 replies; 17+ messages in thread
From: Andrew Cooper @ 2024-06-26 16:43 UTC (permalink / raw)
To: Alejandro Vallejo, Xen-devel
Cc: Jan Beulich, Roger Pau Monné, Anthony PERARD,
Oleksii Kurochko
On 26/06/2024 5:28 pm, Alejandro Vallejo wrote:
> hvmloader's cpuid() implementation deviates from Xen's in that the value passed
> on ecx is unspecified. This means that when used on leaves that implement
> subleaves it's unspecified which one you get; though it's more than likely an
> invalid one.
>
> Import Xen's implementation so there are no surprises.
Fixes: 318ac791f9f9 ("Add utilities needed for SMBIOS generation to
hvmloader")
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>
>
> diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
> index deb823a892ef..3ad7c4f6d6a2 100644
> --- a/tools/firmware/hvmloader/util.h
> +++ b/tools/firmware/hvmloader/util.h
> @@ -184,9 +184,30 @@ int uart_exists(uint16_t uart_base);
> int lpt_exists(uint16_t lpt_base);
> int hpet_exists(unsigned long hpet_base);
>
> -/* Do cpuid instruction, with operation 'idx' */
> -void cpuid(uint32_t idx, uint32_t *eax, uint32_t *ebx,
> - uint32_t *ecx, uint32_t *edx);
> +/* Some CPUID calls want 'count' to be placed in ecx */
> +static inline void cpuid_count(
> + uint32_t op,
> + uint32_t count,
> + uint32_t *eax,
> + uint32_t *ebx,
> + uint32_t *ecx,
> + uint32_t *edx)
> +{
> + asm volatile ( "cpuid"
> + : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
> + : "0" (op), "c" (count) );
"a" to be consistent with "c".
Also it would be better to name the parameters as leaf and subleaf.
Both can be fixed on commit. However, there's no use in HVMLoader
tickling this bug right now, so I'm not sure we want to rush this into
4.19 at this point.
~Andrew
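
For illustration, the two review suggestions above (an explicit "a" input constraint, and parameters named leaf and subleaf) could look roughly like the sketch below. This is not the committed code. The NULL-pointer assertion and the non-x86 fallback are assumptions added only so the example is self-contained and builds on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: execute CPUID with both the leaf (eax) and subleaf (ecx) specified. */
static inline void cpuid_count(uint32_t leaf, uint32_t subleaf,
                               uint32_t *eax, uint32_t *ebx,
                               uint32_t *ecx, uint32_t *edx)
{
    /* Assumption: all four outputs are dereferenced, so none may be NULL. */
    assert(eax != NULL && ebx != NULL && ecx != NULL && edx != NULL);

#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ( "cpuid"
                           : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
                           : "a" (leaf), "c" (subleaf) );
#else
    /* Assumption: placeholder so the sketch compiles on non-x86 hosts. */
    (void)leaf;
    (void)subleaf;
    *eax = *ebx = *ecx = *edx = 0;
#endif
}

/* Generic CPUID query: always asks for subleaf 0, so ecx is never stale. */
static inline void cpuid(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
                         uint32_t *ecx, uint32_t *edx)
{
    cpuid_count(leaf, 0, eax, ebx, ecx, edx);
}
```

Because cpuid() now always loads ecx with 0, a query of a leaf with subleaves returns the same subleaf on every call, which is the determinism the patch is after.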
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH for 4.19 v4 01/10] tools/hvmloader: Fix non-deterministic cpuid()
2024-06-26 16:43 ` Andrew Cooper
@ 2024-06-26 16:52 ` Alejandro Vallejo
2024-06-27 9:48 ` oleksii.kurochko
1 sibling, 0 replies; 17+ messages in thread
From: Alejandro Vallejo @ 2024-06-26 16:52 UTC (permalink / raw)
To: Andrew Cooper, Xen-devel
Cc: Jan Beulich, Roger Pau Monné, Anthony PERARD,
Oleksii Kurochko
On Wed Jun 26, 2024 at 5:43 PM BST, Andrew Cooper wrote:
> On 26/06/2024 5:28 pm, Alejandro Vallejo wrote:
> > hvmloader's cpuid() implementation deviates from Xen's in that the value passed
> > on ecx is unspecified. This means that when used on leaves that implement
> > subleaves it's unspecified which one you get; though it's more than likely an
> > invalid one.
> >
> > Import Xen's implementation so there are no surprises.
>
> Fixes: 318ac791f9f9 ("Add utilities needed for SMBIOS generation to
> hvmloader")
>
> > Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> >
> >
> > diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
> > index deb823a892ef..3ad7c4f6d6a2 100644
> > --- a/tools/firmware/hvmloader/util.h
> > +++ b/tools/firmware/hvmloader/util.h
> > @@ -184,9 +184,30 @@ int uart_exists(uint16_t uart_base);
> > int lpt_exists(uint16_t lpt_base);
> > int hpet_exists(unsigned long hpet_base);
> >
> > -/* Do cpuid instruction, with operation 'idx' */
> > -void cpuid(uint32_t idx, uint32_t *eax, uint32_t *ebx,
> > - uint32_t *ecx, uint32_t *edx);
> > +/* Some CPUID calls want 'count' to be placed in ecx */
> > +static inline void cpuid_count(
> > + uint32_t op,
> > + uint32_t count,
> > + uint32_t *eax,
> > + uint32_t *ebx,
> > + uint32_t *ecx,
> > + uint32_t *edx)
> > +{
> > + asm volatile ( "cpuid"
> > + : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
> > + : "0" (op), "c" (count) );
>
> "a" to be consistent with "c".
>
> Also it would be better to name the parameters as leaf and subleaf.
>
> Both can be fixed on commit. However, there's no use in HVMLoader
> tickling this bug right now, so I'm not sure we want to rush this into
> 4.19 at this point.
>
> ~Andrew
All sound good to me. For the record, the static inlines are copied verbatim
from Xen so if you'd like these adjusted you probably also want to make a
postit to change Xen's too.
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH for 4.19 v4 01/10] tools/hvmloader: Fix non-deterministic cpuid()
2024-06-26 16:43 ` Andrew Cooper
2024-06-26 16:52 ` Alejandro Vallejo
@ 2024-06-27 9:48 ` oleksii.kurochko
1 sibling, 0 replies; 17+ messages in thread
From: oleksii.kurochko @ 2024-06-27 9:48 UTC (permalink / raw)
To: Andrew Cooper, Alejandro Vallejo, Xen-devel
Cc: Jan Beulich, Roger Pau Monné, Anthony PERARD
On Wed, 2024-06-26 at 17:43 +0100, Andrew Cooper wrote:
> On 26/06/2024 5:28 pm, Alejandro Vallejo wrote:
> > hvmloader's cpuid() implementation deviates from Xen's in that the
> > value passed
> > on ecx is unspecified. This means that when used on leaves that
> > implement
> > subleaves it's unspecified which one you get; though it's more than
> > likely an
> > invalid one.
> >
> > Import Xen's implementation so there are no surprises.
>
> Fixes: 318ac791f9f9 ("Add utilities needed for SMBIOS generation to
> hvmloader")
>
> > Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> >
> >
> > diff --git a/tools/firmware/hvmloader/util.h
> > b/tools/firmware/hvmloader/util.h
> > index deb823a892ef..3ad7c4f6d6a2 100644
> > --- a/tools/firmware/hvmloader/util.h
> > +++ b/tools/firmware/hvmloader/util.h
> > @@ -184,9 +184,30 @@ int uart_exists(uint16_t uart_base);
> > int lpt_exists(uint16_t lpt_base);
> > int hpet_exists(unsigned long hpet_base);
> >
> > -/* Do cpuid instruction, with operation 'idx' */
> > -void cpuid(uint32_t idx, uint32_t *eax, uint32_t *ebx,
> > - uint32_t *ecx, uint32_t *edx);
> > +/* Some CPUID calls want 'count' to be placed in ecx */
> > +static inline void cpuid_count(
> > + uint32_t op,
> > + uint32_t count,
> > + uint32_t *eax,
> > + uint32_t *ebx,
> > + uint32_t *ecx,
> > + uint32_t *edx)
> > +{
> > + asm volatile ( "cpuid"
> > + : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
> > + : "0" (op), "c" (count) );
>
> "a" to be consistent with "c".
>
> Also it would be better to name the parameters as leaf and subleaf.
>
> Both can be fixed on commit. However, there's no use in HVMLoader
> tickling this bug right now, so I'm not sure we want to rush this
> into
> 4.19 at this point.
I agree, I think it would be better to postpone the patch until 4.20
branch.
~ Oleksii
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH for 4.19 v4 02/10] x86/vlapic: Move lapic migration checks to the check hooks
2024-06-26 16:28 [PATCH for-4.19 v4 00/10] x86: Expose consistent topology to guests Alejandro Vallejo
2024-06-26 16:28 ` [PATCH for 4.19 v4 01/10] tools/hvmloader: Fix non-deterministic cpuid() Alejandro Vallejo
@ 2024-06-26 16:28 ` Alejandro Vallejo
2024-06-26 16:28 ` [PATCH for-4.19 v4 03/10] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area Alejandro Vallejo
` (7 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Alejandro Vallejo @ 2024-06-26 16:28 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Oleksii Kurochko
While doing this, factor out checks common to architectural and hidden state.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
This puts essential LAPIC information in the stream. It's technically a feature
but it makes 4.19 guests a lot more future-proof. I think this should go into 4.19.
v4:
* Replaced BUG() with ASSERT_UNREACHABLE(), and allow ret -EINVAL on release.
* Adjust printk() to be clearer
* Assign lapic_check_common() outside the "if" condition.
---
xen/arch/x86/hvm/vlapic.c | 85 ++++++++++++++++++++++++++-------------
1 file changed, 58 insertions(+), 27 deletions(-)
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 9cfc82666ae5..1a7bca5afd2f 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1553,60 +1553,91 @@ static void lapic_load_fixup(struct vlapic *vlapic)
v, vlapic->loaded.id, vlapic->loaded.ldr, good_ldr);
}
-static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
-{
- unsigned int vcpuid = hvm_load_instance(h);
- struct vcpu *v;
- struct vlapic *s;
+static int lapic_check_common(const struct domain *d, unsigned int vcpuid)
+{
if ( !has_vlapic(d) )
return -ENODEV;
/* Which vlapic to load? */
- if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
+ if ( !domain_vcpu(d, vcpuid) )
{
- dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no apic%u\n",
+ dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no vCPU %u\n",
d->domain_id, vcpuid);
return -EINVAL;
}
- s = vcpu_vlapic(v);
+
+ return 0;
+}
+
+static int cf_check lapic_check_hidden(const struct domain *d,
+ hvm_domain_context_t *h)
+{
+ unsigned int vcpuid = hvm_load_instance(h);
+ struct hvm_hw_lapic s;
+ int rc = lapic_check_common(d, vcpuid);
+
+ if ( rc )
+ return rc;
+
+ if ( hvm_load_entry_zeroextend(LAPIC, h, &s) != 0 )
+ return -ENODATA;
+
+ /* EN=0 with EXTD=1 is illegal */
+ if ( (s.apic_base_msr & (APIC_BASE_ENABLE | APIC_BASE_EXTD)) ==
+ APIC_BASE_EXTD )
+ return -EINVAL;
+
+ return 0;
+}
+
+static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
+{
+ unsigned int vcpuid = hvm_load_instance(h);
+ struct vcpu *v = d->vcpu[vcpuid];
+ struct vlapic *s = vcpu_vlapic(v);
if ( hvm_load_entry_zeroextend(LAPIC, h, &s->hw) != 0 )
+ {
+ ASSERT_UNREACHABLE();
return -EINVAL;
+ }
s->loaded.hw = 1;
if ( s->loaded.regs )
lapic_load_fixup(s);
- if ( !(s->hw.apic_base_msr & APIC_BASE_ENABLE) &&
- unlikely(vlapic_x2apic_mode(s)) )
- return -EINVAL;
-
hvm_update_vlapic_mode(v);
return 0;
}
-static int cf_check lapic_load_regs(struct domain *d, hvm_domain_context_t *h)
+static int cf_check lapic_check_regs(const struct domain *d,
+ hvm_domain_context_t *h)
{
unsigned int vcpuid = hvm_load_instance(h);
- struct vcpu *v;
- struct vlapic *s;
+ int rc;
- if ( !has_vlapic(d) )
- return -ENODEV;
+ if ( (rc = lapic_check_common(d, vcpuid)) )
+ return rc;
- /* Which vlapic to load? */
- if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
- {
- dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no apic%u\n",
- d->domain_id, vcpuid);
- return -EINVAL;
- }
- s = vcpu_vlapic(v);
+ if ( !hvm_get_entry(LAPIC_REGS, h) )
+ return -ENODATA;
+
+ return 0;
+}
+
+static int cf_check lapic_load_regs(struct domain *d, hvm_domain_context_t *h)
+{
+ unsigned int vcpuid = hvm_load_instance(h);
+ struct vcpu *v = d->vcpu[vcpuid];
+ struct vlapic *s = vcpu_vlapic(v);
if ( hvm_load_entry(LAPIC_REGS, h, s->regs) != 0 )
+ {
+ ASSERT_UNREACHABLE();
return -EINVAL;
+ }
s->loaded.id = vlapic_get_reg(s, APIC_ID);
s->loaded.ldr = vlapic_get_reg(s, APIC_LDR);
@@ -1623,9 +1654,9 @@ static int cf_check lapic_load_regs(struct domain *d, hvm_domain_context_t *h)
return 0;
}
-HVM_REGISTER_SAVE_RESTORE(LAPIC, lapic_save_hidden, NULL,
+HVM_REGISTER_SAVE_RESTORE(LAPIC, lapic_save_hidden, lapic_check_hidden,
lapic_load_hidden, 1, HVMSR_PER_VCPU);
-HVM_REGISTER_SAVE_RESTORE(LAPIC_REGS, lapic_save_regs, NULL,
+HVM_REGISTER_SAVE_RESTORE(LAPIC_REGS, lapic_save_regs, lapic_check_regs,
lapic_load_regs, 1, HVMSR_PER_VCPU);
int vlapic_init(struct vcpu *v)
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH for-4.19 v4 03/10] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area
2024-06-26 16:28 [PATCH for-4.19 v4 00/10] x86: Expose consistent topology to guests Alejandro Vallejo
2024-06-26 16:28 ` [PATCH for 4.19 v4 01/10] tools/hvmloader: Fix non-deterministic cpuid() Alejandro Vallejo
2024-06-26 16:28 ` [PATCH for 4.19 v4 02/10] x86/vlapic: Move lapic migration checks to the check hooks Alejandro Vallejo
@ 2024-06-26 16:28 ` Alejandro Vallejo
2024-06-26 16:28 ` [PATCH v4 04/10] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves Alejandro Vallejo
` (6 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Alejandro Vallejo @ 2024-06-26 16:28 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Oleksii Kurochko
This allows the initial x2APIC ID to be sent on the migration stream. This
allows further changes to topology and APIC ID assignment without breaking
existing hosts. Given the vlapic data is zero-extended on restore, fix up
migrations from hosts without the field by setting it to the old convention if
zero.
The hardcoded mapping x2apic_id=2*vcpu_id is kept for the time being, but it's
meant to be overridden by the toolstack in a later patch with appropriate values.
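
The restore-side behaviour described above can be sketched as follows. This is a simplified model, not the Xen code: struct lapic_hidden_sketch and restore_x2apic_id() are hypothetical names, and the real patch splits the work between lapic_check_hidden() and lapic_load_fixup().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the tail of struct hvm_hw_lapic. */
struct lapic_hidden_sketch {
    uint32_t x2apic_id;  /* reads as 0 if the sender predates this field */
    uint32_t rsvd_zero;  /* must be 0; non-zero implies a newer format */
};

/*
 * Model of the restore path: reject records that use the reserved field,
 * then fall back to the legacy "vcpu_id * 2" mapping when the stream
 * carried no x2APIC ID (the record having been zero-extended on load).
 */
static int restore_x2apic_id(struct lapic_hidden_sketch *hw, uint32_t vcpu_id)
{
    assert(hw != NULL);

    if ( hw->rsvd_zero )
        return -EINVAL;

    if ( !hw->x2apic_id )
        hw->x2apic_id = vcpu_id * 2;

    return 0;
}
```

vCPU0 also takes the fallback branch whenever its ID is zero, which is harmless since the legacy mapping yields zero for it as well.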
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
Same rationale as previous patch for inclusion in 4.19
Roger replied to v3 with an R-by for this patch. I didn't add it here because
the patch has seen substantial changes and it's probably worth looking at again.
All changes are removals. In particular...
v4:
* Removed hooks into cpu policy update events. They are no longer relevant.
* Remove the derivation (within Xen) of x2apic_id from vcpu_id via lib/x86.
* Rearranged for toolstack to provide those on hvmcontext blobs on a later
patch. This still works out because the default is the legacy scheme of
apicid=vcpuid*2
---
xen/arch/x86/cpuid.c | 14 +++++---------
xen/arch/x86/hvm/vlapic.c | 22 ++++++++++++++++++++--
xen/arch/x86/include/asm/hvm/vlapic.h | 1 +
xen/include/public/arch-x86/hvm/save.h | 2 ++
4 files changed, 28 insertions(+), 11 deletions(-)
diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index a822e80c7ea7..7ee596ab66a4 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -139,10 +139,9 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
const struct cpu_user_regs *regs;
case 0x1:
- /* TODO: Rework topology logic. */
res->b &= 0x00ffffffu;
if ( is_hvm_domain(d) )
- res->b |= (v->vcpu_id * 2) << 24;
+ res->b |= vlapic_x2apic_id(vcpu_vlapic(v)) << 24;
/* TODO: Rework vPMU control in terms of toolstack choices. */
if ( vpmu_available(v) &&
@@ -312,18 +311,15 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
case 0xb:
/*
- * In principle, this leaf is Intel-only. In practice, it is tightly
- * coupled with x2apic, and we offer an x2apic-capable APIC emulation
- * to guests on AMD hardware as well.
- *
- * TODO: Rework topology logic.
+ * Don't expose topology information to PV guests. Exposed on HVM
+ * along with x2APIC because they are tightly coupled.
*/
- if ( p->basic.x2apic )
+ if ( is_hvm_domain(d) && p->basic.x2apic )
{
*(uint8_t *)&res->c = subleaf;
/* Fix the x2APIC identifier. */
- res->d = v->vcpu_id * 2;
+ res->d = vlapic_x2apic_id(vcpu_vlapic(v));
}
break;
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 1a7bca5afd2f..b57e39d1c6dd 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1072,7 +1072,7 @@ static uint32_t x2apic_ldr_from_id(uint32_t id)
static void set_x2apic_id(struct vlapic *vlapic)
{
const struct vcpu *v = vlapic_vcpu(vlapic);
- uint32_t apic_id = v->vcpu_id * 2;
+ uint32_t apic_id = vlapic->hw.x2apic_id;
uint32_t apic_ldr = x2apic_ldr_from_id(apic_id);
/*
@@ -1452,7 +1452,7 @@ void vlapic_reset(struct vlapic *vlapic)
if ( v->vcpu_id == 0 )
vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
- vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
+ vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));
vlapic_do_init(vlapic);
}
@@ -1520,6 +1520,16 @@ static void lapic_load_fixup(struct vlapic *vlapic)
const struct vcpu *v = vlapic_vcpu(vlapic);
uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
+ /*
+ * Loading record without hw.x2apic_id in the save stream, calculate using
+ * the traditional "vcpu_id * 2" relation. There's an implicit assumption
+ * that vCPU0 always has x2APIC0, which is true for the old relation, and
+ * still holds under the new x2APIC generation algorithm. While that case
+ * goes through the conditional it's benign because it still maps to zero.
+ */
+ if ( !vlapic->hw.x2apic_id )
+ vlapic->hw.x2apic_id = v->vcpu_id * 2;
+
/* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */
if ( !vlapic_x2apic_mode(vlapic) ||
(vlapic->loaded.ldr == good_ldr) )
@@ -1588,6 +1598,13 @@ static int cf_check lapic_check_hidden(const struct domain *d,
APIC_BASE_EXTD )
return -EINVAL;
+ /*
+ * Fail migrations from newer versions of Xen where
+ * rsvd_zero is interpreted as something else.
+ */
+ if ( s.rsvd_zero )
+ return -EINVAL;
+
return 0;
}
@@ -1672,6 +1689,7 @@ int vlapic_init(struct vcpu *v)
}
vlapic->pt.source = PTSRC_lapic;
+ vlapic->hw.x2apic_id = 2 * v->vcpu_id;
vlapic->regs_page = alloc_domheap_page(v->domain, MEMF_no_owner);
if ( !vlapic->regs_page )
diff --git a/xen/arch/x86/include/asm/hvm/vlapic.h b/xen/arch/x86/include/asm/hvm/vlapic.h
index 2c4ff94ae7a8..85c4a236b9f6 100644
--- a/xen/arch/x86/include/asm/hvm/vlapic.h
+++ b/xen/arch/x86/include/asm/hvm/vlapic.h
@@ -44,6 +44,7 @@
#define vlapic_xapic_mode(vlapic) \
(!vlapic_hw_disabled(vlapic) && \
!((vlapic)->hw.apic_base_msr & APIC_BASE_EXTD))
+#define vlapic_x2apic_id(vlapic) ((vlapic)->hw.x2apic_id)
/*
* Generic APIC bitmap vector update & search routines.
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 7ecacadde165..1c2ec669ffc9 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -394,6 +394,8 @@ struct hvm_hw_lapic {
uint32_t disabled; /* VLAPIC_xx_DISABLED */
uint32_t timer_divisor;
uint64_t tdt_msr;
+ uint32_t x2apic_id;
+ uint32_t rsvd_zero;
};
DECLARE_HVM_SAVE_TYPE(LAPIC, 5, struct hvm_hw_lapic);
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 04/10] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves
2024-06-26 16:28 [PATCH for-4.19 v4 00/10] x86: Expose consistent topology to guests Alejandro Vallejo
` (2 preceding siblings ...)
2024-06-26 16:28 ` [PATCH for-4.19 v4 03/10] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area Alejandro Vallejo
@ 2024-06-26 16:28 ` Alejandro Vallejo
2024-06-26 16:28 ` [PATCH v4 05/10] xen/x86: Add supporting code for uploading LAPIC contexts during domain create Alejandro Vallejo
` (5 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Alejandro Vallejo @ 2024-06-26 16:28 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD
Make it so the APs expose their own APIC IDs in a LUT. We can use that LUT to
populate the MADT, decoupling the algorithm that relates CPU IDs and APIC IDs
from hvmloader.
While at it, also remove ap_callin, as writing the APIC ID can serve the same
purpose.
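
A minimal model of the LUT-as-call-in idea is sketched below. It swaps in C11 atomics for the ACCESS_ONCE() and wmb() primitives that the patch itself uses, and every name in it (cpu_to_x2apicid, MAX_VCPUS_SKETCH, the helper functions) is hypothetical. It only shows the data structure and the two roles; it does not start any CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_VCPUS_SKETCH 128 /* hypothetical; the patch sizes by HVM_MAX_VCPUS */

/* Zero-initialised LUT: entry 0 (the BSP) stays 0, APs fill in their own. */
static _Atomic uint32_t cpu_to_x2apicid[MAX_VCPUS_SKETCH];

/* AP side: publishing a non-zero APIC ID doubles as the call-in signal. */
static void ap_publish_apic_id(unsigned int cpu, uint32_t apic_id)
{
    assert(cpu != 0 && cpu < MAX_VCPUS_SKETCH);
    assert(apic_id != 0); /* 0 is reserved to mean "not called in yet" */

    atomic_store_explicit(&cpu_to_x2apicid[cpu], apic_id,
                          memory_order_release);
}

/* BSP side: polled in a loop until the AP has written its entry. */
static bool ap_has_called_in(unsigned int cpu)
{
    return atomic_load_explicit(&cpu_to_x2apicid[cpu],
                                memory_order_acquire) != 0;
}

/* MADT generation: look the APIC ID up instead of computing it. */
static uint32_t lapic_id(unsigned int cpu)
{
    return atomic_load_explicit(&cpu_to_x2apicid[cpu], memory_order_relaxed);
}
```

The scheme relies on no AP ever having APIC ID 0, since zero is what the BSP treats as "not yet written".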
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v4:
* Removed bogus ! in ASSERT() statement introduced in v3.
---
tools/firmware/hvmloader/config.h | 6 ++-
tools/firmware/hvmloader/hvmloader.c | 4 +-
tools/firmware/hvmloader/smp.c | 54 ++++++++++++++++++++-----
tools/include/xen-tools/common-macros.h | 5 +++
4 files changed, 56 insertions(+), 13 deletions(-)
diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
index cd716bf39245..213ac1f28e17 100644
--- a/tools/firmware/hvmloader/config.h
+++ b/tools/firmware/hvmloader/config.h
@@ -4,6 +4,8 @@
#include <stdint.h>
#include <stdbool.h>
+#include <xen/hvm/hvm_info_table.h>
+
enum virtual_vga { VGA_none, VGA_std, VGA_cirrus, VGA_pt };
extern enum virtual_vga virtual_vga;
@@ -48,8 +50,10 @@ extern uint8_t ioapic_version;
#define IOAPIC_ID 0x01
+extern uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
+
#define LAPIC_BASE_ADDRESS 0xfee00000
-#define LAPIC_ID(vcpu_id) ((vcpu_id) * 2)
+#define LAPIC_ID(vcpu_id) (CPU_TO_X2APICID[(vcpu_id)])
#define PCI_ISA_DEVFN 0x08 /* dev 1, fn 0 */
#define PCI_ISA_IRQ_MASK 0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index f8af88fabf24..5c02e8fc226a 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -341,11 +341,11 @@ int main(void)
printf("CPU speed is %u MHz\n", get_cpu_mhz());
+ smp_initialise();
+
apic_setup();
pci_setup();
- smp_initialise();
-
perform_tests();
if ( bios->bios_info_setup )
diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c
index 5d46eee1c5f4..43eb17e4e3be 100644
--- a/tools/firmware/hvmloader/smp.c
+++ b/tools/firmware/hvmloader/smp.c
@@ -29,7 +29,34 @@
#include <xen/vcpu.h>
-static int ap_callin;
+/**
+ * Lookup table of x2APIC IDs.
+ *
+ * Each entry is populated by its respective CPU as they come online. This is required
+ * for generating the MADT with minimal assumptions about ID relationships.
+ */
+uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
+
+/** Tristate about x2apic being supported. -1=unknown */
+static int has_x2apic = -1;
+
+static uint32_t read_apic_id(void)
+{
+ uint32_t apic_id;
+
+ if ( has_x2apic )
+ cpuid(0xb, NULL, NULL, NULL, &apic_id);
+ else
+ {
+ cpuid(1, NULL, &apic_id, NULL, NULL);
+ apic_id >>= 24;
+ }
+
+ /* Never called by cpu0, so should never return 0 */
+ ASSERT(apic_id);
+
+ return apic_id;
+}
static void __attribute__((regparm(1))) cpu_setup(unsigned int cpu)
{
@@ -37,13 +64,17 @@ static void __attribute__((regparm(1))) cpu_setup(unsigned int cpu)
cacheattr_init();
printf("done.\n");
- if ( !cpu ) /* Used on the BSP too */
+ /* The BSP exits early because its APIC ID is known to be zero */
+ if ( !cpu )
return;
wmb();
- ap_callin = 1;
+ ACCESS_ONCE(CPU_TO_X2APICID[cpu]) = read_apic_id();
- /* After this point, the BSP will shut us down. */
+ /*
+ * After this point the BSP will shut us down. A write to
+ * CPU_TO_X2APICID[cpu] signals the BSP to bring down `cpu`.
+ */
for ( ;; )
asm volatile ( "hlt" );
@@ -54,10 +85,6 @@ static void boot_cpu(unsigned int cpu)
static uint8_t ap_stack[PAGE_SIZE] __attribute__ ((aligned (16)));
static struct vcpu_hvm_context ap;
- /* Initialise shared variables. */
- ap_callin = 0;
- wmb();
-
/* Wake up the secondary processor */
ap = (struct vcpu_hvm_context) {
.mode = VCPU_HVM_MODE_32B,
@@ -90,10 +117,11 @@ static void boot_cpu(unsigned int cpu)
BUG();
/*
- * Wait for the secondary processor to complete initialisation.
+ * Wait for the secondary processor to complete initialisation,
+ * which is signaled by its x2APIC ID being written to the LUT.
* Do not touch shared resources meanwhile.
*/
- while ( !ap_callin )
+ while ( !ACCESS_ONCE(CPU_TO_X2APICID[cpu]) )
cpu_relax();
/* Take the secondary processor offline. */
@@ -104,6 +132,12 @@ static void boot_cpu(unsigned int cpu)
void smp_initialise(void)
{
unsigned int i, nr_cpus = hvm_info->nr_vcpus;
+ uint32_t ecx;
+
+ cpuid(1, NULL, NULL, &ecx, NULL);
+ has_x2apic = (ecx >> 21) & 1;
+ if ( has_x2apic )
+ printf("x2APIC supported\n");
printf("Multiprocessor initialisation:\n");
cpu_setup(0);
diff --git a/tools/include/xen-tools/common-macros.h b/tools/include/xen-tools/common-macros.h
index 60912225cb7a..336c6309d96e 100644
--- a/tools/include/xen-tools/common-macros.h
+++ b/tools/include/xen-tools/common-macros.h
@@ -108,4 +108,9 @@
#define get_unaligned(ptr) get_unaligned_t(typeof(*(ptr)), ptr)
#define put_unaligned(val, ptr) put_unaligned_t(typeof(*(ptr)), val, ptr)
+#define __ACCESS_ONCE(x) ({ \
+ (void)(typeof(x))0; /* Scalar typecheck. */ \
+ (volatile typeof(x) *)&(x); })
+#define ACCESS_ONCE(x) (*__ACCESS_ONCE(x))
+
#endif /* __XEN_TOOLS_COMMON_MACROS__ */
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 05/10] xen/x86: Add supporting code for uploading LAPIC contexts during domain create
2024-06-26 16:28 [PATCH for-4.19 v4 00/10] x86: Expose consistent topology to guests Alejandro Vallejo
` (3 preceding siblings ...)
2024-06-26 16:28 ` [PATCH v4 04/10] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves Alejandro Vallejo
@ 2024-06-26 16:28 ` Alejandro Vallejo
2024-06-26 16:28 ` [PATCH v4 06/10] tools/libguest: Make setting MTRR registers unconditional Alejandro Vallejo
` (4 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Alejandro Vallejo @ 2024-06-26 16:28 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
Roger Pau Monné
This patch is a precondition for a later patch in which toolstack uses HVM
contexts to upload LAPIC data to a newly constructed domain.
If the toolstack were to upload LAPIC contexts as part of domain creation as-is,
it would encounter a problem where the architectural state does not reflect the
APIC ID in the hidden state. This patch ensures updates to the hidden state
trigger an update in the architectural registers so the APIC ID in both is
consistent.
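
As a rough model of that synchronisation, the sketch below mirrors the hidden APIC ID into a stand-in APIC_ID register. The struct and function names are hypothetical. The encoding follows the architectural layout: the ID sits in bits 31:24 in xAPIC mode and fills the whole 32-bit register in x2APIC mode. The real set_x2apic_id() also recomputes the LDR, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of struct vlapic that matter here. */
struct vlapic_sketch {
    bool x2apic_mode;          /* models vlapic_x2apic_mode() */
    uint32_t hidden_x2apic_id; /* models hw.x2apic_id (hidden state) */
    uint32_t reg_apic_id;      /* models the architectural APIC_ID register */
};

/* xAPIC layout: only the low 8 bits of the ID fit, placed in bits 31:24. */
static uint32_t xapic_id_to_reg(uint32_t id)
{
    return id << 24;
}

/* After the hidden state is loaded, mirror its ID into the register. */
static void sync_apic_id(struct vlapic_sketch *s)
{
    assert(s != NULL);

    if ( s->x2apic_mode )
        s->reg_apic_id = s->hidden_x2apic_id;
    else
        s->reg_apic_id = xapic_id_to_reg(s->hidden_x2apic_id);
}
```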
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v4:
* New patch
---
xen/arch/x86/hvm/vlapic.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index b57e39d1c6dd..ebcf74711a13 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1622,7 +1622,27 @@ static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
s->loaded.hw = 1;
if ( s->loaded.regs )
+ {
+ /*
+ * We already processed architectural regs in lapic_load_regs(), so
+ * this must be a migration. Fix up inconsistencies from any older Xen.
+ */
lapic_load_fixup(s);
+ }
+ else
+ {
+ /*
+ * We haven't seen architectural regs so this could be a migration or a
+ * plain domain create. In the domain create case it's fine to modify
+ * the architectural state to align it to the APIC ID that was just
+ * uploaded and in the migrate case it doesn't matter because the
+ * architectural state will be replaced by the LAPIC_REGS ctx later on.
+ */
+ if ( vlapic_x2apic_mode(s) )
+ set_x2apic_id(s);
+ else
+ vlapic_set_reg(s, APIC_ID, SET_xAPIC_ID(s->hw.x2apic_id));
+ }
hvm_update_vlapic_mode(v);
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 06/10] tools/libguest: Make setting MTRR registers unconditional
2024-06-26 16:28 [PATCH for-4.19 v4 00/10] x86: Expose consistent topology to guests Alejandro Vallejo
` (4 preceding siblings ...)
2024-06-26 16:28 ` [PATCH v4 05/10] xen/x86: Add supporting code for uploading LAPIC contexts during domain create Alejandro Vallejo
@ 2024-06-26 16:28 ` Alejandro Vallejo
2024-06-27 9:42 ` Jan Beulich
2024-06-26 16:28 ` [PATCH v4 07/10] xen/lib: Add topology generator for x86 Alejandro Vallejo
` (3 subsequent siblings)
9 siblings, 1 reply; 17+ messages in thread
From: Alejandro Vallejo @ 2024-06-26 16:28 UTC (permalink / raw)
To: Xen-devel; +Cc: Alejandro Vallejo, Anthony PERARD, Juergen Gross
This greatly simplifies a later patch that makes use of HVM contexts to upload
LAPIC data. The idea is to reuse the MTRR-setting procedure to avoid code
duplication. It's currently only used for PVH, but there's no real reason to
overcomplicate the toolstack by preventing the MTRRs from being set for HVM
too, given that hvmloader will override them anyway.
While at it, add a missing "goto out" to what was the error condition in the
loop.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v4:
* New patch
---
tools/libs/guest/xg_dom_x86.c | 83 ++++++++++++++++++-----------------
1 file changed, 43 insertions(+), 40 deletions(-)
diff --git a/tools/libs/guest/xg_dom_x86.c b/tools/libs/guest/xg_dom_x86.c
index cba01384ae75..82ea3e2aab0b 100644
--- a/tools/libs/guest/xg_dom_x86.c
+++ b/tools/libs/guest/xg_dom_x86.c
@@ -989,6 +989,7 @@ const static void *hvm_get_save_record(const void *ctx, unsigned int type,
static int vcpu_hvm(struct xc_dom_image *dom)
{
+ /* Initialises the BSP */
struct {
struct hvm_save_descriptor header_d;
HVM_SAVE_TYPE(HEADER) header;
@@ -997,6 +998,18 @@ static int vcpu_hvm(struct xc_dom_image *dom)
struct hvm_save_descriptor end_d;
HVM_SAVE_TYPE(END) end;
} bsp_ctx;
+ /* Initialises APICs and MTRRs of every vCPU */
+ struct {
+ struct hvm_save_descriptor header_d;
+ HVM_SAVE_TYPE(HEADER) header;
+ struct hvm_save_descriptor mtrr_d;
+ HVM_SAVE_TYPE(MTRR) mtrr;
+ struct hvm_save_descriptor end_d;
+ HVM_SAVE_TYPE(END) end;
+ } vcpu_ctx;
+ /* Context from full_ctx */
+ const HVM_SAVE_TYPE(MTRR) *mtrr_record;
+ /* Raw context as taken from Xen */
uint8_t *full_ctx = NULL;
int rc;
@@ -1083,51 +1096,41 @@ static int vcpu_hvm(struct xc_dom_image *dom)
bsp_ctx.end_d.instance = 0;
bsp_ctx.end_d.length = HVM_SAVE_LENGTH(END);
- /* TODO: maybe this should be a firmware option instead? */
- if ( !dom->device_model )
+ /* TODO: maybe setting MTRRs should be a firmware option instead? */
+ mtrr_record = hvm_get_save_record(full_ctx, HVM_SAVE_CODE(MTRR), 0);
+
+ if ( !mtrr_record )
{
- struct {
- struct hvm_save_descriptor header_d;
- HVM_SAVE_TYPE(HEADER) header;
- struct hvm_save_descriptor mtrr_d;
- HVM_SAVE_TYPE(MTRR) mtrr;
- struct hvm_save_descriptor end_d;
- HVM_SAVE_TYPE(END) end;
- } mtrr = {
- .header_d = bsp_ctx.header_d,
- .header = bsp_ctx.header,
- .mtrr_d.typecode = HVM_SAVE_CODE(MTRR),
- .mtrr_d.length = HVM_SAVE_LENGTH(MTRR),
- .end_d = bsp_ctx.end_d,
- .end = bsp_ctx.end,
- };
- const HVM_SAVE_TYPE(MTRR) *mtrr_record =
- hvm_get_save_record(full_ctx, HVM_SAVE_CODE(MTRR), 0);
- unsigned int i;
-
- if ( !mtrr_record )
- {
- xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
- "%s: unable to get MTRR save record", __func__);
- goto out;
- }
+ xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+ "%s: unable to get MTRR save record", __func__);
+ goto out;
+ }
- memcpy(&mtrr.mtrr, mtrr_record, sizeof(mtrr.mtrr));
+ vcpu_ctx.header_d = bsp_ctx.header_d;
+ vcpu_ctx.header = bsp_ctx.header;
+ vcpu_ctx.mtrr_d.typecode = HVM_SAVE_CODE(MTRR);
+ vcpu_ctx.mtrr_d.length = HVM_SAVE_LENGTH(MTRR);
+ vcpu_ctx.mtrr = *mtrr_record;
+ vcpu_ctx.end_d = bsp_ctx.end_d;
+ vcpu_ctx.end = bsp_ctx.end;
- /*
- * Enable MTRR, set default type to WB.
- * TODO: add MMIO areas as UC when passthrough is supported.
- */
- mtrr.mtrr.msr_mtrr_def_type = MTRR_TYPE_WRBACK | MTRR_DEF_TYPE_ENABLE;
+ /*
+ * Enable MTRR, set default type to WB.
+ * TODO: add MMIO areas as UC when passthrough is supported in PVH
+ */
+ vcpu_ctx.mtrr.msr_mtrr_def_type = MTRR_TYPE_WRBACK | MTRR_DEF_TYPE_ENABLE;
- for ( i = 0; i < dom->max_vcpus; i++ )
+ for ( unsigned int i = 0; i < dom->max_vcpus; i++ )
+ {
+ vcpu_ctx.mtrr_d.instance = i;
+
+ rc = xc_domain_hvm_setcontext(dom->xch, dom->guest_domid,
+ (uint8_t *)&vcpu_ctx, sizeof(vcpu_ctx));
+ if ( rc != 0 )
{
- mtrr.mtrr_d.instance = i;
- rc = xc_domain_hvm_setcontext(dom->xch, dom->guest_domid,
- (uint8_t *)&mtrr, sizeof(mtrr));
- if ( rc != 0 )
- xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
- "%s: SETHVMCONTEXT failed (rc=%d)", __func__, rc);
+ xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+ "%s: SETHVMCONTEXT failed (rc=%d)", __func__, rc);
+ goto out;
}
}
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v4 06/10] tools/libguest: Make setting MTRR registers unconditional
2024-06-26 16:28 ` [PATCH v4 06/10] tools/libguest: Make setting MTRR registers unconditional Alejandro Vallejo
@ 2024-06-27 9:42 ` Jan Beulich
2024-06-27 12:02 ` Alejandro Vallejo
0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2024-06-27 9:42 UTC (permalink / raw)
To: Alejandro Vallejo
Cc: Anthony PERARD, Juergen Gross, Xen-devel, Andrew Cooper,
Roger Pau Monné
On 26.06.2024 18:28, Alejandro Vallejo wrote:
> This greatly simplifies a later patch that makes use of HVM contexts to upload
> LAPIC data. The idea is to reuse MTRR setting procedure to avoid code
> duplication. It's currently only used for PVH, but there's no real reason to
> overcomplicate the toolstack preventing them being set for HVM too when
> hvmloader will override them anyway.
Yet then - why set them when hvmloader will do so again? Is it even guaranteed
to be no change in (guest) behavior to do so?
Plus what about a guest which was configured to have the CPUID bit for MTRRs
clear? I think we ought to document this as not supported for PVH (we may
actually choose to refuse building such a guest), but in principle the MTRR
save/load operations should simply fail for a HVM guest in said configuration.
Making such a change in Xen now would, afaict, be benign to the tool stack.
After this adjustment it would result in a perceived regression, when there
shouldn't be any.
Thinking about it, even for PVH it may make sense to allow CPUID.MTRR=0, as
long as CPUID.PAT=1, thus forcing it into PAT-only mode. I think we did even
discuss this possible configuration before.
Jan
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 06/10] tools/libguest: Make setting MTRR registers unconditional
2024-06-27 9:42 ` Jan Beulich
@ 2024-06-27 12:02 ` Alejandro Vallejo
2024-06-27 14:53 ` Jan Beulich
0 siblings, 1 reply; 17+ messages in thread
From: Alejandro Vallejo @ 2024-06-27 12:02 UTC (permalink / raw)
To: Jan Beulich
Cc: Anthony PERARD, Juergen Gross, Xen-devel, Andrew Cooper,
Roger Pau Monné
On Thu Jun 27, 2024 at 10:42 AM BST, Jan Beulich wrote:
> On 26.06.2024 18:28, Alejandro Vallejo wrote:
> > This greatly simplifies a later patch that makes use of HVM contexts to upload
> > LAPIC data. The idea is to reuse MTRR setting procedure to avoid code
> > duplication. It's currently only used for PVH, but there's no real reason to
> > overcomplicate the toolstack preventing them being set for HVM too when
> > hvmloader will override them anyway.
>
> Yet then - why set them when hvmloader will do so again?
To keep the toolstack complexity tractable, essentially. This way I can send N
hypercalls (for N vCPUs) rather than 2*N and have a single hvmcontext struct
rather than several.
In truth though, I could simply write back the old MTRRs taken from bsp_ctx on
HVM.
> Is it even guaranteed
> to be no change in (guest) behavior to do so?
hvmloader overrides those values, so there is no change by the time BIOS or OVMF
start running. As I mentioned before though, I can actually upload back the old
values in the HVM case.
>
> Plus what about a guest which was configured to have the CPUID bit for MTRRs
> clear?
> I think we ought to document this as not supported for PVH (we may
By "this" do you mean PVH _must_ have MTRR support? I would agree.
> actually choose to refuse building such a guest), but in principle the MTRR
> save/load operations should simply fail for a HVM guest in said configuration.
What use cases does that cover? With the adjustment I mention at the top that
should be sorted. I'm wondering why we allow !mtrr at all.
> Making such a change in Xen now would, afaict, be benign to the tool stack.
> After this adjustment it would result in a perceived regression, when there
> shouldn't be any.
Fair point.
>
> Thinking about it, even for PVH it may make sense to allow CPUID.MTRR=0, as
> long as CPUID.PAT=1, thus forcing it into PAT-only mode. I think we did even
> discuss this possible configuration before.
>
> Jan
Is PAT-only an existing real HW configuration? Can't say I've seen any.
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 06/10] tools/libguest: Make setting MTRR registers unconditional
2024-06-27 12:02 ` Alejandro Vallejo
@ 2024-06-27 14:53 ` Jan Beulich
0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2024-06-27 14:53 UTC (permalink / raw)
To: Alejandro Vallejo
Cc: Anthony PERARD, Juergen Gross, Xen-devel, Andrew Cooper,
Roger Pau Monné
On 27.06.2024 14:02, Alejandro Vallejo wrote:
> On Thu Jun 27, 2024 at 10:42 AM BST, Jan Beulich wrote:
>> Plus what about a guest which was configured to have the CPUID bit for MTRRs
>> clear?
>> I think we ought to document this as not supported for PVH (we may
>
> By "this" do you mean PVH _must_ have MTRR support? I would agree.
That was my first thought, yes. But then further down I adjusted my
considerations.
>> actually choose to refuse building such a guest), but in principle the MTRR
>> save/load operations should simply fail for a HVM guest in said configuration.
>
> What use cases does that cover? With the adjustment I mention at the top that
> should be sorted. I'm wondering why we allow !mtrr at all.
Not allowing it would open up for a mess in what CPUID bits we allow to
override and for which ones we'd deny overrides.
>> Making such a change in Xen now would, afaict, be benign to the tool stack.
>> After this adjustment it would result in a perceived regression, when there
>> shouldn't be any.
>
> Fair point.
>
>>
>> Thinking about it, even for PVH it may make sense to allow CPUID.MTRR=0, as
>> long as CPUID.PAT=1, thus forcing it into PAT-only mode. I think we did even
>> discuss this possible configuration before.
>
> Is PAT-only an existing real HW configuration? Can't say I've seen any.
I don't think there are any, but the architecture doesn't preclude it, and
that's a simpler model overall for an OS to work with. Hence why it was
discussed (to some degree) before (if my memory doesn't fail me there).
Jan
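For reference, the configurations discussed in this subthread can be told apart from CPUID leaf 1 alone. The following is a minimal sketch, assuming the architectural bit positions CPUID.1:EDX[12] for MTRR and CPUID.1:EDX[16] for PAT; the enum and function names are hypothetical and not part of Xen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_EDX_MTRR (UINT32_C(1) << 12) /* CPUID.1:EDX bit 12 */
#define CPUID1_EDX_PAT  (UINT32_C(1) << 16) /* CPUID.1:EDX bit 16 */

/* Hypothetical classification of a guest's memory-typing features. */
enum memtype_mode {
    MEMTYPE_NONE,
    MEMTYPE_MTRR_ONLY,
    MEMTYPE_PAT_ONLY,     /* the CPUID.MTRR=0, CPUID.PAT=1 case */
    MEMTYPE_MTRR_AND_PAT,
};

/* Classify a guest from the EDX value it would see in CPUID leaf 1. */
static enum memtype_mode classify_memtype(uint32_t cpuid1_edx)
{
    bool mtrr = cpuid1_edx & CPUID1_EDX_MTRR;
    bool pat = cpuid1_edx & CPUID1_EDX_PAT;

    if ( mtrr && pat )
        return MEMTYPE_MTRR_AND_PAT;
    if ( mtrr )
        return MEMTYPE_MTRR_ONLY;
    if ( pat )
        return MEMTYPE_PAT_ONLY;

    return MEMTYPE_NONE;
}
```

A toolstack that uploads MTRR state unconditionally would have to special-case `MEMTYPE_PAT_ONLY` and `MEMTYPE_NONE` if Xen started rejecting MTRR save/load for such guests.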
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v4 07/10] xen/lib: Add topology generator for x86
2024-06-26 16:28 [PATCH for-4.19 v4 00/10] x86: Expose consistent topology to guests Alejandro Vallejo
` (5 preceding siblings ...)
2024-06-26 16:28 ` [PATCH v4 06/10] tools/libguest: Make setting MTRR registers unconditional Alejandro Vallejo
@ 2024-06-26 16:28 ` Alejandro Vallejo
2024-06-26 16:28 ` [PATCH v4 08/10] xen/x86: Derive topologically correct x2APIC IDs from the policy Alejandro Vallejo
` (2 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Alejandro Vallejo @ 2024-06-26 16:28 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD
Add a helper to populate topology leaves in the cpu policy from
threads/core and cores/package counts. It's unit-tested in test-cpu-policy.c,
but it's not connected to the rest of the code yet.
Adds the ASSERT() macro to xen/lib/x86/private.h, as it was missing.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v4:
* v1->v2 introduced a bug. lppp must be MIN(0xff, threads_per_pkg).
* Add missing MIN() when setting p->extd.nc (should've been done in v2)
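As a worked illustration of the ID-shift arithmetic used by the generator: each level needs enough APIC ID bits to index its siblings, which is the bit width of `count - 1`. This is a minimal standalone sketch rather than the patch's code; `bits_for` and `topo_shifts` are hypothetical names, and it assumes a compiler that provides `__builtin_clz` (GCC or Clang).

```c
#include <assert.h>

/* Number of bits needed to represent n (n > 0); clz(0) is undefined. */
static unsigned int bits_for(unsigned int n)
{
    assert(n);

    return 8 * sizeof(n) - __builtin_clz(n);
}

/*
 * Hypothetical helper: compute the thread-level and core-level ID shifts
 * for a threads/core and cores/package pair. Both counts must be non-zero
 * but need not be powers of 2.
 */
static void topo_shifts(unsigned int threads_per_core,
                        unsigned int cores_per_pkg,
                        unsigned int *thread_shift,
                        unsigned int *core_shift)
{
    assert(threads_per_core && cores_per_pkg);

    /* Bits reserved for the thread index within a core. */
    *thread_shift = threads_per_core > 1 ? bits_for(threads_per_core - 1) : 0;

    /* The core-level shift additionally covers the core index bits. */
    *core_shift = *thread_shift +
                  (cores_per_pkg > 1 ? bits_for(cores_per_pkg - 1) : 0);
}
```

With 7 threads/core and 5 cores/package this yields shifts of 3 and 6, which matches the corresponding expectation in the unit test added by this patch.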
---
tools/tests/cpu-policy/test-cpu-policy.c | 133 +++++++++++++++++++++++
xen/include/xen/lib/x86/cpu-policy.h | 16 +++
xen/lib/x86/policy.c | 88 +++++++++++++++
xen/lib/x86/private.h | 4 +
4 files changed, 241 insertions(+)
diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c
index 301df2c00285..849d7cebaa7c 100644
--- a/tools/tests/cpu-policy/test-cpu-policy.c
+++ b/tools/tests/cpu-policy/test-cpu-policy.c
@@ -650,6 +650,137 @@ static void test_is_compatible_failure(void)
}
}
+static void test_topo_from_parts(void)
+{
+ static const struct test {
+ unsigned int threads_per_core;
+ unsigned int cores_per_pkg;
+ struct cpu_policy policy;
+ } tests[] = {
+ {
+ .threads_per_core = 3, .cores_per_pkg = 1,
+ .policy = {
+ .x86_vendor = X86_VENDOR_AMD,
+ .topo.subleaf = {
+ { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
+ { .nr_logical = 1, .level = 1, .type = 2, .id_shift = 2, },
+ },
+ },
+ },
+ {
+ .threads_per_core = 1, .cores_per_pkg = 3,
+ .policy = {
+ .x86_vendor = X86_VENDOR_AMD,
+ .topo.subleaf = {
+ { .nr_logical = 1, .level = 0, .type = 1, .id_shift = 0, },
+ { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, },
+ },
+ },
+ },
+ {
+ .threads_per_core = 7, .cores_per_pkg = 5,
+ .policy = {
+ .x86_vendor = X86_VENDOR_AMD,
+ .topo.subleaf = {
+ { .nr_logical = 7, .level = 0, .type = 1, .id_shift = 3, },
+ { .nr_logical = 5, .level = 1, .type = 2, .id_shift = 6, },
+ },
+ },
+ },
+ {
+ .threads_per_core = 2, .cores_per_pkg = 128,
+ .policy = {
+ .x86_vendor = X86_VENDOR_AMD,
+ .topo.subleaf = {
+ { .nr_logical = 2, .level = 0, .type = 1, .id_shift = 1, },
+ { .nr_logical = 128, .level = 1, .type = 2,
+ .id_shift = 8, },
+ },
+ },
+ },
+ {
+ .threads_per_core = 3, .cores_per_pkg = 1,
+ .policy = {
+ .x86_vendor = X86_VENDOR_INTEL,
+ .topo.subleaf = {
+ { .nr_logical = 3, .level = 0, .type = 1, .id_shift = 2, },
+ { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, },
+ },
+ },
+ },
+ {
+ .threads_per_core = 1, .cores_per_pkg = 3,
+ .policy = {
+ .x86_vendor = X86_VENDOR_INTEL,
+ .topo.subleaf = {
+ { .nr_logical = 1, .level = 0, .type = 1, .id_shift = 0, },
+ { .nr_logical = 3, .level = 1, .type = 2, .id_shift = 2, },
+ },
+ },
+ },
+ {
+ .threads_per_core = 7, .cores_per_pkg = 5,
+ .policy = {
+ .x86_vendor = X86_VENDOR_INTEL,
+ .topo.subleaf = {
+ { .nr_logical = 7, .level = 0, .type = 1, .id_shift = 3, },
+ { .nr_logical = 35, .level = 1, .type = 2, .id_shift = 6, },
+ },
+ },
+ },
+ {
+ .threads_per_core = 2, .cores_per_pkg = 128,
+ .policy = {
+ .x86_vendor = X86_VENDOR_INTEL,
+ .topo.subleaf = {
+ { .nr_logical = 2, .level = 0, .type = 1, .id_shift = 1, },
+ { .nr_logical = 256, .level = 1, .type = 2,
+ .id_shift = 8, },
+ },
+ },
+ },
+ };
+
+ printf("Testing topology synthesis from parts:\n");
+
+ for ( size_t i = 0; i < ARRAY_SIZE(tests); ++i )
+ {
+ const struct test *t = &tests[i];
+ struct cpu_policy actual = { .x86_vendor = t->policy.x86_vendor };
+ int rc = x86_topo_from_parts(&actual, t->threads_per_core,
+ t->cores_per_pkg);
+
+ if ( rc || memcmp(&actual.topo, &t->policy.topo, sizeof(actual.topo)) )
+ {
+#define TOPO(n, f) t->policy.topo.subleaf[(n)].f, actual.topo.subleaf[(n)].f
+ fail("FAIL[%d] - '%s %u t/c, %u c/p'\n",
+ rc,
+ x86_cpuid_vendor_to_str(t->policy.x86_vendor),
+ t->threads_per_core, t->cores_per_pkg);
+ printf(" subleaf=%u expected_n=%u actual_n=%u\n"
+ " expected_lvl=%u actual_lvl=%u\n"
+ " expected_type=%u actual_type=%u\n"
+ " expected_shift=%u actual_shift=%u\n",
+ 0,
+ TOPO(0, nr_logical),
+ TOPO(0, level),
+ TOPO(0, type),
+ TOPO(0, id_shift));
+
+ printf(" subleaf=%u expected_n=%u actual_n=%u\n"
+ " expected_lvl=%u actual_lvl=%u\n"
+ " expected_type=%u actual_type=%u\n"
+ " expected_shift=%u actual_shift=%u\n",
+ 1,
+ TOPO(1, nr_logical),
+ TOPO(1, level),
+ TOPO(1, type),
+ TOPO(1, id_shift));
+#undef TOPO
+ }
+ }
+}
+
int main(int argc, char **argv)
{
printf("CPU Policy unit tests\n");
@@ -667,6 +798,8 @@ int main(int argc, char **argv)
test_is_compatible_success();
test_is_compatible_failure();
+ test_topo_from_parts();
+
if ( nr_failures )
printf("Done: %u failures\n", nr_failures);
else
diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
index d26012c6da78..79fdf9045a1b 100644
--- a/xen/include/xen/lib/x86/cpu-policy.h
+++ b/xen/include/xen/lib/x86/cpu-policy.h
@@ -542,6 +542,22 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
const struct cpu_policy *guest,
struct cpu_policy_errors *err);
+/**
+ * Synthesise topology information in `p` given high-level constraints
+ *
+ * Topology is given in various fields across several leaves, some of
+ * which are vendor-specific. This function uses the policy itself to
+ * derive such leaves from threads/core and cores/package.
+ *
+ * @param p CPU policy of the domain.
+ * @param threads_per_core threads/core. Doesn't need to be a power of 2.
+ * @param cores_per_pkg cores/package. Doesn't need to be a power of 2.
+ * @return 0 on success; -errno on failure
+ */
+int x86_topo_from_parts(struct cpu_policy *p,
+ unsigned int threads_per_core,
+ unsigned int cores_per_pkg);
+
#endif /* !XEN_LIB_X86_POLICIES_H */
/*
diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
index f033d22785be..72b67b44a893 100644
--- a/xen/lib/x86/policy.c
+++ b/xen/lib/x86/policy.c
@@ -2,6 +2,94 @@
#include <xen/lib/x86/cpu-policy.h>
+static unsigned int order(unsigned int n)
+{
+ ASSERT(n); /* clz(0) is UB */
+
+ return 8 * sizeof(n) - __builtin_clz(n);
+}
+
+int x86_topo_from_parts(struct cpu_policy *p,
+ unsigned int threads_per_core,
+ unsigned int cores_per_pkg)
+{
+ unsigned int threads_per_pkg = threads_per_core * cores_per_pkg;
+ unsigned int apic_id_size;
+
+ if ( !p || !threads_per_core || !cores_per_pkg )
+ return -EINVAL;
+
+ p->basic.max_leaf = MAX(0xb, p->basic.max_leaf);
+
+ memset(p->topo.raw, 0, sizeof(p->topo.raw));
+
+ /* thread level */
+ p->topo.subleaf[0].nr_logical = threads_per_core;
+ p->topo.subleaf[0].id_shift = 0;
+ p->topo.subleaf[0].level = 0;
+ p->topo.subleaf[0].type = 1;
+ if ( threads_per_core > 1 )
+ p->topo.subleaf[0].id_shift = order(threads_per_core - 1);
+
+ /* core level */
+ p->topo.subleaf[1].nr_logical = cores_per_pkg;
+ if ( p->x86_vendor == X86_VENDOR_INTEL )
+ p->topo.subleaf[1].nr_logical = threads_per_pkg;
+ p->topo.subleaf[1].id_shift = p->topo.subleaf[0].id_shift;
+ p->topo.subleaf[1].level = 1;
+ p->topo.subleaf[1].type = 2;
+ if ( cores_per_pkg > 1 )
+ p->topo.subleaf[1].id_shift += order(cores_per_pkg - 1);
+
+ apic_id_size = p->topo.subleaf[1].id_shift;
+
+ /*
+ * Contrary to what the name might seem to imply, HTT is an enabler for
+ * SMP and there's no harm in setting it even with a single vCPU.
+ */
+ p->basic.htt = true;
+ p->basic.lppp = MIN(0xff, threads_per_pkg);
+
+ switch ( p->x86_vendor )
+ {
+ case X86_VENDOR_INTEL: {
+ struct cpuid_cache_leaf *sl = p->cache.subleaf;
+
+ for ( size_t i = 0; sl->type &&
+ i < ARRAY_SIZE(p->cache.raw); i++, sl++ )
+ {
+ sl->cores_per_package = cores_per_pkg - 1;
+ sl->threads_per_cache = threads_per_core - 1;
+ if ( sl->type == 3 /* unified cache */ )
+ sl->threads_per_cache = threads_per_pkg - 1;
+ }
+ break;
+ }
+
+ case X86_VENDOR_AMD:
+ case X86_VENDOR_HYGON:
+ /* Expose p->basic.lppp */
+ p->extd.cmp_legacy = true;
+
+ /* Clip NC to the maximum value it can hold */
+ p->extd.nc = MIN(0xff, threads_per_pkg - 1);
+
+ /* TODO: Expose leaf e1E */
+ p->extd.topoext = false;
+
+ /*
+ * Clip APIC ID to 8 bits, as that's what high core-count machines do.
+ *
+ * That's what AMD EPYC 9654 does with >256 CPUs.
+ */
+ p->extd.apic_id_size = MIN(8, apic_id_size);
+
+ break;
+ }
+
+ return 0;
+}
+
int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
const struct cpu_policy *guest,
struct cpu_policy_errors *err)
diff --git a/xen/lib/x86/private.h b/xen/lib/x86/private.h
index 60bb82a400b7..2ec9dbee33c2 100644
--- a/xen/lib/x86/private.h
+++ b/xen/lib/x86/private.h
@@ -4,6 +4,7 @@
#ifdef __XEN__
#include <xen/bitops.h>
+#include <xen/bug.h>
#include <xen/guest_access.h>
#include <xen/kernel.h>
#include <xen/lib.h>
@@ -17,6 +18,7 @@
#else
+#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
@@ -28,6 +30,8 @@
#include <xen-tools/common-macros.h>
+#define ASSERT(x) assert(x)
+
static inline bool test_bit(unsigned int bit, const void *vaddr)
{
const char *addr = vaddr;
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 08/10] xen/x86: Derive topologically correct x2APIC IDs from the policy
2024-06-26 16:28 [PATCH for-4.19 v4 00/10] x86: Expose consistent topology to guests Alejandro Vallejo
` (6 preceding siblings ...)
2024-06-26 16:28 ` [PATCH v4 07/10] xen/lib: Add topology generator for x86 Alejandro Vallejo
@ 2024-06-26 16:28 ` Alejandro Vallejo
2024-06-26 16:28 ` [PATCH v4 09/10] xen/x86: Synthesise domain topologies Alejandro Vallejo
2024-06-26 16:28 ` [PATCH v4 10/10] tools/libguest: Set topologically correct x2APIC IDs for each vCPU Alejandro Vallejo
9 siblings, 0 replies; 17+ messages in thread
From: Alejandro Vallejo @ 2024-06-26 16:28 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD
Implements the helper for mapping vcpu_id to x2apic_id given a valid
topology in a policy. The algo is written with the intention of extending
it to leaves 0x1f and extended 0x26 in the future.
Toolstack doesn't set leaf 0xb and the HVM default policy has it cleared,
so the leaf is not implemented. In that case, the new helper just returns
the legacy mapping.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v2->v4 (v3 was not reviewed):
* Rewrite eXX notation for CPUID leaves as "extended XX"
* Newlines and linewraps
* In the unit-test, reduce the scope of `policy`
* In the unit-test, fail if topology generation fails.
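The mapping can be illustrated for the two-level case with plain division and remainder. This is a minimal sketch, assuming the thread-level and core-level ID shifts are already known; `x2apic_id_sketch` is a hypothetical name and, unlike the helper in this patch, it takes per-level counts directly instead of reading them from a policy.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: compose an x2APIC ID from a vCPU ID for a two-level
 * (thread, core) topology. thread_shift and core_shift are the ID shifts of
 * the thread and core levels; both counts must be non-zero.
 */
static uint32_t x2apic_id_sketch(uint32_t vcpu_id,
                                 uint32_t threads_per_core,
                                 uint32_t cores_per_pkg,
                                 uint32_t thread_shift,
                                 uint32_t core_shift)
{
    uint32_t thread, core_global, core, pkg;

    assert(threads_per_core && cores_per_pkg);

    thread = vcpu_id % threads_per_core;      /* index within its core */
    core_global = vcpu_id / threads_per_core; /* core index across packages */
    core = core_global % cores_per_pkg;       /* index within its package */
    pkg = core_global / cores_per_pkg;

    return thread | (core << thread_shift) | (pkg << core_shift);
}
```

For vCPU 35 with 3 threads/core and 8 cores/package (shifts 2 and 5), the thread index is 2, the core index is 3 and the package index is 1, giving 2 | (3 << 2) | (1 << 5) = 46, the same value the unit test in this patch expects.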
---
tools/tests/cpu-policy/test-cpu-policy.c | 68 +++++++++++++++++++++
xen/include/xen/lib/x86/cpu-policy.h | 11 ++++
xen/lib/x86/policy.c | 76 ++++++++++++++++++++++++
3 files changed, 155 insertions(+)
diff --git a/tools/tests/cpu-policy/test-cpu-policy.c b/tools/tests/cpu-policy/test-cpu-policy.c
index 849d7cebaa7c..e5f9b8f7ee39 100644
--- a/tools/tests/cpu-policy/test-cpu-policy.c
+++ b/tools/tests/cpu-policy/test-cpu-policy.c
@@ -781,6 +781,73 @@ static void test_topo_from_parts(void)
}
}
+static void test_x2apic_id_from_vcpu_id_success(void)
+{
+ static const struct test {
+ unsigned int vcpu_id;
+ unsigned int threads_per_core;
+ unsigned int cores_per_pkg;
+ uint32_t x2apic_id;
+ uint8_t x86_vendor;
+ } tests[] = {
+ {
+ .vcpu_id = 3, .threads_per_core = 3, .cores_per_pkg = 8,
+ .x2apic_id = 1 << 2,
+ },
+ {
+ .vcpu_id = 6, .threads_per_core = 3, .cores_per_pkg = 8,
+ .x2apic_id = 2 << 2,
+ },
+ {
+ .vcpu_id = 24, .threads_per_core = 3, .cores_per_pkg = 8,
+ .x2apic_id = 1 << 5,
+ },
+ {
+ .vcpu_id = 35, .threads_per_core = 3, .cores_per_pkg = 8,
+ .x2apic_id = (35 % 3) | (((35 / 3) % 8) << 2) | ((35 / 24) << 5),
+ },
+ {
+ .vcpu_id = 96, .threads_per_core = 7, .cores_per_pkg = 3,
+ .x2apic_id = (96 % 7) | (((96 / 7) % 3) << 3) | ((96 / 21) << 5),
+ },
+ };
+
+ const uint8_t vendors[] = {
+ X86_VENDOR_INTEL,
+ X86_VENDOR_AMD,
+ X86_VENDOR_CENTAUR,
+ X86_VENDOR_SHANGHAI,
+ X86_VENDOR_HYGON,
+ };
+
+ printf("Testing x2apic id from vcpu id success:\n");
+
+ /* Perform the test run on every vendor we know about */
+ for ( size_t i = 0; i < ARRAY_SIZE(vendors); ++i )
+ {
+ for ( size_t j = 0; j < ARRAY_SIZE(tests); ++j )
+ {
+ struct cpu_policy policy = { .x86_vendor = vendors[i] };
+ const struct test *t = &tests[j];
+ uint32_t x2apic_id;
+ int rc = x86_topo_from_parts(&policy, t->threads_per_core,
+ t->cores_per_pkg);
+
+ if ( rc ) {
+ fail("FAIL[%d] - 'x86_topo_from_parts() failed'\n", rc);
+ continue;
+ }
+
+ x2apic_id = x86_x2apic_id_from_vcpu_id(&policy, t->vcpu_id);
+ if ( x2apic_id != t->x2apic_id )
+ fail("FAIL - '%s cpu%u %u t/c %u c/p'. bad x2apic_id: expected=%u actual=%u\n",
+ x86_cpuid_vendor_to_str(policy.x86_vendor),
+ t->vcpu_id, t->threads_per_core, t->cores_per_pkg,
+ t->x2apic_id, x2apic_id);
+ }
+ }
+}
+
int main(int argc, char **argv)
{
printf("CPU Policy unit tests\n");
@@ -799,6 +866,7 @@ int main(int argc, char **argv)
test_is_compatible_failure();
test_topo_from_parts();
+ test_x2apic_id_from_vcpu_id_success();
if ( nr_failures )
printf("Done: %u failures\n", nr_failures);
diff --git a/xen/include/xen/lib/x86/cpu-policy.h b/xen/include/xen/lib/x86/cpu-policy.h
index 79fdf9045a1b..d545d4727711 100644
--- a/xen/include/xen/lib/x86/cpu-policy.h
+++ b/xen/include/xen/lib/x86/cpu-policy.h
@@ -542,6 +542,17 @@ int x86_cpu_policies_are_compatible(const struct cpu_policy *host,
const struct cpu_policy *guest,
struct cpu_policy_errors *err);
+/**
+ * Calculates the x2APIC ID of a vCPU given a CPU policy
+ *
+ * If the policy lacks leaf 0xb, falls back to the legacy mapping of
+ * apic_id=vcpu_id*2
+ *
+ * @param p CPU policy of the domain.
+ * @param id vCPU ID of the vCPU.
+ * @returns x2APIC ID of the vCPU.
+ */
+uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id);
+
/**
* Synthesise topology information in `p` given high-level constraints
*
diff --git a/xen/lib/x86/policy.c b/xen/lib/x86/policy.c
index 72b67b44a893..c52b7192559a 100644
--- a/xen/lib/x86/policy.c
+++ b/xen/lib/x86/policy.c
@@ -2,6 +2,82 @@
#include <xen/lib/x86/cpu-policy.h>
+static uint32_t parts_per_higher_scoped_level(const struct cpu_policy *p,
+ size_t lvl)
+{
+ /*
+ * `nr_logical` reported by Intel is the number of THREADS contained in
+ * the next topological scope. For example, assuming a system with 2
+ * threads/core and 3 cores/module in a fully symmetric topology,
+ * `nr_logical` at the core level will report 6. Because it's reporting
+ * the number of threads in a module.
+ *
+ * On AMD/Hygon, nr_logical is already normalized by the higher scoped
+ * level (cores/complex, etc) so we can return it as-is.
+ */
+ if ( p->x86_vendor != X86_VENDOR_INTEL || !lvl )
+ return p->topo.subleaf[lvl].nr_logical;
+
+ return p->topo.subleaf[lvl].nr_logical /
+ p->topo.subleaf[lvl - 1].nr_logical;
+}
+
+uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t id)
+{
+ uint32_t shift = 0, x2apic_id = 0;
+
+ /* In the absence of topology leaves, fallback to traditional mapping */
+ if ( !p->topo.subleaf[0].type )
+ return id * 2;
+
+ /*
+ * `id` means different things at different points of the algo
+ *
+ * At lvl=0: global thread_id (same as vcpu_id)
+ * At lvl=1: global core_id
+ * At lvl=2: global socket_id (actually complex_id in AMD, module_id
+ * in Intel, but the name is inconsequential)
+ *
+ * +--+
+ * ____ |#0| ______ <= 1 socket
+ * / +--+ \+--+
+ * __#0__ __|#1|__ <= 2 cores/socket
+ * / | \ +--+/ +-|+ \
+ * #0 #1 #2 |#3| #4 #5 <= 3 threads/core
+ * +--+
+ *
+ * ... and so on. Global in this context means that it's a unique
+ * identifier for the whole topology, and not relative to the level
+ * it's in. For example, in the diagram shown above, we're looking at
+ * thread #3 in the global sense, though it's #0 within its core.
+ *
+ * Note that dividing a global thread_id by the number of threads per
+ * core returns the global core id that contains it. e.g: 0, 1 or 2
+ * divided by 3 returns core_id=0. 3, 4 or 5 divided by 3 returns core
+ * 1, and so on. An analogous argument holds for higher levels. This is
+ * the property we exploit to derive x2apic_id from vcpu_id.
+ *
+ * NOTE: `topo` is currently derived from leaf 0xb, which is bound to two
+ * levels, but once we track leaves 0x1f (or extended 0x26) there will be a
+ * few more. The algorithm is written to cope with that case.
+ */
+ for ( uint32_t i = 0; i < ARRAY_SIZE(p->topo.raw); i++ )
+ {
+ uint32_t nr_parts;
+
+ if ( !p->topo.subleaf[i].type )
+ /* sentinel subleaf */
+ break;
+
+ nr_parts = parts_per_higher_scoped_level(p, i);
+ x2apic_id |= (id % nr_parts) << shift;
+ id /= nr_parts;
+ shift = p->topo.subleaf[i].id_shift;
+ }
+
+ return (id << shift) | x2apic_id;
+}
+
static unsigned int order(unsigned int n)
{
ASSERT(n); /* clz(0) is UB */
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 09/10] xen/x86: Synthesise domain topologies
2024-06-26 16:28 [PATCH for-4.19 v4 00/10] x86: Expose consistent topology to guests Alejandro Vallejo
` (7 preceding siblings ...)
2024-06-26 16:28 ` [PATCH v4 08/10] xen/x86: Derive topologically correct x2APIC IDs from the policy Alejandro Vallejo
@ 2024-06-26 16:28 ` Alejandro Vallejo
2024-06-26 16:28 ` [PATCH v4 10/10] tools/libguest: Set topologically correct x2APIC IDs for each vCPU Alejandro Vallejo
9 siblings, 0 replies; 17+ messages in thread
From: Alejandro Vallejo @ 2024-06-26 16:28 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Anthony PERARD, Juergen Gross, Jan Beulich,
Andrew Cooper, Roger Pau Monné
Expose sensible topologies in leaf 0xb. At the moment it synthesises non-HT
systems, in line with the previous code intent.
Leaf 0xb in the host policy is no longer zapped and the guest {max,def} policies
have their topology leaves zapped instead. The intent is for toolstack to
populate them. There's no current use for the topology information in the host
policy, but it does no harm.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
This patch MUST NOT go in without the following intimately related one
"Set topologically correct x2APIC IDs for each vCPU"
Otherwise we expose one topology and then create APIC IDs that don't reflect it
v2->v4 (v3 was not reviewed):
* Adjustments to the commit message
* Various newline/linewrap fixes
* Also print error code in new ERROR() message
* Preserve old logic to recreate old CPUID policy to enable migrations from
versions of Xen without policy information in the migration stream.
---
tools/libs/guest/xg_cpuid_x86.c | 24 +++++++++++++++++++++++-
xen/arch/x86/cpu-policy.c | 9 ++++++---
2 files changed, 29 insertions(+), 4 deletions(-)
diff --git a/tools/libs/guest/xg_cpuid_x86.c b/tools/libs/guest/xg_cpuid_x86.c
index 4453178100ad..6062dcab01ce 100644
--- a/tools/libs/guest/xg_cpuid_x86.c
+++ b/tools/libs/guest/xg_cpuid_x86.c
@@ -725,8 +725,16 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
p->policy.basic.htt = test_bit(X86_FEATURE_HTT, host_featureset);
p->policy.extd.cmp_legacy = test_bit(X86_FEATURE_CMP_LEGACY, host_featureset);
}
- else
+ else if ( restore )
{
+ /*
+ * Reconstruct the topology exposed on Xen <= 4.13. It makes very little
+ * sense, but it's what those guests saw so it's set in stone now.
+ *
+ * Guests from Xen 4.14 onwards carry their own CPUID leaves in the
+ * migration stream so they don't need special treatment.
+ */
+
/*
* Topology for HVM guests is entirely controlled by Xen. For now, we
* hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
@@ -782,6 +790,20 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
break;
}
}
+ else
+ {
+ /* TODO: Expose the ability to choose a custom topology for HVM/PVH */
+ unsigned int threads_per_core = 1;
+ unsigned int cores_per_pkg = di.max_vcpu_id + 1;
+
+ rc = x86_topo_from_parts(&p->policy, threads_per_core, cores_per_pkg);
+ if ( rc )
+ {
+ ERROR("Failed to generate topology: rc=%d t/c=%u c/p=%u",
+ rc, threads_per_core, cores_per_pkg);
+ goto out;
+ }
+ }
nr_leaves = ARRAY_SIZE(p->leaves);
rc = x86_cpuid_copy_to_buffer(&p->policy, p->leaves, &nr_leaves);
diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index 304dc20cfab8..55a95f6e164c 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -263,9 +263,6 @@ static void recalculate_misc(struct cpu_policy *p)
p->basic.raw[0x8] = EMPTY_LEAF;
- /* TODO: Rework topology logic. */
- memset(p->topo.raw, 0, sizeof(p->topo.raw));
-
p->basic.raw[0xc] = EMPTY_LEAF;
p->extd.e1d &= ~CPUID_COMMON_1D_FEATURES;
@@ -613,6 +610,9 @@ static void __init calculate_pv_max_policy(void)
recalculate_xstate(p);
p->extd.raw[0xa] = EMPTY_LEAF; /* No SVM for PV guests. */
+
+ /* Wipe host topology. Populated by toolstack */
+ memset(p->topo.raw, 0, sizeof(p->topo.raw));
}
static void __init calculate_pv_def_policy(void)
@@ -776,6 +776,9 @@ static void __init calculate_hvm_max_policy(void)
/* It's always possible to emulate CPUID faulting for HVM guests */
p->platform_info.cpuid_faulting = true;
+
+ /* Wipe host topology. Populated by toolstack */
+ memset(p->topo.raw, 0, sizeof(p->topo.raw));
}
static void __init calculate_hvm_def_policy(void)
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 10/10] tools/libguest: Set topologically correct x2APIC IDs for each vCPU
2024-06-26 16:28 [PATCH for-4.19 v4 00/10] x86: Expose consistent topology to guests Alejandro Vallejo
` (8 preceding siblings ...)
2024-06-26 16:28 ` [PATCH v4 09/10] xen/x86: Synthesise domain topologies Alejandro Vallejo
@ 2024-06-26 16:28 ` Alejandro Vallejo
9 siblings, 0 replies; 17+ messages in thread
From: Alejandro Vallejo @ 2024-06-26 16:28 UTC (permalink / raw)
To: Xen-devel; +Cc: Alejandro Vallejo, Anthony PERARD, Juergen Gross
Have toolstack populate the new x2APIC ID in the LAPIC save record with the
proper IDs intended for each vCPU.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v4:
* New patch. Replaced v3's method of letting Xen find out via the same
algorithm toolstack uses.
---
tools/libs/guest/xg_dom_x86.c | 37 ++++++++++++++++++++++++++++++++++-
1 file changed, 36 insertions(+), 1 deletion(-)
diff --git a/tools/libs/guest/xg_dom_x86.c b/tools/libs/guest/xg_dom_x86.c
index 82ea3e2aab0b..2ae3a779b016 100644
--- a/tools/libs/guest/xg_dom_x86.c
+++ b/tools/libs/guest/xg_dom_x86.c
@@ -1004,19 +1004,40 @@ static int vcpu_hvm(struct xc_dom_image *dom)
HVM_SAVE_TYPE(HEADER) header;
struct hvm_save_descriptor mtrr_d;
HVM_SAVE_TYPE(MTRR) mtrr;
+ struct hvm_save_descriptor lapic_d;
+ HVM_SAVE_TYPE(LAPIC) lapic;
struct hvm_save_descriptor end_d;
HVM_SAVE_TYPE(END) end;
} vcpu_ctx;
- /* Context from full_ctx */
+ /* Contexts from full_ctx */
const HVM_SAVE_TYPE(MTRR) *mtrr_record;
+ const HVM_SAVE_TYPE(LAPIC) *lapic_record;
/* Raw context as taken from Xen */
uint8_t *full_ctx = NULL;
+ xc_cpu_policy_t *policy = xc_cpu_policy_init();
int rc;
DOMPRINTF_CALLED(dom->xch);
assert(dom->max_vcpus);
+ /*
+ * Fetch the CPU policy of this domain. We need it to determine the APIC ID
+ * of each vCPU in a manner consistent with the exported topology.
+ *
+ * TODO: It's silly to query a policy we have ourselves created. It should
+ * instead be part of xc_dom_image
+ */
+
+ rc = xc_cpu_policy_get_domain(dom->xch, dom->guest_domid, policy);
+ if ( rc != 0 )
+ {
+ xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+ "%s: unable to fetch cpu policy for dom%u (rc=%d)",
+ __func__, dom->guest_domid, rc);
+ goto out;
+ }
+
/*
* Get the full HVM context in order to have the header, it is not
* possible to get the header with getcontext_partial, and crafting one
@@ -1111,6 +1132,8 @@ static int vcpu_hvm(struct xc_dom_image *dom)
vcpu_ctx.mtrr_d.typecode = HVM_SAVE_CODE(MTRR);
vcpu_ctx.mtrr_d.length = HVM_SAVE_LENGTH(MTRR);
vcpu_ctx.mtrr = *mtrr_record;
+ vcpu_ctx.lapic_d.typecode = HVM_SAVE_CODE(LAPIC);
+ vcpu_ctx.lapic_d.length = HVM_SAVE_LENGTH(LAPIC);
vcpu_ctx.end_d = bsp_ctx.end_d;
vcpu_ctx.end = bsp_ctx.end;
@@ -1124,6 +1147,17 @@ static int vcpu_hvm(struct xc_dom_image *dom)
{
vcpu_ctx.mtrr_d.instance = i;
+ lapic_record = hvm_get_save_record(full_ctx, HVM_SAVE_CODE(LAPIC), i);
+ if ( !lapic_record )
+ {
+ xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+ "%s: unable to get LAPIC[%d] save record", __func__, i);
+ goto out;
+ }
+ vcpu_ctx.lapic = *lapic_record;
+ vcpu_ctx.lapic.x2apic_id = x86_x2apic_id_from_vcpu_id(&policy->policy, i);
+ vcpu_ctx.lapic_d.instance = i;
+
rc = xc_domain_hvm_setcontext(dom->xch, dom->guest_domid,
(uint8_t *)&vcpu_ctx, sizeof(vcpu_ctx));
if ( rc != 0 )
@@ -1146,6 +1180,7 @@ static int vcpu_hvm(struct xc_dom_image *dom)
out:
free(full_ctx);
+ xc_cpu_policy_destroy(policy);
return rc;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread