* [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches
@ 2016-07-04 19:16 Dr. David Alan Gilbert (git)
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 1/6] x86: Allow physical address bits to be set Dr. David Alan Gilbert (git)
` (7 more replies)
0 siblings, 8 replies; 39+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2016-07-04 19:16 UTC (permalink / raw)
To: qemu-devel, pbonzini, ehabkost, marcel, mst, kraxel
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
QEMU sets the guests physical address bits to 40; this is wrong
on most hardware, and can be detected by the guest.
It also stops you using really huge multi-TB VMs.
Red Hat has had a patch, that Andrea wrote, downstream for a couple
of years that reads the hosts value and uses that in the guest. That's
correct as far as the guest sees it, and lets you create huge VMs.
The downside, is that if you've got a mix of hosts, say an i7 and a Xeon,
life gets complicated in migration; prior to 2.6 it all apparently
worked (although a guest that looked might spot the change).
In 2.6 Paolo started checking MSR writes and they failed when the
incoming MTRR mask didn't fit.
This series:
a) Fixes up mtrr masks so that if you're migrating between hosts
of different physical address size it tries to do something sensible.
b) Lets you specify the guest physical address size via a CPU property, i.e.
-cpu SandyBridge,phys-bits=36
The default on old machine types is to use the existing 40 bits value.
c) Lets you tell qemu to use the same setting as the host, i.e.
-cpu SandyBridge,phys-bits=0
This is the default on new machine types.
Note that mixed size hosts are still not necessarily safe; a guest
started on a host with a large physical address size might start using
those bits and get upset when it's moved to a small host.
However that was already potentially broken in existing qemu that
used a magic value of 40.
There's potential to add some extra guards against people
doing silly stuff; e.g. stop people running VMs using 1TB of
address space on a tiny host.
Dave
v2
Default on new machine types is to read from the host
Use the MAKE_64BIT_MASK macro
Validate phys_bits in the realise method
Move reading the host physical bits to the realise method
Set phys-bits even for 32bit guests
Add warning when your phys-bits doesn't match your host in the none
default case
Dr. David Alan Gilbert (6):
x86: Allow physical address bits to be set
x86: Mask mtrr mask based on CPU physical address limits
x86: fill high bits of mtrr mask
x86: Set physical address bits based on host
x86: fix up 32 bit phys_bits case
x86: Add sanity checks on phys_bits
include/hw/i386/pc.h | 10 ++++++++
target-i386/cpu.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++------
target-i386/cpu.h | 6 +++++
target-i386/kvm.c | 36 +++++++++++++++++++++++---
4 files changed, 112 insertions(+), 11 deletions(-)
--
2.7.4
^ permalink raw reply [flat|nested] 39+ messages in thread
* [Qemu-devel] [PATCH v2 1/6] x86: Allow physical address bits to be set
2016-07-04 19:16 [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches Dr. David Alan Gilbert (git)
@ 2016-07-04 19:16 ` Dr. David Alan Gilbert (git)
2016-07-04 19:33 ` Eduardo Habkost
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 2/6] x86: Mask mtrr mask based on CPU physical address limits Dr. David Alan Gilbert (git)
` (6 subsequent siblings)
7 siblings, 1 reply; 39+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2016-07-04 19:16 UTC (permalink / raw)
To: qemu-devel, pbonzini, ehabkost, marcel, mst, kraxel
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Currently QEMU sets the x86 number of physical address bits to the
magic number 40. This is only correct on some small AMD systems;
Intel systems tend to have 36, 39, 46 bits, and large AMD systems
tend to have 48.
Having the value different from your actual hardware is detectable
by the guest and in principal can cause problems;
The current limit of 40 stops TB VMs being created by those lucky
enough to have that much.
This patch lets you set the physical bits by a cpu property but
defaults to the same existing magic 40.
I've removed the ancient warning about the 42 bit limit in exec.c;
I can't find that limit in there and no one else seems to know where
it is.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
target-i386/cpu.c | 8 +++++---
target-i386/cpu.h | 3 +++
2 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 3bd3cfc..ab13ef5 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2604,9 +2604,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
/* virtual & phys address size in low 2 bytes. */
/* XXX: This value must match the one used in the MMU code. */
if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
- /* 64 bit processor */
-/* XXX: The physical address space is limited to 42 bits in exec.c. */
- *eax = 0x00003028; /* 48 bits virtual, 40 bits physical */
+ /* 64 bit processor, 48 bits virtual, configurable
+ * physical bits.
+ */
+ *eax = 0x00003000 + cpu->phys_bits;
} else {
if (env->features[FEAT_1_EDX] & CPUID_PSE36) {
*eax = 0x00000024; /* 36 bits physical */
@@ -3257,6 +3258,7 @@ static Property x86_cpu_properties[] = {
DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
+ DEFINE_PROP_UINT32("phys-bits", X86CPU, phys_bits, 40),
DEFINE_PROP_UINT32("level", X86CPU, env.cpuid_level, 0),
DEFINE_PROP_UINT32("xlevel", X86CPU, env.cpuid_xlevel, 0),
DEFINE_PROP_UINT32("xlevel2", X86CPU, env.cpuid_xlevel2, 0),
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 474b0b9..221b1a2 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -1181,6 +1181,9 @@ struct X86CPU {
/* Compatibility bits for old machine types: */
bool enable_cpuid_0xb;
+ /* Number of physical address bits supported */
+ uint32_t phys_bits;
+
/* in order to simplify APIC support, we leave this pointer to the
user */
struct DeviceState *apic_state;
--
2.7.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [Qemu-devel] [PATCH v2 2/6] x86: Mask mtrr mask based on CPU physical address limits
2016-07-04 19:16 [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches Dr. David Alan Gilbert (git)
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 1/6] x86: Allow physical address bits to be set Dr. David Alan Gilbert (git)
@ 2016-07-04 19:16 ` Dr. David Alan Gilbert (git)
2016-07-04 20:02 ` Michael S. Tsirkin
2016-07-04 20:03 ` Eduardo Habkost
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 3/6] x86: fill high bits of mtrr mask Dr. David Alan Gilbert (git)
` (5 subsequent siblings)
7 siblings, 2 replies; 39+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2016-07-04 19:16 UTC (permalink / raw)
To: qemu-devel, pbonzini, ehabkost, marcel, mst, kraxel
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
The CPU GPs if we try and set a bit in a variable MTRR mask above
the limit of physical address bits on the host. We hit this
when loading a migration from a host with a larger physical
address limit than our destination (e.g. a Xeon->i7 of same
generation) but previously used to get away with it
until 48e1a45 started checking that msr writes actually worked.
It seems in our case the GP probably comes from KVM emulating
that GP.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
target-i386/kvm.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index f3698f1..6429205 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1677,6 +1677,8 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
}
}
if (has_msr_mtrr) {
+ uint64_t phys_mask = MAKE_64BIT_MASK(0, cpu->phys_bits);
+
kvm_msr_entry_add(cpu, MSR_MTRRdefType, env->mtrr_deftype);
kvm_msr_entry_add(cpu, MSR_MTRRfix64K_00000, env->mtrr_fixed[0]);
kvm_msr_entry_add(cpu, MSR_MTRRfix16K_80000, env->mtrr_fixed[1]);
@@ -1690,10 +1692,15 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
kvm_msr_entry_add(cpu, MSR_MTRRfix4K_F0000, env->mtrr_fixed[9]);
kvm_msr_entry_add(cpu, MSR_MTRRfix4K_F8000, env->mtrr_fixed[10]);
for (i = 0; i < MSR_MTRRcap_VCNT; i++) {
+ /* The CPU GPs if we write to a bit above the physical limit of
+ * the host CPU (and KVM emulates that)
+ */
+ uint64_t mask = env->mtrr_var[i].mask;
+ mask &= phys_mask;
+
kvm_msr_entry_add(cpu, MSR_MTRRphysBase(i),
env->mtrr_var[i].base);
- kvm_msr_entry_add(cpu, MSR_MTRRphysMask(i),
- env->mtrr_var[i].mask);
+ kvm_msr_entry_add(cpu, MSR_MTRRphysMask(i), mask);
}
}
--
2.7.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [Qemu-devel] [PATCH v2 3/6] x86: fill high bits of mtrr mask
2016-07-04 19:16 [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches Dr. David Alan Gilbert (git)
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 1/6] x86: Allow physical address bits to be set Dr. David Alan Gilbert (git)
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 2/6] x86: Mask mtrr mask based on CPU physical address limits Dr. David Alan Gilbert (git)
@ 2016-07-04 19:16 ` Dr. David Alan Gilbert (git)
2016-07-04 20:03 ` Michael S. Tsirkin
2016-07-04 20:21 ` Eduardo Habkost
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 4/6] x86: Set physical address bits based on host Dr. David Alan Gilbert (git)
` (4 subsequent siblings)
7 siblings, 2 replies; 39+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2016-07-04 19:16 UTC (permalink / raw)
To: qemu-devel, pbonzini, ehabkost, marcel, mst, kraxel
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Fill the bits between 51..number-of-physical-address-bits in the
MTRR_PHYSMASKn variable range mtrr masks so that they're consistent
in the migration stream irrespective of the physical address space
of the source VM in a migration.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
---
include/hw/i386/pc.h | 5 +++++
target-i386/cpu.c | 1 +
target-i386/cpu.h | 3 +++
target-i386/kvm.c | 25 ++++++++++++++++++++++++-
4 files changed, 33 insertions(+), 1 deletion(-)
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 7e43b20..d85e924 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -374,6 +374,11 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
.driver = TYPE_X86_CPU,\
.property = "cpuid-0xb",\
.value = "off",\
+ },\
+ {\
+ .driver = TYPE_X86_CPU,\
+ .property = "fill-mtrr-mask",\
+ .value = "off",\
},
#define PC_COMPAT_2_5 \
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index ab13ef5..5737aba 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -3258,6 +3258,7 @@ static Property x86_cpu_properties[] = {
DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
+ DEFINE_PROP_BOOL("fill-mtrr-mask", X86CPU, fill_mtrr_mask, true),
DEFINE_PROP_UINT32("phys-bits", X86CPU, phys_bits, 40),
DEFINE_PROP_UINT32("level", X86CPU, env.cpuid_level, 0),
DEFINE_PROP_UINT32("xlevel", X86CPU, env.cpuid_xlevel, 0),
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 221b1a2..a9bbd91 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -1181,6 +1181,9 @@ struct X86CPU {
/* Compatibility bits for old machine types: */
bool enable_cpuid_0xb;
+ /* if true fill the top bits of the MTRR_PHYSMASKn variable range */
+ bool fill_mtrr_mask;
+
/* Number of physical address bits supported */
uint32_t phys_bits;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 6429205..cbcc30d 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1935,6 +1935,7 @@ static int kvm_get_msrs(X86CPU *cpu)
CPUX86State *env = &cpu->env;
struct kvm_msr_entry *msrs = cpu->kvm_msr_buf->entries;
int ret, i;
+ uint64_t mtrr_top_bits;
kvm_msr_buf_reset(cpu);
@@ -2084,6 +2085,27 @@ static int kvm_get_msrs(X86CPU *cpu)
}
assert(ret == cpu->kvm_msr_buf->nmsrs);
+ /*
+ * MTRR masks: Each mask consists of 5 parts
+ * a 10..0: must be zero
+ * b 11 : valid bit
+ * c n-1.12: actual mask bits
+ * d 51..n: reserved must be zero
+ * e 63.52: reserved must be zero
+ *
+ * 'n' is the number of physical bits supported by the CPU and is
+ * apparently always <= 52. We know our 'n' but don't know what
+ * the destinations 'n' is; it might be smaller, in which case
+ * it masks (c) on loading. It might be larger, in which case
+ * we fill 'd' so that d..c is consistent irrespetive of the 'n'
+ * we're migrating to.
+ */
+ if (cpu->fill_mtrr_mask && cpu->phys_bits < 52) {
+ mtrr_top_bits = MAKE_64BIT_MASK(cpu->phys_bits, 52 - cpu->phys_bits);
+ } else {
+ mtrr_top_bits = 0;
+ }
+
for (i = 0; i < ret; i++) {
uint32_t index = msrs[i].index;
switch (index) {
@@ -2279,7 +2301,8 @@ static int kvm_get_msrs(X86CPU *cpu)
break;
case MSR_MTRRphysBase(0) ... MSR_MTRRphysMask(MSR_MTRRcap_VCNT - 1):
if (index & 1) {
- env->mtrr_var[MSR_MTRRphysIndex(index)].mask = msrs[i].data;
+ env->mtrr_var[MSR_MTRRphysIndex(index)].mask = msrs[i].data |
+ mtrr_top_bits;
} else {
env->mtrr_var[MSR_MTRRphysIndex(index)].base = msrs[i].data;
}
--
2.7.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [Qemu-devel] [PATCH v2 4/6] x86: Set physical address bits based on host
2016-07-04 19:16 [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches Dr. David Alan Gilbert (git)
` (2 preceding siblings ...)
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 3/6] x86: fill high bits of mtrr mask Dr. David Alan Gilbert (git)
@ 2016-07-04 19:16 ` Dr. David Alan Gilbert (git)
2016-07-04 20:27 ` Eduardo Habkost
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 5/6] x86: fix up 32 bit phys_bits case Dr. David Alan Gilbert (git)
` (3 subsequent siblings)
7 siblings, 1 reply; 39+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2016-07-04 19:16 UTC (permalink / raw)
To: qemu-devel, pbonzini, ehabkost, marcel, mst, kraxel
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
A special case based on the previous phys-bits property; if it's
the magic value 0 then use the hosts capabilities.
This becomes the default on new machine types.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
include/hw/i386/pc.h | 5 +++++
target-i386/cpu.c | 36 +++++++++++++++++++++++++++++++++++-
2 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index d85e924..bf31609 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -379,6 +379,11 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
.driver = TYPE_X86_CPU,\
.property = "fill-mtrr-mask",\
.value = "off",\
+ },\
+ {\
+ .driver = TYPE_X86_CPU,\
+ .property = "phys-bits",\
+ .value = "40",\
},
#define PC_COMPAT_2_5 \
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 5737aba..d45d2a6 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2957,6 +2957,40 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
& CPUID_EXT2_AMD_ALIASES);
}
+ /* For 64bit systems think about the number of physical bits to present.
+ * ideally this should be the same as the host; anything other than matching
+ * the host can cause incorrect guest behaviour.
+ * QEMU used to pick the magic value of 40 bits that corresponds to
+ * consumer AMD devices but nothing esle.
+ */
+ if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
+ uint32_t eax;
+ /* Read the hosts physical address size, and compare it to what we
+ * were asked for; note old machine types default to 40 bits
+ */
+ uint32_t host_phys_bits = 0;
+ host_cpuid(0x80000000, 0, &eax, NULL, NULL, NULL);
+ if (eax >= 0x80000008) {
+ host_cpuid(0x80000008, 0, &eax, NULL, NULL, NULL);
+ /* Note: According to AMD doc 25481 rev 2.34 they have a field
+ * at 23:16 that can specify a maximum physical address bits for
+ * the guest that can override this value; but I've not seen
+ * anything with that set.
+ */
+ host_phys_bits = eax & 0xff;
+ } else {
+ /* It's an odd 64 bit machine that doesn't have the leaf for
+ * physical address bits; fall back to 36 that's most older Intel.
+ */
+ host_phys_bits = 36;
+ }
+
+ if (cpu->phys_bits == 0) {
+ /* The user asked for us to use the host physical bits */
+ cpu->phys_bits = host_phys_bits;
+
+ }
+ }
cpu_exec_init(cs, &error_abort);
@@ -3259,7 +3293,7 @@ static Property x86_cpu_properties[] = {
DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
DEFINE_PROP_BOOL("fill-mtrr-mask", X86CPU, fill_mtrr_mask, true),
- DEFINE_PROP_UINT32("phys-bits", X86CPU, phys_bits, 40),
+ DEFINE_PROP_UINT32("phys-bits", X86CPU, phys_bits, 0),
DEFINE_PROP_UINT32("level", X86CPU, env.cpuid_level, 0),
DEFINE_PROP_UINT32("xlevel", X86CPU, env.cpuid_xlevel, 0),
DEFINE_PROP_UINT32("xlevel2", X86CPU, env.cpuid_xlevel2, 0),
--
2.7.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [Qemu-devel] [PATCH v2 5/6] x86: fix up 32 bit phys_bits case
2016-07-04 19:16 [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches Dr. David Alan Gilbert (git)
` (3 preceding siblings ...)
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 4/6] x86: Set physical address bits based on host Dr. David Alan Gilbert (git)
@ 2016-07-04 19:16 ` Dr. David Alan Gilbert (git)
2016-07-05 9:42 ` Daniel P. Berrange
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 6/6] x86: Add sanity checks on phys_bits Dr. David Alan Gilbert (git)
` (2 subsequent siblings)
7 siblings, 1 reply; 39+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2016-07-04 19:16 UTC (permalink / raw)
To: qemu-devel, pbonzini, ehabkost, marcel, mst, kraxel
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
On 32 bit systems fix up phys_bits to be consistent with what
we tell the guest; don't ever bother with using the phys_bits
property.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
target-i386/cpu.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index d45d2a6..e15abea 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2609,11 +2609,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
*/
*eax = 0x00003000 + cpu->phys_bits;
} else {
- if (env->features[FEAT_1_EDX] & CPUID_PSE36) {
- *eax = 0x00000024; /* 36 bits physical */
- } else {
- *eax = 0x00000020; /* 32 bits physical */
- }
+ *eax = cpu->phys_bits;
}
*ebx = 0;
*ecx = 0;
@@ -2990,6 +2986,15 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
cpu->phys_bits = host_phys_bits;
}
+ } else {
+ /* For 32 bit systems don't use the user set value, but keep
+ * phys_bits consistent with what we tell the guest.
+ */
+ if (env->features[FEAT_1_EDX] & CPUID_PSE36) {
+ cpu->phys_bits = 36;
+ } else {
+ cpu->phys_bits = 32;
+ }
}
cpu_exec_init(cs, &error_abort);
--
2.7.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [Qemu-devel] [PATCH v2 6/6] x86: Add sanity checks on phys_bits
2016-07-04 19:16 [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches Dr. David Alan Gilbert (git)
` (4 preceding siblings ...)
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 5/6] x86: fix up 32 bit phys_bits case Dr. David Alan Gilbert (git)
@ 2016-07-04 19:16 ` Dr. David Alan Gilbert (git)
2016-07-04 20:46 ` Eduardo Habkost
2016-07-04 20:23 ` [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches Michael S. Tsirkin
2016-07-05 9:46 ` Daniel P. Berrange
7 siblings, 1 reply; 39+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2016-07-04 19:16 UTC (permalink / raw)
To: qemu-devel, pbonzini, ehabkost, marcel, mst, kraxel
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Add some sanity checks on the phys-bits setting now that
the user can set it.
a) That it's in a sane range (52..32)
b) Warn if it mismatches the host and isn't the old default.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
target-i386/cpu.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index e15abea..5402002 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2985,6 +2985,19 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
/* The user asked for us to use the host physical bits */
cpu->phys_bits = host_phys_bits;
+ } else if (cpu->phys_bits > 52 || cpu->phys_bits < 32) {
+ error_setg(errp, "phys_bits should be between 32 and 52 or 0 to"
+ " use host size (but is %u)", cpu->phys_bits);
+ return;
+ }
+ /* Print a warning if the user set it to a value that's not the
+ * host value; ignore the magic value 40 because it may well just
+ * be the old machine type.
+ */
+ if (cpu->phys_bits != host_phys_bits && cpu->phys_bits != 40) {
+ fprintf(stderr, "Warning: Host physical bits (%u)"
+ " does not match phys_bits (%u)\n",
+ host_phys_bits, cpu->phys_bits);
}
} else {
/* For 32 bit systems don't use the user set value, but keep
--
2.7.4
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 1/6] x86: Allow physical address bits to be set
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 1/6] x86: Allow physical address bits to be set Dr. David Alan Gilbert (git)
@ 2016-07-04 19:33 ` Eduardo Habkost
2016-07-05 13:43 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 39+ messages in thread
From: Eduardo Habkost @ 2016-07-04 19:33 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, pbonzini, marcel, mst, kraxel
On Mon, Jul 04, 2016 at 08:16:04PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Currently QEMU sets the x86 number of physical address bits to the
> magic number 40. This is only correct on some small AMD systems;
> Intel systems tend to have 36, 39, 46 bits, and large AMD systems
> tend to have 48.
>
> Having the value different from your actual hardware is detectable
> by the guest and in principal can cause problems;
> The current limit of 40 stops TB VMs being created by those lucky
> enough to have that much.
target-i386/cpu.h has:
#ifdef TARGET_X86_64
#define TARGET_PHYS_ADDR_SPACE_BITS 52
/* ??? This is really 48 bits, sign-extended, but the only thing
accessible to userland with bit 48 set is the VSYSCALL, and that
is handled via other mechanisms. */
#define TARGET_VIRT_ADDR_SPACE_BITS 47
#else
#define TARGET_PHYS_ADDR_SPACE_BITS 36
#define TARGET_VIRT_ADDR_SPACE_BITS 32
#endif
Where did those values come from? What are they used for?
/* XXX: This value should match the one returned by CPUID
* and in exec.c */
# if defined(TARGET_X86_64)
# define PHYS_ADDR_MASK 0xffffffffffLL
# else
# define PHYS_ADDR_MASK 0xfffffffffLL
# endif
PHYS_ADDR_MASK is only used at:
#define PG_HI_RSVD_MASK (PG_ADDRESS_MASK & ~PHYS_ADDR_MASK)
PG_HI_RSVD_MASK is only used at x86_cpu_handle_mmu_fault(), and I
don't know if that function is TCG-specific or not.
>
> This patch lets you set the physical bits by a cpu property but
> defaults to the same existing magic 40.
>
> I've removed the ancient warning about the 42 bit limit in exec.c;
> I can't find that limit in there and no one else seems to know where
> it is.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> target-i386/cpu.c | 8 +++++---
> target-i386/cpu.h | 3 +++
> 2 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index 3bd3cfc..ab13ef5 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -2604,9 +2604,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> /* virtual & phys address size in low 2 bytes. */
> /* XXX: This value must match the one used in the MMU code. */
> if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
> - /* 64 bit processor */
> -/* XXX: The physical address space is limited to 42 bits in exec.c. */
> - *eax = 0x00003028; /* 48 bits virtual, 40 bits physical */
> + /* 64 bit processor, 48 bits virtual, configurable
> + * physical bits.
> + */
> + *eax = 0x00003000 + cpu->phys_bits;
We need to ensure phys-bits is not out of range, either on the
property setter or on realizefn.
(I suggest realizefn. Custom property setters are evil)
Now, related to the questions above about the cpu.h macros: what
would be the appropriate range to check for?
> } else {
> if (env->features[FEAT_1_EDX] & CPUID_PSE36) {
> *eax = 0x00000024; /* 36 bits physical */
> @@ -3257,6 +3258,7 @@ static Property x86_cpu_properties[] = {
> DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
> DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
> DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
> + DEFINE_PROP_UINT32("phys-bits", X86CPU, phys_bits, 40),
> DEFINE_PROP_UINT32("level", X86CPU, env.cpuid_level, 0),
> DEFINE_PROP_UINT32("xlevel", X86CPU, env.cpuid_xlevel, 0),
> DEFINE_PROP_UINT32("xlevel2", X86CPU, env.cpuid_xlevel2, 0),
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index 474b0b9..221b1a2 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -1181,6 +1181,9 @@ struct X86CPU {
> /* Compatibility bits for old machine types: */
> bool enable_cpuid_0xb;
>
> + /* Number of physical address bits supported */
> + uint32_t phys_bits;
> +
> /* in order to simplify APIC support, we leave this pointer to the
> user */
> struct DeviceState *apic_state;
> --
> 2.7.4
>
--
Eduardo
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 2/6] x86: Mask mtrr mask based on CPU physical address limits
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 2/6] x86: Mask mtrr mask based on CPU physical address limits Dr. David Alan Gilbert (git)
@ 2016-07-04 20:02 ` Michael S. Tsirkin
2016-07-04 20:05 ` Eduardo Habkost
2016-07-04 20:03 ` Eduardo Habkost
1 sibling, 1 reply; 39+ messages in thread
From: Michael S. Tsirkin @ 2016-07-04 20:02 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: qemu-devel, pbonzini, ehabkost, marcel, kraxel
On Mon, Jul 04, 2016 at 08:16:05PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The CPU GPs if we try and set a bit in a variable MTRR mask above
> the limit of physical address bits on the host. We hit this
> when loading a migration from a host with a larger physical
> address limit than our destination (e.g. a Xeon->i7 of same
> generation) but previously used to get away with it
> until 48e1a45 started checking that msr writes actually worked.
>
> It seems in our case the GP probably comes from KVM emulating
> that GP.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Why don't we mask with host bits? This is the kvm limitation,
isn't it? This will make it possible to CC stable as well.
> ---
> target-i386/kvm.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index f3698f1..6429205 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -1677,6 +1677,8 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
> }
> }
> if (has_msr_mtrr) {
> + uint64_t phys_mask = MAKE_64BIT_MASK(0, cpu->phys_bits);
> +
> kvm_msr_entry_add(cpu, MSR_MTRRdefType, env->mtrr_deftype);
> kvm_msr_entry_add(cpu, MSR_MTRRfix64K_00000, env->mtrr_fixed[0]);
> kvm_msr_entry_add(cpu, MSR_MTRRfix16K_80000, env->mtrr_fixed[1]);
> @@ -1690,10 +1692,15 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
> kvm_msr_entry_add(cpu, MSR_MTRRfix4K_F0000, env->mtrr_fixed[9]);
> kvm_msr_entry_add(cpu, MSR_MTRRfix4K_F8000, env->mtrr_fixed[10]);
> for (i = 0; i < MSR_MTRRcap_VCNT; i++) {
> + /* The CPU GPs if we write to a bit above the physical limit of
> + * the host CPU (and KVM emulates that)
> + */
> + uint64_t mask = env->mtrr_var[i].mask;
> + mask &= phys_mask;
> +
> kvm_msr_entry_add(cpu, MSR_MTRRphysBase(i),
> env->mtrr_var[i].base);
> - kvm_msr_entry_add(cpu, MSR_MTRRphysMask(i),
> - env->mtrr_var[i].mask);
> + kvm_msr_entry_add(cpu, MSR_MTRRphysMask(i), mask);
> }
> }
>
> --
> 2.7.4
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 2/6] x86: Mask mtrr mask based on CPU physical address limits
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 2/6] x86: Mask mtrr mask based on CPU physical address limits Dr. David Alan Gilbert (git)
2016-07-04 20:02 ` Michael S. Tsirkin
@ 2016-07-04 20:03 ` Eduardo Habkost
1 sibling, 0 replies; 39+ messages in thread
From: Eduardo Habkost @ 2016-07-04 20:03 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, pbonzini, marcel, mst, kraxel
On Mon, Jul 04, 2016 at 08:16:05PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The CPU GPs if we try and set a bit in a variable MTRR mask above
> the limit of physical address bits on the host. We hit this
> when loading a migration from a host with a larger physical
> address limit than our destination (e.g. a Xeon->i7 of same
> generation) but previously used to get away with it
> until 48e1a45 started checking that msr writes actually worked.
>
> It seems in our case the GP probably comes from KVM emulating
> that GP.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> target-i386/kvm.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index f3698f1..6429205 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -1677,6 +1677,8 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
> }
> }
> if (has_msr_mtrr) {
> + uint64_t phys_mask = MAKE_64BIT_MASK(0, cpu->phys_bits);
> +
> kvm_msr_entry_add(cpu, MSR_MTRRdefType, env->mtrr_deftype);
> kvm_msr_entry_add(cpu, MSR_MTRRfix64K_00000, env->mtrr_fixed[0]);
> kvm_msr_entry_add(cpu, MSR_MTRRfix16K_80000, env->mtrr_fixed[1]);
> @@ -1690,10 +1692,15 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
> kvm_msr_entry_add(cpu, MSR_MTRRfix4K_F0000, env->mtrr_fixed[9]);
> kvm_msr_entry_add(cpu, MSR_MTRRfix4K_F8000, env->mtrr_fixed[10]);
> for (i = 0; i < MSR_MTRRcap_VCNT; i++) {
> + /* The CPU GPs if we write to a bit above the physical limit of
> + * the host CPU (and KVM emulates that)
> + */
> + uint64_t mask = env->mtrr_var[i].mask;
> + mask &= phys_mask;
> +
I was arguing that we should print warnings when we really had to
change the MTRR values on migration.
But if we are already inside kvm_put_msrs() with a different
phys_bits value, the best we can do is to trim the bits so they
still make sense. If we want to print a warning, this can be done
somewhere else.
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> kvm_msr_entry_add(cpu, MSR_MTRRphysBase(i),
> env->mtrr_var[i].base);
> - kvm_msr_entry_add(cpu, MSR_MTRRphysMask(i),
> - env->mtrr_var[i].mask);
> + kvm_msr_entry_add(cpu, MSR_MTRRphysMask(i), mask);
> }
> }
>
> --
> 2.7.4
>
--
Eduardo
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/6] x86: fill high bits of mtrr mask
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 3/6] x86: fill high bits of mtrr mask Dr. David Alan Gilbert (git)
@ 2016-07-04 20:03 ` Michael S. Tsirkin
2016-07-04 20:14 ` Eduardo Habkost
2016-07-04 20:21 ` Eduardo Habkost
1 sibling, 1 reply; 39+ messages in thread
From: Michael S. Tsirkin @ 2016-07-04 20:03 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: qemu-devel, pbonzini, ehabkost, marcel, kraxel
On Mon, Jul 04, 2016 at 08:16:06PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Fill the bits between 51..number-of-physical-address-bits in the
> MTRR_PHYSMASKn variable range mtrr masks so that they're consistent
> in the migration stream irrespective of the physical address space
> of the source VM in a migration.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Could a bit more explanation be added here?
Why don't we migrate exactly what guest programmed?
With previous patch we mask on destination, do we not?
If it's for compat with old PC types, then why not
limit it to that?
> ---
> include/hw/i386/pc.h | 5 +++++
> target-i386/cpu.c | 1 +
> target-i386/cpu.h | 3 +++
> target-i386/kvm.c | 25 ++++++++++++++++++++++++-
> 4 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index 7e43b20..d85e924 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -374,6 +374,11 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
> .driver = TYPE_X86_CPU,\
> .property = "cpuid-0xb",\
> .value = "off",\
> + },\
> + {\
> + .driver = TYPE_X86_CPU,\
> + .property = "fill-mtrr-mask",\
> + .value = "off",\
> },
>
> #define PC_COMPAT_2_5 \
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index ab13ef5..5737aba 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -3258,6 +3258,7 @@ static Property x86_cpu_properties[] = {
> DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
> DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
> DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
> + DEFINE_PROP_BOOL("fill-mtrr-mask", X86CPU, fill_mtrr_mask, true),
> DEFINE_PROP_UINT32("phys-bits", X86CPU, phys_bits, 40),
> DEFINE_PROP_UINT32("level", X86CPU, env.cpuid_level, 0),
> DEFINE_PROP_UINT32("xlevel", X86CPU, env.cpuid_xlevel, 0),
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index 221b1a2..a9bbd91 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -1181,6 +1181,9 @@ struct X86CPU {
> /* Compatibility bits for old machine types: */
> bool enable_cpuid_0xb;
>
> + /* if true fill the top bits of the MTRR_PHYSMASKn variable range */
> + bool fill_mtrr_mask;
> +
> /* Number of physical address bits supported */
> uint32_t phys_bits;
>
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 6429205..cbcc30d 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -1935,6 +1935,7 @@ static int kvm_get_msrs(X86CPU *cpu)
> CPUX86State *env = &cpu->env;
> struct kvm_msr_entry *msrs = cpu->kvm_msr_buf->entries;
> int ret, i;
> + uint64_t mtrr_top_bits;
>
> kvm_msr_buf_reset(cpu);
>
> @@ -2084,6 +2085,27 @@ static int kvm_get_msrs(X86CPU *cpu)
> }
>
> assert(ret == cpu->kvm_msr_buf->nmsrs);
> + /*
> + * MTRR masks: Each mask consists of 5 parts
> + * a 10..0: must be zero
> + * b 11 : valid bit
> + * c n-1.12: actual mask bits
> + * d 51..n: reserved must be zero
> + * e 63.52: reserved must be zero
> + *
> + * 'n' is the number of physical bits supported by the CPU and is
> + * apparently always <= 52. We know our 'n' but don't know what
> + * the destinations 'n' is; it might be smaller, in which case
> + * it masks (c) on loading. It might be larger, in which case
> + * we fill 'd' so that d..c is consistent irrespetive of the 'n'
> + * we're migrating to.
> + */
> + if (cpu->fill_mtrr_mask && cpu->phys_bits < 52) {
> + mtrr_top_bits = MAKE_64BIT_MASK(cpu->phys_bits, 52 - cpu->phys_bits);
> + } else {
> + mtrr_top_bits = 0;
> + }
> +
> for (i = 0; i < ret; i++) {
> uint32_t index = msrs[i].index;
> switch (index) {
> @@ -2279,7 +2301,8 @@ static int kvm_get_msrs(X86CPU *cpu)
> break;
> case MSR_MTRRphysBase(0) ... MSR_MTRRphysMask(MSR_MTRRcap_VCNT - 1):
> if (index & 1) {
> - env->mtrr_var[MSR_MTRRphysIndex(index)].mask = msrs[i].data;
> + env->mtrr_var[MSR_MTRRphysIndex(index)].mask = msrs[i].data |
> + mtrr_top_bits;
> } else {
> env->mtrr_var[MSR_MTRRphysIndex(index)].base = msrs[i].data;
> }
> --
> 2.7.4
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 2/6] x86: Mask mtrr mask based on CPU physical address limits
2016-07-04 20:02 ` Michael S. Tsirkin
@ 2016-07-04 20:05 ` Eduardo Habkost
2016-07-04 22:37 ` Michael S. Tsirkin
0 siblings, 1 reply; 39+ messages in thread
From: Eduardo Habkost @ 2016-07-04 20:05 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Dr. David Alan Gilbert (git), qemu-devel, pbonzini, marcel,
kraxel
On Mon, Jul 04, 2016 at 11:02:00PM +0300, Michael S. Tsirkin wrote:
> On Mon, Jul 04, 2016 at 08:16:05PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > The CPU GPs if we try and set a bit in a variable MTRR mask above
> > the limit of physical address bits on the host. We hit this
> > when loading a migration from a host with a larger physical
> > address limit than our destination (e.g. a Xeon->i7 of same
> > generation) but previously used to get away with it
> > until 48e1a45 started checking that msr writes actually worked.
> >
> > It seems in our case the GP probably comes from KVM emulating
> > that GP.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> Why don't we mask with host bits? This is the kvm limitation,
> isn't it? This will make it possible to CC stable as well.
KVM validates the value written to the MSR using the guest CPUID
data, not the host bits. If we trim using the host bits, we would
still crash if cpu->phys_bits < host_phys_bits.
--
Eduardo
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/6] x86: fill high bits of mtrr mask
2016-07-04 20:03 ` Michael S. Tsirkin
@ 2016-07-04 20:14 ` Eduardo Habkost
0 siblings, 0 replies; 39+ messages in thread
From: Eduardo Habkost @ 2016-07-04 20:14 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Dr. David Alan Gilbert (git), qemu-devel, pbonzini, marcel,
kraxel
On Mon, Jul 04, 2016 at 11:03:59PM +0300, Michael S. Tsirkin wrote:
> On Mon, Jul 04, 2016 at 08:16:06PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Fill the bits between 51..number-of-physical-address-bits in the
> > MTRR_PHYSMASKn variable range mtrr masks so that they're consistent
> > in the migration stream irrespective of the physical address space
> > of the source VM in a migration.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>
> Could a bit more explanation be added here?
> Why don't we migrate exactly what guest programmed?
Because "exactly what the guest programmed" stops making sense if
phys_bits seen by the guest change when migrating (see Paolo's
explanation below).
> With previous patch we mask on destination, do we not?
The previous patch is necessary when phys_bits is smaller on
migration. This part is necessary when phys_bits increase on
migration.
>
> If it's for compat with old PC types, then why not
> limit it to that?
It's not just for compat with old PC types. It is necessary when
phys_bits increase on migration, which may happen because it is
the default for a machine-type, or in case it was configured this
way by management.
I asked similar questions on v1, see
http://article.gmane.org/gmane.comp.emulators.qemu/419712
Paolo's reply is below.
On Fri, Jun 17, 2016 at 03:01:55PM +0200, Paolo Bonzini wrote:
> On 17/06/2016 14:46, Eduardo Habkost wrote:
> >> >
> >> > The bits are reserved anyway, so we can do whatever we want with them.
> >> > In fact I think it's weird for the architecture to make them
> >> > must-be-zero, it might even make more sense to make them must-be-one...
> >> > It's a mask after all, and there's no way to access out-of-range
> >> > physical addresses.
> >
> > If we always fill the bits on the source, the destination can't
> > differentiate between a 40-bit source that set the MTRR to
> > 0xffffffffff from a 36-bit source that set the MTRR to
> > 0xfffffffff.
>
> That's not a bug, it's a feature. MTRRs work by comparing (address &
> mtrr_mask) with mtrr_base.
>
> A 40-bit source that sets the MTRR to 0xffffffffff must set higher bits
> of mtrr_base to zero, but when you migrate to another destination, you
> want the comparison to hold. There are two ways for it to hold:
>
> - if the physical address space becomes shorter, the base bits must be
> zero or writing mtrr_base fails, and the address bits must be zero or
> the processor fails to walk EPT page tables. So it's fine to strip the
> top four bits of the mask.
>
> - if the physical address space becomes larger, the base bits must be
> zero but the mask bits must become one. So why not migrate them this
> way directly.
>
> (For what it's worth, KVM also stores the mask extended to all ones, and
> strips the unnecessary high bits when reading the MSRs).
>
> > I really want to print a warning if the MTRR value or the
> > phys-bits value is being changed during migration, just in case
> > this has unintended consequences in the future. We can send the
> > current phys_bits value in the migration stream, so the
> > destination can decide how to handle it (and which warnings to
> > print).
>
> The problem with this is that it must be a warning only, because usually
> things will work (evidence: they have worked on RHEL6 and RHEL7 since
> 2010). No one will read it until it's too late and they have lost a VM.
>
> Note that the failure mode is pretty brutal since KVM reports an
> internal error right after restarting on the destination, and who knows
> what used to happen before the assertion was introduced. All MSRs after
> MTRRs would be ignored by KVM!
>
> Migrating and checking the configuration complicates the code and,
> because it's only for a warning, doesn't save you from massaging the
> values to make things work.
>
> Paolo
>
--
Eduardo
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/6] x86: fill high bits of mtrr mask
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 3/6] x86: fill high bits of mtrr mask Dr. David Alan Gilbert (git)
2016-07-04 20:03 ` Michael S. Tsirkin
@ 2016-07-04 20:21 ` Eduardo Habkost
2016-07-05 8:39 ` Dr. David Alan Gilbert
1 sibling, 1 reply; 39+ messages in thread
From: Eduardo Habkost @ 2016-07-04 20:21 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, pbonzini, marcel, mst, kraxel
On Mon, Jul 04, 2016 at 08:16:06PM +0100, Dr. David Alan Gilbert (git) wrote:
[...]
> @@ -2084,6 +2085,27 @@ static int kvm_get_msrs(X86CPU *cpu)
> }
>
> assert(ret == cpu->kvm_msr_buf->nmsrs);
> + /*
> + * MTRR masks: Each mask consists of 5 parts
> + * a 10..0: must be zero
> + * b 11 : valid bit
> + * c n-1.12: actual mask bits
> + * d 51..n: reserved must be zero
> + * e 63.52: reserved must be zero
> + *
> + * 'n' is the number of physical bits supported by the CPU and is
> + * apparently always <= 52. We know our 'n' but don't know what
> + * the destinations 'n' is; it might be smaller, in which case
> + * it masks (c) on loading. It might be larger, in which case
> + * we fill 'd' so that d..c is consistent irrespetive of the 'n'
> + * we're migrating to.
> + */
> + if (cpu->fill_mtrr_mask && cpu->phys_bits < 52) {
> + mtrr_top_bits = MAKE_64BIT_MASK(cpu->phys_bits, 52 - cpu->phys_bits);
> + } else {
> + mtrr_top_bits = 0;
How/where did you find this 52-bit limit? Is it documented
somewhere?
> + }
> +
> for (i = 0; i < ret; i++) {
> uint32_t index = msrs[i].index;
> switch (index) {
> @@ -2279,7 +2301,8 @@ static int kvm_get_msrs(X86CPU *cpu)
> break;
> case MSR_MTRRphysBase(0) ... MSR_MTRRphysMask(MSR_MTRRcap_VCNT - 1):
> if (index & 1) {
> - env->mtrr_var[MSR_MTRRphysIndex(index)].mask = msrs[i].data;
> + env->mtrr_var[MSR_MTRRphysIndex(index)].mask = msrs[i].data |
> + mtrr_top_bits;
> } else {
> env->mtrr_var[MSR_MTRRphysIndex(index)].base = msrs[i].data;
> }
> --
> 2.7.4
>
--
Eduardo
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches
2016-07-04 19:16 [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches Dr. David Alan Gilbert (git)
` (5 preceding siblings ...)
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 6/6] x86: Add sanity checks on phys_bits Dr. David Alan Gilbert (git)
@ 2016-07-04 20:23 ` Michael S. Tsirkin
2016-07-05 9:33 ` Dr. David Alan Gilbert
2016-07-05 9:46 ` Daniel P. Berrange
7 siblings, 1 reply; 39+ messages in thread
From: Michael S. Tsirkin @ 2016-07-04 20:23 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: qemu-devel, pbonzini, ehabkost, marcel, kraxel
On Mon, Jul 04, 2016 at 08:16:03PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> QEMU sets the guests physical address bits to 40; this is wrong
> on most hardware, and can be detected by the guest.
> It also stops you using really huge multi-TB VMs.
>
> Red Hat has had a patch, that Andrea wrote, downstream for a couple
> of years that reads the hosts value and uses that in the guest. That's
> correct as far as the guest sees it, and lets you create huge VMs.
>
> The downside, is that if you've got a mix of hosts, say an i7 and a Xeon,
> life gets complicated in migration; prior to 2.6 it all apparently
> worked (although a guest that looked might spot the change).
> In 2.6 Paolo started checking MSR writes and they failed when the
> incoming MTRR mask didn't fit.
>
> This series:
> a) Fixes up mtrr masks so that if you're migrating between hosts
> of different physical address size it tries to do something sensible.
>
> b) Lets you specify the guest physical address size via a CPU property, i.e.
> -cpu SandyBridge,phys-bits=36
>
> The default on old machine types is to use the existing 40 bits value.
>
> c) Lets you tell qemu to use the same setting as the host, i.e.
> -cpu SandyBridge,phys-bits=0
>
> This is the default on new machine types.
>
> Note that mixed size hosts are still not necessarily safe; a guest
> started on a host with a large physical address size might start using
> those bits and get upset when it's moved to a small host.
> However that was already potentially broken in existing qemu that
> used a magic value of 40.
>
> There's potential to add some extra guards against people
> doing silly stuff; e.g. stop people running VMs using 1TB of
> address space on a tiny host.
>
> Dave
This is all in target-i386 so if the maintainers want it this way, they
can merge this, and I do not have strong objections, but I wanted to
document an alternative that is IMHO somewhat nicer. Feel free to
ignore. See below.
How can guest use more memory than what host supports?
I think there are two ways:
1. more memory than host supports is supplied
This is a configuration error. We can simply detect this
and fail init, or print a warning, no need for new flags.
2. pci addresses out of host range assigned by guest
Again normally at least seabios will not do this,
maybe OVMF will?
we certainly can add an interface telling firmware
what the limit is.
Thus an alternative is:
- add interface to tell QEMU how much 64 bit memory can pci use.
- teach firmware to limit itself to that
- set guest bits to 48 unconditionally
the disadvantage of this approach is that firmware needs to be changed
the advantage is that we get seemless migration between different
hosts as long as they both can support the configuration,
without any management effort.
>
> v2
> Default on new machine types is to read from the host
> Use the MAKE_64BIT_MASK macro
> Validate phys_bits in the realise method
> Move reading the host physical bits to the realise method
> Set phys-bits even for 32bit guests
> Add warning when your phys-bits doesn't match your host in the none
> default case
>
> Dr. David Alan Gilbert (6):
> x86: Allow physical address bits to be set
> x86: Mask mtrr mask based on CPU physical address limits
> x86: fill high bits of mtrr mask
> x86: Set physical address bits based on host
> x86: fix up 32 bit phys_bits case
> x86: Add sanity checks on phys_bits
>
> include/hw/i386/pc.h | 10 ++++++++
> target-i386/cpu.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++------
> target-i386/cpu.h | 6 +++++
> target-i386/kvm.c | 36 +++++++++++++++++++++++---
> 4 files changed, 112 insertions(+), 11 deletions(-)
>
> --
> 2.7.4
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 4/6] x86: Set physical address bits based on host
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 4/6] x86: Set physical address bits based on host Dr. David Alan Gilbert (git)
@ 2016-07-04 20:27 ` Eduardo Habkost
2016-07-05 8:44 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 39+ messages in thread
From: Eduardo Habkost @ 2016-07-04 20:27 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, pbonzini, marcel, mst, kraxel
On Mon, Jul 04, 2016 at 08:16:07PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> A special case based on the previous phys-bits property; if it's
> the magic value 0 then use the hosts capabilities.
>
> This becomes the default on new machine types.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> include/hw/i386/pc.h | 5 +++++
> target-i386/cpu.c | 36 +++++++++++++++++++++++++++++++++++-
> 2 files changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index d85e924..bf31609 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -379,6 +379,11 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
> .driver = TYPE_X86_CPU,\
> .property = "fill-mtrr-mask",\
> .value = "off",\
> + },\
> + {\
> + .driver = TYPE_X86_CPU,\
> + .property = "phys-bits",\
> + .value = "40",\
> },
>
> #define PC_COMPAT_2_5 \
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index 5737aba..d45d2a6 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -2957,6 +2957,40 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> & CPUID_EXT2_AMD_ALIASES);
> }
>
> + /* For 64bit systems think about the number of physical bits to present.
> + * ideally this should be the same as the host; anything other than matching
> + * the host can cause incorrect guest behaviour.
> + * QEMU used to pick the magic value of 40 bits that corresponds to
> + * consumer AMD devices but nothing esle.
> + */
> + if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
> + uint32_t eax;
> + /* Read the hosts physical address size, and compare it to what we
> + * were asked for; note old machine types default to 40 bits
> + */
> + uint32_t host_phys_bits = 0;
> + host_cpuid(0x80000000, 0, &eax, NULL, NULL, NULL);
> + if (eax >= 0x80000008) {
> + host_cpuid(0x80000008, 0, &eax, NULL, NULL, NULL);
> + /* Note: According to AMD doc 25481 rev 2.34 they have a field
> + * at 23:16 that can specify a maximum physical address bits for
> + * the guest that can override this value; but I've not seen
> + * anything with that set.
> + */
> + host_phys_bits = eax & 0xff;
> + } else {
> + /* It's an odd 64 bit machine that doesn't have the leaf for
> + * physical address bits; fall back to 36 that's most older Intel.
> + */
> + host_phys_bits = 36;
> + }
Why do we need to calculate host_phys_bits when phys_bits is
already set? Shouldn't we put all the code above after the "if
(cpu->phys_bits)" check?
> +
> + if (cpu->phys_bits == 0) {
> + /* The user asked for us to use the host physical bits */
> + cpu->phys_bits = host_phys_bits;
> +
> + }
> + }
>
> cpu_exec_init(cs, &error_abort);
>
> @@ -3259,7 +3293,7 @@ static Property x86_cpu_properties[] = {
> DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
> DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
> DEFINE_PROP_BOOL("fill-mtrr-mask", X86CPU, fill_mtrr_mask, true),
> - DEFINE_PROP_UINT32("phys-bits", X86CPU, phys_bits, 40),
> + DEFINE_PROP_UINT32("phys-bits", X86CPU, phys_bits, 0),
I would put this part (that sets the default to 0) and the
PC_COMPAT_2_6 part in a separate patch. This way we can include
the mechanism for setting phys-bits=0 even if we didn't reach a
conclusion about the proper pc-2.7 default yet.
> DEFINE_PROP_UINT32("level", X86CPU, env.cpuid_level, 0),
> DEFINE_PROP_UINT32("xlevel", X86CPU, env.cpuid_xlevel, 0),
> DEFINE_PROP_UINT32("xlevel2", X86CPU, env.cpuid_xlevel2, 0),
> --
> 2.7.4
>
--
Eduardo
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 6/6] x86: Add sanity checks on phys_bits
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 6/6] x86: Add sanity checks on phys_bits Dr. David Alan Gilbert (git)
@ 2016-07-04 20:46 ` Eduardo Habkost
2016-07-05 10:40 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 39+ messages in thread
From: Eduardo Habkost @ 2016-07-04 20:46 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git); +Cc: qemu-devel, pbonzini, marcel, mst, kraxel
On Mon, Jul 04, 2016 at 08:16:09PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Add some sanity checks on the phys-bits setting now that
> the user can set it.
> a) That it's in a sane range (52..32)
> b) Warn if it mismatches the host and isn't the old default.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> target-i386/cpu.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index e15abea..5402002 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -2985,6 +2985,19 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> /* The user asked for us to use the host physical bits */
> cpu->phys_bits = host_phys_bits;
>
> + } else if (cpu->phys_bits > 52 || cpu->phys_bits < 32) {
> + error_setg(errp, "phys_bits should be between 32 and 52 or 0 to"
> + " use host size (but is %u)", cpu->phys_bits);
> + return;
> + }
This check belongs to patch 1/6, doesn't it?
Here we have the same magic number (52), and I don't know where
it came from. Maybe it should become a (documented) macro?
Also, won't this make the "phys_bits < 52" check added by patch
3/6 unnecessary?
> + /* Print a warning if the user set it to a value that's not the
> + * host value; ignore the magic value 40 because it may well just
> + * be the old machine type.
> + */
With this, we won't print a warning if "phys-bits=40" is set
explicitly. If we want to disable the warning only for the old
machine-types, we can add a boolean flag that disables it.
> + if (cpu->phys_bits != host_phys_bits && cpu->phys_bits != 40) {
> + fprintf(stderr, "Warning: Host physical bits (%u)"
> + " does not match phys_bits (%u)\n",
> + host_phys_bits, cpu->phys_bits);
Shouldn't we use error_report() for this?
Also, this prints a warning for each VCPU. This is not the first
time we want to print a warning only once (see
x86_cpu_apic_id_from_index() in hw/i386/pc.c and ht_warned in
target-i386/cpu.c). It looks like QEMU needs a warn_once()
helper.
> }
> } else {
> /* For 32 bit systems don't use the user set value, but keep
> --
> 2.7.4
>
--
Eduardo
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 2/6] x86: Mask mtrr mask based on CPU physical address limits
2016-07-04 20:05 ` Eduardo Habkost
@ 2016-07-04 22:37 ` Michael S. Tsirkin
0 siblings, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2016-07-04 22:37 UTC (permalink / raw)
To: Eduardo Habkost
Cc: Dr. David Alan Gilbert (git), qemu-devel, pbonzini, marcel,
kraxel
On Mon, Jul 04, 2016 at 05:05:36PM -0300, Eduardo Habkost wrote:
> On Mon, Jul 04, 2016 at 11:02:00PM +0300, Michael S. Tsirkin wrote:
> > On Mon, Jul 04, 2016 at 08:16:05PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >
> > > The CPU GPs if we try and set a bit in a variable MTRR mask above
> > > the limit of physical address bits on the host. We hit this
> > > when loading a migration from a host with a larger physical
> > > address limit than our destination (e.g. a Xeon->i7 of same
> > > generation) but previously used to get away with it
> > > until 48e1a45 started checking that msr writes actually worked.
> > >
> > > It seems in our case the GP probably comes from KVM emulating
> > > that GP.
> > >
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >
> > Why don't we mask with host bits? This is the kvm limitation,
> > isn't it? This will make it possible to CC stable as well.
>
> KVM validates the value written to the MSR using the guest CPUID
> data, not the host bits. If we trim using the host bits, we would
> still crash if cpu->phys_bits < host_phys_bits.
I didn't realise. Will have to re-do the review.
> --
> Eduardo
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/6] x86: fill high bits of mtrr mask
2016-07-04 20:21 ` Eduardo Habkost
@ 2016-07-05 8:39 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2016-07-05 8:39 UTC (permalink / raw)
To: Eduardo Habkost; +Cc: qemu-devel, pbonzini, marcel, mst, kraxel
* Eduardo Habkost (ehabkost@redhat.com) wrote:
> On Mon, Jul 04, 2016 at 08:16:06PM +0100, Dr. David Alan Gilbert (git) wrote:
> [...]
> > @@ -2084,6 +2085,27 @@ static int kvm_get_msrs(X86CPU *cpu)
> > }
> >
> > assert(ret == cpu->kvm_msr_buf->nmsrs);
> > + /*
> > + * MTRR masks: Each mask consists of 5 parts
> > + * a 10..0: must be zero
> > + * b 11 : valid bit
> > + * c n-1.12: actual mask bits
> > + * d 51..n: reserved must be zero
> > + * e 63.52: reserved must be zero
> > + *
> > + * 'n' is the number of physical bits supported by the CPU and is
> > + * apparently always <= 52. We know our 'n' but don't know what
> > + * the destinations 'n' is; it might be smaller, in which case
> > + * it masks (c) on loading. It might be larger, in which case
> > + * we fill 'd' so that d..c is consistent irrespetive of the 'n'
> > + * we're migrating to.
> > + */
> > + if (cpu->fill_mtrr_mask && cpu->phys_bits < 52) {
> > + mtrr_top_bits = MAKE_64BIT_MASK(cpu->phys_bits, 52 - cpu->phys_bits);
> > + } else {
> > + mtrr_top_bits = 0;
>
> How/where did you find this 52-bit limit? Is it documented
> somewhere?
It seems to come from AMDs original specification of AMD64; but you're
right we could do with a constant rather than the magical 52 everywhere.
Looking in AMD doc 24593 Rev 3.26 (AMD64 Arch Programmer's manual vol. 2)
p.191 Fig 7.6 MTRRphysBasen Register it shows it as PhysBase running from 51:32
and 63:52 being Reserved/MBZ;
The corresponding Intel doc (Intel 64 & IA-32 Architectures Dev manual 3A 11-25
fig 11-7) doesn't have that limit shown; however it does talk about 52-bit
physical addresses in a few places; e.g. 4.4 PAE paging talks about
'paging translates 32-bit linear addresses to 52-bit physical addresses'
I think the most relevant place in the Intel doc is 5.13.3 'Reserved Bit Checking'
which has a
Table 5-8 'IA-32e Mode Page Level Protection Matrix with Execute-Disable Bit Capability Enabled'
this is a table of reserved bit fields and for each of the
page tables it shows bits checked as [51:MAXPHYADDR].
Any suggestions for a name for the 52 constant? I guess MaxMaxPhyAddress?
I guess someone decided that 4PB ought to be enough for anyone.
Dave
>
> > + }
> > +
> > for (i = 0; i < ret; i++) {
> > uint32_t index = msrs[i].index;
> > switch (index) {
> > @@ -2279,7 +2301,8 @@ static int kvm_get_msrs(X86CPU *cpu)
> > break;
> > case MSR_MTRRphysBase(0) ... MSR_MTRRphysMask(MSR_MTRRcap_VCNT - 1):
> > if (index & 1) {
> > - env->mtrr_var[MSR_MTRRphysIndex(index)].mask = msrs[i].data;
> > + env->mtrr_var[MSR_MTRRphysIndex(index)].mask = msrs[i].data |
> > + mtrr_top_bits;
> > } else {
> > env->mtrr_var[MSR_MTRRphysIndex(index)].base = msrs[i].data;
> > }
> > --
> > 2.7.4
> >
>
> --
> Eduardo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 4/6] x86: Set physical address bits based on host
2016-07-04 20:27 ` Eduardo Habkost
@ 2016-07-05 8:44 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2016-07-05 8:44 UTC (permalink / raw)
To: Eduardo Habkost; +Cc: qemu-devel, pbonzini, marcel, mst, kraxel
* Eduardo Habkost (ehabkost@redhat.com) wrote:
> On Mon, Jul 04, 2016 at 08:16:07PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > A special case based on the previous phys-bits property; if it's
> > the magic value 0 then use the hosts capabilities.
> >
> > This becomes the default on new machine types.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> > include/hw/i386/pc.h | 5 +++++
> > target-i386/cpu.c | 36 +++++++++++++++++++++++++++++++++++-
> > 2 files changed, 40 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> > index d85e924..bf31609 100644
> > --- a/include/hw/i386/pc.h
> > +++ b/include/hw/i386/pc.h
> > @@ -379,6 +379,11 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
> > .driver = TYPE_X86_CPU,\
> > .property = "fill-mtrr-mask",\
> > .value = "off",\
> > + },\
> > + {\
> > + .driver = TYPE_X86_CPU,\
> > + .property = "phys-bits",\
> > + .value = "40",\
> > },
> >
> > #define PC_COMPAT_2_5 \
> > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > index 5737aba..d45d2a6 100644
> > --- a/target-i386/cpu.c
> > +++ b/target-i386/cpu.c
> > @@ -2957,6 +2957,40 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> > & CPUID_EXT2_AMD_ALIASES);
> > }
> >
> > + /* For 64bit systems think about the number of physical bits to present.
> > + * ideally this should be the same as the host; anything other than matching
> > + * the host can cause incorrect guest behaviour.
> > + * QEMU used to pick the magic value of 40 bits that corresponds to
> > + * consumer AMD devices but nothing esle.
> > + */
> > + if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
> > + uint32_t eax;
> > + /* Read the hosts physical address size, and compare it to what we
> > + * were asked for; note old machine types default to 40 bits
> > + */
> > + uint32_t host_phys_bits = 0;
> > + host_cpuid(0x80000000, 0, &eax, NULL, NULL, NULL);
> > + if (eax >= 0x80000008) {
> > + host_cpuid(0x80000008, 0, &eax, NULL, NULL, NULL);
> > + /* Note: According to AMD doc 25481 rev 2.34 they have a field
> > + * at 23:16 that can specify a maximum physical address bits for
> > + * the guest that can override this value; but I've not seen
> > + * anything with that set.
> > + */
> > + host_phys_bits = eax & 0xff;
> > + } else {
> > + /* It's an odd 64 bit machine that doesn't have the leaf for
> > + * physical address bits; fall back to 36 that's most older Intel.
> > + */
> > + host_phys_bits = 36;
> > + }
>
> Why do we need to calculate host_phys_bits when phys_bits is
> already set? Shouldn't we put all the code above after the "if
> (cpu->phys_bits)" check?
Because I reuse host_phys_bits to generate the warning if you've
explicitly set phys-bits and it doesn't match the host.
> > +
> > + if (cpu->phys_bits == 0) {
> > + /* The user asked for us to use the host physical bits */
> > + cpu->phys_bits = host_phys_bits;
> > +
> > + }
> > + }
> >
> > cpu_exec_init(cs, &error_abort);
> >
> > @@ -3259,7 +3293,7 @@ static Property x86_cpu_properties[] = {
> > DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
> > DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
> > DEFINE_PROP_BOOL("fill-mtrr-mask", X86CPU, fill_mtrr_mask, true),
> > - DEFINE_PROP_UINT32("phys-bits", X86CPU, phys_bits, 40),
> > + DEFINE_PROP_UINT32("phys-bits", X86CPU, phys_bits, 0),
>
> I would put this part (that sets the default to 0) and the
> PC_COMPAT_2_6 part in a separate patch. This way we can include
> the mechanism for setting phys-bits=0 even if we didn't reach a
> conclusion about the proper pc-2.7 default yet.
Will do.
Dave
>
> > DEFINE_PROP_UINT32("level", X86CPU, env.cpuid_level, 0),
> > DEFINE_PROP_UINT32("xlevel", X86CPU, env.cpuid_xlevel, 0),
> > DEFINE_PROP_UINT32("xlevel2", X86CPU, env.cpuid_xlevel2, 0),
> > --
> > 2.7.4
> >
>
> --
> Eduardo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches
2016-07-04 20:23 ` [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches Michael S. Tsirkin
@ 2016-07-05 9:33 ` Dr. David Alan Gilbert
2016-07-05 10:06 ` Michael S. Tsirkin
0 siblings, 1 reply; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2016-07-05 9:33 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, pbonzini, ehabkost, marcel, kraxel
* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Mon, Jul 04, 2016 at 08:16:03PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > QEMU sets the guests physical address bits to 40; this is wrong
> > on most hardware, and can be detected by the guest.
> > It also stops you using really huge multi-TB VMs.
> >
> > Red Hat has had a patch, that Andrea wrote, downstream for a couple
> > of years that reads the hosts value and uses that in the guest. That's
> > correct as far as the guest sees it, and lets you create huge VMs.
> >
> > The downside, is that if you've got a mix of hosts, say an i7 and a Xeon,
> > life gets complicated in migration; prior to 2.6 it all apparently
> > worked (although a guest that looked might spot the change).
> > In 2.6 Paolo started checking MSR writes and they failed when the
> > incoming MTRR mask didn't fit.
> >
> > This series:
> > a) Fixes up mtrr masks so that if you're migrating between hosts
> > of different physical address size it tries to do something sensible.
> >
> > b) Lets you specify the guest physical address size via a CPU property, i.e.
> > -cpu SandyBridge,phys-bits=36
> >
> > The default on old machine types is to use the existing 40 bits value.
> >
> > c) Lets you tell qemu to use the same setting as the host, i.e.
> > -cpu SandyBridge,phys-bits=0
> >
> > This is the default on new machine types.
> >
> > Note that mixed size hosts are still not necessarily safe; a guest
> > started on a host with a large physical address size might start using
> > those bits and get upset when it's moved to a small host.
> > However that was already potentially broken in existing qemu that
> > used a magic value of 40.
> >
> > There's potential to add some extra guards against people
> > doing silly stuff; e.g. stop people running VMs using 1TB of
> > address space on a tiny host.
> >
> > Dave
>
> This is all in target-i386 so if the maintainers want it this way, they
> can merge this, and I do not have strong objections, but I wanted to
> document an alternative that is IMHO somewhat nicer. Feel free to
> ignore. See below.
>
> How can guest use more memory than what host supports?
> I think there are two ways:
>
> 1. more memory than host supports is supplied
> This is a configuration error. We can simply detect this
> and fail init, or print a warning, no need for new flags.
Yes we should do that; however there's a case that's potentially
currently working for people but actually kind of illegal.
That case is specifying a small amount of actual memory
but a large maxmem - i.e.:
-m 2G,slots=16,maxmem=2T
On a host with a 39bit physaddress limit do you error
on that or not? I think oVirt is currently doing something
similar to that, but I'm trying to get confirmation.
> 2. pci addresses out of host range assigned by guest
> Again normally at least seabios will not do this,
> maybe OVMF will?
> we certainly can add an interface telling firmware
> what the limit is.
>
> Thus an alternative is:
> - add interface to tell QEMU how much 64 bit memory can pci use.
> - teach firmware to limit itself to that
> - set guest bits to 48 unconditionally
>
>
> the disadvantage of this approach is that firmware needs to be changed
I guess it also needs the CRS to tell the guest OS not
to remap PCI stuff into that space? I thought also from the previous
discussions that the guest would get a different exception if it
actually tried to use any of the bits below 48 it didn't have.
> the advantage is that we get seemless migration between different
> hosts as long as they both can support the configuration,
> without any management effort.
The reality (Linux guest) is that this already works as long as you don't
map anything into the high address space, and the firmware wont do
that unless it's pushed to by an excessive maxmem or huge
64bit PCI bars.
Dave
>
> >
> > v2
> > Default on new machine types is to read from the host
> > Use the MAKE_64BIT_MASK macro
> > Validate phys_bits in the realise method
> > Move reading the host physical bits to the realise method
> > Set phys-bits even for 32bit guests
> > Add warning when your phys-bits doesn't match your host in the none
> > default case
> >
> > Dr. David Alan Gilbert (6):
> > x86: Allow physical address bits to be set
> > x86: Mask mtrr mask based on CPU physical address limits
> > x86: fill high bits of mtrr mask
> > x86: Set physical address bits based on host
> > x86: fix up 32 bit phys_bits case
> > x86: Add sanity checks on phys_bits
> >
> > include/hw/i386/pc.h | 10 ++++++++
> > target-i386/cpu.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++------
> > target-i386/cpu.h | 6 +++++
> > target-i386/kvm.c | 36 +++++++++++++++++++++++---
> > 4 files changed, 112 insertions(+), 11 deletions(-)
> >
> > --
> > 2.7.4
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 5/6] x86: fix up 32 bit phys_bits case
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 5/6] x86: fix up 32 bit phys_bits case Dr. David Alan Gilbert (git)
@ 2016-07-05 9:42 ` Daniel P. Berrange
2016-07-05 11:29 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 39+ messages in thread
From: Daniel P. Berrange @ 2016-07-05 9:42 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: qemu-devel, pbonzini, ehabkost, marcel, mst, kraxel
On Mon, Jul 04, 2016 at 08:16:08PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> On 32 bit systems fix up phys_bits to be consistent with what
> we tell the guest; don't ever bother with using the phys_bits
> property.
> @@ -2990,6 +2986,15 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> cpu->phys_bits = host_phys_bits;
>
> }
> + } else {
> + /* For 32 bit systems don't use the user set value, but keep
> + * phys_bits consistent with what we tell the guest.
> + */
> + if (env->features[FEAT_1_EDX] & CPUID_PSE36) {
> + cpu->phys_bits = 36;
> + } else {
> + cpu->phys_bits = 32;
> + }
I kind of feel like we should report an error and exit if the
user/app has provided a phys_bits property value, rather than
silently ignoring their provided value, on the basis that this
is a user/app configuration error.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches
2016-07-04 19:16 [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches Dr. David Alan Gilbert (git)
` (6 preceding siblings ...)
2016-07-04 20:23 ` [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches Michael S. Tsirkin
@ 2016-07-05 9:46 ` Daniel P. Berrange
2016-07-05 9:49 ` Dr. David Alan Gilbert
7 siblings, 1 reply; 39+ messages in thread
From: Daniel P. Berrange @ 2016-07-05 9:46 UTC (permalink / raw)
To: Dr. David Alan Gilbert (git)
Cc: qemu-devel, pbonzini, ehabkost, marcel, mst, kraxel
On Mon, Jul 04, 2016 at 08:16:03PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> QEMU sets the guests physical address bits to 40; this is wrong
> on most hardware, and can be detected by the guest.
> It also stops you using really huge multi-TB VMs.
>
> Red Hat has had a patch, that Andrea wrote, downstream for a couple
> of years that reads the hosts value and uses that in the guest. That's
> correct as far as the guest sees it, and lets you create huge VMs.
>
> The downside, is that if you've got a mix of hosts, say an i7 and a Xeon,
> life gets complicated in migration; prior to 2.6 it all apparently
> worked (although a guest that looked might spot the change).
> In 2.6 Paolo started checking MSR writes and they failed when the
> incoming MTRR mask didn't fit.
>
> This series:
> a) Fixes up mtrr masks so that if you're migrating between hosts
> of different physical address size it tries to do something sensible.
>
> b) Lets you specify the guest physical address size via a CPU property, i.e.
> -cpu SandyBridge,phys-bits=36
>
> The default on old machine types is to use the existing 40 bits value.
>
> c) Lets you tell qemu to use the same setting as the host, i.e.
> -cpu SandyBridge,phys-bits=0
>
> This is the default on new machine types.
As a general rule we've tried to say that if you pick an explicit CPU
model, we're migration safe. By having the phys-bits default value
always reflect the host CPU value, it feels like we've made the explicit
CPU model choice less safe, just like -cpu host is.
IOW, if choosing a named CPU model, it feels like we should have a
corresponding fixed phys-bit value for that CPU model, even if it
has to be quiet conservative (eg default to bits=36). A phys-bits=0
value should only be used with -cpu host.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches
2016-07-05 9:46 ` Daniel P. Berrange
@ 2016-07-05 9:49 ` Dr. David Alan Gilbert
2016-07-05 12:38 ` Eduardo Habkost
0 siblings, 1 reply; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2016-07-05 9:49 UTC (permalink / raw)
To: Daniel P. Berrange; +Cc: qemu-devel, pbonzini, ehabkost, marcel, mst, kraxel
* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Mon, Jul 04, 2016 at 08:16:03PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > QEMU sets the guests physical address bits to 40; this is wrong
> > on most hardware, and can be detected by the guest.
> > It also stops you using really huge multi-TB VMs.
> >
> > Red Hat has had a patch, that Andrea wrote, downstream for a couple
> > of years that reads the hosts value and uses that in the guest. That's
> > correct as far as the guest sees it, and lets you create huge VMs.
> >
> > The downside, is that if you've got a mix of hosts, say an i7 and a Xeon,
> > life gets complicated in migration; prior to 2.6 it all apparently
> > worked (although a guest that looked might spot the change).
> > In 2.6 Paolo started checking MSR writes and they failed when the
> > incoming MTRR mask didn't fit.
> >
> > This series:
> > a) Fixes up mtrr masks so that if you're migrating between hosts
> > of different physical address size it tries to do something sensible.
> >
> > b) Lets you specify the guest physical address size via a CPU property, i.e.
> > -cpu SandyBridge,phys-bits=36
> >
> > The default on old machine types is to use the existing 40 bits value.
> >
> > c) Lets you tell qemu to use the same setting as the host, i.e.
> > -cpu SandyBridge,phys-bits=0
> >
> > This is the default on new machine types.
>
> As a general rule we've tried to say that if you pick an explicit CPU
> model, we're migration safe. By having the phys-bits default value
> always reflect the host CPU value, it feels like we've made the explicit
> CPU model choice less safe, just like -cpu host is.
>
> IOW, if choosing a named CPU model, it feels like we should have a
> corresponding fixed phys-bit value for that CPU model, even if it
> has to be quiet conservative (eg default to bits=36). A phys-bits=0
> value should only be used with -cpu host.
phys-bits doesn't follow a cpu model in real hardware
e.g. a SandyBridge Xeon and a SandyBridge i7 are different.
So unless we suddenly created at least 2x as many cpu models we can't
do that.
Dave
>
>
> Regards,
> Daniel
> --
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches
2016-07-05 9:33 ` Dr. David Alan Gilbert
@ 2016-07-05 10:06 ` Michael S. Tsirkin
2016-07-05 10:13 ` Dr. David Alan Gilbert
2016-07-05 10:59 ` Paolo Bonzini
0 siblings, 2 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2016-07-05 10:06 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: qemu-devel, pbonzini, ehabkost, marcel, kraxel
On Tue, Jul 05, 2016 at 10:33:25AM +0100, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Mon, Jul 04, 2016 at 08:16:03PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >
> > > QEMU sets the guests physical address bits to 40; this is wrong
> > > on most hardware, and can be detected by the guest.
> > > It also stops you using really huge multi-TB VMs.
> > >
> > > Red Hat has had a patch, that Andrea wrote, downstream for a couple
> > > of years that reads the hosts value and uses that in the guest. That's
> > > correct as far as the guest sees it, and lets you create huge VMs.
> > >
> > > The downside, is that if you've got a mix of hosts, say an i7 and a Xeon,
> > > life gets complicated in migration; prior to 2.6 it all apparently
> > > worked (although a guest that looked might spot the change).
> > > In 2.6 Paolo started checking MSR writes and they failed when the
> > > incoming MTRR mask didn't fit.
> > >
> > > This series:
> > > a) Fixes up mtrr masks so that if you're migrating between hosts
> > > of different physical address size it tries to do something sensible.
> > >
> > > b) Lets you specify the guest physical address size via a CPU property, i.e.
> > > -cpu SandyBridge,phys-bits=36
> > >
> > > The default on old machine types is to use the existing 40 bits value.
> > >
> > > c) Lets you tell qemu to use the same setting as the host, i.e.
> > > -cpu SandyBridge,phys-bits=0
> > >
> > > This is the default on new machine types.
> > >
> > > Note that mixed size hosts are still not necessarily safe; a guest
> > > started on a host with a large physical address size might start using
> > > those bits and get upset when it's moved to a small host.
> > > However that was already potentially broken in existing qemu that
> > > used a magic value of 40.
> > >
> > > There's potential to add some extra guards against people
> > > doing silly stuff; e.g. stop people running VMs using 1TB of
> > > address space on a tiny host.
> > >
> > > Dave
> >
> > This is all in target-i386 so if the maintainers want it this way, they
> > can merge this, and I do not have strong objections, but I wanted to
> > document an alternative that is IMHO somewhat nicer. Feel free to
> > ignore. See below.
> >
> > How can guest use more memory than what host supports?
> > I think there are two ways:
> >
> > 1. more memory than host supports is supplied
> > This is a configuration error. We can simply detect this
> > and fail init, or print a warning, no need for new flags.
>
> Yes we should do that; however there's a case that's potentially
> currently working for people but actually kind of illegal.
> That case is specifying a small amount of actual memory
> but a large maxmem - i.e.:
>
> -m 2G,slots=16,maxmem=2T
>
> On a host with a 39bit physaddress limit do you error
> on that or not? I think oVirt is currently doing something
> similar to that, but I'm trying to get confirmation.
That would only be a problem since pci is allocated above
maxmem so 64 bit pci addresses aren't accessible.
With my proposal we can actually force firmware to avoid
using 64 bit memory for that config.
Will work better than today.
> > 2. pci addresses out of host range assigned by guest
> > Again normally at least seabios will not do this,
> > maybe OVMF will?
> > we certainly can add an interface telling firmware
> > what the limit is.
> >
> > Thus an alternative is:
> > - add interface to tell QEMU how much 64 bit memory can pci use.
> > - teach firmware to limit itself to that
> > - set guest bits to 48 unconditionally
> >
> >
> > the disadvantage of this approach is that firmware needs to be changed
>
> I guess it also needs the CRS to tell the guest OS not
> to remap PCI stuff into that space?
CRS is a list of legal addresses, not list of illegal ones.
So just don't include what's illegal there.
> I thought also from the previous
> discussions that the guest would get a different exception if it
> actually tried to use any of the bits below 48 it didn't have.
Basically if you try to map pci at an address outside CRS
you can get any kind of crash since there could be on-board
hardware handling these addresses.
So I do not think we care about that.
> > the advantage is that we get seemless migration between different
> > hosts as long as they both can support the configuration,
> > without any management effort.
>
> The reality (Linux guest) is that this already works as long as you don't
> map anything into the high address space, and the firmware wont do
> that unless it's pushed to by an excessive maxmem or huge
> 64bit PCI bars.
>
> Dave
Right. So the disadvantage isn't big at all, and I think advantages
outweight it.
> >
> > >
> > > v2
> > > Default on new machine types is to read from the host
> > > Use the MAKE_64BIT_MASK macro
> > > Validate phys_bits in the realise method
> > > Move reading the host physical bits to the realise method
> > > Set phys-bits even for 32bit guests
> > > Add warning when your phys-bits doesn't match your host in the none
> > > default case
> > >
> > > Dr. David Alan Gilbert (6):
> > > x86: Allow physical address bits to be set
> > > x86: Mask mtrr mask based on CPU physical address limits
> > > x86: fill high bits of mtrr mask
> > > x86: Set physical address bits based on host
> > > x86: fix up 32 bit phys_bits case
> > > x86: Add sanity checks on phys_bits
> > >
> > > include/hw/i386/pc.h | 10 ++++++++
> > > target-i386/cpu.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++------
> > > target-i386/cpu.h | 6 +++++
> > > target-i386/kvm.c | 36 +++++++++++++++++++++++---
> > > 4 files changed, 112 insertions(+), 11 deletions(-)
> > >
> > > --
> > > 2.7.4
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches
2016-07-05 10:06 ` Michael S. Tsirkin
@ 2016-07-05 10:13 ` Dr. David Alan Gilbert
2016-07-05 10:41 ` Michael S. Tsirkin
2016-07-05 10:59 ` Paolo Bonzini
1 sibling, 1 reply; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2016-07-05 10:13 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, pbonzini, ehabkost, marcel, kraxel
* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Tue, Jul 05, 2016 at 10:33:25AM +0100, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > On Mon, Jul 04, 2016 at 08:16:03PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > >
> > > > QEMU sets the guests physical address bits to 40; this is wrong
> > > > on most hardware, and can be detected by the guest.
> > > > It also stops you using really huge multi-TB VMs.
> > > >
> > > > Red Hat has had a patch, that Andrea wrote, downstream for a couple
> > > > of years that reads the hosts value and uses that in the guest. That's
> > > > correct as far as the guest sees it, and lets you create huge VMs.
> > > >
> > > > The downside, is that if you've got a mix of hosts, say an i7 and a Xeon,
> > > > life gets complicated in migration; prior to 2.6 it all apparently
> > > > worked (although a guest that looked might spot the change).
> > > > In 2.6 Paolo started checking MSR writes and they failed when the
> > > > incoming MTRR mask didn't fit.
> > > >
> > > > This series:
> > > > a) Fixes up mtrr masks so that if you're migrating between hosts
> > > > of different physical address size it tries to do something sensible.
> > > >
> > > > b) Lets you specify the guest physical address size via a CPU property, i.e.
> > > > -cpu SandyBridge,phys-bits=36
> > > >
> > > > The default on old machine types is to use the existing 40 bits value.
> > > >
> > > > c) Lets you tell qemu to use the same setting as the host, i.e.
> > > > -cpu SandyBridge,phys-bits=0
> > > >
> > > > This is the default on new machine types.
> > > >
> > > > Note that mixed size hosts are still not necessarily safe; a guest
> > > > started on a host with a large physical address size might start using
> > > > those bits and get upset when it's moved to a small host.
> > > > However that was already potentially broken in existing qemu that
> > > > used a magic value of 40.
> > > >
> > > > There's potential to add some extra guards against people
> > > > doing silly stuff; e.g. stop people running VMs using 1TB of
> > > > address space on a tiny host.
> > > >
> > > > Dave
> > >
> > > This is all in target-i386 so if the maintainers want it this way, they
> > > can merge this, and I do not have strong objections, but I wanted to
> > > document an alternative that is IMHO somewhat nicer. Feel free to
> > > ignore. See below.
> > >
> > > How can guest use more memory than what host supports?
> > > I think there are two ways:
> > >
> > > 1. more memory than host supports is supplied
> > > This is a configuration error. We can simply detect this
> > > and fail init, or print a warning, no need for new flags.
> >
> > Yes we should do that; however there's a case that's potentially
> > currently working for people but actually kind of illegal.
> > That case is specifying a small amount of actual memory
> > but a large maxmem - i.e.:
> >
> > -m 2G,slots=16,maxmem=2T
> >
> > On a host with a 39bit physaddress limit do you error
> > on that or not? I think oVirt is currently doing something
> > similar to that, but I'm trying to get confirmation.
>
> That would only be a problem since pci is allocated above
> maxmem so 64 bit pci addresses aren't accessible.
> With my proposal we can actually force firmware to avoid
> using 64 bit memory for that config.
> Will work better than today.
>
>
> > > 2. pci addresses out of host range assigned by guest
> > > Again normally at least seabios will not do this,
> > > maybe OVMF will?
> > > we certainly can add an interface telling firmware
> > > what the limit is.
> > >
> > > Thus an alternative is:
> > > - add interface to tell QEMU how much 64 bit memory can pci use.
> > > - teach firmware to limit itself to that
> > > - set guest bits to 48 unconditionally
> > >
> > >
> > > the disadvantage of this approach is that firmware needs to be changed
> >
> > I guess it also needs the CRS to tell the guest OS not
> > to remap PCI stuff into that space?
>
> CRS is a list of legal addresses, not list of illegal ones.
> So just don't include what's illegal there.
>
> > I thought also from the previous
> > discussions that the guest would get a different exception if it
> > actually tried to use any of the bits below 48 it didn't have.
>
> Basically if you try to map pci at an address outside CRS
> you can get any kind of crash since there could be on-board
> hardware handling these addresses.
> So I do not think we care about that.
The issue about guest bits is not purely about PCI addresses though;
I thought it was also to do with visible behaviour/exceptions in
page tables.
> > > the advantage is that we get seemless migration between different
> > > hosts as long as they both can support the configuration,
> > > without any management effort.
> >
> > The reality (Linux guest) is that this already works as long as you don't
> > map anything into the high address space, and the firmware wont do
> > that unless it's pushed to by an excessive maxmem or huge
> > 64bit PCI bars.
> >
> > Dave
>
> Right. So the disadvantage isn't big at all, and I think advantages
> outweight it.
Except that no one will ever get around to writing the firmware changes
for both sets of firmware; so we never move forward?
Dave
>
> > >
> > > >
> > > > v2
> > > > Default on new machine types is to read from the host
> > > > Use the MAKE_64BIT_MASK macro
> > > > Validate phys_bits in the realise method
> > > > Move reading the host physical bits to the realise method
> > > > Set phys-bits even for 32bit guests
> > > > Add warning when your phys-bits doesn't match your host in the none
> > > > default case
> > > >
> > > > Dr. David Alan Gilbert (6):
> > > > x86: Allow physical address bits to be set
> > > > x86: Mask mtrr mask based on CPU physical address limits
> > > > x86: fill high bits of mtrr mask
> > > > x86: Set physical address bits based on host
> > > > x86: fix up 32 bit phys_bits case
> > > > x86: Add sanity checks on phys_bits
> > > >
> > > > include/hw/i386/pc.h | 10 ++++++++
> > > > target-i386/cpu.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++------
> > > > target-i386/cpu.h | 6 +++++
> > > > target-i386/kvm.c | 36 +++++++++++++++++++++++---
> > > > 4 files changed, 112 insertions(+), 11 deletions(-)
> > > >
> > > > --
> > > > 2.7.4
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 6/6] x86: Add sanity checks on phys_bits
2016-07-04 20:46 ` Eduardo Habkost
@ 2016-07-05 10:40 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2016-07-05 10:40 UTC (permalink / raw)
To: Eduardo Habkost; +Cc: qemu-devel, pbonzini, marcel, mst, kraxel
* Eduardo Habkost (ehabkost@redhat.com) wrote:
> On Mon, Jul 04, 2016 at 08:16:09PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Add some sanity checks on the phys-bits setting now that
> > the user can set it.
> > a) That it's in a sane range (52..32)
> > b) Warn if it mismatches the host and isn't the old default.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> > target-i386/cpu.c | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > index e15abea..5402002 100644
> > --- a/target-i386/cpu.c
> > +++ b/target-i386/cpu.c
> > @@ -2985,6 +2985,19 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> > /* The user asked for us to use the host physical bits */
> > cpu->phys_bits = host_phys_bits;
> >
> > + } else if (cpu->phys_bits > 52 || cpu->phys_bits < 32) {
> > + error_setg(errp, "phys_bits should be between 32 and 52 or 0 to"
> > + " use host size (but is %u)", cpu->phys_bits);
> > + return;
> > + }
>
> This check belongs to patch 1/6, doesn't it?
Yes, I can move it to there.
> Here we have the same magic number (52), and I don't know where
> it came from. Maybe it should become a (documented) macro?
>
> Also, won't this make the "phys_bits < 52" check added by patch
> 3/6 unnecessary?
Possibly; although it did feel safer to put that in where we were generating
the bitmask.
> > + /* Print a warning if the user set it to a value that's not the
> > + * host value; ignore the magic value 40 because it may well just
> > + * be the old machine type.
> > + */
>
> With this, we won't print a warning if "phys-bits=40" is set
> explicitly. If we want to disable the warning only for the old
> machine-types, we can add a boolean flag that disables it.
Yes, I can do that.
> > + if (cpu->phys_bits != host_phys_bits && cpu->phys_bits != 40) {
> > + fprintf(stderr, "Warning: Host physical bits (%u)"
> > + " does not match phys_bits (%u)\n",
> > + host_phys_bits, cpu->phys_bits);
>
> Shouldn't we use error_report() for this?
>
> Also, this prints a warning for each VCPU. This is not the first
> time we want to print a warning only once (see
> x86_cpu_apic_id_from_index() in hw/i386/pc.c and ht_warned in
> target-i386/cpu.c). It looks like QEMU needs a warn_once()
> helper.
OK, will do.
Dave
> > }
> > } else {
> > /* For 32 bit systems don't use the user set value, but keep
> > --
> > 2.7.4
> >
>
> --
> Eduardo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches
2016-07-05 10:13 ` Dr. David Alan Gilbert
@ 2016-07-05 10:41 ` Michael S. Tsirkin
0 siblings, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2016-07-05 10:41 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: qemu-devel, pbonzini, ehabkost, marcel, kraxel
On Tue, Jul 05, 2016 at 11:13:26AM +0100, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Tue, Jul 05, 2016 at 10:33:25AM +0100, Dr. David Alan Gilbert wrote:
> > > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > > On Mon, Jul 04, 2016 at 08:16:03PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > >
> > > > > QEMU sets the guests physical address bits to 40; this is wrong
> > > > > on most hardware, and can be detected by the guest.
> > > > > It also stops you using really huge multi-TB VMs.
> > > > >
> > > > > Red Hat has had a patch, that Andrea wrote, downstream for a couple
> > > > > of years that reads the hosts value and uses that in the guest. That's
> > > > > correct as far as the guest sees it, and lets you create huge VMs.
> > > > >
> > > > > The downside, is that if you've got a mix of hosts, say an i7 and a Xeon,
> > > > > life gets complicated in migration; prior to 2.6 it all apparently
> > > > > worked (although a guest that looked might spot the change).
> > > > > In 2.6 Paolo started checking MSR writes and they failed when the
> > > > > incoming MTRR mask didn't fit.
> > > > >
> > > > > This series:
> > > > > a) Fixes up mtrr masks so that if you're migrating between hosts
> > > > > of different physical address size it tries to do something sensible.
> > > > >
> > > > > b) Lets you specify the guest physical address size via a CPU property, i.e.
> > > > > -cpu SandyBridge,phys-bits=36
> > > > >
> > > > > The default on old machine types is to use the existing 40 bits value.
> > > > >
> > > > > c) Lets you tell qemu to use the same setting as the host, i.e.
> > > > > -cpu SandyBridge,phys-bits=0
> > > > >
> > > > > This is the default on new machine types.
> > > > >
> > > > > Note that mixed size hosts are still not necessarily safe; a guest
> > > > > started on a host with a large physical address size might start using
> > > > > those bits and get upset when it's moved to a small host.
> > > > > However that was already potentially broken in existing qemu that
> > > > > used a magic value of 40.
> > > > >
> > > > > There's potential to add some extra guards against people
> > > > > doing silly stuff; e.g. stop people running VMs using 1TB of
> > > > > address space on a tiny host.
> > > > >
> > > > > Dave
> > > >
> > > > This is all in target-i386 so if the maintainers want it this way, they
> > > > can merge this, and I do not have strong objections, but I wanted to
> > > > document an alternative that is IMHO somewhat nicer. Feel free to
> > > > ignore. See below.
> > > >
> > > > How can guest use more memory than what host supports?
> > > > I think there are two ways:
> > > >
> > > > 1. more memory than host supports is supplied
> > > > This is a configuration error. We can simply detect this
> > > > and fail init, or print a warning, no need for new flags.
> > >
> > > Yes we should do that; however there's a case that's potentially
> > > currently working for people but actually kind of illegal.
> > > That case is specifying a small amount of actual memory
> > > but a large maxmem - i.e.:
> > >
> > > -m 2G,slots=16,maxmem=2T
> > >
> > > On a host with a 39bit physaddress limit do you error
> > > on that or not? I think oVirt is currently doing something
> > > similar to that, but I'm trying to get confirmation.
> >
> > That would only be a problem since pci is allocated above
> > maxmem so 64 bit pci addresses aren't accessible.
> > With my proposal we can actually force firmware to avoid
> > using 64 bit memory for that config.
> > Will work better than today.
> >
> >
> > > > 2. pci addresses out of host range assigned by guest
> > > > Again normally at least seabios will not do this,
> > > > maybe OVMF will?
> > > > we certainly can add an interface telling firmware
> > > > what the limit is.
> > > >
> > > > Thus an alternative is:
> > > > - add interface to tell QEMU how much 64 bit memory can pci use.
> > > > - teach firmware to limit itself to that
> > > > - set guest bits to 48 unconditionally
> > > >
> > > >
> > > > the disadvantage of this approach is that firmware needs to be changed
> > >
> > > I guess it also needs the CRS to tell the guest OS not
> > > to remap PCI stuff into that space?
> >
> > CRS is a list of legal addresses, not list of illegal ones.
> > So just don't include what's illegal there.
> >
> > > I thought also from the previous
> > > discussions that the guest would get a different exception if it
> > > actually tried to use any of the bits below 48 it didn't have.
> >
> > Basically if you try to map pci at an address outside CRS
> > you can get any kind of crash since there could be on-board
> > hardware handling these addresses.
> > So I do not think we care about that.
>
> The issue about guest bits is not purely about PCI addresses though;
> I thought it was also to do with visible behaviour/exceptions in
> page tables.
Only if you make guest phy bits < host phy bits.
If guest phy bits >= host phy bits, then there's never
a configuration that from guest POV should trigger
an exception but does not.
This is another advantage of my proposal.
> > > > the advantage is that we get seemless migration between different
> > > > hosts as long as they both can support the configuration,
> > > > without any management effort.
> > >
> > > The reality (Linux guest) is that this already works as long as you don't
> > > map anything into the high address space, and the firmware wont do
> > > that unless it's pushed to by an excessive maxmem or huge
> > > 64bit PCI bars.
> > >
> > > Dave
> >
> > Right. So the disadvantage isn't big at all, and I think advantages
> > outweight it.
>
> Except that no one will ever get around to writing the firmware changes
> for both sets of firmware; so we never move forward?
>
> Dave
seabios is already ok, so only ovmf needs to be patched.
I agree need to change firmware is a disadvantage.
You guys decide, but I thought I'd put that on the table.
> >
> > > >
> > > > >
> > > > > v2
> > > > > Default on new machine types is to read from the host
> > > > > Use the MAKE_64BIT_MASK macro
> > > > > Validate phys_bits in the realise method
> > > > > Move reading the host physical bits to the realise method
> > > > > Set phys-bits even for 32bit guests
> > > > > Add warning when your phys-bits doesn't match your host in the none
> > > > > default case
> > > > >
> > > > > Dr. David Alan Gilbert (6):
> > > > > x86: Allow physical address bits to be set
> > > > > x86: Mask mtrr mask based on CPU physical address limits
> > > > > x86: fill high bits of mtrr mask
> > > > > x86: Set physical address bits based on host
> > > > > x86: fix up 32 bit phys_bits case
> > > > > x86: Add sanity checks on phys_bits
> > > > >
> > > > > include/hw/i386/pc.h | 10 ++++++++
> > > > > target-i386/cpu.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++------
> > > > > target-i386/cpu.h | 6 +++++
> > > > > target-i386/kvm.c | 36 +++++++++++++++++++++++---
> > > > > 4 files changed, 112 insertions(+), 11 deletions(-)
> > > > >
> > > > > --
> > > > > 2.7.4
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches
2016-07-05 10:06 ` Michael S. Tsirkin
2016-07-05 10:13 ` Dr. David Alan Gilbert
@ 2016-07-05 10:59 ` Paolo Bonzini
2016-07-05 11:09 ` Michael S. Tsirkin
1 sibling, 1 reply; 39+ messages in thread
From: Paolo Bonzini @ 2016-07-05 10:59 UTC (permalink / raw)
To: Michael S. Tsirkin, Dr. David Alan Gilbert
Cc: marcel, kraxel, qemu-devel, ehabkost
On 05/07/2016 12:06, Michael S. Tsirkin wrote:
> > -m 2G,slots=16,maxmem=2T
> >
> > On a host with a 39bit physaddress limit do you error
> > on that or not? I think oVirt is currently doing something
> > similar to that, but I'm trying to get confirmation.
>
> That would only be a problem since pci is allocated above
> maxmem so 64 bit pci addresses aren't accessible.
> With my proposal we can actually force firmware to avoid
> using 64 bit memory for that config.
> Will work better than today.
So you would remove completely the 64-bit _CRS in this case?
How do you handle migration in the above scenario from say 46bit host to
39bit host, where the firmware has mapped (while running on the source)
a 64-bit BAR above the destination's maximum physical address?
Thanks,
Paolo
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches
2016-07-05 10:59 ` Paolo Bonzini
@ 2016-07-05 11:09 ` Michael S. Tsirkin
2016-07-05 11:46 ` Paolo Bonzini
2016-07-05 12:41 ` Dr. David Alan Gilbert
0 siblings, 2 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2016-07-05 11:09 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Dr. David Alan Gilbert, marcel, kraxel, qemu-devel, ehabkost
On Tue, Jul 05, 2016 at 12:59:42PM +0200, Paolo Bonzini wrote:
>
>
> On 05/07/2016 12:06, Michael S. Tsirkin wrote:
> > > -m 2G,slots=16,maxmem=2T
> > >
> > > On a host with a 39bit physaddress limit do you error
> > > on that or not? I think oVirt is currently doing something
> > > similar to that, but I'm trying to get confirmation.
> >
> > That would only be a problem since pci is allocated above
> > maxmem so 64 bit pci addresses aren't accessible.
> > With my proposal we can actually force firmware to avoid
> > using 64 bit memory for that config.
> > Will work better than today.
>
> So you would remove completely the 64-bit _CRS in this case?
Yes.
> How do you handle migration in the above scenario from say 46bit host to
> 39bit host, where the firmware has mapped (while running on the source)
> a 64-bit BAR above the destination's maximum physical address?
>
> Thanks,
>
> Paolo
Again management would specify how much 64 bit pci space firmware should use.
If more is specified than host can support we can error out.
--
MST
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 5/6] x86: fix up 32 bit phys_bits case
2016-07-05 9:42 ` Daniel P. Berrange
@ 2016-07-05 11:29 ` Dr. David Alan Gilbert
2016-07-05 11:55 ` Daniel P. Berrange
0 siblings, 1 reply; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2016-07-05 11:29 UTC (permalink / raw)
To: Daniel P. Berrange; +Cc: qemu-devel, pbonzini, ehabkost, marcel, mst, kraxel
* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Mon, Jul 04, 2016 at 08:16:08PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > On 32 bit systems fix up phys_bits to be consistent with what
> > we tell the guest; don't ever bother with using the phys_bits
> > property.
>
> > @@ -2990,6 +2986,15 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> > cpu->phys_bits = host_phys_bits;
> >
> > }
> > + } else {
> > + /* For 32 bit systems don't use the user set value, but keep
> > + * phys_bits consistent with what we tell the guest.
> > + */
> > + if (env->features[FEAT_1_EDX] & CPUID_PSE36) {
> > + cpu->phys_bits = 36;
> > + } else {
> > + cpu->phys_bits = 32;
> > + }
>
> I kind of feel like we should report an error and exit if the
> user/app has provided a phys_bits property value, rather than
> silently ignoring their provided value, on the basis that this
> is a user/app configuration error.
Do we have an easy way to tell that the user has set the parameter
as opposed to it being the default?
Dave
> Regards,
> Daniel
> --
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches
2016-07-05 11:09 ` Michael S. Tsirkin
@ 2016-07-05 11:46 ` Paolo Bonzini
2016-07-05 12:39 ` Michael S. Tsirkin
2016-07-05 12:41 ` Dr. David Alan Gilbert
1 sibling, 1 reply; 39+ messages in thread
From: Paolo Bonzini @ 2016-07-05 11:46 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: marcel, qemu-devel, Dr. David Alan Gilbert, ehabkost, kraxel
On 05/07/2016 13:09, Michael S. Tsirkin wrote:
> > How do you handle migration in the above scenario from say 46bit host to
> > 39bit host, where the firmware has mapped (while running on the source)
> > a 64-bit BAR above the destination's maximum physical address?
>
> Again management would specify how much 64 bit pci space firmware should use.
> If more is specified than host can support we can error out.
Ok, so the destination would fail to start.
It sounds good, and it may provide a good reason not to enable
autodetection of phys-addr-bits for new machine types. We still want
David's patches for -cpu host, and for improved backwards compatibility
now that KVM_SET_MSR error detection is strict.
Thanks,
Paolo
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 5/6] x86: fix up 32 bit phys_bits case
2016-07-05 11:29 ` Dr. David Alan Gilbert
@ 2016-07-05 11:55 ` Daniel P. Berrange
2016-07-05 19:05 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 39+ messages in thread
From: Daniel P. Berrange @ 2016-07-05 11:55 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: qemu-devel, pbonzini, ehabkost, marcel, mst, kraxel
On Tue, Jul 05, 2016 at 12:29:30PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berrange@redhat.com) wrote:
> > On Mon, Jul 04, 2016 at 08:16:08PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >
> > > On 32 bit systems fix up phys_bits to be consistent with what
> > > we tell the guest; don't ever bother with using the phys_bits
> > > property.
> >
> > > @@ -2990,6 +2986,15 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> > > cpu->phys_bits = host_phys_bits;
> > >
> > > }
> > > + } else {
> > > + /* For 32 bit systems don't use the user set value, but keep
> > > + * phys_bits consistent with what we tell the guest.
> > > + */
> > > + if (env->features[FEAT_1_EDX] & CPUID_PSE36) {
> > > + cpu->phys_bits = 36;
> > > + } else {
> > > + cpu->phys_bits = 32;
> > > + }
> >
> > I kind of feel like we should report an error and exit if the
> > user/app has provided a phys_bits property value, rather than
> > silently ignoring their provided value, on the basis that this
> > is a user/app configuration error.
>
> Do we have an easy way to tell that the user has set the parameter
> as opposed to it being the default?
Not sure if there's an official way, but you could perhaps default
phys_bits to -1, and treat -1 as being equivalent to 0 if set by
the user.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches
2016-07-05 9:49 ` Dr. David Alan Gilbert
@ 2016-07-05 12:38 ` Eduardo Habkost
0 siblings, 0 replies; 39+ messages in thread
From: Eduardo Habkost @ 2016-07-05 12:38 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Daniel P. Berrange, qemu-devel, pbonzini, marcel, mst, kraxel
On Tue, Jul 05, 2016 at 10:49:48AM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berrange@redhat.com) wrote:
> > On Mon, Jul 04, 2016 at 08:16:03PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >
> > > QEMU sets the guests physical address bits to 40; this is wrong
> > > on most hardware, and can be detected by the guest.
> > > It also stops you using really huge multi-TB VMs.
> > >
> > > Red Hat has had a patch, that Andrea wrote, downstream for a couple
> > > of years that reads the hosts value and uses that in the guest. That's
> > > correct as far as the guest sees it, and lets you create huge VMs.
> > >
> > > The downside, is that if you've got a mix of hosts, say an i7 and a Xeon,
> > > life gets complicated in migration; prior to 2.6 it all apparently
> > > worked (although a guest that looked might spot the change).
> > > In 2.6 Paolo started checking MSR writes and they failed when the
> > > incoming MTRR mask didn't fit.
> > >
> > > This series:
> > > a) Fixes up mtrr masks so that if you're migrating between hosts
> > > of different physical address size it tries to do something sensible.
> > >
> > > b) Lets you specify the guest physical address size via a CPU property, i.e.
> > > -cpu SandyBridge,phys-bits=36
> > >
> > > The default on old machine types is to use the existing 40 bits value.
> > >
> > > c) Lets you tell qemu to use the same setting as the host, i.e.
> > > -cpu SandyBridge,phys-bits=0
> > >
> > > This is the default on new machine types.
> >
> > As a general rule we've tried to say that if you pick an explicit CPU
> > model, we're migration safe. By having the phys-bits default value
> > always reflect the host CPU value, it feels like we've made the explicit
> > CPU model choice less safe, just like -cpu host is.
> >
> > IOW, if choosing a named CPU model, it feels like we should have a
> > corresponding fixed phys-bit value for that CPU model, even if it
> > has to be quiet conservative (eg default to bits=36). A phys-bits=0
> > value should only be used with -cpu host.
>
> phys-bits doesn't follow a cpu model in real hardware
> e.g. a SandyBridge Xeon and a SandyBridge i7 are different.
>
> So unless we suddenly created at least 2x as many cpu models we can't
> do that.
I believe Daniel's point is not whether the CPU model matches
real hardware (we don't really care), but whether its ABI is
stable and migration-safe.
So, replying to Daniel: the issue you point out is real, but the
problem here is that both options are bad and we need to choose
the least harmful one.
We can't emulate properly a phys-bits=36 VM in a phys-bits=40
host, and vice versa. So keeping phys-bits the same on migration
is also a bad solution. See the v1 discussion thread:
Subject: Default for phys-addr-bits? (was Re: [PATCH 4/5] x86:
Allow physical address bits to be set)
http://article.gmane.org/gmane.comp.emulators.qemu/421132
--
Eduardo
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches
2016-07-05 11:46 ` Paolo Bonzini
@ 2016-07-05 12:39 ` Michael S. Tsirkin
0 siblings, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2016-07-05 12:39 UTC (permalink / raw)
To: Paolo Bonzini
Cc: marcel, qemu-devel, Dr. David Alan Gilbert, ehabkost, kraxel
On Tue, Jul 05, 2016 at 01:46:58PM +0200, Paolo Bonzini wrote:
>
>
> On 05/07/2016 13:09, Michael S. Tsirkin wrote:
> > > How do you handle migration in the above scenario from say 46bit host to
> > > 39bit host, where the firmware has mapped (while running on the source)
> > > a 64-bit BAR above the destination's maximum physical address?
> >
> > Again management would specify how much 64 bit pci space firmware should use.
> > If more is specified than host can support we can error out.
>
> Ok, so the destination would fail to start.
>
> It sounds good, and it may provide a good reason not to enable
> autodetection of phys-addr-bits for new machine types. We still want
> David's patches for -cpu host,
Right, but I think current patches aren't limited to -cpu host -
I guess they will need some changes then?
> and for improved backwards compatibility
> now that KVM_SET_MSR error detection is strict.
Sounds good.
> Thanks,
>
> Paolo
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches
2016-07-05 11:09 ` Michael S. Tsirkin
2016-07-05 11:46 ` Paolo Bonzini
@ 2016-07-05 12:41 ` Dr. David Alan Gilbert
2016-07-05 13:38 ` Michael S. Tsirkin
1 sibling, 1 reply; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2016-07-05 12:41 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Paolo Bonzini, marcel, kraxel, qemu-devel, ehabkost
* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Tue, Jul 05, 2016 at 12:59:42PM +0200, Paolo Bonzini wrote:
> >
> >
> > On 05/07/2016 12:06, Michael S. Tsirkin wrote:
> > > > -m 2G,slots=16,maxmem=2T
> > > >
> > > > On a host with a 39bit physaddress limit do you error
> > > > on that or not? I think oVirt is currently doing something
> > > > similar to that, but I'm trying to get confirmation.
> > >
> > > That would only be a problem since pci is allocated above
> > > maxmem so 64 bit pci addresses aren't accessible.
> > > With my proposal we can actually force firmware to avoid
> > > using 64 bit memory for that config.
> > > Will work better than today.
> >
> > So you would remove completely the 64-bit _CRS in this case?
>
> Yes.
>
> > How do you handle migration in the above scenario from say 46bit host to
> > 39bit host, where the firmware has mapped (while running on the source)
> > a 64-bit BAR above the destination's maximum physical address?
> >
> > Thanks,
> >
> > Paolo
>
> Again management would specify how much 64 bit pci space firmware should use.
> If more is specified than host can support we can error out.
What stops the guest OS mapping PCI stuff high up - however much you change
the firmware?
Dave
>
> --
> MST
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches
2016-07-05 12:41 ` Dr. David Alan Gilbert
@ 2016-07-05 13:38 ` Michael S. Tsirkin
0 siblings, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2016-07-05 13:38 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Paolo Bonzini, marcel, kraxel, qemu-devel, ehabkost
On Tue, Jul 05, 2016 at 01:41:50PM +0100, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Tue, Jul 05, 2016 at 12:59:42PM +0200, Paolo Bonzini wrote:
> > >
> > >
> > > On 05/07/2016 12:06, Michael S. Tsirkin wrote:
> > > > > -m 2G,slots=16,maxmem=2T
> > > > >
> > > > > On a host with a 39bit physaddress limit do you error
> > > > > on that or not? I think oVirt is currently doing something
> > > > > similar to that, but I'm trying to get confirmation.
> > > >
> > > > That would only be a problem since pci is allocated above
> > > > maxmem so 64 bit pci addresses aren't accessible.
> > > > With my proposal we can actually force firmware to avoid
> > > > using 64 bit memory for that config.
> > > > Will work better than today.
> > >
> > > So you would remove completely the 64-bit _CRS in this case?
> >
> > Yes.
> >
> > > How do you handle migration in the above scenario from say 46bit host to
> > > 39bit host, where the firmware has mapped (while running on the source)
> > > a 64-bit BAR above the destination's maximum physical address?
> > >
> > > Thanks,
> > >
> > > Paolo
> >
> > Again management would specify how much 64 bit pci space firmware should use.
> > If more is specified than host can support we can error out.
>
> What stops the guest OS mapping PCI stuff high up - however much you change
> the firmware?
>
> Dave
It only maps what is described by _CRS.
Mapping addresses outside CRS will crash baremetal too,
with unpredictable results.
> >
> > --
> > MST
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 1/6] x86: Allow physical address bits to be set
2016-07-04 19:33 ` Eduardo Habkost
@ 2016-07-05 13:43 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2016-07-05 13:43 UTC (permalink / raw)
To: Eduardo Habkost; +Cc: qemu-devel, pbonzini, marcel, mst, kraxel
* Eduardo Habkost (ehabkost@redhat.com) wrote:
> On Mon, Jul 04, 2016 at 08:16:04PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Currently QEMU sets the x86 number of physical address bits to the
> > magic number 40. This is only correct on some small AMD systems;
> > Intel systems tend to have 36, 39, 46 bits, and large AMD systems
> > tend to have 48.
> >
> > Having the value different from your actual hardware is detectable
> > by the guest and in principal can cause problems;
> > The current limit of 40 stops TB VMs being created by those lucky
> > enough to have that much.
>
> target-i386/cpu.h has:
>
> #ifdef TARGET_X86_64
> #define TARGET_PHYS_ADDR_SPACE_BITS 52
> /* ??? This is really 48 bits, sign-extended, but the only thing
> accessible to userland with bit 48 set is the VSYSCALL, and that
> is handled via other mechanisms. */
> #define TARGET_VIRT_ADDR_SPACE_BITS 47
> #else
> #define TARGET_PHYS_ADDR_SPACE_BITS 36
> #define TARGET_VIRT_ADDR_SPACE_BITS 32
> #endif
>
> Where did those values come from? What are they used for?
>
> /* XXX: This value should match the one returned by CPUID
> * and in exec.c */
> # if defined(TARGET_X86_64)
> # define PHYS_ADDR_MASK 0xffffffffffLL
> # else
> # define PHYS_ADDR_MASK 0xfffffffffLL
> # endif
>
> PHYS_ADDR_MASK is only used at:
>
> #define PG_HI_RSVD_MASK (PG_ADDRESS_MASK & ~PHYS_ADDR_MASK)
>
> PG_HI_RSVD_MASK is only used at x86_cpu_handle_mmu_fault(), and I
> don't know if that function is TCG-specific or not.
Ah, so that explains what that other comment was about;
to me that looks TCG only, so I should actually be careful in my phys-bits=0
case to go back to using 40 for TCG.
> > This patch lets you set the physical bits by a cpu property but
> > defaults to the same existing magic 40.
> >
> > I've removed the ancient warning about the 42 bit limit in exec.c;
> > I can't find that limit in there and no one else seems to know where
> > it is.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> > target-i386/cpu.c | 8 +++++---
> > target-i386/cpu.h | 3 +++
> > 2 files changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > index 3bd3cfc..ab13ef5 100644
> > --- a/target-i386/cpu.c
> > +++ b/target-i386/cpu.c
> > @@ -2604,9 +2604,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> > /* virtual & phys address size in low 2 bytes. */
> > /* XXX: This value must match the one used in the MMU code. */
> > if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
> > - /* 64 bit processor */
> > -/* XXX: The physical address space is limited to 42 bits in exec.c. */
> > - *eax = 0x00003028; /* 48 bits virtual, 40 bits physical */
> > + /* 64 bit processor, 48 bits virtual, configurable
> > + * physical bits.
> > + */
> > + *eax = 0x00003000 + cpu->phys_bits;
>
> We need to ensure phys-bits is not out of range, either on the
> property setter or on realizefn.
>
> (I suggest realizefn. Custom property setters are evil)
Yep I've moved the sanity check up to this patch as commented in the
later one.
> Now, related to the questions above about the cpu.h macros: what
> would be the appropriate range to check for?
Probably TARGET_PHYS_ADDR_SPACE_BITS which is the magic 52 in the x86-64
case.
Dave
> > } else {
> > if (env->features[FEAT_1_EDX] & CPUID_PSE36) {
> > *eax = 0x00000024; /* 36 bits physical */
> > @@ -3257,6 +3258,7 @@ static Property x86_cpu_properties[] = {
> > DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
> > DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
> > DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
> > + DEFINE_PROP_UINT32("phys-bits", X86CPU, phys_bits, 40),
> > DEFINE_PROP_UINT32("level", X86CPU, env.cpuid_level, 0),
> > DEFINE_PROP_UINT32("xlevel", X86CPU, env.cpuid_xlevel, 0),
> > DEFINE_PROP_UINT32("xlevel2", X86CPU, env.cpuid_xlevel2, 0),
> > diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> > index 474b0b9..221b1a2 100644
> > --- a/target-i386/cpu.h
> > +++ b/target-i386/cpu.h
> > @@ -1181,6 +1181,9 @@ struct X86CPU {
> > /* Compatibility bits for old machine types: */
> > bool enable_cpuid_0xb;
> >
> > + /* Number of physical address bits supported */
> > + uint32_t phys_bits;
> > +
> > /* in order to simplify APIC support, we leave this pointer to the
> > user */
> > struct DeviceState *apic_state;
> > --
> > 2.7.4
> >
>
> --
> Eduardo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [Qemu-devel] [PATCH v2 5/6] x86: fix up 32 bit phys_bits case
2016-07-05 11:55 ` Daniel P. Berrange
@ 2016-07-05 19:05 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 39+ messages in thread
From: Dr. David Alan Gilbert @ 2016-07-05 19:05 UTC (permalink / raw)
To: Daniel P. Berrange; +Cc: qemu-devel, pbonzini, ehabkost, marcel, mst, kraxel
* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Tue, Jul 05, 2016 at 12:29:30PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrange (berrange@redhat.com) wrote:
> > > On Mon, Jul 04, 2016 at 08:16:08PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > >
> > > > On 32 bit systems fix up phys_bits to be consistent with what
> > > > we tell the guest; don't ever bother with using the phys_bits
> > > > property.
> > >
> > > > @@ -2990,6 +2986,15 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> > > > cpu->phys_bits = host_phys_bits;
> > > >
> > > > }
> > > > + } else {
> > > > + /* For 32 bit systems don't use the user set value, but keep
> > > > + * phys_bits consistent with what we tell the guest.
> > > > + */
> > > > + if (env->features[FEAT_1_EDX] & CPUID_PSE36) {
> > > > + cpu->phys_bits = 36;
> > > > + } else {
> > > > + cpu->phys_bits = 32;
> > > > + }
> > >
> > > I kind of feel like we should report an error and exit if the
> > > user/app has provided a phys_bits property value, rather than
> > > silently ignoring their provided value, on the basis that this
> > > is a user/app configuration error.
> >
> > Do we have an easy way to tell that the user has set the parameter
> > as opposed to it being the default?
>
> Not sure if there's an official way, but you could perhaps default
> phys_bits to -1, and treat -1 as being equivalent to 0 if set by
> the user.
Done in v3 (I used 9999 since I'd kept it as an unsigned)
Dave
>
>
> Regards,
> Daniel
> --
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 39+ messages in thread
end of thread, other threads:[~2016-07-05 19:06 UTC | newest]
Thread overview: 39+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-07-04 19:16 [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches Dr. David Alan Gilbert (git)
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 1/6] x86: Allow physical address bits to be set Dr. David Alan Gilbert (git)
2016-07-04 19:33 ` Eduardo Habkost
2016-07-05 13:43 ` Dr. David Alan Gilbert
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 2/6] x86: Mask mtrr mask based on CPU physical address limits Dr. David Alan Gilbert (git)
2016-07-04 20:02 ` Michael S. Tsirkin
2016-07-04 20:05 ` Eduardo Habkost
2016-07-04 22:37 ` Michael S. Tsirkin
2016-07-04 20:03 ` Eduardo Habkost
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 3/6] x86: fill high bits of mtrr mask Dr. David Alan Gilbert (git)
2016-07-04 20:03 ` Michael S. Tsirkin
2016-07-04 20:14 ` Eduardo Habkost
2016-07-04 20:21 ` Eduardo Habkost
2016-07-05 8:39 ` Dr. David Alan Gilbert
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 4/6] x86: Set physical address bits based on host Dr. David Alan Gilbert (git)
2016-07-04 20:27 ` Eduardo Habkost
2016-07-05 8:44 ` Dr. David Alan Gilbert
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 5/6] x86: fix up 32 bit phys_bits case Dr. David Alan Gilbert (git)
2016-07-05 9:42 ` Daniel P. Berrange
2016-07-05 11:29 ` Dr. David Alan Gilbert
2016-07-05 11:55 ` Daniel P. Berrange
2016-07-05 19:05 ` Dr. David Alan Gilbert
2016-07-04 19:16 ` [Qemu-devel] [PATCH v2 6/6] x86: Add sanity checks on phys_bits Dr. David Alan Gilbert (git)
2016-07-04 20:46 ` Eduardo Habkost
2016-07-05 10:40 ` Dr. David Alan Gilbert
2016-07-04 20:23 ` [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches Michael S. Tsirkin
2016-07-05 9:33 ` Dr. David Alan Gilbert
2016-07-05 10:06 ` Michael S. Tsirkin
2016-07-05 10:13 ` Dr. David Alan Gilbert
2016-07-05 10:41 ` Michael S. Tsirkin
2016-07-05 10:59 ` Paolo Bonzini
2016-07-05 11:09 ` Michael S. Tsirkin
2016-07-05 11:46 ` Paolo Bonzini
2016-07-05 12:39 ` Michael S. Tsirkin
2016-07-05 12:41 ` Dr. David Alan Gilbert
2016-07-05 13:38 ` Michael S. Tsirkin
2016-07-05 9:46 ` Daniel P. Berrange
2016-07-05 9:49 ` Dr. David Alan Gilbert
2016-07-05 12:38 ` Eduardo Habkost
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).