* [PATCH 1/6] xen: set cpu_callout_mask to make mtrr work
2009-05-12 23:27 [GIT PULL] xen /proc/mtrr implementation Jeremy Fitzhardinge
@ 2009-05-12 23:27 ` Jeremy Fitzhardinge
2009-05-12 23:27 ` [PATCH 2/6] xen mtrr: Use specific cpu_has_foo macros instead of generic cpu_has() Jeremy Fitzhardinge
` (5 subsequent siblings)
6 siblings, 0 replies; 48+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:27 UTC (permalink / raw)
To: Ingo Molnar
Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
Jeremy Fitzhardinge
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
The mtrr code uses num_booting_cpus() to manage its corrdinated mtrr update,
which in turn depends on cpu_callout_mask.
[ Impact: bugfix; make /proc/mtrr writes work in Xen ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
arch/x86/xen/smp.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 429834e..4706af7 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -297,6 +297,8 @@ static int __cpuinit xen_cpu_up(unsigned int cpu)
xen_setup_timer(cpu);
xen_init_lock_cpu(cpu);
+ cpumask_set_cpu(cpu, cpu_callout_mask);
+
per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
/* make sure interrupts start blocked */
--
1.6.0.6
^ permalink raw reply related [flat|nested] 48+ messages in thread* [PATCH 2/6] xen mtrr: Use specific cpu_has_foo macros instead of generic cpu_has()
2009-05-12 23:27 [GIT PULL] xen /proc/mtrr implementation Jeremy Fitzhardinge
2009-05-12 23:27 ` [PATCH 1/6] xen: set cpu_callout_mask to make mtrr work Jeremy Fitzhardinge
@ 2009-05-12 23:27 ` Jeremy Fitzhardinge
2009-05-12 23:27 ` [PATCH 3/6] xen mtrr: Use generic_validate_add_page() Jeremy Fitzhardinge
` (4 subsequent siblings)
6 siblings, 0 replies; 48+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:27 UTC (permalink / raw)
To: Ingo Molnar
Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
Mark McLoughlin, Jeremy Fitzhardinge
From: Mark McLoughlin <markmc@redhat.com>
Use specific cpu_has_foo macros instead of generic cpu_has()
[ Impact: cleanup ]
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
arch/x86/kernel/cpu/mtrr/xen.c | 10 ++++------
1 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/xen.c b/arch/x86/kernel/cpu/mtrr/xen.c
index 6760907..8ac2ccd 100644
--- a/arch/x86/kernel/cpu/mtrr/xen.c
+++ b/arch/x86/kernel/cpu/mtrr/xen.c
@@ -41,15 +41,13 @@ static int __init xen_num_var_ranges(void)
void __init xen_init_mtrr(void)
{
- struct cpuinfo_x86 *c = &boot_cpu_data;
-
if (!xen_initial_domain())
return;
- if ((!cpu_has(c, X86_FEATURE_MTRR)) &&
- (!cpu_has(c, X86_FEATURE_K6_MTRR)) &&
- (!cpu_has(c, X86_FEATURE_CYRIX_ARR)) &&
- (!cpu_has(c, X86_FEATURE_CENTAUR_MCR)))
+ if (!cpu_has_mtrr &&
+ !cpu_has_k6_mtrr &&
+ !cpu_has_cyrix_arr &&
+ !cpu_has_centaur_mcr)
return;
mtrr_if = &xen_mtrr_ops;
--
1.6.0.6
^ permalink raw reply related [flat|nested] 48+ messages in thread* [PATCH 3/6] xen mtrr: Use generic_validate_add_page()
2009-05-12 23:27 [GIT PULL] xen /proc/mtrr implementation Jeremy Fitzhardinge
2009-05-12 23:27 ` [PATCH 1/6] xen: set cpu_callout_mask to make mtrr work Jeremy Fitzhardinge
2009-05-12 23:27 ` [PATCH 2/6] xen mtrr: Use specific cpu_has_foo macros instead of generic cpu_has() Jeremy Fitzhardinge
@ 2009-05-12 23:27 ` Jeremy Fitzhardinge
2009-05-12 23:27 ` [PATCH 4/6] xen mtrr: Implement xen_get_free_region() Jeremy Fitzhardinge
` (3 subsequent siblings)
6 siblings, 0 replies; 48+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:27 UTC (permalink / raw)
To: Ingo Molnar
Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
Mark McLoughlin, Jeremy Fitzhardinge
From: Mark McLoughlin <markmc@redhat.com>
The hypervisor already performs the same validation, but
better to do it early before getting to the range combining
code.
[ Impact: cleanup, avoid Xen console noise ]
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
arch/x86/kernel/cpu/mtrr/xen.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/xen.c b/arch/x86/kernel/cpu/mtrr/xen.c
index 8ac2ccd..3077bb3 100644
--- a/arch/x86/kernel/cpu/mtrr/xen.c
+++ b/arch/x86/kernel/cpu/mtrr/xen.c
@@ -20,6 +20,7 @@ static int __init xen_num_var_ranges(void);
static struct mtrr_ops xen_mtrr_ops = {
.vendor = X86_VENDOR_UNKNOWN,
.get_free_region = generic_get_free_region,
+ .validate_add_page = generic_validate_add_page,
.have_wrcomb = positive_have_wrcomb,
.use_intel_if = 0,
.num_var_ranges = xen_num_var_ranges,
--
1.6.0.6
^ permalink raw reply related [flat|nested] 48+ messages in thread* [PATCH 4/6] xen mtrr: Implement xen_get_free_region()
2009-05-12 23:27 [GIT PULL] xen /proc/mtrr implementation Jeremy Fitzhardinge
` (2 preceding siblings ...)
2009-05-12 23:27 ` [PATCH 3/6] xen mtrr: Use generic_validate_add_page() Jeremy Fitzhardinge
@ 2009-05-12 23:27 ` Jeremy Fitzhardinge
2009-05-12 23:27 ` [PATCH 5/6] xen mtrr: Add xen_{get,set}_mtrr() implementations Jeremy Fitzhardinge
` (2 subsequent siblings)
6 siblings, 0 replies; 48+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:27 UTC (permalink / raw)
To: Ingo Molnar
Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
Mark McLoughlin, Jeremy Fitzhardinge
From: Mark McLoughlin <markmc@redhat.com>
When an already set MTRR is being changed, we need to
first unset, since Xen also maintains a usage count.
[ Impact: complete xen mtrr implementation ]
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
arch/x86/kernel/cpu/mtrr/mtrr.h | 2 ++
arch/x86/kernel/cpu/mtrr/xen.c | 28 +++++++++++++++++++++++++++-
2 files changed, 29 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index 3502f6c..52bd87d 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -5,6 +5,8 @@
#include <linux/types.h>
#include <linux/stddef.h>
+#include <asm/mtrr.h>
+
#define MTRRcap_MSR 0x0fe
#define MTRRdefType_MSR 0x2ff
diff --git a/arch/x86/kernel/cpu/mtrr/xen.c b/arch/x86/kernel/cpu/mtrr/xen.c
index 3077bb3..f99fa15 100644
--- a/arch/x86/kernel/cpu/mtrr/xen.c
+++ b/arch/x86/kernel/cpu/mtrr/xen.c
@@ -15,11 +15,37 @@
static int __init xen_num_var_ranges(void);
+static int xen_get_free_region(unsigned long base, unsigned long size,
+ int replace_reg)
+{
+ struct xen_platform_op op;
+ int error;
+
+ if (replace_reg < 0)
+ return generic_get_free_region(base, size, -1);
+
+ /* If we're replacing the contents of a register,
+ * we need to first unset it since Xen also keeps
+ * a usage count.
+ */
+ op.cmd = XENPF_del_memtype;
+ op.u.del_memtype.handle = 0;
+ op.u.del_memtype.reg = replace_reg;
+
+ error = HYPERVISOR_dom0_op(&op);
+ if (error) {
+ BUG_ON(error > 0);
+ return error;
+ }
+
+ return replace_reg;
+}
+
/* DOM0 TODO: Need to fill in the remaining mtrr methods to have full
* working userland mtrr support. */
static struct mtrr_ops xen_mtrr_ops = {
.vendor = X86_VENDOR_UNKNOWN,
- .get_free_region = generic_get_free_region,
+ .get_free_region = xen_get_free_region,
.validate_add_page = generic_validate_add_page,
.have_wrcomb = positive_have_wrcomb,
.use_intel_if = 0,
--
1.6.0.6
^ permalink raw reply related [flat|nested] 48+ messages in thread* [PATCH 5/6] xen mtrr: Add xen_{get,set}_mtrr() implementations
2009-05-12 23:27 [GIT PULL] xen /proc/mtrr implementation Jeremy Fitzhardinge
` (3 preceding siblings ...)
2009-05-12 23:27 ` [PATCH 4/6] xen mtrr: Implement xen_get_free_region() Jeremy Fitzhardinge
@ 2009-05-12 23:27 ` Jeremy Fitzhardinge
2009-05-12 23:27 ` [PATCH 6/6] xen mtrr: Kill some unnecessary includes Jeremy Fitzhardinge
2009-05-13 13:30 ` [GIT PULL] xen /proc/mtrr implementation Ingo Molnar
6 siblings, 0 replies; 48+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:27 UTC (permalink / raw)
To: Ingo Molnar
Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
Mark McLoughlin, Jeremy Fitzhardinge
From: Mark McLoughlin <markmc@redhat.com>
Straightforward apart from the hack to turn mtrr_ops->set()
into a no-op on all but one CPU.
[ Impact: complete Xen mtrr implementation ]
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
arch/x86/kernel/cpu/mtrr/xen.c | 50 ++++++++++++++++++++++++++++++++++++++-
1 files changed, 48 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/xen.c b/arch/x86/kernel/cpu/mtrr/xen.c
index f99fa15..5e0aa2b 100644
--- a/arch/x86/kernel/cpu/mtrr/xen.c
+++ b/arch/x86/kernel/cpu/mtrr/xen.c
@@ -15,6 +15,52 @@
static int __init xen_num_var_ranges(void);
+static void xen_set_mtrr(unsigned int reg, unsigned long base,
+ unsigned long size, mtrr_type type)
+{
+ struct xen_platform_op op;
+ int error;
+
+ /* mtrr_ops->set() is called once per CPU,
+ * but Xen's ops apply to all CPUs.
+ */
+ if (smp_processor_id())
+ return;
+
+ if (size == 0) {
+ op.cmd = XENPF_del_memtype;
+ op.u.del_memtype.handle = 0;
+ op.u.del_memtype.reg = reg;
+ } else {
+ op.cmd = XENPF_add_memtype;
+ op.u.add_memtype.mfn = base;
+ op.u.add_memtype.nr_mfns = size;
+ op.u.add_memtype.type = type;
+ }
+
+ error = HYPERVISOR_dom0_op(&op);
+ BUG_ON(error != 0);
+}
+
+static void xen_get_mtrr(unsigned int reg, unsigned long *base,
+ unsigned long *size, mtrr_type *type)
+{
+ struct xen_platform_op op;
+
+ op.cmd = XENPF_read_memtype;
+ op.u.read_memtype.reg = reg;
+ if (HYPERVISOR_dom0_op(&op) != 0) {
+ *base = 0;
+ *size = 0;
+ *type = 0;
+ return;
+ }
+
+ *size = op.u.read_memtype.nr_mfns;
+ *base = op.u.read_memtype.mfn;
+ *type = op.u.read_memtype.type;
+}
+
static int xen_get_free_region(unsigned long base, unsigned long size,
int replace_reg)
{
@@ -41,10 +87,10 @@ static int xen_get_free_region(unsigned long base, unsigned long size,
return replace_reg;
}
-/* DOM0 TODO: Need to fill in the remaining mtrr methods to have full
- * working userland mtrr support. */
static struct mtrr_ops xen_mtrr_ops = {
.vendor = X86_VENDOR_UNKNOWN,
+ .set = xen_set_mtrr,
+ .get = xen_get_mtrr,
.get_free_region = xen_get_free_region,
.validate_add_page = generic_validate_add_page,
.have_wrcomb = positive_have_wrcomb,
--
1.6.0.6
^ permalink raw reply related [flat|nested] 48+ messages in thread* [PATCH 6/6] xen mtrr: Kill some unnecessary includes
2009-05-12 23:27 [GIT PULL] xen /proc/mtrr implementation Jeremy Fitzhardinge
` (4 preceding siblings ...)
2009-05-12 23:27 ` [PATCH 5/6] xen mtrr: Add xen_{get,set}_mtrr() implementations Jeremy Fitzhardinge
@ 2009-05-12 23:27 ` Jeremy Fitzhardinge
2009-05-13 13:30 ` [GIT PULL] xen /proc/mtrr implementation Ingo Molnar
6 siblings, 0 replies; 48+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:27 UTC (permalink / raw)
To: Ingo Molnar
Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
Mark McLoughlin, Jeremy Fitzhardinge
From: Mark McLoughlin <markmc@redhat.com>
[ Impact: cleanup ]
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
arch/x86/kernel/cpu/mtrr/xen.c | 8 +-------
1 files changed, 1 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/xen.c b/arch/x86/kernel/cpu/mtrr/xen.c
index 5e0aa2b..4286569 100644
--- a/arch/x86/kernel/cpu/mtrr/xen.c
+++ b/arch/x86/kernel/cpu/mtrr/xen.c
@@ -1,12 +1,6 @@
#include <linux/init.h>
-#include <linux/proc_fs.h>
-#include <linux/ctype.h>
-#include <linux/module.h>
-#include <linux/seq_file.h>
-#include <asm/uaccess.h>
-#include <linux/mutex.h>
-
-#include <asm/mtrr.h>
+#include <linux/mm.h>
+
#include "mtrr.h"
#include <xen/interface/platform.h>
--
1.6.0.6
^ permalink raw reply related [flat|nested] 48+ messages in thread* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-12 23:27 [GIT PULL] xen /proc/mtrr implementation Jeremy Fitzhardinge
` (5 preceding siblings ...)
2009-05-12 23:27 ` [PATCH 6/6] xen mtrr: Kill some unnecessary includes Jeremy Fitzhardinge
@ 2009-05-13 13:30 ` Ingo Molnar
2009-05-13 14:39 ` Jeremy Fitzhardinge
6 siblings, 1 reply; 48+ messages in thread
From: Ingo Molnar @ 2009-05-13 13:30 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel
* Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> Hi Ingo,
>
> This series of patches makes /proc/mtrr fully functional under Xen.
>
> Thanks,
> J
>
> The following changes since commit ce791368bb4a53d05e78e1588bac0aacde8db84c:
> Jeremy Fitzhardinge (1):
> xen/i386: make sure initial VGA/ISA mappings are not overridden
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git for-ingo/xen/dom0/mtrr
>
> Jeremy Fitzhardinge (1):
> xen: set cpu_callout_mask to make mtrr work
>
> Mark McLoughlin (5):
> xen mtrr: Use specific cpu_has_foo macros instead of generic cpu_has()
> xen mtrr: Use generic_validate_add_page()
> xen mtrr: Implement xen_get_free_region()
> xen mtrr: Add xen_{get,set}_mtrr() implementations
> xen mtrr: Kill some unnecessary includes
>
> arch/x86/kernel/cpu/mtrr/mtrr.h | 2 +
> arch/x86/kernel/cpu/mtrr/xen.c | 99 ++++++++++++++++++++++++++++++++-------
> arch/x86/xen/smp.c | 2 +
> 3 files changed, 86 insertions(+), 17 deletions(-)
i never got a reply to my question for your previous submission:
http://lkml.indiana.edu/hypermail/linux/kernel/0905.1/00152.html
Ingo
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-13 13:30 ` [GIT PULL] xen /proc/mtrr implementation Ingo Molnar
@ 2009-05-13 14:39 ` Jeremy Fitzhardinge
2009-05-15 18:27 ` Ingo Molnar
0 siblings, 1 reply; 48+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-13 14:39 UTC (permalink / raw)
To: Ingo Molnar
Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel
Ingo Molnar wrote:
> i never got a reply to my question for your previous submission:
>
> http://lkml.indiana.edu/hypermail/linux/kernel/0905.1/00152.html
>
That was in response to the mtrr patch in the dom0/core series.
> Please dont post patches with ugly TODO items in them.
I removed them in the repost.
> Also, a more general objection is that /proc/mtrr is a legacy
> interface, we dont really want to extend its use.
It's not an extended use; its just making the existing interface work
under Xen (ie, not breaking the userspace ABI). The only other
alternatives would be to 1) use Kconfig to prevent MTRR and Xen from
being set at the same time, or 2) put a runtime hack in to disable MTRR
when running under Xen. Neither seems like a good idea when we can just
keep the interface working.
> The Xen hypervisor
> should get proper PAT support instead ...
Well, it has PAT support, but there's an issue that the Xen PAT setup
isn't quite the same as Linux's (but I thought you were cc:d on the
discussion about that). We need to sort out some details about the
precise mechanism, but it looks like we'll be able to support PAT in
Linux guests relatively easily (but not immediately).
J
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-13 14:39 ` Jeremy Fitzhardinge
@ 2009-05-15 18:27 ` Ingo Molnar
2009-05-15 20:09 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 48+ messages in thread
From: Ingo Molnar @ 2009-05-15 18:27 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel
* Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> Ingo Molnar wrote:
>> i never got a reply to my question for your previous submission:
>>
>> http://lkml.indiana.edu/hypermail/linux/kernel/0905.1/00152.html
>>
>
> That was in response to the mtrr patch in the dom0/core series.
>
>> Please dont post patches with ugly TODO items in them.
> I removed them in the repost.
>> Also, a more general objection is that /proc/mtrr is a legacy
>> interface, we dont really want to extend its use.
> It's not an extended use; its just making the existing interface work
> under Xen (ie, not breaking the userspace ABI). The only other
> alternatives would be to 1) use Kconfig to prevent MTRR and Xen from
> being set at the same time, or 2) put a runtime hack in to disable MTRR
> when running under Xen. Neither seems like a good idea when we can just
> keep the interface working.
Right now there's no MTRR support under Xen guests and the Xen
hypervisor was able to survive, right? Why should we do it under
dom0?
The MTRR code is extremely fragile, we dont really need an added
layer there. _Especially_ since /proc/mtrr is an obsolete API.
If you want to allow a guest to do MTRR ops, you can do it by
catching the native kernel accesses to the MTRR space. There's no
guest side support needed for that.
Ingo
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-15 18:27 ` Ingo Molnar
@ 2009-05-15 20:09 ` Jeremy Fitzhardinge
2009-05-15 23:26 ` Eric W. Biederman
0 siblings, 1 reply; 48+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-15 20:09 UTC (permalink / raw)
To: Ingo Molnar
Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel
Ingo Molnar wrote:
> Right now there's no MTRR support under Xen guests and the Xen
> hypervisor was able to survive, right? Why should we do it under
> dom0?
>
Because dom0 has direct hardware access, and is running real device
drivers. domU guests don't see physical memory, and so MTRR has no
relevance for them.
> The MTRR code is extremely fragile, we dont really need an added
> layer there. _Especially_ since /proc/mtrr is an obsolete API.
>
There's no added layer there. I'm just adding another implementation of
mtrr_ops.
/proc/mtrr is in wide use today. It may be planned for obsolescence,
but there's no way you can claim its obsolete today (my completely
up-to-date F10 X server is using it, for example). We don't break
oldish usermode ABIs in new kernels.
Besides, the MTRR code is also a kernel-internal API, used by DRM and
other drivers to configure the system MTRR state. Those drivers will
either perform badly or outright fail if they can't set the appropriate
cachability properties. That is not obsolete in any way.
> If you want to allow a guest to do MTRR ops, you can do it by
> catching the native kernel accesses to the MTRR space. There's no
> guest side support needed for that.
>
MTRR can't be virtualized like that. It can't be meaningfully
multiplexed, and must be set in a uniform way on all physical CPUs.
Guests run on virtual CPUs, and don't have any knowledge of what the
mapping of VCPU to PCPU is, or even any visibility of all PCPUs.
It is not a piece of per-guest state; it is system-wide property,
maintained by Xen. These patches add the mechanism for dom0 (=hardware
control domain) to tell Xen what state they should be in.
J
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-15 20:09 ` Jeremy Fitzhardinge
@ 2009-05-15 23:26 ` Eric W. Biederman
2009-05-15 23:49 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 48+ messages in thread
From: Eric W. Biederman @ 2009-05-15 23:26 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Ingo Molnar, the arch/x86 maintainers, Linux Kernel Mailing List,
Xen-devel
Jeremy Fitzhardinge <jeremy@goop.org> writes:
> Ingo Molnar wrote:
>> Right now there's no MTRR support under Xen guests and the Xen hypervisor was
>> able to survive, right? Why should we do it under dom0?
>>
>
> Because dom0 has direct hardware access, and is running real device drivers.
> domU guests don't see physical memory, and so MTRR has no relevance for them.
>> The MTRR code is extremely fragile, we dont really need an added layer
>> there. _Especially_ since /proc/mtrr is an obsolete API.
>>
>
> There's no added layer there. I'm just adding another implementation of
> mtrr_ops.
>
> /proc/mtrr is in wide use today. It may be planned for obsolescence, but
> there's no way you can claim its obsolete today (my completely up-to-date F10 X
> server is using it, for example). We don't break oldish usermode ABIs in new
> kernels.
Sure it is. There is a better newer replacement. It is taking a while to
get userspace transitioned but that is different. Honestly I am puzzled
why that it but whatever.
> Besides, the MTRR code is also a kernel-internal API, used by DRM and other
> drivers to configure the system MTRR state. Those drivers will either perform
> badly or outright fail if they can't set the appropriate cachability properties.
> That is not obsolete in any way.
There are about 5 of them so let's fix them.
With PAT we are in a much better position both for portability and for
flexibility.
>> If you want to allow a guest to do MTRR ops, you can do it by catching the
>> native kernel accesses to the MTRR space. There's no guest side support needed
>> for that.
>>
>
> MTRR can't be virtualized like that. It can't be meaningfully multiplexed, and
> must be set in a uniform way on all physical CPUs. Guests run on virtual CPUs,
> and don't have any knowledge of what the mapping of VCPU to PCPU is, or even any
> visibility of all PCPUs.
>
> It is not a piece of per-guest state; it is system-wide property, maintained by
> Xen. These patches add the mechanism for dom0 (=hardware control domain) to
> tell Xen what state they should be in.
Is it possible to fix PAT and get that working first. That is very definitely
the preferend API.
Eric
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-15 23:26 ` Eric W. Biederman
@ 2009-05-15 23:49 ` Jeremy Fitzhardinge
2009-05-16 3:22 ` Jesse Barnes
0 siblings, 1 reply; 48+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-15 23:49 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Ingo Molnar, the arch/x86 maintainers, Linux Kernel Mailing List,
Xen-devel
Eric W. Biederman wrote:
>> /proc/mtrr is in wide use today. It may be planned for obsolescence, but
>> there's no way you can claim its obsolete today (my completely up-to-date F10 X
>> server is using it, for example). We don't break oldish usermode ABIs in new
>> kernels.
>>
>
> Sure it is. There is a better newer replacement. It is taking a while to
> get userspace transitioned but that is different. Honestly I am puzzled
> why that it but whatever.
>
There's no mention in feature-removal-schedule.txt.
>> Besides, the MTRR code is also a kernel-internal API, used by DRM and other
>> drivers to configure the system MTRR state. Those drivers will either perform
>> badly or outright fail if they can't set the appropriate cachability properties.
>> That is not obsolete in any way.
>>
>
> There are about 5 of them so let's fix them.
>
Well, I count at least 30+, but anyway.
> With PAT we are in a much better position both for portability and for
> flexibility.
>
PAT is relatively recent, and even more recently bug-free. There are
many people with processors which can't or won't do PAT; what's the plan
to support them? Just hit them with a performance regression? Or wrap
MTRR in some other API?
> Is it possible to fix PAT and get that working first. That is very definitely
> the preferend API.
>
Sure, when available. We're sorting out the details for Xen, but even
then it may not be available, either because we're running on an old
version of Xen, or because some other guest is using PAT differently.
But I honestly don't understand the hostility towards 120 lines of code
to make an interface (albeit legacy/deprecated/whatever) behave in an
expected way.
J
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-15 23:49 ` Jeremy Fitzhardinge
@ 2009-05-16 3:22 ` Jesse Barnes
2009-05-16 4:26 ` Eric W. Biederman
0 siblings, 1 reply; 48+ messages in thread
From: Jesse Barnes @ 2009-05-16 3:22 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Eric W. Biederman, Ingo Molnar, the arch/x86 maintainers,
Linux Kernel Mailing List, Xen-devel
On Fri, 15 May 2009 16:49:12 -0700
Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> Eric W. Biederman wrote:
> >> /proc/mtrr is in wide use today. It may be planned for
> >> obsolescence, but there's no way you can claim its obsolete today
> >> (my completely up-to-date F10 X server is using it, for example).
> >> We don't break oldish usermode ABIs in new kernels.
> >>
> >
> > Sure it is. There is a better newer replacement. It is taking a
> > while to get userspace transitioned but that is different.
> > Honestly I am puzzled why that it but whatever.
> >
>
> There's no mention in feature-removal-schedule.txt.
>
> >> Besides, the MTRR code is also a kernel-internal API, used by DRM
> >> and other drivers to configure the system MTRR state. Those
> >> drivers will either perform badly or outright fail if they can't
> >> set the appropriate cachability properties. That is not obsolete
> >> in any way.
> >
> > There are about 5 of them so let's fix them.
> >
>
> Well, I count at least 30+, but anyway.
>
> > With PAT we are in a much better position both for portability and
> > for flexibility.
> >
>
> PAT is relatively recent, and even more recently bug-free. There are
> many people with processors which can't or won't do PAT; what's the
> plan to support them? Just hit them with a performance regression?
> Or wrap MTRR in some other API?
>
> > Is it possible to fix PAT and get that working first. That is
> > very definitely the preferend API.
> >
>
> Sure, when available. We're sorting out the details for Xen, but
> even then it may not be available, either because we're running on an
> old version of Xen, or because some other guest is using PAT
> differently.
>
> But I honestly don't understand the hostility towards 120 lines of
> code to make an interface (albeit legacy/deprecated/whatever) behave
> in an expected way.
FWIW I think supporting the MTRR API in Xen makes sense. There's a lot
of old code out there that wants it; would be nice if it mostly worked,
especially at such a minimal cost. It's taken awhile to get PAT going
(and there are still issues here and there) so having the MTRR stuffa
available is awfully nice.
--
Jesse Barnes, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-16 3:22 ` Jesse Barnes
@ 2009-05-16 4:26 ` Eric W. Biederman
2009-05-16 18:22 ` Jesse Barnes
2009-05-18 4:57 ` Jeremy Fitzhardinge
0 siblings, 2 replies; 48+ messages in thread
From: Eric W. Biederman @ 2009-05-16 4:26 UTC (permalink / raw)
To: Jesse Barnes
Cc: Jeremy Fitzhardinge, Ingo Molnar, the arch/x86 maintainers,
Linux Kernel Mailing List, Xen-devel
Jesse Barnes <jbarnes@virtuousgeek.org> writes:
> On Fri, 15 May 2009 16:49:12 -0700
> Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>> Eric W. Biederman wrote:
>> >> /proc/mtrr is in wide use today. It may be planned for
>> >> obsolescence, but there's no way you can claim its obsolete today
>> >> (my completely up-to-date F10 X server is using it, for example).
>> >> We don't break oldish usermode ABIs in new kernels.
>> >>
>> >
>> > Sure it is. There is a better newer replacement. It is taking a
>> > while to get userspace transitioned but that is different.
>> > Honestly I am puzzled why that it but whatever.
>> >
>>
>> There's no mention in feature-removal-schedule.txt.
I don't know that it makes sense to remove mtrrs but it certainly
doesn't make sense to use them if you can avoid it.
>> >> Besides, the MTRR code is also a kernel-internal API, used by DRM
>> >> and other drivers to configure the system MTRR state. Those
>> >> drivers will either perform badly or outright fail if they can't
>> >> set the appropriate cachability properties. That is not obsolete
>> >> in any way.
>> >
>> > There are about 5 of them so let's fix them.
>> >
>>
>> Well, I count at least 30+, but anyway.
Wow. We had a lot of those slip in. Definitely time to fix the
drivers.
>> > With PAT we are in a much better position both for portability and
>> > for flexibility.
>> >
>>
>> PAT is relatively recent, and even more recently bug-free. There are
>> many people with processors which can't or won't do PAT; what's the
>> plan to support them? Just hit them with a performance regression?
>> Or wrap MTRR in some other API?
PPro is roughly when PAT came out. I remember discussing this a while
ago and the conclusion was that there are very few systems with MTRRs
that don't have a usable PAT implementation. I expect many of those
systems are on their last legs today.
>> Sure, when available. We're sorting out the details for Xen, but
>> even then it may not be available, either because we're running on an
>> old version of Xen, or because some other guest is using PAT
>> differently.
There are only 3 states that are interesting. WB UC and WC. Since
Xen controls the page tables anyway. I expect it can even remap
it feels like it.
>> But I honestly don't understand the hostility towards 120 lines of
>> code to make an interface (albeit legacy/deprecated/whatever) behave
>> in an expected way.
> FWIW I think supporting the MTRR API in Xen makes sense. There's a lot
> of old code out there that wants it; would be nice if it mostly worked,
> especially at such a minimal cost. It's taken awhile to get PAT going
> (and there are still issues here and there) so having the MTRR stuffa
> available is awfully nice.
I won't argue that having MTRRs when you can makes sense. It is a bit
weird in a vitalized system. At a practical level there are an
increasing number of systems for which MTRRs are unusable because the
BIOS sets up overlapping mtrrs. With cheap entry level systems
shipping with 4G I expect it is becoming a majority of systems.
Eric
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-16 4:26 ` Eric W. Biederman
@ 2009-05-16 18:22 ` Jesse Barnes
2009-05-18 5:02 ` Jeremy Fitzhardinge
2009-05-18 4:57 ` Jeremy Fitzhardinge
1 sibling, 1 reply; 48+ messages in thread
From: Jesse Barnes @ 2009-05-16 18:22 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Jeremy Fitzhardinge, Ingo Molnar, the arch/x86 maintainers,
Linux Kernel Mailing List, Xen-devel
On Fri, 15 May 2009 21:26:47 -0700
ebiederm@xmission.com (Eric W. Biederman) wrote:
> > FWIW I think supporting the MTRR API in Xen makes sense. There's a
> > lot of old code out there that wants it; would be nice if it mostly
> > worked, especially at such a minimal cost. It's taken awhile to
> > get PAT going (and there are still issues here and there) so having
> > the MTRR stuffa available is awfully nice.
>
> I won't argue that having MTRRs when you can makes sense. It is a bit
> weird in a vitalized system. At a practical level there are an
> increasing number of systems for which MTRRs are unusable because the
> BIOS sets up overlapping mtrrs. With cheap entry level systems
> shipping with 4G I expect it is becoming a majority of systems.
Yeah, MTRRs definitely have issues too; as you say on many recent
machines they're tougher to use. Either we need to reprogram them to
free some up for WC mappings, or use PAT. But that's a relatively
recent development (last year or two I think?).
This is really about what software Xen wants to support though. You
can say, "it would be easier for you to just support new software that
doesn't use MTRRs," and you might be right, but supporting older stuff
doesn't appear that difficult, and it sounds like something they want
to do.
--
Jesse Barnes, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-16 18:22 ` Jesse Barnes
@ 2009-05-18 5:02 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 48+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-18 5:02 UTC (permalink / raw)
To: Jesse Barnes
Cc: Eric W. Biederman, Ingo Molnar, the arch/x86 maintainers,
Linux Kernel Mailing List, Xen-devel
Jesse Barnes wrote:
> This is really about what software Xen wants to support though. You
> can say, "it would be easier for you to just support new software that
> doesn't use MTRRs," and you might be right, but supporting older stuff
> doesn't appear that difficult, and it sounds like something they want
> to do.
>
My rough target is that you should be able to take a pvops-dom0 kernel
and use it to replace the 2.6.18-xen (or other patched up -xen) kernel
on an existing server installation without having to replace very much
(or anything) else. In practise that means making it work in something
like RHEL 5 or SLES 10(?) environment. That seems to be what at least
some of my testers are doing.
Of course, most of the deployments will be with whatever new distro
ships with the kernel. But that doesn't mean we can write-off the old
stuff. (I think AKPM still tests current kernels on something like FC1
or 2 to check for general kernel/distro regressions.)
J
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-16 4:26 ` Eric W. Biederman
2009-05-16 18:22 ` Jesse Barnes
@ 2009-05-18 4:57 ` Jeremy Fitzhardinge
2009-05-18 8:59 ` Ingo Molnar
1 sibling, 1 reply; 48+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-18 4:57 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Jesse Barnes, Ingo Molnar, the arch/x86 maintainers,
Linux Kernel Mailing List, Xen-devel
Eric W. Biederman wrote:
> There are only 3 states that are interesting. WB UC and WC. Since
> Xen controls the page tables anyway. I expect it can even remap
> it feels like it.
>
It would be awkward. A paravirtualized guest has direct access to the
real pagetables, and so would notice if Xen swizzled around the PAT bits
when it reads back a pagetable entry. We don't currently have any
paravirtualized hooks for adjusting the PTE flags, because there hasn't
been any need, and it would probably be pretty costly (lots of
read+bit-tests would turn into a function call). On the other hand,
there's probably only a few places (if any) in the kernel which actually
inspect the PAT status of an established PTE, so we could put in some
special case mapping there. It becomes a maintenance burden to 1) track
down all the right places, and then 2) make sure any new instances get
handled properly. So, not a preferred solution, I think.
But our planned approach is to simply make Xen use the same PAT layout
as Linux, and go from there. We still need to sort out the details of
how to handle other Xen guests which use the existing Xen PAT setup, how
to verify that Xen and the guest kernel are really using the same setup,
etc.
But since we support the last few year's worth of released versions of
Xen, we still need to handle the PAT-not-supported case with reasonable
grace.
> I won't argue that having MTRRs when you can makes sense. It is a bit
> weird in a vitalized system.
It's not really virtualized. We're talking about dom0, which is the
guest domain which has access to the real machine's real hardware; the
MTRR is part of that.
> At a practical level there are an
> increasing number of systems for which MTRRs are unusable because the
> BIOS sets up overlapping mtrrs. With cheap entry level systems
> shipping with 4G I expect it is becoming a majority of systems.
>
Yes, but that is true irrespective of Xen.
J
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-18 4:57 ` Jeremy Fitzhardinge
@ 2009-05-18 8:59 ` Ingo Molnar
2009-05-18 13:17 ` [Xen-devel] " Jan Beulich
` (2 more replies)
0 siblings, 3 replies; 48+ messages in thread
From: Ingo Molnar @ 2009-05-18 8:59 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Eric W. Biederman, Jesse Barnes, the arch/x86 maintainers,
Linux Kernel Mailing List, Xen-devel, Linus Torvalds,
H. Peter Anvin, Thomas Gleixner
* Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> It's not really virtualized. We're talking about dom0, which is
> the guest domain which has access to the real machine's real
> hardware; the MTRR is part of that.
That is a really broken model and design of virtualization:
splitting the hypervisor into Xen and then a separate Linux dom0
entity because reality called home a few years ago and you needed
actual working drivers and hardware support and a developer
community to pull that off ...
Here Xen invades an already fragile piece of upstream code
(/proc/mtrr) that is obsolete and on the way out. If you want a
solution you should add PAT support to Xen and you should use recent
upstream kernels. Or you should emulate /proc/mtrr in _Xen the
hypervisor_, if you really care that much - without increasing the
amount of crap in Linux.
Without a better reason than what you've given so far the answer is
really: "no thanks" ...
My suspicion is that Linus would (rightfully) refuse to pull such a
broken approach from me, so why should i pull it? If i'm wrong and
if you can get an Acked-by from Linus _before_ sending a pull
request we can override my NAK. I've Cc:-ed him, in case he wants to
express an opinion.
Ingo
^ permalink raw reply [flat|nested] 48+ messages in thread* [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-18 8:59 ` Ingo Molnar
@ 2009-05-18 13:17 ` Jan Beulich
2009-05-18 17:51 ` Chris Wright
2009-05-18 18:07 ` Jeremy Fitzhardinge
2 siblings, 0 replies; 48+ messages in thread
From: Jan Beulich @ 2009-05-18 13:17 UTC (permalink / raw)
To: Ingo Molnar, Jeremy Fitzhardinge
Cc: the arch/x86 maintainers, Thomas Gleixner, Linus Torvalds,
Xen-devel, Linux Kernel Mailing List, Jesse Barnes,
Eric W. Biederman, H. Peter Anvin
>>> Ingo Molnar <mingo@elte.hu> 18.05.09 10:59 >>>
>Here Xen invades an already fragile piece of upstream code
>(/proc/mtrr) that is obsolete and on the way out. If you want a
>solution you should add PAT support to Xen and you should use recent
As Jeremy pointed out a number of times, Xen *does* have PAT support,
perhaps (that's my personal opinion) even superior to the Linux one, as
it doesn't redefine the 486-inherited caching mode attributes but rather
uses the full 3 bits that the hardware provides (and, as an
acknowledgement to the various hardware bugs, makes sure not to use
any large page mappings when using non-WB mappings).
>upstream kernels. Or you should emulate /proc/mtrr in _Xen the
>hypervisor_, if you really care that much - without increasing the
>amount of crap in Linux.
As Jeremy also pointed out previously, emulating the MTRRs in the
hypervisor is very undesirable (and technically at least very close to
impossible), as we're talking about the *real* MTTRs that need managing
here (whereas dealing with virtualized MTRRs in a fully virtualized guest
is a completely different - and very reasonable - thing).
I can only support Jeremy in asking that you please reconsider your NAK.
Jan
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-18 8:59 ` Ingo Molnar
2009-05-18 13:17 ` [Xen-devel] " Jan Beulich
@ 2009-05-18 17:51 ` Chris Wright
2009-05-18 18:07 ` Jeremy Fitzhardinge
2 siblings, 0 replies; 48+ messages in thread
From: Chris Wright @ 2009-05-18 17:51 UTC (permalink / raw)
To: Ingo Molnar
Cc: Jeremy Fitzhardinge, Eric W. Biederman, Jesse Barnes,
the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
Linus Torvalds, H. Peter Anvin, Thomas Gleixner
* Ingo Molnar (mingo@elte.hu) wrote:
> Here Xen invades an already fragile piece of upstream code
> (/proc/mtrr) that is obsolete and on the way out. If you want a
> solution you should add PAT support to Xen and you should use recent
> upstream kernels. Or you should emulate /proc/mtrr in _Xen the
> hypervisor_, if you really care that much - without increasing the
> amount of crap in Linux.
Could you be specific re: technical issues? I see in the general mtrr
impact has one oddity:
+int __init common_num_var_ranges
+static int __init xen_num_var_ranges(void);
+.num_var_ranges =
A bit unusual to have an __init function in an ops table. Albeit safe in
this case. Could slightly minimize impact by keeping setup_num_var_ranges
and have it do:
if (mtrr_if->num_var_ranges)
num_var_ranges = mtrr_if->num_var_ranges();
else
num_var_ranges = common_num_var_ranges;
Similarly, could do a simple inline stub to remove the extra ifdef.
+#ifdef CONFIG_XEN_DOM0
+ xen_init_mtrr();
+#endif
But those are pretty minor. I think the changes proposed are pretty
small and reasonable to make existing /proc/mtrr usable in Xen dom0
(different discussion of when to formally deprecate /proc/mtrr).
thanks,
-chris
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-18 8:59 ` Ingo Molnar
2009-05-18 13:17 ` [Xen-devel] " Jan Beulich
2009-05-18 17:51 ` Chris Wright
@ 2009-05-18 18:07 ` Jeremy Fitzhardinge
2009-05-19 9:59 ` Ingo Molnar
2009-05-20 8:16 ` Andi Kleen
2 siblings, 2 replies; 48+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-18 18:07 UTC (permalink / raw)
To: Ingo Molnar
Cc: Eric W. Biederman, Jesse Barnes, the arch/x86 maintainers,
Linux Kernel Mailing List, Xen-devel, Linus Torvalds,
H. Peter Anvin, Thomas Gleixner
Ingo Molnar wrote:
> Here Xen invades an already fragile piece of upstream code
> (/proc/mtrr) that is obsolete and on the way out. If you want a
> solution you should add PAT support to Xen and you should use recent
> upstream kernels. Or you should emulate /proc/mtrr in _Xen the
> hypervisor_, if you really care that much - without increasing the
> amount of crap in Linux.
>
That's a gross mis-characterisation of what we're talking about here.
arch/x86 already defines an mtrr_ops, which defines how to manipulate
the MTRR registers. There are currently several implementations of that
interface. In Xen the MTRR registers belong to the hypervisor, but it
allows a privileged kernel to modify them via hypercalls. I simply
added a new, straightforward mtrr_ops implementation to do that. It
adds about 120 lines of new code, in a single mtrr/xen.c file.
That's it. I could add any number of bizarre convolutions to achieve
the same effect, but given that there's an existing interface that is
exactly designed for what we want to achieve, I have to admit it didn't
occur to me to do anything else.
MTRR may well be on its way out, and PAT is the proper way to achieve
the same effect from now on. But MTRR is still a widely used
kernel-internal API as well as usermode ABI, and it seems just
gratuitous to not support it when doing so is such a low-impact change.
And if/when it comes to be time to completely deprecate/remove mtrr
support, we can delete it along with everything else.
I'm honestly at a loss to explain the vehemence of your objection to
these changes given their nature.
We're also working on making PAT support work where possible. But that
doesn't mean we want to do without anything in the meantime.
But more generally, we need to work out how to move things forward.
That said, we can live without. If these MTRR patches are your biggest
concern, drop them, pull the rest so we can get them seen, tested, in
linux-next, etc, ready for the next merge window. You know, like you
said you wanted to do the last time you blocked the Xen patches.
I would prefer to move them through tip.git: I appreciate your
(constructive) comments and reviews, the testing infrastructure has
proved very valuable in the past, and they'll generally get wider
exposure than my pool of testers. And its just the right way to do it.
But if you're so concerned that they'll simply give Linus more
ammunition to beat you up with, to the extent that you're
second-guessing yourself, then I'm perfectly happy to submit them
myself. I don't think it would be an overall better outcome, but it
keeps the heat off you, and we'd be making some progress.
J
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-18 18:07 ` Jeremy Fitzhardinge
@ 2009-05-19 9:59 ` Ingo Molnar
2009-05-19 10:22 ` [Xen-devel] " Jan Beulich
2009-05-20 16:12 ` Jeremy Fitzhardinge
2009-05-20 8:16 ` Andi Kleen
1 sibling, 2 replies; 48+ messages in thread
From: Ingo Molnar @ 2009-05-19 9:59 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Eric W. Biederman, Jesse Barnes, the arch/x86 maintainers,
Linux Kernel Mailing List, Xen-devel, Linus Torvalds,
H. Peter Anvin, Thomas Gleixner
* Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> Ingo Molnar wrote:
>> Here Xen invades an already fragile piece of upstream code
>> (/proc/mtrr) that is obsolete and on the way out. If you want a
>> solution you should add PAT support to Xen and you should use recent
>> upstream kernels. Or you should emulate /proc/mtrr in _Xen the
>> hypervisor_, if you really care that much - without increasing the
>> amount of crap in Linux.
>>
>
> That's a gross mis-characterisation of what we're talking about here.
>
> arch/x86 already defines an mtrr_ops, which defines how to
> manipulate the MTRR registers. There are currently several
> implementations of that interface. In Xen the MTRR registers
> belong to the hypervisor, but it allows a privileged kernel to
> modify them via hypercalls. I simply added a new, straightforward
> mtrr_ops implementation to do that. It adds about 120 lines of
> new code, in a single mtrr/xen.c file.
>
> That's it. I could add any number of bizarre convolutions to
> achieve the same effect, but given that there's an existing
> interface that is exactly designed for what we want to achieve, I
> have to admit it didn't occur to me to do anything else.
Exactly what is 'bizarre' about using the API defined by the _CPU_
already, without adding any ad-hoc hypecall? Catch the dom0 WRMSRs,
filter out the MTRR indices - that's it.
Ingo
^ permalink raw reply [flat|nested] 48+ messages in thread
* [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-19 9:59 ` Ingo Molnar
@ 2009-05-19 10:22 ` Jan Beulich
2009-05-19 11:08 ` Ingo Molnar
2009-05-20 16:12 ` Jeremy Fitzhardinge
1 sibling, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2009-05-19 10:22 UTC (permalink / raw)
To: Ingo Molnar, Jeremy Fitzhardinge
Cc: the arch/x86 maintainers, Thomas Gleixner, Linus Torvalds,
Xen-devel, Linux Kernel Mailing List, Jesse Barnes,
Eric W. Biederman, H. Peter Anvin
>>> Ingo Molnar <mingo@elte.hu> 19.05.09 11:59 >>>
>Exactly what is 'bizarre' about using the API defined by the _CPU_
>already, without adding any ad-hoc hypecall? Catch the dom0 WRMSRs,
>filter out the MTRR indices - that's it.
But that is *not* the same as using the hypercalls: The hypercall tells Xen
"Change all CPUs' MTRRs with the indicated index to the indicated value",
while the MSR write says "Change the MTRR with the given index on the
physical CPU the current virtual CPU happens to run on to the given value".
A write-base/write-mask pair may happen to get interrupted (preempted)
by the hypervisor, and hence the two writes may happen on different
pCPU-s. Teaching the hypervisor to (correctly!) guess what the guest
meant in that situation isn't trivial, as then it needs to handle all possible
situations (and it can never know whether Dom0 really intended to do
something that may look bogus/inconsistent at the first glance). This is
even more so considering that Linux is not the only OS capable of
running as Dom0.
An apparently relatively simple solution - latch the writes and commit them
only when the global view became consistent again - isn't possible,
since Dom0 may not have a vCPU running on each individual pCPU.
Jan
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-19 10:22 ` [Xen-devel] " Jan Beulich
@ 2009-05-19 11:08 ` Ingo Molnar
2009-05-19 12:04 ` Gerd Hoffmann
2009-05-20 16:35 ` Jeremy Fitzhardinge
0 siblings, 2 replies; 48+ messages in thread
From: Ingo Molnar @ 2009-05-19 11:08 UTC (permalink / raw)
To: Jan Beulich
Cc: Jeremy Fitzhardinge, the arch/x86 maintainers, Thomas Gleixner,
Linus Torvalds, Xen-devel, Linux Kernel Mailing List,
Jesse Barnes, Eric W. Biederman, H. Peter Anvin
* Jan Beulich <JBeulich@novell.com> wrote:
> >>> Ingo Molnar <mingo@elte.hu> 19.05.09 11:59 >>>
>
> > Exactly what is 'bizarre' about using the API defined by the
> > _CPU_ already, without adding any ad-hoc hypecall? Catch the
> > dom0 WRMSRs, filter out the MTRR indices - that's it.
>
> But that is *not* the same as using the hypercalls: The hypercall
> tells Xen "Change all CPUs' MTRRs with the indicated index to the
> indicated value", while the MSR write says "Change the MTRR with
> the given index on the physical CPU the current virtual CPU
> happens to run on to the given value". [...]
The change of MTRR's on _any_ of the guest CPUs in a dom0 context
should immediately be refected on all CPUs. Assymetric MTRR settings
are madness.
( And the thing is, changing MTRRs is fragile and racy on native
Linux no matter what - even without any hypervisors - due to SMM
contexts possibly relying on them etc. )
> [...] A write-base/write-mask pair may happen to get interrupted
> (preempted) by the hypervisor, and hence the two writes may happen
> on different pCPU-s. Teaching the hypervisor to (correctly!) guess
> what the guest meant in that situation isn't trivial, as then it
> needs to handle all possible situations (and it can never know
> whether Dom0 really intended to do something that may look
> bogus/inconsistent at the first glance). [...]
None of this is a problem really if a sane approach is used: a
change to the MTRR state on dom0 is applied symmetrically on all
CPUs.
Or, alternatively, the hypervisor can expose its own administrative
interface to manage MTRRs.
There's no need to fuglify the Linux kernel for that.
Ingo
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-19 11:08 ` Ingo Molnar
@ 2009-05-19 12:04 ` Gerd Hoffmann
2009-05-19 12:26 ` Ingo Molnar
2009-05-20 16:35 ` Jeremy Fitzhardinge
1 sibling, 1 reply; 48+ messages in thread
From: Gerd Hoffmann @ 2009-05-19 12:04 UTC (permalink / raw)
To: Ingo Molnar
Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
Linus Torvalds
On 05/19/09 13:08, Ingo Molnar wrote:
> Or, alternatively, the hypervisor can expose its own administrative
> interface to manage MTRRs.
Guess what? Xen does exactly that. And the xen mtrr_ops implementation
uses that interface ...
cheers,
Gerd
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-19 12:04 ` Gerd Hoffmann
@ 2009-05-19 12:26 ` Ingo Molnar
2009-05-19 12:32 ` Alan Cox
2009-05-19 13:21 ` Gerd Hoffmann
0 siblings, 2 replies; 48+ messages in thread
From: Ingo Molnar @ 2009-05-19 12:26 UTC (permalink / raw)
To: Gerd Hoffmann
Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
Linus Torvalds
* Gerd Hoffmann <kraxel@redhat.com> wrote:
> On 05/19/09 13:08, Ingo Molnar wrote:
>> Or, alternatively, the hypervisor can expose its own administrative
>> interface to manage MTRRs.
>
> Guess what? Xen does exactly that. And the xen mtrr_ops
> implementation uses that interface ...
No, that is not an 'administrative interface' - that is a guest
kernel level hack that complicates Linux, extends its effective ABI
dependencies and which has to be maintained there from that point
on.
There's really just three proper technical solutions here:
- either catch the lowlevel CPU hw ops (the MSR modifications, which
isnt really all that different from the mtrr_ops approach so it
shouldnt pose undue difficulties to the Xen hypervisor). That will
be maximally transparent and compatible, with zero changes needed
to the Linux kernel.
- or introduce its own hypercall API based administration API,
without bothering the guest kernel with crap. Trivially patch Xorg
to make use of it and that's it.
There is absolutely no reason to introduce some intermediate crap
into Linux.
Ingo
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-19 12:26 ` Ingo Molnar
@ 2009-05-19 12:32 ` Alan Cox
2009-05-19 12:37 ` Ingo Molnar
2009-05-19 13:21 ` Gerd Hoffmann
1 sibling, 1 reply; 48+ messages in thread
From: Alan Cox @ 2009-05-19 12:32 UTC (permalink / raw)
To: Ingo Molnar
Cc: Gerd Hoffmann, Jeremy Fitzhardinge, Xen-devel,
the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
Jan Beulich, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
Eric W. Biederman
> - or introduce its own hypercall API based administration API,
> without bothering the guest kernel with crap. Trivially patch Xorg
> to make use of it and that's it.
PCI pass through mixed with that isn't going to be fun and PCI pass
through is one area where a lot of the MTRR manipulation is meaningful
and valid (and can be handled for the guest).
I really don't see why rd/wrmsr processing isn't sufficient for this
Alan
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-19 12:32 ` Alan Cox
@ 2009-05-19 12:37 ` Ingo Molnar
0 siblings, 0 replies; 48+ messages in thread
From: Ingo Molnar @ 2009-05-19 12:37 UTC (permalink / raw)
To: Alan Cox
Cc: Gerd Hoffmann, Jeremy Fitzhardinge, Xen-devel,
the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
Jan Beulich, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
Eric W. Biederman
* Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > - or introduce its own hypercall API based administration API,
> > without bothering the guest kernel with crap. Trivially patch Xorg
> > to make use of it and that's it.
>
> PCI pass through mixed with that isn't going to be fun and PCI
> pass through is one area where a lot of the MTRR manipulation is
> meaningful and valid (and can be handled for the guest).
virtualizing that aspect of the CPU properly is certainly the
highest grade approach.
> I really don't see why rd/wrmsr processing isn't sufficient for
> this
/me neither.
Ingo
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-19 12:26 ` Ingo Molnar
2009-05-19 12:32 ` Alan Cox
@ 2009-05-19 13:21 ` Gerd Hoffmann
2009-05-19 13:31 ` Ingo Molnar
1 sibling, 1 reply; 48+ messages in thread
From: Gerd Hoffmann @ 2009-05-19 13:21 UTC (permalink / raw)
To: Ingo Molnar
Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
Linus Torvalds
On 05/19/09 14:26, Ingo Molnar wrote:
> * Gerd Hoffmann<kraxel@redhat.com> wrote:
>
>> On 05/19/09 13:08, Ingo Molnar wrote:
>>> Or, alternatively, the hypervisor can expose its own administrative
>>> interface to manage MTRRs.
>> Guess what? Xen does exactly that. And the xen mtrr_ops
>> implementation uses that interface ...
>
> No, that is not an 'administrative interface' - that is a guest
> kernel level hack that complicates Linux, extends its effective ABI
> dependencies and which has to be maintained there from that point
> on.
>
> There's really just three proper technical solutions here:
>
> - either catch the lowlevel CPU hw ops (the MSR modifications, which
> isnt really all that different from the mtrr_ops approach so it
> shouldnt pose undue difficulties to the Xen hypervisor).
Devil is in the details.
The dom0 kernel might not see all physical cpus on the system. So Xen
can't leave the job of looping over all cpus to the dom0 kernel, Xen has
to apply the changes made by the (priviledged) guest kernel on any
(virtual) cpu to all (physical) cpus in the machine.
Which in turn means the "lowlevel cpu hw op" would work in a slightly
different way on Xen and native. Nasty.
> That will
> be maximally transparent and compatible, with zero changes needed
> to the Linux kernel.
No, the linux kernel probably should do the wrmsr on one cpu only then.
> - or introduce its own hypercall API based administration API,
> without bothering the guest kernel with crap. Trivially patch Xorg
> to make use of it and that's it.
I have serious doubts that this is going to fly with KMS.
Oops, the third "proper technical solutions" is missing.
cheers,
Gerd
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-19 13:21 ` Gerd Hoffmann
@ 2009-05-19 13:31 ` Ingo Molnar
2009-05-19 13:51 ` Gerd Hoffmann
0 siblings, 1 reply; 48+ messages in thread
From: Ingo Molnar @ 2009-05-19 13:31 UTC (permalink / raw)
To: Gerd Hoffmann
Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
Linus Torvalds
* Gerd Hoffmann <kraxel@redhat.com> wrote:
> On 05/19/09 14:26, Ingo Molnar wrote:
>> * Gerd Hoffmann<kraxel@redhat.com> wrote:
>>
>>> On 05/19/09 13:08, Ingo Molnar wrote:
>>>> Or, alternatively, the hypervisor can expose its own administrative
>>>> interface to manage MTRRs.
>>> Guess what? Xen does exactly that. And the xen mtrr_ops
>>> implementation uses that interface ...
>>
>> No, that is not an 'administrative interface' - that is a guest
>> kernel level hack that complicates Linux, extends its effective ABI
>> dependencies and which has to be maintained there from that point
>> on.
>>
>> There's really just three proper technical solutions here:
>>
>> - either catch the lowlevel CPU hw ops (the MSR modifications, which
>> isnt really all that different from the mtrr_ops approach so it
>> shouldnt pose undue difficulties to the Xen hypervisor).
>
> Devil is in the details.
>
> The dom0 kernel might not see all physical cpus on the system. So
> Xen can't leave the job of looping over all cpus to the dom0
> kernel, Xen has to apply the changes made by the (priviledged)
> guest kernel on any (virtual) cpu to all (physical) cpus in the
> machine.
Applying MTRR changes to only part of the CPUs is utter madness.
> Which in turn means the "lowlevel cpu hw op" would work in a
> slightly different way on Xen and native. Nasty.
>
>> That will
>> be maximally transparent and compatible, with zero changes needed
>> to the Linux kernel.
>
> No, the linux kernel probably should do the wrmsr on one cpu only then.
Why?
>> - or introduce its own hypercall API based administration API,
>> without bothering the guest kernel with crap. Trivially patch Xorg
>> to make use of it and that's it.
>
> I have serious doubts that this is going to fly with KMS.
>
> Oops, the third "proper technical solutions" is missing.
Yeah, the third one is to not touch MTRRs after bootup and use PAT.
Ingo
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-19 13:31 ` Ingo Molnar
@ 2009-05-19 13:51 ` Gerd Hoffmann
2009-05-19 14:17 ` Ingo Molnar
0 siblings, 1 reply; 48+ messages in thread
From: Gerd Hoffmann @ 2009-05-19 13:51 UTC (permalink / raw)
To: Ingo Molnar
Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
Linus Torvalds
On 05/19/09 15:31, Ingo Molnar wrote:
> * Gerd Hoffmann<kraxel@redhat.com> wrote:
>
>> On 05/19/09 14:26, Ingo Molnar wrote:
>>> * Gerd Hoffmann<kraxel@redhat.com> wrote:
>>>
>>>> On 05/19/09 13:08, Ingo Molnar wrote:
>>>>> Or, alternatively, the hypervisor can expose its own administrative
>>>>> interface to manage MTRRs.
>>>> Guess what? Xen does exactly that. And the xen mtrr_ops
>>>> implementation uses that interface ...
>>> No, that is not an 'administrative interface' - that is a guest
>>> kernel level hack that complicates Linux, extends its effective ABI
>>> dependencies and which has to be maintained there from that point
>>> on.
>>>
>>> There's really just three proper technical solutions here:
>>>
>>> - either catch the lowlevel CPU hw ops (the MSR modifications, which
>>> isnt really all that different from the mtrr_ops approach so it
>>> shouldnt pose undue difficulties to the Xen hypervisor).
>> Devil is in the details.
>>
>> The dom0 kernel might not see all physical cpus on the system. So
>> Xen can't leave the job of looping over all cpus to the dom0
>> kernel, Xen has to apply the changes made by the (priviledged)
>> guest kernel on any (virtual) cpu to all (physical) cpus in the
>> machine.
>
> Applying MTRR changes to only part of the CPUs is utter madness.
Sure. Do you read what I'm writing?
>> Which in turn means the "lowlevel cpu hw op" would work in a
>> slightly different way on Xen and native. Nasty.
>>
>>> That will
>>> be maximally transparent and compatible, with zero changes needed
>>> to the Linux kernel.
>> No, the linux kernel probably should do the wrmsr on one cpu only then.
>
> Why?
See above. Xen has to apply the changes to all cpus anyway.
>> Oops, the third "proper technical solutions" is missing.
>
> Yeah, the third one is to not touch MTRRs after bootup and use PAT.
Works only in case the CPU has PAT support.
cheers,
Gerd
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-19 13:51 ` Gerd Hoffmann
@ 2009-05-19 14:17 ` Ingo Molnar
2009-05-19 14:55 ` Gerd Hoffmann
0 siblings, 1 reply; 48+ messages in thread
From: Ingo Molnar @ 2009-05-19 14:17 UTC (permalink / raw)
To: Gerd Hoffmann
Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
Linus Torvalds
* Gerd Hoffmann <kraxel@redhat.com> wrote:
> On 05/19/09 15:31, Ingo Molnar wrote:
>> * Gerd Hoffmann<kraxel@redhat.com> wrote:
>>
>>> On 05/19/09 14:26, Ingo Molnar wrote:
>>>> * Gerd Hoffmann<kraxel@redhat.com> wrote:
>>>>
>>>>> On 05/19/09 13:08, Ingo Molnar wrote:
>>>>>> Or, alternatively, the hypervisor can expose its own administrative
>>>>>> interface to manage MTRRs.
>>>>> Guess what? Xen does exactly that. And the xen mtrr_ops
>>>>> implementation uses that interface ...
>>>> No, that is not an 'administrative interface' - that is a guest
>>>> kernel level hack that complicates Linux, extends its effective ABI
>>>> dependencies and which has to be maintained there from that point
>>>> on.
>>>>
>>>> There's really just three proper technical solutions here:
>>>>
>>>> - either catch the lowlevel CPU hw ops (the MSR modifications, which
>>>> isnt really all that different from the mtrr_ops approach so it
>>>> shouldnt pose undue difficulties to the Xen hypervisor).
>>> Devil is in the details.
>>>
>>> The dom0 kernel might not see all physical cpus on the system. So
>>> Xen can't leave the job of looping over all cpus to the dom0
>>> kernel, Xen has to apply the changes made by the (priviledged)
>>> guest kernel on any (virtual) cpu to all (physical) cpus in the
>>> machine.
>>
>> Applying MTRR changes to only part of the CPUs is utter madness.
>
> Sure. Do you read what I'm writing?
>
>>> Which in turn means the "lowlevel cpu hw op" would work in a
>>> slightly different way on Xen and native. Nasty.
>>>
>>>> That will
>>>> be maximally transparent and compatible, with zero changes needed
>>>> to the Linux kernel.
>>> No, the linux kernel probably should do the wrmsr on one cpu only then.
>>
>> Why?
>
> See above. Xen has to apply the changes to all cpus anyway.
do _you_ read what i wrote, in the thread you are replying to:
|
| The change of MTRR's on _any_ of the guest CPUs in a dom0 context
| should immediately be refected on all CPUs. Assymetric MTRR
| settings are madness.
|
>>> Oops, the third "proper technical solutions" is missing.
>>
>> Yeah, the third one is to not touch MTRRs after bootup and use PAT.
>
> Works only in case the CPU has PAT support.
Which specific CPU without PAT support do you worry about?
Ingo
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-19 14:17 ` Ingo Molnar
@ 2009-05-19 14:55 ` Gerd Hoffmann
2009-05-19 15:24 ` Ingo Molnar
0 siblings, 1 reply; 48+ messages in thread
From: Gerd Hoffmann @ 2009-05-19 14:55 UTC (permalink / raw)
To: Ingo Molnar
Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
Linus Torvalds
>>>> No, the linux kernel probably should do the wrmsr on one cpu only then.
>>> Why?
> | The change of MTRR's on _any_ of the guest CPUs in a dom0 context
> | should immediately be refected on all CPUs. Assymetric MTRR
> | settings are madness.
Exactly. And thats why it is pointless to let the dom0 kernel write the
mtrr msrs on more than one cpu.
>>>> Oops, the third "proper technical solutions" is missing.
>>> Yeah, the third one is to not touch MTRRs after bootup and use PAT.
>> Works only in case the CPU has PAT support.
>
> Which specific CPU without PAT support do you worry about?
For example: I have a Notebook here, with MTRR and without PAT
according to the boot log. "Pentium III (Coppermine)" says /proc/cpuinfo.
cheers,
Gerd
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-19 14:55 ` Gerd Hoffmann
@ 2009-05-19 15:24 ` Ingo Molnar
2009-05-20 8:01 ` Gerd Hoffmann
0 siblings, 1 reply; 48+ messages in thread
From: Ingo Molnar @ 2009-05-19 15:24 UTC (permalink / raw)
To: Gerd Hoffmann
Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
Linus Torvalds
* Gerd Hoffmann <kraxel@redhat.com> wrote:
>>>>> No, the linux kernel probably should do the wrmsr on one cpu only then.
>>>> Why?
>
>> | The change of MTRR's on _any_ of the guest CPUs in a dom0 context
>> | should immediately be refected on all CPUs. Assymetric MTRR
>> | settings are madness.
>
> Exactly. And thats why it is pointless to let the dom0 kernel
> write the mtrr msrs on more than one cpu.
How does this negate my claim that the Linux kernel needs no
modifications? (which i think your point is - let me know if you
have some other point here.)
the Xen hypervisor can simply repeat all requests (i.e. not care at
all about the fact that a guest does these modifications on all CPUs
it sees), or realize that the modification has already been done and
skip it. This is in no way a performance sensitive codepath - mtrrs
are only modified in init sequences - and setting mtrrs is slow and
globally serialized to begin with.
>>>>> Oops, the third "proper technical solutions" is missing.
>>>> Yeah, the third one is to not touch MTRRs after bootup and use PAT.
>>> Works only in case the CPU has PAT support.
>>
>> Which specific CPU without PAT support do you worry about?
>
> For example: I have a Notebook here, with MTRR and without PAT
> according to the boot log. "Pentium III (Coppermine)" says
> /proc/cpuinfo.
That's a really old CPU, but even Coppermine has PAT support in the
CPU. You need to go back to things like P5 200 MHz CPUs to find
PAT-less CPUs.
On the Linux side it's easy to enable it (and _such_ a patch would
make sense indeed) - it's just that nobody has yet gone through the
effort of testing and validating the PAT code on that CPU. If you
care enough, you can do it, send a patch and ping the Intel folks
about it.
Once the issues are framed correctly, Linux is helped for real.
Ingo
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-19 15:24 ` Ingo Molnar
@ 2009-05-20 8:01 ` Gerd Hoffmann
0 siblings, 0 replies; 48+ messages in thread
From: Gerd Hoffmann @ 2009-05-20 8:01 UTC (permalink / raw)
To: Ingo Molnar
Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
Linus Torvalds
On 05/19/09 17:24, Ingo Molnar wrote:
> the Xen hypervisor can simply repeat all requests (i.e. not care at
> all about the fact that a guest does these modifications on all CPUs
> it sees), or realize that the modification has already been done and
> skip it.
Could be done, yes. It still feels wrong that wrmsr(mtrr) works
slightly different on xen and on native. And it wouldn't work on
existing Xen deployments as the Xen hypervisor doesn't support that today.
>>>>> Yeah, the third one is to not touch MTRRs after bootup and use PAT.
> That's a really old CPU, but even Coppermine has PAT support in the
> CPU. You need to go back to things like P5 200 MHz CPUs to find
> PAT-less CPUs.
Linux shouln't say "PAT not supported by CPU." then.
Also it doesn't make sense to me to handle things differently on native
and xen. While it might make sense to deprecate mtrrs in favor of PAT
(don't know enougth about all the different cpus in the wild to justify
that) I don't think it makes sense to do that for xen only. Native
should declare mtrrs obsolete as well.
cheers,
Gerd
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-19 11:08 ` Ingo Molnar
2009-05-19 12:04 ` Gerd Hoffmann
@ 2009-05-20 16:35 ` Jeremy Fitzhardinge
1 sibling, 0 replies; 48+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-20 16:35 UTC (permalink / raw)
To: Ingo Molnar
Cc: Jan Beulich, the arch/x86 maintainers, Thomas Gleixner,
Linus Torvalds, Xen-devel, Linux Kernel Mailing List,
Jesse Barnes, Eric W. Biederman, H. Peter Anvin
Ingo Molnar wrote:
> * Jan Beulich <JBeulich@novell.com> wrote:
>
>
>>>>> Ingo Molnar <mingo@elte.hu> 19.05.09 11:59 >>>
>>>>>
>>> Exactly what is 'bizarre' about using the API defined by the
>>> _CPU_ already, without adding any ad-hoc hypecall? Catch the
>>> dom0 WRMSRs, filter out the MTRR indices - that's it.
>>>
>> But that is *not* the same as using the hypercalls: The hypercall
>> tells Xen "Change all CPUs' MTRRs with the indicated index to the
>> indicated value", while the MSR write says "Change the MTRR with
>> the given index on the physical CPU the current virtual CPU
>> happens to run on to the given value". [...]
>>
>
> The change of MTRR's on _any_ of the guest CPUs in a dom0 context
> should immediately be refected on all CPUs. Assymetric MTRR settings
> are madness.
>
> ( And the thing is, changing MTRRs is fragile and racy on native
> Linux no matter what - even without any hypervisors - due to SMM
> contexts possibly relying on them etc. )
>
>
>> [...] A write-base/write-mask pair may happen to get interrupted
>> (preempted) by the hypervisor, and hence the two writes may happen
>> on different pCPU-s. Teaching the hypervisor to (correctly!) guess
>> what the guest meant in that situation isn't trivial, as then it
>> needs to handle all possible situations (and it can never know
>> whether Dom0 really intended to do something that may look
>> bogus/inconsistent at the first glance). [...]
>>
>
> None of this is a problem really if a sane approach is used: a
> change to the MTRR state on dom0 is applied symmetrically on all
> CPUs.
>
> Or, alternatively, the hypervisor can expose its own administrative
> interface to manage MTRRs.
>
> There's no need to fuglify the Linux kernel for that.
I'm not sure what you mean by that, other than as a description of the
current case. The Xen MTRR hypercall:
1. treats MTRR ranges as allocatable resources, and keep track of how
many uses there are of each
2. updates all physical cpus synchronously (ie, the MTRR is not
presented as a property of dom0's virtual CPU, but as a
system-wide resource)
3. prevents guests from setting inconsistent or conflicting MTRRs
Mapping from MSR writes to this interface is moderately complex, because
it requires a mapping from a low-semantic-content interface to a
high-semantic-content interface. It essentially requires parsing the
MSR writes to map them back to the relatively high-level operations at
the mtrr_ops interface and then present that to Xen.
There are at least a couple of secondary issues which arise from that
approach:
* mtrr/generic.c also has to do a number of other things like
disabling caching, tlb flushes, etc. That adds complexity because
Xen guests are never allowed to globally disable caching, so we'd
have to add additional filtering to remove those cr0 writes
* As we've discussed, we'd need to make the mtrr writes implicitly
change all cpus atomically, as the dom0 kernel can't see physical cpus
The net effect would be that we would be making a pile of apparently
generic CPU operations (MSR writes, control register writes) actually
feed a fairly complex parser, increasing the difference between the Xen
and native cases even more.
mtrr/generic.c about 730 lines of fairly intricate arch-specific code.
mtrr/xen.c is 120 lines of straightforward hypercalls. The mtrr_ops
interface and the Xen hypercall interface are a close semantic match, so
there's very little glue code in there.
But that said, this a huge distraction, an unbelievable amount of noise
for a fairly minor point. We can live without these changes, and
they're certainly easy enough to carry out of tree in the meantime. If
you can't live with these changes, then drop them and we'll work out
something else.
J
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-19 9:59 ` Ingo Molnar
2009-05-19 10:22 ` [Xen-devel] " Jan Beulich
@ 2009-05-20 16:12 ` Jeremy Fitzhardinge
1 sibling, 0 replies; 48+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-20 16:12 UTC (permalink / raw)
To: Ingo Molnar
Cc: Eric W. Biederman, Jesse Barnes, the arch/x86 maintainers,
Linux Kernel Mailing List, Xen-devel, Linus Torvalds,
H. Peter Anvin, Thomas Gleixner
Ingo Molnar wrote:
>> That's it. I could add any number of bizarre convolutions to
>> achieve the same effect, but given that there's an existing
>> interface that is exactly designed for what we want to achieve, I
>> have to admit it didn't occur to me to do anything else.
>>
>
> Exactly what is 'bizarre' about using the API defined by the _CPU_
> already, without adding any ad-hoc hypecall? Catch the dom0 WRMSRs,
> filter out the MTRR indices - that's it.
>
Well, the x86 world can't seem to decide what the ABI is supposed to be,
which is why we have mtrr_ops in the first place. Doing emulation at
the MSR level means that I'd need to decide which MTRR interface we're
emulating today and do that.
Yes, I realize that almost everyone is using the same Intel-like
interface these days, but it does mean there's a level of fragility that
doesn't exist if we just implement mtrr_ops.
There's some secondary issues which arise. For example, the mtrr
trimming test is meaningless in dom0 (the e820 is fake, so it doesn't
make sense to compare it with the mtrrs); we currently avoid that
because the test only happens if the mtrr vendor is Intel. We would
need to disable that test some other way.
J
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-18 18:07 ` Jeremy Fitzhardinge
2009-05-19 9:59 ` Ingo Molnar
@ 2009-05-20 8:16 ` Andi Kleen
2009-05-20 16:39 ` Jeremy Fitzhardinge
1 sibling, 1 reply; 48+ messages in thread
From: Andi Kleen @ 2009-05-20 8:16 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Ingo Molnar, Xen-devel, the arch/x86 maintainers,
Linux Kernel Mailing List, Jesse Barnes, Eric W. Biederman,
H. Peter Anvin, Thomas Gleixner, Linus Torvalds
Jeremy Fitzhardinge <jeremy@goop.org> writes:
> arch/x86 already defines an mtrr_ops, which defines how to manipulate
> the MTRR registers. There are currently several implementations of
> that interface. In Xen the MTRR registers belong to the hypervisor,
> but it allows a privileged kernel to modify them via hypercalls.
One part that's unclear to me in this discussion. Could you perhaps
clarify Jeremy?:
Even Dom0 is not continuous in physical memory, but mapped page by page
except for swiotlb mappings. But MTRRs are fundamentally a way
to change attributes for large physically continous mappings. How do these
two meet?
After all when you change a MTRR for a given range of memory
linux sees as continuous it isn't necessarily in Xen.
Is this new interface only defined for swiotlb or MMIO mappings?
If yes did you check the drivers only actually set it on
swiotlb or MMIO (that seems dubious to me)?
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-20 8:16 ` Andi Kleen
@ 2009-05-20 16:39 ` Jeremy Fitzhardinge
2009-05-20 22:52 ` Andi Kleen
0 siblings, 1 reply; 48+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-20 16:39 UTC (permalink / raw)
To: Andi Kleen
Cc: Ingo Molnar, Xen-devel, the arch/x86 maintainers,
Linux Kernel Mailing List, Jesse Barnes, Eric W. Biederman,
H. Peter Anvin, Thomas Gleixner, Linus Torvalds
Andi Kleen wrote:
> Jeremy Fitzhardinge <jeremy@goop.org> writes:
>
>
>> arch/x86 already defines an mtrr_ops, which defines how to manipulate
>> the MTRR registers. There are currently several implementations of
>> that interface. In Xen the MTRR registers belong to the hypervisor,
>> but it allows a privileged kernel to modify them via hypercalls.
>>
>
> One part that's unclear to me in this discussion. Could you perhaps
> clarify Jeremy?:
>
> Even Dom0 is not continuous in physical memory, but mapped page by page
> except for swiotlb mappings. But MTRRs are fundamentally a way
> to change attributes for large physically continous mappings. How do these
> two meet?
>
> After all when you change a MTRR for a given range of memory
> linux sees as continuous it isn't necessarily in Xen.
>
> Is this new interface only defined for swiotlb or MMIO mappings?
> If yes did you check the drivers only actually set it on
> swiotlb or MMIO (that seems dubious to me)?
Really? Do we ever set unusual memory types on normal system memory?
From what I've seen, MTRRs are only ever applied to device mapped
memory (framebuffers, etc). I guess it could possibly make sense on
system memory which is being prepped for DMA (swiotlb, alloc_coherent,
etc), but dom0 would have a pseudo-phys to machine mapping for that
memory too (it would be obviously problematic if something tried to
program MTRR with pseudo-physical addresses, but Xen would/should
probably disallow it anyway).
J
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-20 16:39 ` Jeremy Fitzhardinge
@ 2009-05-20 22:52 ` Andi Kleen
2009-05-20 22:49 ` Jeremy Fitzhardinge
2009-05-20 23:03 ` H. Peter Anvin
0 siblings, 2 replies; 48+ messages in thread
From: Andi Kleen @ 2009-05-20 22:52 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Andi Kleen, Ingo Molnar, Xen-devel, the arch/x86 maintainers,
Linux Kernel Mailing List, Jesse Barnes, Eric W. Biederman,
H. Peter Anvin, Thomas Gleixner, Linus Torvalds
> >Is this new interface only defined for swiotlb or MMIO mappings?
> >If yes did you check the drivers only actually set it on
> >swiotlb or MMIO (that seems dubious to me)?
>
> Really? Do we ever set unusual memory types on normal system memory?
There were drivers at some point that set WC MTRRs on their
transfer rings, not mmio.
Maybe these all switched to PAT now, maybe not
It's hard to say because it could be also done from user space.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-20 22:52 ` Andi Kleen
@ 2009-05-20 22:49 ` Jeremy Fitzhardinge
2009-05-20 23:03 ` H. Peter Anvin
1 sibling, 0 replies; 48+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-20 22:49 UTC (permalink / raw)
To: Andi Kleen
Cc: Ingo Molnar, Xen-devel, the arch/x86 maintainers,
Linux Kernel Mailing List, Jesse Barnes, Eric W. Biederman,
H. Peter Anvin, Thomas Gleixner, Linus Torvalds
Andi Kleen wrote:
> There were drivers at some point that set WC MTRRs on their
> transfer rings, not mmio.
>
To avoid cache pollution effects?
> Maybe these all switched to PAT now, maybe not
>
> It's hard to say because it could be also done from user space.
>
How does userspace do the virtual to physical mapping? If the driver
can't also deal with the extra conversion from physical to machine under
Xen, then it won't have much success either way.
J
^ permalink raw reply [flat|nested] 48+ messages in thread* Re: [GIT PULL] xen /proc/mtrr implementation
2009-05-20 22:52 ` Andi Kleen
2009-05-20 22:49 ` Jeremy Fitzhardinge
@ 2009-05-20 23:03 ` H. Peter Anvin
1 sibling, 0 replies; 48+ messages in thread
From: H. Peter Anvin @ 2009-05-20 23:03 UTC (permalink / raw)
To: Andi Kleen
Cc: Jeremy Fitzhardinge, Ingo Molnar, Xen-devel,
the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
Eric W. Biederman, Thomas Gleixner, Linus Torvalds
Andi Kleen wrote:
>>> Is this new interface only defined for swiotlb or MMIO mappings?
>>> If yes did you check the drivers only actually set it on
>>> swiotlb or MMIO (that seems dubious to me)?
>> Really? Do we ever set unusual memory types on normal system memory?
>
> There were drivers at some point that set WC MTRRs on their
> transfer rings, not mmio.
>
> Maybe these all switched to PAT now, maybe not
Most of those that I've heard of use WB memory and CLFLUSH.
The ones I know of which marked descriptor rings WB were ones which had
descriptor rings in MMIO space (which makes sense: MMIO reads are fully
synchronizing, but writes can be posted and write combined.)
-hpa
^ permalink raw reply [flat|nested] 48+ messages in thread