Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
@ 2009-05-25  8:42 Tomasz Chmielewski
  2009-05-25  9:15 ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Tomasz Chmielewski @ 2009-05-25  8:42 UTC (permalink / raw)
  To: LKML
  Cc: mingo, Alan, Gerd Hoffmann, jeremy, Andi Kleen, H. Peter Anvin,
	JBeulich, jbarnes

Ingo Molnar wrote:

>>>> Oops, the third "proper technical solutions" is missing.
>>>
>>> Yeah, the third one is to not touch MTRRs after bootup and use PAT.
>>
>> Works only in case the CPU has PAT support.
> 
> Which specific CPU without PAT support do you worry about?

Are these ones (below) good CPU to worry about? That's slightly 
off-topic as they are KVM guests, but still, the guest kernel seems to 
disable PAT (unless I'm mistaken).

KVM-86 + 2.6.27.19 guest:

# dmesg
Initializing cgroup subsys cpuset
Linux version 2.6.27.19-server-1mnb (qateam@titan.mandriva.com) (gcc 
version 4.3.2 (GCC) ) #1 SMP Thu Mar 5 00:09:04 EST 2009
PAT WC disabled due to known CPU erratum.
(...)

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 2
model name      : QEMU Virtual CPU version 0.10.50
stepping        : 3
cpu MHz         : 2133.285
cache size      : 2048 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 syscall lm up pni
bogomips        : 4266.57
clflush size    : 64
power management:

KVM-85 + 2.6.29.1 guest:

# dmesg
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.29.1-server-4mnb (herton@n2.mandriva.com) (gcc version 
4.3.2 (GCC) ) #1 SMP Mon Apr 20 20:10:24 EDT 2009
KERNEL supported cpus:
   Intel GenuineIntel
   AMD AuthenticAMD
   NSC Geode by NSC
   Cyrix CyrixInstead
   Centaur CentaurHauls
   Transmeta GenuineTMx86
   Transmeta TransmetaCPU
   UMC UMC UMC UMC
PAT WC disabled due to known CPU erratum.
(...)

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 2
model name      : QEMU Virtual CPU version 0.10.0
stepping        : 3
cpu MHz         : 2128.004
cache size      : 2048 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 syscall nx lm up pni hypervisor
bogomips        : 4256.00
clflush size    : 64
power management:

-- 
Tomasz Chmielewski
http://wpkg.org

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-25  8:42 [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation Tomasz Chmielewski
@ 2009-05-25  9:15 ` Andi Kleen
  2009-05-25  9:31   ` Jan Beulich
  2009-05-25 16:05   ` H. Peter Anvin
  0 siblings, 2 replies; 20+ messages in thread
From: Andi Kleen @ 2009-05-25  9:15 UTC (permalink / raw)
  To: Tomasz Chmielewski
  Cc: LKML, mingo, Alan, Gerd Hoffmann, jeremy, Andi Kleen,
	H. Peter Anvin, JBeulich, jbarnes

> Are these ones (below) good CPU to worry about? That's slightly 
> off-topic as they are KVM guests, but still, the guest kernel seems to 
> disable PAT (unless I'm mistaken).

To my knowledge the only CPUs with broken PAT are some very old Pentium
Pros dating from back the time when PAT was new and noone used it.
On those it should be disabled or the high bits not used.

The kernel is overall too conservative with its white list.

Today a lot of other common operating systems use PAT extensively,
so in general it works on the hardware level.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-25  9:15 ` Andi Kleen
@ 2009-05-25  9:31   ` Jan Beulich
  2009-05-25  9:47     ` Andi Kleen
  2009-05-25 16:05   ` H. Peter Anvin
  1 sibling, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2009-05-25  9:31 UTC (permalink / raw)
  To: Andi Kleen, Tomasz Chmielewski
  Cc: mingo, jeremy, Alan, Gerd Hoffmann, LKML, jbarnes, H. Peter Anvin

>>> Andi Kleen <andi@firstfloor.org> 25.05.09 11:15 >>>
>To my knowledge the only CPUs with broken PAT are some very old Pentium
>Pros dating from back the time when PAT was new and noone used it.
>On those it should be disabled or the high bits not used.

??? See PentiumIII erratum E27.

Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-25  9:31   ` Jan Beulich
@ 2009-05-25  9:47     ` Andi Kleen
  0 siblings, 0 replies; 20+ messages in thread
From: Andi Kleen @ 2009-05-25  9:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andi Kleen, Tomasz Chmielewski, mingo, jeremy, Alan,
	Gerd Hoffmann, LKML, jbarnes, H. Peter Anvin

On Mon, May 25, 2009 at 10:31:03AM +0100, Jan Beulich wrote:
> >>> Andi Kleen <andi@firstfloor.org> 25.05.09 11:15 >>>
> >To my knowledge the only CPUs with broken PAT are some very old Pentium
> >Pros dating from back the time when PAT was new and noone used it.
> >On those it should be disabled or the high bits not used.
> 
> ??? See PentiumIII erratum E27.

Thanks for the correction.  

It's still all pretty old CPUs.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-25  9:15 ` Andi Kleen
  2009-05-25  9:31   ` Jan Beulich
@ 2009-05-25 16:05   ` H. Peter Anvin
  1 sibling, 0 replies; 20+ messages in thread
From: H. Peter Anvin @ 2009-05-25 16:05 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Tomasz Chmielewski, LKML, mingo, Alan, Gerd Hoffmann, jeremy,
	JBeulich, jbarnes

Andi Kleen wrote:
> 
> To my knowledge the only CPUs with broken PAT are some very old Pentium
> Pros dating from back the time when PAT was new and noone used it.
> On those it should be disabled or the high bits not used.
> 
> The kernel is overall too conservative with its white list.
> 
> Today a lot of other common operating systems use PAT extensively,
> so in general it works on the hardware level.
> 

Well, we don't use the high bits exactly for this reason.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [GIT PULL] xen /proc/mtrr implementation
@ 2009-05-12 23:27 Jeremy Fitzhardinge
  2009-05-13 13:30 ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel

Hi Ingo,

This series of patches makes /proc/mtrr fully functional under Xen.

Thanks,
	J

The following changes since commit ce791368bb4a53d05e78e1588bac0aacde8db84c:
  Jeremy Fitzhardinge (1):
        xen/i386: make sure initial VGA/ISA mappings are not overridden

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git for-ingo/xen/dom0/mtrr

Jeremy Fitzhardinge (1):
      xen: set cpu_callout_mask to make mtrr work

Mark McLoughlin (5):
      xen mtrr: Use specific cpu_has_foo macros instead of generic cpu_has()
      xen mtrr: Use generic_validate_add_page()
      xen mtrr: Implement xen_get_free_region()
      xen mtrr: Add xen_{get,set}_mtrr() implementations
      xen mtrr: Kill some unnecessary includes

 arch/x86/kernel/cpu/mtrr/mtrr.h |    2 +
 arch/x86/kernel/cpu/mtrr/xen.c  |   99 ++++++++++++++++++++++++++++++++-------
 arch/x86/xen/smp.c              |    2 +
 3 files changed, 86 insertions(+), 17 deletions(-)


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-12 23:27 Jeremy Fitzhardinge
@ 2009-05-13 13:30 ` Ingo Molnar
  2009-05-13 14:39   ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2009-05-13 13:30 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Hi Ingo,
> 
> This series of patches makes /proc/mtrr fully functional under Xen.
> 
> Thanks,
> 	J
> 
> The following changes since commit ce791368bb4a53d05e78e1588bac0aacde8db84c:
>   Jeremy Fitzhardinge (1):
>         xen/i386: make sure initial VGA/ISA mappings are not overridden
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git for-ingo/xen/dom0/mtrr
> 
> Jeremy Fitzhardinge (1):
>       xen: set cpu_callout_mask to make mtrr work
> 
> Mark McLoughlin (5):
>       xen mtrr: Use specific cpu_has_foo macros instead of generic cpu_has()
>       xen mtrr: Use generic_validate_add_page()
>       xen mtrr: Implement xen_get_free_region()
>       xen mtrr: Add xen_{get,set}_mtrr() implementations
>       xen mtrr: Kill some unnecessary includes
> 
>  arch/x86/kernel/cpu/mtrr/mtrr.h |    2 +
>  arch/x86/kernel/cpu/mtrr/xen.c  |   99 ++++++++++++++++++++++++++++++++-------
>  arch/x86/xen/smp.c              |    2 +
>  3 files changed, 86 insertions(+), 17 deletions(-)

i never got a reply to my question for your previous submission:

  http://lkml.indiana.edu/hypermail/linux/kernel/0905.1/00152.html

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-13 13:30 ` Ingo Molnar
@ 2009-05-13 14:39   ` Jeremy Fitzhardinge
  2009-05-15 18:27     ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-13 14:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel

Ingo Molnar wrote:
> i never got a reply to my question for your previous submission:
>
>   http://lkml.indiana.edu/hypermail/linux/kernel/0905.1/00152.html
>   

That was in response to the mtrr patch in the dom0/core series.

> Please dont post patches with ugly TODO items in them.
I removed them in the repost.
> Also, a more general objection is that /proc/mtrr is a legacy
> interface, we dont really want to extend its use.
It's not an extended use; its just making the existing interface work 
under Xen (ie, not breaking the userspace ABI).  The only other 
alternatives would be to 1) use Kconfig to prevent MTRR and Xen from 
being set at the same time, or 2) put a runtime hack in to disable MTRR 
when running under Xen.  Neither seems like a good idea when we can just 
keep the interface working.
> The Xen hypervisor
> should get proper PAT support instead ...

Well, it has PAT support, but there's an issue that the Xen PAT setup 
isn't quite the same as Linux's (but I thought you were cc:d on the 
discussion about that).  We need to sort out some details about the 
precise mechanism, but it looks like we'll be able to support PAT in 
Linux guests relatively easily (but not immediately).

    J

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-13 14:39   ` Jeremy Fitzhardinge
@ 2009-05-15 18:27     ` Ingo Molnar
  2009-05-15 20:09       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2009-05-15 18:27 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Ingo Molnar wrote:
>> i never got a reply to my question for your previous submission:
>>
>>   http://lkml.indiana.edu/hypermail/linux/kernel/0905.1/00152.html
>>   
>
> That was in response to the mtrr patch in the dom0/core series.
>
>> Please dont post patches with ugly TODO items in them.
> I removed them in the repost.
>> Also, a more general objection is that /proc/mtrr is a legacy
>> interface, we dont really want to extend its use.
> It's not an extended use; its just making the existing interface work  
> under Xen (ie, not breaking the userspace ABI).  The only other  
> alternatives would be to 1) use Kconfig to prevent MTRR and Xen from  
> being set at the same time, or 2) put a runtime hack in to disable MTRR  
> when running under Xen.  Neither seems like a good idea when we can just  
> keep the interface working.

Right now there's no MTRR support under Xen guests and the Xen 
hypervisor was able to survive, right? Why should we do it under 
dom0?

The MTRR code is extremely fragile, we dont really need an added 
layer there. _Especially_ since /proc/mtrr is an obsolete API.

If you want to allow a guest to do MTRR ops, you can do it by 
catching the native kernel accesses to the MTRR space. There's no 
guest side support needed for that.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-15 18:27     ` Ingo Molnar
@ 2009-05-15 20:09       ` Jeremy Fitzhardinge
  2009-05-15 23:26         ` Eric W. Biederman
  0 siblings, 1 reply; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-15 20:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel

Ingo Molnar wrote:
> Right now there's no MTRR support under Xen guests and the Xen 
> hypervisor was able to survive, right? Why should we do it under 
> dom0?
>   

Because dom0 has direct hardware access, and is running real device 
drivers.  domU guests don't see physical memory, and so MTRR has no 
relevance for them.

> The MTRR code is extremely fragile, we dont really need an added 
> layer there. _Especially_ since /proc/mtrr is an obsolete API.
>   

There's no added layer there.  I'm just adding another implementation of 
mtrr_ops.

/proc/mtrr is in wide use today.  It may be planned for obsolescence, 
but there's no way you can claim its obsolete today (my completely 
up-to-date F10 X server is using it, for example).  We don't break 
oldish usermode ABIs in new kernels.

Besides, the MTRR code is also a kernel-internal API, used by DRM and 
other drivers to configure the system MTRR state.  Those drivers will 
either perform badly or outright fail if they can't set the appropriate 
cachability properties.  That is not obsolete in any way.

> If you want to allow a guest to do MTRR ops, you can do it by 
> catching the native kernel accesses to the MTRR space. There's no 
> guest side support needed for that.
>   

MTRR can't be virtualized like that.  It can't be meaningfully 
multiplexed, and must be set in a uniform way on all physical CPUs.  
Guests run on virtual CPUs, and don't have any knowledge of what the 
mapping of VCPU to PCPU is, or even any visibility of all PCPUs.

It is not a piece of per-guest state; it is system-wide property, 
maintained by Xen.  These patches add the mechanism for dom0 (=hardware 
control domain) to tell Xen what state they should be in.

    J

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-15 20:09       ` Jeremy Fitzhardinge
@ 2009-05-15 23:26         ` Eric W. Biederman
  2009-05-15 23:49           ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2009-05-15 23:26 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, the arch/x86 maintainers, Linux Kernel Mailing List,
	Xen-devel

Jeremy Fitzhardinge <jeremy@goop.org> writes:

> Ingo Molnar wrote:
>> Right now there's no MTRR support under Xen guests and the Xen hypervisor was
>> able to survive, right? Why should we do it under dom0?
>>   
>
> Because dom0 has direct hardware access, and is running real device drivers.
> domU guests don't see physical memory, and so MTRR has no relevance for them.


>> The MTRR code is extremely fragile, we dont really need an added layer
>> there. _Especially_ since /proc/mtrr is an obsolete API.
>>   
>
> There's no added layer there.  I'm just adding another implementation of
> mtrr_ops.
>
> /proc/mtrr is in wide use today.  It may be planned for obsolescence, but
> there's no way you can claim its obsolete today (my completely up-to-date F10 X
> server is using it, for example).  We don't break oldish usermode ABIs in new
> kernels.

Sure it is.  There is a better newer replacement.  It is taking a while to
get userspace transitioned but that is different.  Honestly I am puzzled
why that it but whatever.

> Besides, the MTRR code is also a kernel-internal API, used by DRM and other
> drivers to configure the system MTRR state.  Those drivers will either perform
> badly or outright fail if they can't set the appropriate cachability properties.
> That is not obsolete in any way.

There are about 5 of them so let's fix them.
With PAT we are in a much better position both for portability and for
flexibility.

>> If you want to allow a guest to do MTRR ops, you can do it by catching the
>> native kernel accesses to the MTRR space. There's no guest side support needed
>> for that.
>>   
>
> MTRR can't be virtualized like that.  It can't be meaningfully multiplexed, and
> must be set in a uniform way on all physical CPUs.  Guests run on virtual CPUs,
> and don't have any knowledge of what the mapping of VCPU to PCPU is, or even any
> visibility of all PCPUs.
>
> It is not a piece of per-guest state; it is system-wide property, maintained by
> Xen.  These patches add the mechanism for dom0 (=hardware control domain) to
> tell Xen what state they should be in.

Is it possible to fix PAT and get that working first.   That is very definitely
the preferend API.

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-15 23:26         ` Eric W. Biederman
@ 2009-05-15 23:49           ` Jeremy Fitzhardinge
  2009-05-16  3:22             ` Jesse Barnes
  0 siblings, 1 reply; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-15 23:49 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Ingo Molnar, the arch/x86 maintainers, Linux Kernel Mailing List,
	Xen-devel

Eric W. Biederman wrote:
>> /proc/mtrr is in wide use today.  It may be planned for obsolescence, but
>> there's no way you can claim its obsolete today (my completely up-to-date F10 X
>> server is using it, for example).  We don't break oldish usermode ABIs in new
>> kernels.
>>     
>
> Sure it is.  There is a better newer replacement.  It is taking a while to
> get userspace transitioned but that is different.  Honestly I am puzzled
> why that it but whatever.
>   

There's no mention in feature-removal-schedule.txt.

>> Besides, the MTRR code is also a kernel-internal API, used by DRM and other
>> drivers to configure the system MTRR state.  Those drivers will either perform
>> badly or outright fail if they can't set the appropriate cachability properties.
>> That is not obsolete in any way.
>>     
>
> There are about 5 of them so let's fix them.
>   

Well, I count at least 30+, but anyway.

> With PAT we are in a much better position both for portability and for
> flexibility.
>   

PAT is relatively recent, and even more recently bug-free.  There are 
many people with processors which can't or won't do PAT; what's the plan 
to support them?  Just hit them with a performance regression?  Or wrap 
MTRR in some other API?

> Is it possible to fix PAT and get that working first.   That is very definitely
> the preferend API.
>   

Sure, when available.  We're sorting out the details for Xen, but even 
then it may not be available, either because we're running on an old 
version of Xen, or because some other guest is using PAT differently.

But I honestly don't understand the hostility towards 120 lines of code 
to make an interface (albeit legacy/deprecated/whatever) behave in an 
expected way.

    J

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-15 23:49           ` Jeremy Fitzhardinge
@ 2009-05-16  3:22             ` Jesse Barnes
  2009-05-16  4:26               ` Eric W. Biederman
  0 siblings, 1 reply; 20+ messages in thread
From: Jesse Barnes @ 2009-05-16  3:22 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Eric W. Biederman, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Xen-devel

On Fri, 15 May 2009 16:49:12 -0700
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Eric W. Biederman wrote:
> >> /proc/mtrr is in wide use today.  It may be planned for
> >> obsolescence, but there's no way you can claim its obsolete today
> >> (my completely up-to-date F10 X server is using it, for example).
> >> We don't break oldish usermode ABIs in new kernels.
> >>     
> >
> > Sure it is.  There is a better newer replacement.  It is taking a
> > while to get userspace transitioned but that is different.
> > Honestly I am puzzled why that it but whatever.
> >   
> 
> There's no mention in feature-removal-schedule.txt.
> 
> >> Besides, the MTRR code is also a kernel-internal API, used by DRM
> >> and other drivers to configure the system MTRR state.  Those
> >> drivers will either perform badly or outright fail if they can't
> >> set the appropriate cachability properties. That is not obsolete
> >> in any way. 
> >
> > There are about 5 of them so let's fix them.
> >   
> 
> Well, I count at least 30+, but anyway.
> 
> > With PAT we are in a much better position both for portability and
> > for flexibility.
> >   
> 
> PAT is relatively recent, and even more recently bug-free.  There are 
> many people with processors which can't or won't do PAT; what's the
> plan to support them?  Just hit them with a performance regression?
> Or wrap MTRR in some other API?
> 
> > Is it possible to fix PAT and get that working first.   That is
> > very definitely the preferend API.
> >   
> 
> Sure, when available.  We're sorting out the details for Xen, but
> even then it may not be available, either because we're running on an
> old version of Xen, or because some other guest is using PAT
> differently.
> 
> But I honestly don't understand the hostility towards 120 lines of
> code to make an interface (albeit legacy/deprecated/whatever) behave
> in an expected way.

FWIW I think supporting the MTRR API in Xen makes sense.  There's a lot
of old code out there that wants it; would be nice if it mostly worked,
especially at such a minimal cost.  It's taken awhile to get PAT going
(and there are still issues here and there) so having the MTRR stuffa
available is awfully nice.

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-16  3:22             ` Jesse Barnes
@ 2009-05-16  4:26               ` Eric W. Biederman
  2009-05-18  4:57                 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2009-05-16  4:26 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Jeremy Fitzhardinge, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Xen-devel

Jesse Barnes <jbarnes@virtuousgeek.org> writes:

> On Fri, 15 May 2009 16:49:12 -0700
> Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>> Eric W. Biederman wrote:
>> >> /proc/mtrr is in wide use today.  It may be planned for
>> >> obsolescence, but there's no way you can claim its obsolete today
>> >> (my completely up-to-date F10 X server is using it, for example).
>> >> We don't break oldish usermode ABIs in new kernels.
>> >>     
>> >
>> > Sure it is.  There is a better newer replacement.  It is taking a
>> > while to get userspace transitioned but that is different.
>> > Honestly I am puzzled why that it but whatever.
>> >   
>> 
>> There's no mention in feature-removal-schedule.txt.

I don't know that it makes sense to remove mtrrs but it certainly
doesn't make sense to use them if you can avoid it.

>> >> Besides, the MTRR code is also a kernel-internal API, used by DRM
>> >> and other drivers to configure the system MTRR state.  Those
>> >> drivers will either perform badly or outright fail if they can't
>> >> set the appropriate cachability properties. That is not obsolete
>> >> in any way. 
>> >
>> > There are about 5 of them so let's fix them.
>> >   
>> 
>> Well, I count at least 30+, but anyway.

Wow.  We had a lot of those slip in.  Definitely time to fix the
drivers.

>> > With PAT we are in a much better position both for portability and
>> > for flexibility.
>> >   
>> 
>> PAT is relatively recent, and even more recently bug-free.  There are 
>> many people with processors which can't or won't do PAT; what's the
>> plan to support them?  Just hit them with a performance regression?
>> Or wrap MTRR in some other API?

PPro is roughly when PAT came out.  I remember discussing this a while
ago and the conclusion was that there are very few systems with MTRRs
that don't have a usable PAT implementation.  I expect many of those
systems are on their last legs today.

>> Sure, when available.  We're sorting out the details for Xen, but
>> even then it may not be available, either because we're running on an
>> old version of Xen, or because some other guest is using PAT
>> differently.

There are only 3 states that are interesting.  WB UC and WC.  Since
Xen controls the page tables anyway.  I expect it can even remap
it feels like it.

>> But I honestly don't understand the hostility towards 120 lines of
>> code to make an interface (albeit legacy/deprecated/whatever) behave
>> in an expected way.

> FWIW I think supporting the MTRR API in Xen makes sense.  There's a lot
> of old code out there that wants it; would be nice if it mostly worked,
> especially at such a minimal cost.  It's taken awhile to get PAT going
> (and there are still issues here and there) so having the MTRR stuffa
> available is awfully nice.

I won't argue that having MTRRs when you can makes sense.  It is a bit
weird in a vitalized system.  At a practical level there are an
increasing number of systems for which MTRRs are unusable because the
BIOS sets up overlapping mtrrs.  With cheap entry level systems
shipping with 4G I expect it is becoming a majority of systems.

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-16  4:26               ` Eric W. Biederman
@ 2009-05-18  4:57                 ` Jeremy Fitzhardinge
  2009-05-18  8:59                   ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-18  4:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jesse Barnes, Ingo Molnar, the arch/x86 maintainers,
	Linux Kernel Mailing List, Xen-devel

Eric W. Biederman wrote:
> There are only 3 states that are interesting.  WB UC and WC.  Since
> Xen controls the page tables anyway.  I expect it can even remap
> it feels like it.
>   

It would be awkward.  A paravirtualized guest has direct access to the 
real pagetables, and so would notice if Xen swizzled around the PAT bits 
when it reads back a pagetable entry.  We don't currently have any 
paravirtualized hooks for adjusting the PTE flags, because there hasn't 
been any need, and it would probably be pretty costly (lots of 
read+bit-tests would turn into a function call).  On the other hand, 
there's probably only a few places (if any) in the kernel which actually 
inspect the PAT status of an established PTE, so we could put in some 
special case mapping there.  It becomes a maintenance burden to 1) track 
down all the right places, and then 2) make sure any new instances get 
handled properly.  So, not a preferred solution, I think.

But our planned approach is to simply make Xen use the same PAT layout 
as Linux, and go from there.  We still need to sort out the details of 
how to handle other Xen guests which use the existing Xen PAT setup, how 
to verify that Xen and the guest kernel are really using the same setup, 
etc.

But since we support the last few year's worth of released versions of 
Xen, we still need to handle the PAT-not-supported case with reasonable 
grace.

> I won't argue that having MTRRs when you can makes sense.  It is a bit
> weird in a vitalized system.

It's not really virtualized.  We're talking about dom0, which is the 
guest domain which has access to the real machine's real hardware; the 
MTRR is part of that.

>   At a practical level there are an
> increasing number of systems for which MTRRs are unusable because the
> BIOS sets up overlapping mtrrs.  With cheap entry level systems
> shipping with 4G I expect it is becoming a majority of systems.
>   

Yes, but that is true irrespective of Xen.

    J

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-18  4:57                 ` Jeremy Fitzhardinge
@ 2009-05-18  8:59                   ` Ingo Molnar
  2009-05-18 13:17                     ` [Xen-devel] " Jan Beulich
  2009-05-18 18:07                     ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 20+ messages in thread
From: Ingo Molnar @ 2009-05-18  8:59 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Eric W. Biederman, Jesse Barnes, the arch/x86 maintainers,
	Linux Kernel Mailing List, Xen-devel, Linus Torvalds,
	H. Peter Anvin, Thomas Gleixner

* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> It's not really virtualized.  We're talking about dom0, which is 
> the guest domain which has access to the real machine's real 
> hardware; the MTRR is part of that.

That is a really broken model and design of virtualization: 
splitting the hypervisor into Xen and then a separate Linux dom0 
entity because reality called home a few years ago and you needed 
actual working drivers and hardware support and a developer 
community to pull that off ...

Here Xen invades an already fragile piece of upstream code 
(/proc/mtrr) that is obsolete and on the way out. If you want a 
solution you should add PAT support to Xen and you should use recent 
upstream kernels. Or you should emulate /proc/mtrr in _Xen the 
hypervisor_, if you really care that much - without increasing the 
amount of crap in Linux.

Without a better reason than what you've given so far the answer is 
really: "no thanks" ...

My suspicion is that Linus would (rightfully) refuse to pull such a 
broken approach from me, so why should i pull it? If i'm wrong and 
if you can get an Acked-by from Linus _before_ sending a pull 
request we can override my NAK. I've Cc:-ed him, in case he wants to 
express an opinion.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-18  8:59                   ` Ingo Molnar
@ 2009-05-18 13:17                     ` Jan Beulich
  2009-05-18 18:07                     ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2009-05-18 13:17 UTC (permalink / raw)
  To: Ingo Molnar, Jeremy Fitzhardinge
  Cc: the arch/x86 maintainers, Thomas Gleixner, Linus Torvalds,
	Xen-devel, Linux Kernel Mailing List, Jesse Barnes,
	Eric W. Biederman, H. Peter Anvin

>>> Ingo Molnar <mingo@elte.hu> 18.05.09 10:59 >>>
>Here Xen invades an already fragile piece of upstream code 
>(/proc/mtrr) that is obsolete and on the way out. If you want a 
>solution you should add PAT support to Xen and you should use recent 

As Jeremy pointed out a number of times, Xen *does* have PAT support,
perhaps (that's my personal opinion) even superior to the Linux one, as
it doesn't redefine the 486-inherited caching mode attributes but rather
uses the full 3 bits that the hardware provides (and, as an
acknowledgement to the various hardware bugs, makes sure not to use
any large page mappings when using non-WB mappings).

>upstream kernels. Or you should emulate /proc/mtrr in _Xen the 
>hypervisor_, if you really care that much - without increasing the 
>amount of crap in Linux.

As Jeremy also pointed out previously, emulating the MTRRs in the
hypervisor is very undesirable (and technically at least very close to
impossible), as we're talking about the *real* MTTRs that need managing
here (whereas dealing with virtualized MTRRs in a fully virtualized guest
is a completely different - and very reasonable - thing).

I can only support Jeremy in asking that you please reconsider your NAK.

Jan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-18  8:59                   ` Ingo Molnar
  2009-05-18 13:17                     ` [Xen-devel] " Jan Beulich
@ 2009-05-18 18:07                     ` Jeremy Fitzhardinge
  2009-05-19  9:59                       ` Ingo Molnar
  1 sibling, 1 reply; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-18 18:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric W. Biederman, Jesse Barnes, the arch/x86 maintainers,
	Linux Kernel Mailing List, Xen-devel, Linus Torvalds,
	H. Peter Anvin, Thomas Gleixner

Ingo Molnar wrote:
> Here Xen invades an already fragile piece of upstream code 
> (/proc/mtrr) that is obsolete and on the way out. If you want a 
> solution you should add PAT support to Xen and you should use recent 
> upstream kernels. Or you should emulate /proc/mtrr in _Xen the 
> hypervisor_, if you really care that much - without increasing the 
> amount of crap in Linux.
>   

That's a gross mis-characterisation of what we're talking about here.

arch/x86 already defines an mtrr_ops, which defines how to manipulate 
the MTRR registers.  There are currently several implementations of that 
interface.  In Xen the MTRR registers belong to the hypervisor, but it 
allows a privileged kernel to modify them via hypercalls.  I simply 
added a new, straightforward mtrr_ops implementation to do that.  It 
adds about 120 lines of new code, in a single mtrr/xen.c file.

That's it.  I could add any number of bizarre convolutions to achieve 
the same effect, but given that there's an existing interface that is 
exactly designed for what we want to achieve, I have to admit it didn't 
occur to me to do anything else.

MTRR may well be on its way out, and PAT is the proper way to achieve 
the same effect from now on.  But MTRR is still a widely used 
kernel-internal API as well as usermode ABI, and it seems just 
gratuitous to not support it when doing so is such a low-impact change.  
And if/when it comes to be time to completely deprecate/remove mtrr 
support, we can delete it along with everything else.

I'm honestly at a loss to explain the vehemence of your objection to 
these changes given their nature.

We're also working on making PAT support work where possible.  But that 
doesn't mean we want to do without anything in the meantime.

But more generally, we need to work out how to move things forward.

That said, we can live without.  If these MTRR patches are your biggest 
concern, drop them, pull the rest so we can get them seen, tested, in 
linux-next, etc, ready for the next merge window.  You know, like you 
said you wanted to do the last time you blocked the Xen patches.

I would prefer to move them through tip.git: I appreciate your 
(constructive) comments and reviews, the testing infrastructure has 
proved very valuable in the past, and they'll generally get wider 
exposure than my pool of testers.  And its just the right way to do it.

But if you're so concerned that they'll simply give Linus more 
ammunition to beat you up with, to the extent that you're 
second-guessing yourself, then I'm perfectly happy to submit them 
myself.  I don't think it would be an overall better outcome, but it 
keeps the heat off you, and we'd be making some progress.

    J

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-18 18:07                     ` Jeremy Fitzhardinge
@ 2009-05-19  9:59                       ` Ingo Molnar
  2009-05-19 10:22                         ` [Xen-devel] " Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2009-05-19  9:59 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Eric W. Biederman, Jesse Barnes, the arch/x86 maintainers,
	Linux Kernel Mailing List, Xen-devel, Linus Torvalds,
	H. Peter Anvin, Thomas Gleixner


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Ingo Molnar wrote:
>> Here Xen invades an already fragile piece of upstream code  
>> (/proc/mtrr) that is obsolete and on the way out. If you want a  
>> solution you should add PAT support to Xen and you should use recent  
>> upstream kernels. Or you should emulate /proc/mtrr in _Xen the  
>> hypervisor_, if you really care that much - without increasing the  
>> amount of crap in Linux.
>>   
>
> That's a gross mis-characterisation of what we're talking about here.
>
> arch/x86 already defines an mtrr_ops, which defines how to 
> manipulate the MTRR registers.  There are currently several 
> implementations of that interface.  In Xen the MTRR registers 
> belong to the hypervisor, but it allows a privileged kernel to 
> modify them via hypercalls.  I simply added a new, straightforward 
> mtrr_ops implementation to do that.  It adds about 120 lines of 
> new code, in a single mtrr/xen.c file.
>
> That's it.  I could add any number of bizarre convolutions to 
> achieve the same effect, but given that there's an existing 
> interface that is exactly designed for what we want to achieve, I 
> have to admit it didn't occur to me to do anything else.

Exactly what is 'bizarre' about using the API defined by the _CPU_ 
already, without adding any ad-hoc hypecall? Catch the dom0 WRMSRs, 
filter out the MTRR indices - that's it.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19  9:59                       ` Ingo Molnar
@ 2009-05-19 10:22                         ` Jan Beulich
  2009-05-19 11:08                           ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2009-05-19 10:22 UTC (permalink / raw)
  To: Ingo Molnar, Jeremy Fitzhardinge
  Cc: the arch/x86 maintainers, Thomas Gleixner, Linus Torvalds,
	Xen-devel, Linux Kernel Mailing List, Jesse Barnes,
	Eric W. Biederman, H. Peter Anvin

>>> Ingo Molnar <mingo@elte.hu> 19.05.09 11:59 >>>
>Exactly what is 'bizarre' about using the API defined by the _CPU_ 
>already, without adding any ad-hoc hypecall? Catch the dom0 WRMSRs, 
>filter out the MTRR indices - that's it.

But that is *not* the same as using the hypercalls: The hypercall tells Xen
"Change all CPUs' MTRRs with the indicated index to the indicated value",
while the MSR write says "Change the MTRR with the given index on the
physical CPU the current virtual CPU happens to run on to the given value".
A write-base/write-mask pair may happen to get interrupted (preempted)
by the hypervisor, and hence the two writes may happen on different
pCPU-s. Teaching the hypervisor to (correctly!) guess what the guest
meant in that situation isn't trivial, as then it needs to handle all possible
situations (and it can never know whether Dom0 really intended to do
something that may look bogus/inconsistent at the first glance). This is
even more so considering that Linux is not the only OS capable of
running as Dom0.

An apparently relatively simple solution - latch the writes and commit them
only when the global view became consistent again - isn't possible,
since Dom0 may not have a vCPU running on each individual pCPU.

Jan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 10:22                         ` [Xen-devel] " Jan Beulich
@ 2009-05-19 11:08                           ` Ingo Molnar
  2009-05-19 12:04                             ` Gerd Hoffmann
  2009-05-20 16:35                             ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 20+ messages in thread
From: Ingo Molnar @ 2009-05-19 11:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Jeremy Fitzhardinge, the arch/x86 maintainers, Thomas Gleixner,
	Linus Torvalds, Xen-devel, Linux Kernel Mailing List,
	Jesse Barnes, Eric W. Biederman, H. Peter Anvin


* Jan Beulich <JBeulich@novell.com> wrote:

> >>> Ingo Molnar <mingo@elte.hu> 19.05.09 11:59 >>>
>
> > Exactly what is 'bizarre' about using the API defined by the 
> > _CPU_ already, without adding any ad-hoc hypecall? Catch the 
> > dom0 WRMSRs, filter out the MTRR indices - that's it.
> 
> But that is *not* the same as using the hypercalls: The hypercall 
> tells Xen "Change all CPUs' MTRRs with the indicated index to the 
> indicated value", while the MSR write says "Change the MTRR with 
> the given index on the physical CPU the current virtual CPU 
> happens to run on to the given value". [...]

The change of MTRR's on _any_ of the guest CPUs in a dom0 context 
should immediately be refected on all CPUs. Assymetric MTRR settings 
are madness.

( And the thing is, changing MTRRs is fragile and racy on native 
  Linux no matter what - even without any hypervisors - due to SMM 
  contexts possibly relying on them etc. )

> [...] A write-base/write-mask pair may happen to get interrupted 
> (preempted) by the hypervisor, and hence the two writes may happen 
> on different pCPU-s. Teaching the hypervisor to (correctly!) guess 
> what the guest meant in that situation isn't trivial, as then it 
> needs to handle all possible situations (and it can never know 
> whether Dom0 really intended to do something that may look 
> bogus/inconsistent at the first glance). [...]

None of this is a problem really if a sane approach is used: a 
change to the MTRR state on dom0 is applied symmetrically on all 
CPUs.

Or, alternatively, the hypervisor can expose its own administrative 
interface to manage MTRRs.

There's no need to fuglify the Linux kernel for that.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 11:08                           ` Ingo Molnar
@ 2009-05-19 12:04                             ` Gerd Hoffmann
  2009-05-19 12:26                               ` Ingo Molnar
  2009-05-20 16:35                             ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 20+ messages in thread
From: Gerd Hoffmann @ 2009-05-19 12:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
	Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds

On 05/19/09 13:08, Ingo Molnar wrote:
> Or, alternatively, the hypervisor can expose its own administrative
> interface to manage MTRRs.

Guess what?  Xen does exactly that.  And the xen mtrr_ops implementation 
uses that interface ...

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 12:04                             ` Gerd Hoffmann
@ 2009-05-19 12:26                               ` Ingo Molnar
  2009-05-19 12:32                                 ` Alan Cox
  2009-05-19 13:21                                 ` Gerd Hoffmann
  0 siblings, 2 replies; 20+ messages in thread
From: Ingo Molnar @ 2009-05-19 12:26 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
	Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds


* Gerd Hoffmann <kraxel@redhat.com> wrote:

> On 05/19/09 13:08, Ingo Molnar wrote:
>> Or, alternatively, the hypervisor can expose its own administrative
>> interface to manage MTRRs.
>
> Guess what?  Xen does exactly that.  And the xen mtrr_ops 
> implementation uses that interface ...

No, that is not an 'administrative interface' - that is a guest 
kernel level hack that complicates Linux, extends its effective ABI 
dependencies and which has to be maintained there from that point 
on.

There's really just three proper technical solutions here:

- either catch the lowlevel CPU hw ops (the MSR modifications, which
  isnt really all that different from the mtrr_ops approach so it
  shouldnt pose undue difficulties to the Xen hypervisor). That will
  be maximally transparent and compatible, with zero changes needed 
  to the Linux kernel.

- or introduce its own hypercall API based administration API,
  without bothering the guest kernel with crap. Trivially patch Xorg
  to make use of it and that's it.

There is absolutely no reason to introduce some intermediate crap 
into Linux.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 12:26                               ` Ingo Molnar
@ 2009-05-19 12:32                                 ` Alan Cox
  2009-05-19 12:37                                   ` Ingo Molnar
  2009-05-19 13:21                                 ` Gerd Hoffmann
  1 sibling, 1 reply; 20+ messages in thread
From: Alan Cox @ 2009-05-19 12:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gerd Hoffmann, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
	Jan Beulich, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Eric W. Biederman

> - or introduce its own hypercall API based administration API,
>   without bothering the guest kernel with crap. Trivially patch Xorg
>   to make use of it and that's it.

PCI pass through mixed with that isn't going to be fun and PCI pass
through is one area where a lot of the MTRR manipulation is meaningful
and valid (and can be handled for the guest).

I really don't see why rd/wrmsr processing isn't sufficient for this

Alan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 12:32                                 ` Alan Cox
@ 2009-05-19 12:37                                   ` Ingo Molnar
  0 siblings, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2009-05-19 12:37 UTC (permalink / raw)
  To: Alan Cox
  Cc: Gerd Hoffmann, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
	Jan Beulich, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Eric W. Biederman


* Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> > - or introduce its own hypercall API based administration API,
> >   without bothering the guest kernel with crap. Trivially patch Xorg
> >   to make use of it and that's it.
> 
> PCI pass through mixed with that isn't going to be fun and PCI 
> pass through is one area where a lot of the MTRR manipulation is 
> meaningful and valid (and can be handled for the guest).

virtualizing that aspect of the CPU properly is certainly the 
highest grade approach.

> I really don't see why rd/wrmsr processing isn't sufficient for 
> this

/me neither.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 12:26                               ` Ingo Molnar
  2009-05-19 12:32                                 ` Alan Cox
@ 2009-05-19 13:21                                 ` Gerd Hoffmann
  2009-05-19 13:31                                   ` Ingo Molnar
  1 sibling, 1 reply; 20+ messages in thread
From: Gerd Hoffmann @ 2009-05-19 13:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
	Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds

On 05/19/09 14:26, Ingo Molnar wrote:
> * Gerd Hoffmann<kraxel@redhat.com>  wrote:
>
>> On 05/19/09 13:08, Ingo Molnar wrote:
>>> Or, alternatively, the hypervisor can expose its own administrative
>>> interface to manage MTRRs.
>> Guess what?  Xen does exactly that.  And the xen mtrr_ops
>> implementation uses that interface ...
>
> No, that is not an 'administrative interface' - that is a guest
> kernel level hack that complicates Linux, extends its effective ABI
> dependencies and which has to be maintained there from that point
> on.
>
> There's really just three proper technical solutions here:
>
> - either catch the lowlevel CPU hw ops (the MSR modifications, which
>    isnt really all that different from the mtrr_ops approach so it
>    shouldnt pose undue difficulties to the Xen hypervisor).

Devil is in the details.

The dom0 kernel might not see all physical cpus on the system.  So Xen 
can't leave the job of looping over all cpus to the dom0 kernel, Xen has 
to apply the changes made by the (priviledged) guest kernel on any 
(virtual) cpu to all (physical) cpus in the machine.

Which in turn means the "lowlevel cpu hw op" would work in a slightly 
different way on Xen and native.  Nasty.

>    That will
>    be maximally transparent and compatible, with zero changes needed
>    to the Linux kernel.

No, the linux kernel probably should do the wrmsr on one cpu only then.

> - or introduce its own hypercall API based administration API,
>    without bothering the guest kernel with crap. Trivially patch Xorg
>    to make use of it and that's it.

I have serious doubts that this is going to fly with KMS.

Oops, the third "proper technical solutions" is missing.

cheers,
   Gerd


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 13:21                                 ` Gerd Hoffmann
@ 2009-05-19 13:31                                   ` Ingo Molnar
  2009-05-19 13:51                                     ` Gerd Hoffmann
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2009-05-19 13:31 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
	Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds


* Gerd Hoffmann <kraxel@redhat.com> wrote:

> On 05/19/09 14:26, Ingo Molnar wrote:
>> * Gerd Hoffmann<kraxel@redhat.com>  wrote:
>>
>>> On 05/19/09 13:08, Ingo Molnar wrote:
>>>> Or, alternatively, the hypervisor can expose its own administrative
>>>> interface to manage MTRRs.
>>> Guess what?  Xen does exactly that.  And the xen mtrr_ops
>>> implementation uses that interface ...
>>
>> No, that is not an 'administrative interface' - that is a guest
>> kernel level hack that complicates Linux, extends its effective ABI
>> dependencies and which has to be maintained there from that point
>> on.
>>
>> There's really just three proper technical solutions here:
>>
>> - either catch the lowlevel CPU hw ops (the MSR modifications, which
>>    isnt really all that different from the mtrr_ops approach so it
>>    shouldnt pose undue difficulties to the Xen hypervisor).
>
> Devil is in the details.
>
> The dom0 kernel might not see all physical cpus on the system.  So 
> Xen can't leave the job of looping over all cpus to the dom0 
> kernel, Xen has to apply the changes made by the (priviledged) 
> guest kernel on any (virtual) cpu to all (physical) cpus in the 
> machine.

Applying MTRR changes to only part of the CPUs is utter madness.

> Which in turn means the "lowlevel cpu hw op" would work in a 
> slightly different way on Xen and native.  Nasty.
>
>>    That will
>>    be maximally transparent and compatible, with zero changes needed
>>    to the Linux kernel.
>
> No, the linux kernel probably should do the wrmsr on one cpu only then.

Why?

>> - or introduce its own hypercall API based administration API,
>>    without bothering the guest kernel with crap. Trivially patch Xorg
>>    to make use of it and that's it.
>
> I have serious doubts that this is going to fly with KMS.
>
> Oops, the third "proper technical solutions" is missing.

Yeah, the third one is to not touch MTRRs after bootup and use PAT.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 13:31                                   ` Ingo Molnar
@ 2009-05-19 13:51                                     ` Gerd Hoffmann
  2009-05-19 14:17                                       ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Gerd Hoffmann @ 2009-05-19 13:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
	Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds

On 05/19/09 15:31, Ingo Molnar wrote:
> * Gerd Hoffmann<kraxel@redhat.com>  wrote:
>
>> On 05/19/09 14:26, Ingo Molnar wrote:
>>> * Gerd Hoffmann<kraxel@redhat.com>   wrote:
>>>
>>>> On 05/19/09 13:08, Ingo Molnar wrote:
>>>>> Or, alternatively, the hypervisor can expose its own administrative
>>>>> interface to manage MTRRs.
>>>> Guess what?  Xen does exactly that.  And the xen mtrr_ops
>>>> implementation uses that interface ...
>>> No, that is not an 'administrative interface' - that is a guest
>>> kernel level hack that complicates Linux, extends its effective ABI
>>> dependencies and which has to be maintained there from that point
>>> on.
>>>
>>> There's really just three proper technical solutions here:
>>>
>>> - either catch the lowlevel CPU hw ops (the MSR modifications, which
>>>     isnt really all that different from the mtrr_ops approach so it
>>>     shouldnt pose undue difficulties to the Xen hypervisor).
>> Devil is in the details.
>>
>> The dom0 kernel might not see all physical cpus on the system.  So
>> Xen can't leave the job of looping over all cpus to the dom0
>> kernel, Xen has to apply the changes made by the (priviledged)
>> guest kernel on any (virtual) cpu to all (physical) cpus in the
>> machine.
>
> Applying MTRR changes to only part of the CPUs is utter madness.

Sure.  Do you read what I'm writing?

>> Which in turn means the "lowlevel cpu hw op" would work in a
>> slightly different way on Xen and native.  Nasty.
>>
>>>     That will
>>>     be maximally transparent and compatible, with zero changes needed
>>>     to the Linux kernel.
>> No, the linux kernel probably should do the wrmsr on one cpu only then.
>
> Why?

See above.  Xen has to apply the changes to all cpus anyway.

>> Oops, the third "proper technical solutions" is missing.
>
> Yeah, the third one is to not touch MTRRs after bootup and use PAT.

Works only in case the CPU has PAT support.

cheers,
   Gerd


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 13:51                                     ` Gerd Hoffmann
@ 2009-05-19 14:17                                       ` Ingo Molnar
  2009-05-19 14:55                                         ` Gerd Hoffmann
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2009-05-19 14:17 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
	Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds


* Gerd Hoffmann <kraxel@redhat.com> wrote:

> On 05/19/09 15:31, Ingo Molnar wrote:
>> * Gerd Hoffmann<kraxel@redhat.com>  wrote:
>>
>>> On 05/19/09 14:26, Ingo Molnar wrote:
>>>> * Gerd Hoffmann<kraxel@redhat.com>   wrote:
>>>>
>>>>> On 05/19/09 13:08, Ingo Molnar wrote:
>>>>>> Or, alternatively, the hypervisor can expose its own administrative
>>>>>> interface to manage MTRRs.
>>>>> Guess what?  Xen does exactly that.  And the xen mtrr_ops
>>>>> implementation uses that interface ...
>>>> No, that is not an 'administrative interface' - that is a guest
>>>> kernel level hack that complicates Linux, extends its effective ABI
>>>> dependencies and which has to be maintained there from that point
>>>> on.
>>>>
>>>> There's really just three proper technical solutions here:
>>>>
>>>> - either catch the lowlevel CPU hw ops (the MSR modifications, which
>>>>     isnt really all that different from the mtrr_ops approach so it
>>>>     shouldnt pose undue difficulties to the Xen hypervisor).
>>> Devil is in the details.
>>>
>>> The dom0 kernel might not see all physical cpus on the system.  So
>>> Xen can't leave the job of looping over all cpus to the dom0
>>> kernel, Xen has to apply the changes made by the (priviledged)
>>> guest kernel on any (virtual) cpu to all (physical) cpus in the
>>> machine.
>>
>> Applying MTRR changes to only part of the CPUs is utter madness.
>
> Sure.  Do you read what I'm writing?
>
>>> Which in turn means the "lowlevel cpu hw op" would work in a
>>> slightly different way on Xen and native.  Nasty.
>>>
>>>>     That will
>>>>     be maximally transparent and compatible, with zero changes needed
>>>>     to the Linux kernel.
>>> No, the linux kernel probably should do the wrmsr on one cpu only then.
>>
>> Why?
>
> See above.  Xen has to apply the changes to all cpus anyway.

do _you_ read what i wrote, in the thread you are replying to:

|
| The change of MTRR's on _any_ of the guest CPUs in a dom0 context 
| should immediately be refected on all CPUs. Assymetric MTRR 
| settings are madness.
|


>>> Oops, the third "proper technical solutions" is missing.
>>
>> Yeah, the third one is to not touch MTRRs after bootup and use PAT.
>
> Works only in case the CPU has PAT support.

Which specific CPU without PAT support do you worry about?

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 14:17                                       ` Ingo Molnar
@ 2009-05-19 14:55                                         ` Gerd Hoffmann
  2009-05-19 15:24                                           ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Gerd Hoffmann @ 2009-05-19 14:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
	Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds

>>>> No, the linux kernel probably should do the wrmsr on one cpu only then.
>>> Why?

> | The change of MTRR's on _any_ of the guest CPUs in a dom0 context
> | should immediately be refected on all CPUs. Assymetric MTRR
> | settings are madness.

Exactly.  And thats why it is pointless to let the dom0 kernel write the 
mtrr msrs on more than one cpu.

>>>> Oops, the third "proper technical solutions" is missing.
>>> Yeah, the third one is to not touch MTRRs after bootup and use PAT.
>> Works only in case the CPU has PAT support.
>
> Which specific CPU without PAT support do you worry about?

For example:  I have a Notebook here, with MTRR and without PAT 
according to the boot log.  "Pentium III (Coppermine)" says /proc/cpuinfo.

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 14:55                                         ` Gerd Hoffmann
@ 2009-05-19 15:24                                           ` Ingo Molnar
  2009-05-20  8:01                                             ` Gerd Hoffmann
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2009-05-19 15:24 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
	Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds

* Gerd Hoffmann <kraxel@redhat.com> wrote:

>>>>> No, the linux kernel probably should do the wrmsr on one cpu only then.
>>>> Why?
>
>> | The change of MTRR's on _any_ of the guest CPUs in a dom0 context
>> | should immediately be refected on all CPUs. Assymetric MTRR
>> | settings are madness.
>
> Exactly.  And thats why it is pointless to let the dom0 kernel 
> write the mtrr msrs on more than one cpu.

How does this negate my claim that the Linux kernel needs no 
modifications? (which i think your point is - let me know if you 
have some other point here.)

the Xen hypervisor can simply repeat all requests (i.e. not care at 
all about the fact that a guest does these modifications on all CPUs 
it sees), or realize that the modification has already been done and 
skip it. This is in no way a performance sensitive codepath - mtrrs 
are only modified in init sequences - and setting mtrrs is slow and 
globally serialized to begin with.

>>>>> Oops, the third "proper technical solutions" is missing.
>>>> Yeah, the third one is to not touch MTRRs after bootup and use PAT.
>>> Works only in case the CPU has PAT support.
>>
>> Which specific CPU without PAT support do you worry about?
>
> For example: I have a Notebook here, with MTRR and without PAT 
> according to the boot log.  "Pentium III (Coppermine)" says 
> /proc/cpuinfo.

That's a really old CPU, but even Coppermine has PAT support in the 
CPU. You need to go back to things like P5 200 MHz CPUs to find 
PAT-less CPUs.

On the Linux side it's easy to enable it (and _such_ a patch would 
make sense indeed) - it's just that nobody has yet gone through the 
effort of testing and validating the PAT code on that CPU. If you 
care enough, you can do it, send a patch and ping the Intel folks 
about it.

Once the issues are framed correctly, Linux is helped for real.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 15:24                                           ` Ingo Molnar
@ 2009-05-20  8:01                                             ` Gerd Hoffmann
  0 siblings, 0 replies; 20+ messages in thread
From: Gerd Hoffmann @ 2009-05-20  8:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jan Beulich, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List, Jesse Barnes,
	Eric W. Biederman, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds

On 05/19/09 17:24, Ingo Molnar wrote:
> the Xen hypervisor can simply repeat all requests (i.e. not care at
> all about the fact that a guest does these modifications on all CPUs
> it sees), or realize that the modification has already been done and
> skip it.

Could be done, yes.  It still feels wrong that wrmsr(mtrr) works 
slightly different on xen and on native.  And it wouldn't work on 
existing Xen deployments as the Xen hypervisor doesn't support that today.

>>>>> Yeah, the third one is to not touch MTRRs after bootup and use PAT.

> That's a really old CPU, but even Coppermine has PAT support in the
> CPU. You need to go back to things like P5 200 MHz CPUs to find
> PAT-less CPUs.

Linux shouln't say "PAT not supported by CPU." then.

Also it doesn't make sense to me to handle things differently on native 
and xen.  While it might make sense to deprecate mtrrs in favor of PAT 
(don't know enougth about all the different cpus in the wild to justify 
that) I don't think it makes sense to do that for xen only.  Native 
should declare mtrrs obsolete as well.

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation
  2009-05-19 11:08                           ` Ingo Molnar
  2009-05-19 12:04                             ` Gerd Hoffmann
@ 2009-05-20 16:35                             ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 20+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-20 16:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jan Beulich, the arch/x86 maintainers, Thomas Gleixner,
	Linus Torvalds, Xen-devel, Linux Kernel Mailing List,
	Jesse Barnes, Eric W. Biederman, H. Peter Anvin

Ingo Molnar wrote:
> * Jan Beulich <JBeulich@novell.com> wrote:
>
>   
>>>>> Ingo Molnar <mingo@elte.hu> 19.05.09 11:59 >>>
>>>>>           
>>> Exactly what is 'bizarre' about using the API defined by the 
>>> _CPU_ already, without adding any ad-hoc hypecall? Catch the 
>>> dom0 WRMSRs, filter out the MTRR indices - that's it.
>>>       
>> But that is *not* the same as using the hypercalls: The hypercall 
>> tells Xen "Change all CPUs' MTRRs with the indicated index to the 
>> indicated value", while the MSR write says "Change the MTRR with 
>> the given index on the physical CPU the current virtual CPU 
>> happens to run on to the given value". [...]
>>     
>
> The change of MTRR's on _any_ of the guest CPUs in a dom0 context 
> should immediately be refected on all CPUs. Assymetric MTRR settings 
> are madness.
>
> ( And the thing is, changing MTRRs is fragile and racy on native 
>   Linux no matter what - even without any hypervisors - due to SMM 
>   contexts possibly relying on them etc. )
>
>   
>> [...] A write-base/write-mask pair may happen to get interrupted 
>> (preempted) by the hypervisor, and hence the two writes may happen 
>> on different pCPU-s. Teaching the hypervisor to (correctly!) guess 
>> what the guest meant in that situation isn't trivial, as then it 
>> needs to handle all possible situations (and it can never know 
>> whether Dom0 really intended to do something that may look 
>> bogus/inconsistent at the first glance). [...]
>>     
>
> None of this is a problem really if a sane approach is used: a 
> change to the MTRR state on dom0 is applied symmetrically on all 
> CPUs.
>
> Or, alternatively, the hypervisor can expose its own administrative 
> interface to manage MTRRs.
>
> There's no need to fuglify the Linux kernel for that.

I'm not sure what you mean by that, other than as a description of the 
current case.  The Xen MTRR hypercall:

   1. treats MTRR ranges as allocatable resources, and keep track of how
      many uses there are of each
   2. updates all physical cpus synchronously (ie, the MTRR is not
      presented as a property of dom0's virtual CPU, but as a
      system-wide resource)
   3. prevents guests from setting inconsistent or conflicting MTRRs

Mapping from MSR writes to this interface is moderately complex, because 
it requires a mapping from a low-semantic-content interface to a 
high-semantic-content interface.  It essentially requires parsing the 
MSR writes to map them back to the relatively high-level operations at 
the mtrr_ops interface and then present that to Xen.

There are at least a couple of secondary issues which arise from that 
approach:

    * mtrr/generic.c also has to do a number of other things like
      disabling caching, tlb flushes, etc.  That adds complexity because
      Xen guests are never allowed to globally disable caching, so we'd
      have to add additional filtering to remove those cr0 writes
    * As we've discussed, we'd need to make the mtrr writes implicitly
      change all cpus atomically, as the dom0 kernel can't see physical cpus


The net effect would be that we would be making a pile of apparently 
generic CPU operations (MSR writes, control register writes) actually 
feed a fairly complex parser, increasing the difference between the Xen 
and native cases even more.

mtrr/generic.c about 730 lines of fairly intricate arch-specific code.  
mtrr/xen.c is 120 lines of straightforward hypercalls.  The mtrr_ops 
interface and the Xen hypercall interface are a close semantic match, so 
there's very little glue code in there.


But that said, this a huge distraction, an unbelievable amount of noise 
for a fairly minor point.  We can live without these changes, and 
they're certainly easy enough to carry out of tree in the meantime.  If 
you can't live with these changes, then drop them and we'll work out 
something else.

    J

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2009-05-25 16:06 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-05-25  8:42 [Xen-devel] Re: [GIT PULL] xen /proc/mtrr implementation Tomasz Chmielewski
2009-05-25  9:15 ` Andi Kleen
2009-05-25  9:31   ` Jan Beulich
2009-05-25  9:47     ` Andi Kleen
2009-05-25 16:05   ` H. Peter Anvin
  -- strict thread matches above, loose matches on Subject: below --
2009-05-12 23:27 Jeremy Fitzhardinge
2009-05-13 13:30 ` Ingo Molnar
2009-05-13 14:39   ` Jeremy Fitzhardinge
2009-05-15 18:27     ` Ingo Molnar
2009-05-15 20:09       ` Jeremy Fitzhardinge
2009-05-15 23:26         ` Eric W. Biederman
2009-05-15 23:49           ` Jeremy Fitzhardinge
2009-05-16  3:22             ` Jesse Barnes
2009-05-16  4:26               ` Eric W. Biederman
2009-05-18  4:57                 ` Jeremy Fitzhardinge
2009-05-18  8:59                   ` Ingo Molnar
2009-05-18 13:17                     ` [Xen-devel] " Jan Beulich
2009-05-18 18:07                     ` Jeremy Fitzhardinge
2009-05-19  9:59                       ` Ingo Molnar
2009-05-19 10:22                         ` [Xen-devel] " Jan Beulich
2009-05-19 11:08                           ` Ingo Molnar
2009-05-19 12:04                             ` Gerd Hoffmann
2009-05-19 12:26                               ` Ingo Molnar
2009-05-19 12:32                                 ` Alan Cox
2009-05-19 12:37                                   ` Ingo Molnar
2009-05-19 13:21                                 ` Gerd Hoffmann
2009-05-19 13:31                                   ` Ingo Molnar
2009-05-19 13:51                                     ` Gerd Hoffmann
2009-05-19 14:17                                       ` Ingo Molnar
2009-05-19 14:55                                         ` Gerd Hoffmann
2009-05-19 15:24                                           ` Ingo Molnar
2009-05-20  8:01                                             ` Gerd Hoffmann
2009-05-20 16:35                             ` Jeremy Fitzhardinge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).