* Re: [PATCH] xen: core dom0 support
@ 2009-03-11 19:58 devzero
2009-03-14 1:08 ` Morten P.D. Stevens
0 siblings, 1 reply; 49+ messages in thread
From: devzero @ 2009-03-11 19:58 UTC (permalink / raw)
To: linux-kernel; +Cc: Jeremy Fitzhardinge
so - after a lot of discussion about this, can we expect linux to get dom0 support or not?
besides vmware, xen is the most important and popular virtualization solution, and it has existed for years now.
there are still many (new) processors around which have NO virtualization support, and if i look at the current market share of kvm-based virtualization solutions, i wonder why xen vs. kvm is being discussed at all.
is dom0 support really that intrusive, especially in comparison to domU support (which is already in)?
i think dom0 AND domU support is what users want.....
>to patched up (old) kernels for dom0. This is a real problem because
>all the drivers for interesting new devices are in the new kernels, so
>there's an additional burden of backporting device support into old kernels.
>
> J
yes. i have already had big trouble installing the xenserver enterprise product on several machines now, and this really sucks!
it's that old game: new machine, windows - no problem. linux? sles? rhel? driver disk? doh!
esx sucks because of its proprietary, mangled linux drivers, where half of the ones i need are entirely missing, and xenserver is no better, as it is using an outdated dom0 kernel.....
regards
roland
^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH] xen: core dom0 support
From: Morten P.D. Stevens @ 2009-03-14 1:08 UTC (permalink / raw)
To: linux-kernel; +Cc: Jeremy Fitzhardinge, devzero, akpm, torvalds

> so - after a lot of discussion about this, can we expect linux to get dom0 support or not?

@ Andrew & Linus

Please support linux with xen dom0 support. Linux and xen in combination is a great virtualization technology. The performance is great, and thousands of companies are using xen with linux.

> i think dom0 AND domU support is what users want.....

Yes, i think so, too. Xen dom0 is only supported on linux 2.6.18, and this will be a big problem in the future with new server hardware. For the users and big companies which are using xen, it's very important to support xen as dom0 in future linux versions.

Thanks a lot, and I'm hopeful that linux will support xen as dom0 :-)

Best regards,

Morten

^ permalink raw reply	[flat|nested] 49+ messages in thread
* [PATCH] xen: core dom0 support
@ 2009-02-28 1:59 Jeremy Fitzhardinge
2009-02-28 5:28 ` Andrew Morton
0 siblings, 1 reply; 49+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-28 1:59 UTC (permalink / raw)
To: H. Peter Anvin
Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel
Hi,
This series implements the core parts of Xen dom0 support; that is, just
enough to get the kernel started when booted by Xen as a dom0 kernel.
The Xen dom0 kernel runs as a normal paravirtualized Xen kernel, but
it also has the additional responsibility for managing all the machine's
hardware, as Xen itself has almost no internal driver support (it barely
even knows about PCI).
This series includes:
- setting up a Xen hvc console
- initializing Xenbus
- enabling IO permissions for the kernel
- MTRR setup hooks
- using _PAGE_IOMAP to allow direct hardware mappings
- adding a paravirt op for page_is_ram, to allow Xen to exclude granted pages (see the sketch below)
- enabling the use of a VGA console
Not included in this series are the hooks into apic setup; that's next.
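As a purely illustrative aside (not part of the series itself): the page_is_ram paravirt op mentioned in the list above boils down to a function pointer that the native kernel points at its own page_is_ram logic and that Xen can override so granted (foreign) pages are never treated as ordinary RAM. A minimal, self-contained userspace sketch of that dispatch idea, with invented names and pfn ranges, might look like this:

```c
/*
 * Hypothetical, simplified sketch of the paravirt-op idea behind
 * page_is_ram.  None of these names are the real kernel symbols.
 */
#include <stdbool.h>
#include <stdio.h>

struct pv_mmu_ops_sketch {
	bool (*page_is_ram)(unsigned long pfn);
};

/* Native backend: pretend pfns below 1000 are RAM described by the e820 map. */
static bool native_page_is_ram(unsigned long pfn)
{
	return pfn < 1000;
}

/* Pretend pfns 500-599 are granted in from another domain. */
static bool xen_pfn_is_granted(unsigned long pfn)
{
	return pfn >= 500 && pfn < 600;
}

/* Xen backend: granted pages must not be reported as ordinary RAM. */
static bool xen_page_is_ram(unsigned long pfn)
{
	if (xen_pfn_is_granted(pfn))
		return false;
	return native_page_is_ram(pfn);
}

static struct pv_mmu_ops_sketch pv_mmu_ops = {
	.page_is_ram = native_page_is_ram,
};

int main(void)
{
	/* "Booting under Xen": patch the op once; callers never change. */
	pv_mmu_ops.page_is_ram = xen_page_is_ram;

	printf("pfn 123 is ram: %d\n", pv_mmu_ops.page_is_ram(123));  /* 1 */
	printf("pfn 550 is ram: %d\n", pv_mmu_ops.page_is_ram(550));  /* 0 */
	return 0;
}
```

The actual patch wires the hook through the kernel's existing paravirt-ops machinery rather than a standalone struct like this one; the sketch only shows the shape of the indirection.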
This may be pulled from:
The following changes since commit cc2f3b455c8efa01c66b8e66df8aad1da9310901:
Ingo Molnar (1):
Merge branch 'sched/urgent'
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git push/xen/dom0/core
Ian Campbell (4):
xen: disable PAT
xen/dom0: Use host E820 map
xen: implement XENMEM_machphys_mapping
xen: clear reserved bits in l3 entries given in the initial pagetables
Jeremy Fitzhardinge (6):
xen dom0: Make hvc_xen console work for dom0.
xen-dom0: only selectively disable cpu features
xen/dom0: use _PAGE_IOMAP in ioremap to do machine mappings
paravirt/xen: add pvop for page_is_ram
xen/dom0: add XEN_DOM0 config option
xen: allow enable use of VGA console on dom0
Juan Quintela (2):
xen dom0: Initialize xenbus for dom0.
xen dom0: Set up basic IO permissions for dom0.
Mark McLoughlin (5):
xen mtrr: Use specific cpu_has_foo macros instead of generic cpu_has()
xen mtrr: Kill some unneccessary includes
xen mtrr: Use generic_validate_add_page()
xen mtrr: Implement xen_get_free_region()
xen mtrr: Add xen_{get,set}_mtrr() implementations
Stephen Tweedie (2):
xen dom0: Add support for the platform_ops hypercall
xen mtrr: Add mtrr_ops support for Xen mtrr
arch/x86/include/asm/page.h | 9 +-
arch/x86/include/asm/paravirt.h | 7 +
arch/x86/include/asm/pat.h | 5 +
arch/x86/include/asm/xen/hypercall.h | 8 +
arch/x86/include/asm/xen/interface.h | 6 +-
arch/x86/include/asm/xen/interface_32.h | 5 +
arch/x86/include/asm/xen/interface_64.h | 13 +--
arch/x86/include/asm/xen/page.h | 15 +--
arch/x86/kernel/cpu/mtrr/Makefile | 1 +
arch/x86/kernel/cpu/mtrr/amd.c | 1 +
arch/x86/kernel/cpu/mtrr/centaur.c | 1 +
arch/x86/kernel/cpu/mtrr/cyrix.c | 1 +
arch/x86/kernel/cpu/mtrr/generic.c | 1 +
arch/x86/kernel/cpu/mtrr/main.c | 11 +-
arch/x86/kernel/cpu/mtrr/mtrr.h | 7 +
arch/x86/kernel/cpu/mtrr/xen.c | 120 ++++++++++++++++
arch/x86/kernel/paravirt.c | 1 +
arch/x86/mm/ioremap.c | 2 +-
arch/x86/mm/pat.c | 5 -
arch/x86/xen/Kconfig | 26 ++++
arch/x86/xen/Makefile | 3 +-
arch/x86/xen/enlighten.c | 58 ++++++--
arch/x86/xen/mmu.c | 135 ++++++++++++++++++-
arch/x86/xen/setup.c | 51 ++++++-
arch/x86/xen/vga.c | 65 +++++++++
arch/x86/xen/xen-ops.h | 12 ++
drivers/char/hvc_xen.c | 101 +++++++++-----
drivers/xen/events.c | 2 +-
drivers/xen/xenbus/xenbus_probe.c | 30 ++++-
include/xen/events.h | 2 +
include/xen/interface/memory.h | 42 ++++++
include/xen/interface/platform.h | 232 +++++++++++++++++++++++++++++++
include/xen/interface/xen.h | 41 ++++++
33 files changed, 931 insertions(+), 88 deletions(-)
create mode 100644 arch/x86/kernel/cpu/mtrr/xen.c
create mode 100644 arch/x86/xen/vga.c
create mode 100644 include/xen/interface/platform.h
Thanks,
J
^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] xen: core dom0 support
From: Andrew Morton @ 2009-02-28 5:28 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel

On Fri, 27 Feb 2009 17:59:06 -0800 Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> This series implements the core parts of Xen dom0 support; that is, just
> enough to get the kernel started when booted by Xen as a dom0 kernel.

And what other patches can we expect to see to complete the xen dom0 support?

and..

I hate to be the one to say it, but we should sit down and work out whether it is justifiable to merge any of this into Linux. I think it's still the case that the Xen technology is the "old" way and that the world is moving off in the "new" direction, KVM?

In three years time, will we regret having merged this?

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support
From: Jeremy Fitzhardinge @ 2009-02-28 6:52 UTC (permalink / raw)
To: Andrew Morton
Cc: H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel

Andrew Morton wrote:
> On Fri, 27 Feb 2009 17:59:06 -0800 Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>> This series implements the core parts of Xen dom0 support; that is, just
>> enough to get the kernel started when booted by Xen as a dom0 kernel.
>
> And what other patches can we expect to see to complete the xen dom0
> support?

There's a bit of a gradient. There's probably another 2-3 similarly sized series to get everything so that you can boot dom0 out of the box (core, apic, swiotlb/agp/drm, backend drivers, tools). And then a scattering of smaller things which may or may not be upstreamable. The vast majority of it is Xen-specific code, rather than changes to core kernel.

I'm in no particular rush to get it all into the kernel, but I would like to get the core parts in for .30 so that its basically useful, and the delta to feature-complete isn't very large (a big reason is to keep the out-of-tree patch size down for distros).

> I hate to be the one to say it, but we should sit down and work out
> whether it is justifiable to merge any of this into Linux.  I think
> it's still the case that the Xen technology is the "old" way and that
> the world is moving off in the "new" direction, KVM?

I don't think that's a particularly useful way to look at it. They're different approaches to the problem, and have different tradeoffs.

The more important question is: are there real users for this stuff? Does not merging it cause more net disadvantage than merging it? Despite all the noise made about kvm in kernel circles, Xen has a large and growing installed base. At the moment its all running on massive out-of-tree patches, which doesn't make anyone happy. It's best that it be in the mainline kernel. You know, like we argue for everything else.

> In three years time, will we regret having merged this?

Its a pretty minor amount of extra stuff on top of what's been added over the last 3 years, so I don't think it's going to tip the scales on its own. I wouldn't be comfortable in trying to merge something that's very intrusive.

    J

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support
From: Ingo Molnar @ 2009-02-28 7:20 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Andrew Morton, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel

* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> [...] At the moment its all running on massive out-of-tree
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> patches, which doesn't make anyone happy. It's best that it
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> be in the mainline kernel. You know, like we argue for
> everything else.
>
>> In three years time, will we regret having merged this?
>
> Its a pretty minor amount of extra stuff on top of what's been
> added over the last 3 years, so I don't think it's going to
> tip the scales on its own.  I wouldn't be comfortable in
> trying to merge something that's very intrusive.

Hm, how can the same code that you call "massive out-of-tree patches which doesn't make anyone happy" in an out of tree context suddenly become non-intrusive "minor amount of extra stuff" in an upstream context?

I wish the upstream kernel was able to do such magic, but i'm afraid it is not.

	Ingo

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support
From: Jeremy Fitzhardinge @ 2009-02-28 8:05 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andrew Morton, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel

Ingo Molnar wrote:
> Hm, how can the same code that you call "massive out-of-tree
> patches which doesn't make anyone happy" in an out of tree
> context suddenly become non-intrusive "minor amount of extra
> stuff" in an upstream context?
>
> I wish the upstream kernel was able to do such magic, but i'm
> afraid it is not.

No, but I am ;) The current out of tree Xen patches are very intrusive because there hasn't been much incentive to reduce their impact. I've been going through it all and very carefully rewriting it to 1) be cleaner, 2) enable/disable itself at runtime, 3) have clean interfaces and interactions with the rest of the kernel, and 4) address any concerns that others have. In other words, make Xen a first-class kernel citizen.

Most of the intrusive stuff has already been merged (and merged for some time now), but without dom0 support its only half done; as it stands people are using mainline Linux for their domUs, but are still limited to patched up (old) kernels for dom0. This is a real problem because all the drivers for interesting new devices are in the new kernels, so there's an additional burden of backporting device support into old kernels.

    J

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support
From: Ingo Molnar @ 2009-02-28 8:36 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Andrew Morton, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel

* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Ingo Molnar wrote:
>> Hm, how can the same code that you call "massive out-of-tree patches
>> which doesn't make anyone happy" in an out of tree context suddenly
>> become non-intrusive "minor amount of extra stuff" in an upstream
>> context?
>>
>> I wish the upstream kernel was able to do such magic, but i'm afraid it
>> is not.
>
> No, but I am ;) The current out of tree Xen patches are very
> intrusive because there hasn't been much incentive to reduce
> their impact.  I've been going through it all and very carefully
> rewriting it to 1) be cleaner, 2) enable/disable itself at
> runtime, 3) have clean interfaces and interactions with the
> rest of the kernel, and 4) address any concerns that others
> have.  In other words, make Xen a first-class kernel citizen.
>
> Most of the intrusive stuff has already been merged (and
> merged for some time now), but without dom0 support its only
> half done; as it stands people are using mainline Linux for
> their domUs, but are still limited to patched up (old) kernels
> for dom0.  This is a real problem because all the drivers for
> interesting new devices are in the new kernels, so there's an
> additional burden of backporting device support into old
> kernels.

This means that the "massive out-of-tree patches which doesn't make anyone happy" argument above is really ... a hyperbole and should be replaced with: "small, unintrusive out-of-tree patch"?

	Ingo

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support
From: Jeremy Fitzhardinge @ 2009-02-28 9:57 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andrew Morton, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel

Ingo Molnar wrote:
> This means that the "massive out-of-tree patches which doesn't
> make anyone happy" argument above is really ... a hyperbole and
> should be replaced with: "small, unintrusive out-of-tree patch"?

Well at the moment we're in the "doesn't make anybody happy" state. The dom0 changes I have are, I'll admit, non-trivial. I don't think they're unreasonable or particularly intrusive, but they are large enough to be awkward to maintain out of tree. What I'm looking to achieve now is to get enough into the kernel so that the remaining patches are a "small unintrusive out-of-tree patch" (but ultimately I'd like to get everything in).

But I think that's sort of beside the point. It's not like we're talking about something extremely obscure here; these changes do serve a large existing user-base. The (often repeated) kernel policy is "merge it". I'm happy to talk about the specifics of how all this stuff can be made to fit together - and whether the current approach is OK or if something else would be better, but ultimately I think this functionality does belong in mainline.

    J

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support
From: Gerd Hoffmann @ 2009-03-02 9:26 UTC (permalink / raw)
To: Ingo Molnar
Cc: Jeremy Fitzhardinge, Andrew Morton, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel

Ingo Molnar wrote:
>>> In three years time, will we regret having merged this?
>> Its a pretty minor amount of extra stuff on top of what's been
>> added over the last 3 years, so I don't think it's going to
>> tip the scales on its own.  I wouldn't be comfortable in
>> trying to merge something that's very intrusive.
>
> Hm, how can the same code that you call "massive out-of-tree
> patches which doesn't make anyone happy" in an out of tree
> context suddenly become non-intrusive "minor amount of extra
> stuff" in an upstream context?

The current, out-of-tree xen kernel stuff is based on 2.6.18. That predates pv_ops and is quite intrusive stuff, with a lot of cut+paste programming and dirty hacks.

A lot has happened in x86 land since 2.6.18. Being one of the x86 arch maintainers you should know that very well. Most notably:

* pv_ops. The point of adding these is to allow virtualization-friendly kernels *without* being intrusive as hell.
* x86 arch merge, followed up by tons of cleanups and code reorganizations.

These changes also make it easier to merge xen support in a non-intrusive manner. Also the xen support code in the linux kernel itself is basically a rewrite from scratch; it hasn't much in common with the 2.6.18 code base.

> I wish the upstream kernel was able to do such magic, but i'm
> afraid it is not.

It's no magic, it's a lot of hard work.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support
From: Ingo Molnar @ 2009-03-02 12:04 UTC (permalink / raw)
To: Gerd Hoffmann
Cc: Jeremy Fitzhardinge, Andrew Morton, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel

* Gerd Hoffmann <kraxel@redhat.com> wrote:

> Ingo Molnar wrote:
>>>> In three years time, will we regret having merged this?
>>> Its a pretty minor amount of extra stuff on top of what's been
>>> added over the last 3 years, so I don't think it's going to
>>> tip the scales on its own.  I wouldn't be comfortable in
>>> trying to merge something that's very intrusive.
>>
>> Hm, how can the same code that you call "massive out-of-tree
>> patches which doesn't make anyone happy" in an out of tree
>> context suddenly become non-intrusive "minor amount of extra
>> stuff" in an upstream context?
>
> The current, out-of-tree xen kernel stuff is based on 2.6.18.
> [...]

Sure, but what i'm pointing out is the following aspect of communication:

>>> [...] At the moment its all running on massive out-of-tree
>>> patches, which doesn't make anyone happy. It's best that it
>>> be in the mainline kernel. You know, like we argue for
>>> everything else.

Comparing it to a 2.6.18 base is simply misleading when it comes to upstreaming something. Enterprise distros will rebase, and their out-of-tree pile of patches will shrink.

	Ingo

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support
From: Gerd Hoffmann @ 2009-03-02 12:26 UTC (permalink / raw)
To: Ingo Molnar
Cc: Jeremy Fitzhardinge, Andrew Morton, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel

Ingo Molnar wrote:
> * Gerd Hoffmann <kraxel@redhat.com> wrote:
>> The current, out-of-tree xen kernel stuff is based on 2.6.18.
>> [...]
>
> Sure, but what i'm pointing out is the following aspect of
> communication:
>
>>>> [...] At the moment its all running on massive out-of-tree
>>>> patches, which doesn't make anyone happy. It's best that it
>>>> be in the mainline kernel. You know, like we argue for
>>>> everything else.
>
> Comparing it to a 2.6.18 base is simply misleading when it comes
> to upstreaming something. Enterprise distros will rebase, and
> their out-of-tree pile of patches will shrink.

I think Jeremy refers to the 2.6.18 kernel though. And IMHO it isn't misleading, as this is the only option for a dom0 kernel. Well, it was until very recently; now you can also run the latest pv_ops bits with dom0 support. That is still very young code though, and I wouldn't use that (yet) for production systems. Works fine on my development box though.

The bits needed for pv_ops based dom0 support in arch/x86 are small compared to what is already there for domU support.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support
From: Nick Piggin @ 2009-02-28 12:09 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Andrew Morton, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel

On Saturday 28 February 2009 17:52:24 Jeremy Fitzhardinge wrote:
> Andrew Morton wrote:
>> I hate to be the one to say it, but we should sit down and work out
>> whether it is justifiable to merge any of this into Linux.  I think
>> it's still the case that the Xen technology is the "old" way and that
>> the world is moving off in the "new" direction, KVM?
>
> I don't think that's a particularly useful way to look at it.  They're
> different approaches to the problem, and have different tradeoffs.
>
> The more important question is: are there real users for this stuff?
> Does not merging it cause more net disadvantage than merging it?
> Despite all the noise made about kvm in kernel circles, Xen has a large
> and growing installed base.  At the moment its all running on massive
> out-of-tree patches, which doesn't make anyone happy.  It's best that it
> be in the mainline kernel.  You know, like we argue for everything else.

OTOH, there are good reasons not to duplicate functionality, and many many times throughout the kernel history competing solutions have been rejected even though the same arguments could be made about them.

There have also been many times duplicate functionality has been merged, although that does often start with the intention of eliminating duplicate implementations and ends with pain. So I think Andrew's question is pretty important.

The user issue aside -- that is a valid point -- you don't really touch on the technical issues. What tradeoffs, and where Xen does better than KVM would be interesting to know, can Xen tools and users ever be migrated to KVM or vice versa (I know very little about this myself, so I'm just an interested observer).

Ideally of course, consensus would be made that one or the other is the better technical solution, and we should encourage developers to improve that one and users to use it. Although obviously a consensus can't always be made (usually when there is no right answer -- different tradeoffs etc).

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: [Xen-devel] Re: [PATCH] xen: core dom0 support
From: Jody Belka @ 2009-02-28 18:11 UTC (permalink / raw)
To: Nick Piggin
Cc: Jeremy Fitzhardinge, Xen-devel, Andrew Morton, the arch/x86 maintainers, Linux Kernel Mailing List, H. Peter Anvin

On Sat, Feb 28, 2009 at 11:09:07PM +1100, Nick Piggin wrote:
> On Saturday 28 February 2009 17:52:24 Jeremy Fitzhardinge wrote:
> > [...]
>
> OTOH, there are good reasons not to duplicate functionality, and many
> many times throughout the kernel history competing solutions have been
> rejected even though the same arguments could be made about them.

Is it duplication though? I personally have machines with older processors that don't have hvm support. I plan on keeping these around for a good amount of time, and would love to be running them on mainline. So for me, unless KVM is somehow going to support para-virtualisation, this isn't duplication.

Just my own personal viewpoint as a user of xen.

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support
From: Andi Kleen @ 2009-02-28 18:15 UTC (permalink / raw)
To: Jody Belka
Cc: Nick Piggin, Jeremy Fitzhardinge, Xen-devel, the arch/x86 maintainers, Linux Kernel Mailing List, H. Peter Anvin, Andrew Morton

Jody Belka <lists-xen@pimb.org> writes:
>
> Is it duplication though? I personally have machines with older processors
> that don't have hvm support. I plan on keeping these around for a good amount
> of time, and would love to be running them on mainline. So for me, unless KVM
> is somehow going to support para-virtualisation, this isn't duplication.

The old systems will continue to run fine with a 2.6.18 Dom0 though.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support
From: Jeremy Fitzhardinge @ 2009-03-01 23:38 UTC (permalink / raw)
To: Andi Kleen
Cc: Jody Belka, Nick Piggin, Xen-devel, the arch/x86 maintainers, Linux Kernel Mailing List, H. Peter Anvin, Andrew Morton

Andi Kleen wrote:
> Jody Belka <lists-xen@pimb.org> writes:
>> Is it duplication though? I personally have machines with older processors
>> that don't have hvm support. I plan on keeping these around for a good amount
>> of time, and would love to be running them on mainline. So for me, unless KVM
>> is somehow going to support para-virtualisation, this isn't duplication.
>
> The old systems will continue to run fine with a 2.6.18 Dom0 though.

But that suggests the *only* reason to update kernels is to get new hardware support. Or conversely, we should stop trying to be backwards compatible with old hardware in new kernels because there's no reason to keep it.

While a lot of the delta since 2.6.18 has been hardware support updates, there have been a lot of other useful things: a new CPU scheduler, tickless operation (which is directly important for virtualization), all the cgroups stuff, new filesystems, IO schedulers, etc, etc. All good things to have, even on older hardware.

    J

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support
From: Andi Kleen @ 2009-03-02 0:14 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Andi Kleen, Jody Belka, Nick Piggin, Xen-devel, the arch/x86 maintainers, Linux Kernel Mailing List, H. Peter Anvin, Andrew Morton

> But that suggests the *only* reason to update kernels is to get new

Wait, it was about Dom0. They could still get all the features of the new kernels in a guest DomU.

> While a lot of the delta since 2.6.18 has been hardware support
> updates, there have been a lot of other useful things: a new CPU
> scheduler, tickless operation (which is directly important for

The old Dom0s always did no idle tick. And I suspect most of the other things don't matter very much in a minimal Dom0.

That said I'm not arguing that it shouldn't be merged, but it seems like the "old hardware" argument is not very strong.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support
From: Jeremy Fitzhardinge @ 2009-03-01 23:27 UTC (permalink / raw)
To: Nick Piggin
Cc: Andrew Morton, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel

Nick Piggin wrote:
> On Saturday 28 February 2009 17:52:24 Jeremy Fitzhardinge wrote:
>> Andrew Morton wrote:
>>> I hate to be the one to say it, but we should sit down and work out
>>> whether it is justifiable to merge any of this into Linux.  I think
>>> it's still the case that the Xen technology is the "old" way and that
>>> the world is moving off in the "new" direction, KVM?
>> I don't think that's a particularly useful way to look at it.  They're
>> different approaches to the problem, and have different tradeoffs.
>>
>> The more important question is: are there real users for this stuff?
>> Does not merging it cause more net disadvantage than merging it?
>> Despite all the noise made about kvm in kernel circles, Xen has a large
>> and growing installed base.  At the moment its all running on massive
>> out-of-tree patches, which doesn't make anyone happy.  It's best that it
>> be in the mainline kernel.  You know, like we argue for everything else.
>
> OTOH, there are good reasons not to duplicate functionality, and many
> many times throughout the kernel history competing solutions have been
> rejected even though the same arguments could be made about them.
>
> There have also been many times duplicate functionality has been merged,
> although that does often start with the intention of eliminating
> duplicate implementations and ends with pain.  So I think Andrew's
> question is pretty important.

Those would be pertinent questions if I were suddenly popping up and saying "hey, let's add Xen support to the kernel!" But Xen support has been in the kernel for well over a year now, and is widely used, enabled in distros, etc. The patches I'm proposing here are not a whole new thing, they're part of the last 10% to fill out the kernel's support to make it actually useful.

> The user issue aside -- that is a valid point -- you don't really touch
> on the technical issues.  What tradeoffs, and where Xen does better
> than KVM would be interesting to know, can Xen tools and users ever be
> migrated to KVM or vice versa (I know very little about this myself, so
> I'm just an interested observer).

OK, fair point, it's probably time for another Xen architecture refresher post.

There are two big architectural differences between Xen and KVM:

Firstly, Xen has a separate hypervisor whose primary role is to context switch between the guest domains (virtual machines). The hypervisor is relatively small and single purpose. It doesn't, for example, contain any device drivers or even much knowledge of things like pci buses and their structure. The domains themselves are more or less peers; some are more privileged than others, but from Xen's perspective they are more or less equivalent. The first domain, dom0, is special because it's started by Xen itself, and has some inherent initial privileges; its main job is to start other domains, and it also typically provides virtualized/multiplexed device services to other domains via a frontend/backend split driver structure.

KVM, on the other hand, builds all the hypervisor stuff into the kernel itself, so you end up with a kernel which does all the normal kernel stuff, and can run virtual machines by making them look like slightly strange processes.

Because Xen is dedicated to just running virtual machines, its internal architecture can be more heavily oriented towards that task, which affects things from how its scheduler works, its use and multiplexing of physical memory. For example, Xen manages to use new hardware virtualization features pretty quickly, partly because it doesn't need to trade-off against normal kernel functions. The clear distinction between the privileged hypervisor and the rest of the domains makes the security people happy as well. Also, because Xen is small and fairly self-contained, there's quite a few hardware vendors shipping it burned into the firmware so that it really is the first thing to boot (many of the instant-on features that laptops have are based on Xen). Both HP and Dell, at least, are selling servers with Xen pre-installed in the firmware.

The second big difference is the use of paravirtualization. Xen can securely virtualize a machine without needing any particular hardware support. Xen works well on any post-P6 or any ia64 machine, without needing any virtualization hardware support. When Xen runs a kernel in paravirtualized mode, it runs the kernel in an unprivileged processor state. This allows the hypervisor to vet all the guest kernel's privileged operations, which are carried out either via hypercalls or by memory shared between each guest and Xen.

By contrast, KVM relies on at least VT/SVM (and whatever the ia64 equiv is called) being available in the CPUs, and needs the most modern of hardware to get the best performance.

One important area of paravirtualization is that Xen guests directly use the processor's pagetables; there is no shadow pagetable or use of hardware pagetable nesting. This means that a tlb miss is just a tlb miss, and happens at full processor performance. This is possible because 1) pagetables are always read-only to the guest, and 2) the guest is responsible for looking up in a table to map guest-local pfns into machine-wide mfns before installing them in a pte. Xen will check that any new mapping or pagetable satisfies all the rules, by checking that the writable reference count is 0, and that the domain owns (or has been allowed access to) any mfn it tries to install in a pagetable.

The other interesting part of paravirtualization is the abstraction of interrupts into event channels. Each domain has a bit-array of 1024 bits which correspond to 1024 possible event channels. An event channel can have one of several sources, such as a timer virtual interrupt, an inter-domain event, an inter-vcpu IPI, or mapped from a hardware interrupt. We end up mapping the event channels back to irqs and they are delivered as normal interrupts as far as the rest of the kernel is concerned.

The net result is that a paravirtualized Xen guest runs very close to full speed. Workloads which modify live pagetables a lot take a bit of a performance hit (since the pte updates have to trap to the hypervisor for validation), but in general this is not a huge deal. Hardware support for nested pagetables is only just beginning to get close to performance parity, but with different tradeoffs (pagetable updates are cheap, but tlb misses are much more expensive, and hits consume more tlb entries).

Xen can also make full use of whatever hardware virtualization features are available when running an "hvm" domain. This is typically how you'd run Windows or other unmodified operating systems.

All of this is stuff that's necessary to support any PV Xen domain, and has been in the kernel for a long time now. The additions I'm proposing now are those needed for a Xen domain to control the physical hardware, in order to provide virtual device support for other less-privileged domains. These changes affect a few areas:

* interrupts: mapping a device interrupt into an event channel for delivery to the domain with the device driver for that interrupt
* mappings: allowing direct hardware mapping of device memory into a domain
* dma: making sure that hardware gets programmed with machine memory addresses, not virtual ones, and that pages are machine-contiguous when expected

Interrupts require a few hooks into the x86 APIC code, but the end result is that hardware interrupts are delivered via event channels, but then they're mapped back to irqs and delivered normally (they even end up with the same irq number as they'd usually have).

Device mappings are fairly easy to arrange. I'm using a software pte bit, _PAGE_IOMAP, to indicate that a mapping is a device mapping. This bit is set by things like ioremap() and remap_pfn_range, and the Xen mmu code just uses the pfn in the pte as-is, rather than doing the normal pfn->mfn translation.

DMA is handled via the normal DMA API, with some hooks to swiotlb to make sure that the memory underlying its pools is really DMA-ready (ie, is contiguous and low enough in machine memory).

The changes I'm proposing may look a bit strange from a purely x86 perspective, but they fit in relatively well because they're not all that different from what other architectures require, and so the kernel-wide infrastructure is mostly already in place.

I hope that helps clarify what I'm trying to do here, and why Xen and KVM do have distinct roles to play.

    J

^ permalink raw reply	[flat|nested] 49+ messages in thread
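To make the two mapping rules described above a little more concrete (the guest translating its own pfns to machine mfns before installing a pte, and a software pte bit such as _PAGE_IOMAP telling the mmu code to use the frame number as-is for device memory), here is a deliberately simplified, hypothetical userspace sketch; the table contents, flag values and helper names are invented and are not the kernel's actual implementation:

```c
/*
 * Hypothetical sketch of pte construction in a paravirtualized guest.
 * Invented names and values; not the real kernel code.
 */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT   12
#define PAGE_PRESENT (1ULL << 0)
#define PAGE_IOMAP   (1ULL << 10)   /* pretend "this is a device mapping" bit */

/* Pretend phys-to-machine table handed to the guest at boot. */
static const uint64_t p2m[] = { 7, 3, 42, 9 };

static uint64_t make_pte(uint64_t pfn, uint64_t flags)
{
	uint64_t frame;

	if (flags & PAGE_IOMAP)
		frame = pfn;            /* device memory: use the frame number as-is */
	else
		frame = p2m[pfn];       /* ordinary RAM: translate pfn -> mfn */

	/*
	 * A real guest would then ask the hypervisor to install this pte,
	 * and the hypervisor would validate mfn ownership and the
	 * read-only pagetable rules before accepting it.
	 */
	return (frame << PAGE_SHIFT) | flags;
}

int main(void)
{
	printf("ram pte:   %#llx\n",
	       (unsigned long long)make_pte(2, PAGE_PRESENT));
	printf("iomap pte: %#llx\n",
	       (unsigned long long)make_pte(0xfee00, PAGE_PRESENT | PAGE_IOMAP));
	return 0;
}
```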
* Re: [PATCH] xen: core dom0 support
From: Nick Piggin @ 2009-03-02 6:37 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Andrew Morton, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel

On Monday 02 March 2009 10:27:29 Jeremy Fitzhardinge wrote:
> [...]
> Those would be pertinent questions if I were suddenly popping up and
> saying "hey, let's add Xen support to the kernel!"  But Xen support has
> been in the kernel for well over a year now, and is widely used, enabled
> in distros, etc.  The patches I'm proposing here are not a whole new
> thing, they're part of the last 10% to fill out the kernel's support to
> make it actually useful.

As a guest, I guess it has been agreed that guest support for all different hypervisors is "a good thing". dom0 is more like a piece of the hypervisor itself, right?

> OK, fair point, it's probably time for another Xen architecture refresher
> post.

Thanks.

> [...] Because Xen is dedicated to just running virtual machines, its internal
> architecture can be more heavily oriented towards that task, which
> affects things from how its scheduler works, its use and multiplexing of
> physical memory.  For example, Xen manages to use new hardware
> virtualization features pretty quickly, partly because it doesn't need
> to trade-off against normal kernel functions.  The clear distinction
> between the privileged hypervisor and the rest of the domains makes the
> security people happy as well.  Also, because Xen is small and fairly
> self-contained, there's quite a few hardware vendors shipping it burned
> into the firmware so that it really is the first thing to boot (many of
> the instant-on features that laptops have are based on Xen).  Both HP and
> Dell, at least, are selling servers with Xen pre-installed in the firmware.

That would kind of seem like Xen has a better design to me, OTOH if it needs this dom0 for most device drivers and things, then how much difference is it really? Is KVM really disadvantaged by being a part of the kernel?

> [...] One important area of paravirtualization is that Xen guests directly
> use the processor's pagetables; there is no shadow pagetable or use of
> hardware pagetable nesting.  This means that a tlb miss is just a tlb
> miss, and happens at full processor performance.  This is possible
> because 1) pagetables are always read-only to the guest, and 2) the
> guest is responsible for looking up in a table to map guest-local pfns
> into machine-wide mfns before installing them in a pte.  Xen will check
> that any new mapping or pagetable satisfies all the rules, by checking
> that the writable reference count is 0, and that the domain owns (or has
> been allowed access to) any mfn it tries to install in a pagetable.

Xen's memory virtualization is pretty neat, I'll give it that. Is it faster than KVM on a modern CPU? Would it be possible I wonder to make a MMU virtualization layer for CPUs without support, using Xen's page table protection methods, and have KVM use that? Or does that amount to putting a significant amount of Xen hypervisor into the kernel..?

> [...]
> I hope that helps clarify what I'm trying to do here, and why Xen and
> KVM do have distinct roles to play.

Thanks, it's very informative to me and hopefully helps others with the discussion (I don't pretend to be able to judge whether your dom0 patches should be merged or not! :)). I'll continue to read with interest.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support 2009-03-02 6:37 ` Nick Piggin @ 2009-03-02 8:05 ` Jeremy Fitzhardinge 2009-03-02 8:19 ` Nick Piggin 2009-03-04 17:34 ` Anthony Liguori 2009-03-04 17:31 ` Anthony Liguori 1 sibling, 2 replies; 49+ messages in thread From: Jeremy Fitzhardinge @ 2009-03-02 8:05 UTC (permalink / raw) To: Nick Piggin Cc: Andrew Morton, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel Nick Piggin wrote: >> Those would be pertinent questions if I were suddenly popping up and >> saying "hey, let's add Xen support to the kernel!" But Xen support has >> been in the kernel for well over a year now, and is widely used, enabled >> in distros, etc. The patches I'm proposing here are not a whole new >> thing, they're part of the last 10% to fill out the kernel's support to >> make it actually useful. >> > > As a guest, I guess it has been agreed that guest support for all > different hypervisors is "a good thing". dom0 is more like a piece > of the hypervisor itself, right? > Hm, I wouldn't put it like that. dom0 is no more part of the hypervisor than the hypervisor is part of dom0. The hypervisor provides one set of services (domain isolation and multiplexing). Domains with direct hardware access and drivers provide arbitration for virtualized device access. They provide orthogonal sets of functionality which are both required to get a working system. Also, the machinery needed to allow a kernel to operate as dom0 is more than that: it allows direct access to hardware in general. An otherwise unprivileged domU can be given access to a specific PCI device via PCI-passthrough so that it can drive it directly. This is often used for direct access to 3D hardware, or high-performance networking (esp with multi-context hardware that's designed for virtualization use). >> Because Xen is dedicated to just running virtual machines, its internal >> architecture can be more heavily oriented towards that task, which >> affects things from how its scheduler works, its use and multiplexing of >> physical memory. For example, Xen manages to use new hardware >> virtualization features pretty quickly, partly because it doesn't need >> to trade-off against normal kernel functions. The clear distinction >> between the privileged hypervisor and the rest of the domains makes the >> security people happy as well. Also, because Xen is small and fairly >> self-contained, there's quite a few hardware vendors shipping it burned >> into the firmware so that it really is the first thing to boot (many of >> instant-on features that laptops have are based on Xen). Both HP and >> Dell, at least, are selling servers with Xen pre-installed in the firmware. >> > > That would kind of seem like Xen has a better design to me, OTOH if it > needs this dom0 for most device drivers and things, then how much > difference is it really? Is KVM really disadvantaged by being a part of > the kernel? > Well, you can lump everything together in dom0 if you want, and that is a common way to run a Xen system. But there's no reason you can't disaggregate drivers into their own domains, each with the responsibility for a particular device or set of devices (or indeed, any other service you want provided). Xen can use hardware features like VT-d to really enforce the partitioning so that the domains can't program their hardware to touch anything except what they're allowed to touch, so nothing is trusted beyond its actual area of responsibility. 
It also means that killing off and restarting a driver domain is a fairly lightweight and straightforward operation because the state is isolated and self-contained; guests using a device have to be able to deal with a disconnect/reconnect anyway (for migration), so it doesn't affect them much. Part of the reason there's a lot of academic interest in Xen is because it has the architectural flexibility to try out lots of different configurations. I wouldn't say that KVM is necessarily disadvantaged by its design; its just a particular set of tradeoffs made up-front. It loses Xen's flexibility, but the result is very familiar to Linux people. A guest domain just looks like a qemu process that happens to run in a strange processor mode a lot of the time. The qemu process provides virtual device access to its domain, and accesses the normal device drivers like any other usermode process would. The domains are as isolated from each other as much as processes normally are, but they're all floating around in the same kernel; whether that provides enough isolation for whatever technical, billing, security, compliance/regulatory or other requirements you have is up to the user to judge. >> Once important area of paravirtualization is that Xen guests directly >> use the processor's pagetables; there is no shadow pagetable or use of >> hardware pagetable nesting. This means that a tlb miss is just a tlb >> miss, and happens at full processor performance. This is possible >> because 1) pagetables are always read-only to the guest, and 2) the >> guest is responsible for looking up in a table to map guest-local pfns >> into machine-wide mfns before installing them in a pte. Xen will check >> that any new mapping or pagetable satisfies all the rules, by checking >> that the writable reference count is 0, and that the domain owns (or has >> been allowed access to) any mfn it tries to install in a pagetable. >> > > Xen's memory virtualization is pretty neat, I'll give it that. Is it > faster than KVM on a modern CPU? It really depends on the workload. There's three cases to consider: software shadow pagetables, hardware nested pagetables, and Xen direct pagetables. Even now, Xen's (highly optimised) shadow pagetable code generally out-performs modern nested pagetables, at least when running Windows (for which that code was most heavily tuned). Shadow pagetables and nested pagetables will generally outperform direct pagetables when the workload does lots of pagetable updates compared to accesses. (I don't know what the current state of kvm's shadow pagetable performance is, but it seems OK.) But if you're mostly accessing the pagetable, direct pagetables still win. On a tlb miss, it gets 4 memory accesses, whereas a nested pagetable tlb miss needs 24 memory accesses; and a nested tlb hit means that you have 24 tlb entries being tied up to service the hit, vs 4. (Though the chip vendors are fairly secretive about exactly how they structure their tlbs to deal with nested lookups, so I may be off here.) (It also depends on whether you arrange to put the guest, host or both memory into large pages; doing so helps a lot.) > Would it be possible I wonder to make > a MMU virtualization layer for CPUs without support, using Xen's page > table protection methods, and have KVM use that? Or does that amount > to putting a significant amount of Xen hypervisor into the kernel..? > At one point Avi was considering doing it, but I don't think he ever made any real effort in that direction. 
KVM is pretty wedded to having hardware support anyway, so there's not much point in removing it in this one area. The Xen technique gets its performance from collapsing a level of indirection, but that has a cost in terms of flexibility; the hypervisor can't do as much mucking around behind the guest's back (for example, the guest sees real hardware memory addresses in the form of mfns, so Xen can't move pages around, at least not without some form of explicit synchronisation). J ^ permalink raw reply [flat|nested] 49+ messages in thread
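For readers wondering where the "4 vs 24 memory accesses" figures above come from, a back-of-the-envelope count (assuming four-level pagetables on both the guest and the nested side) reproduces them:

  direct/native pagetables:
      one TLB miss = 4 pagetable reads (PML4 -> PDPT -> PD -> PT)

  nested pagetables (two-dimensional walk):
      each of the 4 guest pagetable reads is a guest-physical access,
      so it needs its own 4-level nested walk plus the read itself:
          4 x (4 + 1)                        = 20
      the final guest-physical address of the data then needs one
      more nested walk:                      +  4
                                             ----
                                               24

  or, in closed form, (4 + 1) x (4 + 1) - 1 = 24.

Large pages on either side remove a level from the corresponding walk, which is a big part of why they recover so much of the nested-paging overhead.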
* Re: [PATCH] xen: core dom0 support 2009-03-02 8:05 ` Jeremy Fitzhardinge @ 2009-03-02 8:19 ` Nick Piggin 2009-03-02 9:05 ` Jeremy Fitzhardinge 2009-03-04 17:34 ` Anthony Liguori 1 sibling, 1 reply; 49+ messages in thread From: Nick Piggin @ 2009-03-02 8:19 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Andrew Morton, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel On Monday 02 March 2009 19:05:10 Jeremy Fitzhardinge wrote: > Nick Piggin wrote: > > That would kind of seem like Xen has a better design to me, OTOH if it > > needs this dom0 for most device drivers and things, then how much > > difference is it really? Is KVM really disadvantaged by being a part of > > the kernel? > > Well, you can lump everything together in dom0 if you want, and that is > a common way to run a Xen system. But there's no reason you can't > disaggregate drivers into their own domains, each with the > responsibility for a particular device or set of devices (or indeed, any > other service you want provided). Xen can use hardware features like > VT-d to really enforce the partitioning so that the domains can't > program their hardware to touch anything except what they're allowed to > touch, so nothing is trusted beyond its actual area of responsibility. > It also means that killing off and restarting a driver domain is a > fairly lightweight and straightforward operation because the state is > isolated and self-contained; guests using a device have to be able to > deal with a disconnect/reconnect anyway (for migration), so it doesn't > affect them much. Part of the reason there's a lot of academic interest > in Xen is because it has the architectural flexibility to try out lots > of different configurations. > > I wouldn't say that KVM is necessarily disadvantaged by its design; its > just a particular set of tradeoffs made up-front. It loses Xen's > flexibility, but the result is very familiar to Linux people. A guest > domain just looks like a qemu process that happens to run in a strange > processor mode a lot of the time. The qemu process provides virtual > device access to its domain, and accesses the normal device drivers like > any other usermode process would. The domains are as isolated from each > other as much as processes normally are, but they're all floating around > in the same kernel; whether that provides enough isolation for whatever > technical, billing, security, compliance/regulatory or other > requirements you have is up to the user to judge. Well what is the advantage of KVM? Just that it is integrated into the kernel? Can we look at the argument the other way around and ask why Xen can't replace KVM? (is it possible to make use of HW memory virtualization in Xen?) The hypervisor is GPL, right? > > Would it be possible I wonder to make > > a MMU virtualization layer for CPUs without support, using Xen's page > > table protection methods, and have KVM use that? Or does that amount > > to putting a significant amount of Xen hypervisor into the kernel..? > > At one point Avi was considering doing it, but I don't think he ever > made any real effort in that direction. KVM is pretty wedded to having > hardware support anyway, so there's not much point in removing it in > this one area. Not removing it, but making it available as an alternative form of "hardware supported" MMU virtualization. As you say if direct protected page tables often are faster than existing HW solutoins anyway, then it could be a win for KVM even on newer CPUs. 
> The Xen technique gets its performance from collapsing a level of > indirection, but that has a cost in terms of flexibility; the hypervisor > can't do as much mucking around behind the guest's back (for example, > the guest sees real hardware memory addresses in the form of mfns, so > Xen can't move pages around, at least not without some form of explicit > synchronisation). Any problem can be solved by adding another level of indirection... :) ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support 2009-03-02 8:19 ` Nick Piggin @ 2009-03-02 9:05 ` Jeremy Fitzhardinge 0 siblings, 0 replies; 49+ messages in thread From: Jeremy Fitzhardinge @ 2009-03-02 9:05 UTC (permalink / raw) To: Nick Piggin Cc: Andrew Morton, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel Nick Piggin wrote: >> I wouldn't say that KVM is necessarily disadvantaged by its design; its >> just a particular set of tradeoffs made up-front. It loses Xen's >> flexibility, but the result is very familiar to Linux people. A guest >> domain just looks like a qemu process that happens to run in a strange >> processor mode a lot of the time. The qemu process provides virtual >> device access to its domain, and accesses the normal device drivers like >> any other usermode process would. The domains are as isolated from each >> other as much as processes normally are, but they're all floating around >> in the same kernel; whether that provides enough isolation for whatever >> technical, billing, security, compliance/regulatory or other >> requirements you have is up to the user to judge. >> > > Well what is the advantage of KVM? Just that it is integrated into > the kernel? Can we look at the argument the other way around and > ask why Xen can't replace KVM? Xen was around before KVM was even a twinkle, so KVM is redundant from that perspective; they're certainly broadly equivalent in functionality. But Xen has had a fairly fraught history with respect to being merged into the kernel, and being merged gets your feet into a lot of doors. The upshot is that using Xen has generally required some preparation - like installing special kernels - before you can use it, and so tends to get used for servers which are specifically intended to be virtualized. KVM runs like an accelerated qemu, so it easy to just fire up an instance of windows in the middle of a normal Linux desktop session, with no special preparation. But Xen is getting better at being on laptops and desktops, and doing all the things people expect there (power management, suspend/resume, etc). And people are definitely interested in using KVM in server environments, so the lines are not very clear any more. (Of course, we're completely forgetting VMI in all this, but VMware seem to have as well. And we're all waiting for Rusty to make his World Domination move.) > (is it possible to make use of HW > memory virtualization in Xen?) Yes, Xen will use all available hardware features when running hvm domains (== fully virtualized == Windows). > The hypervisor is GPL, right? > Yep. >>> Would it be possible I wonder to make >>> a MMU virtualization layer for CPUs without support, using Xen's page >>> table protection methods, and have KVM use that? Or does that amount >>> to putting a significant amount of Xen hypervisor into the kernel..? >>> >> At one point Avi was considering doing it, but I don't think he ever >> made any real effort in that direction. KVM is pretty wedded to having >> hardware support anyway, so there's not much point in removing it in >> this one area. >> > > Not removing it, but making it available as an alternative form of > "hardware supported" MMU virtualization. As you say if direct protected > page tables often are faster than existing HW solutoins anyway, then it > could be a win for KVM even on newer CPUs. > Well, yes. I'm sure it will make someone a nice little project. 
It should be fairly easy to try out - all the hooks are in place, so it's just
a matter of implementing the kvm bits. But it probably wouldn't be a
comfortable fit with the rest of Linux; all the memory mapped via direct
pagetables would be solidly pinned down, completely unswappable, giving the VM
subsystem much less flexibility about allocating resources. I guess it would
be no worse than a multi-hundred megabyte/gigabyte process mlocking itself
down, but I don't know if anyone actually does that.

J

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support 2009-03-02 8:05 ` Jeremy Fitzhardinge 2009-03-02 8:19 ` Nick Piggin @ 2009-03-04 17:34 ` Anthony Liguori 2009-03-04 17:38 ` Jeremy Fitzhardinge 1 sibling, 1 reply; 49+ messages in thread From: Anthony Liguori @ 2009-03-04 17:34 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Nick Piggin, Xen-devel, Andrew Morton, the arch/x86 maintainers, Linux Kernel Mailing List, H. Peter Anvin Jeremy Fitzhardinge wrote: > Nick Piggin wrote: > It really depends on the workload. There's three cases to consider: > software shadow pagetables, hardware nested pagetables, and Xen direct > pagetables. Even now, Xen's (highly optimised) shadow pagetable code > generally out-performs modern nested pagetables, at least when running > Windows (for which that code was most heavily tuned). Can you point to benchmarks? I have a hard time believing this. How can shadow paging beat nested paging assuming the presence of large pages? Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support 2009-03-04 17:34 ` Anthony Liguori @ 2009-03-04 17:38 ` Jeremy Fitzhardinge 0 siblings, 0 replies; 49+ messages in thread From: Jeremy Fitzhardinge @ 2009-03-04 17:38 UTC (permalink / raw) To: Anthony Liguori Cc: Nick Piggin, Xen-devel, Andrew Morton, the arch/x86 maintainers, Linux Kernel Mailing List, H. Peter Anvin Anthony Liguori wrote: > Jeremy Fitzhardinge wrote: >> Nick Piggin wrote: > >> It really depends on the workload. There's three cases to consider: >> software shadow pagetables, hardware nested pagetables, and Xen >> direct pagetables. Even now, Xen's (highly optimised) shadow >> pagetable code generally out-performs modern nested pagetables, at >> least when running Windows (for which that code was most heavily tuned). > > Can you point to benchmarks? I have a hard time believing this. Erm, not that I know of off-hand. I don't really have any interest in Windows performance, so I'm reduced to repeating (highly reliable) Xen Summit corridor chat. > How can shadow paging beat nested paging assuming the presence of > large pages? I think large pages do turn the tables, and its close to parity with shadow with 4k pages on recent cpus. But see above for reliability on that info. J ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support
  2009-03-02 6:37 ` Nick Piggin
  2009-03-02 8:05 ` Jeremy Fitzhardinge
@ 2009-03-04 17:31 ` Anthony Liguori
  1 sibling, 0 replies; 49+ messages in thread
From: Anthony Liguori @ 2009-03-04 17:31 UTC (permalink / raw)
To: Nick Piggin
Cc: Jeremy Fitzhardinge, Xen-devel, Andrew Morton,
    the arch/x86 maintainers, Linux Kernel Mailing List, H. Peter Anvin

Nick Piggin wrote:
> On Monday 02 March 2009 10:27:29 Jeremy Fitzhardinge wrote:
>> Once important area of paravirtualization is that Xen guests directly
>> use the processor's pagetables; there is no shadow pagetable or use of
>> hardware pagetable nesting. This means that a tlb miss is just a tlb
>> miss, and happens at full processor performance. This is possible
>> because 1) pagetables are always read-only to the guest, and 2) the
>> guest is responsible for looking up in a table to map guest-local pfns
>> into machine-wide mfns before installing them in a pte. Xen will check
>> that any new mapping or pagetable satisfies all the rules, by checking
>> that the writable reference count is 0, and that the domain owns (or has
>> been allowed access to) any mfn it tries to install in a pagetable.
>
> Xen's memory virtualization is pretty neat, I'll give it that. Is it
> faster than KVM on a modern CPU?

There is nothing architecturally that prevents KVM from making use of Direct
Paging. KVM doesn't use Direct Paging because we don't expect it to be worth
it. Modern CPUs (Barcelona and Nehalem class) include hardware support for MMU
virtualization (via NPT and EPT respectively).

I think that for the most part (especially with large page backed guests),
there's wide agreement that even within the context of Xen, NPT/EPT often
beats PV performance. TLB miss overhead increases due to additional memory
accesses but this is largely mitigated by large pages (see Ben Serebin's SOSP
paper from a couple years ago).

> Would it be possible I wonder to make
> a MMU virtualization layer for CPUs without support, using Xen's page
> table protection methods, and have KVM use that? Or does that amount
> to putting a significant amount of Xen hypervisor into the kernel..?

There are various benchmarks out there (check KVM Forum and Xen Summit
presentations) showing NPT/EPT beating Direct Paging but FWIW the direct
paging could be implemented in KVM.

A really unfortunate aspect of direct paging is that it requires the guest to
know the host physical addresses. This requires the guest to cooperate when
doing any fancy memory tricks (live migration, save/restore, swapping, page
sharing, etc.). This introduces guest code paths to ensure that things like
live migration work, which is extremely undesirable.

FWIW, I'm not advocating not taking the Xen dom0 patches. Just pointing out
that direct paging is orthogonal to the architectural differences between Xen
and KVM.

Regards,

Anthony Liguori

^ permalink raw reply [flat|nested] 49+ messages in thread
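Anthony's point about guest cooperation can be illustrated with a sketch: because direct-paged guests store mfns in their ptes, every pagetable entry has to be rewritten whenever the underlying machine frames change (save/restore, live migration, and so on). The helpers and tables below are illustrative only; in the real Xen flow this "canonicalisation" work is split between the guest and the save/restore tools.

/* Illustrative sketch; not the actual Xen save/restore code. */
typedef unsigned long pfn_t, mfn_t, pte_t;

#define PAGE_SHIFT 12
#define PTE_FLAGS  0xfffUL

/* mfn -> pfn on the old host, pfn -> mfn on the new host (toy-sized). */
static pfn_t machine_to_phys[0x100];
static mfn_t phys_to_machine[0x100];

/* On save: strip out host-specific mfns, leaving guest-local pfns. */
static pte_t canonicalize_pte(pte_t pte)
{
        return (machine_to_phys[pte >> PAGE_SHIFT] << PAGE_SHIFT)
                | (pte & PTE_FLAGS);
}

/* On restore: substitute whatever mfns the new host actually allocated. */
static pte_t uncanonicalize_pte(pte_t pte)
{
        return (phys_to_machine[pte >> PAGE_SHIFT] << PAGE_SHIFT)
                | (pte & PTE_FLAGS);
}

A shadow- or nested-paging guest never sees mfns at all, so none of this rewriting is needed; that is the trade-off being pointed at here.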
* Re: [PATCH] xen: core dom0 support 2009-03-01 23:27 ` Jeremy Fitzhardinge 2009-03-02 6:37 ` Nick Piggin @ 2009-03-04 19:03 ` Anthony Liguori 2009-03-04 19:16 ` H. Peter Anvin 1 sibling, 1 reply; 49+ messages in thread From: Anthony Liguori @ 2009-03-04 19:03 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Nick Piggin, Xen-devel, Andrew Morton, the arch/x86 maintainers, Linux Kernel Mailing List, H. Peter Anvin Jeremy Fitzhardinge wrote: > OK, fair point, its probably time for another Xen architecture refresher > post. > > There are two big architectural differences between Xen and KVM: > > Firstly, Xen has a separate hypervisor who's primary role is to context > switch between the guest domains (virtual machines). The hypervisor is > relatively small and single purpose. It doesn't, for example, contain > any device drivers or even much knowledge of things like pci buses and > their structure. The domains themselves are more or less peers; some > are more privileged than others, but from Xen's perspective they are > more or less equivalent. The first domain, dom0, is special because its > started by Xen itself, and has some inherent initial privileges; its > main job is to start other domains, and it also typically provides > virtualized/multiplexed device services to other domains via a > frontend/backend split driver structure. > > KVM, on the other hand, builds all the hypervisor stuff into the kernel > itself, so you end up with a kernel which does all the normal kernel > stuff, and can run virtual machines by making them look like slightly > strange processes. > > Because Xen is dedicated to just running virtual machines, its internal > architecture can be more heavily oriented towards that task, which > affects things from how its scheduler works, its use and multiplexing of > physical memory. For example, Xen manages to use new hardware > virtualization features pretty quickly, partly because it doesn't need > to trade-off against normal kernel functions. The clear distinction > between the privileged hypervisor and the rest of the domains makes the > security people happy as well. Also, because Xen is small and fairly > self-contained, there's quite a few hardware vendors shipping it burned > into the firmware so that it really is the first thing to boot (many of > instant-on features that laptops have are based on Xen). Both HP and > Dell, at least, are selling servers with Xen pre-installed in the firmware. I think this is a bit misleading. I think you can understand the true differences between Xen and KVM by s/hypervisor/Operating System/. Fundamentally, a hypervisor is just an operating system that provides a hardware-like interface to it's processes. Today, the Xen operating system does not have that many features so it requires a special process (domain-0) to drive hardware. It uses Linux for this and it happens that the Linux domain-0 has full access to all system resources so there is absolutely no isolation between Xen and domain-0. The domain-0 guest is like a Linux userspace process with access to an old-style /dev/mem. You can argue that in theory, one could build a small, decoupled domain-0, but you could also do this, in theory, with Linux and KVM. It is not necessary to have all of your device drivers in your Linux kernel. You could build an initramfs that passed all PCI devices through (via VT-d) to a single guest, and then provided and interface to allow that guest to create more guests. This is essentially what dom0 support is. 
The real difference between KVM and Xen is that Xen is a separate Operating System dedicated to virtualization. In many ways, it's a fork of Linux since it uses quite a lot of Linux code. The argument for Xen as a separate OS is no different than the argument for a dedicated Real Time Operating System, a dedicated OS for embedded systems, or a dedicated OS for a very large system. Having the distros ship Xen was a really odd thing from a Linux perspective. It's as if Red Hat started shipping VXworks with a Linux emulation layer as Real Time Linux. The arguments for dedicated OSes are well-known. You can do a better scheduler for embedded/real-time/large systems. You can do a better memory allocate for embedded/real-time/large systems. These are the arguments that are made for Xen. In theory, Xen, the hypervisor, could be merged with upstream Linux but there is certainly no parties interested in that currently. My point is not to rail on Xen, but to point out that there isn't really a choice to be made here from a Linux perspective. It's like saying do we really need FreeBSD and Linux, maybe those FreeBSD guys should just merge with Linux. It's not going to happen. KVM turns Linux into a hypervisor by adding virtualization support. Xen is a separate hypervisor. So the real discussion shouldn't be should KVM and Xen converge because it really doesn't make sense. It's whether it makes sense for upstream Linux to support being a domain-0 guest under the Xen hypervisor. Regards, Anthony Liguori > > The second big difference is the use of paravirtualization. Xen can > securely virtualize a machine without needing any particular hardware > support. Xen works well on any post-P6 or any ia64 machine, without > needing any virtualzation hardware support. When Xen runs a kernel in > paravirtualized mode, it runs the kernel in an unprivileged processor > state. The allows the hypervisor to vet all the guest kernel's > privileged operations, which are carried out are either via hypercalls > or by memory shared between each guest and Xen. > > By contrast, KVM relies on at least VT/SVM (and whatever the ia64 equiv > is called) being available in the CPUs, and needs the most modern of > hardware to get the best performance. > > Once important area of paravirtualization is that Xen guests directly > use the processor's pagetables; there is no shadow pagetable or use of > hardware pagetable nesting. This means that a tlb miss is just a tlb > miss, and happens at full processor performance. This is possible > because 1) pagetables are always read-only to the guest, and 2) the > guest is responsible for looking up in a table to map guest-local pfns > into machine-wide mfns before installing them in a pte. Xen will check > that any new mapping or pagetable satisfies all the rules, by checking > that the writable reference count is 0, and that the domain owns (or has > been allowed access to) any mfn it tries to install in a pagetable. > > The other interesting part of paravirtualization is the abstraction of > interrupts into event channels. Each domain has a bit-array of 1024 > bits which correspond to 1024 possible event channels. An event channel > can have one of several sources, such as a timer virtual interrupt, an > inter-domain event, an inter-vcpu IPI, or mapped from a hardware > interrupt. We end up mapping the event channels back to irqs and they > are delivered as normal interrupts as far as the rest of the kernel is > concerned. 
> > The net result is that a paravirtualized Xen guest runs a very close to > full speed. Workloads which modify live pagetables a lot take a bit of > a performance hit (since the pte updates have to trap to the hypervisor > for validation), but in general this is not a huge deal. Hardware > support for nested pagetables is only just beginning to get close to > getting performance parity, but with different tradeoffs (pagetable > updates are cheap, but tlb misses are much more expensive, and hits > consume more tlb entries). > > Xen can also make full use of whatever hardware virtualization features > are available when running an "hvm" domain. This is typically how you'd > run Windows or other unmodified operating systems. > > All of this is stuff that's necessary to support any PV Xen domain, and > has been in the kernel for a long time now. > > > The additions I'm proposing now are those needed for a Xen domain to > control the physical hardware, in order to provide virtual device > support for other less-privileged domains. These changes affect a few > areas: > > * interrupts: mapping a device interrupt into an event channel for > delivery to the domain with the device driver for that interrupt > * mappings: allowing direct hardware mapping of device memory into a > domain > * dma: making sure that hardware gets programmed with machine memory > address, nor virtual ones, and that pages are machine-contiguous > when expected > > Interrupts require a few hooks into the x86 APIC code, but the end > result is that hardware interrupts are delivered via event channels, but > then they're mapped back to irqs and delivered normally (they even end > up with the same irq number as they'd usually have). > > Device mappings are fairly easy to arrange. I'm using a software pte > bit, _PAGE_IOMAP, to indicate that a mapping is a device mapping. This > bit is set by things like ioremap() and remap_pfn_range, and the Xen mmu > code just uses the pfn in the pte as-is, rather than doing the normal > pfn->mfn translation. > > DMA is handled via the normal DMA API, with some hooks to swiotlb to > make sure that the memory underlying its pools is really DMA-ready (ie, > is contiguous and low enough in machine memory). > > The changes I'm proposing may look a bit strange from a purely x86 > perspective, but they fit in relatively well because they're not all > that different from what other architectures require, and so the > kernel-wide infrastructure is mostly already in place. > > > I hope that helps clarify what I'm trying to do here, and why Xen and > KVM do have distinct roles to play. > > J ^ permalink raw reply [flat|nested] 49+ messages in thread
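The _PAGE_IOMAP mechanism quoted above can be sketched in a few lines: when the software bit is set, the pte's frame number is treated as a raw machine frame and the usual pfn->mfn translation is skipped. The constants, the bit position and the pfn_to_mfn() stub here are illustrative, not the exact definitions from the series.

/* Simplified sketch of the _PAGE_IOMAP idea; names and bit positions
 * are illustrative, not the exact kernel definitions. */
#define PAGE_SHIFT     12
#define PTE_FLAGS_MASK 0xfffUL
#define _PAGE_PRESENT  0x001UL
#define _PAGE_IOMAP    0x400UL   /* one of the software-available pte bits */

/* Stand-in for the guest's pfn -> mfn lookup. */
static unsigned long pfn_to_mfn(unsigned long pfn)
{
        return pfn + 0x10000;    /* pretend translation, for the sketch only */
}

static unsigned long xen_pte_pfn_to_mfn(unsigned long val)
{
        unsigned long pfn, flags;

        if (!(val & _PAGE_PRESENT))
                return val;

        if (val & _PAGE_IOMAP)
                /* ioremap()/remap_pfn_range() mapping: the frame number
                 * is already a machine address, use it untranslated. */
                return val;

        /* Ordinary RAM mapping: translate the guest pfn to a machine mfn. */
        pfn   = val >> PAGE_SHIFT;
        flags = val & PTE_FLAGS_MASK;
        return (pfn_to_mfn(pfn) << PAGE_SHIFT) | flags;
}

Because the check sits in the pte-conversion path, ioremap() and friends only need to set the flag and everything downstream works unchanged, which is the sense in which the device-mapping support stays self-contained.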
* Re: [PATCH] xen: core dom0 support 2009-03-04 19:03 ` Anthony Liguori @ 2009-03-04 19:16 ` H. Peter Anvin 2009-03-04 19:33 ` Anthony Liguori 0 siblings, 1 reply; 49+ messages in thread From: H. Peter Anvin @ 2009-03-04 19:16 UTC (permalink / raw) To: Anthony Liguori Cc: Jeremy Fitzhardinge, Nick Piggin, Xen-devel, Andrew Morton, the arch/x86 maintainers, Linux Kernel Mailing List Anthony Liguori wrote: > > I think this is a bit misleading. I think you can understand the true > differences between Xen and KVM by s/hypervisor/Operating System/. > Fundamentally, a hypervisor is just an operating system that provides a > hardware-like interface to it's processes. > [...] > > The real difference between KVM and Xen is that Xen is a separate > Operating System dedicated to virtualization. In many ways, it's a fork > of Linux since it uses quite a lot of Linux code. > > The argument for Xen as a separate OS is no different than the argument > for a dedicated Real Time Operating System, a dedicated OS for embedded > systems, or a dedicated OS for a very large system. > In particular, Xen is a microkernel-type operating system. The dom0 model is a classic single-server, in the style of Mach. A lot of the "Xen could use a distributed dom0" arguments were also done with Mach ("the real goal is a multi-server") but such a system never materialized (Hurd was supposed to be one.) Building multiservers is *hard*, and building multiservers which don't suck is even harder. -hpa ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support
  2009-03-04 19:16 ` H. Peter Anvin
@ 2009-03-04 19:33 ` Anthony Liguori
  0 siblings, 0 replies; 49+ messages in thread
From: Anthony Liguori @ 2009-03-04 19:33 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Jeremy Fitzhardinge, Nick Piggin, Xen-devel, Andrew Morton,
    the arch/x86 maintainers, Linux Kernel Mailing List

H. Peter Anvin wrote:
> In particular, Xen is a microkernel-type operating system. The dom0
> model is a classic single-server, in the style of Mach. A lot of the
> "Xen could use a distributed dom0" arguments were also done with Mach
> ("the real goal is a multi-server") but such a system never
> materialized (Hurd was supposed to be one.) Building multiservers is
> *hard*, and building multiservers which don't suck is even harder.

A lot of the core Xen concepts (domains, event channels, etc.) were present in
the Nemesis[1] exo-kernel project. Two other interesting papers on the subject
are "Are virtual machine monitors microkernels done right?"[2] from the Xen
folks and a rebuttal from the l4ka group[3].

[1] http://www.cl.cam.ac.uk/research/srg/netos/old-projects/nemesis/
[2] http://portal.acm.org/citation.cfm?id=1251124
[3] http://l4ka.org/publications/paper.php?docid=2189

Regards,

Anthony Liguori

> -hpa

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support 2009-02-28 6:52 ` Jeremy Fitzhardinge 2009-02-28 7:20 ` Ingo Molnar 2009-02-28 12:09 ` Nick Piggin @ 2009-02-28 16:14 ` Andi Kleen 2009-03-01 23:34 ` Jeremy Fitzhardinge 2 siblings, 1 reply; 49+ messages in thread From: Andi Kleen @ 2009-02-28 16:14 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Andrew Morton, Xen-devel, the arch/x86 maintainers, Linux Kernel Mailing List, H. Peter Anvin Jeremy Fitzhardinge <jeremy@goop.org> writes: > Andrew Morton wrote: >> On Fri, 27 Feb 2009 17:59:06 -0800 Jeremy Fitzhardinge <jeremy@goop.org> wrote: >> >> >>> This series implements the core parts of Xen dom0 support; that is, just >>> enough to get the kernel started when booted by Xen as a dom0 kernel. >>> >> >> And what other patches can we expect to see to complete the xen dom0 >> support? >> > > There's a bit of a gradient. There's probably another 2-3 similarly > sized series to get everything so that you can boot dom0 out of the > box (core, apic, swiotlb/agp/drm, backend drivers, tools). And then a > scattering of smaller things which may or may not be upstreamable. > The vast majority of it is Xen-specific code, rather than changes to > core kernel. I would say the more interesting question is less how much additional code it is or even how much it changes the main kernel, but more how different the code execution paths in interaction with Xen are compared to what a native kernel would do. Because such differences always would need to be considered in future changes. For example things like: doesn't use PAT with Xen or apparently very different routing are somewhat worrying because it means it's a completely different operation modus with Xen that needs to be taken care of later, adding to complexity. Unfortunately it also looks like that Xen the HV does things more and more different from what mainline kernel does so these differences will likely continue to grow over time. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support
  2009-02-28 16:14 ` Andi Kleen
@ 2009-03-01 23:34 ` Jeremy Fitzhardinge
  2009-03-01 23:52 ` H. Peter Anvin
  2009-03-02 0:10 ` Andi Kleen
  0 siblings, 2 replies; 49+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-01 23:34 UTC (permalink / raw)
To: Andi Kleen
Cc: Andrew Morton, Xen-devel, the arch/x86 maintainers,
    Linux Kernel Mailing List, H. Peter Anvin

Andi Kleen wrote:
> I would say the more interesting question is less how much additional
> code it is or even how much it changes the main kernel, but more how
> different the code execution paths in interaction with Xen are
> compared to what a native kernel would do. Because such differences
> always would need to be considered in future changes.
>

Yes. A big part of what I'm doing is trying to keep the Xen changes
self-contained to try and minimize their system-wide impact. Basically it
comes down to that if you use (mostly existing) kernel APIs in the way they're
intended to be used, then things just work out for both Xen and native cases.
The whole point of keeping the kernel modular is so that if people implement
and use the interfaces correctly, the internal details shouldn't matter very
much. Often the process of adding Xen support has resulted in putting clear,
well defined interfaces into parts of the kernel where previously things were,
well, in need of cleaning up.

> For example things like: doesn't use PAT with Xen or apparently very
> different routing are somewhat worrying because it means it's a
> completely different operation modus with Xen that needs to be taken
> care of later, adding to complexity.
>

Unless we're planning on dropping support for processors with no or broken PAT
support, we're always going to have to deal with the non-PAT case. Xen just
falls into the "processor with no PAT" case. And if/when we work out how to
paravirtualize PAT, it will no longer be in that case.

> Unfortunately it also looks like that Xen the HV does things
> more and more different from what mainline kernel does so
> these differences will likely continue to grow over time.

I hope that won't be the case. Part of considering any change to Xen is
considering what changes would be needed to the guest operating systems to
make use of that feature.

J

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support 2009-03-01 23:34 ` Jeremy Fitzhardinge @ 2009-03-01 23:52 ` H. Peter Anvin 2009-03-02 0:08 ` Jeremy Fitzhardinge 2009-03-02 0:10 ` Andi Kleen 1 sibling, 1 reply; 49+ messages in thread From: H. Peter Anvin @ 2009-03-01 23:52 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Andi Kleen, Andrew Morton, Xen-devel, the arch/x86 maintainers, Linux Kernel Mailing List Jeremy Fitzhardinge wrote: > > Unless we're planning on dropping support for processes with no or > broken PAT support, we're always going to have to deal with the non-PAT > case. Xen just falls into the "processor with no PAT" case. And > if/when we work out how to paravirtualize PAT, it will no longer be in > that case. > In this particular case, this is actually false. "No PAT" in the processor is *not* the same thing as "no cacheability controls in the page tables". Every processor since the 386 has had UC, WT, and WB controls in the page tables; PAT only added the ability to do WC (and WP, which we don't use). Since the number of processors which can do WC at all but don't have PAT is a small set of increasingly obsolete processors, we may very well choose to simply ignore the WC capabilities of these particular processors. -hpa ^ permalink raw reply [flat|nested] 49+ messages in thread
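For reference, the cacheability controls hpa is referring to live in three pte bits. PWT and PCD have been honoured since the 386; the PAT bit only means something on PAT-capable processors, where it widens the index into the IA32_PAT MSR. The macro names below mirror the idea rather than being copied from the kernel headers:

/* x86 pte attribute bits relevant to cacheability (4K mappings) */
#define _PAGE_PWT  (1UL << 3)   /* write-through when cached             */
#define _PAGE_PCD  (1UL << 4)   /* cache disable                         */
#define _PAGE_PAT  (1UL << 7)   /* PAT index high bit (PAT-capable CPUs) */

/*
 * Without PAT, only PCD/PWT matter:
 *     PCD=0 PWT=0  ->  WB  (write-back)
 *     PCD=0 PWT=1  ->  WT  (write-through)
 *     PCD=1 PWT=x  ->  UC  (uncached)
 * With PAT, {PAT,PCD,PWT} form a 3-bit index into the eight entries of
 * the IA32_PAT MSR, which is what makes additional types such as WC
 * (write-combining) expressible through the pagetables at all.
 */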
* Re: [PATCH] xen: core dom0 support 2009-03-01 23:52 ` H. Peter Anvin @ 2009-03-02 0:08 ` Jeremy Fitzhardinge 2009-03-02 0:14 ` H. Peter Anvin 0 siblings, 1 reply; 49+ messages in thread From: Jeremy Fitzhardinge @ 2009-03-02 0:08 UTC (permalink / raw) To: H. Peter Anvin Cc: Andi Kleen, Andrew Morton, Xen-devel, the arch/x86 maintainers, Linux Kernel Mailing List H. Peter Anvin wrote: > In this particular case, this is actually false. "No PAT" in the > processor is *not* the same thing as "no cacheability controls in the > page tables". Every processor since the 386 has had UC, WT, and WB > controls in the page tables; PAT only added the ability to do WC (and > WP, which we don't use). Since the number of processors which can do > WC at all but don't have PAT is a small set of increasingly obsolete > processors, we may very well choose to simply ignore the WC > capabilities of these particular processors. I'm not quite sure what you're referring to with "this is actually false". Certainly we support cachability control in ptes under Xen. We just don't support full PAT because Xen uses PAT for itself. J ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support 2009-03-02 0:08 ` Jeremy Fitzhardinge @ 2009-03-02 0:14 ` H. Peter Anvin 2009-03-02 0:42 ` Jeremy Fitzhardinge 0 siblings, 1 reply; 49+ messages in thread From: H. Peter Anvin @ 2009-03-02 0:14 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Andi Kleen, Andrew Morton, Xen-devel, the arch/x86 maintainers, Linux Kernel Mailing List Jeremy Fitzhardinge wrote: > H. Peter Anvin wrote: >> In this particular case, this is actually false. "No PAT" in the >> processor is *not* the same thing as "no cacheability controls in the >> page tables". Every processor since the 386 has had UC, WT, and WB >> controls in the page tables; PAT only added the ability to do WC (and >> WP, which we don't use). Since the number of processors which can do >> WC at all but don't have PAT is a small set of increasingly obsolete >> processors, we may very well choose to simply ignore the WC >> capabilities of these particular processors. > > I'm not quite sure what you're referring to with "this is actually > false". Certainly we support cachability control in ptes under Xen. We > just don't support full PAT because Xen uses PAT for itself. > What do you define as "full PAT"? If what you mean is that Xen lays claims to the PAT MSR and only allows a certain mapping that's hardly a problem... other than that it's not an exhaustible resource so I guess I really don't understand what you're trying to say here. -hpa ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support 2009-03-02 0:14 ` H. Peter Anvin @ 2009-03-02 0:42 ` Jeremy Fitzhardinge 2009-03-02 0:46 ` H. Peter Anvin 0 siblings, 1 reply; 49+ messages in thread From: Jeremy Fitzhardinge @ 2009-03-02 0:42 UTC (permalink / raw) To: H. Peter Anvin Cc: Andi Kleen, Andrew Morton, Xen-devel, the arch/x86 maintainers, Linux Kernel Mailing List H. Peter Anvin wrote: > Jeremy Fitzhardinge wrote: >> H. Peter Anvin wrote: >>> In this particular case, this is actually false. "No PAT" in the >>> processor is *not* the same thing as "no cacheability controls in >>> the page tables". Every processor since the 386 has had UC, WT, and >>> WB controls in the page tables; PAT only added the ability to do WC >>> (and WP, which we don't use). Since the number of processors which >>> can do WC at all but don't have PAT is a small set of increasingly >>> obsolete processors, we may very well choose to simply ignore the WC >>> capabilities of these particular processors. >> >> I'm not quite sure what you're referring to with "this is actually >> false". Certainly we support cachability control in ptes under Xen. >> We just don't support full PAT because Xen uses PAT for itself. >> > > What do you define as "full PAT"? If what you mean is that Xen lays > claims to the PAT MSR and only allows a certain mapping that's hardly > a problem... other than that it's not an exhaustible resource so I > guess I really don't understand what you're trying to say here. It does not allow guests to set their own PAT MSRs. It can't easily be multiplexed either, as all CPUs must have the same settings for their PAT MSRs. I guess it could be handled by allowing domains to set their own virtual PAT MSRs, and then rewriting the ptes to convert from the guest PAT settings to Xen's, but I don't know if this is possible in general (and it poses some problems because the pte modifications would be guest-visible). J ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support 2009-03-02 0:42 ` Jeremy Fitzhardinge @ 2009-03-02 0:46 ` H. Peter Anvin 0 siblings, 0 replies; 49+ messages in thread From: H. Peter Anvin @ 2009-03-02 0:46 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Andi Kleen, Andrew Morton, Xen-devel, the arch/x86 maintainers, Linux Kernel Mailing List Jeremy Fitzhardinge wrote: >> >> What do you define as "full PAT"? If what you mean is that Xen lays >> claims to the PAT MSR and only allows a certain mapping that's hardly >> a problem... other than that it's not an exhaustible resource so I >> guess I really don't understand what you're trying to say here. > > It does not allow guests to set their own PAT MSRs. It can't easily be > multiplexed either, as all CPUs must have the same settings for their > PAT MSRs. I guess it could be handled by allowing domains to set their > own virtual PAT MSRs, and then rewriting the ptes to convert from the > guest PAT settings to Xen's, but I don't know if this is possible in > general (and it poses some problems because the pte modifications would > be guest-visible). > It would make a lot more sense to simply specify a particular set of mappings. Since the only one anyone cares about that isn't in the default set is WC anyway, it's easy to do. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. ^ permalink raw reply [flat|nested] 49+ messages in thread
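To flesh out hpa's "specify a particular set of mappings" suggestion: the IA32_PAT MSR holds eight 8-bit fields (only the low three bits of each encode a memory type), so a fixed layout that keeps the architectural defaults in entries 0-3 and exposes WC through a spare entry is easy to express. The layout below is purely illustrative; it is not what Xen or Linux actually program.

#include <stdint.h>

/* Memory-type encodings used in each IA32_PAT field. */
enum pat_type {
        PAT_UC = 0x00, PAT_WC = 0x01, PAT_WT = 0x04,
        PAT_WP = 0x05, PAT_WB = 0x06, PAT_UC_MINUS = 0x07
};

#define PAT_ENTRY(idx, type)  ((uint64_t)(type) << ((idx) * 8))

/* One possible fixed layout: architectural defaults in entries 0-3
 * (WB, WT, UC-, UC), with WC and WP parked in the high entries. */
static const uint64_t fixed_pat =
        PAT_ENTRY(0, PAT_WB) | PAT_ENTRY(1, PAT_WT) |
        PAT_ENTRY(2, PAT_UC_MINUS) | PAT_ENTRY(3, PAT_UC) |
        PAT_ENTRY(4, PAT_WC) | PAT_ENTRY(5, PAT_WP) |
        PAT_ENTRY(6, PAT_UC_MINUS) | PAT_ENTRY(7, PAT_UC);

/* A hypervisor would write a value like this once per cpu to the PAT MSR
 * (0x277) and advertise the layout, instead of letting each guest
 * reprogram the MSR; guests then select memory types purely through the
 * PAT/PCD/PWT pte bits. */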
* Re: [PATCH] xen: core dom0 support 2009-03-01 23:34 ` Jeremy Fitzhardinge 2009-03-01 23:52 ` H. Peter Anvin @ 2009-03-02 0:10 ` Andi Kleen 1 sibling, 0 replies; 49+ messages in thread From: Andi Kleen @ 2009-03-02 0:10 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Andi Kleen, Andrew Morton, Xen-devel, the arch/x86 maintainers, Linux Kernel Mailing List, H. Peter Anvin > Yes. A big part of what I'm doing is trying to keep the Xen changes > self-contained to try and minimize their system-wide impact. Basically > it comes down to that if you use (mostly existing) kernel APIs in the > way they're intended to be used, then things just work out for both Xen > and native cases. The whole point of keeping the kernel modular is so > that if people implement and use the the interfaces correctly, the That's a big if. It sounds good in theory, but I in practice it will be different. Kernel interfaces tend to have hidden assumptions too that matter and the more special case code is in there the more additional hidden assumptions will be there too. > internal details shouldn't matter very much. Often the process of > adding Xen support has resulted in putting clear, well defined > interfaces into parts of the kernel where previously things were, well, > in need of cleaning up. That's true, but it's still much more complex than before semantically. > > >For example things like: doesn't use PAT with Xen or apparently very > >different routing are somewhat worrying because it means it's a > >completely different operation modus with Xen that needs to be taken > >care of later, adding to complexity. > > > > Unless we're planning on dropping support for processes with no or > broken PAT support, we're always going to have to deal with the non-PAT > case. These are all really old hardware[1], no modern 3d chips etc. Xen on the other hand .. [1] afaik you have to go back to PPro to get real PAT bugs. > >Unfortunately it also looks like that Xen the HV does things > >more and more different from what mainline kernel does so > >these differences will likely continue to grow over time. > > I hope that won't be the case. As part of considering any change to Xen My impression from looking occasionally at Xen source is like this at least. It used to be that Xen was basically Linux 2.4 with some tweaks in many ways, but now it's often completely new code doing things in very different ways. Basically a real fork diverging more and more. That said there's probably no way around merging the Dom0 support too, but I think it should be clearly said that it has a quite high long term cost for Linux. Hopefully it's worth it. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support 2009-02-28 5:28 ` Andrew Morton 2009-02-28 6:52 ` Jeremy Fitzhardinge @ 2009-02-28 8:42 ` Ingo Molnar 2009-02-28 9:46 ` Jeremy Fitzhardinge 2009-03-05 13:52 ` Morten P.D. Stevens 2 siblings, 1 reply; 49+ messages in thread From: Ingo Molnar @ 2009-02-28 8:42 UTC (permalink / raw) To: Andrew Morton Cc: Jeremy Fitzhardinge, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel * Andrew Morton <akpm@linux-foundation.org> wrote: > I hate to be the one to say it, but we should sit down and > work out whether it is justifiable to merge any of this into > Linux. I think it's still the case that the Xen technology is > the "old" way and that the world is moving off in the "new" > direction, KVM? > > In three years time, will we regret having merged this? Personally i'd like to see a sufficient reply to the mmap-perf paravirt regressions pointed out by Nick and reproduced by myself as well. (They were in the 4-5% macro-performance range iirc, which is huge.) So i havent seen any real progress on reducing native kernel overhead with paravirt. Patches were sent but no measurements were done and it seemed to have all fizzled out while the dom0 patches are being pursued. Which is not a particularly good basis on which to add even _more_ paravirt stuff, is it? Ingo ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support 2009-02-28 8:42 ` Ingo Molnar @ 2009-02-28 9:46 ` Jeremy Fitzhardinge 2009-03-02 12:08 ` Ingo Molnar 0 siblings, 1 reply; 49+ messages in thread From: Jeremy Fitzhardinge @ 2009-02-28 9:46 UTC (permalink / raw) To: Ingo Molnar Cc: Andrew Morton, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel Ingo Molnar wrote: > Personally i'd like to see a sufficient reply to the mmap-perf > paravirt regressions pointed out by Nick and reproduced by > myself as well. (They were in the 4-5% macro-performance range > iirc, which is huge.) > > So i havent seen any real progress on reducing native kernel > overhead with paravirt. Patches were sent but no measurements > were done and it seemed to have all fizzled out while the dom0 > patches are being pursued. > Hm, I'm not sure what you want me to do here. I sent out patches, they got merged, I posted the results of my measurements showing that the patches made a substantial improvement. I'd love to see confirmation from others that the patches help them, but I don't think you can say I've been unresponsive about this. J ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support 2009-02-28 9:46 ` Jeremy Fitzhardinge @ 2009-03-02 12:08 ` Ingo Molnar 2009-03-07 9:06 ` Jeremy Fitzhardinge 0 siblings, 1 reply; 49+ messages in thread From: Ingo Molnar @ 2009-03-02 12:08 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Andrew Morton, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel * Jeremy Fitzhardinge <jeremy@goop.org> wrote: > Ingo Molnar wrote: >> Personally i'd like to see a sufficient reply to the >> mmap-perf paravirt regressions pointed out by Nick and >> reproduced by myself as well. (They were in the 4-5% >> macro-performance range iirc, which is huge.) >> >> So i havent seen any real progress on reducing native kernel >> overhead with paravirt. Patches were sent but no measurements >> were done and it seemed to have all fizzled out while the >> dom0 patches are being pursued. >> > > Hm, I'm not sure what you want me to do here. I sent out > patches, they got merged, I posted the results of my > measurements showing that the patches made a substantial > improvement. I'd love to see confirmation from others that > the patches help them, but I don't think you can say I've been > unresponsive about this. Have i missed a mail of yours perhaps? I dont have any track of you having posted mmap-perf perfcounters results. I grepped my mbox and the last mail i saw from you containing the string "mmap-perf" is from January 20, and it only includes my numbers. What i'd expect you to do is to proactively measure the overhead of CONFIG_PARAVIRT overhead of the native kernel, and analyze and address the results. Not just minimalistically reply to my performance measurements - as that does not really scale in the long run. Ingo ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support 2009-03-02 12:08 ` Ingo Molnar @ 2009-03-07 9:06 ` Jeremy Fitzhardinge 2009-03-08 11:01 ` Ingo Molnar 0 siblings, 1 reply; 49+ messages in thread From: Jeremy Fitzhardinge @ 2009-03-07 9:06 UTC (permalink / raw) To: Ingo Molnar Cc: Andrew Morton, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel [-- Attachment #1: Type: text/plain, Size: 1800 bytes --] Ingo Molnar wrote: > Have i missed a mail of yours perhaps? I dont have any track of > you having posted mmap-perf perfcounters results. I grepped my > mbox and the last mail i saw from you containing the string > "mmap-perf" is from January 20, and it only includes my numbers. Yes, I think you must have missed a mail. I've attached it for reference, along with a more complete set of measurements I made regarding the series of patches applied (series ending at 1f4f931501e9270c156d05ee76b7b872de486304) to improve pvops performance. My results showed a dramatic drop in cache references (from about 300% pvop vs non-pvop, down to 125% with the full set of patches applied), but it didn't seem to make much of an effect on the overall wallclock time. I'm a bit sceptical of the numbers here because, while each run's passes are fairly consistent, booting and remeasuring seemed to cause larger variations than we're looking at. It would be easy to handwave it away with "cache effects", but its not very satisfying. I also didn't find the measurements very convincing because the number of CPU cycles and instructions executed count is effectively unchanged (ie, the baseline non-pvops vs original pvops apparently execute exactly the same number of instructions, but we know that there's a lot more going on), and with no change as each added patch definitely removes some amount of pvops overhead in terms of instructions in the instruction stream. Is it just measuring usermode stats? I ran it as root, with the command line you suggested ("./perfstat -e -5,-4,-3,0,1,2,3 ./mmap-perf 1"). Cache misses wandered up and down in a fairly non-intuitive way as well. I'll do a rerun comparing current tip.git pvops vs non-pvops to see if I can get some better results. J [-- Attachment #2: pvops-mmap-measurements.ods --] [-- Type: application/vnd.oasis.opendocument.spreadsheet, Size: 20038 bytes --] [-- Attachment #3: Attached Message --] [-- Type: message/rfc822, Size: 51779 bytes --] [-- Attachment #3.1.1: Type: text/plain, Size: 1640 bytes --] Ingo Molnar wrote: > ping? > > This is a very serious paravirt_ops slowdown affecting the native kernel's > performance to the tune of 5-10% in certain workloads. > > It's been about 2 years ago that paravirt_ops went upstream, when you told > us that something like this would never happen, that paravirt_ops is > designed so flexibly that it will never hinder the native kernel - and if > it does it will be easy to fix it. Now is the time to fulfill that > promise. I couldn't exactly reproduce your results, but I guess they're similar in shape. Comparing 2.6.29-rc2-nopv with -pvops, I saw this ratio (pass 1-5). Interestingly I'm seeing identical instruction counts for pvops vs non-pvops, and a lower cycle count. The cache references are way up and the miss rate is up a bit, which I guess is the source of the slowdown. With the attached patch, I get a clear improvement; it replaces the do-nothing pte_val/make_pte functions with inlined movs to move the argument to return, overpatching the 6-byte indirect call (on i386 it would just be all nopped out). 
CPU cycles and cache misses are way down, and the tick count is down from ~5% worse to ~2%. But the cache reference rate is even higher, which really doesn't make sense to me. But the patch is a clear improvement, and its hard to see how it could make anything worse (its always going to replace an indirect call with simple inlined code). (Full numbers in spreadsheet.) I have a couple of other patches to reduce the register pressure of the pvops calls, but I'm trying to work out how to make sure its not all to complex and/or fragile. J [-- Attachment #3.1.2: pvops-mmap-measurements.ods --] [-- Type: application/vnd.oasis.opendocument.spreadsheet, Size: 30546 bytes --] [-- Attachment #3.1.3: paravirt-ident.patch --] [-- Type: text/plain, Size: 6903 bytes --] Subject: x86/pvops: add a paravirt_indent functions to allow special patching Several paravirt ops implementations simply return their arguments, the most obvious being the make_pte/pte_val class of operations on native. On 32-bit, the identity function is literally a no-op, as the calling convention uses the same registers for the first argument and return. On 64-bit, it can be implemented with a single "mov". This patch adds special identity functions for 32 and 64 bit argument, and machinery to recognize them and replace them with either nops or a mov as appropriate. At the moment, the only users for the identity functions are the pagetable entry conversion functions. The result is a measureable improvement on pagetable-heavy benchmarks (2-3%, reducing the pvops overhead from 5 to 2%). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- arch/x86/include/asm/paravirt.h | 5 ++ arch/x86/kernel/paravirt.c | 75 ++++++++++++++++++++++++++++++----- arch/x86/kernel/paravirt_patch_32.c | 12 +++++ arch/x86/kernel/paravirt_patch_64.c | 15 +++++++ 4 files changed, 98 insertions(+), 9 deletions(-) =================================================================== --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -390,6 +390,8 @@ asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":") unsigned paravirt_patch_nop(void); +unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len); +unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len); unsigned paravirt_patch_ignore(unsigned len); unsigned paravirt_patch_call(void *insnbuf, const void *target, u16 tgt_clobbers, @@ -1378,6 +1380,9 @@ } void _paravirt_nop(void); +u32 _paravirt_ident_32(u32); +u64 _paravirt_ident_64(u64); + #define paravirt_nop ((void *)_paravirt_nop) void paravirt_use_bytelocks(void); =================================================================== --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -44,6 +44,17 @@ { } +/* identity function, which can be inlined */ +u32 _paravirt_ident_32(u32 x) +{ + return x; +} + +u64 _paravirt_ident_64(u64 x) +{ + return x; +} + static void __init default_banner(void) { printk(KERN_INFO "Booting paravirtualized kernel on %s\n", @@ -138,9 +149,16 @@ if (opfunc == NULL) /* If there's no function, patch it with a ud2a (BUG) */ ret = paravirt_patch_insns(insnbuf, len, ud2a, ud2a+sizeof(ud2a)); - else if (opfunc == paravirt_nop) + else if (opfunc == _paravirt_nop) /* If the operation is a nop, then nop the callsite */ ret = paravirt_patch_nop(); + + /* identity functions just return their single argument */ + else if (opfunc == _paravirt_ident_32) + ret = paravirt_patch_ident_32(insnbuf, len); + else if (opfunc == _paravirt_ident_64) + ret = 
paravirt_patch_ident_64(insnbuf, len); + else if (type == PARAVIRT_PATCH(pv_cpu_ops.iret) || type == PARAVIRT_PATCH(pv_cpu_ops.irq_enable_sysexit) || type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret32) || @@ -373,6 +391,45 @@ #endif }; +typedef pte_t make_pte_t(pteval_t); +typedef pmd_t make_pmd_t(pmdval_t); +typedef pud_t make_pud_t(pudval_t); +typedef pgd_t make_pgd_t(pgdval_t); + +typedef pteval_t pte_val_t(pte_t); +typedef pmdval_t pmd_val_t(pmd_t); +typedef pudval_t pud_val_t(pud_t); +typedef pgdval_t pgd_val_t(pgd_t); + + +#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_PAE) +/* 32-bit pagetable entries */ +#define paravirt_native_make_pte (make_pte_t *)_paravirt_ident_32 +#define paravirt_native_pte_val (pte_val_t *)_paravirt_ident_32 + +#define paravirt_native_make_pmd (make_pmd_t *)_paravirt_ident_32 +#define paravirt_native_pmd_val (pmd_val_t *)_paravirt_ident_32 + +#define paravirt_native_make_pud (make_pud_t *)_paravirt_ident_32 +#define paravirt_native_pud_val (pud_val_t *)_paravirt_ident_32 + +#define paravirt_native_make_pgd (make_pgd_t *)_paravirt_ident_32 +#define paravirt_native_pgd_val (pgd_val_t *)_paravirt_ident_32 +#else +/* 64-bit pagetable entries */ +#define paravirt_native_make_pte (make_pte_t *)_paravirt_ident_64 +#define paravirt_native_pte_val (pte_val_t *)_paravirt_ident_64 + +#define paravirt_native_make_pmd (make_pmd_t *)_paravirt_ident_64 +#define paravirt_native_pmd_val (pmd_val_t *)_paravirt_ident_64 + +#define paravirt_native_make_pud (make_pud_t *)_paravirt_ident_64 +#define paravirt_native_pud_val (pud_val_t *)_paravirt_ident_64 + +#define paravirt_native_make_pgd (make_pgd_t *)_paravirt_ident_64 +#define paravirt_native_pgd_val (pgd_val_t *)_paravirt_ident_64 +#endif + struct pv_mmu_ops pv_mmu_ops = { #ifndef CONFIG_X86_64 .pagetable_setup_start = native_pagetable_setup_start, @@ -424,21 +481,21 @@ .pmd_clear = native_pmd_clear, #endif .set_pud = native_set_pud, - .pmd_val = native_pmd_val, - .make_pmd = native_make_pmd, + .pmd_val = paravirt_native_pmd_val, + .make_pmd = paravirt_native_make_pmd, #if PAGETABLE_LEVELS == 4 - .pud_val = native_pud_val, - .make_pud = native_make_pud, + .pud_val = paravirt_native_pud_val, + .make_pud = paravirt_native_make_pud, .set_pgd = native_set_pgd, #endif #endif /* PAGETABLE_LEVELS >= 3 */ - .pte_val = native_pte_val, - .pgd_val = native_pgd_val, + .pte_val = paravirt_native_pte_val, + .pgd_val = paravirt_native_pgd_val, - .make_pte = native_make_pte, - .make_pgd = native_make_pgd, + .make_pte = paravirt_native_make_pte, + .make_pgd = paravirt_native_make_pgd, .dup_mmap = paravirt_nop, .exit_mmap = paravirt_nop, =================================================================== --- a/arch/x86/kernel/paravirt_patch_32.c +++ b/arch/x86/kernel/paravirt_patch_32.c @@ -12,6 +12,18 @@ DEF_NATIVE(pv_cpu_ops, clts, "clts"); DEF_NATIVE(pv_cpu_ops, read_tsc, "rdtsc"); +unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len) +{ + /* arg in %eax, return in %eax */ + return 0; +} + +unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len) +{ + /* arg in %edx:%eax, return in %edx:%eax */ + return 0; +} + unsigned native_patch(u8 type, u16 clobbers, void *ibuf, unsigned long addr, unsigned len) { =================================================================== --- a/arch/x86/kernel/paravirt_patch_64.c +++ b/arch/x86/kernel/paravirt_patch_64.c @@ -19,6 +19,21 @@ DEF_NATIVE(pv_cpu_ops, usergs_sysret32, "swapgs; sysretl"); DEF_NATIVE(pv_cpu_ops, swapgs, "swapgs"); +DEF_NATIVE(, mov32, "mov %edi, %eax"); +DEF_NATIVE(, 
mov64, "mov %rdi, %rax"); + +unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len) +{ + return paravirt_patch_insns(insnbuf, len, + start__mov32, end__mov32); +} + +unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len) +{ + return paravirt_patch_insns(insnbuf, len, + start__mov64, end__mov64); +} + unsigned native_patch(u8 type, u16 clobbers, void *ibuf, unsigned long addr, unsigned len) { ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support 2009-03-07 9:06 ` Jeremy Fitzhardinge @ 2009-03-08 11:01 ` Ingo Molnar 2009-03-08 21:56 ` H. Peter Anvin 0 siblings, 1 reply; 49+ messages in thread From: Ingo Molnar @ 2009-03-08 11:01 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Andrew Morton, H. Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel * Jeremy Fitzhardinge <jeremy@goop.org> wrote: > Ingo Molnar wrote: >> Have i missed a mail of yours perhaps? I dont have any track of you >> having posted mmap-perf perfcounters results. I grepped my mbox and the >> last mail i saw from you containing the string "mmap-perf" is from >> January 20, and it only includes my numbers. > > > Yes, I think you must have missed a mail. I've attached it for > reference, along with a more complete set of measurements I > made regarding the series of patches applied (series ending at > 1f4f931501e9270c156d05ee76b7b872de486304) to improve pvops > performance. Yeah - indeed i missed those numbers - they were embedded in a spreadsheet document attached to the mail ;) > My results showed a dramatic drop in cache references (from > about 300% pvop vs non-pvop, down to 125% with the full set of > patches applied), but it didn't seem to make much of an effect > on the overall wallclock time. I'm a bit sceptical of the > numbers here because, while each run's passes are fairly > consistent, booting and remeasuring seemed to cause larger > variations than we're looking at. It would be easy to handwave > it away with "cache effects", but its not very satisfying. Well it's the L2 cache references which are being measured here, and the L2 cache is likely very large on your test-system. So we can easily run into associativity limits in the L1 cache while still being mostly in L2 cache otherwise. Associativity effects do depend on the kernel image layout and on the precise allocations of kernel data structure allocations we do during bootup - and they dont really change after that. > I also didn't find the measurements very convincing because > the number of CPU cycles and instructions executed count is > effectively unchanged (ie, the baseline non-pvops vs original > pvops apparently execute exactly the same number of > instructions, but we know that there's a lot more going on), > and with no change as each added patch definitely removes some > amount of pvops overhead in terms of instructions in the > instruction stream. Is it just measuring usermode stats? I ran > it as root, with the command line you suggested ("./perfstat > -e -5,-4,-3,0,1,2,3 ./mmap-perf 1"). Cache misses wandered up > and down in a fairly non-intuitive way as well. It's measuring kernel stats too - and i very much saw the instruction count change to the tune of 10% or so. > I'll do a rerun comparing current tip.git pvops vs non-pvops > to see if I can get some better results. Thanks - i'll also try your patch on the same system i measured for my numbers so we'll have some comparison. Ingo ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support 2009-03-08 11:01 ` Ingo Molnar @ 2009-03-08 21:56 ` H. Peter Anvin 2009-03-08 22:06 ` Ingo Molnar 0 siblings, 1 reply; 49+ messages in thread From: H. Peter Anvin @ 2009-03-08 21:56 UTC (permalink / raw) To: Ingo Molnar Cc: Jeremy Fitzhardinge, Andrew Morton, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel Ingo Molnar wrote: > > Associativity effects do depend on the kernel image layout and > on the precise allocations of kernel data structure allocations > we do during bootup - and they dont really change after that. > By the way, there is a really easy way (if a bit time consuming) to get the actual variability here -- you have to reboot between runs, even for the same kernel. It makes the data collection take a long time, but at least it can be scripted. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support 2009-03-08 21:56 ` H. Peter Anvin @ 2009-03-08 22:06 ` Ingo Molnar 2009-03-08 22:08 ` H. Peter Anvin 0 siblings, 1 reply; 49+ messages in thread From: Ingo Molnar @ 2009-03-08 22:06 UTC (permalink / raw) To: H. Peter Anvin Cc: Jeremy Fitzhardinge, Andrew Morton, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel * H. Peter Anvin <hpa@zytor.com> wrote: > Ingo Molnar wrote: > > > > Associativity effects do depend on the kernel image layout > > and on the precise allocations of kernel data structure > > allocations we do during bootup - and they dont really > > change after that. > > > > By the way, there is a really easy way (if a bit time > consuming) to get the actual variability here -- you have to > reboot between runs, even for the same kernel. It makes the > data collection take a long time, but at least it can be > scripted. Since it's the same kernel image i think the only truly reliable method would be to reboot between _different_ kernel images: same instructions but randomly re-align variables both in terms of absolute address and in terms of relative position to each other. Plus randomize bootmem allocs and never-gets-freed-really boot-time allocations. Really hard to do i think ... Ingo ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support
2009-03-08 22:06 ` Ingo Molnar
@ 2009-03-08 22:08 ` H. Peter Anvin
2009-03-08 22:12 ` Ingo Molnar
0 siblings, 1 reply; 49+ messages in thread
From: H. Peter Anvin @ 2009-03-08 22:08 UTC (permalink / raw)
To: Ingo Molnar
Cc: Jeremy Fitzhardinge, Andrew Morton, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel

Ingo Molnar wrote:
>
> Since it's the same kernel image i think the only truly reliable
> method would be to reboot between _different_ kernel images:
> same instructions but randomly re-align variables both in terms
> of absolute address and in terms of relative position to each
> other. Plus randomize bootmem allocs and never-gets-freed-really
> boot-time allocations.
>
> Really hard to do i think ...
>

Ouch, yeah.

On the other hand, the numbers made sense to me, so I don't see why
there is any reason to distrust them. They show a 5% overhead with
pv_ops enabled, reduced to a 2% overhead with the change. That is
more or less what would match my intuition from seeing the code.

	-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support 2009-03-08 22:08 ` H. Peter Anvin @ 2009-03-08 22:12 ` Ingo Molnar 2009-03-09 18:06 ` Jeremy Fitzhardinge 0 siblings, 1 reply; 49+ messages in thread From: Ingo Molnar @ 2009-03-08 22:12 UTC (permalink / raw) To: H. Peter Anvin Cc: Jeremy Fitzhardinge, Andrew Morton, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel * H. Peter Anvin <hpa@zytor.com> wrote: > Ingo Molnar wrote: > > > > Since it's the same kernel image i think the only truly reliable > > method would be to reboot between _different_ kernel images: > > same instructions but randomly re-align variables both in terms > > of absolute address and in terms of relative position to each > > other. Plus randomize bootmem allocs and never-gets-freed-really > > boot-time allocations. > > > > Really hard to do i think ... > > > > Ouch, yeah. > > On the other hand, the numbers made sense to me, so I don't > see why there is any reason to distrust them. They show a 5% > overhead with pv_ops enabled, reduced to a 2% overhead with > the changed. That is more or less what would match my > intuition from seeing the code. Yeah - it was Jeremy expressed doubt in the numbers, not me. And we need to eliminate that 2% as well - 2% is still an awful lot of native kernel overhead from a kernel feature that 95%+ of users do not make any use of. Ingo ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support
2009-03-08 22:12 ` Ingo Molnar
@ 2009-03-09 18:06 ` Jeremy Fitzhardinge
2009-03-10 12:44 ` Ingo Molnar
2009-03-10 12:49 ` Nick Piggin
0 siblings, 2 replies; 49+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-09 18:06 UTC (permalink / raw)
To: Ingo Molnar
Cc: H. Peter Anvin, Andrew Morton, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel

Ingo Molnar wrote:
> * H. Peter Anvin <hpa@zytor.com> wrote:
>
>> Ingo Molnar wrote:
>>
>>> Since it's the same kernel image i think the only truly reliable
>>> method would be to reboot between _different_ kernel images:
>>> same instructions but randomly re-align variables both in terms
>>> of absolute address and in terms of relative position to each
>>> other. Plus randomize bootmem allocs and never-gets-freed-really
>>> boot-time allocations.
>>>
>>> Really hard to do i think ...
>>>
>> Ouch, yeah.
>>
>> On the other hand, the numbers made sense to me, so I don't
>> see why there is any reason to distrust them. They show a 5%
>> overhead with pv_ops enabled, reduced to a 2% overhead with
>> the change. That is more or less what would match my
>> intuition from seeing the code.
>>
> Yeah - it was Jeremy expressed doubt in the numbers, not me.
>

Mainly because I was seeing the instruction and cycle counts completely
unchanged from run to run, which is implausible. They're not zero, so
they're clearly measurements of *something*, but not cycles and
instructions, since we know that they're changing. So what are they
measurements of? And if they're not what they claim, are the other
numbers more meaningful?

It's easy to read the numbers as confirmations of preconceived
expectations of the outcomes, but that's - as I said - unsatisfying.

> And we need to eliminate that 2% as well - 2% is still an awful
> lot of native kernel overhead from a kernel feature that 95%+ of
> users do not make any use of.
>

Well, I think there's a few points here:

1. the test in question is a bit vague about kernel and user
   measurements. I assume the stuff coming from perfcounters is
   kernel-only state, but the elapsed time includes the usermode
   component, and so will be affected by the usermode page placement
   and cache effects. If I change the test to copy the test executable
   (statically linked, to avoid libraries), then that should at least
   fuzz out user page placement.
2. It's true that the cache effects could be due to the precise layout
   of the kernel executable; but if those effects are swamping the
   effects of the changes to improve pvops, then it's unclear what the
   point of the exercise is. Especially since:
3. It is a config option, so if someone is sensitive to the performance
   hit and it gives them no useful functionality to offset it, then it
   can be disabled. Distros tend to enable it because they tend to
   value function and flexibility over raw performance; they tend to
   enable things like audit, selinux, modules which all have
   performance hits of a similar scale (of course, you could argue that
   more people get benefit from those features to offset their costs).
   But,
4. I think you're underestimating the number of people who get benefit
   from pvops; the Xen userbase is actually pretty large, and KVM will
   use pvops hooks when available to improve Linux-as-guest.
5. Also, we're looking at a single benchmark with no obvious relevance
   to a real workload. Perhaps there are workloads which continuously
   mash mmap/munmap/mremap(!), but I think they're fairly rare. Such a
   benchmark is useful for tuning specific areas, but if we're going to
   evaluate pvops overhead, it would be nice to use something a bit
   broader to base our measurements on. Also, what weighting are we
   going to put on 32 vs 64 bit? Equally important? One more than the
   other?

All that said, I would like to get the pvops overhead down to
unmeasurable - the ideal would be to be able to justify removing the
config option altogether and leave it always enabled.

The tradeoff, as always, is how much other complexity are we willing to
stand to get there? The addition of a new calling convention is already
fairly esoteric, but so far it has got us a 60% reduction in overhead
(in this test). But going further is going to get more complex.

For example, the next step would be to attack set_pte (including
set_pte_*, pte_clear, etc), to make them use the new calling
convention, and possibly make them inlineable (ie, to get it as close
as possible to the non-pvops case). But that will require them to be
implemented in asm (to guarantee that they only use the registers
they're allowed to use), and we already have 3 variants of each for the
different pagetable modes. All completely doable, and not even very
hard, but it will be just one more thing to maintain - we just need to
be sure the payoff is worth it.

	J

^ permalink raw reply [flat|nested] 49+ messages in thread
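For reference, a benchmark of the kind discussed above - one that does little but mash mmap/munmap and the page-fault path - is easy to approximate. The sketch below is not the mmap-perf test from the perfcounters tree that Ingo and Jeremy were running; it is a minimal stand-in, under the assumption that repeatedly mapping, touching and unmapping anonymous memory exercises the same pvops-heavy paths, and the sizes and iteration count are arbitrary.

/* A rough stand-in for an mmap/munmap-heavy microbenchmark. It just
 * maps anonymous memory, touches every page so the fault/set_pte
 * paths run, and unmaps it again.
 */
#include <stdio.h>
#include <sys/mman.h>

#define ITERS    100000
#define MAP_SIZE (64 * 4096UL)	/* 64 pages per mapping */

int main(void)
{
	for (long i = 0; i < ITERS; i++) {
		char *p = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		/* touch each page to force a fault and a pte write */
		for (unsigned long off = 0; off < MAP_SIZE; off += 4096)
			p[off] = 1;
		if (munmap(p, MAP_SIZE)) {
			perror("munmap");
			return 1;
		}
	}
	return 0;
}

Timed natively versus under a pvops kernel (or run under whatever counter tooling is at hand), this gives a rough feel for the overhead being argued about, though as noted in the thread a microbenchmark overstates the effect relative to real workloads.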
* Re: [PATCH] xen: core dom0 support 2009-03-09 18:06 ` Jeremy Fitzhardinge @ 2009-03-10 12:44 ` Ingo Molnar 2009-03-10 12:49 ` Nick Piggin 1 sibling, 0 replies; 49+ messages in thread From: Ingo Molnar @ 2009-03-10 12:44 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: H. Peter Anvin, Andrew Morton, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel * Jeremy Fitzhardinge <jeremy@goop.org> wrote: >> Yeah - it was Jeremy expressed doubt in the numbers, not me. > > Mainly because I was seeing the instruction and cycle counts > completely unchanged from run to run, which is implausible. > They're not zero, so they're clearly measurements of > *something*, but not cycles and instructions, since we know > that they're changing. So what are they measurements of? And > if they're not what they claim, are the other numbers more > meaningful? cycle count not changing in a macro-workload is not plausible. Instruction count not changing can happen sometimes - if the workload is deterministic (which this one is) and we happen to get exactly the same number of timer irqs during the test. But it's more common that it varies slightly - especially on SMP where task balancing can be timing-dependent and hence is noise. Ingo ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support 2009-03-09 18:06 ` Jeremy Fitzhardinge 2009-03-10 12:44 ` Ingo Molnar @ 2009-03-10 12:49 ` Nick Piggin 1 sibling, 0 replies; 49+ messages in thread From: Nick Piggin @ 2009-03-10 12:49 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Ingo Molnar, H. Peter Anvin, Andrew Morton, the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel On Tuesday 10 March 2009 05:06:40 Jeremy Fitzhardinge wrote: > Ingo Molnar wrote: > > * H. Peter Anvin <hpa@zytor.com> wrote: > >> Ingo Molnar wrote: > >>> Since it's the same kernel image i think the only truly reliable > >>> method would be to reboot between _different_ kernel images: > >>> same instructions but randomly re-align variables both in terms > >>> of absolute address and in terms of relative position to each > >>> other. Plus randomize bootmem allocs and never-gets-freed-really > >>> boot-time allocations. > >>> > >>> Really hard to do i think ... > >> > >> Ouch, yeah. > >> > >> On the other hand, the numbers made sense to me, so I don't > >> see why there is any reason to distrust them. They show a 5% > >> overhead with pv_ops enabled, reduced to a 2% overhead with > >> the changed. That is more or less what would match my > >> intuition from seeing the code. > > > > Yeah - it was Jeremy expressed doubt in the numbers, not me. > > Mainly because I was seeing the instruction and cycle counts completely > unchanged from run to run, which is implausible. They're not zero, so > they're clearly measurements of *something*, but not cycles and > instructions, since we know that they're changing. So what are they > measurements of? And if they're not what they claim, are the other > numbers more meaningful? > > It's easy to read the numbers as confirmations of preconceived > expectations of the outcomes, but that's - as I said - unsatisfying. > > > And we need to eliminate that 2% as well - 2% is still an awful > > lot of native kernel overhead from a kernel feature that 95%+ of > > users do not make any use of. > > Well, I think there's a few points here: > > 1. the test in question is a bit vague about kernel and user > measurements. I assume the stuff coming from perfcounters is > kernel-only state, but the elapsed time includes the usermode > component, and so will be affected by the usermode page placement > and cache effects. If I change the test to copy the test > executable (statically linked, to avoid libraries), then that > should at least fuzz out user page placement. > 2. Its true that the cache effects could be due to the precise layout > of the kernel executable; but if those effects are swamping > effects of the changes to improve pvops then its unclear what the > point of the exercise is. Especially since: > 3. It is a config option, so if someone is sensitive to the > performance hit and it gives them no useful functionality to > offset it, then it can be disabled. Distros tend to enable it > because they tend to value function and flexibility over raw > performance; they tend to enable things like audit, selinux, > modules which all have performance hits of a similar scale (of > course, you could argue that more people get benefit from those > features to offset their costs). But, > 4. I think you're underestimating the number of people who get > benefit from pvops; the Xen userbase is actually pretty large, and > KVM will use pvops hooks when available to improve Linux-as-guest. > 5. Also, we're looking at a single benchmark with no obvious > relevance to a real workload. 
> Perhaps there are workloads which
> continuously mash mmap/munmap/mremap(!), but I think they're
> fairly rare. Such a benchmark is useful for tuning specific
> areas, but if we're going to evaluate pvops overhead, it would be
> nice to use something a bit broader to base our measurements on.
> Also, what weighting are we going to put on 32 vs 64 bit? Equally
> important? One more than the other?

I saw _most_ of the extra overhead show up in the page fault path. And
also don't forget that fork/exit workloads are essentially mashing
mmap/munmap. So things which mash these paths include kbuild, scripts,
and some malloc patterns (like you might see in MySQL running OLTP). Of
course they tend to do more other stuff as well, so 2% in a
microbenchmark will be much smaller, but that was never in dispute.

One of the hardest problems is adding lots of features to critical
paths that individually "never show a statistical difference on any
real workload", but combine to slow things down. It really sucks to
have people upgrade and performance go down.

As an anecdote, I had a problem where an ISV upgraded SLES9 to SLES10
and their software's performance dropped 30% or so. And there were like
3 or 4 things that could be bisected to show a few % of that. This was
without pvops mind you, but in very similar paths (mmap/munmap/page
fault/teardown). The pvops stuff was basically just an extension of
that saga.

OK, that's probably an extreme case, but any of this stuff must always
be considered a critical fastpath IMO. We know any slowdown is going to
hurt in the long run.

> All that said, I would like to get the pvops overhead down to
> unmeasurable - the ideal would be to be able to justify removing the
> config option altogether and leave it always enabled.
>
> The tradeoff, as always, is how much other complexity are we willing to
> stand to get there? The addition of a new calling convention is already
> fairly esoteric, but so far it has got us a 60% reduction in overhead
> (in this test). But going further is going to get more complex.

If the complexity is not in generic code and constrained within pvops
stuff, then from my POV "as much as it takes", and you get to maintain
it ;) Well, that's a bit unfair. From a distro POV, I'd love that to be
the case because we ship pvops. From a kernel.org point of view, you
provide a service that inevitably will have some cost but can be
configured out.

But I do think that it would be in your interest too because the speed
of these paths should be important even for virtualised systems.

> For example, the next step would be to attack set_pte (including
> set_pte_*, pte_clear, etc), to make them use the new calling convention,
> and possibly make them inlineable (ie, to get it as close as possible to
> the non-pvops case). But that will require them to be implemented in
> asm (to guarantee that they only use the registers they're allowed to
> use), and we already have 3 variants of each for the different pagetable
> modes. All completely doable, and not even very hard, but it will be
> just one more thing to maintain - we just need to be sure the payoff is
> worth it.

Thanks for what you've done so far. I would like to see this taken as
far as possible. I think it is very worthwhile although complexity is
obviously a very real concern too.

^ permalink raw reply [flat|nested] 49+ messages in thread
* RE: [PATCH] xen: core dom0 support
2009-02-28 5:28 ` Andrew Morton
2009-02-28 6:52 ` Jeremy Fitzhardinge
2009-02-28 8:42 ` Ingo Molnar
@ 2009-03-05 13:52 ` Morten P.D. Stevens
2009-03-08 14:25 ` Manfred Knick
2 siblings, 1 reply; 49+ messages in thread
From: Morten P.D. Stevens @ 2009-03-05 13:52 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, George.Dunlap

Hi,

I think Xen is a great virtualization technology. Many companies work
with Xen, and its performance is better than KVM's.

Here are some benchmarks of Citrix XenServer 5.0 vs. KVM, with Linux
and Windows guests on an IBM x3400 server:

Disk I/O:        XEN       | KVM
Write:           110 MB/s  | 60 MB/s
Read:            130 MB/s  | 80 MB/s

Network performance (downloading a 4 GB ISO image from an Apache
webserver):

                 XEN       | KVM
Download speed:  105 MB/s  | 50 MB/s

Xen uses the full 1000 Mbit network - great performance!

On our IBM servers, Xen is still faster than KVM.

-----Original Message-----
From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Andrew Morton
Sent: Saturday, February 28, 2009 6:28 AM
To: Jeremy Fitzhardinge
Cc: H. Peter Anvin; the arch/x86 maintainers; Linux Kernel Mailing List; Xen-devel
Subject: Re: [PATCH] xen: core dom0 support

On Fri, 27 Feb 2009 17:59:06 -0800 Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> This series implements the core parts of Xen dom0 support; that is, just
> enough to get the kernel started when booted by Xen as a dom0 kernel.

And what other patches can we expect to see to complete the xen dom0
support?

and..

I hate to be the one to say it, but we should sit down and work out
whether it is justifiable to merge any of this into Linux. I think
it's still the case that the Xen technology is the "old" way and that
the world is moving off in the "new" direction, KVM? In three years
time, will we regret having merged this?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] xen: core dom0 support 2009-03-05 13:52 ` Morten P.D. Stevens @ 2009-03-08 14:25 ` Manfred Knick 0 siblings, 0 replies; 49+ messages in thread From: Manfred Knick @ 2009-03-08 14:25 UTC (permalink / raw) To: linux-kernel Morten P.D. Stevens <mstevens <at> win-professional.com> writes: > Here some benchmarks with Citrix XenServer 5.0 vs KVM with linux and windows guests on an IBM x3400 server: Thanks! Just out of curiosity, to complete your appreciated impression: Do you perhaps have the corresponding figures regarding e.g. VMware ESX(i) available? Thanks in advance! Manfred ^ permalink raw reply [flat|nested] 49+ messages in thread