From: Anthony Liguori <aliguori@us.ibm.com>
Subject: Re: Essay on an important Xen decision (long)
Date: Tue, 10 Jan 2006 13:55:31 -0600
Message-ID: <43C41133.3050606@us.ibm.com>
References: <516F50407E01324991DD6D07B0531AD59030F8@cacexc12.americas.cpqcorp.net>
In-Reply-To: <516F50407E01324991DD6D07B0531AD59030F8@cacexc12.americas.cpqcorp.net>
To: "Magenheimer, Dan (HP Labs Fort Collins)" <dan.magenheimer@hp.com>
Cc: xen-devel <xen-devel@lists.xensource.com>

Hi Dan,

Thanks for the thorough explanation of physical memory virtualization.  
It's a topic on which there isn't much good reference material.

You seem to conclude that the only possible solutions are making 
dom0 either P==M or P2M.  Is it not possible to make dom0 VP?

If the only issue for making dom0 VP is DMA, wouldn't it be easier to 
modify the Linux DMA subsystem[1] to make a special hypercall that 
essentially pins a VP page to a particular MFN that could be used for 
DMA?  One could imagine the hypervisor reserving low memory specifically 
for DMA so that bounce buffers could be avoided too.
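To make the idea a bit more concrete, here is a toy user-space model of such an interface.  Everything in it is hypothetical: the names hyp_pin_for_dma() and compact(), the reserved pool starting at MFN 16, and the flat p2m[] array are assumptions for illustration, not existing Xen hypercalls or data structures.  It only shows the property that matters: a pinned page gets a stable machine frame from a low-memory pool (so, in principle, no bounce buffer), while compaction remains free to move every unpinned page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_GUEST_PAGES     8
#define DMA_POOL_FIRST_MFN 16u   /* hypothetical low-memory pool reserved for DMA */
#define DMA_POOL_PAGES     4u
#define INVALID_MFN        UINT64_MAX

/* Hypervisor-private view of one VP guest page. */
struct vp_page {
    uint64_t mfn;     /* machine frame currently backing this guest pfn */
    bool     pinned;  /* pinned frames must not be moved or swapped */
};

static struct vp_page p2m[NR_GUEST_PAGES];
static bool dma_pool_used[DMA_POOL_PAGES];

/* Hypothetical hypercall: back `pfn` with a low-memory frame and pin it.
 * Returns the machine frame to program into the device, or INVALID_MFN. */
static uint64_t hyp_pin_for_dma(uint64_t pfn)
{
    if (pfn >= NR_GUEST_PAGES)
        return INVALID_MFN;
    if (p2m[pfn].pinned)
        return p2m[pfn].mfn;              /* already pinned: same frame */
    for (unsigned i = 0; i < DMA_POOL_PAGES; i++) {
        if (!dma_pool_used[i]) {
            dma_pool_used[i] = true;
            /* A real hypervisor would copy the contents and free the old frame. */
            p2m[pfn].mfn = DMA_POOL_FIRST_MFN + i;
            p2m[pfn].pinned = true;
            return p2m[pfn].mfn;
        }
    }
    return INVALID_MFN;                   /* pool exhausted */
}

/* Compaction: move every unpinned page to consecutive frames starting at
 * `base`.  Pinned pages stay put, so in-flight DMA is never disturbed.
 * Returns the number of pages moved. */
static unsigned compact(uint64_t base)
{
    /* The destination must not overlap the DMA pool. */
    assert(base >= DMA_POOL_FIRST_MFN + DMA_POOL_PAGES);
    unsigned moved = 0;
    for (uint64_t pfn = 0; pfn < NR_GUEST_PAGES; pfn++) {
        if (p2m[pfn].pinned)
            continue;
        p2m[pfn].mfn = base + moved;      /* contents copy omitted in this toy */
        moved++;
    }
    return moved;
}
```

A matching unpin call would also be needed once the DMA completes; it is left out here because it would have to move the page back out of the pool before releasing the slot.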

VP makes a lot of interesting memory optimizations considerably easier 
(memory compacting, swapping, etc.).

[1] I realize that I know very little about the Linux DMA subsystem, so I 
don't know whether this is outside the realm of possibility.

Regards,

Anthony Liguori

Magenheimer, Dan (HP Labs Fort Collins) wrote:

>A fundamental architectural decision has to be made for
>Xen regarding handling of physical/machine memory; at a high
>level, the question is:
>
>	Should Xen drivers be made more flexible to accommodate
>	different approaches to managing physical memory, or
>	should other architectures be required to conform to
>	the Xen/x86 model?
>
>A more detailed description of the specific decision is below.
>The Xen/ia64 community would like to make this decision soon --
>possibly at the Xen summit -- as next steps of Xen/ia64
>functionality are significantly affected.  Since either choice
>has an impact on common code and on future Xen architecture,
>this decision must involve core Xen developers and the broader
>Xen community rather than just Xen/ia64 developers.
>
>While this may seem to be a trivial matter, such fundamental
>choices often have a way of pre-selecting future design and
>implementation directions that can have major negative or positive
>impacts -- possibly unexpected -- on different parties.  For example,
>a decision might make a Xen developer's life easier but create
>headaches for a distro or a Linux maintainer.  If nothing else,
>discussing fundamental decision points often helps to
>bring out and codify/document hidden assumptions about
>the future.
>
>This is a lengthy document but I hope to touch on most of
>the various issues and tradeoffs.  Understanding -- or, at
>a minimum, reading -- this document should probably be
>a prerequisite for involvement in discussions to resolve this.
>I would encourage all readers to give the issues and tradeoffs
>some thought as the "obvious x86" answer may not be the best
>answer for the future of Xen.
>
>First a little terminology and background:
>
>In a virtualized environment, the resources of the physical
>machine must be subdivided and/or shared between multiple virtual
>machines.  Just as an OS manages memory for its applications, one of
>the primary roles of a hypervisor is to provide the illusion to
>each guest OS that it owns some amount of "RAM" in the system.
>Thus there are two kinds of physical memory addresses: the
>addresses that a guest believes to be physical addresses and
>the addresses that actually refer to RAM (e.g. bus addresses).
>The literature (and Xen) confusingly labels these as "physical"
>addresses and "machine" addresses.  In a virtualized environment,
>there must be some way of maintaining the relationship -- or
>"mapping" -- between physical addresses and machine addresses.
>
>In Xen (across all architectures), there are currently three
>different approaches for mapping physical addresses to machine
>addresses:
>
>1) P==M: The guest is given a subset of machine memory that it
>   can access "directly".  Accesses to machine memory addresses
>   outside of this range must somehow be restricted (but not
>   necessarily disallowed) by Xen.
>
>2) guest-aware p!=m (P2M): The guest is given max_pages of
>   contiguous physical memory starting at zero and the knowledge
>   that physical addresses are different from machine addresses.
>   The guest must understand the difference between a physical
>   address and a machine address and utilize the correct one in
>   different situations.
>
>3) virtual physical (VP): The guest is given max_pages of
>   contiguous physical memory starting at zero.  Xen provides
>   the illusion to the guest that this is machine memory;
>   any physical-to-machine translation required for functional
>   correctness is handled invisibly by Xen.  VP cannot be used
>   by guests that directly program DMA-based I/O devices
>   because a DMA device requires a machine address and, by
>   definition, the guest knows only about physical addresses.
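
The practical difference between the three models shows up when a guest must hand an address to a DMA device.  The toy model below is a sketch under stated assumptions, not Xen code: the enum, dma_bus_address(), the single contiguous P==M region and the small p2m array are all hypothetical.  Under P==M the physical address already is the machine address, so only a range check applies; under P2M the guest translates through its own table; under VP the guest has no machine address to offer, which is why VP rules out directly programmed DMA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                     /* assumed 4 KiB pages */

enum mem_model { MODEL_P_EQ_M, MODEL_P2M, MODEL_VP };

/* Hypothetical P==M domain: owns machine range [PEQM_BASE, PEQM_END). */
#define PEQM_BASE 0x100000u
#define PEQM_END  0x200000u

/* Hypothetical guest-visible p2m table for the P2M model. */
static const uint64_t p2m_table[] = { 0x777, 0x123, 0x456 };
#define P2M_PAGES (sizeof(p2m_table) / sizeof(p2m_table[0]))

/* Compute the bus address to program into a DMA device.
 * Returns false if the guest cannot (or may not) produce one. */
static bool dma_bus_address(enum mem_model model, uint64_t paddr, uint64_t *out)
{
    switch (model) {
    case MODEL_P_EQ_M:
        /* Physical == machine; Xen only has to police the range. */
        if (paddr < PEQM_BASE || paddr >= PEQM_END)
            return false;
        *out = paddr;
        return true;
    case MODEL_P2M: {
        /* The guest knows p != m and translates the frame itself. */
        uint64_t pfn = paddr >> PAGE_SHIFT;
        if (pfn >= P2M_PAGES)
            return false;
        *out = (p2m_table[pfn] << PAGE_SHIFT) | (paddr & ((1u << PAGE_SHIFT) - 1));
        return true;
    }
    case MODEL_VP:
        /* The guest sees only physical addresses; the machine address is
         * private to Xen, so there is nothing correct to give the device. */
        return false;
    }
    return false;
}
```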
>
>Xen/x86 and Xen/x86_64 use P2M, but switch to VP (aka "shadow
>mode") for an unprivileged guest when a migration is underway.
>Xen/ia64 currently uses P==M for domain0 and VP for unprivileged
>guests.  Xen/ppc intends to use VP only.
>
>There is an architectural proposal to change Xen/ia64 so that
>domain0 uses P2M instead of P==M.  We will call this choice P2M
>and the choice to stay on the current path P==M.
>
>Here's what I think are the key issues/tradeoffs:
>
>XEN CODE IMPACT
>
>Some Xen drivers, such as the blkif driver, have been "converted"
>to accommodate P==M. Others have not.  For example, the balloon driver
>currently assumes domain0 is P2M and thus does not currently work
>on Xen/ia64 or Xen/ppc.  The word "converted" is quoted because
>nobody is particularly satisfied with the current state of the
>converted drivers.  Many apparently significant function calls are
>define'd out of existence by macros.  Other code does radically
>different things depending on the architecture or on whether it
>is being executed by dom0 or an unprivileged domain.  And a few
>#ifdefs are sprinkled about.  In short, what's done works but is
>an ugly hack.  Some believe that the best way to solve this mess
>is for other architectures to do things more like Xen/x86.  Others
>believe there is an advantage to defining clear abstractions and
>making the drivers truly more architecture-independent.
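
One way to read "defining clear abstractions" is an ops table that drivers call instead of testing the architecture or domain type themselves.  The sketch below is hypothetical throughout (struct mem_model_ops, its hook, and both implementations are invented for illustration, not taken from the Xen tree) and assumes 4 KiB pages.  The driver code in driver_bus_addr() is written once; P==M or P2M behaviour is selected by whichever ops table the platform code installs at boot.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                     /* assumed 4 KiB pages */

/* Hypothetical per-platform memory-model hooks used by Xen drivers. */
struct mem_model_ops {
    const char *name;
    uint64_t (*pfn_to_bus_frame)(uint64_t pfn);  /* frame to give a device or peer */
};

/* P==M: the guest frame number already is the machine frame number. */
static uint64_t identity_frame(uint64_t pfn) { return pfn; }

/* P2M: look the frame up in a guest-maintained table (toy contents). */
static const uint64_t toy_p2m[] = { 40, 41, 7, 99 };
static uint64_t p2m_frame(uint64_t pfn)
{
    assert(pfn < sizeof(toy_p2m) / sizeof(toy_p2m[0]));
    return toy_p2m[pfn];
}

static const struct mem_model_ops peqm_ops = { "P==M", identity_frame };
static const struct mem_model_ops p2m_ops  = { "P2M",  p2m_frame };

/* Chosen once at boot by architecture code; drivers only call through it. */
static const struct mem_model_ops *mem_ops = &peqm_ops;

/* Driver-side code: no #ifdef and no "am I dom0?" test. */
static uint64_t driver_bus_addr(uint64_t paddr)
{
    uint64_t pfn = paddr >> PAGE_SHIFT;
    uint64_t off = paddr & ((UINT64_C(1) << PAGE_SHIFT) - 1);
    return (mem_ops->pfn_to_bus_frame(pfn) << PAGE_SHIFT) | off;
}
```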
>
>P2M will require some rewriting of existing Xen/ia64 core code and the
>addition of significant changes to Xenlinux/ia64 code but will allow
>much easier porting of Xen's balloon/networking/migration drivers
>and also enable some simplifying changes in the Xen block driver.
>It is fair to guess that it will take at least several weeks/months
>to rewrite and debug the core and Xenlinux code to get Xen/ia64 back
>to where it is today, but future driver work will be much faster.
>Fewer differences from Xen/x86 means less maintenance work for Xen
>core and Xen/ia64 developers.  I'd imagine also that more code will
>be shared between Xen/VT-i and Xen/VT-x.
>
>P==M will require Xen's balloon/networking/migration drivers to
>evolve to incorporate non-P2M models.  This can be done, but is most
>likely to end up (at least in the short term) as a collection of
>unpalatable hacks, as with the Xen block driver.  However, making
>Xen drivers more tolerant of different approaches may be a good
>thing in the long run for Xen.
>
>XENLINUX IMPACT
>
>Today's operating systems are not implemented with an understanding
>that a physical address and a machine address might be different.
>Building this awareness into an OS requires non-trivial source
>code change.  For example, Xenlinux/x86 maintains a "p2m" mapping
>table for quick translation and provides an "m2p" hypercall to keep
>Xen in sync.  OS code that manipulates physical addresses must be
>modified to access/manage this table and make hypercalls when
>appropriate.  Macros can hide much of the complexity but much OS/driver
>code exists that does not use standard macros.  There is some
>disagreement on how extensive the required source code changes are,
>and how difficult it will be to maintain these changes across future
>versions of guest OSes.  One illustrative example, however: in
>paravirtualizing Xenlinux/ia64, seven header files are changed;
>it is closer to 40 for Xenlinux/x86.
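
A rough sketch of the bookkeeping described above: the guest keeps a p2m array for fast translation, and every change must also be reflected in the machine-to-physical (m2p) direction, which Xen owns and the guest updates via a hypercall.  Here the m2p table is just a local array and the hypercall is a stub that counts calls; the names set_p2m_entry() and stub_m2p_update_hypercall() are hypothetical, and the sizes are tiny for illustration.  The point is that any code changing a physical-to-machine binding has to go through a path like this one, which is why code that bypasses the standard macros needs modification.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PFNS       4
#define NR_MFNS       16
#define INVALID_ENTRY UINT64_MAX

static uint64_t p2m[NR_PFNS];        /* guest-maintained: pfn -> mfn */
static uint64_t m2p[NR_MFNS];        /* stands in for Xen's mfn -> pfn table */
static unsigned m2p_hypercalls;      /* how many times we "called" Xen */

static void init_tables(void)
{
    for (unsigned i = 0; i < NR_PFNS; i++) p2m[i] = INVALID_ENTRY;
    for (unsigned i = 0; i < NR_MFNS; i++) m2p[i] = INVALID_ENTRY;
    m2p_hypercalls = 0;
}

/* Stub for the hypercall that keeps Xen's m2p table in sync. */
static void stub_m2p_update_hypercall(uint64_t mfn, uint64_t pfn)
{
    assert(mfn < NR_MFNS);
    m2p[mfn] = pfn;
    m2p_hypercalls++;
}

/* Rebind guest frame `pfn` to machine frame `mfn`, updating both directions. */
static void set_p2m_entry(uint64_t pfn, uint64_t mfn)
{
    assert(pfn < NR_PFNS);
    uint64_t old = p2m[pfn];
    if (old != INVALID_ENTRY)
        stub_m2p_update_hypercall(old, INVALID_ENTRY);  /* old frame no longer ours */
    p2m[pfn] = mfn;
    stub_m2p_update_hypercall(mfn, pfn);
}
```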
>
>Related, some would assert that pushing a small number of changes into
>Linux (or any OS, open source or not) is far easier than pushing a
>large number of changes into Linux.  Until all the Xen/x86 changes are
>in, it remains to be seen whether this is true or not.  There is
>a reasonable concern that the broad review required for such
>an extensive set of changes will involve a large number of people
>with a large number of agendas and force a number of Xen design
>issues to be revisited -- at least clearly justified if not changed.
>This is especially true if Xen's foes have any influence in the
>process.
>
>Transparent paravirtualization (also called "shared binary") is the
>ability for the same binary to be used both as a Xen guest and
>natively on real hardware.  Xenlinux/ia64 currently supports this;
>indeed, ignoring a couple of existing bugs, the same Xenlinux/ia64
>binary can be used natively, and as domain0 and as an unprivileged
>domain. There have been proposals to do the same for Xenlinux/x86,
>but the amount of code change required is much higher.  There is debate
>about the cost/benefit of transparent paravirtualization, but the
>primary beneficiaries -- distros and end customers -- are not very
>well represented here.
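
Roughly, a shared binary makes the native-or-Xen choice at run time rather than at compile time.  The sketch below assumes a boot-time flag and a privileged operation with two paths; running_on_xen, set_pte_entry() and the hypercall stub are hypothetical stand-ins, not the actual Xenlinux/ia64 code.  On real hardware the native path pays only for the flag test; under Xen the same binary takes the paravirtualized path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_PTES 8

/* Set once during early boot, e.g. after probing for the hypervisor. */
static bool running_on_xen;

static uint64_t page_table[NR_PTES];  /* toy page table */
static unsigned pv_hypercalls;        /* counts calls to the stub below */

/* Stub for a hypercall asking Xen to validate and install the entry. */
static void stub_pv_set_pte(unsigned idx, uint64_t val)
{
    pv_hypercalls++;
    page_table[idx] = val;
}

/* One binary, two behaviours, selected at run time. */
static void set_pte_entry(unsigned idx, uint64_t val)
{
    assert(idx < NR_PTES);
    if (running_on_xen)
        stub_pv_set_pte(idx, val);    /* paravirtualized path */
    else
        page_table[idx] = val;        /* native path: direct write */
}
```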
>
>With P2M, it is unlikely that Xenlinux/ia64 will ever again be
>transparently paravirtualizable.  As with Xenlinux/x86, the changes
>will probably be pushed into a subarch (mach-xen).  Since Linux/ia64
>has a more diverse set of subarches, there may be additional work
>to ensure that Xen is orthogonal to (and thus works with) all the
>subarches.
>
>P==M would continue to allow transparent paravirtualization.
>This plus the reduced number of changes should make it easier to
>get Xen/ia64 support into Linux/ia64 (assuming Xen/x86 support
>gets included in Linux/x86).
>
>DRIVER DOMAINS
>
>Driver domains are "coming soon" and support for driver domains is a
>"must"; however, support for hybrid driver domains (i.e. domains that
>utilize both backend and frontend drivers) is open to debate.  It can
>be assumed, though, that all driver domains will require DMA access.
>
>P2M should make driver domains easier to implement (once the initial
>Xenlinux/ia64 work is completed) and able to support a broader range
>of functionality.  P==M may disallow hybrid driver domains and
>create other restrictions, though some creative person may be able
>to solve these.
>
>FUTURE XEN FEATURE SUPPORT
>
>None of the approaches have been "design-tested" significantly for
>support or compatibility with future Xen functionality such as
>oversubscription or machine-memory hot-plug, nor for exotic
>machine memory topologies such as NUMA or discontig (sparsely
>populated).  Such functionalities and topologies are much more
>likely to be encountered in high-end server architectures than in
>widely available PCs and low-end servers.  There is some
>debate as to whether the existing Xen memory architecture will easily
>evolve to accommodate these future changes or if more fundamental
>changes will be required.  Architectural decisions and restrictions
>should be made with these uncertainties in mind.
>
>Some believe that discovery and policy for machine memory will
>eventually need to move out of Xen into domain0, leaving only
>enforcement mechanism in Xen.  For example, oversubscription, NUMA
>or hot-plug memory support are likely to be fairly complicated
>and a commonly stated goal is to move unnecessary complexity out
>of Xen.  And the plethora of recent changes in Linux/ia64
>involving machine memory models indicates there are still many
>unknowns.  P==M more easily supports a model where domain0
>owns ALL of machine memory *except* a small amount reserved for
>and protected by Xen itself.  If this is all true, Xen/x86 may
>eventually need to move to a dom0 P==M model, in which case it
>would be silly for Xen/ia64 to move to P2M and then back to P==M.
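
Under that model the mechanism left in Xen can be quite small: domain0 decides where its memory goes, and Xen only refuses mappings that touch its own reserved range.  The check below is a minimal sketch assuming a single contiguous reserved region at frame-number granularity; the region bounds and xen_check_dom0_map() are hypothetical.  Discontiguous or hot-plugged memory would need a list of ranges, which is exactly the kind of policy this model pushes out to domain0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical range of machine frames reserved for Xen itself. */
#define XEN_RESERVED_FIRST_MFN 0x100u
#define XEN_RESERVED_LAST_MFN  0x1ffu    /* inclusive */

/* Enforcement only: may dom0 map machine frames [mfn, mfn + count)? */
static bool xen_check_dom0_map(uint64_t mfn, uint64_t count)
{
    if (count == 0 || mfn + count < mfn)   /* empty, or wraps around */
        return false;
    uint64_t last = mfn + count - 1;
    /* Reject any overlap with the reserved region. */
    return last < XEN_RESERVED_FIRST_MFN || mfn > XEN_RESERVED_LAST_MFN;
}
```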
>
>Others think these features will be easy to implement in Xen and,
>with minor changes, entirely compatible with P2M.  And that
>P2M is the once and future model for domain0.
>
>SUMMARY
>
>I'm sure there are more issues and tradeoffs that will come up
>in discussion, but let me summarize these:
>
>Move domain0 to P2M:
>+ Fewer differences in Xen drivers between Xen/x86 and Xen/ia64
>+ Fewer differences in Xen drivers between Xen/VT-x and Xen/VT-i
>+ Easier to implement remaining Xen drivers for Xen/ia64
>- Major changes may require months for Xen/ia64 to regain stability
>- Many more changes to Xenlinux/ia64; more difficulty pushing upstream
>- No attempt to make Xen more resilient for future architectures
>
>Leave domain0 as P==M:
>+ Fewer changes in Xenlinux; easier to push upstream
>+ Making Xen more flexible is a good thing
>? May provide better foundation for future features (oversubscr, NUMA)
>- More restrictions on driver domains
>- More hacks required for some Xen drivers, or
>- More work to better define a portable driver architecture
>  abstraction
>