From: Anthony Liguori
Subject: Re: Essay on an important Xen decision (long)
Date: Tue, 10 Jan 2006 13:55:31 -0600
To: "Magenheimer, Dan (HP Labs Fort Collins)"
Cc: xen-devel

Hi Dan,

Thanks for the thorough explanation of physical memory virtualization.
It's a topic for which there isn't much good reference material.

You seem to conclude that the only possible solutions are making
dom0 either P==M or P2M. Is it not possible to make dom0 VP?

If the only issue for making dom0 VP is DMA, wouldn't it be easier to
modify the Linux DMA subsystem[1] to make a special hypercall that
essentially pins a VP frame to a particular MFN that could then be used
for the DMA? One could imagine the hypervisor reserving low memory
specifically for DMA so that bounce buffers could be avoided too.

VP makes a lot of interesting memory optimizations considerably easier
(memory compacting, swapping, etc.).

[1] Realizing that I know very little about the Linux DMA subsystem, so
I don't know if this is outside the realm of possibilities.

Regards,

Anthony Liguori

Magenheimer, Dan (HP Labs Fort Collins) wrote:

>A fundamental architectural decision has to be made for
>Xen regarding handling of physical/machine memory; at a high
>level, the question is:
>
>   Should Xen drivers be made more flexible to accommodate
>   different approaches to managing physical memory, or
>   should other architectures be required to conform to
>   the Xen/x86 model?
>
>A more detailed description of the specific decision is below.
>The Xen/ia64 community would like to make this decision soon --
>possibly at the Xen summit -- as next steps of Xen/ia64
>functionality are significantly affected. Since either choice
>has an impact on common code and on future Xen architecture,
>this decision must involve core Xen developers and the broader
>Xen community rather than just Xen/ia64 developers.
>
>While this may seem to be a trivial matter, such fundamental
>choices often have a way of pre-selecting future design and
>implementation directions that can have major negative or positive
>impacts -- possibly unexpected -- on different parties. For example,
>a decision might make a Xen developer's life easier but create
>headaches for a distro or a Linux maintainer. If nothing else,
>discussing fundamental decision points often helps to
>bring out and codify/document hidden assumptions about
>the future.
>
>This is a lengthy document but I hope to touch on most of
>the various issues and tradeoffs. Understanding -- or, at
>a minimum, reading -- this document should probably be
>a prerequisite for involvement in discussions to resolve this.
>I would encourage all readers to give the issues and tradeoffs
>some thought as the "obvious x86" answer may not be the best
>answer for the future of Xen.
>
>First a little terminology and background:
>
>In a virtualized environment, the resources of the physical
>machine must be subdivided and/or shared between multiple virtual
>machines. Just as an OS manages memory for its applications, one of
>the primary roles of a hypervisor is to provide the illusion to
>each guest OS that it owns some amount of "RAM" in the system.
>Thus there are two kinds of physical memory addresses: the
>addresses that a guest believes to be physical addresses and
>the addresses that actually refer to RAM (e.g. bus addresses).
>The literature (and Xen) confusingly labels these as "physical"
>addresses and "machine" addresses.
>In a virtualized environment,
>there must be some way of maintaining the relationship -- or
>"mapping" -- between physical addresses and machine addresses.
>
>In Xen (across all architectures), there are currently three
>different approaches for mapping physical addresses to machine
>addresses:
>
>1) P==M: The guest is given a subset of machine memory that it
>   can access "directly". Accesses to machine memory addresses
>   outside of this range must somehow be restricted (but not
>   necessarily disallowed) by Xen.
>
>2) guest-aware p!=m (P2M): The guest is given max_pages of
>   contiguous physical memory starting at zero and the knowledge
>   that physical addresses are different than machine addresses.
>   The guest must understand the difference between a physical
>   address and a machine address and utilize the correct one in
>   different situations.
>
>3) virtual physical (VP): The guest is given max_pages of
>   contiguous physical memory starting at zero. Xen provides
>   the illusion to the guest that this is machine memory;
>   any physical-to-machine translation required for functional
>   correctness is handled invisibly by Xen. VP cannot be used
>   by guests that directly program DMA-based I/O devices
>   because a DMA device requires a machine address and, by
>   definition, the guest knows only about physical addresses.
>
>Xen/x86 and Xen/x86_64 use P2M, but switch to VP (aka "shadow
>mode") for an unprivileged guest when a migration is underway.
>Xen/ia64 currently uses P==M for domain0 and VP for unprivileged
>guests. Xen/ppc intends to use VP only.
>
>There is an architectural proposal to change Xen/ia64 so that
>domain0 uses P2M instead of P==M. We will call this choice P2M
>and the choice to stay on the current path P==M.
>
>Here's what I think are the key issues/tradeoffs:
>
>XEN CODE IMPACT
>
>Some Xen drivers, such as the blkif driver, have been "converted"
>to accommodate P==M. Others have not.
>For example, the balloon driver
>currently assumes domain0 is P2M and thus does not currently work
>on Xen/ia64 or Xen/ppc. The word "converted" is quoted because
>nobody is particularly satisfied with the current state of the
>converted drivers. Many apparently significant function calls are
>#define'd out of existence by macros. Other code does radically
>different things depending on the architecture or on whether it
>is being executed by dom0 or an unprivileged domain. And a few
>#ifdef's are sprinkled about. In short, what's done works but is
>an ugly hack. Some believe that the best way to solve this mess
>is for other architectures to do things more like Xen/x86. Others
>believe there is an advantage to defining clear abstractions and
>making the drivers truly more architecture-independent.
>
>P2M will require some rewriting of existing Xen/ia64 core code and the
>addition of significant changes to Xenlinux/ia64 code but will allow
>much easier porting of Xen's balloon/networking/migration drivers
>and also enable some simplifying changes in the Xen block driver.
>It is fair to guess that it will take at least several weeks/months
>to rewrite and debug the core and Xenlinux code to get Xen/ia64 back
>to where it is today, but future driver work will be much faster.
>Fewer differences from Xen/x86 means less maintenance work for Xen
>core and Xen/ia64 developers. I'd imagine also that more code will
>be shared between Xen/VT-i and Xen/VT-x.
>
>P==M will require Xen's balloon/networking/migration drivers to
>evolve to incorporate non-P2M models. This can be done, but is most
>likely to end up (at least in the short term) as a collection of
>unpalatable hacks like with the Xen block driver. However, making
>Xen drivers more tolerant of different approaches may be a good
>thing in the long run for Xen.
>
>XENLINUX IMPACT
>
>Today's operating systems are not implemented with an understanding
>that a physical address and a machine address might be different.
>Building this awareness into an OS requires non-trivial source
>code change. For example, Xenlinux/x86 maintains a "p2m" mapping
>table for quick translation and provides a "m2p" hypercall to keep
>Xen in sync. OS code that manipulates physical addresses must be
>modified to access/manage this table and make hypercalls when
>appropriate. Macros can hide much of the complexity but much OS/driver
>code exists that does not use standard macros. There is some
>disagreement on how extensive the required source code changes are,
>and how difficult it will be to maintain these changes across future
>versions of guest OS's. One illustrative example however: In
>paravirtualizing Xenlinux/ia64, seven header files are changed;
>it is closer to 40 for Xenlinux/x86.
>
>Related, some would assert that pushing a small number of changes into
>Linux (or any OS, open source or not) is far easier than pushing a
>large number of changes into Linux. Until all the Xen/x86 changes are
>in, it remains to be seen whether this is true or not. There is
>a reasonable concern that the broad review required for such
>an extensive set of changes will involve a large number of people
>with a large number of agendas and force a number of Xen design
>issues to be revisited -- at least clearly justified if not changed.
>This is especially true if Xen's foes have any influence in the
>process.
>
>Transparent paravirtualization (also called "shared binary") is the
>ability for the same binary to be used both as a Xen guest and
>natively on real hardware. Xenlinux/ia64 currently supports this;
>indeed, ignoring a couple of existing bugs, the same Xenlinux/ia64
>binary can be used natively, as domain0, and as an unprivileged
>domain. There have been proposals to do the same for Xenlinux/x86,
>but the amount of code changed is much, much higher.
>There is debate
>about the cost/benefit of transparent paravirtualization, but the
>primary beneficiaries -- distros and end customers -- are not very
>well represented here.
>
>With P2M, it is unlikely that Xenlinux/ia64 will ever again be
>transparently paravirtualizable. As with Xenlinux/x86, the changes
>will probably be pushed into a subarch (mach-xen). Since Linux/ia64
>has a more diverse set of subarch's, there may be additional work
>to ensure that Xen is orthogonal (and thus works with) all the
>subarch's.
>
>P==M would continue to allow transparent paravirtualization.
>This plus the reduced number of changes should make it easier to
>get Xen/ia64 support into Linux/ia64 (assuming Xen/x86 support
>gets included in Linux/x86).
>
>DRIVER DOMAINS
>
>Driver domains are "coming soon" and support of driver domains is a
>"must", however support for hybrid driver domains (i.e. domains that
>utilize both backend and frontend drivers) is open to debate. It can
>be assumed however that all driver domains will require DMA access.
>
>P2M should make driver domains easier to implement (once the initial
>Xenlinux/ia64 work is completed) and able to support a broader range
>of functionality. P==M may disallow hybrid driver domains and
>create other restrictions, though some creative person may be able
>to solve these.
>
>FUTURE XEN FEATURE SUPPORT
>
>None of the approaches have been "design-tested" significantly for
>support or compatibility with future Xen functionality such as
>oversubscription or machine-memory hot-plug, nor for exotic
>machine memory topologies such as NUMA or discontig (sparsely
>populated). Such functionalities and topologies are much more
>likely to be encountered in high-end server architectures rather
>than widely-available PCs and low-end servers. There is some
>debate as to whether the existing Xen memory architecture will easily
>evolve to accommodate these future changes or if more fundamental
>changes will be required.
>Architectural decisions and restrictions
>should be made with these uncertainties in mind.
>
>Some believe that discovery and policy for machine memory will
>eventually need to move out of Xen into domain0, leaving only
>the enforcement mechanism in Xen. For example, oversubscription, NUMA
>or hot-plug memory support are likely to be fairly complicated
>and a commonly stated goal is to move unnecessary complexity out
>of Xen. And the plethora of recent changes in Linux/ia64
>involving machine memory models indicates there are still many
>unknowns. P==M more easily supports a model where domain0
>owns ALL of machine memory *except* a small amount reserved for
>and protected by Xen itself. If this is all true, Xen/x86 may
>eventually need to move to a dom0 P==M model, in which case it
>would be silly for Xen/ia64 to move to P2M and then back to P==M.
>
>Others think these features will be easy to implement in Xen and,
>with minor changes, entirely compatible with P2M. And that
>P2M is the once and future model for domain0.
>
>SUMMARY
>
>I'm sure there are more issues and tradeoffs that will come up
>in discussion, but let me summarize these:
>
>Move domain0 to P2M:
>+ Fewer differences in Xen drivers between Xen/x86 and Xen/ia64
>+ Fewer differences in Xen drivers between Xen/VT-x and Xen/VT-i
>+ Easier to implement remaining Xen drivers for Xen/ia64
>- Major changes may require months for Xen/ia64 to regain stability
>- Many more changes to Xenlinux/ia64; more difficulty pushing upstream
>- No attempt to make Xen more resilient for future architectures
>
>Leave domain0 as P==M:
>+ Fewer changes in Xenlinux; easier to push upstream
>+ Making Xen more flexible is a good thing
>? May provide better foundation for future features (oversubscr, NUMA)
>- More restrictions on driver domains
>- More hacks required for some Xen drivers, or
>- More work to better abstract and define a portable driver
>  architecture
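
For reference, the three mapping models from the essay and the pin
hypercall suggested in the reply can be put side by side in a small toy
model. This is only a sketch under stated assumptions: the names
(ToyGuest, guest_dma_frame, pin_for_dma, relocate) are hypothetical and
do not correspond to any Xen or Linux interface, and the rule that only
a VP guest's frames can be moved silently is a simplification made for
illustration.

```python
from enum import Enum
from typing import Dict, Optional, Set


class Model(Enum):
    P_EQ_M = "P==M"
    P2M = "P2M"
    VP = "VP"


class ToyGuest:
    """Toy model of one guest's memory; not Xen's real data structures."""

    def __init__(self, model: Model, backing: Dict[int, int]) -> None:
        # backing maps guest physical frame (PFN) -> machine frame (MFN)
        # and stands in for whatever table the hypervisor maintains.
        if model is Model.P_EQ_M and any(p != m for p, m in backing.items()):
            raise ValueError("P==M requires PFN == MFN for every frame")
        self.model = model
        self.backing = dict(backing)
        self.pinned: Set[int] = set()

    def guest_dma_frame(self, pfn: int) -> Optional[int]:
        """MFN the guest can work out on its own, or None if it cannot."""
        if pfn not in self.backing or self.model is Model.VP:
            return None  # not the guest's frame, or translation hidden (VP)
        return self.backing[pfn]  # identity for P==M, table lookup for P2M

    def pin_for_dma(self, pfn: int) -> int:
        """Hypothetical hypercall: pin a frame and reveal its MFN."""
        mfn = self.backing[pfn]
        self.pinned.add(pfn)
        return mfn

    def relocate(self, pfn: int, new_mfn: int) -> None:
        """Hypervisor silently moves a frame (compaction, swapping)."""
        if pfn not in self.backing:
            raise KeyError(pfn)
        if self.model is not Model.VP:
            raise RuntimeError("guest may hold machine addresses")
        if pfn in self.pinned:
            raise RuntimeError("frame is pinned for DMA")
        self.backing[pfn] = new_mfn
```

In this sketch a VP guest gets None from guest_dma_frame, which is the
DMA problem the essay describes; pin_for_dma hands back an MFN at the
cost of making that one frame immovable, while unpinned frames stay
free for the hypervisor to relocate.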