From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juergen Gross
Subject: Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
Date: Tue, 16 Sep 2014 05:52:43 +0200
Message-ID: <5417B40B.4000703@suse.com>
References: <1410256709-25885-1-git-send-email-jgross@suse.com>
 <1410256709-25885-2-git-send-email-jgross@suse.com>
 <540ED600.3060102@citrix.com> <540EDB4F.30402@suse.com>
 <5412CB80.9030208@suse.com> <5416A379.5@citrix.com>
 <5416A8CE.5020400@suse.com> <5416B518.8030504@citrix.com>
 <5416B6F1.2020102@suse.com> <5416BFC2.4040900@citrix.com>
 <5416C392.1010707@suse.com> <5416F809.7060509@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <5416F809.7060509@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: David Vrabel, Andrew Cooper, ian.campbell@citrix.com,
 ian.jackson@eu.citrix.com, jbeulich@suse.com, keir@xen.org, tim@xen.org,
 xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 09/15/2014 04:30 PM, David Vrabel wrote:
> On 15/09/14 11:46, Juergen Gross wrote:
>> On 09/15/2014 12:30 PM, David Vrabel wrote:
>>> On 15/09/14 10:52, Juergen Gross wrote:
>>>> On 09/15/2014 11:44 AM, David Vrabel wrote:
>>>>> On 15/09/14 09:52, Juergen Gross wrote:
>>>>>> On 09/15/2014 10:29 AM, Andrew Cooper wrote:
>>>>>>>
>>>>>>> On 12/09/2014 11:31, Juergen Gross wrote:
>>>>>>>> On 09/09/2014 12:49 PM, Juergen Gross wrote:
>>>>>>>>> On 09/09/2014 12:27 PM, Andrew Cooper wrote:
>>>>>>>>>> On 09/09/14 10:58, Juergen Gross wrote:
>>>>>>>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>>>>>>>> currently contains the mfn of the top level page frame of the 3
>>>>>>>>>>> level p2m tree, which is used by the Xen tools during saving and
>>>>>>>>>>> restoring (and live migration) of pv domains.
>>>>>>>>>>> With three levels of the p2m tree it is possible to support up
>>>>>>>>>>> to 512 GB of RAM for a 64 bit pv domain. A 32 bit pv domain can
>>>>>>>>>>> support more, as each memory page can hold 1024 instead of 512
>>>>>>>>>>> entries, leading to a limit of 4 TB. To be able to support more
>>>>>>>>>>> RAM on x86-64 an additional level is to be added.
>>>>>>>>>>>
>>>>>>>>>>> This patch expands struct arch_shared_info with a new p2m tree
>>>>>>>>>>> root and the number of levels of the p2m tree. The new
>>>>>>>>>>> information is indicated by the domain to be valid by storing
>>>>>>>>>>> ~0UL into pfn_to_mfn_frame_list_list (this should be done only
>>>>>>>>>>> if more than three levels are needed, of course).
>>>>>>>>>>
>>>>>>>>>> A small domain feeling a little tight on space could easily opt
>>>>>>>>>> for a 2 or even 1 level p2m. (After all, one advantage of virt is
>>>>>>>>>> to cram many small VMs into a server).
>>>>>>>>>>
>>>>>>>>>> How is xen and toolstack support for n-level p2ms going to be
>>>>>>>>>> advertised to guests? Simply assuming the toolstack is capable of
>>>>>>>>>> dealing with this new scheme wont work with a new pv guest running
>>>>>>>>>> on an older Xen.
>>>>>>>>>
>>>>>>>>> Is it really worth doing such an optimization? This would save only
>>>>>>>>> very few pages.
>>>>>>>>>
>>>>>>>>> If you think it should be done we can add another SIF_* flag to
>>>>>>>>> start_info->flags. In this case a domain using this feature could
>>>>>>>>> not be migrated to a server with old tools, however. So we would
>>>>>>>>> probably end with the need to be able to suppress that flag on a
>>>>>>>>> per-domain base.
>>>>>>>>
>>>>>>>> Any further comments?
>>>>>>>>
>>>>>>>> Which way should I go?
>>>>>>>>
>>>>>>>
>>>>>>> There are two approaches, with different up/downsides
>>>>>>>
>>>>>>> 1) continue to use the old method, and use the new method only when
>>>>>>> absolutely required. This will function, but on old toolstacks,
>>>>>>> suffer migration/suspend failures when the toolstack fails to find
>>>>>>> the p2m.
>>>>>>>
>>>>>>> 2) Provide a Xen feature flag indicating the presence of N-level p2m
>>>>>>> support. Guests which can see this flag are free to use N-level, and
>>>>>>> guests which can't are not.
>>>>>>>
>>>>>>> Ultimately, giving more than 512GB to a current 64bit PV domain is
>>>>>>> not going to work, and the choice above depends on which failure mode
>>>>>>> you wish a new/old mix to have.
>>>>>>
>>>>>> I'd prefer solution 1), as it will enable Dom0 with more than 512 GB
>>>>>> without requiring a change of any Xen component. Additionally large
>>>>>> domains can be started by users who don't care for migrating or
>>>>>> suspending them.
>>>>>>
>>>>>> So I'd rather keep my patch as posted.
>>>>>
>>>>> PV guests can have extra memory added, beyond their initial limit.
>>>>> Supporting this would require option 2.
>>>>
>>>> I don't see why this should require option 2.
>>>
>>> Um...
>>>
>>>> Option 1 only prohibits suspending/migrating a domain with more than
>>>> 512 GB.
>>>
>>> ...this is the reason.
>>>
>>> With the exception of VMs that have assigned direct access to hardware,
>>> migration is an essential feature and must be supported.
>>
>> So you'd prefer:
>>
>> 1) >512GB pv-domains (including Dom0) will be supported only with new
>> Xen (4.6?), no matter if the user requires migration to be supported
>
> Yes. >512 GiB and not being able to migrate are not obviously related
> from the point of view of the end user (unlike assigning a PCI device).
>
> Failing at domain save time is most likely too late for the end user.

What would you think about the following compromise:

We add a flag that indicates support of multi-level p2m.
Additionally, the Linux kernel may ignore the absence of that flag if it
is started as Dom0 or if it is told to do so via a kernel parameter.

>
>> to:
>>
>> 2) >512GB pv-domains (especially Dom0 and VMs with direct hw access) can
>> be started on current Xen versions, migration is possible only if Xen
>> is new (4.6?)
>
> There's also my preferred option:
>
> 3) >512 GiB PV domains are not supported. Large guests must be PVH or
> PVHVM.

In theory okay, but not right now, I think. PVH Dom0 is not production
ready.

Juergen

>
>> What is the common use case for migration? I doubt it is used very often
>> for really huge domains.
>
> XenServer uses it for server pool upgrades with no VM downtime.
>
> Also, today's huge VM is tomorrow's common size.
>
> David
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>
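
For reference, the 512 GB and 4 TB limits quoted from the patch
description follow from simple arithmetic on the tree shape. The helper
below is only a sketch and not code from the patch: p2m_max_bytes and
PAGE_SIZE_BYTES are names made up for illustration, and it assumes 4 KiB
pages, every level of the tree being one full page of entries, and an
entry size of 8 bytes for a 64 bit guest or 4 bytes for a 32 bit guest.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL  /* assumed x86 base page size */

/*
 * Upper bound (in bytes) on the guest RAM that a p2m tree with `levels`
 * levels can describe. Each tree page holds PAGE_SIZE_BYTES / entry_size
 * entries and each leaf entry maps one page of guest memory.
 * Hypothetical helper, for illustration only.
 */
static uint64_t p2m_max_bytes(unsigned int levels, unsigned int entry_size)
{
    uint64_t entries_per_page;
    uint64_t pages = 1;
    unsigned int i;

    assert(levels >= 1 && levels <= 4);
    assert(entry_size == 4 || entry_size == 8);

    entries_per_page = PAGE_SIZE_BYTES / entry_size;

    /* Every additional level multiplies the number of mappable pages. */
    for (i = 0; i < levels; i++)
        pages *= entries_per_page;

    return pages * PAGE_SIZE_BYTES;
}
```

Under these assumptions p2m_max_bytes(3, 8) is 2^39 bytes (512 GiB) and
p2m_max_bytes(3, 4) is 2^42 bytes (4 TiB), matching the numbers in the
patch description. A fourth level on x86-64 would raise the bound to
2^48 bytes (256 TiB).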