From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juergen Gross
Subject: Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
Date: Mon, 15 Sep 2014 11:52:49 +0200
Message-ID: <5416B6F1.2020102@suse.com>
References: <1410256709-25885-1-git-send-email-jgross@suse.com> <1410256709-25885-2-git-send-email-jgross@suse.com> <540ED600.3060102@citrix.com> <540EDB4F.30402@suse.com> <5412CB80.9030208@suse.com> <5416A379.5@citrix.com> <5416A8CE.5020400@suse.com> <5416B518.8030504@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <5416B518.8030504@citrix.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: David Vrabel , Andrew Cooper , ian.campbell@citrix.com, ian.jackson@eu.citrix.com, jbeulich@suse.com, keir@xen.org, tim@xen.org, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 09/15/2014 11:44 AM, David Vrabel wrote:
> On 15/09/14 09:52, Juergen Gross wrote:
>> On 09/15/2014 10:29 AM, Andrew Cooper wrote:
>>>
>>> On 12/09/2014 11:31, Juergen Gross wrote:
>>>> On 09/09/2014 12:49 PM, Juergen Gross wrote:
>>>>> On 09/09/2014 12:27 PM, Andrew Cooper wrote:
>>>>>> On 09/09/14 10:58, Juergen Gross wrote:
>>>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>>>> currently contains the mfn of the top level page frame of the 3 level
>>>>>>> p2m tree, which is used by the Xen tools during saving and restoring
>>>>>>> (and live migration) of pv domains. With three levels of the p2m tree
>>>>>>> it is possible to support up to 512 GB of RAM for a 64 bit pv domain.
>>>>>>> A 32 bit pv domain can support more, as each memory page can hold 1024
>>>>>>> instead of 512 entries, leading to a limit of 4 TB. To be able to
>>>>>>> support more RAM on x86-64 an additional level is to be added.
>>>>>>>
>>>>>>> This patch expands struct arch_shared_info with a new p2m tree root
>>>>>>> and the number of levels of the p2m tree. The new information is
>>>>>>> indicated by the domain to be valid by storing ~0UL into
>>>>>>> pfn_to_mfn_frame_list_list (this should be done only if more than
>>>>>>> three levels are needed, of course).
>>>>>>
>>>>>> A small domain feeling a little tight on space could easily opt for a 2
>>>>>> or even 1 level p2m. (After all, one advantage of virt is to cram many
>>>>>> small VMs into a server).
>>>>>>
>>>>>> How is xen and toolstack support for n-level p2ms going to be advertised
>>>>>> to guests? Simply assuming the toolstack is capable of dealing with
>>>>>> this new scheme wont work with a new pv guest running on an older Xen.
>>>>>
>>>>> Is it really worth doing such an optimization? This would save only very
>>>>> few pages.
>>>>>
>>>>> If you think it should be done we can add another SIF_* flag to
>>>>> start_info->flags. In this case a domain using this feature could not be
>>>>> migrated to a server with old tools, however. So we would probably end
>>>>> with the need to be able to suppress that flag on a per-domain base.
>>>>
>>>> Any further comments?
>>>>
>>>> Which way should I go?
>>>>
>>>
>>> There are two approaches, with different up/downsides
>>>
>>> 1) continue to use the old method, and use the new method only when
>>> absolutely required. This will function, but on old toolstacks, suffer
>>> migration/suspend failures when the toolstack fails to find the p2m.
>>>
>>> 2) Provide a Xen feature flag indicating the presence of N-level p2m
>>> support. Guests which can see this flag are free to use N-level, and
>>> guests which can't are not.
>>>
>>> Ultimately, giving more than 512GB to a current 64bit PV domain is not
>>> going to work, and the choice above depends on which failure mode you
>>> wish a new/old mix to have.
>>
>> I'd prefer solution 1), as it will enable Dom0 with more than 512 GB
>> without requiring a change of any Xen component. Additionally large
>> domains can be started by users who don't care for migrating or
>> suspending them.
>>
>> So I'd rather keep my patch as posted.
>
> PV guests can have extra memory added, beyond their initial limit.
> Supporting this would require option 2.

I don't see why this should require option 2. Option 1 only prohibits
suspending/migrating a domain with more than 512 GB. Just running it
is fine with either option.

I should mention, however, that the number of p2m tree levels will be
increased on demand when needed. The tree won't be created with more
than 3 levels if the domain isn't started with more than 512 GB.


Juergen
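
For reference, the limits quoted in this thread (512 GB for a 64 bit pv
domain, 4 TB for a 32 bit one) can be checked with a small sketch. It is
only an illustration of the arithmetic, assuming 4 KiB page frames and
one entry per mfn (8 bytes for a 64 bit guest, 4 bytes for a 32 bit
guest). The function names are hypothetical and are not part of Xen or
of the patch.

```python
PAGE_SIZE = 4096  # bytes per page frame (assumed x86 4 KiB pages)


def p2m_max_bytes(levels, entry_size):
    """Largest amount of RAM a p2m tree with `levels` levels can describe.

    Each page of the tree holds PAGE_SIZE // entry_size entries, so a tree
    with `levels` levels covers entries ** levels guest pages.
    """
    entries_per_page = PAGE_SIZE // entry_size
    return entries_per_page ** levels * PAGE_SIZE


def levels_needed(ram_bytes, entry_size):
    """Smallest number of p2m levels able to cover `ram_bytes` of RAM."""
    pages = -(-ram_bytes // PAGE_SIZE)  # ceiling division
    entries_per_page = PAGE_SIZE // entry_size
    levels = 1
    while entries_per_page ** levels < pages:
        levels += 1
    return levels


GIB = 1 << 30
TIB = 1 << 40

if __name__ == "__main__":
    print(p2m_max_bytes(3, 8) // GIB, "GiB with 3 levels, 64 bit guest")
    print(p2m_max_bytes(3, 4) // TIB, "TiB with 3 levels, 32 bit guest")
    print(p2m_max_bytes(4, 8) // TIB, "TiB with 4 levels, 64 bit guest")
```

Under these assumptions a 64 bit domain stays at 3 levels up to exactly
512 GiB, and levels_needed() returns 4 as soon as one more page is
added, which matches growing the tree only on demand.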