From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juergen Gross
Subject: Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
Date: Tue, 16 Sep 2014 14:44:26 +0200
Message-ID: <541830AA.1000308@suse.com>
References: <1410256709-25885-1-git-send-email-jgross@suse.com>
 <1410256709-25885-2-git-send-email-jgross@suse.com>
 <540ED600.3060102@citrix.com> <540EDB4F.30402@suse.com>
 <5412CB80.9030208@suse.com> <5416A379.5@citrix.com>
 <5416A8CE.5020400@suse.com> <5416B518.8030504@citrix.com>
 <5416B6F1.2020102@suse.com> <5416BFC2.4040900@citrix.com>
 <5416C392.1010707@suse.com> <5416F809.7060509@citrix.com>
 <5417B40B.4000703@suse.com> <54180D98.8030903@citrix.com>
 <54181329.7030000@suse.com> <54182561.6060403@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <54182561.6060403@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: David Vrabel, Andrew Cooper, ian.campbell@citrix.com,
 ian.jackson@eu.citrix.com, jbeulich@suse.com, keir@xen.org, tim@xen.org,
 xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 09/16/2014 01:56 PM, David Vrabel wrote:
> On 16/09/14 11:38, Juergen Gross wrote:
>> On 09/16/2014 12:14 PM, David Vrabel wrote:
>>> On 16/09/14 04:52, Juergen Gross wrote:
>>>> On 09/15/2014 04:30 PM, David Vrabel wrote:
>>>>> On 15/09/14 11:46, Juergen Gross wrote:
>>>>>> So you'd prefer:
>>>>>>
>>>>>> 1) >512GB pv-domains (including Dom0) will be supported only with new
>>>>>>    Xen (4.6?), no matter if the user requires migration to be
>>>>>>    supported
>>>>>
>>>>> Yes. >512 GiB and not being able to migrate are not obviously related
>>>>> from the point of view of the end user (unlike assigning a PCI device).
>>>>>
>>>>> Failing at domain save time is most likely too late for the end user.
>>>>
>>>> What would you think about following compromise:
>>>>
>>>> We add a flag that indicates support of multi-level p2m. Additionally
>>>> the Linux kernel can ignore the flag not being set either if started as
>>>> Dom0 or if told so via kernel parameter.
>>>
>>> This sounds fine but this override should be via the command line
>>> parameter only. Crash dump analysis tools may not understand the 4
>>> level p2m.
>>>
>>>>>> to:
>>>>>>
>>>>>> 2) >512GB pv-domains (especially Dom0 and VMs with direct hw access)
>>>>>>    can be started on current Xen versions, migration is possible only
>>>>>>    if Xen is new (4.6?)
>>>>>
>>>>> There's also my preferred option:
>>>>>
>>>>> 3) >512 GiB PV domains are not supported. Large guests must be PVH or
>>>>>    PVHVM.
>>>>
>>>> In theory okay, but not right now, I think. PVH Dom0 is not production
>>>> ready.
>>>
>>> I'm not really seeing the need for such a large dom0.
>>
>> Okay, then I'd come back to V1 of my patches. This is the minimum
>> required to be able to boot up a system with Xen and more than 512GB
>> memory without having to reduce the Dom0 memory via Xen boot parameter.
>>
>> Otherwise the hypervisor built mfn_list mapped into the initial address
>> space will be too large.
>>
>> And no, I don't think setting the boot parameter is the solution here.
>> Dom0 should be usable on a huge machine without special parameters.
>
> Ok. The case where's dom0's p2m format matters is pretty specialized.
>
>>> I also think a flat array for the p2m might be better (less complex).
>>> There's plenty of virtual address space in a 64-bit guest to allow for
>>> this.
>>
>> Hmm, do you think we could reserve an area of many GBs for Xen in
>> virtual space? I suspect this would be rejected as another "Xen-ism".
>
> alloc_vm_area()

Nice idea, but alloc_vm_area() allocates ptes for the whole area.
__get_vm_area() would be better, I think.

>
>> BTW: the mfn_list_list will still be required to be built as a tree.
>
> The tools could be given the guest virtual address and walk the guest
> page tables.
>
> This is probably too much of a difference from the existing ABI to be
> worth pursuing at this point.

Okay, coming back to the main question:

What to do regarding support of >512GB domains:

1. we need another level of the p2m map
2. we are trying the linear p2m table
   a) with a 4 level mfn_list_list
   b) with access to the p2m table via page tables
3. my V1 patches are okay, as they enable Dom0 to start on machines
   with huge memory


Juergen
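
For reference, the 512GB figure and the "many GBs" of virtual space for a
flat p2m both follow from simple arithmetic. The sketch below is only an
illustration, assuming 4 KiB pages and 8-byte frame-number entries (a
64-bit PV guest); the function names and the 16 TiB example size are
hypothetical and not taken from the patches.

```python
# Assumptions (not from the patch itself): 4 KiB pages, 8-byte entries.
PAGE_SIZE = 4096
ENTRY_SIZE = 8
ENTRIES_PER_PAGE = PAGE_SIZE // ENTRY_SIZE  # 512 frame numbers per page

GiB = 1 << 30
TiB = 1 << 40


def tree_capacity_bytes(levels):
    """Guest memory covered by a p2m tree with the given number of levels.

    Every level multiplies the number of reachable entries by 512, and
    each leaf entry describes one 4 KiB guest page.
    """
    return ENTRIES_PER_PAGE ** levels * PAGE_SIZE


def flat_p2m_bytes(guest_bytes):
    """Virtual address space a linear (flat) p2m array would occupy."""
    return guest_bytes // PAGE_SIZE * ENTRY_SIZE


if __name__ == "__main__":
    for levels in (3, 4):
        print(levels, "levels cover",
              tree_capacity_bytes(levels) // GiB, "GiB")
    # 16 TiB is a hypothetical guest size chosen only for illustration.
    for guest in (512 * GiB, 16 * TiB):
        print(guest // GiB, "GiB guest needs a flat p2m of",
              flat_p2m_bytes(guest) // GiB, "GiB")
```

Under these assumptions a 3-level tree tops out at exactly 512 GiB, and a
flat p2m costs 1/512 of the guest size in virtual address space.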