xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Juergen Gross <jgross@suse.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>,
	Jan Beulich <JBeulich@suse.com>
Cc: David Vrabel <david.vrabel@citrix.com>,
	xen-devel@lists.xensource.com, boris.ostrovsky@oracle.com,
	konrad.wilk@oracle.com, linux-kernel@vger.kernel.org
Subject: Re: [Xen-devel] [PATCH 3/3] xen: eliminate scalability issues from initial mapping setup
Date: Fri, 05 Sep 2014 11:44:12 +0200	[thread overview]
Message-ID: <540985EC.5050205@suse.com> (raw)
In-Reply-To: <54097CF5.7080201@citrix.com>

On 09/05/2014 11:05 AM, Andrew Cooper wrote:
> On 05/09/14 08:55, Juergen Gross wrote:
>> On 09/04/2014 04:43 PM, Andrew Cooper wrote:
>>> On 04/09/14 15:31, Jan Beulich wrote:
>>>>>>> On 04.09.14 at 15:02, <andrew.cooper3@citrix.com> wrote:
>>>>> On 04/09/14 13:59, David Vrabel wrote:
>>>>>> On 04/09/14 13:38, Juergen Gross wrote:
>>>>>>> Direct Xen to place the initial P->M table outside of the initial
>>>>>>> mapping, as otherwise the 1G (implementation) / 2G (theoretical)
>>>>>>> restriction on the size of the initial mapping limits the amount
>>>>>>> of memory a domain can be handed initially.
>>>>>> The three level p2m limits memory to 512 GiB on x86-64 but this patch
>>>>>> doesn't seem to address this limit and thus seems a bit useless to
>>>>>> me.
>>>>> Any increase of the p2m beyond 3 levels will need to come with
>>>>> substantial libxc changes first.  3 level p2ms are hard coded
>>>>> throughout
>>>>> all the PV build and migrate code.
>>>> No, there no such dependency - the kernel could use 4 levels at
>>>> any time (sacrificing being able to get migrated), making sure it
>>>> only exposes the 3 levels hanging off the fourth level (or not
>>>> exposing this information at all) to external entities making this
>>>> wrong assumption.
>>>>
>>>> Jan
>>>>
>>>
>>> That would require that the PV kernel must start with a 3 level p2m and
>>> fudge things afterwards.
>>
>> I always thought the 3 level p2m is constructed by the kernel, not by
>> the tools.
>>
>> It starts with the linear p2m list anchored at xen_start_info->mfn_list,
>> constructs the p2m tree and writes the p2m_top_mfn mfn to
>> HYPERVISOR_shared_info->arch.pfn_to_mfn_frame_list_list
>>
>> See comment in the kernel source arch/x86/xen/p2m.c
>>
>> So booting with a larger p2m list can be handled completely by the
>> kernel itself.
>
> Ah yes - I remember now.  All the toolstack does is create the linear
> p2m.  In which case building such a domain will be fine.
>
>>
>>>
>>> At a minimum, I would expect a patch to libxc to detect a 4 level PV
>>> guest and fail with a meaningful error, rather than an obscure "m2p
>>> doesn't match p2m for mfn/pfn X".
>>
>> I'd rather fix it in a clean way.
>>
>> I think the best way to do it would be an indicator in the p2m array
>> anchor, e.g. setting 1<<61 in pfn_to_mfn_frame_list_list. This will
>> result in an early error with old tools:
>> "Couldn't map p2m_frame_list_list"
>
> No it wont.  The is_mapped() macro in the toolstack is quite broken.  It
> stems from a lack of Design/API/ABI concerning things like the p2m.  In
> particular, INVALID_MFN is not an ABI constant, nor is any notion of
> mapped vs unmapped.

That's not relevant here. map_frame_list_list() in xc_domain_save.c
reads pfn_to_mfn_frame_list_list and tries to map that mfn directly.
This will fail and result in above error message.

> Its current implementation is a relic of 32bit days, and only checks bit
> 31.  It also means that it is impossible to migrate a PV VM with pfns
> above the 43bit limit; a restriction which is lifted by my migration v2
> series.  A lot of the other migration constructs are in a similar state,
> which is why they are being deleted by the v2 series.
>
> The clean way to fix this is to leave pfn_to_mfn_frame_list_list as
> INVALID_MFN. Introduce two new fields beside it named p2m_levels and
> p2m_root, which then caters for levels greater than 4 in a compatible
> manner.

I don't mind doing it this way.


Juergen

  reply	other threads:[~2014-09-05  9:44 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-04 12:38 [PATCH 0/3] xen: remove memory limits from pv-domains Juergen Gross
2014-09-04 12:38 ` [PATCH 1/3] xen: sync some headers with xen tree Juergen Gross
2014-09-04 12:52   ` Jan Beulich
2014-09-05  8:06     ` Juergen Gross
2014-09-04 12:38 ` [PATCH 2/3] xen: eliminate scalability issues from initrd handling Juergen Gross
2014-09-04 12:52   ` David Vrabel
2014-09-04 14:29     ` Jan Beulich
2014-09-04 14:53       ` [Xen-devel] " David Vrabel
2014-09-05  8:04         ` Juergen Gross
2014-09-04 12:38 ` [PATCH 3/3] xen: eliminate scalability issues from initial mapping setup Juergen Gross
2014-09-04 12:59   ` David Vrabel
2014-09-04 13:02     ` [Xen-devel] " Andrew Cooper
2014-09-04 14:31       ` Jan Beulich
2014-09-04 14:43         ` Andrew Cooper
2014-09-05  7:55           ` Juergen Gross
2014-09-05  9:05             ` Andrew Cooper
2014-09-05  9:44               ` Juergen Gross [this message]
2014-09-04 15:13         ` David Vrabel
2014-09-05  8:03     ` Juergen Gross

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=540985EC.5050205@suse.com \
    --to=jgross@suse.com \
    --cc=JBeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=david.vrabel@citrix.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).