All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Juergen Gross <jgross@suse.com>, Jan Beulich <JBeulich@suse.com>
Cc: David Vrabel <david.vrabel@citrix.com>,
	<xen-devel@lists.xensource.com>, <boris.ostrovsky@oracle.com>,
	<konrad.wilk@oracle.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [Xen-devel] [PATCH 3/3] xen: eliminate scalability issues from initial mapping setup
Date: Fri, 5 Sep 2014 10:05:57 +0100	[thread overview]
Message-ID: <54097CF5.7080201@citrix.com> (raw)
In-Reply-To: <54096C83.2080608@suse.com>

On 05/09/14 08:55, Juergen Gross wrote:
> On 09/04/2014 04:43 PM, Andrew Cooper wrote:
>> On 04/09/14 15:31, Jan Beulich wrote:
>>>>>> On 04.09.14 at 15:02, <andrew.cooper3@citrix.com> wrote:
>>>> On 04/09/14 13:59, David Vrabel wrote:
>>>>> On 04/09/14 13:38, Juergen Gross wrote:
>>>>>> Direct Xen to place the initial P->M table outside of the initial
>>>>>> mapping, as otherwise the 1G (implementation) / 2G (theoretical)
>>>>>> restriction on the size of the initial mapping limits the amount
>>>>>> of memory a domain can be handed initially.
>>>>> The three level p2m limits memory to 512 GiB on x86-64 but this patch
>>>>> doesn't seem to address this limit and thus seems a bit useless to
>>>>> me.
>>>> Any increase of the p2m beyond 3 levels will need to come with
>>>> substantial libxc changes first.  3 level p2ms are hard coded
>>>> throughout
>>>> all the PV build and migrate code.
>>> No, there no such dependency - the kernel could use 4 levels at
>>> any time (sacrificing being able to get migrated), making sure it
>>> only exposes the 3 levels hanging off the fourth level (or not
>>> exposing this information at all) to external entities making this
>>> wrong assumption.
>>>
>>> Jan
>>>
>>
>> That would require that the PV kernel must start with a 3 level p2m and
>> fudge things afterwards.
>
> I always thought the 3 level p2m is constructed by the kernel, not by
> the tools.
>
> It starts with the linear p2m list anchored at xen_start_info->mfn_list,
> constructs the p2m tree and writes the p2m_top_mfn mfn to
> HYPERVISOR_shared_info->arch.pfn_to_mfn_frame_list_list
>
> See comment in the kernel source arch/x86/xen/p2m.c
>
> So booting with a larger p2m list can be handled completely by the
> kernel itself.

Ah yes - I remember now.  All the toolstack does is create the linear
p2m.  In which case building such a domain will be fine.

>
>>
>> At a minimum, I would expect a patch to libxc to detect a 4 level PV
>> guest and fail with a meaningful error, rather than an obscure "m2p
>> doesn't match p2m for mfn/pfn X".
>
> I'd rather fix it in a clean way.
>
> I think the best way to do it would be an indicator in the p2m array
> anchor, e.g. setting 1<<61 in pfn_to_mfn_frame_list_list. This will
> result in an early error with old tools:
> "Couldn't map p2m_frame_list_list"

No it wont.  The is_mapped() macro in the toolstack is quite broken.  It
stems from a lack of Design/API/ABI concerning things like the p2m.  In
particular, INVALID_MFN is not an ABI constant, nor is any notion of
mapped vs unmapped.

Its current implementation is a relic of 32bit days, and only checks bit
31.  It also means that it is impossible to migrate a PV VM with pfns
above the 43bit limit; a restriction which is lifted by my migration v2
series.  A lot of the other migration constructs are in a similar state,
which is why they are being deleted by the v2 series.

The clean way to fix this is to leave pfn_to_mfn_frame_list_list as
INVALID_MFN. Introduce two new fields beside it named p2m_levels and
p2m_root, which then caters for levels greater than 4 in a compatible
manner.

~Andrew

WARNING: multiple messages have this Message-ID (diff)
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Juergen Gross <jgross@suse.com>, Jan Beulich <JBeulich@suse.com>
Cc: David Vrabel <david.vrabel@citrix.com>,
	xen-devel@lists.xensource.com, boris.ostrovsky@oracle.com,
	konrad.wilk@oracle.com, linux-kernel@vger.kernel.org
Subject: Re: [Xen-devel] [PATCH 3/3] xen: eliminate scalability issues from initial mapping setup
Date: Fri, 5 Sep 2014 10:05:57 +0100	[thread overview]
Message-ID: <54097CF5.7080201@citrix.com> (raw)
In-Reply-To: <54096C83.2080608@suse.com>

On 05/09/14 08:55, Juergen Gross wrote:
> On 09/04/2014 04:43 PM, Andrew Cooper wrote:
>> On 04/09/14 15:31, Jan Beulich wrote:
>>>>>> On 04.09.14 at 15:02, <andrew.cooper3@citrix.com> wrote:
>>>> On 04/09/14 13:59, David Vrabel wrote:
>>>>> On 04/09/14 13:38, Juergen Gross wrote:
>>>>>> Direct Xen to place the initial P->M table outside of the initial
>>>>>> mapping, as otherwise the 1G (implementation) / 2G (theoretical)
>>>>>> restriction on the size of the initial mapping limits the amount
>>>>>> of memory a domain can be handed initially.
>>>>> The three level p2m limits memory to 512 GiB on x86-64 but this patch
>>>>> doesn't seem to address this limit and thus seems a bit useless to
>>>>> me.
>>>> Any increase of the p2m beyond 3 levels will need to come with
>>>> substantial libxc changes first.  3 level p2ms are hard coded
>>>> throughout
>>>> all the PV build and migrate code.
>>> No, there no such dependency - the kernel could use 4 levels at
>>> any time (sacrificing being able to get migrated), making sure it
>>> only exposes the 3 levels hanging off the fourth level (or not
>>> exposing this information at all) to external entities making this
>>> wrong assumption.
>>>
>>> Jan
>>>
>>
>> That would require that the PV kernel must start with a 3 level p2m and
>> fudge things afterwards.
>
> I always thought the 3 level p2m is constructed by the kernel, not by
> the tools.
>
> It starts with the linear p2m list anchored at xen_start_info->mfn_list,
> constructs the p2m tree and writes the p2m_top_mfn mfn to
> HYPERVISOR_shared_info->arch.pfn_to_mfn_frame_list_list
>
> See comment in the kernel source arch/x86/xen/p2m.c
>
> So booting with a larger p2m list can be handled completely by the
> kernel itself.

Ah yes - I remember now.  All the toolstack does is create the linear
p2m.  In which case building such a domain will be fine.

>
>>
>> At a minimum, I would expect a patch to libxc to detect a 4 level PV
>> guest and fail with a meaningful error, rather than an obscure "m2p
>> doesn't match p2m for mfn/pfn X".
>
> I'd rather fix it in a clean way.
>
> I think the best way to do it would be an indicator in the p2m array
> anchor, e.g. setting 1<<61 in pfn_to_mfn_frame_list_list. This will
> result in an early error with old tools:
> "Couldn't map p2m_frame_list_list"

No it wont.  The is_mapped() macro in the toolstack is quite broken.  It
stems from a lack of Design/API/ABI concerning things like the p2m.  In
particular, INVALID_MFN is not an ABI constant, nor is any notion of
mapped vs unmapped.

Its current implementation is a relic of 32bit days, and only checks bit
31.  It also means that it is impossible to migrate a PV VM with pfns
above the 43bit limit; a restriction which is lifted by my migration v2
series.  A lot of the other migration constructs are in a similar state,
which is why they are being deleted by the v2 series.

The clean way to fix this is to leave pfn_to_mfn_frame_list_list as
INVALID_MFN. Introduce two new fields beside it named p2m_levels and
p2m_root, which then caters for levels greater than 4 in a compatible
manner.

~Andrew

  reply	other threads:[~2014-09-05  9:06 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-04 12:38 [PATCH 0/3] xen: remove memory limits from pv-domains Juergen Gross
2014-09-04 12:38 ` [PATCH 1/3] xen: sync some headers with xen tree Juergen Gross
2014-09-04 12:52   ` Jan Beulich
2014-09-04 12:52     ` Jan Beulich
2014-09-05  8:06     ` Juergen Gross
2014-09-04 12:38 ` [PATCH 2/3] xen: eliminate scalability issues from initrd handling Juergen Gross
2014-09-04 12:52   ` David Vrabel
2014-09-04 12:52     ` David Vrabel
2014-09-04 14:29     ` Jan Beulich
2014-09-04 14:29       ` Jan Beulich
2014-09-04 14:53       ` [Xen-devel] " David Vrabel
2014-09-04 14:53         ` David Vrabel
2014-09-05  8:04         ` Juergen Gross
2014-09-04 12:38 ` [PATCH 3/3] xen: eliminate scalability issues from initial mapping setup Juergen Gross
2014-09-04 12:59   ` David Vrabel
2014-09-04 12:59     ` David Vrabel
2014-09-04 13:02     ` [Xen-devel] " Andrew Cooper
2014-09-04 13:02       ` Andrew Cooper
2014-09-04 14:31       ` Jan Beulich
2014-09-04 14:31         ` Jan Beulich
2014-09-04 14:43         ` Andrew Cooper
2014-09-04 14:43           ` Andrew Cooper
2014-09-05  7:55           ` Juergen Gross
2014-09-05  9:05             ` Andrew Cooper [this message]
2014-09-05  9:05               ` Andrew Cooper
2014-09-05  9:44               ` Juergen Gross
2014-09-04 15:13         ` David Vrabel
2014-09-04 15:13           ` David Vrabel
2014-09-05  8:03     ` Juergen Gross

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54097CF5.7080201@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=david.vrabel@citrix.com \
    --cc=jgross@suse.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.