From: Juergen Gross <jgross@suse.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>,
	xen-devel@lists.xen.org, Ian.Campbell@citrix.com,
	ian.jackson@eu.citrix.com, stefano.stabellini@eu.citrix.com,
	wei.liu2@citrix.com
Subject: Re: [PATCH 2/4] libxc: support of linear p2m list for migration of pv-domains
Date: Fri, 11 Dec 2015 16:12:41 +0100
Message-ID: <566AE7E9.8010704@suse.com>
In-Reply-To: <566AE2D8.8050601@citrix.com>

On 11/12/15 15:51, Andrew Cooper wrote:
> On 11/12/15 11:31, Juergen Gross wrote:
>> In order to be able to migrate pv-domains with more than 512 GB of RAM
>> the p2m information can be specified by the guest kernel via a virtually
>> mapped linear p2m list instead of a 3-level tree.
>>
>> Add support for this new p2m format in libxc.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>  tools/libxc/xc_sr_save_x86_pv.c | 139 +++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 136 insertions(+), 3 deletions(-)
>>
>> diff --git a/tools/libxc/xc_sr_save_x86_pv.c b/tools/libxc/xc_sr_save_x86_pv.c
>> index d7acd37..0237378 100644
>> --- a/tools/libxc/xc_sr_save_x86_pv.c
>> +++ b/tools/libxc/xc_sr_save_x86_pv.c
>> @@ -116,7 +116,7 @@ static int map_p2m_leaves(struct xc_sr_context *ctx, xen_pfn_t *mfns,
>>   * frames making up the guests p2m table.  Construct a list of pfns making up
>>   * the table.
>>   */
>> -static int map_p2m(struct xc_sr_context *ctx)
>> +static int map_p2m_tree(struct xc_sr_context *ctx)
>>  {
>>      /* Terminology:
>>       *
>> @@ -138,8 +138,6 @@ static int map_p2m(struct xc_sr_context *ctx)
>>      void *guest_fl = NULL;
>>      size_t local_fl_size;
>>  
>> -    ctx->x86_pv.max_pfn = GET_FIELD(ctx->x86_pv.shinfo, arch.max_pfn,
>> -                                    ctx->x86_pv.width) - 1;
>>      fpp = PAGE_SIZE / ctx->x86_pv.width;
>>      fll_entries = (ctx->x86_pv.max_pfn / (fpp * fpp)) + 1;
>>      if ( fll_entries > fpp )
>> @@ -270,6 +268,141 @@ err:
>>  }
>>  
>>  /*
>> + * Map the guest p2m frames specified via a cr3 value, a virtual address, and
>> + * the maximum pfn.
> 
> Probably worth stating that this function assumes PAE paging is in use.

Okay. I don't mind.

> 
>> + */
>> +static int map_p2m_list(struct xc_sr_context *ctx, uint64_t p2m_cr3)
>> +{
>> +    xc_interface *xch = ctx->xch;
>> +    xen_vaddr_t p2m_vaddr, p2m_end, mask, off;
>> +    xen_pfn_t p2m_mfn, mfn, saved_mfn, max_pfn;
>> +    uint64_t *ptes;
>> +    xen_pfn_t *mfns;
>> +    unsigned fpp, n_pages, level, shift, idx_start, idx_end, idx, saved_idx;
>> +    int rc = -1;
>> +
>> +    p2m_mfn = cr3_to_mfn(ctx, p2m_cr3);
>> +    if ( p2m_mfn == 0 || p2m_mfn > ctx->x86_pv.max_mfn )
> 
> mfn 0 isn't invalid to use here.  It could, in principle, be available
> for PV guest use.

No, the value 0 indicates that the linear p2m info isn't valid. See
comments in xen/include/public/arch-x86/xen.h

> 
>> +    {
>> +        ERROR("Bad p2m_cr3 value %#lx", p2m_cr3);
>> +        errno = ERANGE;
>> +        return -1;
>> +    }
>> +
>> +    p2m_vaddr = GET_FIELD(ctx->x86_pv.shinfo, arch.p2m_vaddr,
>> +                          ctx->x86_pv.width);
>> +    fpp = PAGE_SIZE / ctx->x86_pv.width;
>> +    ctx->x86_pv.p2m_frames = (ctx->x86_pv.max_pfn + fpp) / fpp;
> 
> ctx->x86_pv.max_pfn / fpp + 1
> 
> It is mathematically identical, but resilient to overflow.

Okay.
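
To illustrate the overflow point, here is a minimal sketch comparing the two forms. The function names and the use of uint64_t are assumptions made for the example; they are not libxc code, where the operands have their own types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only, not part of libxc. */

/* Form used in the patch: the addition can wrap for a very large max_pfn. */
uint64_t frames_add_first(uint64_t max_pfn, uint64_t fpp)
{
    assert(fpp != 0);
    return (max_pfn + fpp) / fpp;
}

/* Suggested form: dividing first keeps every intermediate <= max_pfn. */
uint64_t frames_div_first(uint64_t max_pfn, uint64_t fpp)
{
    assert(fpp != 0);
    return max_pfn / fpp + 1;
}
```

Both return the same result for ordinary inputs; they differ only when max_pfn + fpp exceeds the range of the type, where the first form wraps around.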

> 
>> +    p2m_end = p2m_vaddr + ctx->x86_pv.p2m_frames * PAGE_SIZE - 1;
> 
> You probably want to sanity check both p2m_vaddr and p2m_end for being
> either <4G or canonical, depending on the guest, and out of the Xen
> mappings.

Yes, you are right.
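
A rough sketch of what such a check could look like, assuming 48-bit canonical addresses for 64-bit guests. The helper name is hypothetical, and the check against the Xen mappings is left out because its bounds are not given in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not libxc API.
 * width is the guest word size in bytes (4 or 8).
 */
bool vaddr_plausible(uint64_t vaddr, unsigned int width)
{
    assert(width == 4 || width == 8);

    if ( width == 4 )
        return vaddr < (1ULL << 32);      /* 32-bit guest: below 4G */

    /* 64-bit guest: bits 63..47 must be all zero or all one. */
    uint64_t top = vaddr >> 47;
    return top == 0 || top == 0x1ffff;
}
```

It would be applied to both p2m_vaddr and p2m_end before starting the page table walk.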

> 
> I believe this allows you to drop 'mask' in its entirety.

Hmm, no. I'd still have to mask away the top 16 bits, which may all be '1'
in a canonical 64-bit address.
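
A small sketch of why the mask is still needed for the top-level index, assuming 4-level paging with shift = 4 * 9 + 3 = 39 as in the patch. The function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define TOP_SHIFT   (4 * 9 + 3)              /* 39, as in the patch */
#define VADDR_MASK  0x0000ffffffffffffULL    /* 64-bit guest mask from the patch */

/* Hypothetical: top-level index with the sign-extension bits removed. */
uint64_t top_index_masked(uint64_t vaddr)
{
    return (vaddr & VADDR_MASK) >> TOP_SHIFT;
}

/* Hypothetical: same computation without the mask. */
uint64_t top_index_unmasked(uint64_t vaddr)
{
    return vaddr >> TOP_SHIFT;
}
```

For a canonical upper-half address such as 0xffffc90000000000 the masked variant stays within 0..511, while the unmasked one yields an index far outside a single page table page.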

> 
>> +    DPRINTF("p2m list from %#lx to %#lx, root at %#lx", p2m_vaddr, p2m_end,
>> +            p2m_mfn);
>> +    DPRINTF("max_pfn %#lx, p2m_frames %d", ctx->x86_pv.max_pfn,
>> +            ctx->x86_pv.p2m_frames);
>> +
>> +    mask = (ctx->x86_pv.width == 8) ?
>> +           0x0000ffffffffffffULL : 0x00000000ffffffffULL;
>> +
>> +    mfns = malloc(sizeof(*mfns));
>> +    if ( !mfns )
>> +    {
>> +        ERROR("Cannot allocate memory for array of %u mfns", 1);
>> +        goto err;
>> +    }
>> +    mfns[0] = p2m_mfn;
>> +    off = 0;
>> +    saved_mfn = 0;
>> +    idx_start = idx_end = saved_idx = 0;
>> +
>> +    for ( level = ctx->x86_pv.levels; level > 0; level-- )
>> +    {
>> +        n_pages = idx_end - idx_start + 1;
>> +        ptes = xc_map_foreign_pages(xch, ctx->domid, PROT_READ, mfns, n_pages);
>> +        if ( !ptes )
>> +        {
>> +            PERROR("Failed to map %u page table pages for p2m list", n_pages);
>> +            goto err;
>> +        }
>> +        free(mfns);
>> +
>> +        shift = level * 9 + 3;
>> +        idx_start = ((p2m_vaddr - off) & mask) >> shift;
>> +        idx_end = ((p2m_end - off) & mask) >> shift;
>> +        idx = idx_end - idx_start + 1;
>> +        mfns = malloc(sizeof(*mfns) * idx);
>> +        if ( !mfns )
>> +        {
>> +            ERROR("Cannot allocate memory for array of %u mfns", idx);
>> +            goto err;
>> +        }
>> +
>> +        for ( idx = idx_start; idx <= idx_end; idx++ )
>> +        {
>> +            mfn = pte_to_frame(ptes[idx]);
>> +            if ( mfn == 0 || mfn > ctx->x86_pv.max_mfn )
>> +            {
>> +                ERROR("Bad mfn %#lx during page table walk for vaddr %#lx at level %d of p2m list",
>> +                      mfn, off + ((xen_vaddr_t)idx << shift), level);
>> +                errno = ERANGE;
>> +                goto err;
>> +            }
>> +            mfns[idx - idx_start] = mfn;
>> +
>> +            /* Maximum pfn check at level 2. Same reasoning as for p2m tree. */
>> +            if ( level == 2 )
>> +            {
>> +                if ( mfn != saved_mfn )
>> +                {
>> +                    saved_mfn = mfn;
>> +                    saved_idx = idx - idx_start;
>> +                }
>> +            }
>> +        }
>> +
>> +        if ( level == 2 )
>> +        {
>> +            max_pfn = ((xen_pfn_t)saved_idx << 9) * fpp - 1;
>> +            if ( max_pfn < ctx->x86_pv.max_pfn )
>> +            {
>> +                ctx->x86_pv.max_pfn = max_pfn;
>> +                ctx->x86_pv.p2m_frames = (ctx->x86_pv.max_pfn + fpp) / fpp;
>> +                p2m_end = p2m_vaddr + ctx->x86_pv.p2m_frames * PAGE_SIZE - 1;
>> +                idx_end = idx_start + saved_idx;
>> +            }
>> +        }
>> +
>> +        munmap(ptes, n_pages * PAGE_SIZE);
>> +        ptes = NULL;
>> +        off = p2m_vaddr & ((mask >> shift) << shift);
>> +    }
>> +
>> +    /* Map the p2m leaves themselves. */
>> +    rc = map_p2m_leaves(ctx, mfns, idx_end - idx_start + 1);
>> +
>> +err:
>> +    free(mfns);
>> +    if ( ptes )
>> +        munmap(ptes, n_pages * PAGE_SIZE);
> 
> Well - I think I have understood what is going on here, and it looks
> plausible.

I hope so. :-)
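
For readers following the quoted loop, the per-level index arithmetic can be sketched in isolation. This is a simplified model under assumptions: it only reproduces the shift, mask and off handling, does no mapping and no max_pfn adjustment, and struct level_range and walk_indices are names invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: index range used at one page table level. */
struct level_range {
    unsigned int idx_start;
    unsigned int idx_end;
};

/*
 * Compute the index range per level for the virtual range [vaddr, end].
 * out[] must hold 'levels' entries; out[0] is the top level.
 */
void walk_indices(uint64_t vaddr, uint64_t end, uint64_t mask,
                  unsigned int levels, struct level_range *out)
{
    uint64_t off = 0;
    unsigned int level;

    for ( level = levels; level > 0; level-- )
    {
        unsigned int shift = level * 9 + 3;
        struct level_range *r = &out[levels - level];

        /* Indices are relative to the first page mapped at this level. */
        r->idx_start = (unsigned int)(((vaddr - off) & mask) >> shift);
        r->idx_end   = (unsigned int)(((end - off) & mask) >> shift);

        /* Base address covered by the first entry used at this level. */
        off = vaddr & ((mask >> shift) << shift);
    }
}
```

Because the pages of one level are mapped contiguously, idx_end may exceed 511: it then indexes into the second or a later page mapped at that level.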


Juergen
