All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] libxc: try to find last used pfn when migrating
@ 2015-11-27 14:50 Juergen Gross
  2015-11-27 15:01 ` David Vrabel
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Juergen Gross @ 2015-11-27 14:50 UTC (permalink / raw)
  To: xen-devel, Ian.Campbell, ian.jackson, stefano.stabellini,
	wei.liu2, andrew.cooper3
  Cc: Juergen Gross

For migration the last used pfn of a guest is needed to size the
logdirty bitmap and as an upper bound of the page loop. Unfortunately
there are pv-kernels advertising a much higher maximum pfn as they
are really using in order to support memory hotplug. This will lead
to allocation of much more memory in Xen tools during migration as
really needed.

Try to find the last used guest pfn of a pv-domu by scanning the p2m
tree from the last entry towards it's start and search for an entry
not being invalid.

Normally the mid pages of the p2m tree containing all invalid entries
are being reused, so we can just scan the top page for identical
entries and skip them but the first one.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 tools/libxc/xc_sr_save.c        |  8 ++++----
 tools/libxc/xc_sr_save_x86_pv.c | 34 +++++++++++++++++++++++++++++++---
 2 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 0c12e56..22b3f18 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -677,6 +677,10 @@ static int setup(struct xc_sr_context *ctx)
     DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
                                     &ctx->save.dirty_bitmap_hbuf);
 
+    rc = ctx->save.ops.setup(ctx);
+    if ( rc )
+        goto err;
+
     dirty_bitmap = xc_hypercall_buffer_alloc_pages(
                    xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
     ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
@@ -692,10 +696,6 @@ static int setup(struct xc_sr_context *ctx)
         goto err;
     }
 
-    rc = ctx->save.ops.setup(ctx);
-    if ( rc )
-        goto err;
-
     rc = 0;
 
  err:
diff --git a/tools/libxc/xc_sr_save_x86_pv.c b/tools/libxc/xc_sr_save_x86_pv.c
index f63f40b..6189fda 100644
--- a/tools/libxc/xc_sr_save_x86_pv.c
+++ b/tools/libxc/xc_sr_save_x86_pv.c
@@ -83,8 +83,8 @@ static int map_p2m(struct xc_sr_context *ctx)
      */
     xc_interface *xch = ctx->xch;
     int rc = -1;
-    unsigned x, fpp, fll_entries, fl_entries;
-    xen_pfn_t fll_mfn;
+    unsigned x, saved_x, fpp, fll_entries, fl_entries;
+    xen_pfn_t fll_mfn, saved_mfn, max_pfn;
 
     xen_pfn_t *local_fll = NULL;
     void *guest_fll = NULL;
@@ -96,7 +96,6 @@ static int map_p2m(struct xc_sr_context *ctx)
 
     fpp = PAGE_SIZE / ctx->x86_pv.width;
     fll_entries = (ctx->x86_pv.max_pfn / (fpp * fpp)) + 1;
-    fl_entries  = (ctx->x86_pv.max_pfn / fpp) + 1;
 
     fll_mfn = GET_FIELD(ctx->x86_pv.shinfo, arch.pfn_to_mfn_frame_list_list,
                         ctx->x86_pv.width);
@@ -131,6 +130,8 @@ static int map_p2m(struct xc_sr_context *ctx)
     }
 
     /* Check for bad mfns in frame list list. */
+    saved_mfn = 0;
+    saved_x = 0;
     for ( x = 0; x < fll_entries; ++x )
     {
         if ( local_fll[x] == 0 || local_fll[x] > ctx->x86_pv.max_mfn )
@@ -139,8 +140,35 @@ static int map_p2m(struct xc_sr_context *ctx)
                   local_fll[x], x, fll_entries);
             goto err;
         }
+        if ( local_fll[x] != saved_mfn )
+        {
+            saved_mfn = local_fll[x];
+            saved_x = x;
+        }
     }
 
+    /*
+     * Check for actual lower max_pfn:
+     * If the trailing entries of the frame list list were all the same we can
+     * assume they all reference mid pages all referencing p2m pages with all
+     * invalid entries. Otherwise there would be multiple pfns referencing all
+     * the same mfn which can't work across migration, as this sharing would be
+     * broken by the migration process.
+     * Adjust max_pfn if possible to avoid allocating much larger areas as
+     * needed for p2m and logdirty map.
+     */
+    max_pfn = (saved_x + 1) * fpp * fpp - 1;
+    if ( max_pfn < ctx->x86_pv.max_pfn )
+    {
+        ctx->x86_pv.max_pfn = max_pfn;
+        ctx->x86_pv.p2m_frames = (ctx->x86_pv.max_pfn + fpp) / fpp;
+        ctx->save.p2m_size = max_pfn + 1;
+        fll_entries = (ctx->x86_pv.max_pfn / (fpp * fpp)) + 1;
+        DPRINTF("lowering max_pfn to %#lx, p2m_frames %d", max_pfn,
+                ctx->x86_pv.p2m_frames);
+    }
+    fl_entries  = (ctx->x86_pv.max_pfn / fpp) + 1;
+
     /* Map the guest mid p2m frames. */
     guest_fl = xc_map_foreign_pages(xch, ctx->domid, PROT_READ,
                                     local_fll, fll_entries);
-- 
2.6.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] libxc: try to find last used pfn when migrating
  2015-11-27 14:50 [PATCH] libxc: try to find last used pfn when migrating Juergen Gross
@ 2015-11-27 15:01 ` David Vrabel
  2015-11-27 15:11   ` Juergen Gross
  2015-11-27 15:33 ` Wei Liu
  2015-11-27 16:42 ` Andrew Cooper
  2 siblings, 1 reply; 15+ messages in thread
From: David Vrabel @ 2015-11-27 15:01 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, Ian.Campbell, ian.jackson,
	stefano.stabellini, wei.liu2, andrew.cooper3

On 27/11/15 14:50, Juergen Gross wrote:
> For migration the last used pfn of a guest is needed to size the
> logdirty bitmap and as an upper bound of the page loop. Unfortunately
> there are pv-kernels advertising a much higher maximum pfn as they
> are really using in order to support memory hotplug. This will lead
> to allocation of much more memory in Xen tools during migration as
> really needed.
> 
> Try to find the last used guest pfn of a pv-domu by scanning the p2m
> tree from the last entry towards it's start and search for an entry
> not being invalid.
> 
> Normally the mid pages of the p2m tree containing all invalid entries
> are being reused, so we can just scan the top page for identical
> entries and skip them but the first one.

This has been fixed in Linux by "x86/xen/p2m: hint at the last populated
P2M entry" 98dd166ea3a3c3b57919e20d9b0d1237fcd0349d which is tagged for
stable.

Do we really need a toolstack fix as well?

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] libxc: try to find last used pfn when migrating
  2015-11-27 15:01 ` David Vrabel
@ 2015-11-27 15:11   ` Juergen Gross
  2015-11-27 15:18     ` Ian Campbell
                       ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Juergen Gross @ 2015-11-27 15:11 UTC (permalink / raw)
  To: David Vrabel, xen-devel, Ian.Campbell, ian.jackson,
	stefano.stabellini, wei.liu2, andrew.cooper3

On 27/11/15 16:01, David Vrabel wrote:
> On 27/11/15 14:50, Juergen Gross wrote:
>> For migration the last used pfn of a guest is needed to size the
>> logdirty bitmap and as an upper bound of the page loop. Unfortunately
>> there are pv-kernels advertising a much higher maximum pfn as they
>> are really using in order to support memory hotplug. This will lead
>> to allocation of much more memory in Xen tools during migration as
>> really needed.
>>
>> Try to find the last used guest pfn of a pv-domu by scanning the p2m
>> tree from the last entry towards it's start and search for an entry
>> not being invalid.
>>
>> Normally the mid pages of the p2m tree containing all invalid entries
>> are being reused, so we can just scan the top page for identical
>> entries and skip them but the first one.
> 
> This has been fixed in Linux by "x86/xen/p2m: hint at the last populated
> P2M entry" 98dd166ea3a3c3b57919e20d9b0d1237fcd0349d which is tagged for
> stable.
> 
> Do we really need a toolstack fix as well?

xl migrate will use much less resources for a domain with a 3.x kernel
started with max_mem being much larger than mem. E.g. in case you start
a domain on a small stand-by system and migrate it later to the large
production system and want to balloon it up there.

Additionally there was a discussion this week on irc regarding this
topic and concern was raised this could block dom0 responsiveness.


Juergen

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] libxc: try to find last used pfn when migrating
  2015-11-27 15:11   ` Juergen Gross
@ 2015-11-27 15:18     ` Ian Campbell
  2015-11-27 15:20     ` David Vrabel
  2015-11-27 15:22     ` Ian Jackson
  2 siblings, 0 replies; 15+ messages in thread
From: Ian Campbell @ 2015-11-27 15:18 UTC (permalink / raw)
  To: Juergen Gross, David Vrabel, xen-devel, ian.jackson,
	stefano.stabellini, wei.liu2, andrew.cooper3

On Fri, 2015-11-27 at 16:11 +0100, Juergen Gross wrote:
> On 27/11/15 16:01, David Vrabel wrote:
> > On 27/11/15 14:50, Juergen Gross wrote:
> > > For migration the last used pfn of a guest is needed to size the
> > > logdirty bitmap and as an upper bound of the page loop. Unfortunately
> > > there are pv-kernels advertising a much higher maximum pfn as they
> > > are really using in order to support memory hotplug. This will lead
> > > to allocation of much more memory in Xen tools during migration as
> > > really needed.
> > > 
> > > Try to find the last used guest pfn of a pv-domu by scanning the p2m
> > > tree from the last entry towards it's start and search for an entry
> > > not being invalid.
> > > 
> > > Normally the mid pages of the p2m tree containing all invalid entries
> > > are being reused, so we can just scan the top page for identical
> > > entries and skip them but the first one.
> > 
> > This has been fixed in Linux by "x86/xen/p2m: hint at the last
> > populated
> > P2M entry" 98dd166ea3a3c3b57919e20d9b0d1237fcd0349d which is tagged for
> > stable.
> > 
> > Do we really need a toolstack fix as well?
> 
> xl migrate will use much less resources for a domain with a 3.x kernel
> started with max_mem being much larger than mem. E.g. in case you start
> a domain on a small stand-by system and migrate it later to the large
> production system and want to balloon it up there.
> 
> Additionally there was a discussion this week on irc regarding this
> topic and concern was raised this could block dom0 responsiveness.

We certainly want the toolstack domain to be more resilient in the face of
ludicrous guest-supplied numbers than it is today.

Ian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] libxc: try to find last used pfn when migrating
  2015-11-27 15:11   ` Juergen Gross
  2015-11-27 15:18     ` Ian Campbell
@ 2015-11-27 15:20     ` David Vrabel
  2015-11-27 15:22     ` Ian Jackson
  2 siblings, 0 replies; 15+ messages in thread
From: David Vrabel @ 2015-11-27 15:20 UTC (permalink / raw)
  To: Juergen Gross, David Vrabel, xen-devel, Ian.Campbell, ian.jackson,
	stefano.stabellini, wei.liu2, andrew.cooper3

On 27/11/15 15:11, Juergen Gross wrote:
> On 27/11/15 16:01, David Vrabel wrote:
>> On 27/11/15 14:50, Juergen Gross wrote:
>>> For migration the last used pfn of a guest is needed to size the
>>> logdirty bitmap and as an upper bound of the page loop. Unfortunately
>>> there are pv-kernels advertising a much higher maximum pfn as they
>>> are really using in order to support memory hotplug. This will lead
>>> to allocation of much more memory in Xen tools during migration as
>>> really needed.
>>>
>>> Try to find the last used guest pfn of a pv-domu by scanning the p2m
>>> tree from the last entry towards it's start and search for an entry
>>> not being invalid.
>>>
>>> Normally the mid pages of the p2m tree containing all invalid entries
>>> are being reused, so we can just scan the top page for identical
>>> entries and skip them but the first one.
>>
>> This has been fixed in Linux by "x86/xen/p2m: hint at the last populated
>> P2M entry" 98dd166ea3a3c3b57919e20d9b0d1237fcd0349d which is tagged for
>> stable.
>>
>> Do we really need a toolstack fix as well?
> 
> xl migrate will use much less resources for a domain with a 3.x kernel
> started with max_mem being much larger than mem. E.g. in case you start
> a domain on a small stand-by system and migrate it later to the large
> production system and want to balloon it up there.
> 
> Additionally there was a discussion this week on irc regarding this
> topic and concern was raised this could block dom0 responsiveness.

Ok.

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] libxc: try to find last used pfn when migrating
  2015-11-27 15:11   ` Juergen Gross
  2015-11-27 15:18     ` Ian Campbell
  2015-11-27 15:20     ` David Vrabel
@ 2015-11-27 15:22     ` Ian Jackson
  2015-11-27 15:50       ` Juergen Gross
  2 siblings, 1 reply; 15+ messages in thread
From: Ian Jackson @ 2015-11-27 15:22 UTC (permalink / raw)
  To: Juergen Gross
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	xen-devel, David Vrabel

Juergen Gross writes ("Re: [Xen-devel] [PATCH] libxc: try to find last used pfn when migrating"):
> xl migrate will use much less resources for a domain with a 3.x kernel
> started with max_mem being much larger than mem. E.g. in case you start
> a domain on a small stand-by system and migrate it later to the large
> production system and want to balloon it up there.
> 
> Additionally there was a discussion this week on irc regarding this
> topic and concern was raised this could block dom0 responsiveness.

I agree that this is a real problem but AFAICT I don't think the
approach taken in Juergen's toolstack patch will solve it completely.

I would phrase the bug like this:

   A malicious guest kernel can cause the toolstack, when attempting
   to migrate the domain, to use wildly excessive dom0 RAM.

I think where the administrator has configured a guest with (say) 1G
of RAM, the memory used by the toolstack to migrate it should be
significantly less than that 1G.

If the toolstack algorithms are such that strange behaviour by a guest
could violate this assumption, then the toolstack should have an
explicit check and (by default, at least) refuse to migrate such a
guest.

I think Juergen's patch is a good workaround for existing guests which
/accidentally/ exhibit undesirable behaviour, if we want to keep
supporting them.

Ian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] libxc: try to find last used pfn when migrating
  2015-11-27 14:50 [PATCH] libxc: try to find last used pfn when migrating Juergen Gross
  2015-11-27 15:01 ` David Vrabel
@ 2015-11-27 15:33 ` Wei Liu
  2015-11-27 15:53   ` Juergen Gross
  2015-11-27 16:42 ` Andrew Cooper
  2 siblings, 1 reply; 15+ messages in thread
From: Wei Liu @ 2015-11-27 15:33 UTC (permalink / raw)
  To: Juergen Gross
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	ian.jackson, xen-devel

On Fri, Nov 27, 2015 at 03:50:53PM +0100, Juergen Gross wrote:
> For migration the last used pfn of a guest is needed to size the
> logdirty bitmap and as an upper bound of the page loop. Unfortunately
> there are pv-kernels advertising a much higher maximum pfn as they
> are really using in order to support memory hotplug. This will lead
> to allocation of much more memory in Xen tools during migration as
> really needed.
> 
> Try to find the last used guest pfn of a pv-domu by scanning the p2m
> tree from the last entry towards it's start and search for an entry
> not being invalid.
> 
> Normally the mid pages of the p2m tree containing all invalid entries
> are being reused, so we can just scan the top page for identical
> entries and skip them but the first one.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  tools/libxc/xc_sr_save.c        |  8 ++++----
>  tools/libxc/xc_sr_save_x86_pv.c | 34 +++++++++++++++++++++++++++++++---
>  2 files changed, 35 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> index 0c12e56..22b3f18 100644
> --- a/tools/libxc/xc_sr_save.c
> +++ b/tools/libxc/xc_sr_save.c
> @@ -677,6 +677,10 @@ static int setup(struct xc_sr_context *ctx)
>      DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>                                      &ctx->save.dirty_bitmap_hbuf);
>  
> +    rc = ctx->save.ops.setup(ctx);
> +    if ( rc )
> +        goto err;
> +
>      dirty_bitmap = xc_hypercall_buffer_alloc_pages(
>                     xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
>      ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
> @@ -692,10 +696,6 @@ static int setup(struct xc_sr_context *ctx)
>          goto err;
>      }
>  

p2m_size is set in two places for PV guest.  Since we are now relying on
ops.setup to set p2m_size, the invocation to xc_domain_nr_gpfns in
xc_domain_save should be removed, so that we only have one place to set
p2m_size.

You then also need to call xc_domain_nr_gpfns in HVM's ops.setup
(x86_hvm_setup).

Wei.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] libxc: try to find last used pfn when migrating
  2015-11-27 15:22     ` Ian Jackson
@ 2015-11-27 15:50       ` Juergen Gross
  0 siblings, 0 replies; 15+ messages in thread
From: Juergen Gross @ 2015-11-27 15:50 UTC (permalink / raw)
  To: Ian Jackson
  Cc: wei.liu2, Ian.Campbell, stefano.stabellini, andrew.cooper3,
	xen-devel, David Vrabel

On 27/11/15 16:22, Ian Jackson wrote:
> Juergen Gross writes ("Re: [Xen-devel] [PATCH] libxc: try to find last used pfn when migrating"):
>> xl migrate will use much less resources for a domain with a 3.x kernel
>> started with max_mem being much larger than mem. E.g. in case you start
>> a domain on a small stand-by system and migrate it later to the large
>> production system and want to balloon it up there.
>>
>> Additionally there was a discussion this week on irc regarding this
>> topic and concern was raised this could block dom0 responsiveness.
> 
> I agree that this is a real problem but AFAICT I don't think the
> approach taken in Juergen's toolstack patch will solve it completely.
> 
> I would phrase the bug like this:
> 
>    A malicious guest kernel can cause the toolstack, when attempting
>    to migrate the domain, to use wildly excessive dom0 RAM.
> 
> I think where the administrator has configured a guest with (say) 1G
> of RAM, the memory used by the toolstack to migrate it should be
> significantly less than that 1G.
> 
> If the toolstack algorithms are such that strange behaviour by a guest
> could violate this assumption, then the toolstack should have an
> explicit check and (by default, at least) refuse to migrate such a
> guest.
> 
> I think Juergen's patch is a good workaround for existing guests which
> /accidentally/ exhibit undesirable behaviour, if we want to keep
> supporting them.

It should be considered to be a working base for being able to reject
migrating a misbehaving domain.

I guess any pv-guest which active p2m list is covering more than twice
it's max_mem can be considered to be misbehaving and migration could be
rejected.

More fine tuning would be possible by supporting a sparse p2m list, but
this would require more work. I'm not sure if it's possible to support
a sparse logdirty bitmap with current hypervisor interfaces.


Juergen

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] libxc: try to find last used pfn when migrating
  2015-11-27 15:33 ` Wei Liu
@ 2015-11-27 15:53   ` Juergen Gross
  2015-11-27 17:16     ` Andrew Cooper
  0 siblings, 1 reply; 15+ messages in thread
From: Juergen Gross @ 2015-11-27 15:53 UTC (permalink / raw)
  To: Wei Liu, andrew.cooper3
  Cc: stefano.stabellini, ian.jackson, Ian.Campbell, xen-devel

On 27/11/15 16:33, Wei Liu wrote:
> On Fri, Nov 27, 2015 at 03:50:53PM +0100, Juergen Gross wrote:
>> For migration the last used pfn of a guest is needed to size the
>> logdirty bitmap and as an upper bound of the page loop. Unfortunately
>> there are pv-kernels advertising a much higher maximum pfn as they
>> are really using in order to support memory hotplug. This will lead
>> to allocation of much more memory in Xen tools during migration as
>> really needed.
>>
>> Try to find the last used guest pfn of a pv-domu by scanning the p2m
>> tree from the last entry towards it's start and search for an entry
>> not being invalid.
>>
>> Normally the mid pages of the p2m tree containing all invalid entries
>> are being reused, so we can just scan the top page for identical
>> entries and skip them but the first one.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>  tools/libxc/xc_sr_save.c        |  8 ++++----
>>  tools/libxc/xc_sr_save_x86_pv.c | 34 +++++++++++++++++++++++++++++++---
>>  2 files changed, 35 insertions(+), 7 deletions(-)
>>
>> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
>> index 0c12e56..22b3f18 100644
>> --- a/tools/libxc/xc_sr_save.c
>> +++ b/tools/libxc/xc_sr_save.c
>> @@ -677,6 +677,10 @@ static int setup(struct xc_sr_context *ctx)
>>      DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>>                                      &ctx->save.dirty_bitmap_hbuf);
>>  
>> +    rc = ctx->save.ops.setup(ctx);
>> +    if ( rc )
>> +        goto err;
>> +
>>      dirty_bitmap = xc_hypercall_buffer_alloc_pages(
>>                     xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
>>      ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
>> @@ -692,10 +696,6 @@ static int setup(struct xc_sr_context *ctx)
>>          goto err;
>>      }
>>  
> 
> p2m_size is set in two places for PV guest.  Since we are now relying on
> ops.setup to set p2m_size, the invocation to xc_domain_nr_gpfns in
> xc_domain_save should be removed, so that we only have one place to set
> p2m_size.

I wasn't sure this is needed in save case only. If it is I can make
the change, of course. Andrew?

> You then also need to call xc_domain_nr_gpfns in HVM's ops.setup
> (x86_hvm_setup).

Sure.


Juergen

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] libxc: try to find last used pfn when migrating
  2015-11-27 14:50 [PATCH] libxc: try to find last used pfn when migrating Juergen Gross
  2015-11-27 15:01 ` David Vrabel
  2015-11-27 15:33 ` Wei Liu
@ 2015-11-27 16:42 ` Andrew Cooper
  2015-11-27 16:51   ` Juergen Gross
  2 siblings, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2015-11-27 16:42 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, Ian.Campbell, ian.jackson,
	stefano.stabellini, wei.liu2

On 27/11/15 14:50, Juergen Gross wrote:
> For migration the last used pfn of a guest is needed to size the
> logdirty bitmap and as an upper bound of the page loop. Unfortunately
> there are pv-kernels advertising a much higher maximum pfn as they
> are really using in order to support memory hotplug. This will lead
> to allocation of much more memory in Xen tools during migration as
> really needed.
>
> Try to find the last used guest pfn of a pv-domu by scanning the p2m
> tree from the last entry towards it's start and search for an entry
> not being invalid.
>
> Normally the mid pages of the p2m tree containing all invalid entries
> are being reused, so we can just scan the top page for identical
> entries and skip them but the first one.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  tools/libxc/xc_sr_save.c        |  8 ++++----
>  tools/libxc/xc_sr_save_x86_pv.c | 34 +++++++++++++++++++++++++++++++---
>  2 files changed, 35 insertions(+), 7 deletions(-)
>
> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> index 0c12e56..22b3f18 100644
> --- a/tools/libxc/xc_sr_save.c
> +++ b/tools/libxc/xc_sr_save.c
> @@ -677,6 +677,10 @@ static int setup(struct xc_sr_context *ctx)
>      DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>                                      &ctx->save.dirty_bitmap_hbuf);
>  
> +    rc = ctx->save.ops.setup(ctx);
> +    if ( rc )
> +        goto err;
> +
>      dirty_bitmap = xc_hypercall_buffer_alloc_pages(
>                     xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
>      ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
> @@ -692,10 +696,6 @@ static int setup(struct xc_sr_context *ctx)
>          goto err;
>      }
>  
> -    rc = ctx->save.ops.setup(ctx);
> -    if ( rc )
> -        goto err;
> -
>      rc = 0;
>  
>   err:

While moving this, the restore side should be consistent (turns out it
already is), and the docs updated.  There was an inaccuracy, so I went
ahead and did it.

--8<--
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 64f6082..ae77155 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -54,9 +54,11 @@ struct xc_sr_save_ops
                           void **page);
 
     /**
-     * Set up local environment to restore a domain.  This is called before
-     * any records are written to the stream.  (Typically querying running
-     * domain state, setting up mappings etc.)
+     * Set up local environment to save a domain. (Typically querying
+     * running domain state, setting up mappings etc.)
+     *
+     * This is called once before any common setup has occurred,
allowing for
+     * guest-specific adjustments to be made to common state.
      */
     int (*setup)(struct xc_sr_context *ctx);
 
@@ -121,8 +123,10 @@ struct xc_sr_restore_ops
     int (*localise_page)(struct xc_sr_context *ctx, uint32_t type, void
*page);
 
     /**
-     * Set up local environment to restore a domain.  This is called before
-     * any records are read from the stream.
+     * Set up local environment to restore a domain.
+     *
+     * This is called once before any common setup has occurred,
allowing for
+     * guest-specific adjustments to be made to common state.
      */
     int (*setup)(struct xc_sr_context *ctx);
 
--8<--

Feel free to fold this into your patch, or I can submit it alone as a
cleanup prerequisite for your functional change below.

> diff --git a/tools/libxc/xc_sr_save_x86_pv.c b/tools/libxc/xc_sr_save_x86_pv.c
> index f63f40b..6189fda 100644
> --- a/tools/libxc/xc_sr_save_x86_pv.c
> +++ b/tools/libxc/xc_sr_save_x86_pv.c
> @@ -83,8 +83,8 @@ static int map_p2m(struct xc_sr_context *ctx)
>       */
>      xc_interface *xch = ctx->xch;
>      int rc = -1;
> -    unsigned x, fpp, fll_entries, fl_entries;
> -    xen_pfn_t fll_mfn;
> +    unsigned x, saved_x, fpp, fll_entries, fl_entries;
> +    xen_pfn_t fll_mfn, saved_mfn, max_pfn;
>  
>      xen_pfn_t *local_fll = NULL;
>      void *guest_fll = NULL;
> @@ -96,7 +96,6 @@ static int map_p2m(struct xc_sr_context *ctx)
>  
>      fpp = PAGE_SIZE / ctx->x86_pv.width;
>      fll_entries = (ctx->x86_pv.max_pfn / (fpp * fpp)) + 1;
> -    fl_entries  = (ctx->x86_pv.max_pfn / fpp) + 1;
>  
>      fll_mfn = GET_FIELD(ctx->x86_pv.shinfo, arch.pfn_to_mfn_frame_list_list,
>                          ctx->x86_pv.width);
> @@ -131,6 +130,8 @@ static int map_p2m(struct xc_sr_context *ctx)
>      }
>  
>      /* Check for bad mfns in frame list list. */
> +    saved_mfn = 0;
> +    saved_x = 0;
>      for ( x = 0; x < fll_entries; ++x )
>      {
>          if ( local_fll[x] == 0 || local_fll[x] > ctx->x86_pv.max_mfn )
> @@ -139,8 +140,35 @@ static int map_p2m(struct xc_sr_context *ctx)
>                    local_fll[x], x, fll_entries);
>              goto err;
>          }
> +        if ( local_fll[x] != saved_mfn )
> +        {
> +            saved_mfn = local_fll[x];
> +            saved_x = x;
> +        }
>      }
>  
> +    /*
> +     * Check for actual lower max_pfn:
> +     * If the trailing entries of the frame list list were all the same we can
> +     * assume they all reference mid pages all referencing p2m pages with all
> +     * invalid entries. Otherwise there would be multiple pfns referencing all
> +     * the same mfn which can't work across migration, as this sharing would be
> +     * broken by the migration process.
> +     * Adjust max_pfn if possible to avoid allocating much larger areas as
> +     * needed for p2m and logdirty map.
> +     */
> +    max_pfn = (saved_x + 1) * fpp * fpp - 1;
> +    if ( max_pfn < ctx->x86_pv.max_pfn )
> +    {
> +        ctx->x86_pv.max_pfn = max_pfn;
> +        ctx->x86_pv.p2m_frames = (ctx->x86_pv.max_pfn + fpp) / fpp;
> +        ctx->save.p2m_size = max_pfn + 1;
> +        fll_entries = (ctx->x86_pv.max_pfn / (fpp * fpp)) + 1;
> +        DPRINTF("lowering max_pfn to %#lx, p2m_frames %d", max_pfn,
> +                ctx->x86_pv.p2m_frames);
> +    }
> +    fl_entries  = (ctx->x86_pv.max_pfn / fpp) + 1;
> +
>      /* Map the guest mid p2m frames. */
>      guest_fl = xc_map_foreign_pages(xch, ctx->domid, PROT_READ,
>                                      local_fll, fll_entries);

For the actual change, Reviewed-by: Andrew Cooper
<andrew.cooper3@citrix.com>

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] libxc: try to find last used pfn when migrating
  2015-11-27 16:42 ` Andrew Cooper
@ 2015-11-27 16:51   ` Juergen Gross
  0 siblings, 0 replies; 15+ messages in thread
From: Juergen Gross @ 2015-11-27 16:51 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel, Ian.Campbell, ian.jackson,
	stefano.stabellini, wei.liu2

On 27/11/15 17:42, Andrew Cooper wrote:
> On 27/11/15 14:50, Juergen Gross wrote:
>> For migration the last used pfn of a guest is needed to size the
>> logdirty bitmap and as an upper bound of the page loop. Unfortunately
>> there are pv-kernels advertising a much higher maximum pfn as they
>> are really using in order to support memory hotplug. This will lead
>> to allocation of much more memory in Xen tools during migration as
>> really needed.
>>
>> Try to find the last used guest pfn of a pv-domu by scanning the p2m
>> tree from the last entry towards it's start and search for an entry
>> not being invalid.
>>
>> Normally the mid pages of the p2m tree containing all invalid entries
>> are being reused, so we can just scan the top page for identical
>> entries and skip them but the first one.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>  tools/libxc/xc_sr_save.c        |  8 ++++----
>>  tools/libxc/xc_sr_save_x86_pv.c | 34 +++++++++++++++++++++++++++++++---
>>  2 files changed, 35 insertions(+), 7 deletions(-)
>>
>> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
>> index 0c12e56..22b3f18 100644
>> --- a/tools/libxc/xc_sr_save.c
>> +++ b/tools/libxc/xc_sr_save.c
>> @@ -677,6 +677,10 @@ static int setup(struct xc_sr_context *ctx)
>>      DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>>                                      &ctx->save.dirty_bitmap_hbuf);
>>  
>> +    rc = ctx->save.ops.setup(ctx);
>> +    if ( rc )
>> +        goto err;
>> +
>>      dirty_bitmap = xc_hypercall_buffer_alloc_pages(
>>                     xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
>>      ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
>> @@ -692,10 +696,6 @@ static int setup(struct xc_sr_context *ctx)
>>          goto err;
>>      }
>>  
>> -    rc = ctx->save.ops.setup(ctx);
>> -    if ( rc )
>> -        goto err;
>> -
>>      rc = 0;
>>  
>>   err:
> 
> While moving this, the restore side should be consistent (turns out it
> already is), and the docs updated.  There was an inaccuracy, so I went
> ahead and did it.
> 
> --8<--
> diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
> index 64f6082..ae77155 100644
> --- a/tools/libxc/xc_sr_common.h
> +++ b/tools/libxc/xc_sr_common.h
> @@ -54,9 +54,11 @@ struct xc_sr_save_ops
>                            void **page);
>  
>      /**
> -     * Set up local environment to restore a domain.  This is called before
> -     * any records are written to the stream.  (Typically querying running
> -     * domain state, setting up mappings etc.)
> +     * Set up local environment to save a domain. (Typically querying
> +     * running domain state, setting up mappings etc.)
> +     *
> +     * This is called once before any common setup has occurred,
> allowing for
> +     * guest-specific adjustments to be made to common state.
>       */
>      int (*setup)(struct xc_sr_context *ctx);
>  
> @@ -121,8 +123,10 @@ struct xc_sr_restore_ops
>      int (*localise_page)(struct xc_sr_context *ctx, uint32_t type, void
> *page);
>  
>      /**
> -     * Set up local environment to restore a domain.  This is called before
> -     * any records are read from the stream.
> +     * Set up local environment to restore a domain.
> +     *
> +     * This is called once before any common setup has occurred,
> allowing for
> +     * guest-specific adjustments to be made to common state.
>       */
>      int (*setup)(struct xc_sr_context *ctx);
>  
> --8<--
> 
> Feel free to fold this into your patch, or I can submit it alone as a
> cleanup prerequisite for your functional change below.

I'll fold it in, thanks.


Juergen

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] libxc: try to find last used pfn when migrating
  2015-11-27 15:53   ` Juergen Gross
@ 2015-11-27 17:16     ` Andrew Cooper
  2015-11-30  8:17       ` Juergen Gross
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2015-11-27 17:16 UTC (permalink / raw)
  To: Juergen Gross, Wei Liu
  Cc: stefano.stabellini, ian.jackson, Ian.Campbell, xen-devel

On 27/11/15 15:53, Juergen Gross wrote:
> On 27/11/15 16:33, Wei Liu wrote:
>> On Fri, Nov 27, 2015 at 03:50:53PM +0100, Juergen Gross wrote:
>>> For migration the last used pfn of a guest is needed to size the
>>> logdirty bitmap and as an upper bound of the page loop. Unfortunately
>>> there are pv-kernels advertising a much higher maximum pfn as they
>>> are really using in order to support memory hotplug. This will lead
>>> to allocation of much more memory in Xen tools during migration as
>>> really needed.
>>>
>>> Try to find the last used guest pfn of a pv-domu by scanning the p2m
>>> tree from the last entry towards it's start and search for an entry
>>> not being invalid.
>>>
>>> Normally the mid pages of the p2m tree containing all invalid entries
>>> are being reused, so we can just scan the top page for identical
>>> entries and skip them but the first one.
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> ---
>>>  tools/libxc/xc_sr_save.c        |  8 ++++----
>>>  tools/libxc/xc_sr_save_x86_pv.c | 34 +++++++++++++++++++++++++++++++---
>>>  2 files changed, 35 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
>>> index 0c12e56..22b3f18 100644
>>> --- a/tools/libxc/xc_sr_save.c
>>> +++ b/tools/libxc/xc_sr_save.c
>>> @@ -677,6 +677,10 @@ static int setup(struct xc_sr_context *ctx)
>>>      DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>>>                                      &ctx->save.dirty_bitmap_hbuf);
>>>  
>>> +    rc = ctx->save.ops.setup(ctx);
>>> +    if ( rc )
>>> +        goto err;
>>> +
>>>      dirty_bitmap = xc_hypercall_buffer_alloc_pages(
>>>                     xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
>>>      ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
>>> @@ -692,10 +696,6 @@ static int setup(struct xc_sr_context *ctx)
>>>          goto err;
>>>      }
>>>  
>> p2m_size is set in two places for PV guest.  Since we are now relying on
>> ops.setup to set p2m_size, the invocation to xc_domain_nr_gpfns in
>> xc_domain_save should be removed, so that we only have one place to set
>> p2m_size.
> I wasn't sure this is needed in save case only. If it is I can make
> the change, of course. Andrew?
>
>> You then also need to call xc_domain_nr_gpfns in HVM's ops.setup
>> (x86_hvm_setup).
> Sure.

That is most likely a byproduct of this codes somewhat-organic growth
over 18 months.

The check in xc_domain_save() is a check left over from legacy
migration.  It was present in legacy migration as legacy merged the page
type into the upper 4 bits of the unsigned long pfn, meaning that a pfn
of (2^28)-1 was the INVALID_PFN representation in the 32bit stream.

The check is less important for migration v2, but I left it in as an
upper sanity check.

I am not aversed to removing the check if we are no longer going to
believe the result of XENMEM_maximum_gpfn, and requiring that setup()
get and sanity check the domain size.

(Incidently, it seems crazy that we make a XENMEM_maximum_gpfn hypercall
to read a value from the shared info page, which is also (about to be)
mapped locally.)

~Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] libxc: try to find last used pfn when migrating
  2015-11-27 17:16     ` Andrew Cooper
@ 2015-11-30  8:17       ` Juergen Gross
  2015-11-30  9:47         ` Andrew Cooper
  0 siblings, 1 reply; 15+ messages in thread
From: Juergen Gross @ 2015-11-30  8:17 UTC (permalink / raw)
  To: Andrew Cooper, Wei Liu
  Cc: stefano.stabellini, ian.jackson, Ian.Campbell, xen-devel

On 27/11/15 18:16, Andrew Cooper wrote:
> On 27/11/15 15:53, Juergen Gross wrote:
>> On 27/11/15 16:33, Wei Liu wrote:
>>> On Fri, Nov 27, 2015 at 03:50:53PM +0100, Juergen Gross wrote:
>>>> For migration the last used pfn of a guest is needed to size the
>>>> logdirty bitmap and as an upper bound of the page loop. Unfortunately
>>>> there are pv-kernels advertising a much higher maximum pfn as they
>>>> are really using in order to support memory hotplug. This will lead
>>>> to allocation of much more memory in Xen tools during migration as
>>>> really needed.
>>>>
>>>> Try to find the last used guest pfn of a pv-domu by scanning the p2m
>>>> tree from the last entry towards it's start and search for an entry
>>>> not being invalid.
>>>>
>>>> Normally the mid pages of the p2m tree containing all invalid entries
>>>> are being reused, so we can just scan the top page for identical
>>>> entries and skip them but the first one.
>>>>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>> ---
>>>>  tools/libxc/xc_sr_save.c        |  8 ++++----
>>>>  tools/libxc/xc_sr_save_x86_pv.c | 34 +++++++++++++++++++++++++++++++---
>>>>  2 files changed, 35 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
>>>> index 0c12e56..22b3f18 100644
>>>> --- a/tools/libxc/xc_sr_save.c
>>>> +++ b/tools/libxc/xc_sr_save.c
>>>> @@ -677,6 +677,10 @@ static int setup(struct xc_sr_context *ctx)
>>>>      DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>>>>                                      &ctx->save.dirty_bitmap_hbuf);
>>>>  
>>>> +    rc = ctx->save.ops.setup(ctx);
>>>> +    if ( rc )
>>>> +        goto err;
>>>> +
>>>>      dirty_bitmap = xc_hypercall_buffer_alloc_pages(
>>>>                     xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
>>>>      ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
>>>> @@ -692,10 +696,6 @@ static int setup(struct xc_sr_context *ctx)
>>>>          goto err;
>>>>      }
>>>>  
>>> p2m_size is set in two places for PV guest.  Since we are now relying on
>>> ops.setup to set p2m_size, the invocation to xc_domain_nr_gpfns in
>>> xc_domain_save should be removed, so that we only have one place to set
>>> p2m_size.
>> I wasn't sure this is needed in save case only. If it is I can make
>> the change, of course. Andrew?

I got this one wrong, sorry. I was relating to the xc_domain_nr_gpfns()
call in x86_pv_domain_info(). And here the question really applies:

Do we need this call at all? I don't think it is needed for the
restore operation. At least in my tests the upper pfn bound was always
taken from the data stream, not from xc_domain_nr_gpfns().

> That is most likely a byproduct of this codes somewhat-organic growth
> over 18 months.
> 
> The check in xc_domain_save() is a check left over from legacy
> migration.  It was present in legacy migration as legacy merged the page
> type into the upper 4 bits of the unsigned long pfn, meaning that a pfn
> of (2^28)-1 was the INVALID_PFN representation in the 32bit stream.
> 
> The check is less important for migration v2, but I left it in as an
> upper sanity check.
> 
> I am not aversed to removing the check if we are no longer going to
> believe the result of XENMEM_maximum_gpfn, and requiring that setup()
> get and sanity check the domain size.

As I'm planning to support migration of larger domains in the future
I think the check has to go away. Why not now.

> (Incidently, it seems crazy that we make a XENMEM_maximum_gpfn hypercall
> to read a value from the shared info page, which is also (about to be)
> mapped locally.)

Indeed. It should be necessary for hvm guests only.


Juergen

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] libxc: try to find last used pfn when migrating
  2015-11-30  8:17       ` Juergen Gross
@ 2015-11-30  9:47         ` Andrew Cooper
  2015-11-30 12:16           ` Juergen Gross
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2015-11-30  9:47 UTC (permalink / raw)
  To: Juergen Gross, Wei Liu
  Cc: stefano.stabellini, ian.jackson, Ian.Campbell, xen-devel

On 30/11/15 08:17, Juergen Gross wrote:
> On 27/11/15 18:16, Andrew Cooper wrote:
>> On 27/11/15 15:53, Juergen Gross wrote:
>>> On 27/11/15 16:33, Wei Liu wrote:
>>>> On Fri, Nov 27, 2015 at 03:50:53PM +0100, Juergen Gross wrote:
>>>>> For migration the last used pfn of a guest is needed to size the
>>>>> logdirty bitmap and as an upper bound of the page loop. Unfortunately
>>>>> there are pv-kernels advertising a much higher maximum pfn as they
>>>>> are really using in order to support memory hotplug. This will lead
>>>>> to allocation of much more memory in Xen tools during migration as
>>>>> really needed.
>>>>>
>>>>> Try to find the last used guest pfn of a pv-domu by scanning the p2m
>>>>> tree from the last entry towards it's start and search for an entry
>>>>> not being invalid.
>>>>>
>>>>> Normally the mid pages of the p2m tree containing all invalid entries
>>>>> are being reused, so we can just scan the top page for identical
>>>>> entries and skip them but the first one.
>>>>>
>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>> ---
>>>>>  tools/libxc/xc_sr_save.c        |  8 ++++----
>>>>>  tools/libxc/xc_sr_save_x86_pv.c | 34 +++++++++++++++++++++++++++++++---
>>>>>  2 files changed, 35 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
>>>>> index 0c12e56..22b3f18 100644
>>>>> --- a/tools/libxc/xc_sr_save.c
>>>>> +++ b/tools/libxc/xc_sr_save.c
>>>>> @@ -677,6 +677,10 @@ static int setup(struct xc_sr_context *ctx)
>>>>>      DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>>>>>                                      &ctx->save.dirty_bitmap_hbuf);
>>>>>  
>>>>> +    rc = ctx->save.ops.setup(ctx);
>>>>> +    if ( rc )
>>>>> +        goto err;
>>>>> +
>>>>>      dirty_bitmap = xc_hypercall_buffer_alloc_pages(
>>>>>                     xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
>>>>>      ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
>>>>> @@ -692,10 +696,6 @@ static int setup(struct xc_sr_context *ctx)
>>>>>          goto err;
>>>>>      }
>>>>>  
>>>> p2m_size is set in two places for PV guest.  Since we are now relying on
>>>> ops.setup to set p2m_size, the invocation to xc_domain_nr_gpfns in
>>>> xc_domain_save should be removed, so that we only have one place to set
>>>> p2m_size.
>>> I wasn't sure this is needed in save case only. If it is I can make
>>> the change, of course. Andrew?
> I got this one wrong, sorry. I was relating to the xc_domain_nr_gpfns()
> call in x86_pv_domain_info(). And here the question really applies:
>
> Do we need this call at all? I don't think it is needed for the
> restore operation. At least in my tests the upper pfn bound was always
> taken from the data stream, not from xc_domain_nr_gpfns().

x86_pv_domain_info() was the result of attempting to consolidate all the
random checks for PV guests, for both the save and restore sides,
although most of the callsites subsequently vanished.

The function as a whole gets used several times on the restore side, and
once of save side.  Unilaterally rewriting ctx->x86_pv.max_pfn is not
the best option, so would probably be best to disband
x86_pv_domain_info() altogether and hoist the width/p2m checks into the
relevant callers.

>
>> That is most likely a byproduct of this codes somewhat-organic growth
>> over 18 months.
>>
>> The check in xc_domain_save() is a check left over from legacy
>> migration.  It was present in legacy migration as legacy merged the page
>> type into the upper 4 bits of the unsigned long pfn, meaning that a pfn
>> of (2^28)-1 was the INVALID_PFN representation in the 32bit stream.
>>
>> The check is less important for migration v2, but I left it in as an
>> upper sanity check.
>>
>> I am not aversed to removing the check if we are no longer going to
>> believe the result of XENMEM_maximum_gpfn, and requiring that setup()
>> get and sanity check the domain size.
> As I'm planning to support migration of larger domains in the future
> I think the check has to go away. Why not now.

Ideally, there should still be a sanity check.  Ian's idea of the caller
passing the expected size seems like a good candidate.

>
>> (Incidently, it seems crazy that we make a XENMEM_maximum_gpfn hypercall
>> to read a value from the shared info page, which is also (about to be)
>> mapped locally.)
> Indeed. It should be necessary for hvm guests only.

I agree.

~Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] libxc: try to find last used pfn when migrating
  2015-11-30  9:47         ` Andrew Cooper
@ 2015-11-30 12:16           ` Juergen Gross
  0 siblings, 0 replies; 15+ messages in thread
From: Juergen Gross @ 2015-11-30 12:16 UTC (permalink / raw)
  To: Andrew Cooper, Wei Liu
  Cc: stefano.stabellini, ian.jackson, Ian.Campbell, xen-devel

On 30/11/15 10:47, Andrew Cooper wrote:
> On 30/11/15 08:17, Juergen Gross wrote:
>> On 27/11/15 18:16, Andrew Cooper wrote:
>>> On 27/11/15 15:53, Juergen Gross wrote:
>>>> On 27/11/15 16:33, Wei Liu wrote:
>>>>> On Fri, Nov 27, 2015 at 03:50:53PM +0100, Juergen Gross wrote:
>>>>>> For migration the last used pfn of a guest is needed to size the
>>>>>> logdirty bitmap and as an upper bound of the page loop. Unfortunately
>>>>>> there are pv-kernels advertising a much higher maximum pfn as they
>>>>>> are really using in order to support memory hotplug. This will lead
>>>>>> to allocation of much more memory in Xen tools during migration as
>>>>>> really needed.
>>>>>>
>>>>>> Try to find the last used guest pfn of a pv-domu by scanning the p2m
>>>>>> tree from the last entry towards it's start and search for an entry
>>>>>> not being invalid.
>>>>>>
>>>>>> Normally the mid pages of the p2m tree containing all invalid entries
>>>>>> are being reused, so we can just scan the top page for identical
>>>>>> entries and skip them but the first one.
>>>>>>
>>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>>> ---
>>>>>>  tools/libxc/xc_sr_save.c        |  8 ++++----
>>>>>>  tools/libxc/xc_sr_save_x86_pv.c | 34 +++++++++++++++++++++++++++++++---
>>>>>>  2 files changed, 35 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
>>>>>> index 0c12e56..22b3f18 100644
>>>>>> --- a/tools/libxc/xc_sr_save.c
>>>>>> +++ b/tools/libxc/xc_sr_save.c
>>>>>> @@ -677,6 +677,10 @@ static int setup(struct xc_sr_context *ctx)
>>>>>>      DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
>>>>>>                                      &ctx->save.dirty_bitmap_hbuf);
>>>>>>  
>>>>>> +    rc = ctx->save.ops.setup(ctx);
>>>>>> +    if ( rc )
>>>>>> +        goto err;
>>>>>> +
>>>>>>      dirty_bitmap = xc_hypercall_buffer_alloc_pages(
>>>>>>                     xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
>>>>>>      ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
>>>>>> @@ -692,10 +696,6 @@ static int setup(struct xc_sr_context *ctx)
>>>>>>          goto err;
>>>>>>      }
>>>>>>  
>>>>> p2m_size is set in two places for PV guest.  Since we are now relying on
>>>>> ops.setup to set p2m_size, the invocation to xc_domain_nr_gpfns in
>>>>> xc_domain_save should be removed, so that we only have one place to set
>>>>> p2m_size.
>>>> I wasn't sure this is needed in save case only. If it is I can make
>>>> the change, of course. Andrew?
>> I got this one wrong, sorry. I was relating to the xc_domain_nr_gpfns()
>> call in x86_pv_domain_info(). And here the question really applies:
>>
>> Do we need this call at all? I don't think it is needed for the
>> restore operation. At least in my tests the upper pfn bound was always
>> taken from the data stream, not from xc_domain_nr_gpfns().
> 
> x86_pv_domain_info() was the result of attempting to consolidate all the
> random checks for PV guests, for both the save and restore sides,
> although most of the callsites subsequently vanished.
> 
> The function as a whole gets used several times on the restore side, and
> once of save side.  Unilaterally rewriting ctx->x86_pv.max_pfn is not
> the best option, so would probably be best to disband
> x86_pv_domain_info() altogether and hoist the width/p2m checks into the
> relevant callers.

Hmm, this would mean to have the remaining code of x86_pv_domain_info()
two or three times. I think it's better to keep it without the call of
xc_domain_nr_gpfns().

> 
>>
>>> That is most likely a byproduct of this codes somewhat-organic growth
>>> over 18 months.
>>>
>>> The check in xc_domain_save() is a check left over from legacy
>>> migration.  It was present in legacy migration as legacy merged the page
>>> type into the upper 4 bits of the unsigned long pfn, meaning that a pfn
>>> of (2^28)-1 was the INVALID_PFN representation in the 32bit stream.
>>>
>>> The check is less important for migration v2, but I left it in as an
>>> upper sanity check.
>>>
>>> I am not aversed to removing the check if we are no longer going to
>>> believe the result of XENMEM_maximum_gpfn, and requiring that setup()
>>> get and sanity check the domain size.
>> As I'm planning to support migration of larger domains in the future
>> I think the check has to go away. Why not now.
> 
> Ideally, there should still be a sanity check.  Ian's idea of the caller
> passing the expected size seems like a good candidate.

But which size is expected? I think it would make more sense to limit
the needed dom0 memory to some fraction of the guest size. We could use
a default which might be configurable globally or per domain.


Juergen

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-11-30 12:16 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-11-27 14:50 [PATCH] libxc: try to find last used pfn when migrating Juergen Gross
2015-11-27 15:01 ` David Vrabel
2015-11-27 15:11   ` Juergen Gross
2015-11-27 15:18     ` Ian Campbell
2015-11-27 15:20     ` David Vrabel
2015-11-27 15:22     ` Ian Jackson
2015-11-27 15:50       ` Juergen Gross
2015-11-27 15:33 ` Wei Liu
2015-11-27 15:53   ` Juergen Gross
2015-11-27 17:16     ` Andrew Cooper
2015-11-30  8:17       ` Juergen Gross
2015-11-30  9:47         ` Andrew Cooper
2015-11-30 12:16           ` Juergen Gross
2015-11-27 16:42 ` Andrew Cooper
2015-11-27 16:51   ` Juergen Gross

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.