* [PATCH for 4.6 v3 1/5] libxc: clearer migration v2 debug message
2015-09-06 20:05 [PATCH for 4.6 v3 0/5] Migration v2 fix Wei Liu
@ 2015-09-06 20:05 ` Wei Liu
2015-09-06 20:05 ` [PATCH for 4.6 v3 2/5] libxc: migration v2 prefix Memory -> Frames Wei Liu
` (5 subsequent siblings)
6 siblings, 0 replies; 20+ messages in thread
From: Wei Liu @ 2015-09-06 20:05 UTC (permalink / raw)
To: Xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell
Previously the messages looked like this:
SAVE:
xc: detail: 32 bits, 3 levels
xc: detail: max_pfn 0xfffff, p2m_frames 1024
xc: detail: max_mfn 0x130000
RESTORE:
xc: detail: max_mfn 0x130000
xc: detail: 32 bits, 3 levels
xc: detail: Expanded p2m from 0 to 0xfffff
It's not immediately clear that the last line of the restore messages
refers to max_pfn. Change the debug message slightly to make that
clearer.
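For comparison, the corresponding restore line after this change would
read something like the following (illustrative only, derived from the
new DPRINTF in the diff below):
xc: detail: Changed max_pfn from 0 to 0xfffff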
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
tools/libxc/xc_sr_restore_x86_pv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/libxc/xc_sr_restore_x86_pv.c b/tools/libxc/xc_sr_restore_x86_pv.c
index ef26c64..c65a2f1 100644
--- a/tools/libxc/xc_sr_restore_x86_pv.c
+++ b/tools/libxc/xc_sr_restore_x86_pv.c
@@ -66,7 +66,7 @@ static int expand_p2m(struct xc_sr_context *ctx, unsigned long max_pfn)
for ( i = (old_end_frame ? old_end_frame + 1 : 0); i <= end_frame; ++i )
ctx->x86_pv.p2m_pfns[i] = INVALID_MFN;
- DPRINTF("Expanded p2m from %#lx to %#lx", old_max, max_pfn);
+ DPRINTF("Changed max_pfn from %#lx to %#lx", old_max, max_pfn);
return 0;
}
--
2.1.4
* [PATCH for 4.6 v3 2/5] libxc: migration v2 prefix Memory -> Frames
2015-09-06 20:05 [PATCH for 4.6 v3 0/5] Migration v2 fix Wei Liu
2015-09-06 20:05 ` [PATCH for 4.6 v3 1/5] libxc: clearer migration v2 debug message Wei Liu
@ 2015-09-06 20:05 ` Wei Liu
2015-09-06 20:10 ` Wei Liu
2015-09-07 9:07 ` Andrew Cooper
2015-09-06 20:05 ` [PATCH for 4.6 v3 3/5] libxc: fix indentation Wei Liu
` (4 subsequent siblings)
6 siblings, 2 replies; 20+ messages in thread
From: Wei Liu @ 2015-09-06 20:05 UTC (permalink / raw)
To: Xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell
The prefix "Memory" is confusing because the numbers shown after that
are referring to frames.
Change a bunch of prefixes from "Memory" to "Frames". Also rename
send_memory_verify to verify_frames.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
v3:
1. Change all relevant instances of "Memory".
2. Rename function.
---
tools/libxc/xc_sr_save.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 58667af..7dc3a48 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -451,7 +451,7 @@ static int update_progress_string(struct xc_sr_context *ctx,
xc_interface *xch = ctx->xch;
char *new_str = NULL;
- if ( asprintf(&new_str, "Memory iteration %u of %u",
+ if ( asprintf(&new_str, "Frames iteration %u of %u",
iter, ctx->save.max_iterations) == -1 )
{
PERROR("Unable to allocate new progress string");
@@ -569,7 +569,7 @@ static int suspend_and_send_dirty(struct xc_sr_context *ctx)
return rc;
}
-static int send_memory_verify(struct xc_sr_context *ctx)
+static int verify_frames(struct xc_sr_context *ctx)
{
xc_interface *xch = ctx->xch;
xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
@@ -586,7 +586,7 @@ static int send_memory_verify(struct xc_sr_context *ctx)
if ( rc )
goto out;
- xc_set_progress_prefix(xch, "Memory verify");
+ xc_set_progress_prefix(xch, "Frames verify");
rc = send_all_pages(ctx);
if ( rc )
goto out;
@@ -629,7 +629,7 @@ static int send_domain_memory_live(struct xc_sr_context *ctx)
if ( ctx->save.debug && !ctx->save.checkpointed )
{
- rc = send_memory_verify(ctx);
+ rc = verify_frames(ctx);
if ( rc )
goto out;
}
@@ -659,7 +659,7 @@ static int send_domain_memory_nonlive(struct xc_sr_context *ctx)
if ( rc )
goto err;
- xc_set_progress_prefix(xch, "Memory");
+ xc_set_progress_prefix(xch, "Frames");
rc = send_all_pages(ctx);
if ( rc )
--
2.1.4
* [PATCH for 4.6 v3 3/5] libxc: fix indentation
2015-09-06 20:05 [PATCH for 4.6 v3 0/5] Migration v2 fix Wei Liu
2015-09-06 20:05 ` [PATCH for 4.6 v3 1/5] libxc: clearer migration v2 debug message Wei Liu
2015-09-06 20:05 ` [PATCH for 4.6 v3 2/5] libxc: migration v2 prefix Memory -> Frames Wei Liu
@ 2015-09-06 20:05 ` Wei Liu
2015-09-06 20:05 ` [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns Wei Liu
` (3 subsequent siblings)
6 siblings, 0 replies; 20+ messages in thread
From: Wei Liu @ 2015-09-06 20:05 UTC (permalink / raw)
To: Xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
tools/libxc/xc_sr_restore_x86_pv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/libxc/xc_sr_restore_x86_pv.c b/tools/libxc/xc_sr_restore_x86_pv.c
index c65a2f1..bc604b3 100644
--- a/tools/libxc/xc_sr_restore_x86_pv.c
+++ b/tools/libxc/xc_sr_restore_x86_pv.c
@@ -970,7 +970,7 @@ static int x86_pv_localise_page(struct xc_sr_context *ctx,
}
if ( to_populate && populate_pfns(ctx, to_populate, pfns, NULL) )
- return -1;
+ return -1;
for ( i = 0; i < (PAGE_SIZE / sizeof(uint64_t)); ++i )
{
--
2.1.4
* [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns
2015-09-06 20:05 [PATCH for 4.6 v3 0/5] Migration v2 fix Wei Liu
` (2 preceding siblings ...)
2015-09-06 20:05 ` [PATCH for 4.6 v3 3/5] libxc: fix indentation Wei Liu
@ 2015-09-06 20:05 ` Wei Liu
2015-09-07 7:18 ` Jan Beulich
2015-09-06 20:05 ` [PATCH for 4.6 v3 5/5] libxc: add assertion to avoid setting same bit more than once Wei Liu
` (2 subsequent siblings)
6 siblings, 1 reply; 20+ messages in thread
From: Wei Liu @ 2015-09-06 20:05 UTC (permalink / raw)
To: Xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell
The original implementation of populate_pfns didn't consider that the
same pfn can be present multiple times in the array. The mechanism to
prevent populating the same pfn multiple times only worked if the
recurring pfn appeared in different batches.
This bug was discovered by a Linux 4.1 32-bit kernel save / restore
test, which has several PTEs pointing to the same pfn, resulting in an
array containing a recurring pfn. When libxc called
x86_pv_localise_page, the original implementation would populate the
same pfn more than once.
The fix is to set the bit in the populated bitmap as we generate the
list of pfns to be populated.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
tools/libxc/xc_sr_restore.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index df885b6..924dd55 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -214,6 +214,9 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned count,
types[i] != XEN_DOMCTL_PFINFO_BROKEN))) &&
!pfn_is_populated(ctx, original_pfns[i]) )
{
+ rc = pfn_set_populated(ctx, original_pfns[i]);
+ if ( rc )
+ goto err;
pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
++nr_pfns;
}
@@ -238,9 +241,6 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned count,
goto err;
}
- rc = pfn_set_populated(ctx, pfns[i]);
- if ( rc )
- goto err;
ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]);
}
}
--
2.1.4
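As a minimal sketch of the resulting flow (not the verbatim libxc code:
queue_unpopulated_pfns is a hypothetical helper used purely for
illustration, while pfn_is_populated/pfn_set_populated and the types are
the names visible in the diff above):

/* Mark a pfn populated at the moment it is queued, so a pfn recurring
 * later in the same batch is filtered out by pfn_is_populated() and
 * handed to the populate hypercall only once. */
static int queue_unpopulated_pfns(struct xc_sr_context *ctx, unsigned count,
                                  const xen_pfn_t *in, xen_pfn_t *out,
                                  unsigned *nr_out)
{
    unsigned i;
    int rc;

    for ( i = 0; i < count; ++i )
    {
        /* Already populated, or queued earlier in this very batch. */
        if ( pfn_is_populated(ctx, in[i]) )
            continue;

        /* Set the bit while building the list, not after the hypercall. */
        rc = pfn_set_populated(ctx, in[i]);
        if ( rc )
            return rc;

        out[(*nr_out)++] = in[i];
    }

    return 0;
}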
* Re: [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns
2015-09-06 20:05 ` [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns Wei Liu
@ 2015-09-07 7:18 ` Jan Beulich
2015-09-07 9:36 ` Wei Liu
0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2015-09-07 7:18 UTC (permalink / raw)
To: Wei Liu; +Cc: Xen-devel, Ian Jackson, Ian Campbell
>>> On 06.09.15 at 22:05, <wei.liu2@citrix.com> wrote:
> The original implementation of populate_pfns didn't consider the same
> pfn can be present multiple times in the array. The mechanism to prevent
> populating the same pfn multiple times only worked if the recurring pfn
> appeared in different batches.
>
> This bug is discovered by Linux 4.1 32 bit kernel save / restore test,
> which has several ptes pointing to same pfn, which results in an array
> containing recurring pfn.
Since you must have debugged this, and since the bisector appears
to have fingered a patch of mine on the Linux side which triggered
this, would you mind explaining this a little more? In particular I'm
worried that this may point out some other bug in Linux, as in the
context of the change there - dealing with the 1:1 mapping - I can't
see a legitimate reason for multiple PTEs to reference the same PFN.
Thanks, Jan
* Re: [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns
2015-09-07 7:18 ` Jan Beulich
@ 2015-09-07 9:36 ` Wei Liu
2015-09-07 9:53 ` Jan Beulich
2015-09-07 9:59 ` David Vrabel
0 siblings, 2 replies; 20+ messages in thread
From: Wei Liu @ 2015-09-07 9:36 UTC (permalink / raw)
To: Jan Beulich; +Cc: Ian Jackson, Wei Liu, Ian Campbell, Xen-devel
On Mon, Sep 07, 2015 at 01:18:44AM -0600, Jan Beulich wrote:
> >>> On 06.09.15 at 22:05, <wei.liu2@citrix.com> wrote:
> > The original implementation of populate_pfns didn't consider the same
> > pfn can be present multiple times in the array. The mechanism to prevent
> > populating the same pfn multiple times only worked if the recurring pfn
> > appeared in different batches.
> >
> > This bug is discovered by Linux 4.1 32 bit kernel save / restore test,
> > which has several ptes pointing to same pfn, which results in an array
> > containing recurring pfn.
>
> Since you must have debugged this, and since the bisector appears
> to have fingered a patch of mine on the Linux side which triggered
> this, would you mind explaining this a little more? In particular I'm
> worried that this may point out some other bug in Linux, as in the
> context of the change there - dealing with the 1:1 mapping - I can't
> see a legitimate reason for multiple PTEs to reference the same PFN.
>
Sure. I can try to explain this as clearly as possible. Note that I didn't
even look at the Linux side changes because at that point I was sure there
was a bug in migration v2.
So there is a step called normalise_page in migration v2. It's a nop for
HVM guests. For PV guests, it only cares about page table frames. To
normalise a page table frame, the core idea is to replace all MFNs in the
page tables with PFNs inside the guest.
When restoring, there is a step called localise_page, which again is a
nop for HVM guests. For PV guests, it does the reverse of normalise_page:
it goes through all page table frames, extracts all PFNs pointed to by
PTEs in those frames, populates them, then reconstructs the page tables.
What I discovered is that PTEs inside one page table frame contained the
same PFN (something like fd42). The original implementation of the
toolstack's populate_pfns didn't consider such a scenario. As for what
that PFN referred to, I wasn't sure and I didn't really care about it.
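As a rough, self-contained illustration of that round trip (a sketch
only: the mask, the stand-in type and the helper names are invented for
this explanation and are not the actual xc_sr_save_x86_pv /
xc_sr_restore_x86_pv code):

#include <stdint.h>

typedef uint64_t frame_t;                      /* stand-in for xen_pfn_t */
#define PTE_FRAME_MASK 0x000ffffffffff000ULL   /* frame bits of a PTE (assumed) */
#define PTE_PRESENT    0x1ULL

/* Save side ("normalise"): rewrite each PTE so it carries a PFN. */
static void normalise_frame_sketch(uint64_t *ptes, unsigned nr,
                                   const frame_t *m2p)
{
    for ( unsigned i = 0; i < nr; ++i )
        if ( ptes[i] & PTE_PRESENT )
        {
            frame_t mfn = (ptes[i] & PTE_FRAME_MASK) >> 12;
            ptes[i] = (ptes[i] & ~PTE_FRAME_MASK) |
                      (((uint64_t)m2p[mfn] << 12) & PTE_FRAME_MASK);
        }
}

/* Restore side ("localise"): collect the PFNs referenced by one frame
 * before populating them. The same PFN may occur in many PTEs of the
 * frame, which is exactly the case populate_pfns has to tolerate. */
static unsigned collect_frame_pfns_sketch(const uint64_t *ptes, unsigned nr,
                                          frame_t *pfns)
{
    unsigned n = 0;

    for ( unsigned i = 0; i < nr; ++i )
        if ( ptes[i] & PTE_PRESENT )
            pfns[n++] = (ptes[i] & PTE_FRAME_MASK) >> 12;  /* duplicates allowed */

    return n;
}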
Let me know if you need more information.
Wei.
> Thanks, Jan
* Re: [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns
2015-09-07 9:36 ` Wei Liu
@ 2015-09-07 9:53 ` Jan Beulich
2015-09-07 9:57 ` Wei Liu
2015-09-07 10:08 ` Juergen Gross
2015-09-07 9:59 ` David Vrabel
1 sibling, 2 replies; 20+ messages in thread
From: Jan Beulich @ 2015-09-07 9:53 UTC (permalink / raw)
To: Wei Liu; +Cc: Xen-devel, Ian Jackson, Ian Campbell
>>> On 07.09.15 at 11:36, <wei.liu2@citrix.com> wrote:
> On Mon, Sep 07, 2015 at 01:18:44AM -0600, Jan Beulich wrote:
>> >>> On 06.09.15 at 22:05, <wei.liu2@citrix.com> wrote:
>> > The original implementation of populate_pfns didn't consider the same
>> > pfn can be present multiple times in the array. The mechanism to prevent
>> > populating the same pfn multiple times only worked if the recurring pfn
>> > appeared in different batches.
>> >
>> > This bug is discovered by Linux 4.1 32 bit kernel save / restore test,
>> > which has several ptes pointing to same pfn, which results in an array
>> > containing recurring pfn.
>>
>> Since you must have debugged this, and since the bisector appears
>> to have fingered a patch of mine on the Linux side which triggered
>> this, would you mind explaining this a little more? In particular I'm
>> worried that this may point out some other bug in Linux, as in the
>> context of the change there - dealing with the 1:1 mapping - I can't
>> see a legitimate reason for multiple PTEs to reference the same PFN.
>>
>
> Sure. I can try to explain this as clear as possible. Note that I didn't
> even look at Linux side changes because at that point I was sure there
> was a bug in migration v2.
>
> So there is a step called normalise_page in migration v2. It's nop for
> HVM guest. For PV guest, it only cares about page table frames. To
> normalise a page table frame, the core idea is to replace all MFNs in
> page tables to PFNs inside the guest.
>
> When restoring, there is a step called localise_page, which again is a
> nop for HVM guest. For PV guest, it does the reverse of normalise_page.
> It goes through all page table frames, extract all PFNs pointed to by
> PTEs in such frames, populate them, then reconstruct page tables.
>
> What I discovered is that PTEs inside one page table frame contained the
> same PFN (something like fd42). The original implementation of toolstack
> populate_pfns didn't consider such scenario. As for what that PFN
> referred to, I wasn't sure and I didn't really care about that.
That's unfortunate, as that's precisely the information I was after,
since - as said - taking the repetition of the same PFN together with
what the triggering Linux change is about, it smells like there's
something wrong on the Linux side too. Do you at least recall how
many times that same PFN got repeated?
Jan
* Re: [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns
2015-09-07 9:53 ` Jan Beulich
@ 2015-09-07 9:57 ` Wei Liu
2015-09-07 10:08 ` Juergen Gross
1 sibling, 0 replies; 20+ messages in thread
From: Wei Liu @ 2015-09-07 9:57 UTC (permalink / raw)
To: Jan Beulich; +Cc: Ian Jackson, Wei Liu, Ian Campbell, Xen-devel
On Mon, Sep 07, 2015 at 03:53:57AM -0600, Jan Beulich wrote:
> >>> On 07.09.15 at 11:36, <wei.liu2@citrix.com> wrote:
> > On Mon, Sep 07, 2015 at 01:18:44AM -0600, Jan Beulich wrote:
> >> >>> On 06.09.15 at 22:05, <wei.liu2@citrix.com> wrote:
> >> > The original implementation of populate_pfns didn't consider the same
> >> > pfn can be present multiple times in the array. The mechanism to prevent
> >> > populating the same pfn multiple times only worked if the recurring pfn
> >> > appeared in different batches.
> >> >
> >> > This bug is discovered by Linux 4.1 32 bit kernel save / restore test,
> >> > which has several ptes pointing to same pfn, which results in an array
> >> > containing recurring pfn.
> >>
> >> Since you must have debugged this, and since the bisector appears
> >> to have fingered a patch of mine on the Linux side which triggered
> >> this, would you mind explaining this a little more? In particular I'm
> >> worried that this may point out some other bug in Linux, as in the
> >> context of the change there - dealing with the 1:1 mapping - I can't
> >> see a legitimate reason for multiple PTEs to reference the same PFN.
> >>
> >
> > Sure. I can try to explain this as clear as possible. Note that I didn't
> > even look at Linux side changes because at that point I was sure there
> > was a bug in migration v2.
> >
> > So there is a step called normalise_page in migration v2. It's nop for
> > HVM guest. For PV guest, it only cares about page table frames. To
> > normalise a page table frame, the core idea is to replace all MFNs in
> > page tables to PFNs inside the guest.
> >
> > When restoring, there is a step called localise_page, which again is a
> > nop for HVM guest. For PV guest, it does the reverse of normalise_page.
> > It goes through all page table frames, extract all PFNs pointed to by
> > PTEs in such frames, populate them, then reconstruct page tables.
> >
> > What I discovered is that PTEs inside one page table frame contained the
> > same PFN (something like fd42). The original implementation of toolstack
> > populate_pfns didn't consider such scenario. As for what that PFN
> > referred to, I wasn't sure and I didn't really care about that.
>
> That's unfortunate, as that's precisely the information I was after,
> since - as said - taking the repetition of the same PFN together with
> what the triggering Linux change is about, it smells like there's
> something wrong on the Linux side too. Do you at least recall how
> many times that same PFN got repeated?
>
Thousands of times.
Wei.
> Jan
* Re: [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns
2015-09-07 9:53 ` Jan Beulich
2015-09-07 9:57 ` Wei Liu
@ 2015-09-07 10:08 ` Juergen Gross
1 sibling, 0 replies; 20+ messages in thread
From: Juergen Gross @ 2015-09-07 10:08 UTC (permalink / raw)
To: Jan Beulich, Wei Liu; +Cc: Xen-devel, Ian Jackson, Ian Campbell
On 09/07/2015 11:53 AM, Jan Beulich wrote:
>>>> On 07.09.15 at 11:36, <wei.liu2@citrix.com> wrote:
>> On Mon, Sep 07, 2015 at 01:18:44AM -0600, Jan Beulich wrote:
>>>>>> On 06.09.15 at 22:05, <wei.liu2@citrix.com> wrote:
>>>> The original implementation of populate_pfns didn't consider the same
>>>> pfn can be present multiple times in the array. The mechanism to prevent
>>>> populating the same pfn multiple times only worked if the recurring pfn
>>>> appeared in different batches.
>>>>
>>>> This bug is discovered by Linux 4.1 32 bit kernel save / restore test,
>>>> which has several ptes pointing to same pfn, which results in an array
>>>> containing recurring pfn.
>>>
>>> Since you must have debugged this, and since the bisector appears
>>> to have fingered a patch of mine on the Linux side which triggered
>>> this, would you mind explaining this a little more? In particular I'm
>>> worried that this may point out some other bug in Linux, as in the
>>> context of the change there - dealing with the 1:1 mapping - I can't
>>> see a legitimate reason for multiple PTEs to reference the same PFN.
>>>
>>
>> Sure. I can try to explain this as clear as possible. Note that I didn't
>> even look at Linux side changes because at that point I was sure there
>> was a bug in migration v2.
>>
>> So there is a step called normalise_page in migration v2. It's nop for
>> HVM guest. For PV guest, it only cares about page table frames. To
>> normalise a page table frame, the core idea is to replace all MFNs in
>> page tables to PFNs inside the guest.
>>
>> When restoring, there is a step called localise_page, which again is a
>> nop for HVM guest. For PV guest, it does the reverse of normalise_page.
>> It goes through all page table frames, extract all PFNs pointed to by
>> PTEs in such frames, populate them, then reconstruct page tables.
>>
>> What I discovered is that PTEs inside one page table frame contained the
>> same PFN (something like fd42). The original implementation of toolstack
>> populate_pfns didn't consider such scenario. As for what that PFN
>> referred to, I wasn't sure and I didn't really care about that.
>
> That's unfortunate, as that's precisely the information I was after,
> since - as said - taking the repetition of the same PFN together with
> what the triggering Linux change is about, it smells like there's
> something wrong on the Linux side too. Do you at least recall how
> many times that same PFN got repeated?
The linear p2m list support introduced this behaviour. Instead of having
multiple copies of identical p2m pages (e.g. pages with all entries
being ~0UL), only one such page exists, and it is mapped multiple times
in the linear p2m list. This happens for large regions (2 MB aligned) of
either identity-mapped or invalid pfns.
In domUs we see such a scenario rather rarely, as it would require either
large memory holes or large identity regions. You might have introduced
the latter.
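To illustrate the sharing (purely a sketch, not the actual Linux p2m
code; the names and sizes are assumptions):

/* One page whose entries are all invalid (~0UL) is reused to back every
 * large (2 MB aligned) hole, so many backing slots of the linear p2m
 * list map the very same frame, and that frame's PFN then shows up
 * repeatedly in the guest's page tables. */
static unsigned long p2m_missing_sketch[1024];  /* assumed 1024 entries per page */

static void share_missing_page_sketch(unsigned long **p2m_list_pages,
                                      unsigned long first_slot,
                                      unsigned long nr_slots)
{
    unsigned long i;

    for ( i = 0; i < 1024; ++i )
        p2m_missing_sketch[i] = ~0UL;            /* invalid entry */

    for ( i = 0; i < nr_slots; ++i )
        p2m_list_pages[first_slot + i] = p2m_missing_sketch;  /* same page, many mappings */
}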
Juergen
* Re: [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns
2015-09-07 9:36 ` Wei Liu
2015-09-07 9:53 ` Jan Beulich
@ 2015-09-07 9:59 ` David Vrabel
2015-09-07 10:07 ` Jan Beulich
1 sibling, 1 reply; 20+ messages in thread
From: David Vrabel @ 2015-09-07 9:59 UTC (permalink / raw)
To: Wei Liu, Jan Beulich; +Cc: Xen-devel, Ian Jackson, Ian Campbell
On 07/09/15 10:36, Wei Liu wrote:
> On Mon, Sep 07, 2015 at 01:18:44AM -0600, Jan Beulich wrote:
>>>>> On 06.09.15 at 22:05, <wei.liu2@citrix.com> wrote:
>>> The original implementation of populate_pfns didn't consider the same
>>> pfn can be present multiple times in the array. The mechanism to prevent
>>> populating the same pfn multiple times only worked if the recurring pfn
>>> appeared in different batches.
>>>
>>> This bug is discovered by Linux 4.1 32 bit kernel save / restore test,
>>> which has several ptes pointing to same pfn, which results in an array
>>> containing recurring pfn.
>>
>> Since you must have debugged this, and since the bisector appears
>> to have fingered a patch of mine on the Linux side which triggered
>> this, would you mind explaining this a little more? In particular I'm
>> worried that this may point out some other bug in Linux, as in the
>> context of the change there - dealing with the 1:1 mapping - I can't
>> see a legitimate reason for multiple PTEs to reference the same PFN.
>>
>
> Sure. I can try to explain this as clear as possible. Note that I didn't
> even look at Linux side changes because at that point I was sure there
> was a bug in migration v2.
>
> So there is a step called normalise_page in migration v2. It's nop for
> HVM guest. For PV guest, it only cares about page table frames. To
> normalise a page table frame, the core idea is to replace all MFNs in
> page tables to PFNs inside the guest.
>
> When restoring, there is a step called localise_page, which again is a
> nop for HVM guest. For PV guest, it does the reverse of normalise_page.
> It goes through all page table frames, extract all PFNs pointed to by
> PTEs in such frames, populate them, then reconstruct page tables.
>
> What I discovered is that PTEs inside one page table frame contained the
> same PFN (something like fd42). The original implementation of toolstack
> populate_pfns didn't consider such scenario. As for what that PFN
> referred to, I wasn't sure and I didn't really care about that.
>
> Let me know if you need more information.
I am somewhat amazed that this worked at all, since finding multiple PTEs
with the same PFN in a batch of 1024 pages would be really common.
But this bug would be avoided if the page table pages were more often
located towards the end of RAM such that the PFNs were already populated
as part of the normal page transfer.
I suppose it is possible that the Linux commit found by the bisector
results in changes to where page table pages are allocated.
David
* Re: [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns
2015-09-07 9:59 ` David Vrabel
@ 2015-09-07 10:07 ` Jan Beulich
0 siblings, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2015-09-07 10:07 UTC (permalink / raw)
To: David Vrabel; +Cc: Ian Jackson, Wei Liu, Ian Campbell, Xen-devel
>>> On 07.09.15 at 11:59, <david.vrabel@citrix.com> wrote:
> But this bug would be avoided if the page table pages were more often
> located towards the end of RAM such that the PFNs were already populated
> as part of the normal page transfer.
Ah, good point.
> I suppose it is possible that the Linux commit found by the bisector
> results in changes to where page table pages are allocated.
It certainly does.
Jan
* [PATCH for 4.6 v3 5/5] libxc: add assertion to avoid setting same bit more than once
2015-09-06 20:05 [PATCH for 4.6 v3 0/5] Migration v2 fix Wei Liu
` (3 preceding siblings ...)
2015-09-06 20:05 ` [PATCH for 4.6 v3 4/5] libxc: don't populate same pfn more than once in populate_pfns Wei Liu
@ 2015-09-06 20:05 ` Wei Liu
2015-09-07 10:04 ` [PATCH for 4.6 v3 0/5] Migration v2 fix Andrew Cooper
2015-09-07 11:12 ` Ian Campbell
6 siblings, 0 replies; 20+ messages in thread
From: Wei Liu @ 2015-09-06 20:05 UTC (permalink / raw)
To: Xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
tools/libxc/xc_sr_restore.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 924dd55..f48e7fc 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -181,6 +181,7 @@ static int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
ctx->restore.max_populated_pfn = new_max;
}
+ assert(!test_bit(pfn, ctx->restore.populated_pfns));
set_bit(pfn, ctx->restore.populated_pfns);
return 0;
--
2.1.4
* Re: [PATCH for 4.6 v3 0/5] Migration v2 fix
2015-09-06 20:05 [PATCH for 4.6 v3 0/5] Migration v2 fix Wei Liu
` (4 preceding siblings ...)
2015-09-06 20:05 ` [PATCH for 4.6 v3 5/5] libxc: add assertion to avoid setting same bit more than once Wei Liu
@ 2015-09-07 10:04 ` Andrew Cooper
2015-09-07 10:07 ` Wei Liu
2015-09-07 11:12 ` Ian Campbell
6 siblings, 1 reply; 20+ messages in thread
From: Andrew Cooper @ 2015-09-07 10:04 UTC (permalink / raw)
To: Wei Liu, Xen-devel; +Cc: Ian Jackson, Ian Campbell
On 06/09/15 21:05, Wei Liu wrote:
> Wei Liu (5):
> libxc: clearer migration v2 debug message
> libxc: migration v2 prefix Memory -> Frames
> libxc: fix indentation
> libxc: don't populate same pfn more than once in populate_pfns
> libxc: add assertion to avoid setting same bit more than once
>
> tools/libxc/xc_sr_restore.c | 7 ++++---
> tools/libxc/xc_sr_restore_x86_pv.c | 4 ++--
> tools/libxc/xc_sr_save.c | 10 +++++-----
> 3 files changed, 11 insertions(+), 10 deletions(-)
>
Please test against Xen 4.5 as well. I have looked at the legacy code,
but can't convince myself either way as to whether the bug is present or
not.
~Andrew
* Re: [PATCH for 4.6 v3 0/5] Migration v2 fix
2015-09-07 10:04 ` [PATCH for 4.6 v3 0/5] Migration v2 fix Andrew Cooper
@ 2015-09-07 10:07 ` Wei Liu
2015-09-07 10:10 ` Andrew Cooper
0 siblings, 1 reply; 20+ messages in thread
From: Wei Liu @ 2015-09-07 10:07 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Ian Jackson, Ian Campbell
On Mon, Sep 07, 2015 at 11:04:25AM +0100, Andrew Cooper wrote:
> On 06/09/15 21:05, Wei Liu wrote:
> >Wei Liu (5):
> > libxc: clearer migration v2 debug message
> > libxc: migration v2 prefix Memory -> Frames
> > libxc: fix indentation
> > libxc: don't populate same pfn more than once in populate_pfns
> > libxc: add assertion to avoid setting same bit more than once
> >
> > tools/libxc/xc_sr_restore.c | 7 ++++---
> > tools/libxc/xc_sr_restore_x86_pv.c | 4 ++--
> > tools/libxc/xc_sr_save.c | 10 +++++-----
> > 3 files changed, 11 insertions(+), 10 deletions(-)
> >
>
> Please test against Xen 4.5 as well. I have looked at the legacy code, but
> can't convince myself either way as to whether the bug is present or not.
>
DYM cross-version migration from 4.5 to 4.6, or 32-bit Linux 4.1 save /
restore with Xen 4.5?
> ~Andrew
* Re: [PATCH for 4.6 v3 0/5] Migration v2 fix
2015-09-07 10:07 ` Wei Liu
@ 2015-09-07 10:10 ` Andrew Cooper
2015-09-07 11:22 ` Wei Liu
0 siblings, 1 reply; 20+ messages in thread
From: Andrew Cooper @ 2015-09-07 10:10 UTC (permalink / raw)
To: Wei Liu; +Cc: Xen-devel, Ian Jackson, Ian Campbell
On 07/09/15 11:07, Wei Liu wrote:
> On Mon, Sep 07, 2015 at 11:04:25AM +0100, Andrew Cooper wrote:
>> On 06/09/15 21:05, Wei Liu wrote:
>>> Wei Liu (5):
>>> libxc: clearer migration v2 debug message
>>> libxc: migration v2 prefix Memory -> Frames
>>> libxc: fix indentation
>>> libxc: don't populate same pfn more than once in populate_pfns
>>> libxc: add assertion to avoid setting same bit more than once
>>>
>>> tools/libxc/xc_sr_restore.c | 7 ++++---
>>> tools/libxc/xc_sr_restore_x86_pv.c | 4 ++--
>>> tools/libxc/xc_sr_save.c | 10 +++++-----
>>> 3 files changed, 11 insertions(+), 10 deletions(-)
>>>
>> Please test against Xen 4.5 as well. I have looked at the legacy code, but
>> can't convince myself either way as to whether the bug is present or not.
>>
> DYM cross version migration from 4.5 to 4.6 or 32bit Linux 4.1 save /
> restore with Xen 4.5?
Just plain legacy migration with Linux 4.1, to see whether legacy
suffers from the same issue. If it does, we should probably backport a
legacy fix.
~Andrew
* Re: [PATCH for 4.6 v3 0/5] Migration v2 fix
2015-09-07 10:10 ` Andrew Cooper
@ 2015-09-07 11:22 ` Wei Liu
0 siblings, 0 replies; 20+ messages in thread
From: Wei Liu @ 2015-09-07 11:22 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Ian Jackson, Ian Campbell
On Mon, Sep 07, 2015 at 11:10:24AM +0100, Andrew Cooper wrote:
>
>
> On 07/09/15 11:07, Wei Liu wrote:
> >On Mon, Sep 07, 2015 at 11:04:25AM +0100, Andrew Cooper wrote:
> >>On 06/09/15 21:05, Wei Liu wrote:
> >>>Wei Liu (5):
> >>> libxc: clearer migration v2 debug message
> >>> libxc: migration v2 prefix Memory -> Frames
> >>> libxc: fix indentation
> >>> libxc: don't populate same pfn more than once in populate_pfns
> >>> libxc: add assertion to avoid setting same bit more than once
> >>>
> >>> tools/libxc/xc_sr_restore.c | 7 ++++---
> >>> tools/libxc/xc_sr_restore_x86_pv.c | 4 ++--
> >>> tools/libxc/xc_sr_save.c | 10 +++++-----
> >>> 3 files changed, 11 insertions(+), 10 deletions(-)
> >>>
> >>Please test against Xen 4.5 as well. I have looked at the legacy code, but
> >>can't convince myself either way as to whether the bug is present or not.
> >>
> >DYM cross version migration from 4.5 to 4.6 or 32bit Linux 4.1 save /
> >restore with Xen 4.5?
>
> Just plain legacy migration with Linux 4.1, to see whether legacy suffers
> from the same issue. If it does, we should probably backport a legacy fix.
>
Xen 4.5.1 works fine when migrating a 32-bit Linux 4.1 guest.
Wei.
> ~Andrew
* Re: [PATCH for 4.6 v3 0/5] Migration v2 fix
2015-09-06 20:05 [PATCH for 4.6 v3 0/5] Migration v2 fix Wei Liu
` (5 preceding siblings ...)
2015-09-07 10:04 ` [PATCH for 4.6 v3 0/5] Migration v2 fix Andrew Cooper
@ 2015-09-07 11:12 ` Ian Campbell
6 siblings, 0 replies; 20+ messages in thread
From: Ian Campbell @ 2015-09-07 11:12 UTC (permalink / raw)
To: Wei Liu, Xen-devel; +Cc: Ian Jackson
On Sun, 2015-09-06 at 21:05 +0100, Wei Liu wrote:
> Wei Liu (5):
> libxc: clearer migration v2 debug message
> libxc: migration v2 prefix Memory -> Frames
> libxc: fix indentation
> libxc: don't populate same pfn more than once in populate_pfns
> libxc: add assertion to avoid setting same bit more than once
All applied.
>
> tools/libxc/xc_sr_restore.c | 7 ++++---
> tools/libxc/xc_sr_restore_x86_pv.c | 4 ++--
> tools/libxc/xc_sr_save.c | 10 +++++-----
> 3 files changed, 11 insertions(+), 10 deletions(-)
>